The Yahoo! Site Explorer (BETA) has finally launched. It's a nice tool that shows a site owner (and the competitors) all indexed pages per domain, and it offers subdomain filters. Inbound links get counted per page and per site. The tool provides links to the standard submit forms, and Yahoo! accepts mass submissions of plain URL lists there.
The number of inbound links seems to be far more accurate than the guesses available from linkdomain: and link: searches. Unfortunately there is no simple way to exclude internal links, so anyone who wants to check only 3rd-party inbounds faces a painful procedure:
1. Export each result page to a TSV file, a tab-delimited format readable by Excel and other applications.
2. The export works per SERP with a maximum of 50 URLs, so one must delete the two header lines in each file and append the files one by one to produce a single sheet.
3. Sort the worksheet by the second column to get a list ordered by URL.
4. Delete all URLs belonging to the own site; what remains is the list of 3rd-party inbounds.
5. Wait for a fix of the "exported data of all result pages are equal" bug: each exported data set contains the first 50 results, regardless of which result page's export link one clicks.
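Steps 2 through 4 above can be scripted instead of done by hand in Excel. The sketch below is a minimal illustration; the two-header-line layout and the URL sitting in the second column match the export format described above, but the exact column layout of a given export should be verified first.

```python
# Sketch: merging Site Explorer TSV exports and dropping internal links.
# Assumes each export carries two header lines and up to 50 data rows,
# with the URL in the second column (as described in the steps above).
import csv
import io


def merge_exports(tsv_texts, own_domain):
    """Combine exported result pages into one list of 3rd-party rows."""
    rows = []
    for text in tsv_texts:
        reader = csv.reader(io.StringIO(text), delimiter="\t")
        rows.extend(list(reader)[2:])      # step 2: strip the two header lines
    rows.sort(key=lambda row: row[1])      # step 3: sort by the URL column
    return [row for row in rows            # step 4: drop the own site's URLs
            if own_domain not in row[1]]
```

Feeding it the text of each exported file and the own domain name yields the sorted 3rd-party inbound list in one pass, instead of the append-sort-delete routine per file.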
The result pages provide mixed lists of all URLs known to Yahoo. The ordering does not represent the site's logical structure (defined by linkage), and not even the physical structure seems to be part of the sort order (not exactly what I would call a "comprehensive site map"). It looks like the first results are ordered by popularity, followed by a more or less unordered list. The URL listings contain fully indexed pages with known but not (yet) indexed URLs mixed in (e.g. pages with a robots "noindex" meta tag). The latter can be identified by their missing cached link.
My wish list:

1. A filter "with/without internal links".
2. An export function outputting the data of all result pages to one single file.
3. A filter “with/without” known but not indexed URLs.
4. Optional structural ordering on the result pages.
5. Operators like filetype: and -site:domain.com.
6. Removal of the 1,000 results limit.
7. Revisiting of submitted URL lists à la Google Sitemaps.
Overall, the site explorer is a great tool and an appreciated improvement, despite the wish list above. The most interesting part of the new toy is its API, which allows querying for up to 1,000 results (page data or link data) in batches of 50 to 100 results, returned in a simple XML format (max. 5,000 queries per IP address per day).
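Paging through those 1,000 results means issuing a series of batched requests. The sketch below only builds the request URLs; the endpoint, parameter names, and the appid requirement are assumptions based on the conventions of Yahoo's Search Web Services at the time, so they should be checked against the official API documentation before use.

```python
# Sketch: batching Site Explorer API requests up to the 1,000-result cap.
# Endpoint and parameter names are assumptions, not verified against the
# official documentation; an application ID (appid) is assumed required.
from urllib.parse import urlencode

API_BASE = "http://search.yahooapis.com/SiteExplorerService/V1/pageData"


def batch_request_urls(domain, app_id, batch_size=100, max_results=1000):
    """Yield one request URL per batch of results (50-100 per call)."""
    for start in range(1, max_results + 1, batch_size):
        params = {
            "appid": app_id,        # hypothetical registration key
            "query": domain,
            "results": batch_size,  # the API accepts 50 to 100 per query
            "start": start,         # 1-based offset into the result set
        }
        yield API_BASE + "?" + urlencode(params)
```

Fetching each URL and parsing the returned XML is then a straightforward loop; with 10 requests per site and a 5,000-queries-per-IP daily cap, one IP address can pull full page or link data for roughly 500 sites a day.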