Archived posts from the 'XML-Sitemaps' Category

An Unofficial FAQ on Google Sitemaps

Yesterday I launched the unofficial Google Sitemaps FAQ. It’s not yet complete, but it may be helpful if you have open questions about Google Sitemaps.

I’ve tried to answer questions which Google cannot or will not answer, or where Google would get murdered for any answer other than “42”. Even if the answer is a plain “no”, I’ve provided background and solutions. I’m not a content thief, and I hate useless redundancy, so don’t miss the official FAQ and the Sitemaps blog.

Enjoy, and submit interesting questions here. Your feedback is very much appreciated!


Yahoo! Site Explorer Finally Launched

Finally, the Yahoo! Site Explorer (BETA) has launched. It’s a nice tool that shows a site owner (and the competitors) all indexed pages per domain, and it offers subdomain filters. Inbound links are counted per page and per site. The tool provides links to the standard submission forms, and Yahoo! accepts mass submissions of plain URL lists here.

The number of inbound links seems to be way more accurate than the guesses available from linkdomain: and link: searches. Unfortunately, there is no simple way to exclude internal links, so if one wants to check only 3rd-party inbounds, a painful procedure begins (a scripted sketch of steps 1 through 4 follows the list):
1. Export each result page to a TSV file, a tab-delimited format readable by Excel and other applications.
2. The export goes per SERP with a maximum of 50 URLs, so one must delete the two header lines per file and append file by file to produce one sheet.
3. Sort the worksheet by the second column to get a list ordered by URL.
4. Delete all URLs from your own site to get the list of 3rd-party inbounds.
5. Wait for the fix of the “exported data of all result pages are equal” bug (each exported data set contains the first 50 results, regardless of which result page’s export link one clicks).
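
The steps above lend themselves to a small script. The following is only an illustrative sketch (not part of the Site Explorer): it assumes the exports were saved as siteexplorer_*.tsv, that the URL sits in the second column as described, and that the output file name and $ownDomain are placeholders you adjust for your site.

```php
<?php
// Illustrative sketch: merge Yahoo! Site Explorer TSV exports, drop the two
// header lines per file, de-dupe and sort by URL, and strip internal links.
// File name pattern, output file, and $ownDomain are assumptions for this example.
$ownDomain = 'example.com';
$rows = array();

foreach (glob('siteexplorer_*.tsv') as $file) {
    $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $lines = array_slice($lines, 2);                 // step 2: delete the two header lines
    foreach ($lines as $line) {
        $cols = explode("\t", $line);
        $url  = isset($cols[1]) ? $cols[1] : '';     // step 3: URL is the second column
        if ($url === '' || strpos($url, $ownDomain) !== false) {
            continue;                                // step 4: skip internal links
        }
        $rows[$url] = $line;                         // keyed by URL, so duplicates collapse
    }
}

ksort($rows);                                        // ordered by URL
file_put_contents('third_party_inbounds.tsv', implode("\n", $rows) . "\n");
?>
```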

The result pages provide assorted lists of all URLs known to Yahoo!. The ordering does not represent the site’s logical structure (defined by linkage), and not even the physical structure seems to be part of the sort order (that’s not exactly what I would call a “comprehensive site map”). It looks like the first results are ordered by popularity, followed by a more or less unordered list. The URL listings contain fully indexed pages, with known but not (yet) indexed URLs mixed in (e.g. pages with a robots “noindex” meta tag); the latter can be identified by the missing cached link.

Desired improvements:
1. A filter “with/without internal links”.
2. An export function outputting the data of all result pages to one single file.
3. A filter “with/without” known but not indexed URLs.
4. Optional structural ordering on the result pages.
5. Operators like filetype: and -site:domain.com.
6. Removal of the 1,000 results limit.
7. Revisiting of submitted URL lists a la Google sitemaps.

Overall, the site explorer is a great tool and an appreciated improvement, despite the wish list above. The most interesting part of the new toy is its API, which allows querying for up to 1,000 results (page data or link data) in batches of 50 to 100 results, returned in a simple XML format (max. 5,000 queries per IP address per day).
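
For completeness, here is a minimal sketch of a batched API call. The endpoint URL, parameter names, and response fields (Result/Url) are written from memory of Yahoo!’s Web Services documentation and should be treated as assumptions; verify them against the official docs and replace YOUR_APP_ID with a registered application ID.

```php
<?php
// Minimal sketch: pull up to 1,000 inbound-link results in batches of 50.
// Endpoint, parameters, and response structure are assumptions - check Yahoo!'s docs.
$appId  = 'YOUR_APP_ID';
$target = 'http://example.com/';
$batch  = 50;

for ($start = 1; $start <= 1000; $start += $batch) {
    $url = 'http://api.search.yahoo.com/SiteExplorerService/V1/inlinkData'
         . '?appid='   . urlencode($appId)
         . '&query='   . urlencode($target)
         . '&results=' . $batch
         . '&start='   . $start;
    $xml = @simplexml_load_file($url);        // the API returns a simple XML format
    if ($xml === false || count($xml->Result) == 0) {
        break;                                // error or no more results
    }
    foreach ($xml->Result as $result) {
        echo $result->Url, "\n";              // assumed field name
    }
}
?>
```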


Google’s Own XML Sitemap Explored

While developing a Google XML Sitemap parser, I couldn’t resist testing my tool on Google’s own XML sitemap. The result is somewhat astonishing, even for BETA software.

Parsing only the first 100 entries, I found lots of 404s (page not found) as well as 301 (moved permanently) and 302 (found elsewhere) redirects. Site owners get their sitemaps declined for fewer invalid entries. It seems Google does not use its own XML sitemap.
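
My parser isn’t published here, but the check is easy to re-create. The sketch below is an assumption-laden stand-in: it reads the first 100 <loc> entries of a sitemap and prints each URL’s HTTP status line; the sitemap location is a placeholder.

```php
<?php
// Hypothetical re-creation of the check: report the HTTP status of the first
// 100 URLs listed in an XML sitemap. The sitemap URL below is a placeholder.
$sitemap = simplexml_load_file('http://www.example.com/sitemap.xml');
$checked = 0;

foreach ($sitemap->url as $entry) {           // works when the sitemap uses one default namespace
    if (++$checked > 100) {
        break;                                // first 100 entries only
    }
    $loc     = (string) $entry->loc;
    $headers = @get_headers($loc);            // first element holds the status line
    echo ($headers ? $headers[0] : 'no response'), "\t", $loc, "\n";
}
?>
```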

View the page list parsed from Google’s Sitemap here.

Dear Google, please forgive me. I just had to publish this finding ;)


Just Another Free Sitemap Tool Launched

FREE (Google) Sitemap Tool for Smaller Web Sites

I get a lot of feedback on my Google Sitemaps Tutorial and related publications, and I read the message boards and newsgroups. I’ve learned that there are lots of smaller Web sites out there whose owners want to provide both a Google XML Sitemap and an HTML site map, but there are close to zero tools available to support those Web publishers. At least, the suitable tools are not free of charge, and most low-cost content management systems don’t create both sitemap variants.

To help out those Web site owners, I’ve written a pretty simple PHP script that generates dynamic Google XML Sitemaps as well as pseudo-static HTML site maps from one set of page data. Both the XML sitemap and the viewable version pull their data from a plain text file, where the site owner or Web designer adds a new line per page after updates.

The Google XML Sitemap is a PHP script that reflects the current text file’s content on request and writes a static HTML site map page to disk. Since Googlebot downloads XML sitemaps every 12 hours like clockwork, the renderable sitemap gets refreshed at least twice per day.
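
To illustrate the idea (this is a simplified sketch, not the actual Simple Sitemaps code), assume a pages.txt with one tab-delimited “URL<tab>Title” line per page; a script along these lines can serve the XML sitemap on request and write the HTML site map to disk in the same pass:

```php
<?php
// Simplified sketch of the concept - not the Simple Sitemaps script itself.
// pages.txt format ("URL<tab>Title" per line) is an assumption for this example.
header('Content-Type: text/xml; charset=utf-8');

$lines = file('pages.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$xml   = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
       . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
$html  = "<ul>\n";

foreach ($lines as $line) {
    $parts = explode("\t", $line);
    if (count($parts) < 2) {
        continue;                                       // skip malformed lines
    }
    list($url, $title) = $parts;
    $xml  .= '<url><loc>' . htmlspecialchars($url) . "</loc></url>\n";
    $html .= '<li><a href="' . htmlspecialchars($url) . '">'
           . htmlspecialchars($title) . "</a></li>\n";
}

$xml  .= "</urlset>\n";
$html .= "</ul>\n";

file_put_contents('sitemap.html', $html);   // the pseudo-static HTML site map
echo $xml;                                  // the dynamic XML sitemap served to Googlebot
?>
```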

The site owner or Web designer just needs to change a simple text file on updates; after the upload, Googlebot’s next sitemap fetch triggers the regeneration of both sitemaps. Ain’t that cute?

Curious? Here is the link: Simple Sitemaps 1.0 BETA

Although this free script provides a pretty simple sitemap solution, I wouldn’t use it with Web sites containing more than 100 pages. Why not? Site map pages carrying more than 100 links may devalue those links. On the average Web server my script will work with hundreds of pages, but from an SEO’s point of view that’s counterproductive.

Please download the script and tell me what you think. Thanks!


Googlebots go Fishing with Sitemaps

I’ve used Google Sitemaps since it was launched in June. Six weeks later I say ‘Kudos to Google’, because it works even better than expected. Making use of Google Sitemaps is definitely a must, at least for established Web sites (it doesn’t help much with new sites).

From my logs I found some patterns; here is how the Googlebot sisters go fishing:
· Googlebot-Mozilla downloads the sitemaps 6 times per day: two fetches every 8 hours, like clockwork (or every 12 hours lately, now with up to 4 fetches within a few minutes from the same IP address). Since this behavior is not documented, I recommend implementing automated resubmit pings anyway (a sketch follows the list).
· Googlebot fetches new and updated pages harvested from the sitemap at the latest 2 days after their inclusion in the XML file, or after a current last-modified value is provided; time to index is consistently 2 days at most. There is just one fetch per page (as long as the sitemap doesn’t submit another update), resulting in complete indexing (title, snippets, and cached page). Sometimes she ‘forgets’ a sitemap-submitted URL but fetches it later by following links (this happens with very similar new URLs, especially when they differ only in a query string value). She even crawls and indexes (new) orphans, i.e. pages not linked from anywhere.
· Googlebot-Mozilla acts as a weasel in Googlebot’s backwash and is suspected of revealing her secrets to AdSense.
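
Here is what such a resubmit ping can look like; a minimal sketch, assuming the ping URL pattern from Google’s Sitemaps documentation (double-check it there), to be called whenever the page data changes:

```php
<?php
// Sketch of an automated resubmit ping. The ping URL pattern is taken from
// Google's Sitemaps documentation as I recall it - verify it before relying on it.
$sitemapUrl = 'http://www.example.com/sitemap.xml';                  // placeholder
$pingUrl    = 'http://www.google.com/webmasters/sitemaps/ping?sitemap='
            . urlencode($sitemapUrl);

$response = @file_get_contents($pingUrl);                            // a simple GET does it
echo ($response === false) ? "Ping failed\n" : "Sitemap resubmitted\n";
?>
```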


Is Google Sitemaps an Index Wiper?

A few weeks after the launch of Google Sitemaps, some unlucky site owners participating in Google’s new service find their pages wiped out of Google’s index.

For small and new sites, that’s pretty common and has nothing to do with Google Sitemaps. When it comes to established Web sites disappearing partly or even completely, that’s another story.

There is an underestimated and often overlooked risk attached to sitemap submissions. Webmasters who have used sitemap tools that generate the sitemaps from the Web server’s file system may have submitted ‘unintended spider food’ to Google and quickly triggered a spam filter.

At least with important sites, it’s a must to double-check the generated sitemaps before they get submitted (see the sketch below). Sitemap generators may dig out unlinked and outdated junk, for example spider-trap link pages and doorway pages from the last century.
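
As an illustration only (not from the original post), a quick sanity check could flag every sitemap entry whose path falls outside the directories you actually want indexed; the whitelist and file name below are assumptions:

```php
<?php
// Hypothetical sanity check: list sitemap URLs outside a whitelist of known
// content directories, i.e. candidates for 'unintended spider food'.
$whitelist = array('/articles/', '/blog/');           // adjust to your site (assumption)
$sitemap   = simplexml_load_file('sitemap.xml');      // the generated sitemap to review

foreach ($sitemap->url as $entry) {
    $path = parse_url((string) $entry->loc, PHP_URL_PATH);
    $ok   = false;
    foreach ($whitelist as $prefix) {
        if (strpos($path, $prefix) === 0) {
            $ok = true;
            break;
        }
    }
    if (!$ok) {
        echo "Check before submitting: ", (string) $entry->loc, "\n";
    }
}
?>
```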

More info here.
