Yahoo’s Site Explorer

There is a lot of interesting reading in the SES coverage by Search Engine Roundtable. I'm currently a little sitemap-addicted, so Tim Mayer's announcement caught my attention:

Tim announces a new product named Site Explorer, where you can get your linkage data. It is a place for people to go to see which pages Yahoo indexed and to let Yahoo know about URLs Yahoo has not found as of yet …. He showed an example, you basically type in a URL into it (this is also supported via an API…), then you hit explore URL and it spits out the number of pages found in Yahoo’s index and also shows you the number of inbound links. You can sort pages by “depth” (how deep pages are buried) and you can also submit URLs here. You can also quickly export the results to TSV format.

Sounds like a pretty convenient tool for manual submissions, harvesting data for link development, and so on. Unfortunately it's not live yet, and I'd love to read more about the API. The concept outlined above makes me think I may get an opportunity to shove my fresh content into Yahoo's index much faster than today, because compared to other crawlers Yahoo! Slurp is a little lethargic:

Crawler stats (tiny site):

Crawler        Page Fetches   robots.txt Fetches   Bandwidth   Last Access
Googlebot      7755           30                   73.34 MB    11 Aug 2005 - 00:03
MSNBot         1627           98                   39.86 MB    10 Aug 2005 - 23:38
Yahoo! Slurp   385            204                  13.61 MB    10 Aug 2005 - 23:53

I may be mistaken here, but Yahoo's Site Explorer announcement could indicate that Yahoo will not implement Google's Sitemap Protocol. That would be a shame.

Tim Mayer in another SES session:
Q: “Is there a way to do the Google sitemaps type system at Yahoo?”
Tim: We just launched the feed to be able to do that. We will be expanding the products into the future.


Good News from Google

Google is always good for some news: since yesterday, news queries are available as RSS feeds. That's good news, although Google shoves outdated HTML (font tags and the like) into the item descriptions. It's good practice to separate content from presentation, and hard-coded background colors in combination with foreign CSS can wreck a page, so webmasters must extract the text content if they want to make use of Google's news feeds.

As for Google and RSS: to curb Ms. Googlebot's appetite for harvested feeds, Google needs to set up a ping service. Currently Ms. Googlebot requests feeds far too often, because she spiders them based on guesses and time schedules (one or more fetches every 15 minutes). From my wish list: http://feeds.google.com/ping?feedURI, usable for submissions and for pings on updates.
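Such a ping would be trivial to send. Here's a minimal sketch, assuming the wish-list endpoint above actually existed (it doesn't yet; both the URL and the parameter name are pure fantasy on my part):

<?php
// Hypothetical ping to a (not yet existing) Google feed ping service.
// Endpoint and parameter name are taken from the wish list above.
$feedUri = 'http://www.example.com/blog/feed.xml';
$pingUrl = 'http://feeds.google.com/ping?feedURI=' . urlencode($feedUri);

// A simple GET request; a 200 response would mean "ping accepted".
$response = @file_get_contents($pingUrl);
echo ($response === false) ? "Ping failed\n" : "Ping sent\n";
?>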

Google already makes use of ping technology in the Sitemaps program, so a ping service shouldn't be a big deal. Apropos sitemaps: the Google Sitemaps team has launched Inside Google Sitemaps. While I'm at it with the Google bashing, here is a quote from the welcome post (tip: a prominent home link on every page wouldn't hurt, especially since the title links to google.com instead of the blog):

When you submit your Sitemap, you help us learn more about the contents of your site. Participation in this program will not affect your pages’ rankings or cause your pages to be removed from our index.

That's not always true. When Googlebot discovers a whole site, it finds a lot of material that is relevant for rankings, for example the anchor text of internal links on formerly unknown pages, and this may improve the site's overall search engine visibility. On the other hand, sitemap-based junk submissions can easily tank a site on the SERPs.

Last but not least, Google has improved its wildcard search and can now tell us what SEO is all about *. Compare the search result to Google's official SEO page and wonder.


The Power of Search Related Blogs

Aaron Wall posted a great call to help stop the war in northern Uganda. He argues that if search blogs can get a missing member of the community to the #1 spot, the same can be done to draw attention to a war where children are abused as cannon fodder. Visit the Uganda Conflict Action Network for more information, because you won't find it at CNN or elsewhere.

Aaron’s call for action:

If you do not like the idea of children being abducted, murdered, and living in constant fear, please help. A few options:


Google Has Something Cooking

Covered by a smoke screen, Google is running an internal and top-secret summer of coding. Charlie Ayers, Google's famous chef, has decided to leave the GooglePlex. Larry & Sergey say that hungry engineers work harder on algo tweaks and celestial projects as well. While SEOs and webmasters speculate about the strange behavior of the saucy Googlebot sisters, haggard engineers are cooking their secret sauce in the labs. Under those circumstances some collateral damage is preprogrammed, but hungry engineers don't care about a few stinkin' directories they blow away by accident. Shit happens, don't worry, failure is automated by Google. Seriously, wait for some exciting improvements in the googlesphere.

Tags: Bollocks removed. Here you go.




Fresh Content is King

Old news:
A bunch of unique content and a high update frequency increase search engine traffic.
Quite new:
Leading crawlers to fresh content becomes super important.
Future news:
Dynamic Web sites optimized to ping SE crawlers outrank established sites across the board.

Established methods and tools to support search engine crawlers are clever internal linking, sitemap networks, 'What's new' pages, inbound links from highly ranked and frequently updated pages, and so on. To a limited degree they still lead crawlers to fresh content and to old content that hasn't been spidered yet. Time to crawl and time to index remain unsatisfying, because the whole system is based on pulling and depends on the search engine back-end's ability to guess.

Look back and forth at Google: Google News, Froogle, Sitemaps, and rumors about blog search indicate a shift from progressively pulling masses of data to proactive, event-driven picking of smaller amounts of fresh data. Google will never stop crawling based on guessing, but it has learned how to locate fresh content in no time by making use of submissions and pings.

Blog search engines fulfill the demand for fresh, popular content more or less perfectly. The blogosphere pings blog search engines, and that's why they are so up to date. The blogosphere is huge and the number of blog posts is enormous, but it is just a tiny part of the Web. Even more fresh content is published elsewhere, and elsewhere is the playground of the major search engines, not even touched by blog search engines.

Google wants to dominate search, and currently it does. Google cannot ignore the demand for fresh and popular content, and Google cannot lower the relevancy of its search results. Will Google's future search results be ranked by some sort of 'recent relevancy' algo? I guess not in general, but 'recent relevancy' is not an oxymoron, because Google can learn to determine the type of information requested and deliver more recent or more relevant results depending on the query context and tracked user behavior. I'm speculating here, but it is plausible, and Google has already developed all the components necessary to assemble such an algo.

Based on the speculation above, investments in RSS technology and the like should be a wise business decision. If 'ranking by recent relevancy' or something similar comes true, dynamic Web sites with the bigger toolset will often outrank the established but more statically organized sources of information.


Bait Googlebot With RSS Feeds

Seeing Ms. Googlebot’s sister running wild on RSS feeds, I’m going to assume that RSS feeds may become a valuable tool to support Google’s fresh and deep crawls. Test it for yourself:

Create an RSS feed with a few unlinked or seldom-spidered pages that are not included in your XML sitemap. Add the feed to your personalized Google home page ('Add Content' -> 'Create Section' -> enter the feed URL). Track spider accesses to the feed and to the included pages. Most probably Googlebot will request your feed more often than Yahoo's FeedSeeker and similar bots do. Chances are that Googlebot-Mozilla is nosy enough to crawl at least some of the pages linked in the feed.
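A bare-bones test feed is enough for this experiment, something like the following (all URLs are placeholders, of course):

<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Orphaned pages test feed</title>
    <link>http://www.example.com/</link>
    <description>Seldom spidered pages, not listed in the XML sitemap</description>
    <item>
      <title>Old article nobody links to</title>
      <link>http://www.example.com/old-article.html</link>
    </item>
    <item>
      <title>Another unlinked page</title>
      <link>http://www.example.com/unlinked-page.html</link>
    </item>
  </channel>
</rss>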

That does not help a lot with regard to indexing and ranking, but it seems to be a neat way to help the Googlebot sisters spot fresh content. In real life, add the pages to your XML sitemap, link to them, and acquire inbound links…

To test the waters, I've added RSS generation to my Simple Google Sitemaps Generator. This tool reads a plain page list from a text file and generates a dynamic XML sitemap, an RSS 2.0 site feed, and a hierarchical HTML site map.

Related article on Google’s RSS endeavors: Why Google is an RSS laggard


Take Free SEO Advice With a Grain of Salt

Yesterday I spotted once again how jokes become rumors. It happens every day, and sometimes it even hurts. In my post The Top-5 Methods to Attract Search Engine Spiders I was joking about bold dollar signs driving the MSN bot crazy. A few days later I discovered the first Web site making use of the putative '$$ trick'. To make a sad story worse, the webmaster had put the dollar signs in as hidden text.

This reminds me of the spread of the ghostly robots revisit META tag. This tag was used by a small regional Canadian engine for local indexing in the stone age of the Internet. Today every free META tag generator on the net produces a robots revisit tag. Not a single search engine is interested in this tag, and it was never standardized, but it's present on billions of Web pages.
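For the record, the tag those generators spit out usually looks something like this (the interval varies), and every engine I know of simply ignores it:

<meta name="revisit-after" content="7 days">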

This is how bad advice becomes popular. Folks read nasty tips and tricks on the net and don't apply common sense when they implement them. There is no such thing as free and good advice on the net. Even good advice on a particular topic can produce astonishing effects when applied outside its context. It's impossible to learn SEO from free articles and posts on message boards. Go see an SEO; it's worth it.


AdSense Crawler Downloads XML Sitemaps

If your site provides a Google XML Sitemap, check your server logs for entries like

Date/time:   2005-07-27 13:33:01
Request:     GET /mycutesitemap.xml
User agent:  Mediapartners-Google/2.1
IP address:  66.249.66.47 (crawl-66-249-66-47.googlebot.com)

It looks like Google has launched a new phase of the Sitemaps project, and this could help with targeting AdSense ads to a great degree. If AdSense gets alerted to content changes (submitted via the sitemap's lastmod element) before the next scheduled crawl occurs (probably next month or so), it becomes easier to tweak pages carrying AdSense ads.
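In the sitemap itself, that change notification is nothing more than the lastmod element of a URL entry, for example (URL and date are made up):

<url>
  <loc>http://www.example.com/adsense-page.html</loc>
  <lastmod>2005-07-27</lastmod>
</url>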

If your Web site lacks a dynamic XML sitemap, implement this feature as soon as possible. It could increase your AdSense revenue.

Learn more about Google XML Sitemaps here.


Google’s Own XML Sitemap Explored

While developing a Google XML Sitemap parser, I couldn't resist testing my tool on Google's own XML sitemap. The result is somewhat astonishing, even for beta software.

Parsing only the first 100 entries, I found lots of 404s (page not found) as well as 301 (moved permanently) and 302 (found elsewhere) redirects. Site owners get their sitemaps declined for fewer invalid entries. It seems Google does not use its own XML sitemap.
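For the curious, here's a rough sketch of the kind of check my parser runs. It is not the actual tool, and the sitemap location is just an example: it pulls the loc entries out of the XML and reports each URL's HTTP status line.

<?php
// Rough sketch of a sitemap status check (not the actual parser).
$sitemapUrl = 'http://www.example.com/sitemap.xml'; // example location
$xml = file_get_contents($sitemapUrl);

// Extract all <loc> entries from the sitemap.
preg_match_all('/<loc>(.*?)<\/loc>/i', $xml, $matches);

// Check the first 100 URLs and print their HTTP status lines.
foreach (array_slice($matches[1], 0, 100) as $url) {
    $url = html_entity_decode(trim($url));
    $headers = get_headers($url); // e.g. "HTTP/1.1 404 Not Found"
    echo $headers[0] . '  ' . $url . "\n";
}
?>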

View the page list parsed from Google’s Sitemap here.

Dear Google, please forgive me. I just had to publish this finding ;)


Just Another Free Sitemap Tool Launched

FREE (Google) Sitemap Tool for Smaller Web Sites

I get a lot of feedback on my Google Sitemaps Tutorial and related publications, and I read the message boards and newsgroups. I've learned that there are lots of smaller Web sites out there whose owners want to provide both a Google XML Sitemap and an HTML site map, but there are close to zero tools available to support those Web publishers. The suitable tools, at least, are not free of charge, and most low-cost content management systems don't create both sitemap variants.

To help out those Web site owners, I've written a pretty simple PHP script that generates dynamic Google XML Sitemaps as well as pseudo-static HTML site maps from one set of page data. Both the XML sitemap and the viewable version pull their data from a plain text file, where the site owner or Web designer adds a new line per page after updates.

The Google XML Sitemap is a PHP script that reflects the current content of the text file on request and writes a static HTML site map page to disk. Since Googlebot downloads XML sitemaps every 12 hours like clockwork, the renderable site map gets refreshed at least twice per day.
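To illustrate the concept, here's a stripped-down sketch. It is not the actual Simple Sitemaps script, and the file name and XML namespace are assumptions:

<?php
// Stripped-down sketch: one plain text file, one URL per line,
// rendered as a Google XML sitemap on request.
$urls = file('pagelist.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

header('Content-Type: text/xml');
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">' . "\n";
foreach ($urls as $url) {
    echo '  <url><loc>' . htmlspecialchars(trim($url)) . '</loc></url>' . "\n";
}
echo '</urlset>' . "\n";
?>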

The site owner or Web designer just needs to change a simple text file on updates; after the upload, the next Googlebot visit recreates the site maps. Ain't that cute?

Curious? Here is the link: Simple Sitemaps 1.0 BETA

Although this free script provides a pretty simple sitemap solution, I wouldn't use it with Web sites containing more than 100 pages. Why not? Site map pages carrying more than 100 links may devalue those links. On the average Web server my script will work with hundreds of pages, but from an SEO's point of view that's counterproductive.

Please download the script and tell me what you think. Thanks!

