Getting new stuff crawled in real-time

Posted on 18 August, 2009

Thanks to PubSubHubbub we’ve got another method to invite Googlebot.

Just add a link to FriendFeed, for example by twittering or bookmarking it. Once the URI hits your FriendFeed account, FriendFeed shares it with Googlebot, and both of them request it before the tweet appears on your timeline:

Date/Time	Request URI	IP	User agent
2009-08-17 15:47:19	Your URI	38.99.68.206 / 38.99.68.206	Mozilla/5.0 (compatible; FriendFeedBot/0.1; +Http://friendfeed.com/about/bot)
2009-08-17 15:47:19	Your robots.txt	66.249.71.141 / crawl-66-249-71-141.googlebot.com	Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2009-08-17 15:47:19	Your URI	66.249.71.141 / crawl-66-249-71-141.googlebot.com	Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Neat. Of course there are other methods to invite search engine crawlers at lightning speed, but this one comes in handy because it can run on autopilot […]. Even when the actual fetch is meant to feed Google Reader or something, you’ve got your stuff into Google’s crawling cache, from where other services like the Web indexer can pick it up without requesting a crawl.
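
If you'd rather ping a hub yourself than route the link through FriendFeed, a PubSubHubbub publish ping is just a form-encoded POST carrying hub.mode=publish and hub.url set to your feed's URL. Here's a minimal Python sketch of that request. The hub address is an assumption (the public reference hub on appspot), the feed URL is a placeholder, and the function names are made up for illustration. As far as I read the spec, a hub answers 204 when it accepts the ping, and your feed should declare the same hub you ping. Whether Googlebot then shows up is entirely up to whoever subscribes to that hub.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public reference hub; use the hub your feed actually declares.
HUB_URL = "http://pubsubhubbub.appspot.com/"


def build_publish_ping(feed_url, hub_url=HUB_URL):
    """Build (but don't send) the POST that tells a hub a feed has new content."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send_publish_ping(feed_url, hub_url=HUB_URL):
    """Send the ping and return the HTTP status code of the hub's response."""
    with urlopen(build_publish_ping(feed_url, hub_url), timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical feed URL; replace it with your own feed before sending anything.
    request = build_publish_ping("http://example.com/feed/")
    print(request.get_method(), request.full_url)
    print(request.data.decode("ascii"))
```

Run as a script, this only prints the request it would send. Call send_publish_ping() with your real feed URL once you're happy with it.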

Sebastian | SEO, Google | Related posts

14 Comments to "Getting new stuff crawled in real-time"

Richard Hearne on 18 August, 2009 #link

Interesting. What’s the relationship here?

Ignore, just read the link. So can Joe Bloggs ping this service also, say for instance via WP ping?
Sebastian on 18 August, 2009 #link

Yup. With WordPress you can ping a hub, here’s a WP PubSubHubbub plugin.
mosquito on 18 August, 2009 #link

In this case, which explanation is right? Does FriendFeed ping Google via a PubSubHubbub hub, or does Google scan the pubsubhubbub.appspot.com hub?

Thanks!
Sebastian on 18 August, 2009 #link

Prolly Googlebot is the hub’s subscriber.
The Weekly Insider 8-10-09 to 8-21-09 on 21 August, 2009 #link

[…] Getting new stuff crawled in real-time […]
Michael Thomas on 25 August, 2009 #link

That is brilliant. If you made a change with some extra keywords distributed around the page, would this help your SEO? Would it bump up the results?
Steve on 2 September, 2009 #link

umm I don’t follow.. I did this - you’re saying say I have a site with 5,000 links.. if I were to manually tweet each link after doing this, google will index and cache all 5,000 links?

sounds kinda easy??
Sebastian on 2 September, 2009 #link

Not really, Steve. Googlebot will crawl those links, but that doesn’t mean they get indexed, or even get a ranking boost if indexed already.
graduate admission essay on 18 September, 2009 #link

Of course there are other methods to invite search engine crawlers in lightning speed, but this one comes handy because it can run on autopilot.

Can you elaborate on this? Why is this one handy? Please explain it. Thank you.

[Guess I’ll explain a slightly shady technique on request of an assclown spamming my comments? No way.]
Redirection 302, contenu dupliqué et autres infos sur l'indexation on 17 November, 2009 #link

[…] cleaning up the existing parts of the index takes even longer. Of course, there are ways to initiate a Google crawl (a phase separate from the index update), but it is undeniable that sites that are less […]
Handle your (UGC) feeds with care! on 20 November, 2009 #link

[…] crawling http://twitter.com/SebastianX or the timeline page of any of your followers. Magic? Nope. The same goes for favs/retweets, stumbles, delicious bookmarks etc., by the […]
How Search Engines Find Your Content | Search Engine Journal on 28 January, 2010 #link

[…] RSS and Atom – along the way search engines realized that pages with no links weren’t being found easily and that some query spaces required fresher results. How can this be sorted? Well, they started indexing from RSS aggregators such as Google Reader (in Google’s case). Oh and I’d also look at PubSubHubbub […]
Kiran Voleti on 19 April, 2010 #link

If you post it to identi.ca, it will also get indexed in just a few seconds.
Joachim’s Presentation at SMX East: A Treasure Trove for SEOs : RKGBlog on 7 October, 2011 #link

[…] “existing part of the index” takes an even longer period of time. There are of course ways to issue a crawl from Google (which is separate from an index update, of course), but by and large lesser-known […]

Sebastian’s Pamphlets