Getting new stuff crawled in real-time

Thanks to PubSubHubbub we’ve got another method to invite Googlebot.

Just add a link to FriendFeed, for example by twittering or bookmarking it. Once the URI hits your FriendFeed account, FriendFeed shares it with Googlebot, and both of them request it before the tweet appears on your timeline:

Date/Time           | Request URI     | IP / Host                                         | User agent
2009-08-17 15:47:19 | Your URI        | 38.99.68.206 / 38.99.68.206                       | Mozilla/5.0 (compatible; FriendFeedBot/0.1; +Http://friendfeed.com/about/bot)
2009-08-17 15:47:19 | Your robots.txt | 66.249.71.141 / crawl-66-249-71-141.googlebot.com | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2009-08-17 15:47:19 | Your URI        | 66.249.71.141 / crawl-66-249-71-141.googlebot.com | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
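
The host column is what makes the Googlebot rows trustworthy: anybody can send a Googlebot user agent, but only Google controls the reverse DNS of its crawler IPs. Google's documented check is a reverse DNS lookup on the IP, followed by a forward lookup on the returned hostname that must resolve back to the same IP. Here is a minimal sketch in Python. It assumes crawler hostnames end in googlebot.com or google.com, and Google's current official list may differ. The function name is hypothetical. The resolvers are passed in as parameters so the check can be exercised without network access.

```python
import socket
from typing import Callable, List, Tuple

# Assumption: crawler hostnames live under these domains (check Google's current docs).
GOOGLE_CRAWLER_DOMAINS = ("googlebot.com", "google.com")

Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Lookup = socket.gethostbyaddr,
    forward_lookup: Lookup = socket.gethostbyname_ex,
) -> bool:
    """Hypothetical helper: True if `ip` reverse-resolves to a Google crawler
    host that forward-resolves back to the very same IP."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)  # PTR lookup
    except OSError:  # covers socket.herror / socket.gaierror: no usable PTR record
        return False
    hostname = hostname.rstrip(".").lower()
    # Require an exact domain or a real subdomain, so "notgooglebot.com" fails.
    if not any(hostname == d or hostname.endswith("." + d) for d in GOOGLE_CRAWLER_DOMAINS):
        return False
    try:
        _name, _aliases, addresses = forward_lookup(hostname)  # A lookup
    except OSError:
        return False
    # A spoofed PTR record will not forward-resolve to the requesting IP.
    return ip in addresses


if __name__ == "__main__":
    # Stub resolvers mimicking the Googlebot rows in the log above.
    def stub_reverse(ip):
        return ("crawl-66-249-71-141.googlebot.com", [], [ip])

    def stub_forward(host):
        return (host, [], ["66.249.71.141"])

    print(is_verified_googlebot("66.249.71.141", stub_reverse, stub_forward))  # True
```

Called without the stubs, the function uses the system resolver, so it works on live log entries as long as DNS is reachable.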

Neat. Of course there are other methods to invite search engine crawlers at lightning speed, but this one comes in handy because it can run on autopilot […]. Even when the actual fetch is meant to feed Google Reader or something similar, you've got your stuff into Google's crawling cache, from where other services like the Web indexer can pick it up without requesting another crawl.
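
On the publishing side, "autopilot" means your feed software notifies a hub whenever something new appears. Hub subscribers (Googlebot, probably) then fetch the content. Here is a minimal sketch in Python of that notification. It assumes the publish ping from the PubSubHubbub 0.3 draft: a form-encoded POST with hub.mode=publish and one hub.url per changed feed, which the hub acknowledges with 204 No Content. The helper names and the feed URL are hypothetical. The hub URL is the 2009-era reference hub mentioned in the comments, and it may no longer accept pings.

```python
import urllib.parse
import urllib.request
from typing import Iterable


def build_publish_request(hub_url: str, topic_urls: Iterable[str]) -> urllib.request.Request:
    """Hypothetical helper: build a PubSubHubbub 0.3 'publish' ping
    telling the hub which feed (topic) URLs have new content."""
    params = [("hub.mode", "publish")]
    params += [("hub.url", url) for url in topic_urls]  # one hub.url per changed feed
    body = urllib.parse.urlencode(params).encode("ascii")
    return urllib.request.Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def ping_hub(hub_url: str, topic_urls: Iterable[str], timeout: float = 10.0) -> int:
    """Send the ping and return the HTTP status (204 means accepted in 0.3).
    urlopen raises urllib.error.HTTPError for 4xx/5xx responses."""
    request = build_publish_request(hub_url, topic_urls)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical feed URL; building the request does not touch the network.
    req = build_publish_request("https://pubsubhubbub.appspot.com/", ["http://example.com/feed/"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Plugins such as the WordPress PubSubHubbub plugin mentioned in the comments send this kind of ping on every publish, so nobody has to do it by hand.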

14 Comments to "Getting new stuff crawled in real-time"

  1. Richard Hearne on 18 August, 2009  #link

    Interesting. What’s the relationship here?

    Ignore that, I just read the link. So can Joe Bloggs ping this service too, say via the WordPress ping?

  2. Sebastian on 18 August, 2009  #link

    Yup. With WordPress you can ping a hub, here’s a WP PubSubHubbub plugin.

  3. mosquito on 18 August, 2009  #link

    In this case, which explanation is correct: FriendFeed pings Google via a PubSubHubbub hub, or Google scans the pubsubhubbub.appspot.com hub?

    Thanks!

  4. Sebastian on 18 August, 2009  #link

    Prolly Googlebot is the hub’s subscriber.

  5. […] Getting new stuff crawled in real-time […]

  6. Michael Thomas on 25 August, 2009  #link

    That is brilliant. If you made a change, say with some extra keywords distributed around the page, would this help your SEO? Would it bump up the results?

  7. Steve on 2 September, 2009  #link

    Umm, I don't follow. I did this. Are you saying that if I have a site with 5,000 links and manually tweet each link, Google will index and cache all 5,000 of them?

    Sounds kinda easy??

  8. Sebastian on 2 September, 2009  #link

    Not really, Steve. Googlebot will crawl those links, but that doesn’t mean they get indexed, or even get a ranking boost if indexed already.

  9. graduate admission essay on 18 September, 2009  #link

    Of course there are other methods to invite search engine crawlers in lightning speed, but this one comes handy because it can run on autopilot.

    Can you elaborate on this? Why is this one handy? Please explain it. Thank you.

    [Guess I’ll explain a slightly shady technique on request of an assclown spamming my comments? No way.]

  10. […] cleaning up existing parts of the index takes even longer. Of course, there are ways to trigger a Google crawl (a phase separate from the index update), but it is undeniable that less […]

  11. Handle your (UGC) feeds with care! on 20 November, 2009  #link

    […] crawling http://twitter.com/SebastianX or the timeline page of any of your followers. Magic? Nope. The same goes for favs/retweets, stumbles, delicious bookmarks etc., by the […]

  12. […] RSS and Atom – along the way search engines realized that pages with no links weren't being found easily and that some query spaces required fresher results. How could this be sorted? Well, they started indexing from RSS aggregators such as Google Reader (in Google's case). Oh, and I'd also look at PubSubHubbub […]

  13. Kiran Voleti on 19 April, 2010  #link

    If you post it on identi.ca, it will also get indexed within a few seconds.

  14. […] “existing part of the index” takes an even longer period of time. There are of course ways to issue a crawl from Google (which is separate from an index update, of course), but by and large lesser-known […]