Cat post: Life’s getting better!

Even the cat says life's getting better. Since I’ve moved this blog here, my traffic has nicely improved. Skip the bragging, hop into the grub: during the past three weeks my feed subscriptions went up by 17%, and daily uniques by 460% (I never got stumbled at blogspot).

Thanks to Google’s newish BlitzIndexing, my SERP referrers went up from 1 (on August 24, 2007) to 40 yesterday, naturally all long-tail searches. I didn’t count the bogus MSN referrer spam suggesting I rank for “yahoo”, “pontiac” and whatnot.

The search engine crawlers are quite busy too. When Ms. Googlebot was done fetching all URLs 3-5 times each, Slurp and MSNbot wholeheartedly joined the game. Google has indexed roughly 500 pages; my Webmaster Central account counts 2,000 inbound links and shows PageRank, anchor text stats and all the other neat stuff. Yahoo has indexed 90 pages and 1,000 inbound links. Although MSN has crawled a lot, they’ve indexed a whopping 2 pages. Wow.

I’d be quite happy, since my blog’s life is getting better, if it weren’t for that info from Google via Matt Cutts’s blog telling me that 80% of my pages are considered useless crap, at least from Google’s perspective. That’s not a joke. Google dislikes 80% of this wonderful blog, although it contains only 20% Google bashing. Weird.

I repeated the searches below multiple times, so what I’ve spotted is neither an isolated phenomenon nor coincidence. Here’s what the standard site search query shows, 494 indexed pages:
Google site search for Sebastian's Pamphlets

Next I used Matt’s power toy, limiting the results to pages Google discovered within the past 30 days (&as_qdr=d30). Please note that 30 days ago this domain didn’t exist. I installed WordPress on August 16, 2007, the day I registered the domain; that means I created the very first indexable page 29 days ago. The rather astonishing result is 89 indexed pages:
Google site search for Sebastian's Pamphlets for the past 30 days

Either Matt’s time tunnel for power searchers is only 20% accurate, or 80% of my stuff went straight into the supplemental index from where advanced search can’t pull it.
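The arithmetic behind that 80% figure is trivial; here it is as a small sketch, using the two counts from the screenshots above (the interpretation that the missing pages sit in the supplemental index is this post's speculation, not a Google fact):

```python
# Counts from the two site: searches above.
total_indexed = 494    # plain [site:...] query
recent_indexed = 89    # same query with &as_qdr=d30 appended

missing = total_indexed - recent_indexed
print(f"{missing} of {total_indexed} pages ({missing / total_indexed:.0%}) "
      "don't show up in the time-restricted search")
```

That prints 405 of 494 pages (82%), which is the "roughly 80%" figure used throughout this post.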

The latter presumption is plausible: the site is new; 99% or so of my deep links came in 2-3 weeks ago via 301 redirects, so those pages have no PageRank yet; and for most of the URLs Google noticed near duplicates, with source bonus, from my old blogspot outlet, not to speak of scraped stuff on low-life servers. Judging from my understanding of my fresh inbound links and my internal linkage, only roughly 90 pages can have gained noticeable PageRank yet. Interestingly, those 90 pages on my blog have a real-world timestamp after the funeral of the blogspot thingy, and content that didn’t exist over there. That could lead to interesting theories; however, I guess that indeed <speculation>time restricted searches don’t pull pages from the supplemental hell</speculation>. Reminds me of the fact that Google’s link stats show nofollow’ed links and all that, but not a single link from a page buried in the supp index.

Did Matt by accident reveal a sure-fire procedure to identify supplemental results? I mean, they can’t make time-restricted searches defunct the way they killed /& and other undocumented stuff. I’ve tested the method with two sites where I know the supp ratio, and the results were kinda plausible, but that’s no proof. Of course I couldn’t resist posting this vague piece of speculation before doing solid research. Maybe I’m dead wrong.

What do you guys think? Flame me in the comments. :)


19 Comments to "Cat post: Life's getting better!"

  1. Hugo Guzman on 13 September, 2007  #link

    Have you tried doing a search, within the advanced search result, for a unique term found on one of the pages that seems to be in supplemental hell?

    I thought I found a similar loophole, but when I searched for my supposed “supplemental results” urls by using an actual search term (i.e. the title tag of the article in question) it did show up in non-supplemental results.

    I might be missing your point here, but let me know if this illuminates anything.


  2. JLH on 13 September, 2007  #link

    What’s blogging with wild-ass-guesses?

  3. Andy Beard on 13 September, 2007  #link

    It seems a little bugged to me.

    Maybe a data centre problem.

    Doing a /* search and comparing to the total index, I have only 20% supplemental.

  4. Sebastian on 13 September, 2007  #link

    Andy, when you test it with your site you shouldn’t do that with a limit of 30 days, because your domain has been around way longer. Stolen from Matt:

    The nice thing is that you can change the value of as_qdr to custom intervals. Here are all the possible values of the as_qdr parameter:

    d[number] - past number of days (e.g.: d10)
    w[number] - past number of weeks
    y[number] - past number of years

    What about this search, still buggy or are 1,400 pages in the main index plus 390 in supp hell plausible?

    The /* thingy definitely lists pages which are supplemental, by the way; I found it extremely unreliable.
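    For what it’s worth, the whole trick is just one extra query-string parameter appended to a normal site: search. A minimal sketch (the function name and example domain are mine, not from Matt’s post):

```python
from urllib.parse import urlencode

def site_search_url(domain, qdr=None):
    """Build a Google site: search URL, optionally time-restricted via
    as_qdr (e.g. 'd30' = past 30 days, 'w2' = 2 weeks, 'y1' = 1 year)."""
    params = {"q": f"site:{domain}"}
    if qdr:
        params["as_qdr"] = qdr
    return "http://www.google.com/search?" + urlencode(params)

print(site_search_url("example.com"))         # all indexed pages
print(site_search_url("example.com", "d30"))  # discovered in the past 30 days
```

    Comparing the result counts of the two URLs gives the ratio discussed in the post.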

  5. Sebastian on 13 September, 2007  #link

    John, that’s fishing for information, IOW laziness.

  6. JLH on 13 September, 2007  #link

    I meant to say, “What’s blogging without wild-ass-guesses?” but since I’ve somehow lost the ability to type and think at the same time it came out wrong.

    As I said on Sphinn, how about we make a secret deal with Matt that if he Sphinn’s it, it’s an un-official nudge-nudge-wink-wink that you may be on to something.

    I heard of another supplemental test that I promised not to share, so maybe I can check against your method.

  7. Sebastian on 13 September, 2007  #link

    Hugo, I tried a couple of quoted searches for indexed pages which did not show up in the 30-days search, and they didn’t come up. That’s typical for pages living in the supplemental index. So far the theory stands, but that’s still no evidence without broader tests.

  8. Andy Beard on 13 September, 2007  #link

    1400 is certainly similar to the 1420 returned by a /* search example

  9. Hugo Guzman on 13 September, 2007  #link

    Here’s an example of what I was referring to…

    Time-based search (one week) for a specific domain

    You’ll notice that only two urls show up (there are a total of 39 when you view the supplemental results).

    I went ahead and performed a follow-up search for the term “Bills Three Starters Down” (I didn’t use quotes when I searched, BTW) and found that one of the URLs that appeared to be in supplemental hell did in fact show up (in position #1).

    I took this to mean that despite the initial result (when doing a time-based search), this URL is not stuck in supplemental hell.

    Does that help?

  10. Sebastian on 13 September, 2007  #link

    Perhaps I’m confused, but filtered similar results and supplemental results are not the same thing. Pages which are *only* in the supplemental index shouldn’t come up for quoted searches like [“He couldn’t kick anything beyond an extra point or chip-shot, and it came back to bite the Jaguars in the end”].

  11. Hugo Guzman on 13 September, 2007  #link

    LOL…damn, now I’m confused.

    Am I looking at the wrong thing? Let me read through your post again and see if I’m missing something.

  12. Richard Hearne on 13 September, 2007  #link

    All I can say is I hope this isn’t real…

    I get ‘In order to show you the most relevant results, we have omitted some entries very similar to the 3 already displayed.’ for one site… *sigh*

    Testing my own site I get 275 from 396 which sounds completely plausible.

    Nice find Sebastian

  13. Sebastian on 13 September, 2007  #link

    Richard, I’ve resubmitted the query and noticed that Google has read my post and adjusted the number of pages in the main index to 90 accordingly. ;)

  14. Richard Hearne on 15 September, 2007  #link

    So Matt’s comment over on Sphinn that he didn’t mind if someone found a backdoor to supps might also mean ‘I don’t mind, but we’re still going to close this’…

  15. Sebastian on 15 September, 2007  #link

    Matt said it makes no sense to show the supplemental status to end users any more. One can read that as “we ignore supp obsessed SEOs”. We’ll see.

  16. Clint Dixon on 15 September, 2007  #link

    Supplemental Results did not go away nor will they.

    This second index (supplemental) has been around since Google started; originally there were a main and a forward index, I believe.

    The main index would be what is known now as the Supplemental Index; it’s where all webpages go.

    The forward index is where pages that are likely to be returned for a user’s query go, and why Google can return results in nanoseconds.

    I explain how to find out whether any site is in the supplemental results in under 60 seconds.


  17. Sebastian on 15 September, 2007  #link

    Clint, your search using the filter=0 parameter shows similar pages, not supplemental pages. Not all similar pages live in the supp index, some live in both. As for the Matt bait at Sphinn … of course he didn’t reveal anything, his post just inspired me to play with time limited searches.

  18. NY SEO on 10 October, 2007  #link

    Greetings fellow Google Basher,

    80% supplemental? That confounded Google Algo needs some work!

    Especially since they are toying with the fabric of the web with the whole link snitching fiasco.

    There are several priceless posts here! So 80% supp has to be temporary, hopefully due to your migration.

    Well, you have a handful of social bookmarks & votes from me, and I imagine many others will find your information useful as well.

    Install the “share this” plug-in and the links should be rolling in.

    Thanks for sharing your expertise dude.

  19. Sebastian on 11 October, 2007  #link

    Thanks Marc. ;)

    I still have all pages which deserve it in the main index. Nearly all pages from this blog which populate the supplemental index are old posts with next to no inbound links, and archives. Tiered search indexes are not a bad thing in general. Try to see it the other way round: without tiered indexes you’d lose a shitload of SE traffic from all major engines, because they would feed the main index only, within its physical limits, and ignore the less important pages (remember the old AltaVista, for example, where unsubmitted and more or less unlinked pages simply phased out). Of course it would be nice if all engines cached the whole Web in their main indexes, but that would mean more competition, and fewer working methods (respectively, more SEO effort necessary) to promote particular pages. I can handle the different indexes quite well, and I’m preparing my methods for expected changes (growing main indexes and maybe even shrinking secondary indexes over time).
