Blogger to rule search engine visibility?

Via Google’s Webmaster Forum I found this curiosity:
http://www.stockweb.blogspot.com/robots.txt

User-agent: *
Disallow: /search
Disallow: /

A standard robots.txt at *.blogspot.com looks different:

User-agent: *
Disallow: /search
Sitemap: http://*.blogspot.com/feeds/posts/default?orderby=updated
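Just to illustrate what that single extra line does: here’s a minimal sketch in PHP (the function name and test URLs are made up, and real crawlers honor more of the robots.txt protocol, e.g. per-bot sections and Allow lines) showing that “Disallow: /” locks every compliant crawler out of the whole blog, while the standard file merely keeps them out of /search:

<?php
// Sketch only: collect the Disallow prefixes from the "User-agent: *" section
// and check whether a path falls under one of them.
function isDisallowed($robotsTxt, $path) {
    $disallows = array();
    $inStarSection = false;
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) continue;
        list($field, $value) = array_map('trim', explode(':', $line, 2));
        if (strtolower($field) === 'user-agent') {
            $inStarSection = ($value === '*');
        } elseif ($inStarSection && strtolower($field) === 'disallow' && $value !== '') {
            $disallows[] = $value;
        }
    }
    foreach ($disallows as $prefix) {
        if (strpos($path, $prefix) === 0) return true; // prefix match blocks the path
    }
    return false;
}

$blocked  = "User-agent: *\nDisallow: /search\nDisallow: /\n";
$standard = "User-agent: *\nDisallow: /search\n";

var_dump(isDisallowed($blocked, '/2007/07/any-post.html'));  // bool(true): everything is off limits
var_dump(isDisallowed($standard, '/2007/07/any-post.html')); // bool(false): only /search is blocked
var_dump(isDisallowed($standard, '/search?q=pfts'));         // bool(true)
?>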

According to the blogger, the blog is not private, which would have explained the crawler blocking:

It is a public blog. In the past it had a standard robots.txt, but 10 days ago it changed to “Disallow: /”

Copyscape thinks that the blog in question shares a fair amount of content with other Web pages. So does blog search:
http://stockweb.blogspot.com/2007/07/ukraine-stock-index-pfts-gained-97-ytd.html
has a duplicate, posted by the same author, at
http://business-house.net/nokia-nok-gains-from-n-series-smart-phones/,
http://stockweb.blogspot.com/2007/07/prague-energy-exchange-starts-trading.html
is reprinted at
http://business-house.net/prague-energy-exchange-starts-trading-tomorrow/
and so on. Further investigation would probably reveal more duplicated content.

It’s understandable that Blogger is not interested in wasting Google’s resources by letting Ms. Googlebot crawl the same content from different sources. But why do they block other search engines too? And why do they block the source (the posts reprinted at business-house.net state “Originally posted at [blogspot URL]”)?

Is this really censorship, or just a software glitch, or is it all the blogger’s fault?

Update 07/26/2007: The robots.txt reverted to standard content for unknown reasons. However, with a shabby link neighborhood like the one expressed in the blog’s footer, I doubt the crawlers will enjoy their visits. At least the indexers will consider this sort of spider fodder nauseating.




Hey, there is content in the widgets!

Yeah, I do know the layout of this blog is somewhat cluttered. Especially the sidebar, with all the JS calls slowing down page loads. Not that Blogger page load times are exciting at all, especially not with the classic template. Forgive me, I just can’t stay away from fancy stuff.

Perhaps you’re not exactly interested in my twits telling you that my monsters are asleep and I can code untroubled, or that I’ve dugg or sphunn my friends’ posts. Perfectly legit votes of course, since we share so many interests that I often like what my buddies write and submit to whatever social bookmarking services or communities.

Of course you couldn’t care less about stats like how many blogs in the Technorati universe (which is a tiny subset of the GoogleBlogSearch universe, which is a tiny subset of the blogosphere, which is a tiny subset of the Web … Ok, you don’t give a f***) link to my pamphlets. Actually, here you could help me out: just put me on your blogroll. Honestly, the lack of backlinks is scandalous. Everybody reads my stuff, but very few of you dear readers link to me. I don’t consider scrapers readers, so their links don’t count. Since my audience consists of 99% Webmasters, I hope all of you understand the syntax of my beloved A element. I promote lots of nice folks in my diverse blogroll sections, but very few return the honor. Not even the Google blog lists me under “What We’re Reading” (please notice the capital “W” indicating a pluralis majestatis), although I spam FeedFetcher with Google bashing quite frequently. Weird …

And no, the MBL users list doesn’t count as content (but it’s nice to see who visited), and the AdSense stuff is just informational (and remains unclicked by the way, you guys and gals are way too savvy). Oops, I did it again: four inexpressive paragraphs before I come to the point. A vice of mine.

Since I add widgets when I discover them, you have to scroll down for the GoogleReader thingy. It’s titled “Sebastian’s picked gems”, and I mean that.

When I stumble upon a great post, I share it. That does not mean I agree 100%, perhaps I even disagree 100%, but when I share a post I believe it’s worth reading. Honestly, you wouldn’t read my pamphlets if you didn’t share (a few of) my pet peeves, would you?

I guess it’s safe to assume that you’ll enjoy reading my shared articles. Good news is, you can subscribe to the feed of my selected readings. I don’t recycle news, so I don’t blog every tidbit I find on the ‘Net. Hence you should subscribe to the feed and read the content I’d like to have on my blog although I’m too busy (Ok Ok, that’s just a lame excuse for laziness) to publish it myself.

If you read my blog in your preferred feed reader, you’ll miss out on some exciting stuff!




Buying cheap viagra algorithmically

Since Google can’t manage to clean up [Buy cheap viagra], let’s do it ourselves. Go seek a somewhat trusted search blog mentioning “buy cheap viagra” somewhere in the archives and link to the post with a slightly diversified anchor text like “how to buy cheap viagra online”. Matt deserves a #1 spot by the way, so spread many links …

Then when Matt is annoyed enough and Google has kicked out the unrelated stuff from this search hopefully my viagra spam will rank as deserved again ;)

Update a few hours later: Matt ranks #1 for [buy cheap viagra algorithmically]:
[Screenshot: Matt Cutts’s first spot for [buy cheap viagra algorithmically]]
His ranking for [buy cheap viagra] fell about 10 positions to #17, but for [buy cheap viagra online] he’s still on the first SERP, now at position #10 (#3 yesterday). Interesting. It seems that Google’s newish turbo blog indexing boosts the rankings of pages linked from blog posts rather quickly, but the effect isn’t exactly long lasting.

Related posts:
Negative SEO At Work: Buying Cheap Viagra From Google’s Very Own Matt Cutts - Unless You Prefer Reddit? Or Topix? by Fantomaster
Trust + keywords + link = Good ranking (or: How Matt Cutts got ranked for “Buy Cheap Viagra”) by Wiep




Getting the most out of Google’s 404 stats

The 404 reports in Google’s Webmaster Central panel are great for debugging your site, but they also contain URLs generated by invalid or truncated URL drops and typos of other Webmasters. Are you sick of wasting the link love from invalid inbound links, just because you lack a suitable procedure to 301-redirect all these 404 errors to canonical URLs?

Your pain ends here. At least when you’re on a *ix server running Apache with PHP 4 or 5 and .htaccess enabled. (If you suffer from IIS, go find another hobby.)

I’ve developed a tool which grabs all 404 requests, letting you map a canonical URL to each 404 error. The tool captures and records 404s, and you can add invalid URLs from Google’s 404-reports, if these aren’t recorded (yet) from requests by Ms. Googlebot.

It’s kind of a layer between your standard 404 handling and your error page. If a request results in a 404 error, your .htaccess calls the tool instead of the error page. If you’ve assigned a canonical URL to an invalid URL, the tool 301-redirects the request to the canonical URL. Otherwise it sends a 404 header and outputs your standard 404 error page. Google’s 404-probe requests during the Webmaster Tools verification procedure are unredirectable (is this a word?).
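To give you a rough idea of that layer, here’s a stripped-down sketch, not the actual tool: the file names, the example mappings and the log path are made up, and the real thing reads its mappings from flat files rather than a hardcoded array. In .htaccess:

ErrorDocument 404 /error404.php

And a very much simplified error404.php:

<?php
// Sketch of the 404 layer described above. On an ErrorDocument redirect
// Apache keeps the originally requested URI in $_SERVER['REQUEST_URI']
// (it's also available in $_SERVER['REDIRECT_URL']).
$requestedUri = $_SERVER['REQUEST_URI'];

// Persistent 1:1 mappings: invalid URL => canonical URL
$map = array(
    '/old-post.html'    => '/2007/07/new-post.html',
    '/fruit-basket.htm' => '/fruit',
);

if (isset($map[$requestedUri])) {
    // Known invalid URL: pass the link love on with a permanent redirect
    header('Location: http://example.com' . $map[$requestedUri], true, 301);
    exit;
}

// Unknown invalid URL: log it so a mapping can be defined later,
// then send the 404 header and the standard error page
error_log(date('c') . ' ' . $requestedUri . "\n", 3, '/path/to/404.log');
header('HTTP/1.1 404 Not Found');
readfile($_SERVER['DOCUMENT_ROOT'] . '/errorpage.html');
?>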

Besides 1:1 mappings of invalid URLs to canonical URLs, you can assign keywords to canonical URLs. For example, you can define that all invalid requests go to /fruit when the requested URI or the HTTP referrer (usually a SERP) contains the strings “apple”, “orange”, “banana” or “strawberry”. If there’s no persistent mapping, these requests get 302-redirected to the guessed canonical URL, so you should review the redirect log frequently to find invalid URLs which deserve a persistent 301-redirect.
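A keyword rule of that sort could look like the following snippet (again just a sketch with invented rule data and log path, to be placed before the final 404 branch of the handler sketched above):

<?php
// Hypothetical keyword rules: if the requested URI or the referrer mentions
// a fruit, guess /fruit and 302 there until a persistent 301 mapping exists.
$keywordRules = array(
    '/fruit' => array('apple', 'orange', 'banana', 'strawberry'),
);
$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$haystack = strtolower($_SERVER['REQUEST_URI'] . ' ' . $referrer);

foreach ($keywordRules as $canonicalUrl => $keywords) {
    foreach ($keywords as $keyword) {
        if (strpos($haystack, $keyword) !== false) {
            // Temporary redirect only; review the log and promote good guesses to 301s
            error_log(date('c') . ' 302 ' . $_SERVER['REQUEST_URI'] . ' -> ' . $canonicalUrl . "\n", 3, '/path/to/302.log');
            header('Location: http://example.com' . $canonicalUrl, true, 302);
            exit;
        }
    }
}
?>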

Next, there are tons of bogus requests from spambots searching for exploits or whatever, or from hotlinkers, resulting in 404 errors where it makes no sense to maintain URL mappings. Just update an ignore list to make sure those get 301-redirected to example.com/goFuckYourself, or to a cruel and scary image hosted on your domain or a free host of your choice.

Everything not matching a persistent redirect rule or an expression ends up in a 404 response, as before, but logged so that you can define a mapping to a canonical URL. You can also use this tool when you plan to change (a lot of) URLs: it can 301-redirect each old URL to the new one without adding all of them to your .htaccess file.

I’ve tested this tool for a while on a couple of smaller sites, and I think it can be trained to run smoothly without too many edits once the ignore lists etcetera are up to date, that is, matching the site’s requisites. A couple of friends got the script and will provide useful input. Thanks! If you’d like to join the BETA test, drop me a message.

Disclaimer: All data gets stored in flat files. For large sites we’d need to change that to a database. The UI sucks; I mean it’s usable, but it comes with the browser’s default fonts and all that. IOW, the current version is still at the “proof of concept” stage. But it works just fine ;)




Sphinn rocks

Thanks to Danny’s crew we’ve got a promising search geek community site. Since I’ve recently started to deal with invites, here is the top secret link where you get your free Sphinn invite. Click it now and join today, as Gorbachev said ‘those who are late will be punished by life itself’ ;)

Previous experiments revealed that my pamphlets aren’t diggworthy, despite the presence of OL/UL lists. Because I mention search and stuff like that every once in a while, I decided to submit a horror story to Sphinn to test the waters over there.

Adding Sphinn-it! widgets to my posts hopefully helps promote Sphinn, but with Blogger that turned into kind of a nightmare. To save you from jumping through endless trial-and-error hoops, here is how it works:

Classic templates:

Search for <$BlogItemBody$> and below the </div> put

<script type='text/javascript'>submit_url='<$BlogItemPermalinkUrl$>';</script>
<script src='http://sphinn.com/evb/button.php' type='text/javascript'/></script>

(Blogger freaks out when you omit the non-standard ;</script> after the self-closing second tag, hence stick with the intentional syntax error.)

Newish templates:

Check “Expand Widget Templates”

Search for <data:post.body/> and below the </p> put

<b:if cond='data:post.url'>
<p><script type='text/javascript'>submit_url='<data:post.url/>';</script>
<script src='http://sphinn.com/evb/button.php' type='text/javascript'/></p>
</b:if>

(After saving the changes, Blogger replaces some single quotes with HTML entities, but it works nonetheless. Most probably one could do this in a more elegant way, but once I saw the badges pointing to the correct URL –both in the posts and on the main page– I gave up.)

Have fun sphinning my posts!




Google helps those who help themselves

And if that’s not enough to survive on Google’s SERPs, try Google’s Webmaster Forum, where you can study Adam Lasnik’s FAQ, which covers even questions the Webmaster Help Center provides no comprehensive answer for (yet), and where Googlers from the Search Quality, Webspam, and Webmaster Central teams hang out. Google dumps all sorts of questioners into the forum, where a crowd of hardcore volunteers (aka regulars, as Google calls them) invests a lot of time to help out Webmasters and site owners facing problems with the almighty Google.

Despite the sporadic posts by Googlers, the backbone of Google’s Webmaster support channel is this crew of regulars from all around the globe. Google monitors the forum for input and trends, and intervenes when the periodic scandal escalates every once in a while. Apropos scandals … although the list of top posters mentions a few of the regulars, bear in mind that trolls come with a disgustingly high posting cadence. Fortunately, the signal currently drowns out the noise (again), and I very much appreciate that the Googlers participate more and more.

Some of the regulars like seo101 don’t reveal their URLs and stay anonymous. So here is an incomplete list of folks giving good advice:

If I’ve missed anyone, please drop me a line (I stole the list above from JLH and Red Cardinal, so it’s all their fault!).

So if you’re a Webmaster or site owner, don’t hesitate to post your Google-related question (but read the FAQ before posting, and search for your topic); chances are one of these regulars or even a Googler will offer assistance. Or, if you have no questions but carry a swag of valuable answers, join the group and share your knowledge. Finally, if you’re a Googler, give the sites linked above a boost on the SERPs ;)

Micro-meme started by John Honeck, supported by Richard Hearne, Bert Vierstra




Now Powncing

John, thanks for the invite! Inspired by all the twits about Pownce, I submitted my email addy too. What a useless procedure. From inside there’s no list of submitted email addresses to pick friends from. Or I’m too blind to find that page.

Probably the best way to get rid of the 6 invites is to sell them on eBay. Perhaps Pownce releases 6 new invites then and I get rich quick. Wait … I’ve a better idea. Submit your honest review of this blog in the comments and send me the email addy for your invite. If your piece is funny or honest or vilifying enough to make me laugh, I might invite you ;)

Ok, so what separates Pownce from Twitter and WS_FTP? Here are my first impressions.

Unfortunately, I will never see the ads. Hectic clicking on all the links signed me up as a pro member by accident, and now Pownce blemishes my cute red crab with a “pro” label. I guess I got what I paid for. Paid? Yep, that’s the first difference: Pownce is not completely free. Spamming friends in 100 meg portions costs an annual fee of 20 bucks.

Next difference: there is no 140-bytes-per-message limit. Nice. And the “Send to” combo box is way more comfortable than the corresponding functionality at Twitter. I miss Twitter’s “command line options” like “d username” and “@username”, though. Sounds schizophrenic perhaps, but I’m just greedy.

I figured out how to follow someone without friending: just add somebody as a friend and (you don’t even need to) wait for the decline; this makes you a fan of that user. You get their messages, but not the other way round. Twitter’s separate “add as friend” and “follow user” is clearer, I think.

Searching for the IM setup, I learned there is none. Pownce expert John said I have to try the desktop thingy, but it looks like AIM 1999, so I refuse the download and stick with the Web interface until Pownce interacts with GTalk. The personal Pownce page has a refresh link at least, but no auto-refresh like Twitter.

There’s no way to bookmark messages or threads yet, and the link to a particular message is somewhat obfuscated. The “email a bug report” link is a good replacement for a “beta” label. I guess I’ll use it to tell Pownce that I hate their link manipulation applying the rel-nofollow crap. I’ll play with the other stuff later on; the daddy-cab is due at the kindergarten. Hopefully, when I return, there will be a Pownce badge available for this blog. I’ve plenty of white space left on my sidebar.


Back, still no badge, but I realized that I forgot to mention the FTP similarities. And there is no need to complete this post, since I found Tamar’s brilliant Twitter vs. Pownce article.

Update: How to post to Twitter and Pownce at the same time (a Twitterfeed work around, I didn’t test this configuration)




LZZR Linking™

In “Why it is a good thing to link out loud” LZZR explains a nicely designed method to accelerate the power of inbound links. Unfortunately this technique involves Yahoo! Pipes, which is evil. Certainly it’s a nice tool for composing feeds, but Yahoo! Pipes automatically inserts the evil nofollow crap. Hence using Pipes’ feed output to amplify links fails because of the auto-nofollow. I’m sure LZZR can replace this component with ease, if that’s not done already.




Why eBay and Wikipedia rule Google’s SERPs

It’s hard to find an obscure search query like [artificial link] which doesn’t deliver eBay spam or a Wikipedia stub within the first few results at Google. Although both Wikipedia and eBay are large sites, the Web is huge, so two such different sites shouldn’t dominate the SERPs for that many topics. Hence it’s safe to say that many nicely ranked search results at Googledia, pulled from eBaydia, are plain artificially positioned non-results.

Curious why my beloved search engine fails so badly, I borrowed a Google-savvy spy from GHN and sent him to Mountain View to uncover the eBaydia ranking secrets. He came back with lots of pay-dirt scraped from DVDs in the safe of building 43. Before I sold Google’s ranking algo to Ask (the price Yahoo! and MSN offered was laughable), I figured out why Googledia prefers eBaydia from comments in the source code. Here is the unbelievable story of a miserable failure:

When Yahoo! launched Mindset, Larry Page and Sergey Brin threw chairs out of anger because Google wasn’t able to accomplish such a simple task. The engineers, eager to fulfill their founders’ wishes asap, tried to integrate Mindset-like functionality without changing Google’s fascinatingly simple search interface (that means without a shopping/research slider). Personalized search still lived in the labs, but it provided a somewhat suitable API (mega beta): scanSearchersBrainForContext([search query]). Not knowing that this function of personalized search polls a nano-bugging device (pre-alpha) which Google had neither released nor implanted into any searcher’s brain at this time, they made use of that piece of experimental code to evaluate the search query’s context. Since the method always returned “false”, and they had to deliver results quickly, they made up some return values to test their algo tweaks:

/* debug - praying S&L don't throw more chairs */
if (scanSearchersBrainForContext($searchQuery) === false) {
    $contextShopping = "%ebay%";
    $contextResearch = "%wikipedia%";
    $context = both($contextShopping, $contextResearch);
}
else {
    /* [pretty complex algo] */
}

This worked fine and, under time pressure, found its way into the ranking algo. The result is that with each and every search query where a page from eBay and/or Wikipedia is in the raw result set, those pages get a ranking boost. Sergey was happy because eBay is generally listed on page #1, and Larry likes the Wikipedia results on the first SERP. Tell me, why the heck should the engineers comment out these made-up return values? No engineer on this planet likes flying chairs, especially not in his office.

PS: Some SEOs push Wikipedia stubs too.




Who is responsible for the paid link mess?

Look at this graph showing the number of [buy link] searches since 2004:

Interestingly, this search term takes off in September or October 2004 and shows a quite stable trend until the recent paid links debate started.

Who or what caused SEOs to massively buy links since 2004?

  • The Playboy interview with Google cofounders Larry Page and Sergey Brin just before Google was about to go public?
  • Google’s IPO?
  • Rumors that Google ran out of index space and therefore might restrict the number of doorway pages in the search index?
  • Nick Wilson preparing the launch of Threadwatch?
  • AdWords and Overture no longer running gambling ads?
  • The Internet Advancement scandal?
  • Google’s shortage of beer at the SES Google dance?
  • A couple of UK-based SEOs inventing bought organic rankings?

Seriously, buying links for rankings was an established practice way before 2004. If you know the answer, or if you’ve a somewhat plausible theory, leave it in the comments. I’m really curious. Thanks.



