Archived posts from the 'Google' Category

How Google’s Web Spam Team finds your link scheme

Natural Search Blog has a nice piece reporting that Matt’s team makes use of a proprietary tool to identify webspam trying to manipulate Google’s PageRank.

Ever wondered why Google catches PR-boosting service scams in no time?


Getting Help and Answers from Google

For webmasters and publishers who don’t have Googlers on their IM buddy list or in their email address book, Google has opened a communication channel for the masses. Google’s Webmaster Blog is open for webmaster comments, and Googlers answer crawling and indexing related questions in Google’s Webmaster Help Central. Due to the disadvantages of snowboarding, participation of Googlers in the forum has slowed down a bit lately, but as far as I can tell things are going to change for the better.

As great as all these honest efforts to communicate with webmasters are, large user groups come with disadvantages like trolling and more noise than signal. So I’ve tried to find ways to make Google’s Webmaster Forums more useful. Since the Google Groups platform doesn’t offer RSS feeds for search results, I tried to track particular topics and authors with Google’s blog search as well. This experiment turned out to be a miserable failure.

Tracking discussions via web search is way too slow because time to index reaches a couple of days, not minutes or hours like with blog search or news search. The RSS feeds provided contain all the noise and trolling I don’t want to see, and they don’t even come with useful author tags, so I needed a simple and stupid procedure to filter RSS feeds with Google Reader. I thought I’d use Yahoo Pipes to create the filters, and this worked just fine as long as I viewed the RSS output as source code or formatted by Yahoo. Seems today is my miserable failure day: Google Reader told me my famous piped feeds contain zero items and no title, none of the neat stuff I’d seen seconds before in the feed’s source. Aaaahhhrrrrgggg … I’m going back to tracking threads (missing lots of valuable posts due to senseless thread titles or topic changes within threads) and profiles, for example Adam Lasnik (Google’s Search Evangelist), John Mueller (Softplus), Jonathan Simon (Google), Maile Ohye (Google), Thu Tu (Google) and Vanessa Fox (Google).

Google is awesome, not perfect but still awesome. Seems my intention (constructive criticism) got obscured by my sometimes weird sense of humor and my preference for snarky irony and exaggeration to bring a point home.

Update July/05/2007: Google has fixed the broken RSS feeds.
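
The kind of feed filtering I was after can be sketched in a few lines of Python. This is only an illustration, not what Yahoo Pipes or Google Reader do internally: the sample feed, the addresses and the function name are made up, and it assumes (unlike the feeds complained about above) that every item carries a usable author element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed using plain RSS 2.0 element names.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Webmaster Help (sample)</title>
    <item><title>Sitemap question</title><author>googler@example.com</author></item>
    <item><title>Rant</title><author>troll@example.com</author></item>
    <item><title>Crawl errors</title><author>helper@example.com</author></item>
  </channel>
</rss>"""


def filter_items_by_author(rss_text, wanted_authors):
    """Return the titles of all items written by one of the wanted authors."""
    root = ET.fromstring(rss_text)
    wanted = {author.lower() for author in wanted_authors}
    titles = []
    for item in root.iter("item"):
        # Items without an <author> element are treated as noise and skipped.
        author = (item.findtext("author") or "").strip().lower()
        if author in wanted:
            titles.append(item.findtext("title") or "")
    return titles


if __name__ == "__main__":
    print(filter_items_by_author(SAMPLE_RSS, ["googler@example.com"]))
```

A real filter would write the surviving items back out as a feed; the sketch only shows the selection step.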


Google Blog Search Banned Legit Webmaster Forum

I’ve been able to get all sorts of non-blog stuff onto the SERPs of Google’s blog search in the past. However, my attempt to get content hosted by Google into blog search is best described as a miserable failure. Although Google Blog Search BETA delivers results from all kinds of forums, it obviously can’t deal with threaded content from a source which recently left its BETA stage.

First I tried to ping blog search, submitted feeds, and linked to threads from here as well as in a feed regularly fetched for blog search. No results. There are no robots.txt barriers or noindex tags, just badly malformed code, but Google’s bot can digest improperly closed alternate links pointing to an RSS feed … drove me nuts. Must be a ban or at least a heavy troll-penalty, I thought, so I went to Yahoo, masked the feed URLs and submitted again, but to no avail.

Try for yourself, submit a feed to Google Blog Search, then use a somewhat unique thread title and do a blog search. Got zilch too? Try a web search to double check that the content is crawlable. It is. Conclusion? Google banned its very own Google Groups.

Too sad, poor PageRank addicts running blog searches will miss out on tidbits like this quote from Google’s Adam Lasnik, asked why URLs blocked from crawlers show toolbar-PR:

As for the PR showing… it’s my understanding that the toolbar is using non-private info (PR data from other pages in that domain) to extrapolate/infer/guess a PR for that page :).


Google going to revamp the rel=nofollow microformat?

I’ve asked Adam Lasnik, Google’s search evangelist:

Adam, what is Google’s take on extending the nofollow functionality by working out a microformat that covers the existing mechanism w/o being that unclear and confusing, and which takes care of similar needs like section targeting on element level and qualified votes as well?

and he answered

Sebastian, nothing’s set in stone. Stuff is likely to evolve :)

That’s an elating signal, thank you Adam. And it leads to a bunch of questions.

Will Google continue to cook nofollow in its secret sauce, revealing morphed semantics (affiliate links), unpopular areas of application (paid links) and changed functionality (no longer fetching the linked resource) every now and then? From my interpretation of Google’s ongoing move to candidness I guess not.

Will Google gather a couple search companies to work out a new standard? I hope not, it would be a mistake not to involve content providers, webmasters, publishers, CMS vendors, even SEOs and opinion makers again.

Will Google ask for input? Will the process of defining a standard for micro crawler directives be an open and public discussion? Are we talking about an extended microformat, limited to the A element’s rel and rev attributes, or does Google think of a broader approach covering for example section targeting and other crawler directives in class attributes on block level too? Will a new or more powerful interfere with other norms like , , , or drafts like the not yet that comprehensive microformat (also badly named because it covers inclusion too)? By the way, the links above lead you to interesting thoughts on reach, functionality and implementation of an extended norm replacing nofollow, and I, like many of you, have a couple more ideas and concepts in mind.

I take Adam’s tidbit as a call for participation. Dear no-to-nofollow-sayers and nofollow-supporters out there, join the crowd at the white board! Throw in your thoughts, concepts, wishes and ideas.

In the meantime make use of this catalogue of do-follow plugins.


Say No to NoFollow Follow-up

[Image: Say NO to NOFOLLOW, copyright jlh-design.com]

I don’t want to make this the nofollow-blog, but since more and more good folks don’t love the nofollow-beast any more, here is a follow-up on the recent nofollow discussion. Follow the no-to-nofollow trend here:

Loren Baker posts 13 very good reasons why rel=nofollow sucks. He got dugg, then buried, but drew tons of responses in the comments, where most people state that rel=nofollow was a failure with regard to the current amount of comment spam, because the spammers spam for traffic, not link love. Well, that’s true, but rel=nofollow at least nullifies the impact spamming of unmoderated blogs had on search results, says Google. Good point, but is it fair to penalize honest comment authors by nofollow’ing their relevant links by default? Not really. The search engines should work harder on solving this problem algorithmically, and CMS vendors should go back to the white board to develop a reasonable solution. Matt Mullenweg from WordPress admits that “in hindsight, I don’t think nofollow had much of an effect [in fighting comment spam]”, and I hope this insight triggers a well thought out workflow replacing the unethical nofollow-by-default (see follow you, follow me).

At Google’s Webmaster Help Center regular posters nag Googlers with questions like Is rel=nofollow becoming the norm? Google’s search evangelist Adam Lasnik stepped in and stated “As you might have noticed, many of the world’s most successful sites link liberally to other sites, and this sort of thing is often appreciated by and rewarded by visitors. And if you’re editorially linking to sites you can personally vouch for, I can’t see a reason to no-follow those.” and “On the whole [nofollow thingie], while Matt’s been pretty forthcoming and descriptive, I do think we Googlers on the whole can do a better job in explaining and justifying nofollow”. Thanks Adam, while explaining Google’s take on rel=nofollow to the great unwashed, why not start a major clean-up to extend this microformat and to make it useful, usable and less confusing for the masses?

While waiting for actions promised by the nofollow inventor, here is a good summary of nofollow clarifications by Googlers. I’ve a ton of respect for Matt, I know he listens and picks reasonable arguments even from negative posts, so stay tuned (I do hope my tiny revamp-nofollow campaign is not seen as negative press by the way).

A very good starting point to examine the destructive impact rel=nofollow had, has, and will have if not revamped, is Carsten Cumbrowski’s essay explaining why rel=nofollow leverages mistrust among people. I do not provide quotes because I want you all to read and reread this great article.

Robert Scoble, rethinking his nofollow support, says “I was wrong about ‘NoFollow’ … I’m very concerned, for instance, about Wikipedia’s use of nofollow”. Scroll down, don’t miss out on the comments.

Michael Gray’s strong statement Google’s policy on No follow and reviews is hypocritical and wrong is worth a read; he backs his point of view with a complete nofollow-history along with many quotes and nofollow-tidbits.


The Nofollow-Universe of Black Holes

I pretty much dislike the rel=nofollow fiasco for various reasons, especially its ongoing semantic morphing and often unethical implementation. Recently I wrote about nofollow-confusion and beginning nofollow-insanity. Meanwhile the nofollow-debacle took a major step forward: bloggers fight huge black holes (the completely link-condomized Wikipedia) with many tiny black holes (plug-ins castrating links leading to Wikipedia).

Folks, do you realize that actually you’ve joined the nofollow-nightmare you’re ranting about? Instead of trying to change things with constructive criticism addressing nofollow-supporters, you take the Old Testament approach, escalating an IMHO still remediable aberration. This senseless attitude supports the hapless nofollow-mechanism by the way. You’re acting like defiant kids crying “nofollow is sooooo unfair” while you strike back with tactical weapons unsuitable to solve the nofollow-problem. Devaluing Wikipedia links because Wikipedia is de facto an untrusted source of information OTOH makes sound sense, although semantically rel=nofollow is not the right way to go in this case.

I understand that losing the (imputed!) link juice of a couple of Wikipedia links is not nice. However, I don’t buy that these links were boosting SE rankings in the first place (although a few sites having only Wikipedia inbound links currently drop out of the SERPs). Their real value is extremely well targeted traffic, and these links are still clickable.

I agree that Wikipedia’s decision to link-condomize all outbound links is a thoughtless, lazy, and pretty insufficient attempt to fight vandalizing link droppers. It is even “unfair”, because the black hole Wikipedia now sucks the whole Web’s link juice while giving nothing (except nicely targeted traffic) in return. But I must admit that there were not that many options, since there are no search engine crawler directives on link level providing the granularity Wikipedia probably needs.

Let’s imagine the hapless nofollow value of the REL attribute did not exist. In this scenario Wikipedia could implement 4-eyes link tagging as follows:
1. New outgoing links would get tagged rel="unapproved". Search engines would not count a vote for the link destination, but follow the link.
2. Later on, when a couple of trusted users and/or admins have approved the link, “unapproved” would get removed forever (URL and REL values stored in combination with the article’s URL to automatically reinstate the link’s state on edits where a link gets removed, added, removed and added again…). So far that would even work with the misleading “nofollow” value, but an extended microformat would allow meaningful follow-up tags like “example”, “source”, “inventor”, “norm”, “worstenemy”, “hownotto” or whatever.
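
The two steps above can be sketched in Python. This is a thought experiment, not a description of how Wikipedia or MediaWiki works: the class names, the in-memory storage and the threshold of two approvals (my reading of “4 eyes”) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from html import escape

REQUIRED_APPROVALS = 2  # assumption: "4 eyes" means two trusted reviewers


@dataclass
class LinkRecord:
    rel: str = "unapproved"
    approvers: set = field(default_factory=set)


class LinkRegistry:
    """Remembers the review state per (article URL, link URL) pair."""

    def __init__(self, trusted_users):
        self.trusted = set(trusted_users)
        self.records = {}

    def add_link(self, article_url, link_url):
        # A link that was removed and added again gets its stored state back.
        record = self.records.setdefault((article_url, link_url), LinkRecord())
        return record.rel

    def approve(self, article_url, link_url, user):
        record = self.records[(article_url, link_url)]
        if user in self.trusted:  # votes from untrusted users are ignored
            record.approvers.add(user)
        if len(record.approvers) >= REQUIRED_APPROVALS:
            record.rel = ""  # "unapproved" is dropped for good
        return record.rel


def render_link(href, text, rel):
    """Render the anchor, adding a rel attribute only while one is set."""
    rel_attr = f' rel="{escape(rel)}"' if rel else ""
    return f'<a href="{escape(href)}"{rel_attr}>{escape(text)}</a>'
```

Keying the state on the article URL plus the link URL is what lets a link survive edit wars: `add_link` on an already approved pair simply returns the empty rel value again.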

Instead of ranting and vandalizing links we should begin to establish an RFC on crawler directives on HTML element level. That would be a really productive approach.


Dear search engines, please bury the rel=nofollow-fiasco

The misuse of the rel=nofollow initiative is getting out of control. Invented to fight comment spam, nowadays it is applied to commercial links, biased editorial links, navigational links, links to worst enemies (funny example: Matt Cutts links to a SEO-Blackhat with rel=nofollow) and whatever else. Gazillions of publishers and site owners add it to their links for the wrong reasons, simply because they don’t understand its intention, its mechanism, and especially not the ongoing morphing of its semantics. Even professional webmasters and search engine experts have a hard time following the nofollow-beast semantically. The more its initial usage gets diluted, the more folks suspect search engines cook their secret sauce with indigestible nofollow-ingredients.

Not only was rel=nofollow unable to stop blog-spam-bots, it also came with a built-in flaw: confusion.

The good news is that the nofollow-debate is currently being stoked again. Threadwatch hosts a thread titled Nofollow’s Historical Changes and Associated Hypocrisy, folks are ranting about the questionable Wikipedia decision to nofollow all outbound links, Google video folks manipulated the PageRank algo by plastering most of their links with rel=nofollow by mistake, and even Yahoo’s top gun Jeremy Zawodny has not been that happy with the nofollow-debacle for a while now.

[Image: Say NO to NOFOLLOW, copyright jlh-design.com]

I say that it is possible to replace the unsuccessful nofollow-mechanism with an understandable and reasonable functionality to allow search engine crawler directives on link level. It can be done although there are shitloads of rel=nofollow links out there. Here is why, and how:

The value “nofollow” in the link’s REL attribute creates misunderstandings, recently even in the inventor’s company, because it is, hmmm, hapless.

In fact, back then it meant “passnoreputation” and nothing more. That is, search engines shall follow those links, and they shall index the destination page, and they shall show those links in reverse citation results. They just must not pass any reputation or topical relevancy with that link.

There were microformats better suited to achieve the goal, for example Technorati’s votelinks, but unfortunately the united search geeks have chosen a value adapted from the robots exclusion standard, which is plain misleading because it has absolutely nothing to do with its (intended) core functionality.
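
To make that original “passnoreputation” reading concrete, here is a minimal Python sketch of how a crawler could classify the links on a page. The class and function names are invented for illustration, and it is not a claim about how any search engine actually implements this: every link is still fetched, and only the vote is withheld when rel contains nofollow.

```python
from html.parser import HTMLParser


class LinkPolicyParser(HTMLParser):
    """Collects (href, fetch, pass_reputation) for every <a href> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel holds a space-separated list of values, e.g. "external nofollow".
        rel_values = (attr_map.get("rel") or "").lower().split()
        passes_reputation = "nofollow" not in rel_values
        # Under the original meaning the destination is fetched either way.
        self.links.append((href, True, passes_reputation))


def classify_links(html_text):
    parser = LinkPolicyParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<a href="/a">a</a> <a rel="external nofollow" href="/b">b</a>'
    for href, fetch, vote in classify_links(page):
        print(href, "fetch:", fetch, "vote:", vote)
```

A real nofollow directive in the robots exclusion sense would flip the second field instead, which is exactly the confusion the value name invites.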

I can think of cases where a real nofollow-directive for spiders on link level makes perfect sense. It could tell the spider not to fetch a particular link destination, even if the page’s robots tag says “follow”, for example printer friendly pages. I’d use an “ignore this link” directive for example in crawlable horizontal popup menus to avoid theme dilution when every page of a section (or site) links to every other page. Actually, there is more need for spider directives on HTML element level, not only in links, for example to tag templated and/or navigational page areas like with Google’s section targeting.

There is nothing wrong with a mechanism to neutralize links in user input. Just the value “nofollow” in the type-of-forward-relationship attribute is not suitable to label unchecked or not (yet) trusted links. If it is really necessary to adopt a well known value from the robots exclusion standard (and don’t misunderstand me, reusing familiar terms in the right context is a good idea in general), the “noindex” value would have been a better choice (although not perfect). “Noindex” describes way better what happens in a SE ranking algo: it doesn’t index (in its technical meaning) a vote for the target. Period.

It is not too late to replace the rel=nofollow-fiasco with a better solution which could take care of some similar use cases too. Folks at Technorati, the W3C and wherever have done the initial work already, so it’s just a tiny task left: extending an existing norm to enable a reasonable granularity of crawler directives on link level, or better for HTML elements in general. Rel=nofollow would get deprecated, replaced by suitable and standardized values, and for a couple of years the engines could interpret rel=nofollow in its primordial meaning.

Ever since the rel=nofollow thingy came into existence, it has confused gazillions of non-geeky site owners, publishers and editors on the net. Last year I got a new client who added rel=nofollow to all his internal links because he saw nofollowed links on a popular and well ranked site in his industry and thought rel=nofollow could perhaps improve his own rankings. That’s just one example of many where I’ve seen intentional as well as mistaken misuse of the way too geeky nofollow-value. As Jill Whalen points out to Matt Cutts, that’s just the beginning of net-wide nofollow-insanity.

Ok, we’ve learned that the “nofollow” value is a notional monster, so can we please have it removed from the search engine algos in favour of a well thought out solution, preferably asap? Thanks.


Google’s cool robots.txt validator

What Andrey from Google’s Sitemaps team said: Stay tuned for more cool tools.

Google has just launched a robots.txt validator in the Sitemaps stats area, log in and check it out! It’s really cool and saves a lot of time and hassles, more info here.
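
Google’s validator lives in the Sitemaps account, but a rough local check in the same spirit is possible with Python’s standard urllib.robotparser module. The rules and URLs below are made-up samples, and Python’s parser may well interpret edge cases differently than Googlebot does, so treat this as a quick sanity check rather than a replacement for the real tool.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /print/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/notes.html"):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```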

Also since yesterday you get a word analysis for your textual content and anchor text from inbound links, and you can see which of your pages had the highest PageRank for the last three months. Nice. Really nice.


Google’s Sitemaps Team Interviewed

Recently I had the great opportunity to interview the Google Sitemaps Team: Shiva Shivakumar who started the program in June 2005; Vanessa Fox, the extremely helpful spokeswoman who blogs for the Sitemaps team and assists Webmasters in the newsgroup; her coworkers Michael and Andrey from Google’s Kirkland office, Grace and Patrik from the branch in Zurich, and Shal from the Googleplex in Mountain View. Matt Cutts chimed in with some good advice too.

I want to thank those friendly Googlers for taking the time to contribute loads of great technical advice and extremely valuable information to the Google Sitemaps Knowledge Base.

Besides Sitemaps related information, the interview provides the ultimate answer to the endless “404 vs. 410” debate, explains the URL removal tool Matt Cutts was talking about yesterday at Webmasterradio, provides hints to optimize dynamic Web sites … hey, just read it to find all the gems:

Google Sitemaps Team Interview

Consider bookmarking the Google Sitemaps Info Page and subscribing to the Sitemaps Feed to get alerted on future stuff like that.


Reporting spam to Google is a nightmare

Well, every now and then I do report a spammer, for example when I find my own or a fellow SEO’s articles on an AdSense driven Blogger-Splog, heavily and pretty much unrelatedly advertised in Google Groups and Google Base via a Google Mail account, and probably even visible on organic SERPs.

Google’s network of services allows spammers to make a living from Google alone, but on the other hand it doesn’t provide the victims with a consolidated spam report form.

Why can’t Google create an enterprise-wide tool to report abuse?

[Omitted: the detailed description on how to report a spamming freeloader and plagiarist to all abused Google services in less than 2.5 hours]

