Archived posts from the 'Webmaster Central' Category

Google’s Anchor Text Reports Encourage Spamming and Scraping

Despite the title-bait, Google’s new Webmaster toy is an interesting tool, thanks! I’ll look at it from a Webmaster’s perspective, all SEO hats in the wardrobe.

Danny tells us more, for example the data source:

A few more details about the anchor text data. First, it comes only from external links to your site. Anchor text you use on your own site isn’t counted.
Second, links to any subdomains you have are NOT included in the data.

At first glance I doubted that Google filters internal links properly. So I checked a few sites and found internal anchor text leading the new anchor text stats. Well, it’s not unusual that folks copy and paste links including the anchor text, and using page titles as anchor text is business as usual. But why the heck do page titles and shortcuts from vertical menu bars appear as the most prominent external anchor text? With large sites having tons of inbound links, that’s not easy to investigate.

Thus I looked at a tiny site of mine, this blog. It carries a few recent bollocks posts which were so useless that nobody bothered linking to them, I thought. Well, that’s partly the case: there were no links by humans, but enough fully automated links to get Ms. Googlebot’s attention. Thanks to scrapers, RSS aggregators, forums linking the author’s recent blog posts and so on, everything on this planet gets linked, so duplicated post titles make it into the anchor text stats.

Back with the larger sites, I found that scraper sites and indexed search results were responsible for the extremely misleading ordering of Google’s anchor text excerpts. Neither page type should get indexed in the first place, and it’s ridiculous that crappy, irrelevantly accumulated data dilutes a well-meant effort to report inbound linkage to Webmasters.

Ordering inbound anchor text by sheer commonness, unweighted, and limiting the list to 100 phrases makes this report utterly useless for many sites; worse still, it sends a strong but wrong signal. In the eye of the beholder, the top of an ordered list or result set expresses importance.

Joe Webmaster, tracking down the sources of his top-10 inbound links, finds a shitload of low-life pages and thinks: hey, if Google says those links are the most important ones, then linkage is after all a game of large numbers. So he launches a gazillion doorways and joins every link farm out there. Bummer. Next week Joe Webmaster pops up in the Google Forum telling the world that his innocent site got tanked because he followed Google’s suggestions, and we’re bothered with just another huge but useless thread discussing whether scraper links can damage rankings or not, finally inventing the “buffy blonde girl pointy stick penalty” on page 128.

Roundtrip to hat rack, SEO hat attached. I perfectly understand that Google is not keen on disclosing trust/quality/reputation scores. I can read these stats because I understand the anatomy (intention, data, methods, context), and perhaps I can even get something useful out of them. Explaining this to an impatient site owner who unwillingly bought the link-quality-counts argument but still believes that nofollow’ed links carry some mysterious weight because they appear in reverse citation results is a completely different story.

Dear Google, if you really can’t hand out information on link weighting by ordering inbound anchor text by trust or other signs of importance, then can you please at least filter out all the useless crap? This shouldn’t be that hard to accomplish, since even simple site-search queries for “scraping” reveal tons of definitely not index-worthy pages which already pass no search engine love to the link destinations. Thanks in advance!
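Such a filter wouldn’t even have to be sophisticated. Here’s a toy sketch in Python, with entirely made-up data and naive heuristics (nothing resembling Google’s actual pipeline), of what dropping scraper pages and indexed search results before ranking anchor text might look like:

```python
# Toy sketch, hypothetical data shape: (anchor text, source URL) pairs.
# Drop sources that smell like scrapers or indexed search results,
# then rank the surviving anchor phrases by frequency.
from collections import Counter
from urllib.parse import parse_qs, urlparse

JUNK_PATH_HINTS = ("/search", "/results")  # naive heuristics, made up

def looks_like_junk(url):
    parts = urlparse(url)
    if any(hint in parts.path for hint in JUNK_PATH_HINTS):
        return True
    # indexed search results usually carry a query parameter like ?q= or ?s=
    params = parse_qs(parts.query)
    return "q" in params or "s" in params

def top_anchors(pairs, n=10):
    counts = Counter(anchor for anchor, url in pairs if not looks_like_junk(url))
    return counts.most_common(n)

links = [
    ("great seo article", "http://blog.example/post"),
    ("great seo article", "http://scraper.example/search?q=seo"),
    ("great seo article", "http://scraper.example/search?q=article"),
    ("read this", "http://forum.example/thread/42"),
]
print(top_anchors(links))  # → [('great seo article', 1), ('read this', 1)]
```

With the two scraper links filtered out, the counts reflect human links only; without the filter, the scraped phrase would dominate the list just like in Google’s report.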

When I write “Dear Google”, can Vanessa hear me? Please :)


Update: Check your anchor text stats page every now and then, don’t miss out on updates :)


Getting Help and Answers from Google

For webmasters and publishers who don’t have Googlers on their IM buddy list or in their email address book, Google has opened a communication channel for the masses. Google’s Webmaster Blog is open for webmaster comments, and Googlers answer crawling and indexing related questions in Google’s Webmaster Help Central. Due to the disadvantages of snowboarding, participation of Googlers in the forum slowed down a bit lately, but from what I can tell things are going to change for the better.

As great as all these honest efforts to communicate with webmasters are, large user groups come with disadvantages like trolling and more noise than signal. So I’ve tried to find ways to make Google’s Webmaster Forums more useful. Since the Google Groups platform doesn’t offer RSS feeds for search results, I tried to track particular topics and authors with Google’s blog search instead. This experiment turned out to be a miserable failure.

Tracking discussions via web search is way too slow because time-to-index is a couple of days, not minutes or hours as with blog search or news search. The RSS feeds provided contain all the noise and trolling I don’t want to see, and they don’t even come with useful author tags, so I needed a simple and stupid procedure to filter RSS feeds with Google Reader.

I thought I’d use Yahoo Pipes to create the filters, and this worked just fine as long as I viewed the RSS output as source code or formatted by Yahoo. Seems today is my miserable-failure day: Google Reader told me my famous piped feeds contain zero items, no title, none of the neat stuff I’d seen seconds ago in the feed’s source. Aaaahhhrrrrgggg … I’m going back to tracking threads (missing lots of valuable posts due to senseless thread titles or topic changes within threads) and profiles, for example Adam Lasnik (Google’s Search Evangelist), John Mueller (Softplus), Jonathan Simon (Google), Maile Ohye (Google), Thu Tu (Google), and Vanessa Fox (Google).

Google is awesome, not perfect but still awesome. Seems my intention (constructive criticism) got obscured by my sometimes weird sense of humor and my preference for snarky irony and exaggeration to bring a point home.
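For what it’s worth, the filtering itself is trivial; the plumbing between Pipes and Reader was the only problem. A minimal sketch of the idea with Python’s standard library, using an invented feed and watchlist:

```python
# Minimal sketch: parse an RSS 2.0 feed with the stdlib and keep only
# items whose title or author matches a watchlist. Feed contents and
# names are invented for illustration.
import xml.etree.ElementTree as ET

WATCHLIST = ("adam lasnik", "vanessa fox", "crawling")  # topics/authors I track

RSS = """<rss version="2.0"><channel><title>Webmaster Help</title>
<item><title>Crawling question</title><author>someuser</author></item>
<item><title>Troll thread #4711</title><author>anonymous</author></item>
<item><title>Re: sitemaps</title><author>Vanessa Fox</author></item>
</channel></rss>"""

def filter_items(rss_text, watchlist):
    root = ET.fromstring(rss_text)
    kept = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").lower()
        author = (item.findtext("author") or "").lower()
        if any(w in title or w in author for w in watchlist):
            kept.append(item.findtext("title"))
    return kept

print(filter_items(RSS, WATCHLIST))  # → ['Crawling question', 'Re: sitemaps']
```

Anything matching a tracked topic or author survives; the troll thread, and only the troll thread, gets dropped.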

Update July/05/2007: Google has fixed the broken RSS feeds.



Google Blog Search Banned Legit Webmaster Forum

I’ve been able to get all sorts of non-blog stuff onto the SERPs of Google’s blog search in the past. However, my attempt to get content hosted by Google into blog search is best described as a miserable failure. Although Google Blog Search BETA delivers results from all kinds of forums, it obviously can’t deal with threaded content from a source which recently got rid of its BETA stage.

First I pinged blog search, submitted feeds, and linked to threads from here and in a feed regularly fetched for blog search as well. No results. No robots.txt barriers or noindex tags, just badly malformed code, but Google’s bot can stomach improperly closed alternate links pointing to an RSS feed … it drove me nuts. Must be a ban, or at least a heavy troll penalty, I thought, so I went to Yahoo, masked the feed URLs, and submitted again, but to no avail.

Try for yourself, submit a feed to Google Blog Search, then use a somewhat unique thread title and do a blog search. Got zilch too? Try a web search to double check that the content is crawlable. It is. Conclusion? Google banned its very own Google Groups.

Too bad: poor PageRank addicts running blog searches will miss out on tidbits like this quote from Google’s Adam Lasnik, asked why URLs blocked from crawlers show toolbar-PR:

As for the PR showing… it’s my understanding that the toolbar is using non-private info (PR data from other pages in that domain) to extrapolate/infer/guess a PR for that page :).



Code Monkey Very Simple Man/Woman

Rcjordan over at Threadwatch pointed me to a nice song that perfectly explains rumors like “Google’s verification tags get you into supplemental hell” and thoughtless SEO theories like “self-closing meta tags in HTML 4.x documents and uppercase element/attribute names in XHTML documents prevent search engine crawlers from indexing”. You don’t believe such crappy “advice” can make the headlines? Just wait for an appropriate thread at your preferred SEO forum picked up by a popular but technically challenged blogger. This wacky hogwash is the most popular lame excuse for MSSA issues (aka “Google is broken coz my site, sitting at top-10 positions since the stone age, disappeared all of a sudden”) at Google’s very own Webmaster Central.

Here is a quote:

“The robot [search engine crawler] HAS to read syntactically … And I opt for this explanation exactly because it makes sense to me [the code monkey] that robots have to be diligent in crawling syntactically in order to do a good job of indexing … The old robots [Googlebot 2.x] did not actually parse syntactically - they sucked in all characters and sifted them into keywords - text but also tags and JS content if the syntax was broken, they didn’t discriminate. Most websites were originally indexed that way. The new robots [Mozilla-compatible Googlebot] parse with syntax in mind. If it’s badly broken (and improper closing of a tag in the head section of a non-xhtml dtd is badly broken), they stop or skip over everything else until they find their bearings again. With a broken head that happens at the </html> tag or thereabouts”.

Basically this means that the crawler ignores the remaining code in HEAD, or even hops to the end of the document without reading the page’s contents.

In reality, search engine crawlers are pretty robust and fault tolerant, designed to eat and digest the worst code one can provide. Theories like the one quoted above belong in the same drawer as the endless speculation about “Google’s Sandbox”.
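That fault tolerance is easy to demonstrate with any lenient parser. Here’s a toy example using Python’s stdlib html.parser, fed exactly the kind of “badly broken” head the code monkey worries about (an XHTML-style self-closed link element and an unclosed title in an HTML 4 document):

```python
# Toy text extractor: Python's stdlib parser, like a real crawler,
# keeps going past malformed markup instead of giving up.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

# "Badly broken" head: an XHTML-style self-closed <link> in an HTML 4
# document and a <title> that is never closed.
broken = """<html><head>
<link rel="alternate" type="application/rss+xml" href="/feed.xml" />
<title>My page
</head><body><p>Body copy the crawler should still see.</p></body></html>"""

parser = TextExtractor()
parser.feed(broken)
print(parser.chunks)  # → ['My page', 'Body copy the crawler should still see.']
```

No skipping to the closing html tag, no dropped body text: the parser recovers at the next tag boundary and carries on.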

Just hire code monkeys for code monkey tasks, and SEOs for everything else ;)



Google’s cool robots.txt validator

What Andrey from Google’s Sitemaps team said: Stay tuned for more cool tools.

Google has just launched a robots.txt validator in the Sitemaps stats area, log in and check it out! It’s really cool and saves a lot of time and hassle, more info here.
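If you’d rather sanity-check a robots.txt without logging in anywhere, Python’s standard library has shipped a robots.txt parser for ages. It evaluates the same exclusion rules, though it certainly won’t match Google’s validator feature for feature; a quick sketch with a made-up robots.txt:

```python
# Check URLs against a made-up robots.txt with the stdlib parser.
import urllib.robotparser

rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/about.html"))         # True
```

Handy for catching the classic mistakes, like a Disallow rule that blocks far more than intended, before a crawler finds them for you.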

Also, since yesterday you get a word analysis of your textual content and of the anchor text from inbound links, and you can see which of your pages had the highest PageRank over the last three months. Nice. Really nice.


Google’s Sitemaps Team Interviewed

Recently I had the great opportunity to interview the Google Sitemaps Team: Shiva Shivakumar who started the program in June 2006; Vanessa Fox, the extremely helpful spokeswoman who blogs for the Sitemaps team and assists Webmasters in the newsgroup; her coworkers Michael and Andrey from Google’s Kirkland office, Grace and Patrik from the branch in Zurich, and Shal from the Googleplex in Mountain View. Matt Cutts chimed in with some good advice too.

I want to thank those friendly Googlers for taking the time to contribute loads of great technical advice and extremely valuable information to the Google Sitemaps Knowledge Base.

Besides Sitemaps-related information, the interview provides the ultimate answer to the endless “404 vs. 410” debate, explains the URL removal tool Matt Cutts was talking about yesterday at Webmasterradio, and offers hints on optimizing dynamic Web sites … hey, just read it to find all the gems:

Google Sitemaps Team Interview

Consider bookmarking the Google Sitemaps Info Page and subscribing to the Sitemaps Feed to get alerted on future stuff like that.



TGIF - Casting pearls before swines

Today I stumbled across a crowd in the Google Sitemaps Group posting long but pointless superhero stories about how they excluded Googlebot because Google is so slow at indexing while MSN and Yahoo are so much better.

You know that sort of uber-savvy marketing genius, don’t you? Someone told them that on the Web, where everything is free of charge and the search engines owe everybody and his dog instant rankings for any kind of crap, they will get rich quick without spending a dime. Clueless journalists and zillions of unscrupulous do-it-yourself sites powered by AdSense have turned many serious and reasonable people into greedy and aggressive freeloaders.

TGIF, so I did it again:

Oh well, you guys are funny. Just because you didn’t manage to get your stuff into Google’s search index doesn’t mean nobody can. Actually, ignoring 60% or more of all referrers on the net is plain silly.

You cannot compare the major engines. They have very different ideas, and not the same data. A new engine like MSN lacks historical linkage data, for example, and therefore MSN has to weigh on-the-page factors more heavily than Google does. That’s a flaw, not a goodie. The same goes for Yahoo to some degree; although they have three predecessors (ATW, AV, INK), those engines lacked sophisticated link analysis.

Psst… here is a secret: to rank high at Google a new site must come with a solid marketing strategy. There is no such thing as technical tricks and SEO magic to get a site indexed and ranked. Plain diversified marketing, targeting every channel out there, will do the trick. Even word of mouth will boost your Google rankings.

“Fact is other search engines make it easier to submit a site and provide better results for unit of time and money invested in terms of traffic and in terms of visibility.”

Yes. Other engines are easier to spam because their quality control fails way too often.
“Better results” from a site owner’s perspective have nothing to do with relevant results provided to search engine users.

“In addition, new content needs to be available much more quickly than is the ‘norm’ for this process to meet market [read: customer] demands.”

True, Google does exactly that. Google’s news and feed search delivers new stuff within minutes on the SERPs, and Google’s Web search is pretty current when it comes to search results from trustworthy authority sites. If you don’t operate such an established Web resource, see above how to change your site’s status.

Your problems are

1. Ignorance
If you had read the Sitemaps home page and other information publicly available on Google’s site, you would be able to form and manage your expectations properly. You say Google owes you free traffic in return for the sitemap submission. That’s a misinterpretation.

2. Laziness
You haven’t done even half of your job. If you want to be successful on the Web, you have to learn how things work. That includes studying all major search engines as well as every other source of traffic. Over the long haul MSN and Yahoo will dump your pages if you don’t improve your stuff. They get better every week, and in a while they will have reached Google’s current relevancy and search quality.

3. Lack of patience and strategic thinking
Gazillions of sites that have been online for many years do very well with Google. These sites have built a stable base of converting traffic from many sources, including search engines, over years, and that’s hard work based on reasonable, well-thought-out strategies and diligence. Why should you arrogant noobs pop up and get rich quick?

4. Arrogance
You guys think your stuff is index-worthy; search engine users (potential customers!) may respectfully disagree. Search engines, and that means all of them, are designed to find valuable and interesting content for their users. Since the first days of AltaVista they have improved their technology, and today all major players deliver pretty good results, with respect to search query relevancy as well as timely information.

The engines handle huge amounts of junk and spam quite fine, but spammers have changed the game to a great degree. That means nowadays a site must gain reputation before the search engines honor great content or outstanding offers with free organic traffic. The engines weight reputation differently in their ranking algos, but reputation is an important ranking factor across the board. You can’t gain reputation overnight. Not on the Web, and not elsewhere.

Look at the first experimental MSN results from a while back: they were overwhelmed with junk and spam. Look at MSN search results now: they’re much better, though MSN still indexes way more questionable stuff than Yahoo and Google do. They’ve learned to factor in reputation. They’re still learning to do a better job; they have deep pockets, and their engineers are smart.

If you don’t promote your Web presence properly, your MSN traffic will decrease, your Yahoo traffic will decrease, and Google will send zilch because you impatient and arrogant kids keep Googlebot out. Reminds me of my 3-year-old daughter crying “that is soooo unfair!” when she doesn’t get a candy five minutes before dinner.

There is life without organic SE traffic; actually, lots of sites do very well with bought traffic. But it is thoughtless and stupid to cut off any potential source of traffic.

Well, I should have known better; explanations like that are a waste of time. Here is the reply I got from the guy who had previously bragged like a marketing guru specializing in search marketing research:

What you now say passes my poor powers of comprehension; it may be all very true, but I can’t understand it, and I refrain from any expression of opinion on it.

Will I ever learn to avoid useless efforts? FWIW, feel free to take my ‘pearls’ and cast them before other swines.



Google’s New Site Stats: more than a sitemaps byproduct

Google’s new Sitemaps stats come with useful stuff; full coverage of the updates here. You get crawl stats and detailed error reports, even popularity statistics like the top 5 search queries directing traffic to your site, the real PageRank distribution, and more.

The most important thing, in my opinion, is that Google has created a toolset for Webmasters by listening to Webmasters’ needs. Most of the pretty neat new stats are answers to questions and requests from Webmasters, collected from direct feedback and the user groups. I still have wishes, but I like the prototyping approach, so I can’t wait for the next version. Knowledge is power.


