Archived posts from the 'Google' Category

Categorizing posts with blogger (rant)

Google knows everything about AJAX. Why the heck can’t I assign categories to old posts without hassles? “Edit posts - change number of listed posts - scroll down - edit - scroll down - choose/enter categories - publish - repeat” is just seven full-page reloads/actions too many. On a slow DSL connection this archaic procedure drives me nuts.

Dear readers, when you click on “Labels” you most probably won’t find related posts :( I’m adding categories whenever I update an old post, but UI flaws keep me from categorizing the whole archive. Sorry.




How Google & Yahoo handle the link condom

Loren Baker over at SEJ got a few official statements on use and abuse of the rel-nofollow microformat by the major players: How Google, Yahoo & Ask treat NoFollow’ed links. Great job, thanks!

Ask doesn’t “officially” support nofollow, whatever that means. Loren didn’t ask MSN, probably because he didn’t expect them to remember that they’ve officially supported nofollow since 2005 (same procedure with sitemaps, by the way). Yahoo implemented it along the specs, and Google stepped way over the line the norm sets. So here is the difference:

1. Do you follow a nofollow’ed link?
Google: No (longer)
Yahoo: Yes

2. Do you index the linked page following a nofollow’ed link?
Google: Obsolete, see 1.
Yahoo: Yes

3. Do your ranking algos factor in reputation, anchor/alt/title text or whatever link love is sourced from a nofollow’ed link?
Google: Obsolete, see 1.
Yahoo: No

4. Do you show nofollow’ed links in reverse citation results?
Google: Yes (in link: searches by accident, in Webmaster Central if the source page didn’t make it into the supplemental index)
Yahoo: Yes (Site Explorer)

Q&A#4 is made up but accurate. I think it’s safe to assume that MSN handles the link condom like Yahoo. (Update: As Loren clarifies in the comments, he asked MSN search but they didn’t answer in a timely fashion.)

And here’s a remarkable statement from Google’s search evangelist Adam Lasnik, who may or may not like nofollow:

On a related note, though, and echoing Matt’s earlier sentiments … we hope and expect that more and more sites — including Wikipedia — will adopt a less-absolute approach to no-follow … expiring no-follows, not applying no-follows to trusted contributors, and so on.

Bravo!

Related link: rel="nofollow" Google, Yahoo and MSN




Erol to ship a Patch Fixing Google Troubles

Background: read these four posts on Google penalizing, or rather deindexing, e-commerce sites. Long story short: recently Google’s enhanced algos began to deindex e-commerce sites powered by Erol’s shopping cart software. The shopping cart maintains a static HTML file which redirects user agents executing JavaScript to another URL. This happens with each and every page, so it’s quite understandable that Ms. Googlebot was not amused. I got involved when a few worried store owners asked for help in Google’s Webmaster Forum. After lots of threads and posts on the subject, Erol’s managing director got in touch with me and we agreed to team up to find a solution to help the store owners suffering from a huge traffic loss. Here’s my report of the first technical round.

Understanding how Erol 4.x (and all prior versions) works:

The software generates an HTML page offline, which functions as an XML-like content source (called the “x-page”; I use that term because all Erol customers are familiar with it). The “x-page” gets uploaded to the server and is crawlable, but not really viewable. Requested by a robot, it responds with 200 OK. Requested by a human, it does a JavaScript redirect to a complex frameset, which loads the “x-page” and visualizes its contents. The frameset responds to browsers if called directly, but returns a 404 Not Found error to robots. Example:

“x-page”: x999.html
Frameset: erol.html#999×0&&

To view the source of the “x-page” disable JavaScript before you click the link.

Understanding how search engines handle Erol’s pages:

There are two major weak points with regard to crawling and indexing. The crawlable page redirects, and the destination does not exist if requested by a crawler. This leads to these scenarios:

  1. A search engine ignoring JavaScript on crawled pages fetches the “x-page” and indexes it. That was the default behavior of yesterday’s crawlers, and several search engines still work this way.
  2. A search engine not executing JavaScript on crawled pages fetches the “x-page”, analyzes the client-side script, and discovers the redirect (please note that a search engine crawler may change its behavior, so this can happen all of a sudden to properly indexed pages!). Possible consequences:
    • It tries to fetch the destination, gets the 404 response multiple times, and eventually deindexes the “x-page”. Depending on the crawling frequency and depth per domain, the pages then disappear quite fast or rather slowly until the last page is phased out. Google would keep a copy in the supplemental index for a while, but such a listing cannot return to the main index.
    • It’s trained to consider the unconditional JavaScript redirect “sneaky” and flags the URL accordingly. This can result in temporary as well as permanent deindexing.
  3. A search engine executing JavaScript on crawled pages fetches the “x-page”, performs the redirect (thus ignores the contents of the “x-page”), and renders the frameset for indexing. Chances are it gives up on the complexity of the nested frames, indexes the noframes tag of the frameset and perhaps a few snippets from subframes, considers the whole conglomerate thin, hence assigns the lowest possible priority for the query engine and moves on.

Unfortunately the search engine delivering the most traffic improved its crawling and indexing, hence many sites formerly receiving a fair amount of Google traffic began to suffer from scenario 2: deindexing.

Outlining a possible work around to get the deleted pages back in the search index:

In six months or so Erol will ship version 5 of its shopping cart, and this software dumps frames, JavaScript redirects and ugly stuff like that in favor of clean XHTML and CSS. By the way, Erol has asked me for my input on their new version, so you can bet it will be search engine friendly. So what can we do in the meantime to help legions of store owners running version 4 and below?

We’ve got the static “x-page” which should not get indexed because it redirects, and which cannot be changed to serve the contents itself. The frameset cannot be indexed because it doesn’t exist for robots, and even if a crawler could eat it, we wouldn’t consider it easily digestible spider fodder.

Let’s look at Google’s guidelines, which are the strictest around, thus applicable for other engines as well:

  1. Don’t […] present different content to search engines than you display to users, which is commonly referred to as “cloaking.”
  2. Don’t employ cloaking or sneaky redirects.

If we find a way to suppress the JavaScript code on the “x-page” when a crawler requests it, the now more sophisticated crawlers will handle the “x-page” like their predecessors, that is, they will fetch the “x-pages” and hand them over to the indexer without vicious remarks. Serving identical content under different URLs to users and crawlers does not contradict the first precept. And we comply with the second rule, because loading a frameset for human visitors but not for crawlers is definitely not sneaky.

Ok, now how do we tell the static page that it has to behave dynamically, that is, output different contents server-side depending on the user agent’s name? Well, Erol’s desktop software which generates the HTML can easily insert PHP tags too. The browser would not render those on a local machine, but who cares, as long as it works after the upload to the server. Here’s the procedure for Apache servers:

In the root’s .htaccess file we enable PHP parsing of .html files:
AddType application/x-httpd-php .html
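Depending on how PHP is hooked into Apache, some hosts expect an AddHandler directive instead of AddType; a hedged variant under that assumption (on some hosts the handler name is version-specific, so check your host’s documentation):
AddHandler application/x-httpd-php .html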

Next we create a PHP include file xinc.php which prevents crawlers from reading the offending JavaScript code:
<?php
// List of known crawler user agent substrings (matched case-insensitively).
$crawlerUAs = array("Googlebot", "Slurp", "MSNbot", "teoma", "Scooter", "Mercator", "FAST");
$isSpider = FALSE;
$userAgent = getenv("HTTP_USER_AGENT");
foreach ($crawlerUAs as $crawlerUA) {
    if (stristr($userAgent, $crawlerUA)) $isSpider = TRUE;
}
if (!$isSpider) {
    // Browsers get the original redirect script.
    print "<script type=\"text/javascript\"> [a whole bunch of JS code] </script>\n";
}
if ($isSpider) {
    // Crawlers get a comment explaining why the redirect is missing.
    print "<!-- Dear search engine staff: we've suppressed the JavaScript code redirecting browsers to \"erol.html\", a frameset serving this page's contents in a form more pleasant for human eyes. -->\n";
}
?>

Erol’s HTML generator now puts <?php @include("xinc.php"); ?> instead of a whole bunch of JavaScript code.

The implementation for other environments is quite similar. If PHP is not available, we can do it with SSI and Perl. On Windows we can tell IIS to process all .html extensions as ASP (App Mappings) and use an ASP include. That gives three versions of the patch, which should help 99% of all Erol customers until they can upgrade to version 5.

This solution comes with two disadvantages. First, the cached page copies, clickable from the SERPs and toolbars, render pretty ugly because they lack the JavaScript code. Second, automated tools searching for deceitful cloaking might red-flag the URLs for a human review. Hopefully the search engine executioner reading the comment in the source code will be fine with it and give it a go. If not, there’s still the reinclusion request. I think store owners can live with that when they get their Google traffic back.

Rolling out the patch:

Erol thinks the above makes sense and there is a chance of implementing it soon. While the developers are at work, please provide feedback if you think we didn’t interpret Google’s Webmaster Guidelines strictly enough. Keep in mind that this is an interim solution and that the new version will handle things in a more standardized way. Thanks.

Paid-Links-Disclosure: I do this pro bono job for the sake of the suffering store owners. Hence the links pointing to Erol and Erol’s customers are not nofollow’ed. Not that I’d nofollow them otherwise ;)




More anchor text analysis from Webmaster Central

If you didn’t spot my update posted a few hours ago: log in to Webmaster Central and view your anchor text stats. You’ll find way more phrases and can play with the variations, which should allow you to track down sources via quoted search queries. Also, the word stats are back.
Have fun!




Ultimately: Watch out for Google’s URL terminator

I hate to recycle news, but I just fell in love with this neat URL terminator. Unfortunately there’s no button to remove spam, so I still have to outrank competitors, but besides that ‘flaw’ it’s a perfect and user-friendly tool covering all my needs. Thanks!




Where is the precise definition of a paid link?

Good questions:

How many consultants provide links through to the companies they work for?
I do.

How many software firms provide links through to their major corporate clients?
Not my company. Never going to happen.

If you make a donation to someone, and they decide to give you a link back, is that a paid link?
Nope.

If you are a consultant, and are paid to analyse a company, but to make the findings known publicly, are you supposed to stick nofollow on all the links?
Nope.

If you are a VC or Angel investor, should you have to use NoFollow linking through to companies in your investment portfolio?
Nope.

Are developers working on an open-source project allowed a link back to their sites (cough Wordpress)? Yep. And are they then allowed to use that link equity to dominate search engines on whatever topic they please?
Hmmmm, if it really works that way, why not?

If you are a blog network, or large internet content producer, is it gaming Google to have links to your sister sites, whether there is a direct financial connection or not?
Makes business sense, so why should those links get condomized? Probably a question of quantity. No visitor would follow a gazillion links to blogs covering all sorts of topics the yellow pages have categories for.

Should a not for profit organisation link through to their paid members with a live link?
Sure, perfectly discloses relationships and their character.

A large number of Wordpress developers have paid links on their personal sites, as do theme and plugin developers.
What’s wrong with that? Maybe questionable (in the sense of useless) on every page, but perfectly valid on the home page, about page and so on if disclosed. As for ads, that sort of paid link is valid on every page - nofollow’ing ads just avoids misunderstandings.

If you write a blog post, thanking your sponsors, should you use nofollow?
Yep.

Some people give away prizes for links, or offer some kind of reciprocation.
If the awards are honest and truly editorial, linking back is just good practice.

If you are an expert in a particular field, and someone asks you to write a review of their site, and the type of review you write means that writing that content might take 10 hours of your time to do due diligence, is it wrong to accept some kind of monetary contribution? Just time and material?
In such a situation, why would you be forced to use nofollow on all links to the site being reviewed?
As long as you disclose the expense allowance you received, there’s nothing wrong with uncondomized links.

Imagine someone created a commercial Wikipedia, and paid $5 for every link made to it.
Don’t link. The link would be worth more than five bucks and the risks involved can cost way more than five bucks.

Where is the precise definition of a paid link?
Now that’s the best question of all!

Disclaimer: Yes/No answers are kinda worthless without a precisely defined context. Thus please read the comments.

Related thoughts: Should Paid Links Influence Organic Rankings? by Mark Jackson at SEW
Paid Link Schemes Inside Original Content by Brian White, also read Matt’s updated post on paid links.

Update: Google’s definition of paid links and other disliked linkage considered “linkspam”




Revise your linkage now

Google’s take on paid links is obviously there to stay, and frankly, it was old news heating the minds last weekend. There’s no sign on the horizon that Google will revise the policy; quite the opposite is true. Google has new algos in development which are supposed to detect all sorts of paid linkage better than before. For many that’s bad news, however I guess rants are a waste of valuable time better spent revising current and previous linking habits.

I don’t think that Googlebot has learned to read text like “Buy a totally undetectable PR8 link now for as low as 2,000$ and get four PR6 links for free!” on images. Sites operating such obvious link selling businesses leave other, detectable, footprints, and their operators are aware of the risks involved. Buyers are rather safe; they just waste a lot of money, because purchased links most probably don’t carry link love, and nofollow’ed links may be way cheaper.

I often stumble upon cases where old and forgotten links create issues over time. Things that worked perfectly in the past can bury a site on today’s SERPs. The crucial message is: be careful who you link out to! Compile a list of all your outbound links and check them carefully before Google’s newest link analysis goes live. Especially look at the ancient stuff.
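To make that outbound link inventory less tedious, here’s a minimal PHP sketch, not from the original post, that lists the external links found on a single page; the URL and host name are placeholder assumptions you’d replace with your own:

<?php
// Minimal sketch: list external links found on one page.
// $pageUrl and $myHost are placeholder assumptions.
$pageUrl = "http://www.example.com/some-page.html";
$myHost  = "www.example.com";
$dom = new DOMDocument();
// The @ suppresses warnings about sloppy real-world HTML.
@$dom->loadHTMLFile($pageUrl);
foreach ($dom->getElementsByTagName("a") as $a) {
    $href = $a->getAttribute("href");
    $host = parse_url($href, PHP_URL_HOST);
    // Relative links have no host; report only absolute links pointing to a different host.
    if ($host && strcasecmp($host, $myHost) != 0) {
        print $href . "\n";
    }
}
?>

Run it against your templates and a handful of ancient posts; any destination you don’t recognize deserves a closer look.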

Update: I’ve just spotted Eric Ward’s article on forensic inbound link analysis; you should read that, and his comments on outgoing links, too.





Full Disclosure of Paid Links

Since I’ve moved the blog from blogspot.com to this place, this post doesn’t make much sense anymore. I’ve dispersed all the stuff from my old and ugly Blogspot sidebar across a couple of pages here; not much of it is still on the sidebar.

Following Matt’s advice on paid links I’ve looked at this blog to reveal sneaky commercial links, although nobody really likes this idea.

I’m pretty sure that I never got paid for posting, so there was just the sidebar to check. I found a couple of links leading to articles I wrote with both educational and commercial intent. I consider these valuable resources, so there’s no need to report or nofollow them.

Next in the “What I read” section I didn’t find a suitable procedure to report that “Books, tons of books” includes commercial stuff like database manuals and other publications with a clearly commercial message. I paid for all these books … sigh.

Ok, next the blogroll. Again, all links point to good resources, nothing to report. Under the search box there’s a link to Technorati which I can’t nofollow because it’s put there by Technorati’s script. Technorati sends me traffic and I use Technorati for research, so this link is probably fine and counts as an honest recommendation, although it functions as a traffic deal too.

Checking the “Links and Folks” section I found a not that related link pointing to bikes for sale at OCC. Well, I really like OCC bikes, and this is my personal blog, so why shouldn’t I link out to a resource unrelated to search and Web development? Hmmmm … perhaps I should ask Google for permission to dofollow this somewhat commercial link before I receive a free bike in return.

Next, the “Ads by Google” links are fine, because they’re put there client-side, and even a Googlebot executing JavaScript knows that everything in a block of code containing an AdSense publisher code is auto-nofollow’ed by definition.

Both the MBL widget and the Twitter badge are put there client-side, and both were free of commercial links, at least the last time I looked. End of sidebar; I didn’t find serious fodder for a paid link report. Could that be true?

Wait … I missed the header, and luckily there’s a big fat paid link:

With this link I pay Google for Blogger’s services and hosting, and it is not nofollow’ed. Dammit, I can’t nofollow it myself, so here’s my paid links spam report:

[Screenshot: paid links spam report]

Ok, seriously, I think that Google can discount commercial links, because that’s how Google’s cookie crumbles. And I perfectly understand that Matt asks for a few samples of paid links Google has not yet discovered in order to fine-tune Google’s algos. However, I fear that this call for paid-links spam reports will result in massive abuse of the form I use to report webspam that really annoys me because it disturbs my search results. I’m happy that it should be pretty easy to filter out abusive reports, marked with “paidlinks” and filed to damage a competitor’s rankings, once Matt’s team has collected enough examples.


Update: Read Rae Hoffman’s full disclosure too!




XML sitemap auto-discovery

Vanessa Fox makes the news: In addition to sitemap submissions you can add this line to your robots.txt file:

Sitemap: http://www.example.com/sitemap.xml

Google, Yahoo, MSN and Ask (new on board) then should fetch and parse the XML sitemap automatically. Next week or so the cool robots.txt validator will get an update too.

Question: Is XML Sitemap Autodiscovery for Everyone?

Update:
More info here.

Q: Does it work by user-agent?
A: Yes, add all sitemaps to robots.txt, then disallow by engine.

Q: Must I fix canonical issues before I use sitemap autodiscovery?
A: Yes, 301-redirect everything to your canonical server name, and choose a preferred domain at Webmaster Central.

Q: Can I submit all supported sitemap formats via robots.txt?
A: Yes, everything goes. XML, RSS, ATOM, plain text, Gzipped …
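For illustration, here’s what such a robots.txt could look like; the domain, file names and paths are made-up examples:

# Sitemap directives aren't tied to a user-agent section; engines supporting
# autodiscovery pick them up from anywhere in the file.
Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/feed.rss

# Crawl rules still work per engine as usual.
User-agent: *
Disallow: /cgi-bin/

User-agent: Slurp
Disallow: /beta/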





Four reasons to get tanked on Google’s SERPs

You know I find “My Google traffic dropped all of a sudden - I didn’t change anything - I did nothing wrong” threads fascinating. Especially when they’re posted with URLs on non-widgetized boards. Sometimes I submit an opinion, although the questioners usually don’t like my replies, but more often I just investigate the case for educational purposes.

Abstracting from a fair number of tanked sites, I’d like to share a few of my head notes, or rather theses, as food for thought. I’ve tried to keep these as generalized as possible, so please don’t blame me for the lack of detailed explanations.

  1. Reviews and descriptions ordered by product category, product line, or other groupings of similar products tend to rephrase each other semantically, that is in form and content. Be careful when it comes to money making areas like travel or real estate. Stress unique selling points, non-shared attributes or utilizations, localize properly, and make sure reviews and descriptions don’t get spread in full internally on crawlable pages as well as externally.
  2. Huge clusters of property/characteristic/feature lists under analogical headings, even unstructured, may raise a flag when the amount of applicable attributes is finite and the values are rather similar, with just a few of them totally different or mere expressions of trite localization.
  3. The lack of non-commercial outgoing links on pages plastered with ads of any kind, or on pages at the very bottom of the internal linking hierarchy, may raise a flag. Nofollow’ing, redirecting or iFraming affiliate/commercial links doesn’t prevent you from breeding artificial page profiles. Adding unrelated internal links to the navigation doesn’t help. Adding Wikipedia links en masse doesn’t help. Providing unique textual content and linking to authorities within the content does help.
  4. Strong and steep hierarchical internal/navigational linkage without relevant crosslinks and topical between-the-levels linkage looks artificial, especially when the site in question lacks deep links. Look at the ratio of home page links vs. deep links to interior pages (a rough check is sketched below). Rethink the information architecture and structuring.
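To put a rough number on thesis 4, here’s a minimal PHP sketch, purely illustrative and not from the original post, that computes the deep-link share from a plain text file of inbound link targets; the file name and the home page heuristic are assumptions:

<?php
// Minimal sketch: home page vs. deep link ratio from a list of inbound link targets.
// "inbound-link-targets.txt" (one target URL per line) is an assumed file name.
$targets = file("inbound-link-targets.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$home = 0;
$deep = 0;
foreach ($targets as $url) {
    $path = parse_url(trim($url), PHP_URL_PATH);
    // Count "/", an empty path and "/index.html" as home page links, everything else as deep links.
    if ($path === NULL || $path === FALSE || $path == "" || $path == "/" || $path == "/index.html") {
        $home++;
    } else {
        $deep++;
    }
}
$total = $home + $deep;
printf("Home page links: %d, deep links: %d, deep link share: %.1f%%\n",
    $home, $deep, $total ? 100 * $deep / $total : 0);
?>

A site where nearly all inbound links target the home page usually deserves a second look at its information architecture and deep linking.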

Take that as a call for antithesis, or just as food for thought. And keep in mind that although there might be no recent structural/major/SEO/… on-site changes, perhaps Google just changed her judgement of the ancient stuff that has ranked forever, and/or has changed the ability of existing inbound links to pass weight. Nothing’s set in stone. Not even rankings.




