Erol to ship a Patch Fixing Google Troubles

Background: read these four posts on Google penalizing, or rather deindexing, e-commerce sites. Long story short: recently Google’s enhanced algos began to deindex e-commerce sites powered by Erol’s shopping cart software. The shopping cart maintains a static HTML file which redirects user agents executing JavaScript to another URL. This happens with each and every page, so it’s quite understandable that Ms. Googlebot was not amused. I got involved when a few worried store owners asked for help in Google’s Webmaster Forum. After lots of threads and posts on the subject, Erol’s managing director got in touch with me and we agreed to team up to find a solution for the store owners suffering from a huge traffic loss. Here’s my report of the first technical round.

Understanding how Erol 4.x (and all prior versions) works:

The software generates an HTML page offline, which functions as an XML-like content source (called “x-page”; I use that term because all Erol customers are familiar with it). The “x-page” gets uploaded to the server and is crawlable, but not really viewable. Requested by a robot, it responds with a 200 OK. Requested by a human, it does a JavaScript redirect to a complex frameset, which loads the “x-page” and visualizes its contents. The frameset responds to browsers if called directly, but returns a 404 Not Found error to robots. Example:

“x-page”: x999.html
Frameset: erol.html#999x0&&

To view the source of the “x-page” disable JavaScript before you click the link.

Understanding how search engines handle Erol’s pages:

There are two major weak points with regard to crawling and indexing. The crawlable page redirects, and the destination does not exist if requested by a crawler. This leads to these scenarios:

  1. A search engine ignoring JavaScript on crawled pages fetches the “x-page” and indexes it. That’s the default behavior of yesterday’s crawlers, and several search engines still work this way.
  2. A search engine not executing JavaScript on crawled pages fetches the “x-page”, analyzes the client-side script, and discovers the redirect (please note that a search engine crawler may change its behavior, so this can happen all of a sudden to properly indexed pages!). Possible consequences:
    • It tries to fetch the destination, gets the 404 response multiple times, and eventually deindexes the “x-page”. That means that, depending on the crawling frequency and depth per domain, the pages disappear quite fast or rather slowly until the last page is phased out. Google would keep a copy in the supplemental index for a while, but such a listing cannot return to the main index.
    • It’s trained to consider the unconditional JavaScript redirect “sneaky” and flags the URL accordingly. This can result in temporary as well as permanent deindexing.
  3. A search engine executing JavaScript on crawled pages fetches the “x-page”, performs the redirect (thus ignores the contents of the “x-page”), and renders the frameset for indexing. Chances are it gives up on the complexity of the nested frames, indexes the noframes tag of the frameset and perhaps a few snippets from subframes, considers the whole conglomerate thin, hence assigns it the lowest possible priority in the query engine and moves on.

Unfortunately the search engine delivering the most traffic began to improve its crawling and indexing, hence many sites formerly receiving a fair amount of Google traffic began to suffer from scenario 2: deindexing.

Outlining a possible workaround to get the deleted pages back into the search index:

In six months or so Erol will ship version 5 of its shopping cart, and this software dumps frames, JavaScript redirects and ugly stuff like that in favor of clean XHTML and CSS. By the way, Erol has asked me for my input on their new version, so you can bet it will be search engine friendly. So what can we do in the meantime to help legions of store owners running version 4 and below?

We’ve got the static “x-page” which should not get indexed because it redirects, and which cannot be changed to serve the contents itself. The frameset cannot be indexed because it doesn’t exist for robots, and even if a crawler could eat it, we don’t consider it easy-to-digest spider fodder.

Let’s look at Google’s guidelines, which are the strictest around, thus applicable for other engines as well:

  1. Don’t […] present different content to search engines than you display to users, which is commonly referred to as “cloaking.”
  2. Don’t employ cloaking or sneaky redirects.

If we find a way to suppress the JavaScript code on the “x-page” when a crawler requests it, the now more sophisticated crawlers will handle the “x-page” like their predecessors did, that is, they will fetch the “x-pages” and hand them over to the indexer without vicious remarks. Serving identical content under different URLs to users and crawlers does not contradict the first rule. And we’d comply with the second rule, because loading a frameset for human visitors but not for crawlers is definitely not sneaky.

Ok, now how do we tell the static page that it has to behave dynamically, that is, output different contents server-side depending on the user agent’s name? Well, Erol’s desktop software, which generates the HTML, can easily insert PHP tags too. A browser would not execute those on a local machine, but who cares, as long as it works once uploaded to the server. Here’s the procedure for Apache servers:

In the root’s .htaccess file we enable PHP parsing of .html files:
AddType application/x-httpd-php .html
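Note: depending on how PHP is installed, some hosts need AddHandler application/x-httpd-php .html instead of AddType; if .html files still aren’t parsed after this change, that’s the first thing to check with the hosting company.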

Next we create a PHP include file xinc.php which prevents crawlers from reading the offending JavaScript code:
<?php
$crawlerUAs = array("Googlebot", "Slurp", "MSNbot", "teoma", "Scooter", "Mercator", "FAST");
$isSpider = FALSE;
$userAgent = getenv("HTTP_USER_AGENT");
// case-insensitive check whether the requestor matches a known crawler name
foreach ($crawlerUAs as $crawlerUA) {
    if (stristr($userAgent, $crawlerUA)) $isSpider = TRUE;
}
if (!$isSpider) {
    // browsers get the redirect script
    print "<script type=\"text/javascript\"> [a whole bunch of JS code] </script>\n";
}
if ($isSpider) {
    // crawlers get a comment instead
    print "<!-- Dear search engine staff: we've suppressed the JavaScript code redirecting browsers to 'erol.html', that's a frameset serving this page's contents more pleasant for human eyes. -->\n";
}
?>

Erol’s HTML generator now puts <?php @include("xinc.php"); ?> instead of a whole bunch of JavaScript code.
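For illustration, the head of a generated “x-page” would then look roughly like this (the title and surrounding markup are made up; only the include line matters):

<html>
<head>
<title>Product 999</title>
<?php @include("xinc.php"); ?>
</head>
<body>
[crawlable product contents]
</body>
</html>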

The implementation for other environments is quite similar. If PHP is not available we can do it with SSI and Perl. On Windows we can tell IIS to process all .html extensions as ASP (application mappings) and use an ASP include. That would give us three versions of the patch, which should help 99% of all Erol customers until they can upgrade to version 5.
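For what it’s worth, a rough sketch of the SSI flavor, assuming an Apache server with mod_include available; the crawler list and the placeholder comment are made up, and the exact setup depends on the host. In .htaccess:

Options +Includes
AddOutputFilter INCLUDES .html

And in the “x-page”, instead of the inline JavaScript:

<!--#if expr="$HTTP_USER_AGENT = /Googlebot|Slurp|MSNbot|teoma/" -->
<!-- JavaScript redirect suppressed for crawlers -->
<!--#else -->
<script type="text/javascript"> [a whole bunch of JS code] </script>
<!--#endif -->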

This solution comes with two disadvantages. First, the cached page copies, clickable from the SERPs and toolbars, would render pretty ugly because they lack the JavaScript code. Second, perhaps automated tools searching for deceitful cloaking might red-flag the URLs for a human review. Hopefully the search engine executioner reading the comment in the source code will be fine with it and give it a go. If not, there’s still the reinclusion request. I think store owners can live with that when they get their Google traffic back.

Rolling out the patch:

Erol thinks the above makes sense and there is a good chance of implementing it soon. While the developers are at work, please provide feedback if you think we didn’t interpret Google’s Webmaster Guidelines strictly enough. Keep in mind that this is an interim solution and that the new version will handle things in a more standardized way. Thanks.

Paid-Links-Disclosure: I do this pro bono job for the sake of the suffering store owners. Hence the links pointing to Erol and Erol’s customers are not nofollow’ed. Not that I’d nofollow them otherwise ;)




AdSense asks me "Are You Gay?", but why?

John emailed me this screenshot:
[Screenshot: an AdSense ad asking “Are you gay?”]

I wondered why the heck AdSense considers a post on Google’s new URL removal tool gay in nature. Warning! These links (Google search results) are not safe at work:

0.08% gay: “fell in love”
0.11% gay: “neat”
0.05% gay: “terminator”
0.02% gay: “competitors”
0.04% gay: “user”
14.7% gay: “friendly”
56.4% gay: “tool”

Who can help me to figure out the remaining 28.6% gayness? I mean when putting two ads asking “Are you gay” above and below the post, Ol’ AdSense should be at least 100.0% certain that this question will not offend me. Actually, I’m not offended. I’m just curious to learn more about a possible coming out.




More anchor text analysis from Webmaster Central

If you didn’t spot my update posted a few hours ago, log in to Webmaster Central and view your anchor text stats. You’ll find way more phrases and can play with the variations; these should allow you to track down sources via quoted search queries. Also, the word stats are back.
Have fun!




Please don’t run your counter on my servers

[Image: DO NOT HOTLINK] I deeply understand that sharing other people’s resources makes sense sometimes. I just ask you to rethink your technical approach. Running your page view stats on my server comes with a serious disadvantage: my server logs and referrer reports are protected, hence you can’t read your stats. Rest assured I’m really not eager to know who views your pages.

So please: when you copy my HTML code, be so kind as to steal the invisible 1×1px images too. It’s really not that hard to upload them to your own server and edit my HTML in a way that makes your visitors’ user agents request these images from your server.

Signing up with a free counter service that doesn’t add hidden links to all your pages gives you less hassle than my reaction when I get annoyed.

Disclaimer: I don’t like it when you steal my code, coz for some reason it’s often crappy enough to break your layout. Also, copying code without permission is as bad as content theft. So don’t copy, but feel free to ask.

Go to HTML Basix to figure out how you can block hotlinking with .htaccess:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://(www\.)?sebastianx\.blogspot\.com(/)?.*$ [NC]
RewriteRule .*\.(gif|jpg|jpeg|bmp|png)$ http://www.smart-it-consulting.com/img/misc/do-not-hotlink-beauty.jpg [R,NC]
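Depending on your preferences you may also want to add RewriteCond %{HTTP_REFERER} !^$ above the RewriteRule, so that requests sent without a referrer (direct hits, privacy proxies) still get the real images; that’s an optional tweak, not part of the HTML Basix snippet.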

But please don’t steal or hotlink the offensive blonde beauty ;)




Ultimately: Watch out for Google’s URL terminator

I hate to recycle news, but I just fell in love with this neat URL terminator. Unfortunately there’s no button to remove SPAM, so I still have to outrank competitors, but besides that ‘flaw’ it’s a perfect and user-friendly tool covering all my needs. Thanks!




Where is the precise definition of a paid link?

Good questions:

How many consultants provide links through to the companies they work for?
I do.

How many software firms provide links through to their major corporate clients?
Not my company. Never going to happen.

If you make a donation to someone, and they decide to give you a link back, is that a paid link?
Nope.

If you are a consultant, and are paid to analyse a company, but to make the findings known publicly, are you supposed to stick nofollow on all the links?
Nope.

If you are a VC or Angel investor, should you have to use NoFollow linking through to companies in your investment portfolio?
Nope.

Are developers working on an open-source project allowed a link back to their sites (cough Wordpress)?
Yep.
And may they then use that link equity to dominate search engines on whatever topic they please?
Hmmmm, if it really works that way, why not?

If you are a blog network, or large internet content producer, is it gaming Google to have links to your sister sites, whether there is a direct financial connection or not?
Makes business sense, so why should those links get condomized? Probably a question of quantity. No visitor would follow a gazillion links to blogs handling all sorts of topics the yellow pages have categories for.

Should a not for profit organisation link through to their paid members with a live link?
Sure, perfectly discloses relationships and their character.

A large number of Wordpress developers have paid links on their personal sites, as do theme and plugin developers.
What’s wrong with that? Maybe questionable (in the sense of useless) on every page, but perfectly valid on the home page, about page and so on if disclosed. As for ads, that sort of paid link is valid on every page - nofollow’ing ads just avoids misunderstandings.

If you write a blog post thanking your sponsors, should you use nofollow?
Yep.

Some people give away prizes for links, or offer some kind of reciprocation.
If the awards are honest and truly editorial, linking back is just good practice.

If you are an expert in a particular field, and someone asks you to write a review of their site, and the type of review you write means that writing that content might take 10 hours of your time to do due diligence, is it wrong to accept some kind of monetary contribution? Just time and material?
In such a situation, why would you be forced to use nofollow on all links to the site being reviewed?
As long as the received expense allowance is disclosed, there’s nothing wrong with uncondomized links.

Imagine someone created a commercial Wikipedia, and paid $5 for every link made to it.
Don’t link. The link would be worth more than five bucks, and the risks involved can cost way more than that.

Where is the precise definition of a paid link?
Now that’s the best question of all!

Disclaimer: Yes/No answers are kinda worthless without a precisely defined context. Thus please read the comments.

Related thoughts: Should Paid Links Influence Organic Rankings? by Mark Jackson at SEW
Paid Link Schemes Inside Original Content by Brian White, also read Matt’s updated post on paid links.

Update: Google’s definition of paid links and other disliked linkage considered “linkspam”




Revise your linkage now

Google’s take on paid links obviously is here to stay, and frankly, old news was heating up minds last weekend. There’s no sign on the horizon that Google will revise the policy; quite the opposite is true. Google has new algos in development which are supposed to detect all sorts of paid linkage better than before. For many that’s bad news; however, I guess that rants are a waste of valuable time better spent revising current and past linking habits.

I don’t think that Googlebot has learned to read text like “Buy a totally undetectable PR8 link now for as low as $2,000 and get four PR6 links for free!” on images. Sites operating such obvious link selling businesses leave other, detectable, footprints, and their operators are aware of the risks involved. Buyers are rather safe; they just waste a lot of money, because purchased links most probably don’t carry link love, and nofollow’ed links may be way cheaper.

I often stumble upon cases where old and forgotten links create issues over time. Things that have worked perfectly in the past can bury a site on today’s SERPs. The crucial message is: be careful who you link out to! Compile a list of all your outbound links and check them carefully before Google’s newest link analysis goes live. Especially look at the ancient stuff.

Update: I’ve just spotted Eric Ward’s article on forensic inbound link analysis; you should read that, and his comments on outgoing links too.





Full Disclosure of Paid Links

Since I’ve moved the blog from blogspot.com to this place, this post doesn’t make much sense anymore. I’ve dispersed all the stuff from my old and ugly sidebar at Blogspot over a couple of pages here; not much of it is still on the sidebar.

Following Matt’s advice on paid links I’ve looked at this blog to reveal sneaky commercial links, although nobody really likes this idea.

I’m pretty sure that I never got paid for posting, so there was just the sidebar to check. I found a couple of links leading to articles I wrote with both educational and commercial intent. I consider these valuable resources so there’s no need to report or nofollow them.

Next in the “What I read” section I didn’t find a suitable procedure to report that “Books, tons of books” includes commercial stuff like database manuals and other publications with a clearly commercial message. I paid for all these books … sigh.

Ok, next the blogroll. Again, all links point to good resources, nothing to report. Under the search box there’s a link to Technorati which I can’t nofollow because it’s put by Technorati’s script. Technorati sends me traffic, I use Technorati for research, so probably this link is fine and counts as honest recommendation, although it functions as a traffic deal too.

Checking the “Links and Folks” section I found a not that related link pointing to bikes for sale at OCC. Well, I really like OCC bikes, and this is my personal blog, so why shouldn’t I link out to a resource unrelated to search and Web development? Hmmmm … perhaps I should ask Google for permission to dofollow this somewhat commercial link before I receive a free bike in return.

Next, the “Ads by Google” links are fine, because they’re put client-side, and even the Googlebot executing JavaScript knows that everything in a block of code containing an AdSense publisher code is auto-nofollow’ed by definition.

Both the MBL widget and the Twitter badge are put client-side, plus both were free of commercial links, at least the last time I looked. End of the sidebar: I didn’t find serious fodder for a paid link report. Could that be true?

Wait … I missed the header, and luckily there’s a big fat paid link:

With this link I pay Google for Blogger’s services and hosting, and it is not nofollow’ed. Dammit, I can’t nofollow it myself, so here’s my paid links spam report:

[Screenshot: “paidlinks” spam report]

Ok, seriously, I think that Google can discount commercial links, because that’s how Google’s cookie crumbles. And I perfectly understand that Matt asks for a few samples of paid links Google has not yet discovered, to fine-tune Google’s algos. However, I fear that this call for paid-link spam reports will result in massive abuse of the form I use to report webspam that really annoys me because it disturbs my search results. I’m happy that abusive reports marked with “paidlinks”, filed to damage a competitor’s rankings, will be pretty easy to filter out once Matt’s team has collected enough examples.


Update: Read Rae Hoffman’s full disclosure too!




Is XML Sitemap Autodiscovery for Everyone?

Referencing XML sitemaps in robots.txt was recently implemented by Google upon webmaster requests going back to June 2005, shortly after the initial launch of sitemaps. Yahoo, Microsoft, and Ask support it too, although nobody knows when MSN is going to implement XML sitemaps at all.

Some folks argue that robots.txt, introduced by the Robots Exclusion Protocol in 1994, should not get abused for inclusion mechanisms. Indeed this may create confusion, but it has been done before, for example by search engines supporting Allow: statements, introduced in 1996. Also, the de facto Robots Exclusion Standard covers robots meta tags (where inclusion is the default) too. I think dogmatism is not helpful when actual needs require evolution.

So yes, the opportunity to address sitemaps in robots.txt is a good thing, but certainly not enough. It simplifies the process; that is, autodiscovery of sitemaps eliminates a few points of failure. Webmasters don’t need to monitor which engines have implemented the sitemaps protocol recently and submit accordingly. They can just add a single line to their robots.txt file and the engines will do their job. Fire and forget is a good concept. However, the good news comes with pitfalls.
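For reference, that single line in a minimal robots.txt (example.com and the sitemap location are placeholders):

User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml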

But is this good thing actually good for everyone? Not really. Many publishers have no control over their server’s robots.txt file, for example publishers utilizing signup-and-instantly-start-blogging services or free hosts. Even when these platforms generate RSS feeds or other URL lists suitable as sitemaps, such publishers must still submit them to all search engines manually. Enhancing sitemap autodiscovery by looking at page meta data would be great: <meta name="sitemap" content="http://www.example.com/sitemap.xml" /> or <link rel="sitemap" type="application/rss+xml" href="http://www.example.com/sitefeed.rss" /> would suffice.

So far the explicit diaspora. Others are barred from sitemap autodiscovery by lack of experience, technical skills, or manageable environments, for example at way too restrictive hosting services. Example: the prerequisites for sitemap autodiscovery include the ability to fix canonicalization issues. An XML sitemap containing www.domain.tld URLs, referenced as Sitemap: http://www.domain.tld/sitemap.xml in http://domain.tld/robots.txt, is plain invalid. Crawlers following links without the “www” subdomain will request the robots.txt file without the “www” prefix. If a webmaster running this flawed but very common setup relies on sitemap autodiscovery, s/he will miss out on feedback and error alerts. On some misconfigured servers this may even lead to deindexing of all pages with relative internal links.
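The usual fix, sketched here for Apache with mod_rewrite (domain.tld stands in for the real host name), is a server side 301 redirect from the non-www host to the canonical www host before relying on sitemap autodiscovery:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^domain\.tld$ [NC]
RewriteRule ^(.*)$ http://www.domain.tld/$1 [R=301,L]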

Hence please listen to Vanessa Fox stating that webmasters should register their autodiscovered sitemaps at Webmaster Central and Site Explorer to get alerted to errors which an XML sitemap validator cannot spot, and to monitor the crawling process!

I doubt many SEO professionals and highly skilled webmasters managing complex sites will make use of that new feature. They prefer to have things under control, and automated third-party polls are hard to manipulate. Probably they want to maintain different sitemaps per engine to steer crawling accordingly. Although this can be accomplished by user-agent-based delivery of robots.txt, that additional complexity doesn’t make the submission process easier to handle. Only uber-geeks automate everything ;)

For example, it makes no sense to present a gazillion image or video clip URLs to a search engine indexing textual contents only. Google makes handling different content types extremely simple for the site owner: one can put HTML pages, images, movies, PDFs, feeds, office documents and whatever else all in one sitemap, and Google’s sophisticated crawling process delivers each URL to the indexer it belongs to. We don’t know (yet) how other engines will handle that.
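A minimal sketch of such a mixed sitemap, with made-up URLs:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/products.html</loc></url>
  <url><loc>http://www.example.com/docs/manual.pdf</loc></url>
  <url><loc>http://www.example.com/img/widget.jpg</loc></url>
</urlset>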

Also, XML sitemaps are a neat instrument to improve crawling and indexing of particular contents. One search engine may nicely index insufficiently linked stuff, whilst another engine fails to discover pages buried more than two link levels deep, badly needing the hints via sitemap. There are more good reasons to give each engine its own sitemap.

Last but not least there might be good reasons not to announce sitemap contents to the competition.





In need of a "Web-Robot Directives Standard"

The Robots Exclusion Protocol from 1994 gets used and abused, best described by Lisa Barone citing Dan Crow from Google: “everyone uses it but everyone uses a different version of it”. De facto we’ve got a Robots Exclusion Standard covering crawler directives in robots.txt as well as robots meta tags, said Dan Crow. Besides non-standardized directives like “Allow:”, Google’s Sitemaps Protocol adds more inclusion to the mix, now even closely bundled with robots.txt. And there are more ways to place crawler directives: unstructured (in the sense of independence from markup elements) as with Google’s section targeting, on the link level applying the commonly disliked rel-nofollow microformat or XFN, plus related thoughts on block-level directives.

All in all that’s a pretty confusing conglomerate of inclusion and exclusion, utilizing many formats and markup elements, and lots of places to put crawler directives. Not really the sort of norm the webmaster community can successfully work with. No wonder that over 75,000 robots.txt files have pictures in them, that less than 35 percent of servers have a robots.txt file, that the average robots.txt file is 23 characters (“User-agent: * Disallow:”), and that gazillions of Web pages carry useless and unsupported meta tags like “revisit-after” … for more funny stats and valuable information see Lisa’s robots.txt summit coverage (SES NY 2007), also covered by Tamar (read both!).

How to structure a “Web-Robot Directives Standard”?

To handle redundancies as well as cascading directives properly, we need a clear and understandable chain of command. The following is just a first idea off the top of my head, and likely gets updated soon:

  • Robots.txt

    1. Disallows directories, files/file types, and URI fragments like query string variables/values by user agent.
    2. Allows sub-directories, file names and URI fragments to refine Disallow statements.
    3. Gives general directives like crawl frequency or crawl volume per day, maybe even per folder, and restricts crawling in particular time frames.
    4. References general XML sitemaps accessible to all user agents, as well as specific XML sitemaps addressing particular user agents.
    5. Sets site-level directives like “noodp” or “noydir”.
    6. Predefines page-level instructions like “nofollow”, “nosnippet” or “noarchive” by directory, document type or URL fragments.
    7. Predefines block-level respectively element-level conditions like “noindex” or “nofollow” on class names or DOM IDs by markup element. For example “DIV.hMenu,TD#bNav ‘noindex,nofollow’” could instruct crawlers to ignore the horizontal menu as well as the navigation at the very bottom on all pages.
    8. Predefines attribute-level conditions like “nofollow” on A elements. For example “A.advertising REL ‘nofollow’” could tell crawlers to ignore links in ads, or “P#tos > A ‘nofollow’” could instruct spiders to ignore links in TOS excerpts found on every page in a P element with the DOM ID “tos”. (A hypothetical mock-up of such an extended robots.txt follows further down.)

  • XML Sitemaps

    1. Since robots.txt deals with inclusion now, why not add an optional URL-specific “action” element allowing directives like “nocache” or “nofollow”? Also, a “delete” directive to get outdated pages removed from search indexes would make sound sense.
    2. To make XML sitemap data reusable, and to allow centralized maintenance of page meta data, a couple of new optional URL elements like “title”, “description”, “document type”, “language”, “charset”, “parent” and so on would be a neat addition. This way it would be possible to visualize XML sitemaps as native (and even hierarchical) site maps. (See the mock-up right after this list.)

    Robots.txt exclusions overrule URLs listed for inclusion in XML sitemaps.

  • Meta Tags

    Page meta data overrule directives and information provided in robots.txt and XML sitemaps. Empty contents in meta tags suppress directives and values given at upper levels. Non-existent meta tags implicitly apply data and instructions from upper levels. The same goes for everything below.

  • Body Sections

    Unstructured parenthesizing of parts of code certainly is undoable with XMLish documents, but may be a pragmatic procedure to deal with legacy code. Paydirt in HTML comments may be allowed to mark payload for contextual advertising purposes, but it’s hard to standardize. Let’s leave that for proprietary usage.

  • Body Elements

    Implementing a new attribute for messages to machines should be avoided for several good reasons. Classes are additive, so multiple values can be specified for most elements. That would allow putting standardized directives as class names, for example class=”menu robots-noindex googlebot-nofollow slurp-index-follow” where the first class addresses CSS. Such inline robot directives come with the same disadvantages as inline style assignments and open a can of worms, so to say. Using classes and DOM IDs just as references to user agent specific instructions given in robots.txt is surely the preferable procedure.

  • Element Attributes

    More or less this level is a playground for microformats utilizing the A element’s REV and REL attributes.
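A mock-up of the optional URL elements proposed above for XML sitemaps (purely hypothetical; none of these child elements are part of the sitemaps.org protocol):

<url>
  <loc>http://www.example.com/about.html</loc>
  <action>nocache</action>
  <title>About example.com</title>
  <description>Who we are and what we do</description>
  <language>en</language>
  <parent>http://www.example.com/</parent>
</url>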

Besides the common values “nofollow”, “noindex”, “noarchive”/”nocache” etc. and their omissible positive defaults “follow” and “index” etc., we’d need a couple more, for example “unapproved”, “untrusted”, “ignore” or “skip” and so on. There’s a lot of work to do.
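To make the robots.txt part more tangible, here’s a purely hypothetical mock-up; none of the extended directives below are supported by any engine, they merely restate items 3 to 8 of the proposal in a robots.txt-like syntax:

User-agent: Googlebot
Disallow: /cgi-bin/
# hypothetical extensions, not supported by any engine:
Crawl-frequency: 100/day
Site-level: noodp, noydir
Page-level: /printable/ "noindex,nofollow"
Element-level: DIV.hMenu,TD#bNav "noindex,nofollow"
Attribute-level: A.advertising REL "nofollow"

Sitemap: http://www.example.com/sitemap.xml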

In terms of complexity, a mechanism as outlined above should be as easy to use as CSS in combination with client-side scripting for visualization purposes.

However, whatever better ideas are out there, we need a widely accepted “Web-Robot Directives Standard” as soon as possible.




