SEO

Archived posts from the 'SEO' Category

Yahoo! search going to torture Webmasters

Posted on 2 May, 2007

According to Danny, Yahoo! supports a multi-class nonsense called the robots-nocontent tag. CRAP ALERT!

Can you senseless and cruel folks at Yahoo! search imagine how many of my clients who’d like to use that feature have copied and pasted their pages? Do you have a clue how many sites out there don’t make use of SSI, PHP or ASP includes, how many sites never heard of dynamic content delivery, and how many sites can’t use proper content delivery techniques because they have to deal with legacy systems and ancient business processes? Did you ask how common templated Web design is? And I mean the weird static variant, where a new page gets built from a randomly selected source page saved as new-page.html.

It’s great that you came out with a bastardized copy of Google’s somewhat hapless (in the sense of cluttering structured code) section targeting, because we dreadfully need that functionality across all engines. And I admit that your approach is a little better than AdSense section targeting, because you don’t mark payload by paydirt in comments. But why the heck did you design it that crappily? The unthoughtful draft of a microformat from which you’ve “stolen” that unfortunate idea didn’t become a standard for very good reasons: because it’s crap. Assigning multiple class names to markup elements for the sole purpose of setting crawler directives is as crappy as inline style assignments.

Well, due to my zero-bullshit tolerance I’m somewhat upset, so I repeat: Yahoo’s robots-nocontent class name is crap by design. Don’t use it, boycott it, because if you make use of it you’ll have to change gazillions of files for each and every proprietary syntax supported by a single search engine in the future. If the united search geeks can agree on flawed standards like rel-nofollow, they should be able to talk about a sensible evolution of robots.txt.

There’s a way easier solution, which doesn’t require editing tons of source files: standardizing a CSS-like syntax to assign crawler directives to existing classes and DOM IDs. For example, extend robots.txt syntax like this:

A.advertising { rel: nofollow; } /* devalue aff links */

DIV.hMenu, TD#bNav { content:noindex; rel:nofollow; } /* make site wide links unsearchable */
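The syntax above is a proposal, not something any engine supports. To show how little work it would be for a crawler to read, here’s a minimal sketch of a parser for such rule blocks. It assumes the file holds only `selector { property: value; }` blocks with `/* */` comments and comma-separated selectors; the function name `parse_crawler_rules` is made up for illustration.

```python
import re

# One rule block: "selectors { declarations }"
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def parse_crawler_rules(text):
    """Parse hypothetical CSS-like crawler directives.

    Returns a dict mapping each selector (e.g. "A.advertising")
    to its directives (e.g. {"rel": "nofollow"}).
    """
    text = COMMENT_RE.sub("", text)  # drop /* ... */ comments
    rules = {}
    for selectors, body in RULE_RE.findall(text):
        directives = {}
        for decl in body.split(";"):
            if ":" not in decl:
                continue  # skip the empty piece after the last ";"
            prop, value = decl.split(":", 1)
            directives[prop.strip().lower()] = value.strip().lower()
        # "DIV.hMenu, TD#bNav" assigns the same directives to both selectors
        for selector in selectors.split(","):
            selector = selector.strip()
            if selector:
                rules.setdefault(selector, {}).update(directives)
    return rules


example = """
A.advertising { rel: nofollow; } /* devalue aff links */
DIV.hMenu, TD#bNav { content:noindex; rel:nofollow; } /* make site wide links unsearchable */
"""

for selector, directives in parse_crawler_rules(example).items():
    print(selector, directives)
```

Applying the result would mean matching each selector against the page’s DOM, which is the same job every browser already does for style sheets.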

Unsupported robots.txt syntax does no harm; proprietary attempts do!

Dear search engines, get together and define something useful, before each of you comes out with different half-baked workarounds like section targeting or robots-nocontent class values. Thanks!


7 comments Sebastian | Crap, Copy+Paste-Penalties, robots.txt, SEO, Yahoo, Microformats

Google hunts paid links and reciprocal linkage

Posted on 2 May, 2007

Matt Cutts and Adam Lasnik have clarified Google’s take on paid links and overdone reciprocal linkage. Some of their statements are old news, but it surely helps to have a comprehensive round-up in the context of the current debate on paid links.

So what, in short, does Google consider link spam?
Artificial link schemes, paid links and uncondomized affiliate links, overdone reciprocal linkage and interlinking.

All sorts of link schemes designed to increase a site’s ranking or PageRank. Link scheme means, for example, mass exchange of links pages, repeated chunks of links per site, fishy footer links, triangular PageRank boosting, 27-way linkage where in the end only the initiator earns a few inbounds because the participants are confused, and “genial” stuff like that. Google’s pretty good at identifying link farming, and bans or penalizes accordingly. That’s old news, but such techniques are still widely used.

Advice: don’t participate, Google will catch you eventually.

Paid links, if detected or reported, get devalued. That is, they don’t help the link destination’s search engine rankings, and in some cases the source will lose its ability to pass reputation via links. Google has been doing this more or less silently since at least 2003, probably longer, but until today there was no precise definition of risky paid links.

That’s going to change. Here’s Adam Lasnik, commenting on Eric Enge’s remark “It seems to me that one of the more challenging aspects of all of this is that people have gotten really good at buying a link that show no indication that they are purchased.”:

Yes and no, actually. One of the things I think Matt has commented about in his blog; it’s what we joking refer to as famous last words, which is “well, I have come up with a way to buy links that is completely undetectable”.

As people have pointed out, Google buys advertising, and a lot of other great sites engage in both the buying and selling of advertising. There is no problem with that whatsoever. The problem is that we’ve seen quite a bit of buying and selling for the very clear purpose of transferring PageRank. Some times we see people out there saying “hey, I’ve got a PR8 site” and, “this will give you some great Google boost, and I am selling it for just three hundred a month”. Well, that’s blunt, and that’s clearly in violation of the “do not engage in linking schemes that are not permitted within the webmaster guidelines”.

Two, taking a step back, our goal is not to catch one hundred percent of paid links [emphasis mine]. It’s to try to address the egregious behavior of buying and selling the links that focus on the passing of PageRank. That type of behavior is a lot more readily identifiable then I think people give us credit for.

So it seems Google’s just after PageRank selling. Adam’s subsequent comments on the use and abuse of rel-nofollow emphasize this interpretation:

I understand there has been some confusion on that, both in terms of how it [rel=nofollow] works or why it should be used. We want links to be treated and used primarily as votes for a site, or to say I think this is an interesting site, and good site. The buying and selling of links without the use of Nofollow, or JavaScript links, or redirects has unfortunately harmed that goal. We realize we cannot turn the web back to when it was completely noncommercial and we don’t want to do that [emphasis mine]. Because, obviously as Google, we firmly believe that commerce has an important role on the Internet. But, we want to bring a bit of authenticity back to the linking structure of the web. […] our interest isn’t in finding and taking care of a hundred percent of links that may or may not pass PageRank. But, as you point out relevance is definitely important and useful, and if you previously bought or sold a link without Nofollow, this is not the end of the world. We are looking for larger and more significant patterns [emphasis mine].

Don’t miss Eric Enge’s complete interview with Adam Lasnik; it’s really worth bookmarking for future reference!

On May 12th, 2007, Matt Cutts updated an older, well-linked post on paid links. It also covers thoughts on the value of directory links. Here are a few quotes, but don’t miss Matt’s post:

… we’re open to semi-automatic approaches to ignore paid links, which could include the best of algorithmic and manual approaches.
…
Q: Now when you say “paid links”, what exactly do you mean by that? Do you view all paid links as potential violations of Google’s quality guidelines?
A: Good question. As someone working on quality and relevance at Google, my bottom-line concern is clean and relevant search results on Google. As such, I care about paid links that flow PageRank and attempt to game Google’s rankings. I’m not worried about links that are paid but don’t affect search engines. So when I say “paid links” it’s pretty safe to add in your head “paid links that flow PageRank and attempt to game Google’s rankings.”
…
Q: This is all well and fine, but I decide what to do on my site. I can do anything I want on it, including selling links.
A: You’re 100% right; you can do absolutely anything you want on your site. But in the same way, I believe Google has the right to do whatever we think is best (in our index, algorithms, or scoring) to return relevant results.
…
Q: Hey, as long as we’re talking about directories, can you talk about the role of directories, some of whom charge for a reviewer to evaluate them?
A: I’ll try to give a few rules of thumb to think about when looking at a directory. When considering submitting to a directory, I’d ask questions like:
- Does the directory reject URLs? If every URL passes a review, the directory gets closer to just a list of links or a free-for-all link site.
- What is the quality of urls in the directory? Suppose a site rejects 25% of submissions, but the urls that are accepted/listed are still quite low-quality or spammy. That doesn’t speak well to the quality of the directory.
- If there is a fee, what’s the purpose of the fee? For a high-quality directory, the fee is primarily for the time/effort for someone to do a genuine evaluation of a url or site.
Those are a few factors I’d consider. If you put on your user hat and ask “Does this seem like a high-quality directory to me?” you can usually get a pretty good sense as well, or ask a few friends for their take on a particular directory.
…

To get a better idea on how Google’s search quality team chases paid links, read Brian White’s post Paid Link Schemes Inside Original Content.

Advice: either nofollow paid links, or don’t get caught. If you buy links, pay only for the traffic, because with or without link condom there’s no search engine love involved.

Affiliate links are seen as a kind of subset of paid links. Google can identify most (unmasked) affiliate links. Frankly, there’s no advantage in passing link love to sponsors.

Advice: nofollow.

Reciprocal links, without much doubt, nullify each other. Overdone reciprocal linkage may even cause penalties, that is, the reciprocal links area of a site gets classified as a link farm; for possible consequences, scroll up a bit. Reciprocal links are natural links, and Google honors them if the link profile of a site or network does not consist of an unnaturally high number of reciprocal or triangular link exchanges. It may be that natural reciprocal links pass (at least a portion of) PageRank, but no relevancy via anchor text, trust, or other link reputation (or at least less of it than one-way links pass).

Matt Cutts discussing “Google Hell”:

Reciprocal links by themselves aren’t automatically bad, but we’ve communicated before that there is such a thing as excessive reciprocal linking. […] As Google changes algorithms over time, excessive reciprocal links will probably carry less weight. That could also account for a site having more pages in supplemental results if excessive reciprocal links (or other link-building techniques) begin to be counted less. As I said in January: “The approach I’d recommend in that case is to use solid white-hat SEO to get high-quality links (e.g. editorially given by other sites on the basis of merit).”

Advice: It’s safe to consider reciprocal links somewhat helpful, but don’t actively chase reciprocal links.
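To make “reciprocal or triangular link exchanges” a bit more tangible, here’s a minimal sketch that counts them in a directed link graph between sites. Nobody outside Google knows which ratios count as “unnatural”, so the sketch only reports numbers; the graph and all function names are hypothetical.

```python
from collections import defaultdict


def reciprocal_pairs(links):
    """Unordered site pairs that link to each other (A->B and B->A)."""
    edges = set(links)
    return {frozenset((a, b)) for a, b in edges if a != b and (b, a) in edges}


def triangular_cycles(links):
    """Three-site cycles A->B->C->A, i.e. triangular link exchanges."""
    edges = set(links)
    outbound = defaultdict(set)
    for a, b in edges:
        outbound[a].add(b)
    cycles = set()
    for a, b in edges:
        if a == b:
            continue  # ignore self-links
        for c in outbound[b]:
            if c not in (a, b) and (c, a) in edges:
                cycles.add(frozenset((a, b, c)))
    return cycles


def reciprocal_share(links, site):
    """Fraction of the sites linking to `site` that `site` links back to."""
    edges = set(links)
    inbound = {a for a, b in edges if b == site and a != site}
    if not inbound:
        return 0.0
    return len({a for a in inbound if (site, a) in edges}) / len(inbound)


# Hypothetical link graph: (source site, target site)
links = [
    ("a.example", "b.example"), ("b.example", "a.example"),  # reciprocal swap
    ("b.example", "c.example"), ("c.example", "d.example"),
    ("d.example", "b.example"),                              # triangle b->c->d->b
    ("e.example", "a.example"),                              # one-way link
]
print(reciprocal_pairs(links))
print(triangular_cycles(links))
print(reciprocal_share(links, "a.example"))
```

A site whose inbound links are mostly reciprocated or part of cycles would be the kind of link profile the paragraph above warns about; a high share of one-way links is what natural linkage tends to look like.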

Interlinking all sites in a network can be counterproductive, but selfish cross-linking is not penalized in general. There’s no “interlinking penalty” when these links make sound business sense, even when the interlinked sites aren’t topically related. Interlinking sites covering each and every yellow-pages category, on the other hand, may be considered overdone. In some industries like adult entertainment, where it’s hard to gain natural links, many webmasters try to boost their rankings with links from other (unrelated) sites they own or control. Operating hundreds or thousands of interlinked travel sites spread across many domains and subdomains is risky too. In the best case such linking patterns may just be ignored by Google, that is, they have little or no impact on rankings at all, but it’s easy to turn an honest network into a link farm by mistake.

Advice: Carefully interlink your own sites in smaller networks, but partition these links by theme or branch in huge clusters. Consider consolidating closely related sites.

So what does all that mean for Webmasters?

Some might argue “if it ain’t broke, don’t fix it”, in other words “why should I revamp my linkage when I rank fine?”. Well, rules like “any attempt to improve on a system that already works is pointless and may even be detrimental” are pointless and detrimental in a context where everything changes daily. Especially when the tiny link systems designed to fool another system passively interact with that huge system (the search engine polls linkage data for all kinds of analyses). In that case the large system can change the rules of the game at any time to outsmart all the tiny cheats. So just because Google didn’t discover all link schemes or shabby reciprocal link cycles out there, that does not mean the participants are safe forever. Nothing’s set in stone, not even rankings, so better revise your ancient sins.

Bear in mind that Google maintains a database containing all links in the known universe back to 1998 or so, and that a current penalty may be the result of a historical analysis of a site’s link attitude. So when a site is squeaky clean today but doesn’t rank adequately, consider a reinclusion request if you’ve cheated in the past.

Before you think of penalties as the cause of downranked or even vanished pages, analyze your inbound links that might have started counting for less. Pull all your inbound links from Site Explorer or Webmaster Central, then remove questionable sources from the list:

Paid links and affiliate links where you 301-redirect all landing pages with affiliate IDs in the query string to a canonical landing page,
Links from fishy directories, links lists, FFAs, top rank lists, DMOZ-clones and stuff like that,
Links from URLs which may be considered search results,
Links from sites you control or which live off your contents,
Links from sites engaged in reciprocal link swaps with your sites,
Links from sites which link out to too many questionable pages in link directories or where users can insert links without editorial control,
Links from shabby sites regardless of their toolbar PageRank,
Links from links pages which don’t provide editorial contents,
Links from blog comments, forum signatures, guestbooks and other places where you can easily drop URLs,
Nofollow’ed links and links routed via uncrawlable redirect scripts,
…

Judge by content quality, traffic figures if available, and user friendliness, not by toolbar PageRank. Just because a link appears in reverse citation results, that does not mean it carries any weight.
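The cleanup can be sketched as a filter over an exported inbound-link list. This is a rough sketch under assumptions: each record is a dict with hypothetical keys (`source_url`, `nofollow`, `reciprocal`), which Site Explorer and Webmaster Central exports don’t necessarily provide, and the URL patterns are illustrative heuristics, not a complete list. Content quality still has to be judged by a human.

```python
import re
from urllib.parse import urlparse

# Illustrative patterns: places where anyone can drop a link, and search-result URLs.
QUESTIONABLE_URL_PATTERNS = [
    re.compile(r"/(guestbook|links?|ffa)\b", re.IGNORECASE),
    re.compile(r"[?&](q|query|search)=", re.IGNORECASE),
]


def is_questionable(link, own_hosts):
    """True if an inbound link should not be counted as a vote."""
    url = link["source_url"]
    if link.get("nofollow"):  # nofollow'ed links pass no link love
        return True
    if link.get("reciprocal"):  # part of a link swap with your sites
        return True
    if urlparse(url).hostname in own_hosts:  # sites you control
        return True
    return any(p.search(url) for p in QUESTIONABLE_URL_PATTERNS)


def remaining_votes(links, own_hosts):
    return [link for link in links if not is_questionable(link, own_hosts)]


# Hypothetical export
inbound = [
    {"source_url": "http://news.example.org/story/42"},
    {"source_url": "http://blog.example.net/post?id=7", "nofollow": True},
    {"source_url": "http://www.mysite.example/about"},
    {"source_url": "http://dir.example.com/links/widgets.html"},
    {"source_url": "http://engine.example.com/find?q=widgets"},
    {"source_url": "http://partner.example.com/", "reciprocal": True},
]
for link in remaining_votes(inbound, {"www.mysite.example"}):
    print(link["source_url"])
```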

Look at the shrunken list of inbound links and ask yourself where on the SERPs a search engine should rank your stuff based on these remaining votes. Frustrated? Learn the fine art of link building from an expert in the field.


1 comment Sebastian | Risky Linkage, Reciprocal Links, Paid Links, SEO, Google, Nofollow

How Google & Yahoo handle the link condom

Posted on 27 April, 2007

Loren Baker over at SEJ got a few official statements on use and abuse of the rel-nofollow microformat by the major players: How Google, Yahoo & Ask treat NoFollow’ed links. Great job, thanks!

Ask doesn’t “officially” support nofollow, whatever that means. Loren didn’t ask MSN, probably because he didn’t expect them to have even noticed that they have officially supported nofollow since 2005 (same story with sitemaps, by the way). Yahoo implemented it according to the specs, and Google stepped way over the line the norm sets. So here is the difference:

1. Do you follow a nofollow’ed link?
Google: No (longer)
Yahoo: Yes

2. Do you index the linked page following a nofollow’ed link?
Google: Obsolete, see 1.
Yahoo: Yes

3. Do your ranking algos factor in reputation, anchor/alt/title text or any other link love sourced from a nofollow’ed link?
Google: Obsolete, see 1.
Yahoo: No

4. Do you show nofollow’ed links in reverse citation results?
Google: Yes (in link: searches by accident, in Webmaster Central if the source page didn’t make it into the supplemental index)
Yahoo: Yes (Site Explorer)

Q&A#4 is made up but accurate. I think it’s safe to assume that MSN handles the link condom like Yahoo. (Update: As Loren clarifies in the comments, he asked MSN search but they didn’t answer in a timely fashion.)
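Technically the difference boils down to what a crawler does with links carrying rel="nofollow". A minimal sketch with Python’s standard `html.parser` splits a page’s links into followed and nofollow’ed ones; the `follow_nofollowed` flag is a hypothetical stand-in for the Google (no) versus Yahoo (yes) answers to question 1, not either engine’s actual code. Per the answers above, even Yahoo passes no link love through the second list.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, split by rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "external nofollow"
        rel_tokens = (attrs.get("rel") or "").lower().split()
        target = self.nofollowed if "nofollow" in rel_tokens else self.followed
        target.append(href)


def crawl_queue(html, follow_nofollowed):
    """URLs a crawler would fetch from this page."""
    collector = LinkCollector()
    collector.feed(html)
    if follow_nofollowed:
        return collector.followed + collector.nofollowed
    return collector.followed


page = ('<a href="/a">A</a> <a rel="external nofollow" href="/b">B</a> '
        '<A HREF="/c" REL="NoFollow">C</A>')
print(crawl_queue(page, follow_nofollowed=False))
print(crawl_queue(page, follow_nofollowed=True))
```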

And here’s a remarkable statement from Google’s search evangelist Adam Lasnik, who may like nofollow or not:

On a related note, though, and echoing Matt’s earlier sentiments … we hope and expect that more and more sites — including Wikipedia — will adopt a less-absolute approach to no-follow … expiring no-follows, not applying no-follows to trusted contributors, and so on.

Bravo!

Related link: rel="nofollow" Google, Yahoo and MSN


Be the first to comment Sebastian | SEO, Yahoo, Google, Microformats, Nofollow

Erol to ship a Patch Fixing Google Troubles

Posted on 24 April, 2007

Background: read these four posts on Google penalizing or deindexing e-commerce sites. Long story short: recently Google’s enhanced algos began to deindex e-commerce sites powered by Erol’s shopping cart software. The shopping cart maintains a static HTML file which redirects user agents executing JavaScript to another URL. This happens with each and every page, so it’s quite understandable that Ms. Googlebot was not amused. I got involved when a few worried store owners asked for help in Google’s Webmaster Forum. After lots of threads and posts on the subject, Erol’s managing director got in touch with me and we agreed to team up to find a solution to help the store owners suffering from a huge traffic loss. Here’s my report of the first technical round.

Understanding how Erol 4.x (and all prior versions) works:

The software generates an HTML page offline, which functions as an XML-like content source (called “x-page”; I use that term because all Erol customers are familiar with it). The “x-page” gets uploaded to the server and is crawlable, but not really viewable. Requested by a robot, it responds with 200-OK. Requested by a human, it does a JavaScript redirect to a complex frameset, which loads the “x-page” and visualizes its contents. The frameset responds to browsers if called directly, but returns a 404-NotFound error to robots. Example:

“x-page”: x999.html
Frameset: erol.html#999x0&&

To view the source of the “x-page” disable JavaScript before you click the link.

Understanding how search engines handle Erol’s pages:

There are two major weak points with regard to crawling and indexing. The crawlable page redirects, and the destination does not exist if requested by a crawler. This leads to these scenarios:

A search engine ignoring JavaScript on crawled pages fetches the “x-page” and indexes it. That’s the default behavior of yesterday’s crawlers, and several search engines still work this way.

A search engine not executing JavaScript on crawled pages fetches the “x-page”, analyzes the client-side script, and discovers the redirect (please note that a search engine crawler may change its behavior, so this can happen all of a sudden to properly indexed pages!). Possible consequences:
- It tries to fetch the destination, gets the 404 response multiple times, and eventually deindexes the “x-page”. That would mean that, depending on the crawling frequency and depth per domain, the pages disappear quite fast or rather slowly until the last page is phased out. Google would keep a copy in the supplemental index for a while, but this listing cannot return to the main index.
- It’s trained to consider the unconditional JavaScript redirect “sneaky” and flags the URL accordingly. This can result in temporary or permanent deindexing as well.

A search engine executing JavaScript on crawled pages fetches the “x-page”, performs the redirect (thus ignoring the contents of the “x-page”), and renders the frameset for indexing. Chances are it gives up on the complexity of the nested frames, indexes the noframes tag of the frameset and perhaps a few snippets from subframes, considers the whole conglomerate thin, hence assigns the lowest possible priority for the query engine, and moves on.

Unfortunately, the search engine delivering the most traffic began to improve its crawling and indexing, hence many sites formerly receiving a fair amount of Google traffic began to suffer from scenario 2: deindexing.
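Scenario 2 hinges on a crawler noticing that the “x-page” script redirects every visitor unconditionally. Search engines don’t publish how they do that, so the following is a deliberately naive, hypothetical heuristic: flag a page when a script assigns to `location` (or `window.location`, `top.location.href` and the like) or calls `location.replace()`, and the script contains no `if (...)` that could guard the redirect. The sample pages are made up; Erol’s real script is not reproduced here.

```python
import re

# Assignment to [window|document|top|self.]location[.href] (not "=="),
# or a call to location.replace(...)
REDIRECT_RE = re.compile(
    r"\b(?:(?:window|document|top|self)\.)?location(?:\.href)?\s*=(?!=)"
    r"|\blocation\.replace\s*\("
)
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)
CONDITION_RE = re.compile(r"\bif\s*\(")


def has_unconditional_js_redirect(html):
    """Naive check: some script redirects and has no 'if' guarding it."""
    for script in SCRIPT_RE.findall(html):
        if REDIRECT_RE.search(script) and not CONDITION_RE.search(script):
            return True
    return False


x_page = ('<html><script type="text/javascript">'
          'top.location.href = "erol.html#999x0&&";</script>'
          '<body>Products</body></html>')
guarded = '<script>if (window.name != "erol") { location.href = "erol.html"; }</script>'

print(has_unconditional_js_redirect(x_page))
print(has_unconditional_js_redirect(guarded))
```

Real crawlers are surely smarter than a regex, which is exactly why a page that “always worked” can get flagged all of a sudden.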

Outlining a possible workaround to get the deleted pages back in the search index:

In six months or so Erol will ship version 5 of its shopping cart, and this software dumps frames, JavaScript redirects and ugly stuff like that in favor of clean XHTML and CSS. By the way, Erol has asked me for my input on their new version, so you can bet it will be search engine friendly. So what can we do in the meantime to help legions of store owners running version 4 and below?

We’ve got the static “x-page”, which should not get indexed because it redirects, and which cannot be changed to serve the contents itself. The frameset cannot be indexed because it doesn’t exist for robots, and even if a crawler could eat it, we don’t consider it easy-to-digest spider fodder.

Let’s look at Google’s guidelines, which are the strictest around, thus applicable for other engines as well:

Don’t […] present different content to search engines than you display to users, which is commonly referred to as “cloaking.”

Don’t employ cloaking or sneaky redirects.

If we find a way to suppress the JavaScript code on the “x-page” when a crawler requests it, the now more sophisticated crawlers will handle the “x-page” like their predecessors did: they would fetch the “x-pages” and hand them over to the indexer without vicious remarks. Serving identical content under different URLs to users and crawlers does not contradict the first prescript. And we’d comply with the second rule, because loading a frameset for human visitors but not for crawlers is definitely not sneaky.

Ok, now how do we tell the static page that it has to behave dynamically, that is, output different contents server-side depending on the user agent’s name? Well, Erol’s desktop software, which generates the HTML, can easily insert PHP tags too. The browser would not render those on a local machine, but who cares when it works after the upload to the server. Here’s the procedure for Apache servers:

In the root’s .htaccess file we enable PHP parsing of .html files:
AddType application/x-httpd-php .html

Next we create a PHP include file xinc.php which prevents crawlers from reading the offending JavaScript code:
<?php
// User-agent substrings of major crawlers, matched case-insensitively
$crawlerUAs = array("Googlebot", "Slurp", "MSNbot", "teoma", "Scooter", "Mercator", "FAST");
$isSpider = FALSE;
$userAgent = (string) getenv("HTTP_USER_AGENT");
foreach ($crawlerUAs as $crawlerUA) {
    if (stristr($userAgent, $crawlerUA)) { $isSpider = TRUE; break; }
}
if (!$isSpider) {
    // Browsers get the script that loads the erol.html frameset
    print "<script type=\"text/javascript\"> [a whole bunch of JS code] </script>\n";
} else {
    // Crawlers get an explanatory HTML comment instead of the redirect
    print "<!-- Dear search engine staff: we've suppressed the JavaScript code redirecting browsers to \"erol.html\", that's a frameset presenting this page's contents more pleasantly for human eyes. -->\n";
}
?>

Erol’s HTML generator now inserts <?php @include("xinc.php"); ?> in place of the whole bunch of JavaScript code.
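Before uploading, it can help to check which user-agent strings the include actually treats as crawlers. This sketch mirrors its case-insensitive substring test (PHP’s `stristr`) in Python so it can be tried offline; the sample user-agent strings are illustrative, and “BreakfastBrowser” is made up to show that a short token like “FAST” also matches unrelated strings, so the list deserves a review.

```python
# Same tokens as the PHP include; matching is a case-insensitive substring test.
CRAWLER_UAS = ["Googlebot", "Slurp", "MSNbot", "teoma", "Scooter", "Mercator", "FAST"]


def is_spider(user_agent):
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in CRAWLER_UAS)


samples = {
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)": True,
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)": True,
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)": False,
    "BreakfastBrowser/1.0": True,  # false positive: contains "fast"
}
for ua, expected in samples.items():
    print(is_spider(ua) == expected, ua)
```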

The implementation for other environments is quite similar. If PHP is not available, we can do it with SSI and Perl. On Windows we can tell IIS to process all .html extensions as ASP (App Mappings) and use an ASP include. That would give three versions of that patch, which should help 99% of all Erol customers until they can upgrade to version 5.

This solution comes with two disadvantages. First, the cached page copies, clickable from the SERPs and toolbars, would render pretty ugly because they lack the JavaScript code. Second, perhaps automated tools searching for deceitful cloaking might red-flag the URLs for a human review. Hopefully the search engine executioner reading the comment in the source code will be fine with it and give it a go. If not, there’s still the reinclusion request. I think store owners can live with that when they get their Google traffic back.

Rolling out the patch:

Erol thinks the above makes sense, and there is a chance of implementing it soon. While the developers are at work, please provide feedback if you think we didn’t interpret Google’s Webmaster Guidelines strictly enough. Keep in mind that this is an interim solution and that the new version will handle things in a more standards-compliant way. Thanks.

Paid-Links-Disclosure: I do this pro bono job for the sake of the suffering store owners. Hence the links pointing to Erol and Erol’s customers are not nofollow’ed. Not that I’d nofollow them otherwise.


3 comments Sebastian | Cloaking, JavaScript Redirects, Erol, E-Commerce, SEO, Google

More anchor text analysis from Webmaster Central

Posted on 18 April, 2007

If you didn’t spot my update posted a few hours ago, log in to Webmaster Central and view your anchor text stats. You’ll find way more phrases; play with the variations, as these should allow you to track down sources via quoted search queries. Also, the word stats are back.
Have fun!


Be the first to comment Sebastian | Anchor Text, Webmaster Central, SEO, Google

Revise your linkage now

Posted on 16 April, 2007

Google’s take on paid links is obviously here to stay, and frankly, old news heated up minds last weekend. There’s no sign on the horizon that Google will revise the policy; quite the opposite is true. Google has new algos in development which are supposed to detect all sorts of paid linkage better than before. For many that’s bad news, but I guess rants are a waste of valuable time needed to revise current and previous link attitudes.

I don’t think that Googlebot has learned to read text like “Buy a totally undetectable PR8 link now for as low as 2,000$ and get four PR6 links for free!” on images. Sites operating such obvious link-selling businesses leave other, detectable, footprints, and their operators are aware of the risks involved. Buyers are rather safe; they just waste a lot of money, because purchased links most probably don’t carry link love, and nofollow’ed links may be way cheaper.

I often stumble upon cases where old and forgotten links create issues over time. Things that worked perfectly in the past can bury a site on today’s SERPs. The crucial message is: be careful who you link out to! Compile a list of all your outbound links and check them carefully before Google’s newest link analysis goes live. Especially look at the ancient stuff.

Update: I’ve just spotted Eric Ward’s article on forensic inbound link analysis; you should read that and his comments on outgoing links too.

Tags: Search Engine Optimization (SEO) Paid links are risky with Google


Be the first to comment Sebastian | Risky Linkage, Paid Links, SEO, Google

Is XML Sitemap Autodiscovery for Everyone?

Posted on 13 April, 2007

Referencing XML sitemaps in robots.txt was recently implemented by Google, upon webmaster requests going back to June 2005, shortly after the initial launch of sitemaps. Yahoo, Microsoft, and Ask support it too, although nobody knows when MSN is going to implement XML sitemaps at all.

Some folks argue that robots.txt, introduced by the Robots Exclusion Protocol in 1994, should not get abused by inclusion mechanisms. Indeed this may create confusion, but it was done before, for example by search engines supporting Allow: statements, introduced in 1996. Also, the de facto Robots Exclusion Standard covers robots meta tags too, where inclusion is the default. I think dogmatism is not helpful when actual needs require evolution.

So yes, the opportunity to address sitemaps in robots.txt is a good thing, but certainly not enough. It simplifies the process: autodetection of sitemaps eliminates a few points of failure. Webmasters don’t need to monitor which engines implemented the sitemaps protocol recently and submit accordingly. They can just add a single line to their robots.txt file and the engines will do their job. Fire and forget is a good concept. However, the good news comes with pitfalls.
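That single line is a `Sitemap:` directive carrying an absolute URL. Python’s standard `urllib.robotparser` (3.8 and later) exposes those lines via `site_maps()`, which makes it easy to verify what a crawler will pick up from a given robots.txt; the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /cgi-bin/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.site_maps())  # ['http://www.example.com/sitemap.xml']
print(parser.can_fetch("Googlebot", "http://www.example.com/cgi-bin/x"))  # False
```

The `Sitemap:` line is independent of the `User-agent:` groups, so every engine reading the file sees the same sitemap.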

But is this good thing actually good for everyone? Not really. Many publishers have no control over their server’s robots.txt file, for example publishers utilizing signup-and-instantly-start-blogging services or free hosts. As long as these platforms generate RSS feeds or other URL lists suitable as sitemaps, the publishers must submit to all search engines manually. Enhancing the sitemaps auto detection by looking at page meta data would be great: <meta name="sitemap" content="http://www.example.com/sitemap.xml" /> or <link rel="sitemap" type="application/rss+xml" href="http://www.example.com/sitefeed.rss" /> would suffice.
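Neither tag is part of the sitemaps protocol; both are a proposal. Assuming an engine adopted them, discovery would be a small HTML scan. A sketch with Python’s standard `html.parser` that accepts both proposed forms (the class name is made up):

```python
from html.parser import HTMLParser


class SitemapHintFinder(HTMLParser):
    """Find the proposed hints <meta name="sitemap"> and <link rel="sitemap">.

    Self-closing tags like <meta ... /> arrive via handle_startendtag,
    whose default implementation calls handle_starttag.
    """

    def __init__(self):
        super().__init__()
        self.sitemaps = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "sitemap":
            if a.get("content"):
                self.sitemaps.append(a["content"])
        elif tag == "link" and "sitemap" in (a.get("rel") or "").lower().split():
            if a.get("href"):
                self.sitemaps.append(a["href"])


head = ('<head><meta name="sitemap" content="http://www.example.com/sitemap.xml" />'
        '<link rel="sitemap" type="application/rss+xml" '
        'href="http://www.example.com/sitefeed.rss" /></head>')

finder = SitemapHintFinder()
finder.feed(head)
print(finder.sitemaps)
```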

So much for the explicit diaspora. Others are barred from sitemap autodiscovery by lack of experience, technical skills, or manageable environments, for example at way too restrictive hosting services. Example: the prerequisites for sitemap autodetection include the ability to fix canonical issues. An XML sitemap containing www.domain.tld URLs, referenced as Sitemap: http://www.domain.tld/sitemap.xml in http://domain.tld/robots.txt, is plain invalid. Crawlers following links without the “www” subdomain will request the robots.txt file without the “www” prefix. If a webmaster running this flawed but very common setup relies on sitemap autodetection, s/he will miss out on feedback and error alerts. On some misconfigured servers this may even lead to deindexing of all pages with relative internal links.
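The mismatch is easy to catch mechanically. A minimal sketch, taking the rule above at face value (a sitemap announced in a host’s robots.txt, and the URLs it lists, must live on that same host); the function name and the domain.tld URLs are illustrative:

```python
from urllib.parse import urlparse


def host_mismatches(robots_url, sitemap_urls, listed_urls=()):
    """Return URLs whose host differs from the host serving robots.txt."""
    host = urlparse(robots_url).hostname
    return [u for u in [*sitemap_urls, *listed_urls] if urlparse(u).hostname != host]


# The flawed setup: www-URLs announced in the non-www robots.txt
print(host_mismatches(
    "http://domain.tld/robots.txt",
    ["http://www.domain.tld/sitemap.xml"],
    ["http://www.domain.tld/page.html"],
))
```

An empty list means the robots.txt and the sitemap agree on the canonical host; anything else is a canonicalization job to do before relying on autodiscovery.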

Hence please listen to Vanessa Fox, who states that webmasters should register their autodiscovered sitemaps at Webmaster Central and Site Explorer to get alerted to errors which an XML sitemap validator cannot spot, and to monitor the crawling process!

I doubt many SEO professionals and highly skilled Webmasters managing complex sites will make use of that new feature. They prefer to have things under control, and automated third-party polls are hard to manipulate. They probably want to maintain different sitemaps per engine to steer their crawling accordingly. Although this can be accomplished by user-agent-based delivery of robots.txt, that additional complexity doesn’t make the submission process easier to handle. Only uber-geeks automate everything.
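User-agent-based delivery of robots.txt could look like this sketch: the server builds the file per request and picks an engine-specific `Sitemap:` line from the requesting user agent. The token-to-sitemap mapping and the file names are hypothetical, and whether each engine is fine with a robots.txt that varies by user agent is an assumption worth verifying.

```python
# Hypothetical mapping of crawler user-agent tokens to engine-specific sitemaps
SITEMAP_BY_CRAWLER = {
    "googlebot": "http://www.example.com/sitemap-google.xml",
    "slurp": "http://www.example.com/sitemap-yahoo.xml",
    "msnbot": "http://www.example.com/sitemap-msn.xml",
}
DEFAULT_SITEMAP = "http://www.example.com/sitemap.xml"
BASE_RULES = "User-agent: *\nDisallow: /cgi-bin/\n"


def robots_txt_for(user_agent):
    """Build the robots.txt body served to one requesting user agent."""
    ua = (user_agent or "").lower()
    sitemap = next(
        (url for token, url in SITEMAP_BY_CRAWLER.items() if token in ua),
        DEFAULT_SITEMAP,
    )
    return f"{BASE_RULES}\nSitemap: {sitemap}\n"


print(robots_txt_for("Mozilla/5.0 (compatible; Yahoo! Slurp)"))
```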

For example, it makes no sense to present a gazillion image or video clip URLs to a search engine indexing textual contents only. Google makes handling different content types extremely simple for the site owner. One can put HTML pages, images, movies, PDFs, feeds, office documents and whatever else all in one sitemap, and Google’s sophisticated crawling process delivers each URL to the indexer it belongs to. We don’t know (yet) how other engines will handle that.

Also, XML sitemaps are a neat instrument to improve crawling and indexing of particular contents. One search engine may nicely index insufficiently linked stuff, whilst another engine fails to discover pages buried more than two link levels deep and badly needs the hints via sitemap. There are more good reasons to give each engine its own sitemap.

Last but not least there might be good reasons not to announce sitemap contents to the competition.
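User agent based delivery of robots.txt, mentioned above as one way to hand each engine its own sitemap, might look like the following sketch. The sitemap file names and the function name are hypothetical; Googlebot, Slurp (Yahoo!) and msnbot are the crawler tokens those engines used at the time. Since any client can fake its user agent, this only steers well-behaved crawlers and hides nothing from a determined competitor.

```python
# Hypothetical per-engine sitemaps; keys are lower-cased user agent tokens.
SITEMAPS_BY_CRAWLER = {
    "googlebot": "http://www.domain.tld/sitemap-google.xml",
    "slurp": "http://www.domain.tld/sitemap-yahoo.xml",
    "msnbot": "http://www.domain.tld/sitemap-msn.xml",
}

# Rules shared by every variant of the file (here: allow everything).
BASE_RULES = "User-agent: *\nDisallow:\n"

def robots_txt_for(user_agent: str) -> str:
    """Return a robots.txt body whose Sitemap line depends on the requesting crawler."""
    ua = (user_agent or "").lower()
    for token, sitemap_url in SITEMAPS_BY_CRAWLER.items():
        if token in ua:
            return BASE_RULES + f"Sitemap: {sitemap_url}\n"
    return BASE_RULES  # browsers and unknown bots get no Sitemap line at all
```

Whatever serves /robots.txt on your site would call robots_txt_for with the request's User-Agent header and send the result as text/plain.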

Tags: Search Engine Optimization (SEO) XML Sitemap Auto-discovery via robots.txt


2 comments Sebastian | Crawler Directives, XML-Sitemaps, robots.txt, SEO

Better don’t run a web server under Windows

Posted on 11 April, 2007

IIS defaults can cause serious trouble with search engines. That’s a common problem, and even some .nhs.uk (UK Government National Health Service) admins haven’t spotted it. I’ve alerted the Whipps Cross University Hospital but can’t email all NHS sites suffering from IIS and lazy or uninformed webmasters. So here’s the fix:

Create a Web site (virtual server) for the host name domain.nhs.uk without the “www” subdomain, then go to the “Home Directory” tab and click the option “Redirection to a URL”. As “Redirect to” enter the destination, for example “http://www.domain.nhs.uk$S$Q”, without a slash after “.uk”, because the path (the $S placeholder) begins with a slash. The $Q placeholder represents the query string. Next check “Exact URL entered above” and “Permanent redirection for this resource”, and submit. Test the redirection with a suitable tool.
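A “suitable tool” can be as small as a script that sends one request to the non-www host without following redirects and compares the answer with what the $S$Q rule should produce. A minimal sketch for plain HTTP, assuming the placeholder host names used above; the function names are hypothetical:

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit

def expected_location(requested_url: str, canonical_host: str) -> str:
    """Mimic http://www.domain.nhs.uk$S$Q: keep path and query, swap in the canonical host."""
    parts = urlsplit(requested_url)
    path = parts.path or "/"  # a bare host name is requested as "/"
    query = "?" + parts.query if parts.query else ""
    return f"http://{canonical_host}{path}{query}"

def redirect_ok(status: int, location: Optional[str], requested_url: str, canonical_host: str) -> bool:
    """True only for a permanent (301) redirect to the exact canonical URL."""
    return status == 301 and location == expected_location(requested_url, canonical_host)

def fetch_head(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request over plain HTTP without following redirects."""
    parts = urlsplit(url)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

For example, status, location = fetch_head("http://domain.nhs.uk/some/page.htm?id=1") followed by redirect_ok(status, location, "http://domain.nhs.uk/some/page.htm?id=1", "www.domain.nhs.uk") should return True once the permanent redirect is in place. A 302 or a redirect that drops the path or query string fails the check.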

Now when a user enters a URL without the “www” prefix, s/he gets the requested page from the canonical server name. Also, search engine crawlers following non-canonical links like http://whippsx.nhs.uk/ will pass the link love to the desired URL, and will index more pages instead of deleting them from their search indexes after a while because the server is not reachable. I’m not joking. Under some circumstances all or many www URLs of pages referenced by relative links resolving to the non-existent server will get deleted from the search index after a couple of unsuccessful attempts to fetch them without the www prefix.

Hat tip to Robbo
Tags: Search Engine Optimization (SEO) IIS National Health Service (NHS) UK


Sebastian | MSN, IIS, Crap, SEO

Four reasons to get tanked on Google’s SERPs

Posted on 7 April, 2007

You know, I find “My Google traffic dropped all of a sudden - I didn’t change anything - I did nothing wrong” threads fascinating, especially when they’re posted with URLs on non-widgetized boards. Sometimes I submit an opinion, although the questioners usually don’t like my replies, but more often I just investigate the case for educational purposes.

Abstracting from a fair number of tanked sites, I’d like to share a few of my notes, or rather theses, as food for thought. I’ve tried to phrase these as generally as possible, so please don’t blame me for the lack of detailed explanations.

1. Reviews and descriptions ordered by product category, product line, or other groupings of similar products tend to rephrase each other semantically, that is, in form and content. Be careful when it comes to money-making areas like travel or real estate. Stress unique selling points and non-shared attributes or uses, localize properly, and make sure reviews and descriptions aren’t duplicated in full, either internally on crawlable pages or externally.
2. Huge clusters of property, characteristic, or feature lists under analogous headings, even unstructured ones, may raise a flag when the number of applicable attributes is finite and the values are rather similar, with just a few of them totally different or merely expressions of trite localization.
3. The lack of non-commercial outgoing links on pages plastered with ads of any kind, or on pages at the very bottom of the internal linking hierarchy, may raise a flag. Nofollow’ing, redirecting, or iFraming affiliate/commercial links doesn’t prevent such pages from breeding artificial page profiles. Adding unrelated internal links to the navigation doesn’t help. Adding Wikipedia links en masse doesn’t help. Providing unique textual content and linking to authorities within the content does help.
4. Strong and steep hierarchical internal/navigational linkage without relevant crosslinks and topical between-the-levels linkage looks artificial, especially when the site in question lacks deep links. Look at the ratio of home page links vs. deep links to interior pages. Rethink the information architecture and structuring.
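To look at the ratio of home page links vs. deep links mentioned in the fourth point, a rough count over a list of inbound link targets is enough. A minimal sketch, assuming you have exported those target URLs from some backlink report (the input format, the set of index file names, and the function names are assumptions). It treats www and non-www URLs as the same site and deliberately sets no threshold, since none is given here:

```python
from typing import List, Tuple
from urllib.parse import urlsplit

# Assumed default document names that also count as "the home page".
HOME_PATHS = {"", "/", "/index.html", "/index.htm", "/index.php", "/default.asp", "/default.aspx"}

def bare_host(host: str) -> str:
    """Lower-case a host name and drop a leading 'www.' so both variants compare equal."""
    host = (host or "").lower()
    return host[4:] if host.startswith("www.") else host

def home_vs_deep(link_targets: List[str], site_host: str) -> Tuple[int, int, float]:
    """Count inbound links to the home page vs. interior pages of site_host."""
    site = bare_host(site_host)
    home = deep = 0
    for url in link_targets:
        parts = urlsplit(url)
        if bare_host(parts.hostname) != site:
            continue  # link points to another site; ignore it
        if parts.path.lower() in HOME_PATHS:
            home += 1  # query strings like /?ref=x still count as home page links
        else:
            deep += 1
    ratio = home / deep if deep else float("inf")
    return home, deep, ratio
```

A site whose inbound links almost all land on the home page shows a large ratio (or infinity when there are no deep links at all), which is the pattern the fourth point warns about.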

Take these as a call for antitheses or just as food for thought. And keep in mind that although there might be no recent structural/major/SEO/… on-site changes, perhaps Google just changed her judgment of old stuff that had ranked forever, and/or just changed the ability of existing inbound links to pass weight. Nothing’s set in stone. Not even rankings.

Tags: Search Engine Optimization (SEO) Google 30+ Penalty 950+ Penalty MSSA Penalties


Sebastian | SEO, Google

Link monkey business is not worth a whoop

Posted on 4 April, 2007

Old news, so pros can move on; this one is about educating the great unlinked.

A tremendous number of businesses maintaining a Web site still swap links en masse with every dog and his fleas. Serious sites join link exchange scams to gain links from every gambling spammer out there. Unscrupulous Web designers and low-life advisors put gazillions of businesses at risk. Eventually the site owners pop up in Google’s help forum wondering why the heck they lost their rankings despite their encouraging toolbar PageRank. Told to dump all their links pages and to file a reinclusion request, they may do so, but cutting one’s losses in the short term is not how the cookie crumbles with Google. The consequences of listening to bad SEO advice are often layoffs or even bankruptcy.

In this context a thread titled “Do the companies need to hire a SEO to get in top position?” asks the somewhat right question but may confuse site owners even more. Their amateurish Web designer offering SEO services obviously got their site banned or at least heavily penalized by Google. Asking for help in forums, they get contradictory SEO advice. Google’s take on SEO firms is more or less a plain warning. There are too many scams sailing under the SEO flag, and it seems there’s no such thing as reliable free SEO advice on the net.

However, the answer to the question is truly “yes”. It’s better to see an SEO before the rankings crash. Unfortunately, SEO is not a yellow pages category, and every clown can offer crappy SEO services. Places like SEO Consultants and honest recommendations get you to the top-notch SEOs, but usually the small business owner can’t afford their services. Asking fellow online businesses for their SEO partner may lead to a scammer who is still beloved because Google has not yet spotted and delisted his work. Kind of a dilemma, huh?

Possible loophole: once you’ve got a recommendation for an SEO-skilled Webmaster or SEO expert from somebody attending a meeting at the local chamber of commerce, post that fellow’s site to the forums and ask for signs of destructive SEO work. That should give you an indication of trustworthiness.

Tags: Search Engine Optimization (SEO) Google Linkage Bans & Penalties


Sebastian | Link Building, Risky Linkage, SEO, Uncategorized
