Please don’t run your counter on my servers

DO NOT HOTLINK

I deeply understand that sharing other people's resources makes sense sometimes. I just ask you to rethink your technical approach. Running your page view stats on my server comes with a serious disadvantage: my server logs and referrer reports are protected, hence you can't read your stats. Rest assured I'm really not eager to know who views your pages.

So please: when you copy my HTML code, be so kind and steal the invisible 1×1px images too. It’s really not that hard to upload them to your server and edit my HTML in a way that your visitors’ user agents request these images from your server.
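For illustration, here's roughly what that edit looks like; the host and file names below are made up, so point the src at wherever you actually uploaded your copy:

<!-- hypothetical example: serve the 1x1px counter image from your own host -->
<img src="http://www.your-domain.example/img/counter.gif" width="1" height="1" alt="" />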

Signing up with a free counter service that doesn't add hidden links to all your pages is less hassle than my reaction when I get annoyed.

Disclaimer: I don't like it when you steal my code, because for some reason it's often crappy enough to break your layout. Also, copying code without permission is as bad as content theft. So don't copy; feel free to ask instead.

Go to HTML Basix to figure out how you can block hotlinking with .htaccess:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://(www\.)?sebastianx.blogspot.com(/)?.*$ [NC]
RewriteRule .*\.(gif|jpg|jpeg|bmp|png)$ http://www.smart-it-consulting.com/img/misc/do-not-hotlink-beauty.jpg [R,NC]

But please don’t steal or hotlink the offensive blonde beauty ;)
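A common variation of the .htaccess above, assuming Apache's mod_rewrite: let empty referrers through (some browsers and privacy proxies strip them) and answer hotlinkers with a plain 403 instead of a replacement image:

RewriteEngine on
# don't block requests that send no referrer at all
RewriteCond %{HTTP_REFERER} !^$
# only act when the referrer is not this blog
RewriteCond %{HTTP_REFERER} !^http://(www\.)?sebastianx.blogspot.com(/)?.*$ [NC]
# answer hotlinked image requests with 403 Forbidden
RewriteRule .*\.(gif|jpg|jpeg|bmp|png)$ - [F,NC]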




Ultimately: Watch out for Google’s URL terminator

I hate recycling news, but I just fell in love with this neat URL terminator. Unfortunately there's no button to remove SPAM, so I still have to outrank competitors, but besides that 'flaw' it's a perfect and user-friendly tool covering all my needs. Thanks!




Where is the precise definition of a paid link?

Good questions:

How many consultants provide links through to the companies they work for?
I do.

How many software firms provide links through to their major corporate clients?
Not my company. Never going to happen.

If you make a donation to someone, and they decide to give you a link back, is that a paid link?
Nope.

If you are a consultant, and are paid to analyse a company, but to make the findings known publicly, are you supposed to stick nofollow on all the links?
Nope.

If you are a VC or Angel investor, should you have to use NoFollow linking through to companies in your investment portfolio?
Nope.

Are developers working on an open-source project allowed a link back to their sites (cough WordPress)? Yep. And then allowed to use that link equity to dominate search engines on whatever topic they please?
Hmmmm, if it really works that way, why not?

If you are a blog network, or large internet content producer, is it gaming Google to have links to your sister sites, whether there is a direct financial connection or not?
Makes business sense, so why should those links get condomized? Probably a question of quantity. No visitor would follow a gazillion links to blogs covering all the topics the yellow pages have categories for.

Should a not for profit organisation link through to their paid members with a live link?
Sure, perfectly discloses relationships and their character.

A large number of Wordpress developers have paid links on their personal sites, as do theme and plugin developers.
What's wrong with that? Maybe questionable (in the sense of useless) on every page, but perfectly valid on the home page, about page and so on if disclosed. As for ads, that sort of paid link is valid on every page; nofollow'ing ads just avoids misunderstandings.

If you write a blog post, thanking your sponsors, should you use nofollow?
Yep.

Some people give away prizes for links, or offer some kind of reciprocation.
If the awards are honest and truly editorial, linking back is just good practice.

If you are an expert in a particular field, and someone asks you to write a review of their site, and the type of review you write means that writing that content might take 10 hours of your time to do due diligence, is it wrong to accept some kind of monetary contribution? Just time and material?
In such a situation, why would you be forced to use nofollow on all links to the site being reviewed?
As long as the received expense allowance is disclosed, there's nothing wrong with uncondomized links.

Imagine someone created a commercial Wikipedia, and paid $5 for every link made to it.
Don’t link. The link would be worth more than five bucks and the risks involved can cost way more than five bucks.

Where is the precise definition of a paid link?
Now that's the best question of all!

Disclaimer: Yes/No answers are kinda worthless without a precisely defined context. Thus please read the comments.

Related thoughts: Should Paid Links Influence Organic Rankings? by Mark Jackson at SEW
Paid Link Schemes Inside Original Content by Brian White, also read Matt’s updated post on paid links.

Update: Google’s definition of paid links and other disliked linkage considered “linkspam”




Revise your linkage now

Google's take on paid links is obviously here to stay, and frankly, it was old news that heated the minds last weekend. There's no sign on the horizon that Google will revise the policy; quite the opposite is true. Google has new algos in development which are supposed to detect all sorts of paid linkage better than before. For many that's bad news, but I guess that rants are a waste of valuable time that's better spent revising current and past linking practices.

I don't think that Googlebot has learned to read text like "Buy a totally undetectable PR8 link now for as low as $2,000 and get four PR6 links for free!" on images. Sites operating such obvious link selling businesses leave other, detectable, footprints, and their operators are aware of the risks involved. Buyers are rather safe; they just waste a lot of money, because purchased links most probably don't carry link love, and nofollow'ed links may be way cheaper.

I often stumble upon cases where old and forgotten links create issues over time. Things that have worked perfectly in the past can bury a site on today's SERPs. The crucial message is: be careful who you link out to! Compile a list of all your outbound links and check them carefully before Google's newest link analysis goes live. Especially look at the ancient stuff.

Update: I've just spotted Eric Ward's article on forensic inbound link analysis; you should read that and his comments on outgoing links too.





Full Disclosure of Paid Links

Since I've moved the blog from blogspot.com to this place, this post doesn't make much sense anymore. I've spread the stuff from my old and ugly sidebar at Blogspot over a couple of pages here; not much of it is still on the sidebar.

Following Matt’s advice on paid links I’ve looked at this blog to reveal sneaky commercial links, although nobody really likes this idea.

I’m pretty sure that I never got paid for posting, so there was just the sidebar to check. I found a couple of links leading to articles I wrote with both educational and commercial intent as well. I consider these valuable resources so there’s no need to report or nofollow them.

Next in the “What I read” section I didn’t find a suitable procedure to report that “Books, tons of books” includes commercial stuff like database manuals and other publications with a clearly commercial message. I paid for all these books … sigh.

Ok, next the blogroll. Again, all links point to good resources, nothing to report. Under the search box there’s a link to Technorati which I can’t nofollow because it’s put by Technorati’s script. Technorati sends me traffic, I use Technorati for research, so probably this link is fine and counts as honest recommendation, although it functions as a traffic deal too.

Checking the “Links and Folks” section I found a not that related link pointing to bikes for sale at OCC. Well, I really like OCC bikes, and this is my personal blog, so why shouldn’t I link out to a resource unrelated to search and Web development? Hmmmm … perhaps I should ask Google for permission to dofollow this somewhat commercial link before I receive a free bike in return.

Next, the "Ads by Google" links are fine, because they're inserted client-side, and even the Googlebot executing JavaScript knows that everything in a block of code containing an AdSense publisher code is auto-nofollow'ed by definition.

Both the MBL widget and the Twitter badge are inserted client-side, and both were free of commercial links, at least last time I looked. End of sidebar; I didn't find serious fodder for a paid link report. Could that be true?

Wait … I missed the header, and luckily there’s a big fat paid link:

With this link I pay Google for Blogger’s services and hosting, and it is not nofollow’ed. Dammit, I can’t nofollow it myself, so here’s my paid links spam report:

[Screenshot: the "paidlinks" spam report]

Ok, seriously, I think that Google can discount commercial links, because that's how Google's cookie crumbles. And I perfectly understand that Matt asks for a few samples of paid links Google has not yet discovered, to fine-tune Google's algos. However, I fear that this call for paid-links-spam-reports will result in massive abuse of the form I use to report webspam that really annoys me because it disturbs my search results. I'm happy that it will be pretty easy to filter out abusive reports, filed to damage a competitor's rankings and marked with "paidlinks", once Matt's team has collected enough examples.


Update: Read Rae Hoffman’s full disclosure too!




Is XML Sitemap Autodiscovery for Everyone?

Referencing XML sitemaps in robots.txt was recently implemented by Google, following requests from webmasters going back to June 2005, shortly after the initial launch of sitemaps. Yahoo, Microsoft, and Ask support it as well, though nobody knows when MSN is going to implement XML sitemaps at all.

Some folks argue that robots.txt, introduced by the Robots Exclusion Protocol in 1994, should not get abused for inclusion mechanisms. Indeed this may create confusion, but it has been done before, for example by search engines supporting Allow: statements, introduced in 1996. Also, the de facto Robots Exclusion Standard covers robots meta tags, where inclusion is the default, too. I think dogmatism is not helpful when actual needs require evolution.

So yes, the opportunity to reference sitemaps in robots.txt is a good thing, but certainly not enough. It simplifies the process: autodiscovery of sitemaps eliminates a few points of failure. Webmasters don't need to monitor which engines implemented the sitemaps protocol recently and submit accordingly. They can just add a single line to their robots.txt file and the engines will do their job. Fire and forget is a good concept. However, the good news comes with pitfalls.

But is this good thing actually good for everyone? Not really. Many publishers have no control over their server’s robots.txt file, for example publishers utilizing signup-and-instantly-start-blogging services or free hosts. As long as these platforms generate RSS feeds or other URL lists suitable as sitemaps, the publishers must submit to all search engines manually. Enhancing the sitemaps auto detection by looking at page meta data would be great: <meta name="sitemap" content="http://www.example.com/sitemap.xml" /> or <link rel="sitemap" type="application/rss+xml" href="http://www.example.com/sitefeed.rss" /> would suffice.

So much for those who are explicitly locked out. Others are barred from sitemap autodiscovery by a lack of experience, technical skills, or manageable environments, for example at way too restrictive hosting services. Example: the prerequisites for sitemap autodiscovery include the ability to fix canonical issues. An XML sitemap containing www.domain.tld URLs, referenced as Sitemap: http://www.domain.tld/sitemap.xml in http://domain.tld/robots.txt, is plain invalid. Crawlers following links without the "www" subdomain will request the robots.txt file without the "www" prefix. If a webmaster running this flawed but very common setup relies on sitemap autodiscovery, s/he will miss out on feedback and error alerts. On some misconfigured servers this may even lead to deindexing of all pages with relative internal links.
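For Apache, a minimal .htaccess sketch of the canonical fix, assuming mod_rewrite is available (host names are placeholders):

RewriteEngine on
# 301-redirect requests for the bare domain to the canonical www host name
RewriteCond %{HTTP_HOST} ^domain\.tld$ [NC]
RewriteRule (.*) http://www.domain.tld/$1 [R=301,L]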

Hence please listen to Vanessa Fox: webmasters should register their autodiscovered sitemaps at Webmaster Central and Site Explorer to get alerted on errors which an XML sitemap validator cannot spot, and to monitor the crawling process!

I doubt that many SEO professionals and highly skilled webmasters managing complex sites will make use of that new feature. They prefer to have things under control, and automated third-party polls are hard to manipulate. Probably they want to maintain different sitemaps per engine to steer their crawling accordingly. Although this can be accomplished by user agent based delivery of robots.txt, that additional complexity doesn't make the submission process easier to handle. Only uber-geeks automate everything ;)
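A sketch of such user agent based robots.txt delivery, again assuming Apache's mod_rewrite; the engine specific file names are made up:

RewriteEngine on
# hand Googlebot its own robots.txt
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule ^robots\.txt$ robots-google.txt [L]
# hand Yahoo's Slurp another one
RewriteCond %{HTTP_USER_AGENT} Slurp [NC]
RewriteRule ^robots\.txt$ robots-yahoo.txt [L]
# everybody else gets the default robots.txt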

For example, it makes no sense to present a gazillion image or video clip URLs to a search engine that indexes textual contents only. Google makes handling different content types extremely simple for the site owner. One can put HTML pages, images, movies, PDFs, feeds, office documents and whatever else all in one sitemap, and Google's sophisticated crawling process delivers each URL to the indexer it belongs to. We don't know (yet) how other engines will handle that.

Also, XML sitemaps are a neat instrument to improve crawling and indexing of particular contents. One search engine may nicely index insufficiently linked stuff, whilst another engine fails to discover pages buried more than two link levels deep and badly needs the hints from a sitemap. There are more good reasons to give each engine its own sitemap.

Last but not least there might be good reasons not to announce sitemap contents to the competition.





In need of a "Web-Robot Directives Standard"

The Robots Exclusion Protocol from 1994 gets used and abused, best described by Lisa Barone citing Dan Crow from Google: "everyone uses it but everyone uses a different version of it". De facto we have a Robots Exclusion Standard covering crawler directives in robots.txt as well as in robots meta tags, said Dan Crow. Besides non-standardized directives like "Allow:", Google's Sitemaps Protocol adds more inclusion to the mix, now even closely bundled with robots.txt. And there are more places to put crawler directives: unstructured ones (in the sense of independence from markup elements) like Google's section targeting, link-level ones using the commonly disliked rel-nofollow microformat or XFN, plus related thoughts on block-level directives.

All in all that's a pretty confusing conglomerate of inclusion and exclusion, utilizing many formats and markup elements, and lots of places to put crawler directives. Not really the sort of norm the webmaster community can successfully work with. No wonder that over 75,000 robots.txt files have pictures in them, that less than 35 percent of servers have a robots.txt file at all, that the average robots.txt file is 23 characters ("User-agent: * Disallow:"), and that gazillions of Web pages carry useless and unsupported meta tags like "revisit-after" … for more funny stats and valuable information see Lisa's robots.txt summit coverage (SES NY 2007), also covered by Tamar (read both!).

How to structure a “Web-Robot Directives Standard”?

To handle redundancies as well as cascading directives properly, we need a clear and understandable chain of command. The following is just a first idea off the top of my head, and will likely get updated soon (a purely hypothetical sketch follows the outline):

  • Robots.txt

    1. Disallows directories, files/file types, and URI fragments like query string variables/values by user agent.
    2. Allows sub-directories, file names, and URI fragments to refine Disallow statements.
    3. Gives general directives like crawl frequency or volume per day (maybe even per folder), and restricts crawling in particular time frames.
    4. References general XML sitemaps accessible to all user agents, as well as specific XML sitemaps addressing particular user agents.
    5. Sets site-level directives like "noodp" or "noydir".
    6. Predefines page-level instructions like "nofollow", "nosnippet" or "noarchive" by directory, document type, or URL fragments.
    7. Predefines block-level respectively element-level conditions like "noindex" or "nofollow" on class names or DOM-IDs by markup element. For example "DIV.hMenu,TD#bNav 'noindex,nofollow'" could instruct crawlers to ignore the horizontal menu as well as the navigation at the very bottom on all pages.
    8. Predefines attribute-level conditions like "nofollow" on A elements. For example "A.advertising REL 'nofollow'" could tell crawlers to ignore links in ads, or "P#tos > A 'nofollow'" could instruct spiders to ignore links in TOS excerpts found on every page in a P element with the DOM-ID "tos".

  • XML Sitemaps

    1. Since robots.txt deals with inclusion now, why not add an optional URL-specific "action" element allowing directives like "nocache" or "nofollow"? A "delete" directive to get outdated pages removed from search indexes would make sound sense as well.
    2. To make XML sitemap data reusable, and to allow centralized maintenance of page meta data, a couple of new optional URL elements like "title", "description", "document type", "language", "charset", "parent" and so on would be a neat addition. This way it would be possible to visualize XML sitemaps as native (and even hierarchical) site maps.

    Robots.txt exclusions overrule URLs listed for inclusion in XML sitemaps.

  • Meta Tags

    Page meta data overrules directives and information provided in robots.txt and XML sitemaps. Empty contents in meta tags suppress directives and values given at upper levels. Non-existent meta tags implicitly apply data and instructions from upper levels. The same goes for everything below.

  • Body Sections

    Unstructured parenthesizing of parts of the code is certainly undoable with XMLish documents, but may be a pragmatic procedure to deal with legacy code. Paydirt in HTML comments may be allowed to mark payload for contextual advertising purposes, but it's hard to standardize. Let's leave that for proprietary usage.

  • Body Elements

    Implementing a new attribute for messages to machines should be avoided for several good reasons. Classes are additive, so multiple values can be specified for most elements. That would allow putting standardized directives as class names, for example class="menu robots-noindex googlebot-nofollow slurp-index-follow" where the first class addresses CSS. Such inline robot directives come with the same disadvantages as inline style assignments and open a can of worms, so to say. Using classes and DOM-IDs just as references to user agent specific instructions given in robots.txt is surely the preferable procedure.

  • Element Attributes

    More or less this level is a playground for microformats utilizing the A element's REV and REL attributes.
Besides the common values “nofollow”, “noindex”, “noarchive”/”nocache” etc. and their omissible positive defaults “follow” and “index” etc., we’d need a couple more, for example “unapproved”, “untrusted”, “ignore” or “skip” and so on. There’s a lot of work to do.

In terms of complexity, a mechanism as outlined above should be as easy to use as CSS in combination with client-sided scripting for visualization purposes.

However, whatever better ideas are out there, we need a widely accepted “Web-Robot Directives Standard” as soon as possible.





XML sitemap auto-discovery

Vanessa Fox makes the news: In addition to sitemap submissions you can add this line to your robots.txt file:

Sitemap: http://www.example.com/sitemap.xml

Google, Yahoo, MSN and Ask (new on board) then should fetch and parse the XML sitemap automatically. Next week or so the cool robots.txt validator will get an update too.

Question: Is XML Sitemap Autodiscovery for Everyone?

Update:
More info here.

Q: Does it work by user-agent?
A: Yes, add all sitemaps to robots.txt, then disallow by engine.

Q: Must I fix canonical issues before I use sitemap autodiscovery?
A: Yes, 301-redirect everything to your canonical server name, and choose a preferred domain at Webmaster Central.

Q: Can I submit all supported sitemap formats via robots.txt?
A: Yes, everything goes. XML, RSS, ATOM, plain text, Gzipped …
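So a single robots.txt could, for instance, reference several sitemaps in different formats; example.com and the file names are placeholders:

Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap.xml.gz
Sitemap: http://www.example.com/feed.rss
Sitemap: http://www.example.com/urllist.txt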





Better don’t run a web server under Windows

IIS defaults can produce serious trouble with search engines. That's a common problem, and not even all .nhs.uk (UK Government National Health Service) admins have spotted it. I've alerted the Whipps Cross University Hospital but can't email all NHS sites suffering from IIS and lazy or uninformed webmasters. So here's the fix:

Create a Web site entry for the host name without the subdomain, domain.nhs.uk, then go to the "Home Directory" tab and select the option "Redirection to a URL". As "Redirect to" enter the destination, for example "http://www.domain.nhs.uk$S$Q", without a slash after ".uk", because the path (the $S placeholder) begins with a slash. The $Q placeholder represents the query string. Next check "Exact URL entered above" and "Permanent redirection for this resource", then submit. Test the redirection with a suitable tool.
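A header checker should then report something along these lines; the path and query string are made up for illustration:

GET /somepage.asp?id=42 HTTP/1.1
Host: domain.nhs.uk

HTTP/1.1 301 Moved Permanently
Location: http://www.domain.nhs.uk/somepage.asp?id=42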

Now when a user enters a URL without the "www" prefix, s/he gets the requested page from the canonical server name. Also, search engine crawlers following non-canonical links like http://whippsx.nhs.uk/ will pass the link love to the desired URL and will index more pages, instead of deleting them from their indexes after a while because the server is not reachable. I'm not joking: under some circumstances, all or many www-URLs of pages referenced by relative links resolving to the non-existent server will get deleted from the search index after a couple of unsuccessful attempts to fetch them without the www prefix.

Hat tip to Robbo




Yahoo Pipes jeopardizes the integrity of the Internet

Update: This post, initially titled “No more nofollow-insane at Google Reader”, then updated as “(No) more nofollow-insane at Google Reader”, accused Google Reader of inserting nofollow crap. I apologize for my lazy and faulty bug report. Read the comments.

I fell in love with Yahoo Pipes because that tool allowed me to funnel the tidbits contained in a shitload of noise into a more or less clear signal. Instead of checking hundreds of blog feeds, search query feeds and whatever else, I was able to feed my preferred reader with actual payload extracted from vast loads of paydirt dug from lots of sources.

Now that I've learned that Yahoo Pipes is evil, I guess I must code the filters myself. Nofollow insanity is not acceptable. Nofollow madness jeopardizes the integrity of the Internet, which is based on free linkage. I don't need no stinking link condoms sneakily forced on me by nice looking tools with nifty rounded corners. I'll be way happier with a crappy and uncomfortable PHP hack fed with OPML files and conditions pulled from a manually edited MySQL table.

Here is the evidence, right from the Yahoo Pipes output:
[Screenshot: the Yahoo Pipes output]
Also, abusing my links with target=”_blank” is not nice.


Initial post and its first update:

I’m glad Google has removed the auto-nofollow on links in blog posts. When I add a feed I trust its linkage and I don’t need no stinking condoms on pages nobody except me can see unless I share them. Thanks!

Update - Nick Baum, can you hear me?

It seems the nofollow madness is not yet completely buried. Here is a post of mine and what Google Reader shows me when I add my blog's feed:
[Screenshot: the post as rendered by Google Reader]
And here is the same post filtered through a Yahoo pipe:
[Screenshot: the same post filtered through a Yahoo pipe]
So please tell me: why does Google auto-nofollow a link to Vanessa Fox when she gets linked via Yahoo, and uncondomizes the link from Google's very own blogspot dot com? Curious …



