New Google Dupe Filters?

Folks at WebmasterWorld, ThreadWatch and other hang-outs discuss a new duplicate content filter from Google. This odd thing seems to wipe out the SERPs, producing way more collateral damage than any other filter known to SEOs.

From what I’ve read, all threads concentrate on on-page and on-site factors, trying to find a way out of Google’s trash can. I admit that on-page/on-site factors like near-duplicates produced by copy, paste and modify operations, or excessive quoting, can trigger duplicate content filters. But I don’t buy that this is the whole story.

A fair amount of the vanished sites mentioned in the discussions are rather large, and such sites are probably dedicated to popular themes. Popular themes are the subject of many Web sites, and the amount of unique information on a popular topic isn’t infinite. That is, many Web sites provide the same piece of information. The wording may differ, but there are only so many ways to rewrite a press release. The core information is identical, so many pages get treated as near-duplicates, and inserting longer quotes even duplicates text snippets or whole blocks.

Semantic block analysis of Web pages is not a new thing. What if Google just bought a few clusters of new machines and is now applying well known filters to a broader set of data? That would perfectly explain why a year ago four very similar pages all ranked fine, then three of the four disappeared, and since yesterday all four are gone, because the page carrying the source bonus resides on a foreign Web site. To come to this conclusion, just expand the scope of the problem analysis to the whole Web. That makes sense, since Google says its mission is “to organize the world’s information”.
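
Semantic block analysis is beyond the scope of a short example, but the basic near-duplicate test is easy to illustrate. Here is a minimal sketch using word shingles and Jaccard similarity, a common textbook technique and not necessarily anything like Google’s implementation:

```python
# Minimal near-duplicate check via word shingles and Jaccard similarity.
# Illustrates the general technique only, not Google's filter.

def shingles(text, size=4):
    """Return the set of overlapping word n-grams ("shingles") of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

page_a = "Acme Corp today announced the release of its new widget line for small businesses"
page_b = "Today Acme Corp announced the release of its new widget line for small businesses"

similarity = jaccard(shingles(page_a), shingles(page_b))
print(f"similarity: {similarity:.2f}")
# Above some threshold (say 0.9) the two pages would count as near-duplicates.
```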

Read more here: Thoughts on new Duplicate Content Issues with Google.


Yahoo! Site Explorer Finally Launched

Finally the Yahoo! Site Explorer (BETA) has launched. It’s a nice tool that shows a site owner (and the competition) all indexed pages per domain, and it offers subdomain filters. Inbound links are counted per page and per site. The tool provides links to the standard submit forms, and Yahoo! accepts mass submissions of plain URL lists here.

The number of inbound links seems to be way more accurate than the guesses available from linkdomain: and link: searches. Unfortunately there is no simple way to exclude internal links, so if one wants to check only 3rd party inbounds, a painful procedure begins (a scripted shortcut is sketched after the list):
1. Export each result page to a TSV file, a tab delimited format readable by Excel and other applications.
2. The export goes per SERP with a maximum of 50 URLs, so one must delete the two header lines in each file and append file after file to produce one sheet.
3. Sort the worksheet by the second column to get a list ordered by URL.
4. Delete all URLs from your own site to get the list of 3rd party inbounds.
5. Wait for the fix of the bug that makes the exported data of all result pages identical (each exported data set contains the first 50 results, regardless of which result page the export link is clicked on).
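
Here is a minimal sketch of steps 1 to 4, assuming the exported files sit in the current directory, carry two header lines each, and keep the URL in the second column; the domain name is of course a placeholder:

```python
# Merge Yahoo! Site Explorer TSV exports, sort by URL and drop internal links.
# Assumes: two header lines per file, URL in the second column, files named *.tsv.
import csv
import glob

OWN_DOMAIN = "example.com"  # placeholder: replace with your own domain

rows = []
for path in glob.glob("*.tsv"):
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        rows.extend(list(reader)[2:])    # step 2: strip headers, append file by file

rows.sort(key=lambda row: row[1])        # step 3: order by URL (second column)

external = [row for row in rows if OWN_DOMAIN not in row[1]]  # step 4: 3rd party only

with open("inbound-links.tsv", "w", newline="", encoding="utf-8") as out:
    csv.writer(out, delimiter="\t").writerows(external)
print(f"{len(external)} third-party inbound links written")
```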

The result pages provide assorted lists of all URLs known to Yahoo!. The ordering represents neither the site’s logical structure (defined by linkage) nor, it seems, the physical structure (that’s not exactly what I would call a “comprehensive site map”). It looks like the first results are ordered by popularity, followed by a more or less unordered list. The URL listings contain fully indexed pages with known but not (yet) indexed URLs mixed in (e.g. pages carrying a robots “noindex” meta tag); the latter can be identified by their missing cached link.

Desired improvements:
1. A filter “with/without internal links”.
2. An export function outputting the data of all result pages to one single file.
3. A filter “with/without” known but not indexed URLs.
4. Optional structural ordering on the result pages.
5. Operators like filetype: and -site:domain.com.
6. Removal of the 1,000 results limit.
7. Revisiting of submitted URL lists à la Google Sitemaps.

Overall, the site explorer is a great tool and an appreciated improvement, despite the wish list above. The most interesting part of the new toy is its API, which allows querying for up to 1,000 results (page data or link data) in batches of 50 to 100 results, returned in a simple XML format (max. 5,000 queries per IP address per day).
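
The API lends itself to a simple paging loop. Here is a sketch, assuming the service exposes an inlinkData method under search.yahooapis.com with appid, query, results, start and output parameters; those names are from memory of Yahoo!’s web services and should be verified against the official API documentation:

```python
# Page through up to 1,000 inbound-link results from the Site Explorer API.
# Endpoint and parameter names are assumptions; verify them in the API docs.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

APP_ID = "YOUR_APP_ID"   # hypothetical application id
BASE = "http://search.yahooapis.com/SiteExplorerService/V1/inlinkData"

def fetch_inlinks(query, batch=50, limit=1000):
    results = []
    for start in range(1, limit + 1, batch):
        params = urllib.parse.urlencode({
            "appid": APP_ID,
            "query": query,
            "results": batch,      # 50-100 results per request
            "start": start,
            "output": "xml",
        })
        with urllib.request.urlopen(f"{BASE}?{params}") as response:
            tree = ET.parse(response)
        # collect URL elements regardless of XML namespace prefix
        urls = [el.text for el in tree.iter() if el.tag.endswith("Url")]
        if not urls:
            break                  # no more results
        results.extend(urls)
    return results

print(len(fetch_inlinks("http://example.com/")))
```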


Google’s Master Plan

Alerted by Nick I had the chance to take a look at Google’s current master plan. Niall Kennedy “took some shots of the Google master plan. There is a long set of whiteboards next to the entrance to one of the Google buildings. The master plan is like a wiki: there is an eraser and a set of pens at the end of the board for people to edit and contribute to the writing on the wall.”

Interesting to see that “Directory” is not yet checked. Does this indicate that Google has plans to build its own? Unchecked items like “Diet”, “Mortgages” and “Real Estate” make me wonder what kind of services for those traditionally spammy areas Google hides in its pipeline. The red dots or quotes crowding “Dating” may indicate that maps, talk, mail and search get a new consolidated user interface soon. The master plan also reveals that they’ve hired Vint Cerf to develop a worldwide dark fiber/WiFi next generation web based on redesigned TCP/IP and HTTP protocols.

Is all that beyond belief? Perhaps, perhaps not, but food for thought at any rate, if the shots are real and not part of a funny disinformation campaign. Go study the plan and speculate for yourself.


Search Engine Friendly Cloaking

Yesterday I had a discussion with a potential client who wanted me to optimize the search engine crawler support on a fairly large dynamic Web site. A moment before he hit submit on my order form, I stressed the point that his goals aren’t achievable without white hat cloaking. He is pretty concerned about cloaking, which is understandable with regard to the engines’ webmaster guidelines and the cloaking hysteria across the white hat message boards.

To make a long story short, I’m a couple of hours ahead of his local time, and at 2:00am I wasn’t able to bring my point home. I’ve probably lost the contract, which is not the worst thing, because obviously I produced a communication problem that resulted in lost confidence. To make the best of it, after a short sleep I wrote down what I should have told him.

Here is my tiny guide to search engine friendly cloaking. The article explains a search engine’s view of cloaking, provides evidence of tolerated cloaking, and gives some examples of white hat cloaking that is pretty much appreciated by the engines (a minimal sketch of the first technique follows the list):

  • Truncating session IDs and similar variable/value pairs in query strings
  • Reducing the number of query string arguments
  • Stripping affiliate IDs and referrer identifiers
  • Preventing search engines from indexing duplicated content
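
To illustrate the first item, here is a minimal sketch of the logic: crawlers requesting a URL that carries session or tracking arguments get a 301 to the canonical URL. The user agent tokens and parameter names are examples only, and a production version would verify crawlers more robustly (e.g. by IP address):

```python
# White hat cloaking sketch: send crawlers a 301 to a canonical URL without
# session IDs and tracking arguments. Parameter and bot names are examples.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")                # example user agents
DROP_PARAMS = {"sid", "sessionid", "phpsessid", "affid", "ref"}  # example arguments

def canonical_url(url, user_agent):
    """Return the cleaned URL for crawlers, or None if no redirect is needed."""
    if not any(token in user_agent.lower() for token in CRAWLER_TOKENS):
        return None                                   # humans keep their session
    scheme, host, path, query, fragment = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query) if k.lower() not in DROP_PARAMS]
    cleaned = urlunsplit((scheme, host, path, urlencode(kept), fragment))
    return cleaned if cleaned != url else None

# Usage: if canonical_url(...) returns a URL, answer the request with a
# 301 Moved Permanently pointing at it.
print(canonical_url("http://example.com/page.php?id=7&sid=abc123",
                    "Googlebot/2.1 (+http://www.google.com/bot.html)"))
# -> http://example.com/page.php?id=7
```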

I hope it’s a good read, and perhaps it helps me out next time I have to explain good cloaking.




Google’s Blog Search Released

As spotted by SEW and TW, Google is the first major search engine to provide a real feed and blog search service.

Google’s new feed search service covers all kinds of XML feeds, not only blogs, but usually no news feeds. So what can you do to get your non-blog, non-news feeds included? As discussed here, you need to ping services like Ping-O-Matic, since Google doesn’t offer a ping service of its own.
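
Pinging is a plain XML-RPC call to the weblogUpdates interface. Here is a minimal sketch, assuming Ping-O-Matic’s XML-RPC endpoint is rpc.pingomatic.com (verify the URL on their site); the blog name and URL are placeholders:

```python
# Notify a ping service that a blog/feed has been updated (weblogUpdates.ping).
# The Ping-O-Matic endpoint URL is an assumption; verify it before use.
import xmlrpc.client

PING_ENDPOINT = "http://rpc.pingomatic.com/"   # assumed endpoint

def ping(site_name, site_url):
    """Send a standard weblogUpdates.ping and return the service's response."""
    server = xmlrpc.client.ServerProxy(PING_ENDPOINT)
    return server.weblogUpdates.ping(site_name, site_url)

print(ping("My Site", "http://example.com/"))
# A successful response usually looks like {'flerror': False, 'message': '...'}
```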

‘Nuff said, I’m off to play with the new toy. Let’s see whether I can feed it a nice amount of the neat stuff I have in the works waiting for the launch :)

[Update: This post appeared in Google’s blog search results 14 minutes after uploading - awesome!]


About Repetition in Web Site Navigation

Rustybrick runs a post, Secondary Navigation Links are Recommended, commenting on a WMW thread titled Duplicate Navigation Links Downsides. While the main concern in the WMW thread is content duplication (which is not penalized in navigation elements, as Rustybrick and several contributors point out), the nugget comes from Search Engine Roundtable: “having two of the same link, pointing to the same page, and if it is of use to the end user, will not hurt your rankings. In fact, they may help with getting your site indexed and ranking you higher (due to the anchor text)”. I think this statement is worth a few thoughts, because its underlying truth is more complex than it sounds at first sight.

Thesis 1: Repeating the code of the topmost navigation at the page’s bottom is counterproductive
Why? Every repetition of a link block devalues the weight search engines assign to it. That goes for on-the-page duplication as well as for section-wide and especially site-wide repetition. One (or at most two) links to upper levels are enough, because providing too many off-topic-while-on-theme links dilutes the topical authority of the node and devalues its linking power with regard to topic authority.
Solution: Make use of user friendly but search engine unfriendly menus at the top of the page, then put the vertical links leading to the main sections and the root at the very bottom (a naturally cold zone with next to zero linking power). In the left- or right-hand navigation link to the next upper level only, and link the path to the root in breadcrumbs only.

Thesis 2: Passing PageRank™ works differently from passing topical authority via anchor text
While every link (internal or external) passes PageRank™ (duplicated links probably less than unique links, due to a dampening factor), topical authority passed via anchor text is subject to block-specific weighting. The more a navigation element gets duplicated, the less topical reputation it passes with its links. That means anchor text in site-wide navigation elements and templated page areas is totally and utterly useless.
Solution: Use different anchor text in bread crumbs and menu items, and don’t repeat menus.

Summary:
1. All navigational links help with indexing, at least with crawling, but not all links help with ranking.
2. (Not too often) repeated links in navigation elements with different anchor text help with rankings.
3. Links in hot zones like bread crumbs at the top of a page, as well as links within the body text, perfectly boost SERP placements, because they pass topical reputation. Links in cold zones like bottom lines or duplicated navigation elements are user friendly, but don’t boost SERP positioning that much, because their one and only effect is a modest amount of PageRank™ distribution.

Read more on this topic here.


Link Tutorial for Web Developers

I’ve just finished an article on hyperlinks, here is the first draft:
Anatomy and Deployment of Links

The target audience is developers and software architects, folks who usually aren’t that familiar with search engine optimization and the usability aspects of linkage. Overview:

Defining Links, Natural Linking and Artificial Linkage
I’m starting with a definition of Link and its most important implementations as Natural Link and Artificial Link.

Components of a Link I. [HTML Element: A]
That’s the first anatomy chapter, a commented text- and image-link compendium explaining proper linking with syntax examples. Each attribute of the anchor element is described along with usage tips and lists of valid values.

Components of a Link II. [HTML Element: LINK]
Based on the first anatomic part, here comes a syntax compendium of the LINK element, used in the HEAD section to define relationships, assign stylesheets, enhance navigation etc.

Web Site Structuring
Since links connect the structural elements of a Web site, it makes sense to have a well thought out structure. I discuss poor and geeky structures that confuse the user, followed by an introduction of universal nodes and topical connectors, which fix a lot of weaknesses when it comes to the topical interlinking of related pages. I’ve tried to simplify the parts on object modeling, so OOAD purists will probably hit me hard on this piece, while (hopefully) Webmasters can follow my thoughts with ease. The chapter closes the structural part with a description of internal authority hubs.

A Universal Node’s Anchors and their Link Attributes
Based on the structural part, I’m discussing the universal node’s attributes like its primary URI, anchor text and tooltip. The definition of topical anchors is followed by tips on identifying and using alternate anchors, titles, descriptions etc. in various inbound and outbound links.

Linking is All About Popularity and Authority
Well, it should read ‘linking is all about traffic’, but learning more about the background of natural linkage helps one understand the power and the underlying messages of links, which produce indirect traffic. Well linked, outstanding authority sites become popular by word of mouth. The search engines follow their users’ votes intuitively, generating loads of targeted traffic.

Optimizing Web Site Navigation
This chapter is not so much focused on usability; instead I discuss a search engine’s view of site-wide navigation elements and explain how to optimize them for the engines. To avoid repetition I refer to my guide on crawler support and other related articles, so this chapter is not a guide to Web site navigation as such.

Search Engine Friendly Click Tracking
Traffic monitoring and traffic management influence a site’s linkage, often for the worse. Counting outgoing traffic per link works quite well without redirect scripts, which cause all kinds of trouble with search engines and some user agents. I outline an alternative method to track clicks, ready-to-use source code included.
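
The article ships its own ready-to-use code; purely as an illustration of the general idea (not the article’s implementation), here is a hypothetical logging endpoint that an onclick handler could call in the background while the link itself keeps its plain href, so neither crawlers nor script-less user agents ever see a redirect script:

```python
# Hypothetical click-logging endpoint: an onclick handler on each outbound link
# fires a background request to /click?to=<url>; the visible link stays a plain
# href, so no redirect script is involved. Paths and port are made up.
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

LOG_FILE = "outbound-clicks.log"   # made-up path

def app(environ, start_response):
    target = parse_qs(environ.get("QUERY_STRING", "")).get("to", [""])[0]
    if target:
        with open(LOG_FILE, "a", encoding="utf-8") as log:
            log.write(target + "\n")   # record the clicked URL
    start_response("204 No Content", [])
    return [b""]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```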

I’ve got a few notes on the topic left over, so most probably I’ll add more stuff soon. I hope it’s a good read, and helpful. Your feedback is very much appreciated :)


Blogging is not a crime!

Blogger Aaron Wall sued over comments on his blog

The short story is this: Aaron Wall is being sued over comments left on his blog by his readers about a notoriously unsavoury company called Traffic Power, or 1p. Within the Search industry, these people are regarded as the lowest of the low, and if you dig through some of those Search results, or the links at the bottom of this post, you’ll find all the gory details. Suffice to say, they are considered thieves and villains by the overwhelming majority of the Search Marketing community.

We don’t think it’s right, do you?

We feel this has to end here. There is far more at stake than a scummy company vs a blogger - this is about free speech on blogs, and the right for users to comment, without blog publishers having to fear lawsuits.

So, What Can YOU Do?

See the graphic at the right, below the links? It links to the ‘donate to Aaron’s legal costs’ post. You can start by giving him a few $$$ to fight these people effectively.

Help Promote the Blogging is NOT a Crime Campaign

By using one of these lovely graphical banners on your blog, forum or website, you will help spread the word and raise more cash - enabling better lawyers and legal counsel. Simply pick one that fits your site, and link it to the donate post.

Simple eh? Don’t you feel GOOD now?

Resources

There is much to the history of TP/1p, and you can find a lot of it in the Google searches linked above, but these are the more notable recent posts and discussions on the subject if you’d like more in-depth information to quote and link to.

Please redistribute this ThreadWatch post if you wish.
And please put one of the banners on your blog to help spread the word!

Thank you for your support of free speech on blogs and elsewhere on the net!




Awesome: Ms. Googlebot Provides Reports

Before Yahoo!’s Site Explorer even goes live, Google provides advanced statistics in the Sitemaps program. Ms. Googlebot now tells the webmaster which spider food she refused to eat, and why. The ‘lack of detailed stats’ has produced hundreds of confused posts in the Google Sitemaps Group so far.

When Google Sitemaps was announced on June 2, 2005, Shiva Shivakumar stated: “We are starting with some basic reporting, showing the last time you’ve submitted a Sitemap and when we last fetched it. We hope to enhance reporting over time, as we understand what the webmasters will benefit from”. Google’s Sitemaps team closely monitored the issues and questions brought up by webmasters, and since August 30, 2005 there are enhanced stats. Here is how it works.

Google’s crawler reports provide information on URIs spidered from sitemaps as well as URIs found during regular crawls by following links, regardless of whether a URI is listed in a sitemap or not. Ms. Googlebot’s error reports are accessible to a site’s webmasters only, after a more or less painless verification of ownership. They contain all sorts of errors, for example dead links, conflicts with exclusions in the robots.txt file, and even connectivity problems.
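
For context, here is a minimal sketch of the kind of sitemap file one submits to the program. The XML namespace shown is the generic sitemaps.org one; Google’s original schema URL may have differed, so treat it as an assumption and check the program’s documentation:

```python
# Generate a minimal XML sitemap. The namespace is an assumption (see above).
import xml.etree.ElementTree as ET

NAMESPACE = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls, filename="sitemap.xml"):
    urlset = ET.Element("urlset", xmlns=NAMESPACE)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    ET.ElementTree(urlset).write(filename, encoding="utf-8", xml_declaration=True)

# Placeholder URLs and dates, just to show the structure.
build_sitemap([
    ("http://example.com/", "2005-09-01"),
    ("http://example.com/articles/cloaking", "2005-09-15"),
])
```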

Google’s crawler report is a great tool, kudos to the sitemaps team!

More good news from the Sitemaps Blog:
Separate sitemaps for mobile content to enhance a site’s visibility in Google’s mobile search.


Serious Disadvantages of Selling Links

There is a pretty interesting discussion going on about search engine spam at O’Reilly Radar. The topic is somewhat misleading; the subject is passing PageRank™ via paid ads on popular sites. Read the whole thread, lots of sound folks express their valuable and often fascinating opinions.

My personal statement is a plain “Don’t sell links to pass PageRank™. Never. Period.”, but the intention behind ad space purchases isn’t always that clear. If an ad isn’t related to my content, I tend to put client-side affiliate links on my sites, because for a long time search engine spiders didn’t follow them. Well, it’s not that easy any more.

However, Matt Cutts ‘revealed’ an interesting fact in the thread linked above. Google indeed applies no-follow-logic to Web sites selling (at least unrelated) ads:

… [Since September 2003] …parts of perl.com, xml.com, etc. have not been trusted in terms of linkage … . Remember that just because a site shows up for a “link:” command on Google does not mean that it passes PageRank, reputation, or anchortext.

This policy wasn’t really a secret before Matt’s post, because a critical mass of high PR links not passing PR does draw a sharp picture. What many site owners selling links in ads have obviously never considered is the collateral damage with regard to on-site optimization. If Google distrusts a site’s linkage, its outbound and internal links have no power. That is, optimization efforts on navigational links, article interlinking and so on are pretty much useless on a site selling links. Internal links not passing relevancy via anchor text is probably worse than the PR loss, because clever SEOs always acquire deep inbound links anyway.

Rescue strategy:

1. Implement the change recommended by Matt Cutts (one way to automate it is sketched after this list):

Google’s view on this is … selling links muddies the quality of the web and makes it harder for many search engines (not just Google) to return relevant results. The rel=nofollow attribute is the correct answer: any site can sell links, but a search engine will be able to tell that the source site is not vouching for the destination page.

2. Tell Google (possibly cc’ing a spam report and a reinclusion request) that you’ve changed the linkage of your ads.

3. Hope and pray, on failure goto 2.
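
For sites with lots of templated ad blocks, the recommended rel=nofollow can be added programmatically at output time. A rough sketch, assuming ad links sit in a container with a known class name (“sponsored” is purely hypothetical) and reasonably simple, well-formed markup:

```python
# Add rel="nofollow" to every link inside ad containers before output.
# The container class name "sponsored" is hypothetical, and this naive regex
# approach only suits simple, well-formed templates - adjust to your markup.
import re

AD_BLOCK = re.compile(r'(<div class="sponsored">)(.*?)(</div>)', re.DOTALL)

def nofollow_ads(html):
    def patch_block(match):
        opening, body, closing = match.groups()
        # add rel="nofollow" to anchors that don't already carry a rel attribute
        body = re.sub(r'<a(?![^>]*\brel=)', '<a rel="nofollow"', body)
        return opening + body + closing
    return AD_BLOCK.sub(patch_block, html)

page = '<div class="sponsored"><a href="http://example.com/ad">Buy!</a></div>'
print(nofollow_ads(page))
# -> <div class="sponsored"><a rel="nofollow" href="http://example.com/ad">Buy!</a></div>
```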

