Archived posts from the 'SEO' Category

Link Tutorial for Web Developers

I’ve just finished an article on hyperlinks; here is the first draft:
Anatomy and Deployment of Links

The target audience is developers and software architects, folks who usually aren’t that familiar with search engine optimization or the usability aspects of linkage. Overview:

Defining Links, Natural Linking and Artificial Linkage
I’m starting with a definition of the term Link and its most important variants, the Natural Link and the Artificial Link.

Components of a Link I. [HTML Element: A]
That’s the first anatomy chapter, a commented compendium of text and image links explaining proper linking with syntax examples. Each attribute of the Anchor element is described, along with usage tips and lists of valid values.

Components of a Link II. [HTML Element: LINK]
Building on the first anatomy part, this is a syntax compendium of the LINK element, used in the HEAD section to define relationships, assign stylesheets, enhance navigation, etc.

Web Site Structuring
Since links connect structural elements of a Web site, it makes sense to have a well-thought-out structure. I’m discussing poor and geeky structures which confuse the user, followed by the introduction of universal nodes and topical connectors, which address a lot of weaknesses when it comes to topical interlinking of related pages. I’ve tried to popularize the parts on object modeling, so OOAD purists will probably hit me hard on this piece, while (hopefully) Webmasters can follow my thoughts with ease. This chapter closes the structural part with a description of internal authority hubs.

A Universal Node’s Anchors and their Link Attributes
Based on the structural part, I’m discussing the universal node’s attributes like its primary URI, anchor text and tooltip. The definition of topical anchors is followed by tips on identifying and using alternate anchors, titles, descriptions etc. in various inbound and outbound links.

Linking is All About Popularity and Authority
Well, it should read ‘linking is all about traffic’, but learning more about the background of natural linkage helps one understand the power and underlying messages of links, which produce indirect traffic. Well-linked and outstanding authority sites become popular by word of mouth. The search engines follow their users’ votes intuitively, generating loads of targeted traffic.

Optimizing Web Site Navigation
This chapter is not so much focused on usability; instead I discuss a search engine’s view of site-wide navigation elements and explain how to optimize those for the engines. To avoid repetition, I refer to my guide on crawler support and other related articles, so this chapter is not a guide to Web site navigation at all.

Search Engine Friendly Click Tracking
Traffic monitoring and traffic management influence a site’s linkage, often for the worse. Counting outgoing traffic per link works fine without redirect scripts, which cause all kinds of trouble with search engines and some user agents. I outline an alternative method to track clicks, ready-to-use source code included.
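The article’s source code isn’t reproduced in this overview, but the underlying idea can be sketched: the anchor keeps its real href (so users and spiders see a plain link) while a separate, tiny request records the click. Below is a minimal Python sketch under that assumption; the /click-log endpoint, the url parameter and the log file name are my own placeholders, not the article’s actual code.

```python
# Minimal sketch of redirect-free click logging (hypothetical /click-log endpoint).
# The anchor keeps its real href; an onclick handler fires one extra request, e.g.:
#   <a href="http://example.com/"
#      onclick="(new Image()).src='/click-log?url='+encodeURIComponent(this.href);">Example</a>
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class ClickLogHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        clicked = query.get("url", ["unknown"])[0]
        # Append the clicked URL to a flat log file; real code would add referrer, timestamp etc.
        with open("outbound-clicks.log", "a") as log:
            log.write(clicked + "\n")
        # 204 No Content: nothing to render, the browser already follows the real href.
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), ClickLogHandler).serve_forever()
```

The point of the design is that search engines and users always see the destination URL directly, so no redirect script ever gets crawled or indexed.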

I’ve got a few notes on the topic left over, so most probably I’ll add more stuff soon. I hope it’s a good read, and helpful. Your feedback is very much appreciated :)


Awesome: Ms. Googlebot Provides Reports

Before Yahoo’s Site Explorer even goes live, Google provides advanced statistics in the Sitemaps program. Ms. Googlebot now tells the webmaster which spider food she refused to eat, and why. The ‘lack of detailed stats’ has produced hundreds of confused posts in the Google Sitemaps Group so far.

When Google Sitemaps was announced on June 2, 2005, Shiva Shivakumar stated: “We are starting with some basic reporting, showing the last time you’ve submitted a Sitemap and when we last fetched it. We hope to enhance reporting over time, as we understand what the webmasters will benefit from.” Google’s Sitemaps team closely monitored the issues and questions brought up by webmasters, and since August 30, 2005 there are enhanced stats. Here is how it works.

Google’s crawler reports provide information on URIs spidered from sitemaps and URIs found during regular crawls by following links, regardless of whether the URI is listed in a sitemap or not. Ms. Googlebot’s error reports are accessible to a site’s webmasters only, after a more or less painless verification of ownership. They contain all sorts of errors, for example dead links, conflicts with exclusions in the robots.txt file, and even connectivity problems.

Google’s crawler report is a great tool, kudos to the sitemaps team!

More good news from the Sitemaps Blog:
Separate sitemaps for mobile content to enhance a site’s visibility in Google’s mobile search.


Serious Disadvantages of Selling Links

There is a pretty interesting discussion going on about search engine spam at O’Reilly Radar. The topic title is somewhat misleading; the subject is passing PageRank™ via paid ads on popular sites. Read the whole thread; lots of sound folks express their valuable and often fascinating opinions.

My personal statement is a plain “Don’t sell links for passing PageRank™. Never. Period.”, but the intention behind ad space purchases isn’t always that clear. If an ad isn’t related to my content, I tend to put client-side affiliate links on my sites, because search engine spiders didn’t follow them for a long time. Well, it’s not that easy any more.

However, Matt Cutts ‘revealed’ an interesting fact in the thread linked above. Google indeed applies no-follow logic to Web sites selling (at least unrelated) ads:

… [Since September 2003] …parts of perl.com, xml.com, etc. have not been trusted in terms of linkage … . Remember that just because a site shows up for a “link:” command on Google does not mean that it passes PageRank, reputation, or anchortext.

This policy wasn’t really a secret before Matt’s post, because a critical mass of high-PR links not passing PR does draw a sharp picture. What many site owners selling links in ads have obviously never considered is the collateral damage with regard to on-site optimization. If Google distrusts a site’s linkage, its outbound and internal links have no power. That is, optimization efforts on navigational links, article interlinking etc. are pretty much useless on a site selling links. Internal links no longer passing relevancy via anchor text is probably worse than the PR loss, because clever SEOs always acquire deep inbound links anyway.

Rescue strategy:

1. Implement the change recommended by Matt Cutts (a minimal sketch follows after this list):

Google’s view on this is … selling links muddies the quality of the web and makes it harder for many search engines (not just Google) to return relevant results. The rel=nofollow attribute is the correct answer: any site can sell links, but a search engine will be able to tell that the source site is not vouching for the destination page.

2. Write to Google (possibly cc’ing a spam report and a reinclusion request) that you’ve changed the linkage of your ads.

3. Hope and pray, on failure goto 2.
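To illustrate step 1: the only change a sold ad link needs is a rel=”nofollow” attribute, which is easiest to add wherever the ad markup is generated. A hypothetical Python sketch; the helper name and the CSS class are my own illustration, not Google’s or Matt’s wording.

```python
# Hypothetical helper for rendering paid/ad links so that the site does not
# vouch for the destination: the rel="nofollow" attribute is the whole point.
from html import escape

def render_ad_link(url: str, anchor_text: str) -> str:
    """Return an anchor tag for a paid ad that explicitly does not pass trust."""
    return (
        f'<a href="{escape(url, quote=True)}" rel="nofollow" '
        f'class="sponsored-link">{escape(anchor_text)}</a>'
    )

if __name__ == "__main__":
    print(render_ad_link("http://advertiser.example.com/", "Cheap widgets"))
    # -> <a href="http://advertiser.example.com/" rel="nofollow" class="sponsored-link">Cheap widgets</a>
```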


Overlooked Duplicated Content Vanishing from Google’s Index

Does Google systematically wipe out duplicated content? If so, does it affect partial dupes too? Will Google apply site-wide ‘scraper penalties’ when a particular dupe threshold gets reached or exceeded?

Following many ‘vanished page’ posts with links on message boards and Usenet groups, and monitoring sites I control, I’ve found that there is indeed a kind of pattern. It seems that Google is actively wiping dupes out. Those get deleted or stay indexed as ‘URL only’, not moved to the supplemental index.

Example: I have a script listing all sorts of widgets pulled from a database, where users can choose how many items they want to see per page (the values for # of widgets per page are hard-coded and all linked), combined with prev/next-page links. This kind of dynamic navigation produces tons of partial dupes (content overlapping with other versions of the same page). Google has indexed way too many permutations of that poorly coded page, and foolishly I didn’t take care of it. Recently I was alerted when Googlebot-Mozilla requested hundreds of versions of this page within a few hours. I quickly changed the script, putting a robots NOINDEX meta tag on versions where the content overlaps, but probably too late. Many of the formerly indexed URLs (cached, appearing with title and snippets on the SERPs) have vanished or became URL-only listings. I expect that I’ll lose a lot of ‘unique’ listings too, because I changed the script in the middle of the crawl.

I’m posting this before I have solid data to back up a finding, because it is a pretty common scenario. This kind of navigation is used on online shops, article sites, forums, SERPs… and it applies to aggregated syndicated content too.

I’ve asked Google whether they have a particular recommendation, but no answer yet. Here is my ‘fix’:

Define a straight path through the dynamic content, where not a single displayed entry overlaps with another page. For example, if your default value for items per page is 10, the straight path would be:
start=1&items=10
start=11&items=10
start=21&items=10

Then check the query string before you output the page. If it is part of the straight path, put an INDEX,FOLLOW robots meta tag; otherwise (e.g. start=16&items=15) put NOINDEX.
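A minimal sketch of that check, assuming the default of 10 items per page and the start/items parameter names from the example above; the function name and the FOLLOW on the noindexed permutations are my own choices.

```python
# Sketch of the 'straight path' test described above: only pagination URLs whose
# item window lines up with the default page size get INDEX,FOLLOW; every other
# permutation (overlapping windows, odd page sizes) gets a NOINDEX robots meta tag.
DEFAULT_ITEMS = 10  # assumed default number of widgets per page

def robots_meta(start: int, items: int) -> str:
    """Return the robots meta tag for a ?start=...&items=... permutation of the listing."""
    # On the straight path: default page size, and the window starts at 1, 11, 21, ...
    on_straight_path = items == DEFAULT_ITEMS and (start - 1) % DEFAULT_ITEMS == 0
    content = "INDEX,FOLLOW" if on_straight_path else "NOINDEX,FOLLOW"
    return f'<meta name="robots" content="{content}">'

print(robots_meta(1, 10))    # straight path            -> INDEX,FOLLOW
print(robots_meta(11, 10))   # straight path            -> INDEX,FOLLOW
print(robots_meta(16, 15))   # overlapping permutation  -> NOINDEX,FOLLOW
```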

I don’t know whether this method can help with shops using descriptions pulled from a vendor’s data feed, but I doubt it. If Google can determine and suppress partial dupes within a site, it can do that with text snippets from other sites too. One question remains: how does Google identify the source?


Fresh Content is King

Old news:
A bunch of unique content and a high update frequency increase search engine traffic.
Quite new:
Leading crawlers to fresh content becomes super important.
Future news:
Dynamic Web sites optimized to ping SE crawlers outrank established sites across the board.

Established methods and tools to support search engine crawlers are clever internal linkage, sitemap networks, ‘What’s new’ pages, inbound links from highly ranked and frequently changed pages, etc. To a limited degree they still lead crawlers to fresh content and to old content not yet spidered. Time to crawl and time to index are unsatisfying, because the whole system is based on pulling and depends on the search engine backend’s ability to guess.

Look back and forth at Google: Google News, Froogle, Sitemaps and rumors about blog search indicate a change from progressive pulling of mass data to proactive, event-driven picking of fewer but fresher data. Google will never stop crawling based on guessing, but it has learned how to locate fresh content in no time by making use of submissions and pings.

Blog search engines more or less perfectly fulfil the demand for popular fresh content. The blogosphere pings blog search engines; that is why they are so up to date. The blogosphere is huge and the number of blog posts is enormous, but it is just a tiny part of the Web. Even more fresh content is still published elsewhere, and elsewhere is the playground of the major search engines, not even touched by blog search engines.

Google wants to dominate search, and currently it does. Google cannot ignore the demand for fresh and popular content, and Google cannot lower the relevancy of search results. Will Google’s future search results be ranked by some sort of ‘recent relevancy’ algo? I guess not in general, but ‘recent relevancy’ is not an oxymoron, because Google can learn to determine the type of the requested information and deliver more recent or more relevant results depending on the query context and tracked user behavior. I’m speculating here, but it is plausible, and Google has already developed all the components necessary to assemble such an algo.

Based on the speculation above, investments in RSS technology and the like should be a wise business decision. If ‘ranking by recent relevancy’ or something similar comes true, dynamic Web sites with the bigger toolset will often outrank the established but more statically organized sources of information.


Bait Googlebot With RSS Feeds

Seeing Ms. Googlebot’s sister running wild on RSS feeds, I’m going to assume that RSS feeds may become a valuable tool to support Google’s fresh and deep crawls. Test it for yourself:

Create an RSS feed with a few unlinked or seldom-spidered pages which are not included in your XML sitemap. Add the feed to your personalized Google Home Page (‘Add Content’ -> ‘Create Section’ -> enter the feed URL). Track spider accesses to the feed as well as to the included pages. Most probably Googlebot will request your feed more often than Yahoo’s FeedSeeker and similar bots do. Chances are that Googlebot-Mozilla is nosy enough to crawl at least some of the pages linked in the feed.

That does not help a lot with regard to indexing and ranking, but it seems to be a neat procedure to help the Googlebot sisters spot fresh content. In real life, add the pages to your XML sitemap, link to them, and acquire inbound links…

To test the waters, I’ve added RSS generation to my Simple Google Sitemaps Generator. This tool reads a plain page list from a text file and generates a dynamic XML sitemap, an RSS 2.0 site feed and a hierarchical HTML site map.
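The generator itself isn’t reprinted here, but the principle behind its RSS output (read one URL per line from a text file, emit a bare-bones feed) fits in a few lines. A simplified Python sketch; the channel title, link, description and the pagelist.txt file name are placeholders, not the tool’s real interface.

```python
# Simplified sketch: read a plain page list (one URL per line) and emit a minimal RSS 2.0 feed.
from email.utils import formatdate
from xml.sax.saxutils import escape

def build_rss(urls, channel_title="Site feed", channel_link="http://www.example.com/"):
    """Emit a bare-bones RSS 2.0 feed listing one <item> per URL."""
    items = "\n".join(
        f"    <item><title>{escape(u)}</title><link>{escape(u)}</link></item>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<rss version="2.0">\n  <channel>\n'
        f"    <title>{escape(channel_title)}</title>\n"
        f"    <link>{escape(channel_link)}</link>\n"
        "    <description>Recently added or changed pages</description>\n"
        f"    <lastBuildDate>{formatdate()}</lastBuildDate>\n"
        f"{items}\n"
        "  </channel>\n</rss>"
    )

if __name__ == "__main__":
    with open("pagelist.txt") as f:                      # one URL per line
        urls = [line.strip() for line in f if line.strip()]
    print(build_rss(urls))
```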

Related article on Google’s RSS endeavors: Why Google is an RSS laggard


Take Free SEO Advice With a Grain of Salt

Yesterday I spotted once again how jokes become rumors. It happens every day, and sometimes it even hurts. In my post The Top-5 Methods to Attract Search Engine Spiders I was joking about bold dollar signs driving the MSN bot crazy. A few days later I discovered the first Web site making use of the putative ‘$$ trick’. To make a sad story worse, the webmaster put the dollar signs in as hidden text.

This reminds me of the spread of the ghostly robots revisit META tag. That tag was used by a small regional Canadian engine for local indexing in the Stone Age of the Internet. Today every free META tag generator on the net produces a robots revisit tag. Not a single search engine is interested in this tag, and it was never standardized, but it’s present on billions of Web pages.

This is how bad advice becomes popular. Folks read nasty tips and tricks on the net and don’t apply common sense when they implement them. There is no such thing as free and good advice on the net. Even good advice on a particular topic can produce astonishing effects when applied outside its context. It’s impossible to learn SEO from free articles and posts on message boards. Go see an SEO; it’s worth it.


Systematic Link Patterns Kill SE-Traffic

Years ago, Google launched a great search engine ranking Web pages by PageRank within topical matches. Altavista was a big player, and a part of its algo ranked by weighted link popularity. Even Inktomi and a few others began to experiment with linkpop as a ranking criterion.

Search engine optimizers and webmasters launched huge link farms, where thousands of Web sites were linking to each other. From a site owner’s point of view, those link farms, aka spider traps, ‘helped search engine crawlers to index and rank the participating sites’. For a limited period of time, Web sites participating in spider traps were crawled more frequently and, thanks to their linkpop, gained better placements on the search engine result pages.

From a search engine’s point of view, artificial linking for the sole purpose of manipulating search engine rankings is a bad thing. Their clever engineers developed link spam filters, and the engines began to automatically penalize or even ban sites involved in systematic link patterns.

Back in 2000, removing the artificial links and asking for reinclusion worked for most of the banned sites. Nowadays it’s not that easy to get a banned domain back into the index. Savvy webmasters and serious search engine optimizers have found better, honest ways to increase search engine traffic.

However, there are still a lot of link farms out there. Newbies following bad advice still join them, and get caught eventually. Spider trap operators are smart enough to save their asses, but thousands of participating newbies lose the majority of their traffic when a spider trap gets rolled up by the engines. Some spider traps even charge their participants. Google has just begun to work on a link spam network whose operator earns $46,000 monthly for putting his customers at risk.

Stay away from any automated link exchange ‘service’; it’s not worth it. Don’t trust sneaky sales pitches trying to talk you into risky link swaps. Approaches to automatically support honest link trades are limited to administrative tasks. Hire an experienced SEO consultant for serious help with your link development.


The Top-5 Methods to Attract Search Engine Spiders

[Full-sized image © Leech Design 2000]

Folks on the boards and in newsgroups waste man-years speculating on the best bait to entrap search engine spiders.

Stop posting, listen to the ultimate advice and boost your search engine traffic to the sky within a few months. Here are the five best methods to get a Web site crawled and indexed quickly:

5 Laying out milk and cookies attracts the Googlebot sisters.
4 Creating a Google Sitemap supports the Googlebot sisters.
3 Providing RSS feeds and adding them to MyYahoo decoys Slurp.
2 Placing bold dollar signs ‘$$’ near the copyright or trademark notice drives the MSN bot crazy.
1 Spreading deep inbound links all over the Internet encourages all spiders to crawl deeply and frequently, and to index fast as well.

Listen, there is only one single method that counts: #1. Forget everything you’ve heard about search engine indexing. Concentrate all your efforts on publishing fresh content and acquiring related inbound links to your content pages instead.

Link out to valuable pages within the body text and ask for a backlink. Keep your outbound links up, even if you don’t get a link back. Add a related-links page to each content page and use it to trade links on that content page’s topic. Don’t bother with home page link exchanges.

Ignore tricky ‘backdoor’ advice. There is no such thing as a backdoor to a search engine’s index. Open your front door wide for the engines by actively developing deep inbound links. Once you’re indexed and ranked fairly, fine-tune your search engine spider support. Best of luck.


Mozilla-Googlebot Helps with Debugging

Tracking Googlebot-Mozilla is a great way to discover bugs in a Web site. Try it for yourself by filtering your logs for her user agent name:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Although Googlebot-Mozilla can add pages to the index, I see her mostly digging in ‘fishy’ areas. For example, she explores URLs where I redirect spiders to a page without a query string to avoid indexing of duplicate content. She is very interested in pages with a robots NOINDEX,FOLLOW tag when she knows another page carrying the same content, available from a similar URL but stating INDEX,FOLLOW. She goes after unusual query strings like ‘var=val&&&&’ resulting from a script bug fixed months ago, but still represented by probably thousands of useless URLs in Google’s index. She fetches a page using two different query strings, checking for duplicate content and alerting me to a superfluous input variable used in links on a forgotten page. She fetches dead links to read my very informative error page … and her best friend is the AdSense bot, since they seem to share IPs as well as an interest in page updates before Googlebot is aware of them.
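If your raw access logs are plain text in the usual combined format, the filtering suggested above can be as simple as the following sketch; the access.log file name is a placeholder for whatever your server writes.

```python
# Sketch: print every request made with the Googlebot-Mozilla user agent string
# quoted above, taken verbatim, from a plain-text access log.
USER_AGENT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if USER_AGENT in line:
            print(line.rstrip())
```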

