Tools

Archived posts from the 'Tools' Category

sway(”Google Webmaster Happiness Index”, $numStars, $rant);

Posted on 17 November, 2010

Rumors about GWHI are floating around for a while, but not even insiders were able to figure out the formula. As a matter of fact, not a single webmaster outside the Googleplex has ever seen it. I assume Barry’s guess is quite accurate: GWHI-meter

Anyway, I don’t care what it is, or how it works, as long as I can automate it. At first I ran a few tests by retweeting Google related rants, and finally I developed sway(string destination, decimal numStars, string rant). For a while now I’m brain-dumping my rants to Google with a cron job. I had to kill the process a few times until I figured out that $numStars = -5 invokes a multiply by -1 error, but since Google has fixed this bug it runs smoothly, nine to five.

Yesterday I learned that Google launched a manual variant of my method for you mere mortals. I’m excited to share it: HotPot. Nope, it’s not a typo. Hot pot, as in bong. Officially addictive (source).

HotPot’s RTFM

Login with your most disposable Google account, then load http://google.com/hotpot/onboard with your Web browser (API coming soon, so I was told, hence feel free to poll https://google.com/hotpot/rest/sway for an HTTP response code != 503).

The landing page’s search box explains itself: “Enter a category near a familiar neighborhood and city to start rating places you know. Ex. [restaurants Mountain View, CA]”. Of course localization is in place and working fine (you can change your current address in your Google Profile at any time by providing Checkout with another credit card).

As a webmaster eager to submit GWHI ratings, you’re not interested in over-priced food near the Googleplex, so you overwrite the default category:

HotPot rating box for a search engine called Google in Mountain View, CA Press the Search button.

On the result page you’ll spot a box featuring Google, with a nice picture of the Googleplex in Mountain View. To convince you that indeed you’ve found the right place to drop your rants, “Google” is written in bold letters all over the building.

To its left, Google HotPot provides tips like

Get smarter SERPs.

Reading your mind we’ve figured out that a particular SERP ranking has pissed you off. You know, rankings can turn out good and bad, even yours. With you rating our rankings, we learn a bit more about your tastes, so you’ll get better SERPs the next time you search.

Next you click on any gray star at the bottom, and magically the promotional image turns into a text area.

HotPot review of a search engine called Google in Mountain View, CA Now tell the almighty Google why your pathetic site deserves better rankings than the popular brands with deep pockets you’re competiting with on the Interwebs.

Don’t make the mistake to mention that you’re cheaper. Google will conclude that goes for your information architecture, crawlability, usability, image resolution and content quality, too. Better mimick an elitist specialist of all professions or so, and sell your stuff as swiss army knife.

Then press the Publish button, and revisit your SERP, again and again.

You’ll be quite astonished.

Google’s webmaster relations team will be quite happy.

I mean, can you think of a better way to turn yourself in with a selfish spam report as an ajax’ed Web form that even comes with stars?

Google’s HotPot is pretty cool, don’t you agree?

Sebastian

spying at:

1600 Amphitheatre Parkway

Mountain View,
CA
94043

USA

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

2 comments Sebastian | Tools, Fun, Webmaster Central, Google

How do Majestic and LinkScape get their raw data?

Posted on 21 January, 2010

LinkScape data acquisition Does your built-in bullshit detector cry in agony when you read announcements of link analysis tools claiming to have crawled Web pages in the trillions? Can a tiny SEO shop, or a remote search engine in its early stages running on donated equipment, build an index of that size? It took Google a decade to reach these figures, and Google’s webspam team alone outnumbers the staff of SEOmoz and Majestic, not to speak of infrastructure.

Well, it’s not as shady as you might think, although there’s some serious bragging and willy whacking involved.

First of all, both SEOmoz and Majestic do not own an indexed copy of the Web. They process markup just to extract hyperlinks. That means they parse Web resources, mostly HTML pages, to store linkage data. Once each link and its attributes (HREF and REL values, anchor text, …) are stored under a Web page’s URI, the markup gets discarded. That’s why you can’t search these indexes for keywords. There’s no full text index necessary to compute link graphs.

Majestic index size The storage requirements for the Web’s link graph are way smaller than for a full text index that major search engines have to handle. In other words, it’s plausible.

Majestic clearly describes this process, and openly tells that they index only links.

With SEOmoz that’s a completely different story. They obfuscate information about the technology behind LinkScape to a level that could be described as near-snake-oil. Of course one could argue that they might be totally clueless, but I don’t buy that. You can’t create a tool like LinkScape being a moron with an IQ slighly below an amoeba. As a matter of fact, I do know that LinkScape was developed by extremely bright folks, so we’re dealing with a misleading sales pitch:

Linkscape index size

Let’s throw in a comment at Sphinn, where a SEOmoz rep posted “Our bots, our crawl, our index“.

Of course that’s utter bullshit. SEOmoz does not have the resources to accomplish such a task. In other words, if -and that’s a big IF- they do work as described above, they’re operating something extremely sneaky that breaks Web standards and my understanding of fairness and honesty. Actually, that’s not so, but because it is not so, LinkScape and OpenSiteExplorer in its current shape must die (see below why).

They do insult your intelligence as well as mine, and that’s obviously not the right thing to do, but I assume they do it solely for marketing purposes. Not that they need to cover up their operation with a smokescreen like that. LinkScape could succeed with all facts on the table. I’d call it a neat SEO tool, if it just would be legit.

So what’s wrong with SEOmoz’s statements above, and LinkScape at all?

Let’s start with “Crawled in the past 45 days: 700 billion links, 55 billion URLs, 63 million root domains”. That translates to “crawled … 55 billion Web pages, including 63 million root index pages, carrying 700 billion links”. 13 links per page is plausible. Crawling 55 billion URIs requires sending out HTTP GET requests to fetch 55 billion Web resources within 45 days, that’s roughly 30 terabyte per day. Plausible? Perhaps.

True? Not as is. Making up numbers like “crawled 700 billion links” suggests a comprehensive index of 700 billion URIs. I highly doubt that SEOmoz did ‘crawl’ 700 billion URIs.

When SEOmoz would really crawl the Web, they’d have to respect Web standards like the Robots Exclusion Protocol (REP). You would find their crawler in your logs. An organization crawling the Web must

do that with a user agent that identifies itself as crawler, for example “Mozilla/5.0 (compatible; Seomozbot/1.0; +http://www.seomoz.com/bot.html)”,
fetch robots.txt at least daily,
provide a method to block their crawler with robots.txt,
respect indexer directives like “noindex” or “nofollow” both in META elements as well as in HTTP response headers.

SEOmoz obeys only <META NAME="SEOMOZ" CONTENT="NOINDEX" />, according to their sources page. And exactly this page reveals that they purchase their data from various services, including search engines. They do not crawl a single Web page.

Savvy SEOs should know that crawling, parsing, and indexing are different processes. Why does SEOmoz insist on the term “crawling”, taking all the flak they can get, when they obviously don’t crawl anything?

Two claims out of three in “Our bots, our crawl, our index” are blatant lies. If SEOmoz performs any crawling, in addition to processing bought data, without following and communicating the procedure outlined above, that would be sneaky. I really hope that’s not happening.

As a matter of fact, I’d like to see SEOmoz crawling. I’d be very, very happy if they would not purchase a single byte of 3rd party crawler results. Why? Because I could block them in robots.txt. If they don’t access my content, I don’t have to worry whether they obey my indexer directives (robots meta ‘tag’) or not.

As a side note, requiring a “SEOMOZ” robots META element to opt out of their link analysis is plain theft. Adding such code bloat to my pages takes a lot of time, and that’s expensive. Also, serving an additional line of code in each and every HEAD section sums up to a lot of wasted bandwidth -$$!- over time. Am I supposed to invest my hard earned bucks just to prevent me from revealing my outgoing links to my competitors? For that reason alone I should report SEOmoz to the FTC requesting them to shut LinkScape down asap.

They don’t obey the X-Robots-Tag (”noindex”/”nofollow”/… in the HTTP header) for a reason. Working with purchased data from various sources they can’t guarantee that they even get those headers. Also, why the fuck should I serve MSNbot, Slurp or Googlebot an HTTP header addressing SEOmoz? This could put my search engine visibility at risk.

If they’d crawl themselves, serving their user agent a “noindex” X-Robots-Tag and a 403 might be doable, at least when they pay for my efforts. With their current setup that’s technically impossible. They could switch to 80legs.com completely, that’ll solve the problem, provided 80legs works 100% by the REP and crawls as “SEOmozBot” or so.

With MajesticSEO that’s not an issue, because I can block their crawler withUser-agent: MJ12bot Disallow: /

Yahoo’s site explorer also delivers too much data. I can’t block it without losing search engine traffic. Since it will probably die when Microsoft overtakes search.yahoo.com, I don’t rant much about it. Google and Bing don’t reveal my linkage data to everyone.

I have an issue with SEOmoz’s LinkScape, and OpenSiteExplorer as well. It’s serious enough that I say they have to close it, if they’re not willing to change their architecture. And that has nothing to do with misleading sales pitches, or arrogant behavior, or sympathy (respectively, a possibly lack of sympathy).

The competitive link analysis OpenSiteExplorer/LinkScape provides, without giving me a real chance to opt out, puts my business at risk. As much as I appreciate an opportunity to analyze my competitors, vice versa it’s downright evil. Hence just kill it.

Is my take too extreme? Please enlighten me in the comments.

Update: A follow-up post from Michael VanDeMar and its Sphinn discussion, the first LinkScape thread at Sphinn, and Sphinn comments to this pamphlet.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

56 comments Sebastian | Tools, X-Robots-Tag, Robots Meta Tags, Crawler Directives, robots.txt, SEO

Sanitize links in your content feeds

Posted on 2 January, 2010

Here’s a WordPress plug-in that sanizites relative links and on-the-page links in your content feeds: feedLinkSanitizer. Why do you need it?

Because you end up with invalid links like http://feeds.sebastians-pamphlets.com/SebastiansPamphlets#tos if you don’t use it. Once the post phases out of the main page, the link points to nowhere in feedreaders and reprints.

In feeds, absolute links are mandatory. Make sure that not a single on-the-page link or relative link slips out of your site.

Relative links

When you put all links to your own stuff as /perma-link/ instead of http://your-blog/permalink/ you can serve your blog’s content from a different server / base URI (dev, move, …) without editing all internal links.

The downside is, that for various very good reasons (scrapers, search engines, whatnot) thou must not have relative links in your HTML. You might disagree, but read on.

The simple solution is: store relative links in your WordPress database, but output absolute links. Follow the hint in feedLinkSanitizer.txt to activate link sanitizing in your HTML. By default it changes only feed contents.

The plug-in changes /perma-link/ to http://example.com/perma-link/ in your posts, using the blog URI provided in your WordPress settings. It takes the current server name if this value is missing.

Fragment links

You can link to any DOM-ID in an HTML page, for example <a href="#tos">Table of contents</a> where ‘tos’ is the DOM-ID of an HTML element like <h2 id="tos">Table of contents</h2>. These on-the-page links even come with some SEO value, just in case you don’t care much about usability.

The plug-in changes #tos to http://example.com/perma-link/#tos in your posts. If you’ve set $sanitizeAllLinks = TRUE; in the plugin-code, an on-the-page link clicked on the blog’s main page will open the post, positioning to the DOM-ID.

Download feedLinkSanitizer

I’m a launch-early kind of guy, so test it yourself. And: Use at your own risk. No warranty expressed or implied is provided.

If you use another CMS, download the plug-in anyway and ~~steal~~ adapt its code.

Credits for previous work go to Jon Thysell and Gerd Riesselmann.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

1 comment Sebastian | Blogging, Web development, Tools, SEO

Ping the hell out of Technorati’s reputation algo

Posted on 19 December, 2007

Ping your inbound links for technorati reputation If your Technorati reputation factor sucks ass then read on, otherwise happily skip this post.

Technorati calculates a blog’s authority/reputation based on its link popularity, counting blogroll links from the linking blogs main pages as well as links within the contents of their posts. Links older than six months after their very first discovery don’t count.

Unfortunately, Technorati is not always able to find all your inbound links, usually because clueless bloggers forget to ping them, hence your blog might be undervalued. You can change that.

Compile a list of blogs that link to you and are unknown at Technorati, then introduce them below to a cluster ping orgy. Technorati will increase your authority rating after indexing those blogs.

Enter one blog home page URL per line, all lines properly delimited with a “\n” (new line, just hit [RETURN]; “\r” crap doesn’t work). And make sure that all these blogs have an auto-discovery link pointing to a valid feed in their HEAD section. Do NOT ping Technorati with post-URIs! Invest the time to click through to the blog’s main page and submit the blog-URI instead. Post-URI pings get mistaken for noise and trigger spam traps, that means their links will not increase your Technorati authority/rank.

Results:

<br/>

Actually, this tool pings other services than Technorati too. Pingable contents make it on the SERPs, not only at Technorati.

If you make use of URL canonicalization routines that add a trailing slash to invalid URLs like http://example.com then make sure that you claim your blog at Technorati with the trailing slash.

Please note that this tool is experimental and expects a Web standard friendly browser. It might not work for you, and I’ll remove it if it gets abused.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

36 comments Sebastian | Technorati, Blogging, Tools, SEO

Free WordPress Add-on: Categorized Sitemaps

Posted on 10 September, 2007

In How to feed all posts on a WordPress blog with link love I’ve outlined a method to create short and topically related paths to each and every post even on a large blog. Since not every blogger is PHP savvy enough to implement the concept, some readers asked me to share the sitemaps script.

Ok, here it is. It wasn’t developed as a WordPress plugin, and I’m not sure that’s possible (actually, I didn’t think about it), but I’ll do my best to explain the template hacks necessary to get it running smoothly. Needless to say it’s a quick hack and not exactly elegant, however it works here with WordPress 2.2.2. Use it as is at your own risk, yada yada yada, the usual stuff.

I’m a link whore, so please note: If you implement my sitemap script, please link out to any page on my blog. The script inserts a tiny link at the bottom of the sitemap. If you link to my blog under credits, powered by, in the blogroll or whereever, you can remove it. If you don’t link, the search engines shall ban you.

Prerequisites

You should be able to do guided template hacks.

You need a WordPress plugin that enables execution of PHP code within the content of posts and pages. Install one from the list below and test it with a private post or so. Don’t use the visual editor and deactivate the “WordPress should correct invalidly nested XHTML automatically” thingy in Options::Writing. In the post editor write something like Q: Does my PHP plugin work? <?php print "A: Yep, It works."; ?>and check “enable PHP on this page” (labels differ from plug-in to plug-in), save and click preview. If you see the answer it works. Otherwise try another plug-in:

RunPHP
~~EzStatic3~~ EzStatic
PhpExec

(Maybe you need to quote your PHP code with special tags like <phpcode></phpcode>, RTFM.)

Consider implementing my WordPress-SEO tweaks to avoid unnecessary code changes. If your permalink structure is not set to custom /%postname%/ giving post/page URLs like http://sebastians-pamphlets.com/about/ you need to tweak my code a little. Not that there’s such a thing as a valid reason to use another permalink structure …

Download

Don’t copy and paste PHP code from this page, it might not work because WordPress prettifies quotes etcetera. Everything you need is on the download page.

Installation

Copy list_categories.php to your template directory /wp-content/themes/yourtemplatename/ on your local disk and upload it to your server.

Create a new page draft, title it “Category Index” or so, and in page content put <?php @include(TEMPLATEPATH . "/list_categories.php"); ?> then save and preview it. You should see a category links list like this one. Click the links, check whether the RSS icons show or not, etcetera.

If anything went wrong, load list_categories.php with your preferred editor (not word processor!). Scroll down to edit these variables: // Customize if necessary: //$blogLocaction = “sebastians-pamphlets.com”; // “www.yourserver.com”, “www.yourserver.com/blog” … // without “http://” and no trailing slash! //$rssIconPath = “/img/feed-icon-16×16.gif”; // get a 16*16px rss icon somewhere and upload it you your server, // then change this path which is relative to the domain’s root. $rssIconWidth = 16; $rssIconHeight = 16; If you edit a variable, remove its “//“. If you use the RSS icon delivered with WordPress, change width and height to 14 pixels. Save the file, upload it to your server, and test again.

If you use Feedburner then click the links to the category feeds, Feedburner shouldn’t redirect them to your blog’s entries feed. I’ve used feed URLs which the Feedburner plug-in doesn’t redirect, but if the shit hits the fan search for the variable $catFeedUrl and experiment with the category-feed URLs.

Your sitemap’s URL is http://your-blog.com/sitemap-page-slug/ (respectively your-blog.com/about/sitemap/ or so when the sitemap has a parent page).

In theory you’re done. You could put a link to the sitemap in your sidebar and move on. In reality you want to prettify it, and you want to max out the SEO effects. Here comes the step by step guide to optimized WordPress sitemaps / topical hubs.

Category descriptions

On your categorized sitemap click any “[category-name] overview” link. You land on a page listing all posts of [category-name] under the generic title “Category Index”, “Sitemap”, or whatever you’ve put in the page’s title. Donate at least a description. Your visitors will love that and when you install a meta tag plugin the search engines will send a little more targeted traffic because your SERP listings look better (sane meta tags don’t boost your rankings but should improve your SERP CTR).

On your dashboard click Manage::Categories and write a nice but keyword rich description for each category. When you reference other categories by name my script will interlink the categories automatically, so don’t put internal links. Now the category links lists (overview pages) look better and carry (lots of) keywords.

The sitemap URL above will not show the descriptions (respectively only as tooltip), but the topical mini-hubs linked as “overview” (category links lists) have it. Your sitemap’s URL with descriptions is http://your-blog.com/sitemap-page-slug/?definitions=TRUE (your-blog.com/about/sitemap/?definitions=TRUE or so when the sitemap has a parent page).

If you want to put a different introduction or footer depending on the appearance of descriptions you can replace the code in your page by: <?php // introduction: if (strtoupper($_GET["definitions"]) == "TRUE") { print "<p><strong>All categories with descriptions.</strong> (Example)</p>”; } else { if (!isset($_GET[”cat”])) { print “<p><strong>All categories without descriptions.</strong> (Example)</p>”; } } @include(TEMPLATEPATH . “/list_categories.php”); // footer as above ?>(If you use quotes in the print statements then prefix them with a slash, for example: print "<em>yada \"yada\" <a href=\"url\" title=\"string\">yada</a></em>."; will output yada “yada” yada.)

Title tags

The title of the page listing all categories with links to the category pages and feeds is by design used for the category links pages too. WordPress ignores input parameters in URLs like http://your-blog.com/sitemap-page-slug/?cat=category-name.

To give each category links list its own title tag, replace the PHP code in the title tag. Edit header.php: <title> <?php // 1. Everything: $pageTitle = wp_title(“”,false); if (empty($pageTitle)) { $pageTitle = get_bloginfo(”name”); } $pageTitle = trim($pageTitle); // 2. Dynamic category pages: $input_catName = trim($_GET[”cat”]); if ($input_catName) { $input_catName = ucfirst($input_catName); $pageTitle = $input_catName .” at ” .get_bloginfo(”name”); } // 3. If you need a title depending on the appearance of descriptions $input_catDefs = trim($_GET[”definitions”]); if ($input_catDefs) { $pageTitle = “All tags explained by ” .get_bloginfo(”name”); } print $pageTitle; ?> </title>
The first statements just fix the obscene prefix crap most template designers are obsessed about. The second block generates page titles with the category name in it for the topical hubs (if your category slugs and names are identical). You need 1. and 2.; 3. is optional.

Page headings

Now that you’ve neat title tags, what do you think about accurate headings on the category hub pages? To accomplish that you need to edit page.php. Search for a heading (h3 or so) displaying the_title(); and replace this function by: <h3 class=”entrytitle” id=”post-<?php the_ID(); ?>”> <a href=”<?php the_permalink() ?>” rel=”bookmark”> <?php // 1. Dynamic category pages $input_catName = trim($_GET[”cat”]); if ($input_catName) { $input_catName = ucfirst($input_catName); $dynTitle = “All Posts Tagged ‘” .$input_catName .”‘”; } // 2. If you need a heading depending on the appearance of descriptions $input_catDefs = trim($_GET[”definitions”]); if ($input_catDefs) { $dynTitle = “All tags explained”; } // 3. Output the heading if ($dynTitle) print $dynTitle; else the_title(); ?> </a> </h3>(The surrounding XHTML code may look different in your template! Replace the PHP code leaving the HTML code as is.)

The first block generates headings with the category name in it for the topical hubs (if your category slugs and names are identical). The last statement outputs either the hub’s heading or the standard title if the actual page doesn’t belong to the script. You need 1. and 3.; 2. is optional.

Feeding the category hubs

With most templates each post links to the categories its tagged with. Besides the links to the category archive pages you want to feed your hubs linking to all posts of each category with a little traffic and topical link juice. One method to accomplish that is linking to the category hubs below the comments. If you don’t read this post on the main page or an archive page, click here for an example. Edit single.php, a line below the comments_template(); call insert something like that: <br /> <p class="post-info" id="related-links-lists"> <em class="cat">Find related posts in <?php $catString = ""; foreach((get_the_category()) as $catItem) { if (!empty($catString)) $catString .= ", "; $catName = $catItem->cat_name; $catSlug = $catItem->category_nicename; $catUrl = "http://your-blog.com/sitemap-page-slug/?cat=" .strtolower($catSlug); $catString .= "<a href=\"$catUrl\">$catName</a>"; } // foreach print $catString; ?> </em> </p> (Study your template’s “post-info” paragraph and ensure that you use the same class names!)

Also, if your descriptions are of glossary quality, then link to your category hubs in your posts. Since most of my posts are dull as dirt, I decided to make the category descriptions an even duller canonical SEO glossary. It’s up to you to become creative and throw together something better, funnier, more useful … you get the idea. If you blog in english and you honestly believe your WordPress sitemap is outstanding, why not post it in the comments? Links are dofollowed in most cases.

Troubleshooting

Test everything before you publish the page and link to the sitemaps.

If you have category descriptions and on the sitemap pages links to other categories within the description are broken: Make sure that the sitemap page’s URL does not contain the name or slug of any of your categories. Say the page slug is “sitemaps” and “links” is the parent page of “sitemaps” (URL: /links/sitemaps/), then you must not have a category named “links” nor “sitemaps”. Since a “sitemap” category is somewhat unusual, I’d say serving the sitemaps on a first level page named “sitemap” is safe.

Disclaimer

I hope this post isn’t clear as mud and everybody can install my stuff without hassles. However, every change of code comes with pitfalls, and I can’t address each and every possibility, so please backup your code before you change it, or play with my script in a development system. I can’t provide support but I’ll try to reply to comments. Have fun at your own risk!

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

2 comments Sebastian | Usability, Blogging, Web development, Tools, SEO

Analyzing search engine rankings by human traffic

Posted on 28 July, 2007

Recently I’ve discussed ranking checkers at several places, and I’m quite astonished that folks still see some value in ranking reports. Frankly, ranking reports are -in most cases- a useless waste of paper and/or disk space. That does not mean that SERP positions per keyword phrase aren’t interesting. They’re just useless without context, that is traffic data. Converting traffic pays the bills, not sole rankings. The truth is in your traffic data.

That said, I’d like to outline a method to get a particular useful information out of raw traffic data: underestimated search terms. That’s not a new attempt, and perhaps you have the reports already, but maybe you don’t look at the information which is somewhat hidden in stats ordered by success, not failure. And you should be -respective employ- a programmer to implement it.

The first step is gathering data. Create a database table to record all hits, then in a footer include or so, when the complete page got outputted already, write all data you have in that table. All data means URL, timestamp, and variables like referrer, user agent, IP, language and so on. Be a data rat, log everything you can get hold of. With dynamic sites it’s easy to add page title, (product) IDs etcetera, with static sites write a tool to capture these attributes separately.

For performance reasons it makes sense to work with a raw data table, which has just a primary key, to log the requests, and normalized working tables which have lots of indexes to allow aggregations, ad hoc queries, and fast reports from different perspectives. Also think of regular purging the raw log table and historization. While transferring raw log data to the working tables in low traffic hours or on another machine you can calculate interesting attributes and add data from other sources which were not available to the logging process.

You’ll need that traffic data collector anyway for a gazillion of purposes where your analytics software fails, is not precise enough, or just can’t deliver a particular evaluation perspective. It’s a prerequisite for the method discussed here, but don’t build a monster sized cannon to chase a fly. You can gather search engine referrer data from logfiles too.

For example an interesting information is on which SERP a user clicked a link pointing to your site. Simplified you need three attributes in your working tables to store this info: search engine, search term, and SERP number. You can extract these values from the HTTP_REFERER.

http://www.google.com/search?q=keyword1+keyword2~
&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
1. “google” in the server name tells you the search engine.
2. The “q” variable’s value tells you the search term “keyword1 keyword2″.
3. The lack of a “start” variable tells you that the result was placed on the first SERP. The lack of a “num” variable lets you assume that the user got 10 results per SERP, so it’s quite safe to say that you rank in the top 10 for this term. Actually, the number of results per page is not always extractable from the URL because it’s pulled from a cookie usually, but not so many surfers change their preferences (e.g. less than 0.5% surf with 100 results according to JohnMu and my data as well). If you’ve got a “num” value then add 1 and divide the result by 10 to make the data comparable. If that’s not precise enough you’ll spot it afterwards, and you can always recalculate SERP numbers from the canned referrer.

http://www.google.co.uk/search?q=keyword1+keyword2~
&hl=en&start=10&sa=N
1. and 2. as above.
3. The “start” variable’s value 10 tells you that you got a hit from the second SERP. When start=10 and there is no “num” variable, most probably the searcher got 10 results per page.

http://www.google.es/search?q=keyword1+keyword2~
&rls=com.microsoft:*&ie=UTF-8&oe=UTF-8&startIndex=~
&startPage=1
1. and 2. as above.
3. The empty “startIndex” variable and startPage=1 are useless, but the lack of “start” and “num” tells you that you’ve got a hit from the 1st spanish SERP.

http://www.google.ca/search?q=keyword1+keyword2~
&hl=en&rls=GGGL,GGGL:2006-30,GGGL:en&start=20~
&num=20&sa=N
1. and 2. as above.
3. num=20 tells you that the searcher views 20 results per page, and start=20 indicates the second SERP, so you rank between #21 and #40, thus the (averaged) SERP# is 3.5 (provided SERP# is not an integer in your database).

You got the idea, here is a cheat sheet and official documentation on Google’s URL parameters. Analyze the URLs in your referrer logs and call them with cookies off what disables your personal search preferences, then play with the values. Do that with other search engines too.

Now a subset of your traffic data has a value in “search engine”. Aggregate tuples where search engine is not NULL, then select the results for example where SERP number is lower or equal 3.99 (respectively 4), ordered by SERP number ascending, hits descending and keyword phrase, break by search engine. (Why sorted by traffic descending? You have a report of your best performing keywords already.)

The result is a list of search terms you rank for on the first 4 SERPs, beginning with keywords you’ve probably not optimized for. At least you didn’t optimize the snippet to improve CTR, so your ranking doesn’t generate a reasonable amount of traffic. Before you study the report, throw away your site owner hat and try to think like a consumer. Sometimes those make use of a vocabulary you didn’t think of before.

Research promising keywords, and decide whether you want to push, bury or ignore them. Why bury? Well, in some cases you just don’t want to rank for a particular search term, [your product sucks] being just one example. If the ranking is fine, the search term smells somewhat lucrative, and just the snippet sucks in a particular search query’s context, enhance your SERP listing.

Every once in a while you’ll discover a search term making a killing for your competitors whilst you never spotted it because your stats package reports only the best 500 monthly referrers or so. Also, you’ll get the most out of your rankings by optimizing their SERP CTRs.

Be crative, over time your traffic database becomes more and more valuable, allowing other unconventional and/or site specific reports which off-the-shelf analytics software usually does not deliver. Most probably your competitors use standard analytics software, individually developed algos and reports can make a difference. That does not mean you should throw away your analytics software to reinvent the wheel. However, once you’re used to self developed analytic tools you’ll think of more interesting methods not only to analyse and monitor rankings by human traffic than you can implement in this century

Bear in mind that the method outlined above does not and cannot replace serious keyword research.

Another -very popular- approach to get this info would be automated ranking checks mashed up with hits by keyword phrase. Unfortunately, Google and other engines do not permit automated queries for the purpose of ranking checks, and this method works with preselected keywords, that means you don’t find (all) search terms created by users. Even when you compile your ranking checker’s keyword lists via various keyword research tools, you’ll still miss out on some interesting keywords in your seed list.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

2 comments Sebastian | Analytics, CTR, Tools, SEO, Google

Rediscover Google’s free ranking checker!

Posted on 27 July, 2007

Nowadays we’re searching via toolbar, personalized homepage, or in the browser address bar by typing in “google” to get the search box, typing in a search query using “I feel lucky” functionality, or -my favorite- typing in google.com/search?q=free+pizza+service+nearby.

Old fashioned, uncluttered and nevertheless sexy user interfaces are forgotten, and pretty much disliked due to the lack of nifty rounded corners. Luckily Google still maintains them. Look at this beautiful SERP:

It’s free of personalized search, wonderful uncluttered because the snippets appear as tooltip only, results are nicely numbered from 1 to 1,000 on just 10 awesome fast loading pages, and when I’ve visited my URLs before I spot my purple rankings quickly.

http://google.com/ie?num=100&q=keyword1+keyword2 is an ideal free ranking checker. It supports &filter=0 and other URL parameters, so it’s a perfect tool when I need to lookup particular search terms.

Mass ranking checks are totally and utterly useless, at least for the average site, and penalized by Google. Well, I can think of ways to semi-automate a couple queries, but honestly, I almost never need that. Providing fully automated ranking reports to clients gave SEO services a more or less well deserved snake oil reputation, because nice rankings for preselected keywords may be great ego food, but they don’t pay the bills. I admit that with some setups automated mass ranking checks make sense, but those are off-topic here.

By the way, Google’s query stats are a pretty useful resource too.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

4 comments Sebastian | Recommendations, Tools, SEO, Google

Getting the most out of Google’s 404 stats

Posted on 16 July, 2007

The 404 reports in Google’s Webmaster Central panel are great to debug your site, but they contain URLs generated by invalid -respectively truncated- URL drops or typos of other Webmasters too. Are you sick of wasting the link love from invalid inbound links, just because you lack a suitable procedure to 301-redirect all these 404 errors to canonical URLs?

Your pain ends here. At least when you’re on a *ix server running Apache with PHP 4+ or 5+ and .htaccess enabled. (If you suffer from IIS go search another hobby.)

I’ve developed a tool which grabs all 404 requests, letting you map a canonical URL to each 404 error. The tool captures and records 404s, and you can add invalid URLs from Google’s 404-reports, if these aren’t recorded (yet) from requests by Ms. Googlebot.

It’s kinda layer between your standard 404 handling and your error page. If a request results in a 404 error, your .htaccess calls the tool instead of the error page. If you’ve assigned a canonical URL to an invalid URL, the tool 301-redirects the request to the canonical URL. Otherwise it sends a 404 header and outputs your standard 404 error page. Google’s 404-probe requests during the Webmaster Tools verification procedure are unredirectable (is this a word?).

Besides 1:1 mappings of invalid URLs to canonical URLs you can assign keywords to canonical URLs. For example you can define that all invalid requests go to /fruit when the requested URI or the HTTP referrer (usually a SERP) contain the strings “apple”, “orange”, “banana” or “strawberry”. If there’s no persistent mapping, these requests get 302-redirected to the guessed canonical URL, thus you should view the redirect log frequently to find invalid URLs which deserve a persistent 301-redirect.

Next there are tons of bogus requests from spambots searching for exploits or whatever, or hotlinkers, resulting in 404 errors, where it makes no sense to maintain URL mappings. Just update an ignore list to make sure those get 301-redirected to example.com/goFuckYourself or a cruel and scary image hosted on your domain or a free host of your choice.

Everything not matching a persistent redirect rule or an expression ends up in a 404 response, as before, but logged so that you can define a mapping to a canonical URL. Also, you can use this tool when you plan to change (a lot of) URLs, it can 301-redirect the old URL to the new one without adding those to your .htaccess file.

I’ve tested this tool for a while on a couple of smaller sites and I think it can get trained to run smoothly without too many edits once the ignore lists etcetera are up to date, that is matching the site’s requisites. A couple of friends got the script and they will provide useful input. Thanks! If you’d like to join the BETA test drop me a message.

Disclaimer: All data get stored in flat files. With large sites we’d need to change that to a database. The UI sucks, I mean it’s usable but it comes with the browser’s default fonts and all that. IOW the current version is still in the stage of “proof of concept”. But it works just fine

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

2 comments Sebastian | Tools, Testing, 404grabber, Hotlinking, .htaccess, SEO, Webmaster Central, Google

Referrer spoofing with PrefBar 3.4.1

Posted on 24 June, 2007

Testing browser optimization, search engine friendly user-agent cloaking, referrer based navigation or dynamic landing pages with scripts or by changing the user agent name in the browser’s settings is no fun.

I love PrefBar, a neat FireFox plug-in, which provides me with a pretty useful customizable toolbar. With PrefBar you can switch JavaScript, Flash, colors, images, cookies… on and off with one mouse click, and you can enter a list of user agent names to choose the user agent while browsing.

So I’ve asked Manuel Reimer to create a referrer spoofer widget, and kindly he created it with PrefBar 3.4.1. Thank you Manuel!

To activate referrer spoofing in your PrefBar toolbar install or update Prefbar to 3.4.1, then download the Referer Spoof Menulist 1.0, click “Customize” on the toolbar and import the file. Then click on “Edit” to add all the referrer URLs you need for testing purposes, and enjoy. It works great.

Share/bookmark this: del.icio.us • Google • ma.gnolia • Mixx • Netscape • reddit • Sphinn • Squidoo • StumbleUpon • Yahoo MyWeb
Subscribe to

Entries

Comments

All Comments

4 comments Sebastian | Spoofing, Testing, Web development, Tools, Cloaking

Sebastian’s Pamphlets

Archived posts from the 'Tools' Category

sway(”Google Webmaster Happiness Index”, $numStars, $rant);

HotPot’s RTFM

How do Majestic and LinkScape get their raw data?

Sanitize links in your content feeds

Relative links

Fragment links

Download feedLinkSanitizer

Ping the hell out of Technorati’s reputation algo

Results:

Free WordPress Add-on: Categorized Sitemaps

Prerequisites

Download

Installation

Category descriptions

Title tags

Page headings

Feeding the category hubs

Troubleshooting

Disclaimer

Analyzing search engine rankings by human traffic

Rediscover Google’s free ranking checker!

Getting the most out of Google’s 404 stats

Referrer spoofing with PrefBar 3.4.1

Categories

Monthly Archives

Links

RSS Feeds