<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.2.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Sebastian's Pamphlets &#187; Analytics</title>
	<link>http://sebastians-pamphlets.com</link>
	<description>If you've read my articles somewhere on the Internet, expect something different here.</description>
	<pubDate>Mon, 30 Jun 2008 20:12:40 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.3</generator>
	<language>en</language>
			<item>
		<title>Update your crawler detection: MSN/Live Search announces msnbot/1.1</title>
		<link>http://sebastians-pamphlets.com/live-search-announces-msnbot-1-1/</link>
		<comments>http://sebastians-pamphlets.com/live-search-announces-msnbot-1-1/#comments</comments>
		<pubDate>Tue, 12 Feb 2008 18:41:28 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Analytics]]></category>

		<category><![CDATA[MSN]]></category>

		<category><![CDATA[Crawler Directives]]></category>

		<category><![CDATA[Cloaking]]></category>

		<category><![CDATA[robots.txt]]></category>

		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/live-search-announces-msnbot-1-1/</guid>
		<description><![CDATA[Fabrice Canel from Live Search announces significant improvements of their crawler today. The very much appreciated changes are:

HTTP compression

The revised msnbot supports gzip and deflate as defined by RFC 2616 (sections 14.11 and 14.39). Microsoft also provides a tool to check your server&#8217;s compression / conditional GET support. (Bear in mind that most dynamic pages [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://sebastians-pamphlets.com/img/posts/msnbot-1-1.png" width="250" height="180" align="right" alt="msnbot/1.1" style="margin-left:4px;" title="MSNBOT/1.1" />Fabrice Canel from <a href="http://blogs.msdn.com/webmaster/archive/2008/02/12/announcing-crawling-improvements-for-live-search.aspx">Live Search announces significant improvements of their crawler</a> today. The very much appreciated changes are:</p>
<dl>
<dt>HTTP compression</dt>
<dd>
<p>The revised msnbot supports <b>gzip</b> and <b>deflate</b> as defined by <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html">RFC 2616</a> (sections 14.11 and 14.39). Microsoft also provides a <a href="http://go.microsoft.com/?linkid=8272590">tool to check your server&#8217;s compression / conditional GET support</a>. (Bear in mind that most dynamic pages (blogs, forums, &#8230;) will fool such <a href="http://www.microsoft.com/search/Tools/">tools</a>; try it with a static page or use your robots.txt.)</p>
</dd>
<dt>No more crawling of unchanged contents</dt>
<dd>
<p>The new msnbot/1.1 will not fetch pages that didn&#8217;t change since the last request, as long as the Web server supports the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.25">&#8220;If-Modified-Since&#8221; header</a> in conditional GET requests. If a page didn&#8217;t change since the last crawl, the server responds with 304 and the crawler moves on. In this case your Web server exchanges only a handful of short lines of text with the crawler, not the contents of the requested resource.</p>
<p>If your server isn&#8217;t configured for HTTP compression and conditional GETs, you really should ask your hosting service to enable both, for the sake of your bandwidth bills.</p>
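<p>To illustrate the conditional GET decision described above, here is a minimal Python sketch of what a server has to decide for each request. The function name <code>conditional_get_status</code> and its parameters are hypothetical, made up for this example; a real Web server or framework does this for you once it is configured properly.</p>

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get_status(if_modified_since, last_modified):
    """Return 304 if the crawler's copy is still current, else 200.

    if_modified_since: raw header value sent by the crawler, or None.
    last_modified: timezone-aware datetime of the resource's last change.
    """
    if not if_modified_since:
        return 200  # unconditional request: send the full body
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable date: ignore the header
    if client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if client_time >= last_modified.replace(microsecond=0):
        return 304  # not modified: a few header lines, no body
    return 200


# Hypothetical resource, last changed when this post was published.
changed = datetime(2008, 2, 12, 18, 41, 28, tzinfo=timezone.utc)
header = format_datetime(changed, usegmt=True)
print(header, conditional_get_status(header, changed))
```

<p>A crawler that sends back the date of its last visit gets the 304, while a request without the header gets the full page as before.</p>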
</dd>
<dt>New user agent name</dt>
<dd>
<p>From reading server log files we know the Live Search bot as &#8220;msnbot/1.0 (+http://search.msn.com/msnbot.htm)&#8221;, or &#8220;msnbot-media/1.0&#8221;, &#8220;msnbot-products/1.0&#8221;, and &#8220;msnbot-news/1.0&#8221;. From now on you&#8217;ll see &#8220;<b>msnbot/1.1</b>&#8221;. Nathan Buggia from Live Search clarifies: &#8220;<b>This update does not apply to all the other &#8216;msnbot-*&#8217; crawlers, just the main msnbot</b>. We will be updating those bots in the future&#8221;.</p>
<p>If you just check the user agent string for &#8220;msnbot&#8221;, you have nothing to change. Otherwise, check the user agent string for both &#8220;msnbot/1.0&#8221; and &#8220;msnbot/1.1&#8221; before you do the reverse DNS lookup to identify bogus bots. MSN will not change the host name &#8220;.search.live.com&#8221; used by the crawling engine.</p>
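<p>The two-step check (user agent first, then reverse DNS) could look roughly like the Python sketch below. The function name <code>is_genuine_msnbot</code> and the injectable lookup parameters are hypothetical. The forward lookup that confirms the host name resolves back to the same IP is not mentioned in the announcement; it is a common extra safeguard that I have assumed here.</p>

```python
import socket


def is_genuine_msnbot(user_agent, ip, reverse_lookup=None, forward_lookup=None):
    """Check the user agent first, then confirm the IP via DNS."""
    # Cheap test first: matches msnbot/1.0 and msnbot/1.1 alike.
    if "msnbot" not in user_agent.lower():
        return False
    # The lookups are injectable so the logic can be tested without DNS.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or socket.gethostbyname
    try:
        host = reverse_lookup(ip)
        # The host must belong to the crawler's domain ...
        if not host.lower().endswith(".search.live.com"):
            return False
        # ... and should resolve back to the same IP (assumed extra step).
        return forward_lookup(host) == ip
    except OSError:
        return False  # lookup failed: treat the visitor as unverified
```

<p>Call it with the request&#8217;s user agent and remote address, for example <code>is_genuine_msnbot(ua, remote_ip)</code>, and cache the result per IP, because DNS lookups are slow.</p>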
<p>The announcement didn&#8217;t tell us whether the new bot will utilize HTTP/1.1 or not (MS and Yahoo crawlers, like other Web robots, still perform, or rather fake, HTTP/1.0 requests).</p>
</dd>
</dl>
<p>It looks like it&#8217;s no longer necessary to <a href="http://searchengineland.com/080207-174632.php">charge Live Search for bandwidth their crawler has burned</a>. <img src='http://sebastians-pamphlets.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  Jokes aside, instead of reporting crawler issues to msnbot@microsoft.com, you can post your questions or concerns at a forum dedicated to <a href="http://forums.microsoft.com/webmaster/ShowForum.aspx?ForumID=1984&#038;SiteID=79">MSN crawler feedback and discussions</a>.</p>
<p>I&#8217;m quite nosy, so I just had to investigate what &#8220;there are many more improvements&#8221; in the blog post meant. I&#8217;ve asked <a href="http://nathanbuggia.com/">Nathan Buggia</a> from Microsoft a few questions. </p>
<p class="question">Nate, thanks for the opportunity to <em>talk crawling</em>&nbsp; with you. Can you please reveal a few msnbot/1.1 secrets? <img src='http://sebastians-pamphlets.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p class="answer">I&#8217;m glad you&#8217;re interested in our update, but we&#8217;re not yet ready to provide more details about additional improvements. However, there are several more that we&#8217;ll be shipping in the next couple months.</p>
<p class="question">Fair enough. So let&#8217;s talk about related topics.</p>
<p class="question">Currently I can set crawler directives for file types identified by their extensions in my robots.txt&#8217;s msnbot section. Will you fully support wildcards (* and $ for all URI components, that is path and query string) in robots.txt in the foreseeable future?</p>
<p class="answer">This is one of several additional improvements that we are looking at today, however it has not been released in the current version of MSNBot. In this update we were squarely focused on reducing the burden of MSNBot on your site.</p>
<p class="question">What can or should a Webmaster do when you seem to crawl a site way too fast, or not fast enough? Do you plan to provide a tool to reduce the server load, or to speed up your crawling, for particular sites?</p>
<p class="answer">We currently support the &#8220;<a href="http://search.live.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_FAQ_MSNBotIndexing.htm&#038;FORM=WFDD#D">crawl-delay</a>&#8221; option in the robots.txt file for webmasters that would like to slow down our crawling. We do not currently support an option to increase crawling frequency, but that is also a feature we are considering.</p>
<p class="question">Will msnbot/1.1 extract URLs from client-side scripts for discovery crawling? If so, will such links pass reputation?</p>
<p class="answer">Currently we do not extract URLs from client-side scripts.</p>
<p class="question">Google&#8217;s last change of their infrastructure made nofollow&#8217;ed links completely worthless, because they no longer used those in their discovery crawling. Did you change your handling of links with a &#8220;nofollow&#8221; value in the REL attribute with this upgrade too?</p>
<p class="answer">No, changes to how we process nofollow links were not part of this update.</p>
<p class="question">Nate, many thanks for your time and your interesting answers! </p>
<p><b>Related posts:</b></p>
<ul>
<li><a href="http://blogs.msdn.com/webmaster/archive/2008/02/12/announcing-crawling-improvements-for-live-search.aspx">Official announcement</a> - by <a href="http://nathanbuggia.com/">Nathan Buggia</a>, Live Search Webmaster Center Blog</li>
<li><a href="http://searchengineland.com/080212-160910.php">MSNbot 1.1: Live Search Implements A More Efficient Crawl</a> - by <a href="http://vanessafoxnude.com/">Vanessa Fox</a>, Search Engine Land</li>
</ul>
<hr />Copyright &copy; 2008 <strong><a href="http://sebastians-pamphlets.com/">Sebastian`s Pamphlets</a></strong>. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator/feed reader, the site you are looking at is guilty of copyright infringement and will be put down immediately. Please contact sebastians-pamphlets.com so we can take legal action immediately.<br /><span style="float: right;font-size: 7pt"><a href="http://blog.taragana.com/index.php/archive/wordpress-plugins-provided-by-taraganacom/">Plugin</a> by <a href="http://www.taragana.com/">Taragana</a></span>]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/live-search-announces-msnbot-1-1/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Analyzing search engine rankings by human traffic</title>
		<link>http://sebastians-pamphlets.com/analyzing-search-engine-rankings-by-human-traffic/</link>
		<comments>http://sebastians-pamphlets.com/analyzing-search-engine-rankings-by-human-traffic/#comments</comments>
		<pubDate>Sat, 28 Jul 2007 22:37:00 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
		
		<category><![CDATA[Analytics]]></category>

		<category><![CDATA[CTR]]></category>

		<category><![CDATA[Tools]]></category>

		<category><![CDATA[SEO]]></category>

		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://sebastians-pamphlets.com/analyzing-search-engine-rankings-by-human-traffic/</guid>
		<description><![CDATA[Recently I&#8217;ve discussed ranking checkers at several places, and I&#8217;m quite astonished that folks still see some value in ranking reports. Frankly, ranking reports are &#8211;in most cases&#8211; a useless waste of paper and/or disk space. That does not mean that SERP positions per keyword phrase aren&#8217;t interesting. They&#8217;re just useless without context, that is [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I&#8217;ve discussed <a href="http://sebastians-pamphlets.com/rediscover-googles-free-ranking-checker/">ranking checkers</a> in several places, and I&#8217;m quite astonished that folks still see some value in ranking reports. Frankly, ranking reports are &#8211;in most cases&#8211; a useless waste of paper and/or disk space. That does not mean that SERP positions per keyword phrase aren&#8217;t interesting. They&#8217;re just useless without context, that is traffic data. Converting traffic pays the bills, not rankings alone. The truth is in your traffic data.</p>
<p>That said, I&#8217;d like to outline a method to get a particularly useful piece of information out of raw traffic data: <b>underestimated search terms</b>. That&#8217;s not a new approach, and perhaps you have the reports already, but maybe you don&#8217;t look at the information which is somewhat hidden in stats ordered by success, not failure. And you should be, or employ, a programmer to implement it. </p>
<p>The first step is gathering data. Create a database table to record all hits. Then, in a footer include or similar, once the complete page has been output, write all the data you have to that table. All data means URL, timestamp, and variables like referrer, user agent, IP, language and so on. Be a data rat, log everything you can get hold of. With dynamic sites it&#8217;s easy to add page title, (product) IDs et cetera; with static sites, write a tool to capture these attributes separately. </p>
<p>For performance reasons it makes sense to work with a raw data table, which has just a primary key, to log the requests, and normalized working tables which have lots of indexes to allow aggregations, ad hoc queries, and fast reports from different perspectives. Also think about regularly purging the raw log table and about archiving historical data. While transferring raw log data to the working tables in low traffic hours or on another machine, you can calculate interesting attributes and add data from other sources which were not available to the logging process.</p>
<p>You&#8217;ll need that traffic data collector anyway for a gazillion purposes where your analytics software fails, is not precise enough, or just can&#8217;t deliver a particular evaluation perspective. It&#8217;s a prerequisite for the method discussed here, but don&#8217;t build a monster sized cannon to chase a fly. You can <a href="http://oyoy.eu/site/sereferrer/">gather search engine referrer data from logfiles</a> too.</p>
<p>For example, one interesting piece of information is the SERP on which a user clicked a link pointing to your site. Simplified, you need three attributes in your working tables to store this info: search engine, search term, and SERP number. You can extract these values from the HTTP_REFERER.</p>
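<p>As a rough illustration of the raw table / working table split, here is a Python sketch using SQLite. All table, column, and function names are made up for this example, and the single flat working table is a simplification of the normalized tables described above.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file or a real DB server in practice
conn.executescript("""
    -- Raw log: primary key only, so inserts stay cheap.
    CREATE TABLE raw_hits (
        id INTEGER PRIMARY KEY,
        ts TEXT, url TEXT, referrer TEXT, user_agent TEXT, ip TEXT, language TEXT
    );
    -- Working table: indexed for aggregations and ad hoc queries.
    CREATE TABLE hits (
        id INTEGER PRIMARY KEY,
        ts TEXT, url TEXT, referrer TEXT, user_agent TEXT, ip TEXT, language TEXT
    );
    CREATE INDEX hits_url ON hits (url);
    CREATE INDEX hits_ts ON hits (ts);
""")


def log_hit(ts, url, referrer, user_agent, ip, language):
    """Called once per request, after the page has been sent."""
    with conn:  # commits the insert
        conn.execute(
            "INSERT INTO raw_hits (ts, url, referrer, user_agent, ip, language) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (ts, url, referrer, user_agent, ip, language),
        )


def transfer_and_purge():
    """Batch job for low-traffic hours: copy raw rows, then purge them."""
    with conn:  # one transaction: copy and delete succeed or fail together
        conn.execute(
            "INSERT INTO hits (ts, url, referrer, user_agent, ip, language) "
            "SELECT ts, url, referrer, user_agent, ip, language FROM raw_hits"
        )
        conn.execute("DELETE FROM raw_hits")
```

<p>The transfer step is also the natural place to compute derived attributes, such as the search engine, search term, and SERP number.</p>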
<p><em>http://www.<b>google</b>.com/search?<b>q=keyword1+keyword2</b>~<br />&#038;ie=utf-8&#038;oe=utf-8&#038;aq=t&#038;rls=org.mozilla:en-US:official&#038;client=firefox-a</em><br />1. &#8220;google&#8221; in the server name tells you the search engine.<br />2. The &#8220;q&#8221; variable&#8217;s value tells you the search term &#8220;keyword1 keyword2&#8221;.<br />3. The lack of a &#8220;start&#8221; variable tells you that the result was placed on the first SERP. The lack of a &#8220;num&#8221; variable lets you assume that the user got 10 results per SERP, so it&#8217;s quite safe to say that you rank in the top 10 for this term. Actually, the number of results per page is not always extractable from the URL because it&#8217;s pulled from a cookie usually, but not so many surfers change their preferences (e.g. less than 0.5% surf with 100 results <a href="http://www.cre8asiteforums.com/forums/index.php?showtopic=51682">according to</a> <a href="http://johnmu.com/">JohnMu</a> and my data as well). If you&#8217;ve got a &#8220;num&#8221; value then add 1 and divide the result by 10 to make the data comparable. If that&#8217;s not precise enough you&#8217;ll spot it afterwards, and you can always recalculate SERP numbers from the canned referrer.</p>
<p><em>http://www.google.co.uk/search?q=keyword1+keyword2~<br />&#038;hl=en&#038;<b>start=10</b>&#038;sa=N</em><br />1. and 2. as above.<br />3. The &#8220;start&#8221; variable&#8217;s value 10 tells you that you got a hit from the second SERP. When start=10 and there is no &#8220;num&#8221; variable, most probably the searcher got 10 results per page.</p>
<p><em>http://www.google.es/search?q=keyword1+keyword2~<br />&#038;rls=com.microsoft:*&#038;ie=UTF-8&#038;oe=UTF-8&#038;<b>startIndex=</b>~<br />&#038;<b>startPage=1</b></em><br />1. and 2. as above.<br />3. The empty &#8220;startIndex&#8221; variable and startPage=1 are useless, but the lack of &#8220;start&#8221; and &#8220;num&#8221; tells you that you&#8217;ve got a hit from the 1st Spanish SERP.</p>
<p><em>http://www.google.ca/search?q=keyword1+keyword2~<br />&#038;hl=en&#038;rls=GGGL,GGGL:2006-30,GGGL:en&#038;<b>start=20</b>~<br />&#038;<b>num=20</b>&#038;sa=N</em><br />1. and 2. as above.<br />3. num=20 tells you that the searcher views 20 results per page, and start=20 indicates the second SERP, so you rank between #21 and #40, thus the (averaged) SERP# is 3.5 (provided SERP# is not an integer in your database).</p>
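<p>The extraction rules from the four examples above can be sketched in Python as follows. The function name <code>parse_google_referrer</code> is hypothetical, and the averaging formula is my reading of the examples (it yields 1, 2, 1, and 3.5 for them); adjust it if you count SERPs differently.</p>

```python
from urllib.parse import parse_qs, urlsplit


def parse_google_referrer(referrer):
    """Return (engine, term, serp) or None if this is not a Google SERP."""
    parts = urlsplit(referrer)
    host = parts.hostname or ""
    # 1. "google" as a label of the server name identifies the engine.
    if "google" not in host.split("."):
        return None
    params = parse_qs(parts.query)  # drops empty values such as startIndex=
    if "q" not in params:
        return None
    # 2. The "q" value is the search term; "+" is already decoded to a space.
    term = params["q"][0]
    # 3. Missing "start" means first SERP, missing "num" means 10 results.
    try:
        start = int(params.get("start", ["0"])[0])
        num = int(params.get("num", ["10"])[0])
    except ValueError:
        start, num = 0, 10  # garbage values: fall back to the defaults
    # Express the position in 10-result SERPs, averaged over the viewed page.
    first = start / 10 + 1
    last = (start + num) / 10
    return ("google", term, (first + last) / 2)


print(parse_google_referrer(
    "http://www.google.ca/search?q=keyword1+keyword2&hl=en&start=20&num=20&sa=N"
))
```

<p>Other search engines use different parameter names, so each engine needs its own small parser along these lines.</p>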
<p>You get the idea. Here is a <a href="http://www.joostdevalk.nl/wp-content/uploads/2007/07/google-url-parameters.pdf">cheat sheet</a> and <a href="http://code.google.com/apis/soapsearch/reference.html#2_1">official</a> <a href="http://code.google.com/enterprise/documentation/xml_reference.html">documentation</a> on Google&#8217;s URL parameters. Analyze the URLs in your referrer logs and call them with <em>cookies off</em>, which disables your personal search preferences, then play with the values. Do that with other search engines too.</p>
<p>Now a subset of your traffic data has a value in &#8220;search engine&#8221;. Aggregate tuples where search engine is not NULL, then select the results where, for example, the SERP number is lower than or equal to 3.99 (or 4, respectively), ordered by SERP number ascending, hits ascending, and keyword phrase, with a break by search engine. (Why not sorted by traffic descending? You have a report of your best performing keywords already.)</p>
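<p>A query along these lines could produce the report. This is only a sketch against a hypothetical one-table layout with made-up sample rows, and sorting low-traffic terms first within each SERP number is my reading of the goal, which is to surface underperforming search terms.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (engine TEXT, term TEXT, serp REAL)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("google", "blue widgets", 1.0),
        ("google", "blue widgets", 1.0),
        ("google", "cheap widgets", 1.0),
        ("google", "widget repair", 3.5),
        ("google", "widget history", 7.0),  # beyond SERP 4: filtered out
        (None, None, None),                 # not a search engine referral
    ],
)

REPORT = """
    SELECT engine, term, AVG(serp) AS avg_serp, COUNT(*) AS hit_count
    FROM hits
    WHERE engine IS NOT NULL
    GROUP BY engine, term
    HAVING AVG(serp) <= 4
    ORDER BY engine, avg_serp ASC, hit_count ASC, term
"""
rows = conn.execute(REPORT).fetchall()
for row in rows:
    print(row)
```

<p>With the sample rows, &#8220;cheap widgets&#8221; is listed before &#8220;blue widgets&#8221;: both rank on the first SERP, but the former brings fewer visitors and is therefore the more interesting candidate.</p>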
<p>The result is a list of search terms you rank for on the first 4 SERPs, beginning with keywords you&#8217;ve probably not optimized for. At least you didn&#8217;t optimize the snippet to improve CTR, so your ranking doesn&#8217;t generate a reasonable amount of traffic. Before you study the report, throw away your site owner hat and try to think like a consumer. Consumers sometimes use a vocabulary you didn&#8217;t think of before.</p>
<p>Research promising keywords, and decide whether you want to push, bury or ignore them. Why bury? Well, in some cases you just don&#8217;t want to rank for a particular search term, [your product sucks] being just one example. If the ranking is fine, the search term smells somewhat lucrative, and just the snippet sucks in a particular search query&#8217;s context, <a href="http://sebastians-pamphlets.com/google-assists-serp-click-through-optimization/">enhance your SERP listing</a>.</p>
<p>Every once in a while you&#8217;ll discover a search term making a killing for your competitors whilst you never spotted it because your stats package reports only the best 500 monthly referrers or so. Also, you&#8217;ll get the most out of your rankings by optimizing their SERP CTRs. </p>
<p>Be creative. Over time your traffic database becomes more and more valuable, allowing other unconventional and/or site specific reports which off-the-shelf analytics software usually does not deliver. Most probably your competitors use standard analytics software, so individually developed algos and reports can make a difference. That does not mean you should throw away your analytics software to reinvent the wheel. However, once you&#8217;re used to self developed analytic tools, you&#8217;ll think of more interesting methods, not only for analysing and monitoring rankings by human traffic, than you can implement in this century <img src='http://sebastians-pamphlets.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p><b>Bear in mind that the method outlined above does not and cannot replace <a href="http://learn.wordtracker.com/articles/keyword-inspiration-aaron-wall-of-seobookcom-shares-his-secrets/">serious keyword research</a>.</b>  </p>
<p>Another &#8211;very popular&#8211; approach to get this info would be automated ranking checks mashed up with hits by keyword phrase. Unfortunately, Google and other engines do not permit automated queries for the purpose of ranking checks, and this method works with preselected keywords, that means you don&#8217;t find (all) search terms created by users. Even when you compile your ranking checker&#8217;s keyword lists via various keyword research tools, you&#8217;ll still miss out on some interesting keywords in your seed list.</p>
<p><b>Related thoughts:</b> <a href="http://www.seo-scoop.com/2007/08/07/why-i-check-rankings/">Why regular and automated ranking checks are necessary when you operate seasonal sites</a> by <a href="http://www.seo-scoop.com/">Donna</a></p>
]]></content:encoded>
			<wfw:commentRss>http://sebastians-pamphlets.com/analyzing-search-engine-rankings-by-human-traffic/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
