Archived posts from the 'Uncategorized' Category

Blogging is not a crime!

Blogger Aaron Wall sued over comments on his blog

The short story is this: Aaron Wall is being sued over comments left on his blog by his readers about a notoriously unsavoury company called Traffic Power, or 1p. Within the Search industry, these people are regarded as the lowest of the low, and if you dig through some of those Search results, or the links at the bottom of this post, you'll find all the gory details. Suffice it to say, they are considered thieves and villains by the overwhelming majority of the Search Marketing community.

We don’t think it’s right, do you?

We feel this has to end here. There is far more at stake than a scummy company vs a blogger - this is about free speech on blogs, and the right of users to comment without blog publishers having to fear lawsuits.

So, What Can YOU Do?

See the graphic at the right, below the links? It links to the post for donating to Aaron's legal costs. You can start by giving him a few $$$ to fight these people effectively.

Help Promote the Blogging is NOT a Crime Campaign

By using one of these lovely graphical banners on your blog, forum or website, you will help spread the word and raise more cash - enabling better lawyers and legal counsel. Simply pick one that fits your site, and link it to the donate post.

Simple eh? Don’t you feel GOOD now?

Resources

There is much to the history of TP/1p, and you can find a lot of it in the Google searches linked above. If you'd like more in-depth information to quote and link to, these are the more notable recent posts and discussions on the subject.

Please redistribute this ThreadWatch post if you wish.
And please put one of the banners on your blog to help spread the word!

Thank you for your support of free speech on blogs and elsewhere on the net!




Awesome: Ms. Googlebot Provides Reports

Even before Yahoo's Site Explorer goes live, Google provides advanced statistics in its Sitemaps program. Ms. Googlebot now tells the webmaster which spider food she refused to eat, and why. The 'lack of detailed stats' has produced hundreds of confused posts in the Google Sitemaps Group so far.

When Google Sitemaps was announced on June 2, 2005, Shiva Shivakumar stated: "We are starting with some basic reporting, showing the last time you've submitted a Sitemap and when we last fetched it. We hope to enhance reporting over time, as we understand what the webmasters will benefit from." Google's Sitemaps team closely monitored the issues and questions brought up by webmasters, and since August 30, 2005 there are enhanced stats. Here is how it works.

Google's crawler reports provide information on URIs spidered from sitemaps and URIs found during regular crawls by following links, regardless of whether the URI is listed in a sitemap or not. Ms. Googlebot's error reports are accessible to a site's webmasters only, after a more or less painless verification of ownership. They contain all sorts of errors, for example dead links, conflicts with exclusions in the robots.txt file, and even connectivity problems.

Google’s crawler report is a great tool, kudos to the sitemaps team!

More good news from the Sitemaps Blog:
Separate sitemaps for mobile content to enhance a site’s visibility in Google’s mobile search.


Overlooked Duplicated Content Vanishing from Google’s Index

Does Google systematically wipe out duplicated content? If so, does it affect partial dupes too? Will Google apply site-wide 'scraper penalties' when a particular dupe threshold is reached or exceeded?

Following many 'vanished page' posts with links on message boards and usenet groups, and monitoring sites I control, I've found that there is indeed a kind of pattern. It seems that Google is actively wiping dupes out. Those pages get deleted or stay indexed as 'URL only'; they are not moved to the supplemental index.

Example: I have a script listing all sorts of widgets pulled from a database, where users can choose how many items they want to see per page (the values for the number of widgets per page are hard coded and all linked), combined with prev/next-page links. This kind of dynamic navigation produces tons of partial dupes (content overlaps with other versions of the same page). Google has indexed way too many permutations of that poorly coded page, and foolishly I didn't take care of it. Recently I got alerted when Googlebot-Mozilla requested hundreds of versions of this page within a few hours. I quickly changed the script, putting a robots NOINDEX meta tag on versions where the content overlaps, but probably too late. Many of the formerly indexed URLs (cached, appearing with title and snippet on the SERPs) have vanished or became URL-only listings. I expect that I'll lose a lot of 'unique' listings too, because I changed the script in the middle of the crawl.

I'm posting this before I have solid data to back up a finding, because it is a pretty common scenario. This kind of navigation is used on online shops, article sites, forums, SERPs … and it applies to aggregated syndicated content too.

I’ve asked Google whether they have a particular recommendation, but no answer yet. Here is my ‘fix’:

Define a straight path through the dynamic content, where not a single displayed entry overlaps with another page. For example, if your default value for items per page is 10, the straight path would be:
start=1&items=10
start=11&items=10
start=21&items=10

Then check the query string before you output the page. If it is part of the straight path, put an INDEX,FOLLOW robots meta tag on the page; otherwise (e.g. start=16&items=15) put NOINDEX.
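Here is a minimal sketch of that check in Python. The parameter names start and items match the example query strings above; the default page size of 10 and the function name are just placeholders for your own script:

```python
# Minimal sketch of the 'straight path' check described above.
# Assumes a default page size of 10 and a 1-based 'start' parameter,
# as in the example query strings; adapt to your own script.

DEFAULT_ITEMS = 10

def robots_meta(start: int, items: int) -> str:
    """Return the robots meta tag for one version of the widget list."""
    on_straight_path = (
        items == DEFAULT_ITEMS          # only the default page size ...
        and (start - 1) % items == 0    # ... at non-overlapping offsets: 1, 11, 21, ...
    )
    content = "INDEX,FOLLOW" if on_straight_path else "NOINDEX"
    return '<meta name="robots" content="%s">' % content

# robots_meta(11, 10) -> INDEX,FOLLOW (part of the straight path)
# robots_meta(16, 15) -> NOINDEX      (overlaps with other versions)
```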

I don’t know whether this method can help with shops using descriptions pulled from a vendor’s data feed, but I doubt it. If Google can determine and suppress partial dupes within a site, it can do that with text snippets from other sites too. One question remains: how does Google identify the source?


Information is Temporarily Unavailable

I’ve added my favorite feeds to my personalized Google home page. Unfortunately, “Information is temporarily unavailable” is the most repeated sentence on this page. It takes up to 20 minutes before all feeds are fetched and shown with current headlines. That’s weird, because Googlebot pulls known feeds every 15 minutes, often a few times per second.

It seems to me that Google does not cache (all, if any) RSS feeds. When I refresh the page while watching my crawler monitor screen, I can see Googlebot quickly fetching the ‘unavailable’ feeds from my server. After a while they get updated on the page.

Some feeds can't be added at all. For example Google's own news feed 'http://news.google.com/news?hl=en&ned=us&q=Google+Sitemaps&ie=UTF-8&output=rss&scoring=d' doesn't show, not even after dozens of clicks on 'Go' during the last week. The same happens every once in a while with MyYahoo too, by the way. Microsoft's sandbox even likes to crash my browser, which I consider the usual behavior of MS products, sandboxed or not.


Matt Cutts Slashdotted?

Matt Cutts starting to blog is great news. The bad news is that, shortly after the first announcement, the blogosphere and search-related sites seem to generate so much traffic that his blog is currently unreachable. If Google is not willing to host his blog, I'd like to be the first one to donate towards suitable hosting - Matt Cutts' advice is worth a reasonable donation.

[Update] No slashdotting involved. Matt posts:

The site was down for a few hours today. I had visions of hordes of overenthusiastic SEOs, but my webhost said it was nothing to do with my site specifically–they said their server crashed “catastrophically.” I wanted to ask if they were sure the server wasn’t allergic to me, but they seemed rather busy.


Good News from Google

Google is always good for a bit of news: since yesterday, news queries are available as RSS feeds. That's good news, although Google shoves outdated HTML (font tags and the like) into the item descriptions. It's good practice to separate content from its presentation, and hard-coded background colors in combination with foreign CSS can screw up a page, so webmasters must extract the text content if they want to make use of Google's news feeds.
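For what it's worth, here is a rough sketch of such an extraction in Python, using only the standard library. The feed URL is the Google News query quoted in the post above; the helper names are mine:

```python
# Sketch: pull a Google News RSS feed and keep only the text content of
# each item description, dropping font tags and any other inline markup.

import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the text content of an HTML snippet, dropping all tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        self.chunks.append(data)
    def text(self):
        return " ".join(" ".join(self.chunks).split())

def plain_text_items(feed_url):
    """Yield (title, plain-text description) for each item in an RSS feed."""
    with urllib.request.urlopen(feed_url) as response:
        tree = ET.parse(response)
    for item in tree.iter("item"):
        title = item.findtext("title", default="")
        extractor = TextExtractor()
        extractor.feed(item.findtext("description", default=""))
        yield title, extractor.text()

if __name__ == "__main__":
    url = ("http://news.google.com/news?hl=en&ned=us&q=Google+Sitemaps"
           "&ie=UTF-8&output=rss&scoring=d")
    for title, text in plain_text_items(url):
        print(title, "-", text[:80])
```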

As for Google and RSS: to curb Ms. Googlebot's greed for harvested feeds, Google needs to install a ping service. Currently Ms. Googlebot requests feeds way too often, because she spiders them based on guesses and time schedules (one or more fetches every 15 minutes). From my wish list: http://feeds.google.com/ping?feedURI, usable for submissions and for pings on updates.

Google already makes use of ping technology in its sitemaps program, so a ping server shouldn't be a big issue. Apropos sitemaps: the Google Sitemaps team has launched Inside Google Sitemaps. While I'm bashing Google anyway, here is a quote from the welcome post (tip: a prominent home link on every page wouldn't hurt, especially since the title is linked to google.com instead of the blog):

When you submit your Sitemap, you help us learn more about the contents of your site. Participation in this program will not affect your pages’ rankings or cause your pages to be removed from our index.

That's not always true. Googlebot discovering a whole site will find a lot of stuff that is relevant for rankings, for example the anchor text of internal links on formerly unknown pages, and this may improve a site's overall search engine visibility. On the other hand, sitemap-based junk submissions can easily tank a site on the SERPs.

Last but not least, Google has improved its wildcard search and can now tell us what SEO is all about *. Compare the search result to Google's official SEO page and wonder.


The Power of Search Related Blogs

Aaron Wall posted a great call to help stop the war in northern Uganda. He argues that if search blogs can get the #1 spot for a missing member of the community, the same can be done to draw attention to a war where children are abused as cannon fodder. Visit the Uganda Conflict Action Network for more information, because you won't find it at CNN or elsewhere.

Aaron’s call for action:

If you do not like the idea of children being abducted, murdered, and living in constant fear, please help. A few options:


Google Has Something Cooking

Covered by a smoke screen, Google performs an internal and top secret summer of coding. Charlie Ayers, Google's famous chef, has decided to leave the GooglePlex. Larry & Sergey say that hungry engineers work harder on algo tweaks and celestial projects as well. While SEOs and webmasters are speculating on the strange behavior of the saucy Googlebot sisters, haggard engineers are cooking their secret sauce in the labs. Under those circumstances, some collateral damage is pre-programmed, but hungry engineers don't care about a few stinkin' directories they blow away by accident. Shit happens, don't worry, failure is automated by Google. Seriously, wait for some exciting improvements in the googlesphere.


Fresh Content is King

Old news:
A bunch of unique content and a high update frequency increase search engine traffic.
Quite new:
Leading crawlers to fresh content becomes super important.
Future news:
Dynamic Web sites optimized to ping SE crawlers outrank established sites across the board.

Established methods and tools to support search engine crawlers are clever internal linkage, sitemap networks, 'What's new' pages, inbound links from highly ranked and frequently changed pages, etc. To a limited degree they still lead crawlers to fresh content and to not yet spidered old content. Time to crawl and time to index are unsatisfying, because the whole system is based on pulling and depends on the search engine backend system's ability to guess.

Looking back and forward at Google: Google News, Froogle, Sitemaps and rumors about blog search indicate a change from progressive pulling of mass data to proactive and event-driven picking of fewer, fresher data. Google will never stop crawling based on guessing, but it has learned how to localize fresh content in no time by making use of submissions and pings.

Blog search engines more or less perfectly fulfil the demand for popular fresh content. The blogosphere pings blog search engines, which is why they are so up to date. The blogosphere is huge and the amount of blog posts is enormous, but it is just a tiny part of the Web. Even more fresh content is still published elsewhere, and elsewhere is the playground of the major search engines, not even touched by blog search engines.

Google wants to dominate search, and currently it does. Google cannot ignore the demand for fresh and popular content, and Google cannot lower the relevancy of search results. Will Google's future search results be ranked by some sort of 'recent relevancy' algos? I guess not in general, but 'recent relevancy' is not an oxymoron, because Google can learn to determine the type of the requested information and deliver more recent or more relevant results depending on the query context and tracked user behavior. I'm speculating here, but it is plausible, and Google has already developed all the components necessary to assemble such an algo.

Based on the speculation above, investments in RSS technology and the like should be a wise business decision. If 'ranking by recent relevancy' or something similar comes true, dynamic Web sites with the bigger toolset will often outrank the established but more statically organized sources of information.


Bait Googlebot With RSS Feeds

Seeing Ms. Googlebot’s sister running wild on RSS feeds, I’m going to assume that RSS feeds may become a valuable tool to support Google’s fresh and deep crawls. Test it for yourself:

Create an RSS feed with a few unlinked or seldom spidered pages which are not included in your XML sitemap. Add the feed to your personalized Google Home Page ('Add Content' -> 'Create Section' -> enter the feed URL). Track spider accesses to the feed and to the included pages as well. Most probably Googlebot will request your feed more often than Yahoo's FeedSeeker and similar bots. Chances are that Googlebot-Mozilla is nosy enough to crawl at least some of the pages linked in the feed.
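If you want a quick way to do the tracking part, a log-scanning sketch like this one works. It assumes the combined log format; the log path, feed path and page paths below are made up, so substitute your own:

```python
# Count Googlebot requests for the bait feed and the pages it links to
# by scanning a web server access log (combined log format).

import re
from collections import Counter

LOG_FILE = "/var/log/apache2/access.log"       # placeholder path
WATCHED_PATHS = {"/feeds/test-feed.xml",        # the bait feed (placeholder)
                 "/widgets/rare-page-1.html",    # pages only linked from the feed
                 "/widgets/rare-page-2.html"}

# "GET /path HTTP/1.1" ... "user agent" at the end of each combined-log line
LINE_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*".*"(?P<agent>[^"]*)"\s*$')

hits = Counter()
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.search(line)
        if not match:
            continue
        path = match.group("path").split("?")[0]
        agent = match.group("agent")
        if path in WATCHED_PATHS and "Googlebot" in agent:
            hits[(path, agent)] += 1

for (path, agent), count in hits.most_common():
    print(f"{count:5d}  {path}  ({agent})")
```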

That does not help a lot with regard to indexing and ranking, but it seems to be a neat procedure to help the Googlebot sisters spot fresh content. In real life, add the pages to your XML sitemap, link to them and acquire inbound links…

To test the waters, I've added RSS generation to my Simple Google Sitemaps Generator. This tool reads a plain page list from a text file, and generates a dynamic XML sitemap, an RSS 2.0 site feed and a hierarchical HTML site map.
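This is not the actual generator, just a bare-bones sketch of the RSS part under the same assumptions: a plain text file with one URL per line goes in, an RSS 2.0 feed comes out. The file name and channel details are placeholders:

```python
# Sketch: read a plain list of URLs from a text file and emit an RSS 2.0 feed.
# PAGE_LIST and the channel details are placeholders, not the real tool's settings.

from email.utils import formatdate          # RFC 822 dates, as RSS 2.0 wants them
from xml.sax.saxutils import escape         # escape &, <, > in URLs and titles

PAGE_LIST = "pages.txt"                     # one absolute URL per line
CHANNEL_TITLE = "Example site feed"
CHANNEL_LINK = "http://www.example.com/"
CHANNEL_DESC = "Fresh and seldom crawled pages"

def rss_feed(urls):
    """Build a minimal RSS 2.0 document listing the given URLs as items."""
    now = formatdate()
    items = []
    for url in urls:
        safe = escape(url)
        items.append(
            "    <item>\n"
            "      <title>%s</title>\n"
            "      <link>%s</link>\n"
            "      <guid>%s</guid>\n"
            "      <pubDate>%s</pubDate>\n"
            "    </item>" % (safe, safe, safe, now)
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<rss version="2.0">\n'
        "  <channel>\n"
        "    <title>%s</title>\n"
        "    <link>%s</link>\n"
        "    <description>%s</description>\n"
        % (escape(CHANNEL_TITLE), escape(CHANNEL_LINK), escape(CHANNEL_DESC))
        + "\n".join(items) + "\n"
        "  </channel>\n</rss>\n"
    )

if __name__ == "__main__":
    with open(PAGE_LIST, encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    print(rss_feed(urls))
```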

Related article on Google’s RSS endeavors: Why Google is an RSS laggard

