Search Quality

Archived posts from the 'Search Quality' Category

Another way to implement a site search facility

Posted on 12 June, 2007

Providing kick ass navigation and product search is the key to success for e-commerce sites. Conversion rates depend heavily on user friendly UIs which enable the shopper to find the desired product with a context sensitive search in combination with a few drill-down clicks on navigational links. Unfortunately, the built-in search as well as the navigation and site structure of most shopping carts simply suck. Every online store is different, hence findability must be customizable and very flexible.

I’ve seen online shops crawling their product pages with a 3rd party search engine script because the shopping cart’s search functionality was totally and utterly useless. Others put fantastic effort into self-made search facilities which perfectly implement real life relations beyond the limitations of the e-commerce software’s data model, but need code tweaks for each and every featured product, special, virtual shop assembling a particular niche from several product lines, or whatever. Bugger.

Today I stumbled upon a very interesting approach which could become the holy grail for store owners suffering from crappy software. Progress invited me to discuss a product they’ve bought recently -EasyAsk- from a search geek’s perspective. Long story short, I was impressed. Without digging deep into the technology or reviewing implementations for weaknesses I think the idea behind that tool is promising.

Unfortunately, the EasyAsk Web site doesn’t provide solid technical and architectural information (I admit that I may have missed the tidbits within the promotional chatter), hence I’ll try to explain it from what I’ve gathered today. Progress EasyAsk is a natural language interface connecting users to data sources. Users are shoppers and staff. Data sources are (relational) databases, or data access layers (that is, a logical tier providing a standardized interface to different data pools like all sorts of databases, (Web) services, an enterprise service bus, flat files, XML documents and whatever).

The shopper can submit natural language queries like “yellow XS tops under 30 bucks”. The SRP (search results page) lists tops and similar garments under $30.00 in size XS, illustrated with thumbnails of yellow tops and bustiers, linked to the product pages. If yellow tops in XS are sold out, EasyAsk recommends beige tops instead of delivering a sorry-page. The search is context sensitive: when a query is submitted from a page listing suits, a search for “black leather belts” lists black leather belts for men. If the result set is too large to fit on one page, EasyAsk delivers drill-down lists of tags, categories and synonyms until the result set is viewable on one page. The context (category/tag tree) changes with each click and can be visualized, for example, as a breadcrumb navigation link.
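
To make the idea more tangible, here is a minimal Python sketch of how a free-text query could be mapped onto structured filters, with a color fallback when the first choice is sold out. This is not EasyAsk’s implementation (I haven’t seen their code); the vocabularies, the `Query` fields, the product dictionaries and the `fallback_colors` mapping are all made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical vocabularies; a real shop would derive these from its catalog.
COLORS = {"yellow", "beige", "black"}
SIZES = {"xs", "s", "m", "l", "xl"}
CATEGORIES = {"top": "tops", "tops": "tops", "belt": "belts", "belts": "belts"}


@dataclass
class Query:
    category: Optional[str] = None
    color: Optional[str] = None
    size: Optional[str] = None
    max_price: Optional[float] = None


def parse_query(text: str) -> Query:
    """Map a free-text query onto structured filters (toy rules only)."""
    lowered = text.lower()
    query = Query()
    # Recognizes "under 30 bucks", "under $30" and "under 30.00".
    price = re.search(r"under\s+\$?(\d+(?:\.\d+)?)", lowered)
    if price:
        query.max_price = float(price.group(1))
    for word in re.findall(r"[a-z]+", lowered):
        if word in COLORS:
            query.color = word
        elif word in SIZES:
            query.size = word.upper()
        elif word in CATEGORIES:
            query.category = CATEGORIES[word]
    return query


def search(products: List[dict], query: Query,
           fallback_colors: Optional[Dict[str, str]] = None) -> List[dict]:
    """Return in-stock matches; if there are none, retry with a substitute color."""

    def matches(product: dict, color: Optional[str]) -> bool:
        return (
            product["in_stock"]
            and (query.category is None or product["category"] == query.category)
            and (color is None or product["color"] == color)
            and (query.size is None or product["size"] == query.size)
            and (query.max_price is None or product["price"] < query.max_price)
        )

    hits = [p for p in products if matches(p, query.color)]
    if hits or not fallback_colors:
        return hits
    substitute = fallback_colors.get(query.color)
    if substitute is None:
        return []
    return [p for p in products if matches(p, substitute)]
```

Calling `search(products, parse_query("yellow XS tops under 30 bucks"), {"yellow": "beige"})` would return beige XS tops below 30.00 when no yellow ones are in stock, which is the “recommend instead of sorry-page” behavior described above.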

Technically speaking, EasyAsk does not deal with the content presentation layer itself. It returns XML which can be used to create a completely new page with a POST/GET request, or it gets invoked as an AJAX request whose response just alters DOM objects to visualize the search results (way faster but not exactly search engine friendly - that’s not a big deal because SERPs shouldn’t be crawlable at all). Performance is not an issue from what I’ve seen. EasyAsk caches everything so that the server doesn’t need to bother the hard disk. All points of failure (WRT performance issues) belong to the implementation, thus developing a well thought out software architecture is a must.

Well, that’s neat, but where’s the USP? EasyAsk comes with a natural language (search) driven admin interface too. That means that product managers can define and retrieve everything (attributes, synonyms, relations, specials, price ranges, groupings …) using natural language. “Gimme gross sales of leather belts for men II/2007 compared to 2006” delivers a statistic and “top is a synonym for bustier and the other way round” creates a relation. The admin interface runs in the Web browser, definitions can be submitted via forms and all admin functions come with previews. Really neat. That reduces the workload of the IT dept. WRT ad-hoc queries as well as lots of structural change requests, and saves maintenance costs (Web design / Web development).

I’ve spotted a few weak points, though. For example, in the current version the user has to type in SKUs because there’s no selection box. Meta data are stored in flat files, but that’s going to change too. There’s no real word stemming: EasyAsk handles singular/plural correctly and interprets “bigger” as “big” or, politically correct, “xx-large” as “plus”, but typos must be collected from the “searches without results” report and defined as synonyms. The visualization of concurrently or sequentially applied business rules is just rudimentary on preview pages in the admin interface, so currently it’s hard to track down why particular products get downranked or highlighted when more than one rule applies. Progress told me that they’ll make use of 3rd party tools as well as in house solutions to solve these issues in the near future - the integration of EasyAsk into the Progress landscape has just begun.

Defining the business language / expected terms used by consumers as well as business rules is painless. EasyAsk has built-in mappings like color codes to common color names and vice versa, understands terms like “best selling” and “overstock”, and these definitions are easy to extend to match actual data structures and niche specific everyday language.

Setting up the product needs consultancy (as a consultant I love that!). To get EasyAsk running it must understand the structure of the customer’s data sources, or rather the methods provided to fetch data from various structured as well as unstructured sources. Once that’s configured, EasyAsk pulls (database) updates on schedule (daily, hourly, every minute or whatever). It caches all information needed to fulfill search requests, but goes back to the data source to fetch real time data when the search query requires knowledge of not (yet) cached details. In the beginning such events must be dealt with, but after a (short) while EasyAsk should run smoothly without requiring much technical intervention (as a consultant I hate that, but the client’s IT department will love it).
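
The caching behavior described here resembles what is commonly called a read-through cache with scheduled refresh. The Python sketch below shows that general pattern only; it assumes nothing about EasyAsk’s internals, and the class name, the `fetch_one` / `fetch_all` callables and the injectable `clock` are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional


class ReadThroughCache:
    """Serve lookups from memory, reload everything on a schedule,
    and go back to the data source only for keys that are not cached yet."""

    def __init__(
        self,
        fetch_one: Callable[[str], Any],
        fetch_all: Callable[[], Dict[str, Any]],
        refresh_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_one = fetch_one
        self._fetch_all = fetch_all
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._data: Dict[str, Any] = {}
        self._loaded_at: Optional[float] = None

    def refresh(self) -> None:
        # Scheduled bulk update (daily, hourly, or whatever was configured).
        self._data = dict(self._fetch_all())
        self._loaded_at = self._clock()

    def get(self, key: str) -> Any:
        stale = (
            self._loaded_at is None
            or self._clock() - self._loaded_at >= self._refresh_seconds
        )
        if stale:
            self.refresh()
        if key not in self._data:
            # Cache miss: fetch the real time value from the source and keep it.
            self._data[key] = self._fetch_one(key)
        return self._data[key]
```

In this sketch the refresh is checked lazily on each lookup instead of by a background job, which keeps the example short; a production setup would more likely run the bulk update from a scheduler.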

Full disclosure: Progress didn’t pay me for that post. For attending the workshop I got two books (“Enterprise Service Bus” by David A. Chappell and “Getting Started with the SID” by John P. Reilly) and a free meal; travel expenses were not refunded. I did not test the software discussed myself (yet), so perhaps my statements (conclusions) are not accurate.

3 comments Sebastian | Site-Search, Progress EasyAsk, Search Quality, AJAX, E-Commerce

Google to kill the power of links

Posted on 8 June, 2007

Well, a few types of links will survive and won’t do evil in Google’s search index. I’ve updated my first take on Google’s updated guidelines stating that paid links and reciprocal links are evil. Regardless of whether one likes or dislikes this policy, it’s already factored in - case closed by Google. There are so many ways to generate natural links …

The official call for paid-link reports is pretty much disliked across the boards:
Google is Now The Morality Police on the Internet
Google’s Ideal Webmaster: Snitch, Rake It In And Don’t Deliver
Other sites can hurt your ranking
Google’s Updated Webmaster Guidelines Addresses Linking Practices
Google clarifies its stance on links

More information, and discussion of paid/exchanged links in my pamphlets:
Matt Cutts and Adam Lasnik define “paid link”
Where is the precise definition of a paid link?
Full disclosure of paid links
Revise your linkage
Link monkey business is not worth a whoop
Is buying and selling links risky? (02/2006)

Be the first to comment Sebastian | Reciprocal Links, Search Quality, Risky Linkage, Paid Links, Google, SEO, Nofollow

Danny Sullivan did not strip for Matt Cutts

Posted on 7 June, 2007

Nope, this is not recycled news. I’m not referring to Matt asking Danny to strip off his business suit, although the video is really funny. I want to comment on something Matt didn’t say recently, but promised to do soon (again).

Danny Sullivan stripped perfectly legit code from Search Engine Land because he was accused of being a spammer, although the CSS code in question is in no way deceitful.

StandardZilla slams poor Tamar, who was just reporting on a WebProWorld thread, but does an excellent job of explaining why image replacement is not search engine spam but a sound thing to do. Google’s recently updated guidelines need to state more clearly that optimizing for particular user agents is not considered deceitful cloaking per se. This would prevent Danny from stripping (code) not for Matt or Google but for lurid assclowns producing canards.

5 comments Sebastian | Webspam, Search Quality, Crap, Cloaking, SEO, Google

Google enhances the quality guidelines

Posted on 5 June, 2007

Maybe today’s update of Google’s quality guidelines is the first phase of the Webmaster help system revamp project. I know there’s more to come; Google has great plans for the help center. So don’t miss out on the opportunity to tell Google’s Webmaster Central team what you’d like to have added or changed. Only 14 replies to this call for input is evidence of incapacity, shame on the Webmaster community.

I haven’t had the time to write a full-blown review of the updates, so here are just a few remarks from a Webmaster’s perspective. Scroll down to Quality guidelines - specific guidelines to view the updates, that is, click the links to the new (sometimes overlapping) detail pages.

As always, the guidelines outline best practices of Web development, refer to common sense, and don’t encourage over-interpretations (not that those are avoidable, nor utterly useless). Providing Webmasters with more explanatory directives, detailed definitions and even examples in the “Don’ts” section is very much appreciated. Look at the first version of this document, which is over five years old, before you bitch.

Avoid hidden text or hidden links
The new help page on hidden text and links is descriptive and comes with examples, well done. What I miss is a hint with regard to CSS menus and other content which is hidden until the user performs a particular action. Google states “Text (such as excessive keywords) can be hidden in several ways, including […] Using CSS to hide text”. The same goes for links by the way. I wish they would add something along the lines of “… Using CSS to hide text in a way that a user can’t make it visible by a common action like moving the mouse over a pointer to a hidden element, or clicking a text link or descriptive widget or icon”. The hint at the bottom “If you do find hidden text or links on your site, either remove them or, if they are relevant for your site’s visitors, make them easily viewable” comes close to this but lacks an example.

Susan Moskwa from Google clarified what one can hide with CSS, and what sort of CSS-hidden stuff is considered a violation of the guidelines, in the Google forum on 11 June, 2007:

If your intent in hiding text is to deceive the search engines, we frown on that; if your intent is purely to improve the visual user experience (e.g. by replacing some text with a fancier image of that same text), you don’t need to worry. Of course, as with many techniques, there are shades of gray between “this is clearly deceptive and wrong” and “this is perfectly acceptable”. Matt [Cutts] did say that hiding text moves you a step further towards the gray area. But if you’re running a perfectly legitimate site, you don’t need to worry about it. If, on the other hand, your site already exhibits a bunch of other semi-shady techniques, hidden text starts to look like one more item on that list. […] As the Guidelines say, focus on intent. If you’re using CSS techniques purely to improve your users’ experience and/or accessibility, you shouldn’t need to worry. One good way to keep it on the up-and-up (if you’re replacing text w/ images) is to make sure the text you’re hiding is being replaced by an image with the exact same text.

Don’t use cloaking or sneaky redirects
This sentence in bold red blinking uppercase letters should be pinned 5 pixels below the heading: “When examining […] your site to ensure your site adheres to our guidelines, consider the intent” (emphasis mine). There are so many perfectly legit ways to do the content presentation that it is impossible to assign particular techniques to good versus bad intent, or vice versa.

I think this page leads to misinterpretations. The major point of confusion is that Google argues completely from a search engine’s perspective and doesn’t write for the targeted audience, that is Webmasters and Web developers. Instead of all the talk about users vs. search engines, it should distinguish plain user agents (crawlers, text browsers, JavaScript disabled …) from enhanced user agents (JS/AJAX enabled, installed and activated plug-ins …). Don’t get me wrong, this page gives the right advice, but the good advice is somewhat obfuscated in phrases like “Rather, you should consider visitors to your site who are unable to view these elements as well”.

For example “Serving a page of HTML text to search engines, while showing a page of images or Flash to users [is considered deceptive cloaking]” puts down a gazillion legit sites which serve the same contents in different formats (and often under different URLs) depending on the ability of the current user agent to render particular stuff like Flash, and a bazillion perfectly legit AJAX driven sites which provide crawlers and text browsers with a somewhat static structure of HTML pages, too.

“Serving different content to search engines than to users [is considered deceptive cloaking]” puts it better, because in reverse that reads “Feel free to serve identical contents under different URLs and in different formats to users and search engines. Just make sure that you accurately detect the capabilities of the user agent before you decide to alter a requested plain HTML page into a fancy conglomerate of flashing widgets with sound and other good vibrations, or vice versa”.
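
A small Python sketch of what “identical contents in different formats” can look like in code. It is only an illustration of the idea, not a recipe blessed by Google: the `Capabilities` flags, the `render_product` function and the markup are assumptions, and how the capabilities get detected is deliberately left out.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Capabilities:
    """What the current user agent can render (detection is out of scope here)."""
    javascript: bool = False
    flash: bool = False


def render_product(name: str, description: str, movie_url: str,
                   caps: Capabilities) -> str:
    # The same text goes into every variant; only the packaging differs.
    text = f"<h1>{escape(name)}</h1><p>{escape(description)}</p>"
    if caps.flash:
        # Enhanced agents get the movie, with the text kept as fallback content.
        movie = escape(movie_url)
        return (
            f'<object data="{movie}" type="application/x-shockwave-flash">'
            f"{text}</object>"
        )
    if caps.javascript:
        # Hypothetical hook that a script could use to enhance the page.
        return f'<div class="product" data-enhance="true">{text}</div>'
    # Crawlers, text browsers and script-less agents get plain HTML.
    return f'<div class="product">{text}</div>'
```

The point of the sketch is that the decision depends on what the user agent can render, not on whether it is a search engine, and that the text content stays the same in every branch.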

Don’t send automated queries to Google
This page doesn’t provide much more information than the paragraph on the main page, but there’s not that much to explain: don’t use WebPosition Gold™. Period.

Don’t load pages with irrelevant keywords
Tells why keyword stuffing is not a bright idea, nothing to note.

Don’t create multiple pages, subdomains, or domains with substantially duplicate content
This detail page is a must read. It starts with a to-the-point definition, “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar”, followed by a ton of good tips and valuable information. And fortunately it expresses that there’s no such thing as a general duplicate content penalty.

Don’t create pages that install viruses, trojans, or other badware
Describes Google’s service in partnership with StopBADware.org, highlighting the quickest procedure to get Google’s malware warning removed.

Avoid “doorway” pages created just for search engines, or other “cookie cutter” approaches such as affiliate programs with little or no original content
The info on doorway pages is just a paragraph on the “cloaking and sneaky redirect” page. I miss a few tips on how one can identify unintentional doorway pages created by just bad design, without any deceptive intent. Also, I think a few sentences on thin SERP-like pages would be helpful in this context.

“Little or no original content” targets thin affiliate sites, again doorway pages, auto-generated content, and scraped content. It becomes clear that Google does not love MFA sites.

If your site participates in an affiliate program, make sure that your site adds value. Provide unique and relevant content that gives users a reason to visit your site first
The link points to the “Little or no original content” page mentioned above.

Don’t trade, exchange, swap or purchase links for ranking purposes
“Buying links in order to improve a site’s ranking is in violation of Google’s webmaster guidelines and can negatively impact a site’s ranking in search results. […] Google works hard to ensure that it fully discounts links intended to manipulate search engine results, such link exchanges and purchased links.”

Basically that means: if you purchase a link, then make dead sure it’s castrated, or Google will take away the ability to pass link love from the page (or even site) linking out for green. Or don’t get caught, or denounced by competitors (I doubt that’s a surefire tactic for the average Webmaster).

Note that in the second sentence quoted above Google states officially that link exchanges for the sole purpose of manipulating search engines are a waste of time and resources. That means reciprocal links of particular types nullify each other, and site links might have lost their power too. <speculation>Google may find it funny to increase the toolbar PageRank of pages involved in all sorts of link swap campaigns, but the real PageRank will remain untouched.</speculation>

There’s much confusion with regard to “paid link penalties”. To the best of my knowledge the link’s destination will not be penalized, but the paid link(s) will not (or no longer) increase its reputation, so if the link’s intent gets reported or discovered after the fact, its rankings may suffer. Penalizing the link buyer would not make much sense, and Googlers are known as pragmatic folks, hence I doubt there is such a penalty. <speculation>Possibly Google has a flag applied to known link purchasers (sites as well as webmasters), which -if it exists- might result in more scrupulous judgements of other optimization techniques.</speculation>

What I really like is that the Googlers in charge honestly tried to write for their audience, that is Webmasters and Web developers, not (only) search geeks. Hence the news is that Google really cares. Since the revamp is a funded project, I guess the few paragraphs where the guidelines are still mysterious (for the great unwashed), or even potentially misleading, will get an update soon. I can’t wait for the next phase of this project.

Vanessa Fox creates buzz at SMX today, so I’ll update this post when (if?) she blogs about the updates later on (update: Vanessa’s post). Perhaps Matt Cutts will comment on the updated quality guidelines at the SMX conference today; look for Barry’s writeup at Search Engine Land, and SEO Roundtable as well as the Bruce Clay blog for coverage of the SMX Penalty Box Summit. Marketing Pilgrim covered this session too. This post at Search Engine Journal provides related info, and more quotes from Matt. Just one SMX tidbit: according to Matt they’re going to change the name of the re-inclusion request to something like a reconsideration request.