Keyword Density: More Than Meets the Eye
By Ralph Tegtmeier

One of the standard elements of web page optimization is Keyword Density: up until very recently the ratio of keywords to rest of body text was generally deemed to be one of the most important factors employed by search engines to determine a web site's ranking.

However, this basically linear approach is gradually changing now. As mathematical linguistics and automatic content recognition technology progresses, the major search engines are shifting their focus towards "theme" biased algorithms that do not rely on analysis of individual web pages anymore but, rather, will evaluate whole web sites to determine their topical focus or "theme" and its relevance in relation to users' search requests.

This is not to say that keyword density is losing in importance, quite the contrary. However, it is turning into a lot more complex technology than a simple computation of word frequency per web page can handle.

Context analysis now draws on a number of auxiliary linguistic disciplines and technologies, for example:

* semantic text analysis
* lexical database technology
* distribution analysis of lexical components (such as nouns, adjectives, verbs)
* evaluation of distance between semantic elements
* AI and data mining technology based pattern recognition
* term vector database technology, etc.
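To make the last item a little more concrete, here is a minimal sketch of the "term vector" idea: documents and queries are reduced to word-count vectors, and relevance is scored by the cosine of the angle between them. This is an illustrative toy, not any particular engine's actual algorithm, and the sample texts are invented.

```python
# Toy "term vector" relevance scoring: texts become word-count vectors,
# relevance is their cosine similarity.
from collections import Counter
from math import sqrt

def term_vector(text):
    # Lowercase and split on whitespace -- a deliberately naive tokenizer.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc = term_vector("keyword density keyword ranking search engine ranking")
query = term_vector("keyword ranking")
print(round(cosine_similarity(doc, query), 3))
```

Note that the score depends on the whole distribution of terms in the document, not on the raw frequency of any single keyword - which is precisely why a per-page density percentage captures only a fraction of the picture.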

All these are now contributing to the increasing sophistication of the relevance determination process. If you feel this is beginning to sound too much like rocket science for comfort, you may not be very far from the truth. It seems that the future of search engine optimization will be determined by what the industry is fond of calling the "word gurus".

A sound knowledge of fundamental linguistic methodology, plus more than a mere smattering of statistical calculus, will most probably be paramount to achieving successful search engine rankings in the foreseeable future. Merely repeating the well-worn mantra "content is king!", as some of the lesser qualified SEO professionals and a great many amateurs are currently doing, may admittedly have a welcome sedative effect by creating a feeling of fuzzy warmth and comfort. For all practical purposes, however, it is tantamount to whistling in the dark and fails miserably to do justice to the overall complexity of the process involved.

It should be noted that we are talking present AND future here: many of the classical techniques of search engine optimization are still working more or less successfully, but there is little doubt that they are rapidly losing their cutting edge and will probably be as obsolete in a few months' time as spamdexing or invisible text - both optimization techniques well worth their while throughout the 90s - have become today.

So where does keyword density come into this equation? And how is it determined anyway?

There's the rub: the term "keyword density" is by no means as objective and clear-cut as many people (some SEO experts included) would have it! The reason for this is the inherent structure of hypertext markup language (HTML) code. As text content elements are embedded in clear text command tags governing display and layout, it is not easy to determine what should or should not be factored into any keyword density calculus.

The matter is complicated further by the fact that the meta tags inside an HTML page's header may contain keywords and description content: should these be added to the total word count or not? Seeing that some search engines will ignore meta tags altogether (e.g. Lycos, Excite and Fast/Alltheweb), whereas others are still considering them (at least partially), it gets even more confusing. What may qualify for a keyword density of 2% under one frame of reference (e.g. including meta tags, graphics ALT tags, comment tags, etc.) may easily be reduced to 1% or less under another.
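The frame-of-reference problem is easy to demonstrate with a few lines of code. The page content below is an invented example; the point is simply that the same keyword yields a different density figure depending on whether meta-tag text is counted into the total.

```python
# Same page, two frames of reference: density over body text only,
# versus density over body text plus meta description text.
body_words = ("cheap widgets for sale widgets shipped worldwide "
              "order widgets today from our widget store").split()
meta_words = "quality products great prices fast shipping friendly service".split()

def density(keyword, words):
    # Keyword density as a percentage of the total word count.
    return 100.0 * words.count(keyword) / len(words)

print(round(density("widgets", body_words), 1))               # body text only
print(round(density("widgets", body_words + meta_words), 1))  # meta text included
```

Here the density drops from roughly 21% to under 14% simply because the meta description enlarges the word count without repeating the keyword - the same mechanism that can turn one engine's 2% into another engine's 1%.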

Further questions arise. Will meta tags following the Dublin Core convention ("DC tags") be counted in or not? And what about HTTP-EQUIV tags? Would you really bet the ranch that TITLE tags in tables, forms or DIV elements will be ignored? Etc., etc.

Another fundamental factor generating massive fuzziness left, right and center is the issue of semantic delimiters. What's a "word" and what isn't? Determining a lexical unit (aka a "word") by punctuation is a common though pretty low-tech method which may lead to some rather unexpected results.

Say you are featuring an article by an author named "John Doe" who happens to sport a master's degree in arts, commonly abbreviated as "M.A.". While most algorithms will correctly count "John" and "Doe" as separate words, the "M.A." string is quite another story. Some algorithms will actually count this as two words ("M" and "A") because the period (dot) is considered a delimiter - whereas others (surprise!) will not. But how would you know which search engines are handling it in which way? Answer: you don't, and that's exactly where the problems start.
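The "M.A." problem can be reproduced with two hypothetical tokenizers - no search engine's actual code, just two plausible delimiter policies side by side:

```python
import re

text = "John Doe, M.A., wrote this article."

def tokens_punct(s):
    # Policy 1: any run of non-letter characters delimits a word,
    # so the periods inside "M.A." split it into "M" and "A".
    return [t for t in re.split(r"[^A-Za-z]+", s) if t]

def tokens_space(s):
    # Policy 2: split on whitespace only, then strip trailing punctuation,
    # so "M.A.," survives as the single token "M.A".
    return [t.strip(".,;:!?") for t in s.split()]

print(tokens_punct(text))
print(tokens_space(text))
```

Policy 1 yields seven "words" for this sentence, policy 2 only six - and since keyword density divides by the total word count, the two policies disagree about every density figure on the page, not just about the abbreviation itself.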

The only feasible approach to master this predicament is trial and error. The typical beginner's inquiry "What's the best keyword density for AltaVista?", understandable and basically rational as it may be, is best answered with the fairly frustrating but ultimately precise: "It all depends - your mileage may vary." It is only by experimenting with keyword densities under standardized, comparable conditions yourself that you will be able to come to significant and viable conclusions.

To get going, here are some links to pertinent programs that will help you determine (and, in one case, even generate) keyword densities.

KeyWord Density Analyzer (KDA)
An all-time classic of client-based keyword density software is Roberto Grassi's powerful KeyWord Density Analyzer (KDA). It is immensely configurable and offers a fully featured free evaluation version for download. Find it here. (Expect to pay appr. $99 for the registered version.)

Concordance
Concordance is a powerful client-based text analysis tool for making word lists and concordances from electronic texts. A trial version can be downloaded here. (Expect to pay appr. $89 for the registered version.)

fantomas keyMixer(TM)
Our own fantomas keyMixer(TM) is the world's first automatic keyword density generator, enabling you to create web pages with ultra-precise densities to the first decimal place. Read more about this server-based Perl/CGI application by clicking on the above link. (Expect to pay appr. $99 for the registered version.)

About The Author
Ralph Tegtmeier is the co-founder and principal of Ltd. (UK) and GmbH (Belgium), a company specializing in webmaster software development, industrial-strength cloaking and search engine positioning services. He has been a web marketer since 1994 and is editor-in-chief of fantomNews, a free newsletter focusing on search engine optimization. You can contact him at