What is Latent Semantic Indexing?
March 21, 2008
When the first search engines appeared on the web, many webmasters were able to fool their algorithms by stuffing pages with whatever keywords they wanted to rank for, until LSI became part of those algorithms. Now you can’t game them like this. For example, suppose your site is about furniture and you want to rank and pull in traffic even from people searching for web hosting. Before LSI you could fool the search engines by cramming “webhosting” keywords into one article with h1, h2, h3 and bold tags, and like magic you ranked very well. These days the bots and algorithms are so smart that you sometimes think the bot is half human.
LSI allows search engines to determine the relevance of your site. That’s why, when you write an article such as 31 Killer Headlines Adwords tips, people can find it in Google with keywords like: Adwords tips, writing adwords ads, good headlines, tips about headlines… Why? Because the article itself teaches you how to write Adwords headlines, which means you can also write headlines for other advertising or for topics on your blog, not just Adwords headlines. Think about it: if those tips work on Adwords and people click, they can also work for a blog post title or even on Digg. Did you think about that? The search engine algorithm did, with the help of LSI.
LSI works differently from how things were done, let’s say, before Google. LSI looks not only at keywords but at the content of your whole site, not just a single page or article. You may rank well for the first days or even hours, but once LSI reaches the end of its processing steps, your article drops for the keyword it previously ranked for. LSI determines whether the site contains other semantically similar keywords. This means the algorithms can check whether what you are writing fits your site’s theme or whether you just want higher rankings for something unrelated. This method also lets Google, or any search engine using LSI, see whether the content is duplicated.
If your site and articles stay close to your site’s theme, you will rank very high, and even if the search engines change their algorithms you will not be slapped. Another note: Google Adwords uses this system to decide your Quality Score. If your landing page or advertised site runs Adsense, they use their Adsense data to determine whether your site is being advertised under the right “product” or, let’s say again, theme.
Examples of Latent Semantic Indexing
Example 1:
- In an AP news wire database, a search for Saddam Hussein returns articles on the Gulf War, UN sanctions, the oil embargo, and documents on Iraq that do not contain the Iraqi president’s name at all.
- Looking for articles about Tiger Woods in the same database brings up many stories about the golfer, followed by articles about major golf tournaments that don’t mention his name. Constraining the search to days when no articles were written about Tiger Woods still brings up stories about golf tournaments and well-known players.
- In an image database that uses LSI indexing, a search on Normandy invasion shows images of the Bayeux tapestry - the famous tapestry depicting the Norman invasion of England in 1066, the town of Bayeux, followed by photographs of the Allied invasion of Normandy in 1944.
In all these cases LSI is ‘smart’ enough to see that Saddam Hussein is somehow closely related to Iraq and the Gulf War, that Tiger Woods plays golf, and that Bayeux has close semantic ties to invasions and England. As we will see in our exposition, all of these apparently intelligent connections are artifacts of word use patterns that already exist in our document collection.
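To make the "word use patterns" idea concrete, here is a minimal sketch of LSI-style retrieval in plain Python. It is only an illustration under stated assumptions: the five-document corpus is made up (it is not the AP database), the helper names (`top_eigenpairs`, `lsi_scores`) are hypothetical, and the SVD is approximated with simple power iteration rather than a real linear algebra library. Nothing here claims to be what Google or any search engine actually runs.

```python
import math
from collections import Counter

# Hypothetical toy corpus (an assumption, not the AP database described above).
DOCS = [
    "saddam hussein iraq gulf war",
    "iraq gulf war sanctions",
    "iraq oil embargo sanctions",   # never mentions "saddam"
    "tiger woods golf tournament",
    "golf tournament players",
]
BAGS = [Counter(doc.split()) for doc in DOCS]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def bag_dot(a, b):
    """Dot product of two bag-of-words Counters."""
    return sum(count * b[term] for term, count in a.items())


def top_eigenpairs(matrix, k, iterations=500):
    """Top-k eigenpairs of a symmetric matrix via power iteration and deflation."""
    m = [row[:] for row in matrix]
    size = len(m)
    pairs = []
    for _ in range(k):
        v = normalize([1.0] * size)
        for _ in range(iterations):
            v = normalize([dot(row, v) for row in m])
        lam = dot(v, [dot(row, v) for row in m])
        pairs.append((lam, v))
        # Remove the component just found so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(size)] for i in range(size)]
    return pairs


def lsi_scores(query, k=2):
    """Cosine similarity between the query and every document in a k-dim latent space."""
    # Document-by-document Gram matrix A^T A, where A is the term-document matrix.
    # Its eigenvectors are the right singular vectors of A, and its eigenvalues
    # are the squared singular values.
    gram = [[bag_dot(a, b) for b in BAGS] for a in BAGS]
    pairs = top_eigenpairs(gram, k)
    # q^T A: plain keyword overlap between the query and each document.
    overlap = [float(bag_dot(Counter(query.split()), bag)) for bag in BAGS]
    # Fold the query into the latent space: q_hat_i = (q^T A v_i) / s_i^2.
    q_hat = [dot(overlap, vec) / lam for lam, vec in pairs]
    return [cosine(q_hat, [vec[j] for _, vec in pairs]) for j in range(len(DOCS))]


if __name__ == "__main__":
    for doc, score in zip(DOCS, lsi_scores("saddam")):
        print(f"{score:6.3f}  {doc}")
```

With this toy corpus, the query saddam scores close to 1 for all three Iraq documents, including the third one that never contains the word, and close to 0 for the golf documents. A plain keyword match would only have found the first document.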
Example 2:
Source: Search Engine Watch
“Latent Semantic Indexing is often misunderstood in its true purpose. (It is based on the vector space model of document classification.) Fundamentally, it operates at some level in a ranking algorithm to help alleviate issues with ranking pages purely by text pattern matching, by adding context. Using statistical analysis, LSI can discover that documents have words which are often used in the same context. For example, “apple” and “computer” will also have “Mac OS” and are therefore also relevant. The same thing applies with “windows” as an operating system as opposed to an invention for looking through walls. It’s all about trying to understand more about the nature and intent of the user query and returning information in context with the user’s search, even when they give a little clue as to the actual nature of the search. Incidentally, LSI is used by other search engines besides Google.”
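The "words which are often used in the same context" part of the quote can be sketched with the plain vector space model, before any LSI step is applied. In the sketch below each word becomes a vector of per-document counts, and two words count as related when those vectors point in a similar direction. The corpus and the function names (`term_vector`, `related_terms`) are hypothetical and chosen only for illustration; a real LSI system would also reduce the dimensions with an SVD.

```python
import math
from collections import Counter

# Hypothetical toy corpus mixing the two senses of "apple" (an assumption).
CORPUS = [
    "apple computer mac os",
    "apple mac laptop",
    "computer mac os software",
    "apple pie fruit recipe",
    "fruit orchard apple tree",
]
COUNTS = [Counter(doc.split()) for doc in CORPUS]


def term_vector(term):
    """One count per document: how often the term appears in each of them."""
    return [counts[term] for counts in COUNTS]


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / denom if denom else 0.0


def related_terms(term, top_n=3):
    """Other words ranked by how similar their document usage is to `term`."""
    vocab = sorted({w for doc in CORPUS for w in doc.split()} - {term})
    target = term_vector(term)
    scored = [(cosine(target, term_vector(w)), w) for w in vocab]
    # Highest similarity first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


if __name__ == "__main__":
    for score, word in related_terms("mac"):
        print(f"{score:.3f}  {word}")
```

In this corpus "mac" comes out closest to "computer" and "os", and has zero similarity to "fruit", because "mac" and "fruit" never share a document.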
How do you find out which words Google treats as related?
- Search Google with a tilde (~) in front of the keyword, for example: ~apple
You will notice that Apple also matches Mac, G4 and more. For another example, ~domain names returns results such as: Domains, Domain Names, web hosting, register domain names, etc.
- Kwbrowse is another tool where you can find similar keywords. I searched again for Apple.
- Also use WordNet: WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet’s structure makes it a useful tool for computational linguistics and natural language processing.
Some resources to learn more about LSI:
- Scientific papers on search engine development
- LSI-Related Publications (Articles, Reports, and Books)
- The searchers’ Library will fill the reading wants of any keen, seeking-minded searcher
- Patterns in Unstructured Data: Discovery, Aggregation, and Visualization: a great article, highly recommended.
- Readings in Latent Semantic Analysis for Cognitive Science and Education
March 23rd, 2008 at 5:37 pm
Awesome post. How do you find time to make great articles like these, on top of your Money Making endeavours?
I think LSI is one of those things that you need to be aware of, but you don’t need to overly concern yourself with it.
If your content is grammatically sound from the start, and your subject is properly defined, then all the LSI goodness should be taken care of. (you can always use LSI principles to optimize your content a little bit though…).