Semantic similarity measures between words play an important role in relation extraction, community mining, document clustering, and automatic metadata extraction. To decide how semantically similar two words are, a computer would need to understand their meaning. Because a computer is a syntactic machine, it cannot understand semantics directly, so the semantics must be approximated through syntactic representations. Various methods have been proposed for finding the semantic similarity between words. Some rely on precompiled resources such as WordNet and the Brown Corpus; others are based on web search engines. This paper describes methods based on web search engines. The proposed method is an empirical approach that estimates the semantic similarity of two words using the page counts and text snippets retrieved from a web search engine. Specifically, it computes various word co-occurrence measures from page counts and integrates them with a feature vector of lexical patterns extracted from the text snippets. The proposed method is expected to outperform various baselines and previously proposed web-based semantic similarity measures on benchmark data sets, showing a high correlation with human ratings.
Semantic Similarity, WordNet, Brown Corpus, Web Search Engine
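
As an illustration of the page-count co-occurrence measures mentioned in the abstract, the sketch below computes four commonly used ones (Jaccard, overlap, Dice, and pointwise mutual information) from search-engine hit counts. The abstract does not name the specific measures, so this choice is an assumption, as are the function names, the zero-count handling, and the index size `n`. The page counts are passed in as plain numbers; no real search API is called.

```python
import math

# Assumed inputs (all hypothetical):
#   p  = page count for the query "P"
#   q  = page count for the query "Q"
#   pq = page count for the conjunctive query "P AND Q"
#   n  = assumed number of pages indexed by the search engine

def web_jaccard(p, q, pq):
    """Jaccard coefficient: pages with both words / pages with either word."""
    denom = p + q - pq
    return pq / denom if denom > 0 else 0.0

def web_overlap(p, q, pq):
    """Overlap coefficient: co-occurrence count relative to the rarer word."""
    smaller = min(p, q)
    return pq / smaller if smaller > 0 else 0.0

def web_dice(p, q, pq):
    """Dice coefficient: twice the co-occurrence count over the summed counts."""
    total = p + q
    return 2 * pq / total if total > 0 else 0.0

def web_pmi(p, q, pq, n):
    """Pointwise mutual information (base 2) estimated from page counts."""
    if p == 0 or q == 0 or pq == 0:
        return 0.0
    return math.log2((pq / n) / ((p / n) * (q / n)))

def cooccurrence_features(p, q, pq, n):
    """Collect the four scores into one list, e.g. to append to a
    lexical-pattern feature vector built from snippets."""
    return [
        web_jaccard(p, q, pq),
        web_overlap(p, q, pq),
        web_dice(p, q, pq),
        web_pmi(p, q, pq, n),
    ]
```

For example, `cooccurrence_features(100, 50, 25, 1000)` uses made-up counts for two words and their conjunction. In a setting like the one the abstract describes, these scores would be combined with pattern-based features rather than used on their own.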