Text Mining Using Markov Chains of Variable Length

Authors: Björn Hoffmeister and Thomas Zeugmann

Source: Federation over the Web: International Workshop, Dagstuhl Castle, Germany, May 1-6, 2005. Revised Selected Papers (Klaus P. Jantke, Aran Lunzer, Nicolas Spyratos, Yuzuru Tanaka, Eds.), Lecture Notes in Artificial Intelligence 3847, pp. 1-24, Springer 2006.

Abstract. When dealing with knowledge federation over text documents, one has to determine whether or not documents are related by context. A new approach is proposed to solve this problem.

This leads to the design of a new search engine for literature research and related problems. The idea is that one already has some documents of interest. These documents are taken as input. Then all documents known to a classical search engine are ranked according to their relevance to the input documents. To achieve this goal, we use Markov chains of variable length.
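The ranking idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm; it assumes one simple way of using a variable-length Markov chain: word contexts of up to `max_order` words are counted in the documents of interest, and each candidate document is scored by its average per-word log-probability, always using the longest context that was seen in training. The function names (`train`, `score`, `rank`), the whitespace tokenizer, and the add-one smoothing are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()


def train(seed_docs, max_order=2):
    """Count which word follows each context of length 0..max_order."""
    counts = defaultdict(Counter)
    vocab = set()
    for doc in seed_docs:
        tokens = tokenize(doc)
        vocab.update(tokens)
        for i, tok in enumerate(tokens):
            for k in range(max_order + 1):
                if i - k < 0:
                    break
                context = tuple(tokens[i - k:i])
                counts[context][tok] += 1
    return counts, vocab


def score(doc, counts, vocab, max_order=2):
    """Average log-probability of the words in doc under the trained counts."""
    tokens = tokenize(doc)
    if not tokens:
        return float("-inf")
    v = len(vocab) + 1  # one extra slot stands in for all unseen words
    total = 0.0
    for i, tok in enumerate(tokens):
        # Variable length: fall back from the longest context to shorter
        # ones until a context that occurred in training is found.
        for k in range(min(max_order, i), -1, -1):
            context = tuple(tokens[i - k:i])
            if context in counts:
                break
        followers = counts[context]
        # Add-one smoothing (an assumption, chosen for simplicity).
        prob = (followers[tok] + 1) / (sum(followers.values()) + v)
        total += math.log(prob)
    return total / len(tokens)


def rank(candidates, seed_docs, max_order=2):
    """Sort candidate documents from most to least similar to the seeds."""
    counts, vocab = train(seed_docs, max_order)
    return sorted(
        candidates,
        key=lambda d: score(d, counts, vocab, max_order),
        reverse=True,
    )


if __name__ == "__main__":
    seeds = [
        "markov chains model text sequences",
        "text mining with markov chains",
    ]
    candidates = [
        "stock prices fell sharply today",
        "markov chains for text mining",
    ]
    for doc in rank(candidates, seeds):
        print(doc)
```

Averaging over the number of words keeps long documents from being penalized merely for their length; a real system would also need a more careful tokenizer and smoothing scheme than this sketch uses.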

The algorithms developed have been implemented and tested on the Reuters-21578 data set.


©Copyright 2006, Springer