TCS-TR-A-05-2
Date: Thu Feb 17 18:41:45 2005
Title: Reputation Extraction Using Both Structural and Content Information
Authors: H. Hasegawa, M. Kudo and A. Nakamura
Contact:
Abstract. We propose a new method for extracting texts related to a given keyword from Web pages collected by a search engine. By combining structural pattern matching with text classification, the method automatically extracts such texts, for example reputations of a given restaurant, from Web pages on sites that are not fixed in advance, which conventional wrappers cannot do. In cross-validation experiments on extracting reputations of a given Ramen shop from Web pages collected by a search engine, our method achieved 79.3% precision and 56.6% recall when acceptable errors were allowed.

© Copyright 2005 Authors