Collection-based compound noun segmentation for Korean information retrieval
SCIE
SCOPUS
- Title
- Collection-based compound noun segmentation for Korean information retrieval
- Authors
- Kang, IS; Na, SH; Lee, JH
- Date Issued
- 2006-11
- Publisher
- SPRINGER
- Abstract
- Compound noun segmentation is a key first step in Korean language processing. Most approaches so far require some form of human supervision, such as pre-existing dictionaries, manually segmented compound nouns, or heuristic rules. As a result, they suffer from the unknown word problem, which unsupervised approaches can overcome. However, previous unsupervised methods usually do not consider all possible segmentation candidates, rely on character-based segmentation clues such as bi-grams or all-length n-grams, or both. They are therefore prone to falling into a local solution. To overcome this problem, this paper proposes an unsupervised segmentation algorithm that searches for the most likely segmentation among all possible segmentation candidates using a word-based segmentation context. To supply word-based segmentation clues, a dictionary is generated automatically from a corpus. Experiments on three test collections show that the algorithm can be applied successfully to Korean information retrieval, improving on a dictionary-based longest-matching algorithm.
- Keywords
- compound noun segmentation; unsupervised method; Korean information retrieval; TEXT
- URI
- https://oasis.postech.ac.kr/handle/2014.oak/23823
- DOI
- 10.1007/s10791-006-9007-3
- ISSN
- 1386-4564
- Article Type
- Article
- Citation
- INFORMATION RETRIEVAL, vol. 9, no. 5, pp. 613-631, 2006-11
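The abstract describes searching all segmentation candidates of a compound noun with a dictionary derived from the collection itself. The following is only a minimal sketch of that general idea, not the authors' algorithm: it assumes a simple unigram model in which each dictionary word is scored by its relative frequency in the corpus, and it uses dynamic programming so that every candidate segmentation is considered implicitly. The names `build_dictionary`, `segment`, and `unknown_logp`, as well as the toy corpus, are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def build_dictionary(corpus_tokens):
    """Assumption: word probability = relative frequency of the token in the collection."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def segment(compound, word_prob, unknown_logp=-20.0):
    """Return the highest-scoring segmentation of `compound`.

    best[i] holds the best log-probability of any segmentation of compound[:i],
    so all candidate segmentations are covered without enumerating them.
    Single characters missing from the dictionary get a fixed penalty
    (unknown_logp, an illustrative value) so a result always exists.
    """
    n = len(compound)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end):
            piece = compound[start:end]
            if piece in word_prob:
                score = math.log(word_prob[piece])
            elif end - start == 1:
                score = unknown_logp
            else:
                continue  # multi-character piece not in the dictionary
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow the back-pointers from the end of the string to recover the pieces.
    pieces = []
    i = n
    while i > 0:
        pieces.append(compound[back[i]:i])
        i = back[i]
    return pieces[::-1]


# Toy collection: "information", "retrieval", "system", "information", "retrieval"
toy_dictionary = build_dictionary(["정보", "검색", "시스템", "정보", "검색"])
print(segment("정보검색시스템", toy_dictionary))  # ['정보', '검색', '시스템']
```

Unlike greedy longest matching, which commits to the longest dictionary entry at each position, this kind of search compares complete segmentations against each other. The scoring model used in the paper itself may differ substantially from the unigram assumption made here.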