Clustering Based on Sentence Structure and Words in Statistical Machine Translation
- Clustering by sentence type or by document genre is a technique for improving the translation quality of statistical machine translation (SMT) through domain-specific translation. However, no previous research has used sentence type and document genre information simultaneously. In this paper, we propose an integrated clustering method that classifies sentence type by syntactic structure similarity and document genre by word similarity. Both similarities are computed with the cosine measure and interpolated into a single score, which the K-means algorithm then uses to cluster the training corpus. We interpolate the domain-specific models built from the resulting clusters with general models to improve the translation quality of the SMT system. On a Japanese-English patent translation corpus, this approach achieved a 14% relative improvement in translation quality over the previous approach.
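The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how sentences are turned into vectors, how the two similarities are weighted, or how K-means is initialized. The sketch assumes each sentence is already represented by a syntactic-feature vector and a word-feature vector, uses a hypothetical interpolation weight `weight`, and keeps one centroid per view. All function and variable names are illustrative.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def interpolated_similarity(item, centroid, weight):
    """Linear interpolation of syntactic and word cosine similarities.

    `weight` is a hypothetical tuning parameter; the abstract gives no value.
    """
    syntax_vec, word_vec = item
    syntax_centroid, word_centroid = centroid
    return (weight * cosine(syntax_vec, syntax_centroid)
            + (1.0 - weight) * cosine(word_vec, word_centroid))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster_sentences(items, k, weight=0.5, iterations=20, seed=0):
    """K-means-style clustering driven by the interpolated similarity.

    Each item is a (syntax_vector, word_vector) pair. Returns one cluster
    label per item.
    """
    rng = random.Random(seed)
    centroids = rng.sample(items, k)  # assumption: random initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: pick the most similar centroid for each item.
        new_labels = [
            max(range(k),
                key=lambda c: interpolated_similarity(item, centroids[c], weight))
            for item in items
        ]
        if new_labels == labels:
            break  # assignments are stable, so the procedure has converged
        labels = new_labels
        # Update step: recompute each view's centroid from the cluster members.
        for c in range(k):
            members = [item for item, label in zip(items, labels) if label == c]
            if members:
                centroids[c] = (mean_vector([m[0] for m in members]),
                                mean_vector([m[1] for m in members]))
    return labels


# Toy data: two sentences of one type and genre, two of another.
toy_corpus = [
    ([1.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.9, 0.1], [0.8, 0.2, 0.0]),
    ([0.0, 1.0], [0.0, 0.0, 1.0]),
    ([0.1, 0.9], [0.0, 0.1, 0.9]),
]
labels = cluster_sentences(toy_corpus, k=2)
print(labels)
```

In a pipeline like the one the abstract outlines, each resulting cluster would supply the training data for one domain-specific model, which is then interpolated with the general model.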