An Efficient Graph-Based Methodology for Translating Emerging Named Entities
- Named entities (NEs) refer to a range of concepts such as person names, location names, organization names, and product names. As large quantities of new named entities (or emerging named entities) appear every day in newspapers, web sites, and TV programs, NE analysis is becoming increasingly important in the data mining and information retrieval communities. Information on NEs can be extracted from (a) structured sources such as databases and tables, (b) semi-structured sources such as knowledge bases (also called ontologies), or (c) unstructured sources such as text corpora. Among the many research topics related to NE analysis, such as ontology integration, named entity linking, and named entity translation, this dissertation addresses the problem of mining NE translations from comparable corpora, specifically English and Chinese NE translations. I observe that existing approaches use one or more of the following NE similarity metrics: entity name similarity, entity context similarity, and entity relationship similarity. Motivated by this observation, this dissertation proposes a new holistic approach that (1) combines all of these similarity types and (2) additionally considers a new similarity measure, relationship context similarity between pairs of NEs, which is a missing quadrant in the taxonomy of similarity metrics. I abstract the NE translation problem as the matching of two NE graphs extracted from the comparable corpora. Specifically, two monolingual NE graphs are first constructed from the comparable corpora to capture relationships between NEs. Entity name similarity and entity context similarity are then calculated for every pair of bilingual NEs to obtain an initial pairwise NE similarity. A reinforcing method is then applied to reflect relationship similarity and relationship context similarity between NEs.
I also discover corpus "latent" features that are lost in the graph extraction process and integrate them into the proposed framework, and I improve the relationship-based similarities by overcoming the asymmetry of comparable corpora and by considering other types of NEs. According to the experimental results, the proposed holistic graph-based approach and its enhancements are highly effective, and the proposed framework significantly outperforms previous state-of-the-art approaches.
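The graph-matching idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the function name `reinforce`, the mixing weight `alpha`, the update rule, and the toy graphs and scores are all assumptions chosen for illustration. It only shows how an initial pairwise similarity (standing in for name and context similarity) might be reinforced by the similarity of neighboring entities in two monolingual graphs.

```python
def reinforce(init_sim, en_graph, zh_graph, alpha=0.5, iterations=10):
    """Propagate similarity between two entity graphs (illustrative only).

    init_sim: dict mapping (english_entity, chinese_entity) -> initial score
    en_graph, zh_graph: dict mapping entity -> set of related entities
    alpha: hypothetical weight on the initial score versus neighbor support
    """
    sim = dict(init_sim)
    for _ in range(iterations):
        new_sim = {}
        for (e, z), base in init_sim.items():
            en_nb = en_graph.get(e, set())
            zh_nb = zh_graph.get(z, set())
            if en_nb and zh_nb:
                # For each English neighbor, take its best-matching
                # Chinese neighbor, then average over English neighbors.
                support = sum(
                    max(sim.get((a, b), 0.0) for b in zh_nb) for a in en_nb
                ) / len(en_nb)
            else:
                support = 0.0
            new_sim[(e, z)] = alpha * base + (1 - alpha) * support
        sim = new_sim
    return sim


# Toy monolingual graphs: an edge means the two entities co-occur.
en_graph = {"Jackie Chan": {"Hong Kong"}, "Hong Kong": {"Jackie Chan"}}
zh_graph = {"成龙": {"香港"}, "香港": {"成龙"}, "北京": set()}

# Made-up initial scores; the two candidates for "Jackie Chan" are tied.
init_sim = {
    ("Jackie Chan", "成龙"): 0.5,
    ("Jackie Chan", "北京"): 0.5,
    ("Hong Kong", "香港"): 0.9,
}

scores = reinforce(init_sim, en_graph, zh_graph)
best = max(["成龙", "北京"], key=lambda z: scores[("Jackie Chan", z)])
print(best)
```

In this toy setup the tie between the two candidates is broken by the relationship structure: the candidate whose neighbor matches a neighbor of the English entity gains support, while the isolated candidate does not.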