Entity Search and Mining Using Relational Matching
- As many entities such as people, locations, and organizations appear every day in multilingual news articles and on Web sites, providing users with information about them is becoming increasingly important. This dissertation addresses two topics: (1) named entity translation mining and (2) social entity search. In particular, we exploit entity-relationship graphs, in which each node represents an entity and each edge represents a relationship between two entities.
We first study the problem of mining named entity translations from corpora in two different languages, e.g., English and Chinese documents. This problem can be abstracted as finding correct mappings between two entity-relationship graphs, one extracted from each language corpus. While existing approaches suffer from inaccuracy and scarcity, we overcome these problems by leveraging abundant monolingual co-occurrences in the entity-relationship graphs as evidence for mapping. Because the graph mapping process is very time-consuming, specifically quadratic in the number of nodes or the number of edges, we propose a partitioning-based parallelization algorithm for multicore architectures, which outperforms a straightforward computation-based parallelization.
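The relational-matching idea can be illustrated with a minimal sketch. It assumes a simple similarity-propagation rule that is not necessarily the dissertation's formulation: the score of pairing English entity `e` with Chinese entity `c` mixes an initial (e.g., transliteration-based) score with how well the neighbors of `e` match the neighbors of `c`. Every iteration visits all (`e`, `c`) pairs, which is where the quadratic cost comes from. A simple row partitioning over a thread pool stands in for the partitioning-based scheme, whose details the abstract does not give. The functions `score_rows` and `relational_match`, the parameters `alpha`, `iters`, and `parts`, and the toy graphs are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def score_rows(rows, g_en, g_zh, init, prev, alpha):
    """Recompute scores for one partition of English nodes.

    Only reads `prev`, so partitions are independent within an iteration."""
    out = {}
    for e in rows:
        for c in g_zh:  # every (e, c) pair is visited: quadratic in node counts
            if g_en[e] and g_zh[c]:
                # Relational evidence: how well e's neighbors match c's neighbors.
                rel = sum(
                    max(prev.get((n, m), 0.0) for m in g_zh[c]) for n in g_en[e]
                ) / len(g_en[e])
            else:
                rel = 0.0
            out[(e, c)] = alpha * init.get((e, c), 0.0) + (1 - alpha) * rel
    return out


def relational_match(g_en, g_zh, init, alpha=0.5, iters=10, parts=2):
    """Map each English entity to its best-scoring Chinese entity."""
    scores = dict(init)
    nodes = sorted(g_en)
    chunks = [nodes[i::parts] for i in range(parts)]  # partition English nodes
    with ThreadPoolExecutor(max_workers=parts) as pool:
        for _ in range(iters):
            step = partial(score_rows, g_en=g_en, g_zh=g_zh, init=init,
                           prev=scores, alpha=alpha)
            new_scores = {}
            for part in pool.map(step, chunks):  # one task per partition
                new_scores.update(part)
            scores = new_scores
    return {e: max(g_zh, key=lambda c: scores.get((e, c), 0.0)) for e in g_en}


# Hypothetical toy co-occurrence graphs (node -> set of co-occurring entities).
EN_GRAPH = {
    "Obama": {"Biden", "White House"},
    "Biden": {"Obama", "White House"},
    "White House": {"Obama", "Biden"},
}
ZH_GRAPH = {
    "奥巴马": {"拜登", "白宫"},
    "拜登": {"奥巴马", "白宫"},
    "白宫": {"奥巴马", "拜登"},
}
# Initial evidence exists only for person names; "White House" has no
# transliteration score and must be resolved from its relationships.
INIT = {("Obama", "奥巴马"): 0.9, ("Biden", "拜登"): 0.8, ("Obama", "拜登"): 0.1}

print(relational_match(EN_GRAPH, ZH_GRAPH, INIT))
# → {'Obama': '奥巴马', 'Biden': '拜登', 'White House': '白宫'}
```

In CPython, threads do not speed up this pure-Python loop because of the global interpreter lock; the sketch only shows why partitions can run independently, and an actual multicore run would need processes or native code.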
We then study the problem of searching for social network entities, e.g., Twitter accounts, enriched with the information available on the Web, e.g., person names, attributes, and relationships to other people. For this purpose, we need to map such social network entities to Web entities. Existing solutions built on naive textual matching inevitably suffer from false positives (e.g., fake accounts), which lower precision, and false negatives (e.g., accounts using nicknames), which lower recall. We overcome these problems by leveraging “relational” evidence from the entity-relationship graph.
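A minimal sketch of combining textual and relational evidence follows. It assumes a linear mix of a name-similarity score and a relational score, with a hypothetical weight `beta` and acceptance threshold `tau`, a toy Web entity graph, and a seed set of accounts already mapped to Web entities (`seed_map`). The functions `name_similarity`, `relational_similarity`, and `match_account`, as well as all names and accounts, are illustrative and not the dissertation's method. Only `difflib.SequenceMatcher` is a real library call.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Naive textual evidence: character-level similarity of two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def relational_similarity(followees, related, seed_map):
    """Relational evidence: share of the account's already-mapped followees
    that are related to the candidate Web entity."""
    mapped = {seed_map[f] for f in followees if f in seed_map}
    if not mapped:
        return 0.0
    return len(mapped & related) / len(mapped)


def match_account(name, followees, web_graph, seed_map, beta=0.3, tau=0.5):
    """Return the best Web entity for an account, or None if evidence is weak."""
    def score(entity):
        return (beta * name_similarity(name, entity)
                + (1 - beta) * relational_similarity(
                    followees, web_graph[entity], seed_map))
    best = max(web_graph, key=score)
    return best if score(best) >= tau else None


# Hypothetical Web entity graph: entity -> set of related entities.
WEB_GRAPH = {
    "Alice Kim": {"Bob Lee", "Carol Park"},
    "Bob Lee": {"Alice Kim", "Dan Cho"},
    "Carol Park": {"Alice Kim"},
    "Dan Cho": {"Bob Lee"},
}
# Hypothetical seed mappings from accounts to Web entities.
SEED_MAP = {"@bob_lee": "Bob Lee", "@carolp": "Carol Park", "@dancho": "Dan Cho"}

# Nickname account whose followees are Alice Kim's known contacts.
print(match_account("Ally K", {"@bob_lee", "@carolp"}, WEB_GRAPH, SEED_MAP))
# → Alice Kim
# Look-alike account with an exact name but no ties to her contacts.
print(match_account("Alice Kim", {"@random1"}, WEB_GRAPH, SEED_MAP))
# → None
```

With textual evidence alone, the look-alike account would be a perfect name match for "Alice Kim"; the relational term is what lets the sketch reject it while still accepting the nickname account.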
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.