DC Field | Value | Language |
---|---|---|
dc.contributor.author | Mu-Woong Lee | en_US |
dc.date.accessioned | 2014-12-01T11:48:13Z | - |
dc.date.available | 2014-12-01T11:48:13Z | - |
dc.date.issued | 2012 | en_US |
dc.identifier.other | OAK-2014-01107 | en_US |
dc.identifier.uri | http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000001385016 | en_US |
dc.identifier.uri | https://oasis.postech.ac.kr/handle/2014.oak/1609 | - |
dc.description | Doctor | en_US |
dc.description.abstract | This research addresses the problem of supporting scalable code similarity search over large-scale software repositories. Commercial code search engines exist, but they treat software as text and often fail to find semantically related code. Meanwhile, existing tools for semantic code clone search take a “post-mortem” approach, detecting clones only after code development is completed, and hence cannot return results instantly. In contrast, the goal of this research is to combine the strengths of these two lines of existing research. To achieve this goal, an indexing structure on vector abstractions of code is proposed. This index uses dimension reduction techniques to deal efficiently with the vector abstractions, which are naturally high-dimensional. The search system is then integrated into real-world development sessions. Such integration suggests that, by posing every code segment as a query to the software code corpus, developers can instantly reference relevant code segments at the time the code is written, enhancing productivity. This integration scenario creates the need for efficient similarity searches with the following requirements. First, a developer session translates into a sequence of evolving queries that need to be efficiently supported. Second, the quality of the results needs to be controlled; e.g., dealing with licenses requires that there be no false negatives. To satisfy these requirements, a workload-aware striping framework for high-dimensional evolving queries is proposed. This framework can be used to boost most existing high-dimensional indexes. In addition, to further enhance the scalability of code search systems, a workload-balancing distributed indexing structure is proposed. The goal of existing efforts in distributed indexing has been to localize queries to data residing at a small number of nodes (i.e., locality-preserving indexing) to minimize communication cost. However, because workloads often correlate with data locality, such indexing often generates hotspots. Hence, workload balancing is proposed as an optimization goal, and a distributed index that evenly distributes the workload is presented. | en_US |
dc.language | eng | en_US |
dc.publisher | Pohang University of Science and Technology (포항공과대학교) | en_US |
dc.rights | BY_NC_ND | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/2.0/kr | en_US |
dc.title | Scalable High-dimensional Index Design for Code Search Systems | en_US |
dc.title.alternative | 코드검색시스템을 위한 고차원 색인기법 설계 (Design of High-dimensional Indexing Techniques for Code Search Systems) | en_US |
dc.type | Thesis | en_US |
dc.contributor.college | Graduate School, Division of Electrical and Computer Engineering (일반대학원 전자컴퓨터공학부) | en_US |
dc.date.degree | 2012-08 | en_US |
dc.type.docType | Thesis | - |