Open Access System for Information Sharing

Mutation profile for top-k search exploiting gene function relationship and matrix factorization

Authors
김성철
Date Issued
2015
Publisher
Pohang University of Science and Technology (POSTECH)
Abstract
Given a large quantity of genome mutation data collected from clinics, how can we search for similar patients? Similarity search based on patient mutation profiles can solve various translational bioinformatics tasks, including prognostics and treatment efficacy prediction, enabling better clinical decision making through sheer volume of data. However, this is a challenging problem because mutation data are heterogeneous, sparse, and high-dimensional. To tackle this problem, we suggest a compact representation and search strategy based on Gene Ontology (GO) and orthogonal non-negative matrix factorization (ONMF). The statistical significance of the relationships between the identified cancer subtypes and their clinical features is computed for validation; the results show that our method identifies and characterizes clinically meaningful tumor subtypes better than the recently introduced Network-Based Stratification (NBS) method while enabling real-time search. To the best of our knowledge, this is the first attempt to simultaneously characterize and represent somatic mutational data for efficient search purposes.
As a next step, to obtain a more accurate mutation profile for similarity search, we propose a new mutation profile called the Multi-Latent Semantic Analysis Mutation Profile (MLSA-MP). MLSA-MP is inspired by the fact that genes can have complex relationships within each gene set, where a gene set contains genes that are biologically related to each other. Accordingly, the same pair of patients can have different proximities depending on the gene set. To build MLSA-MP, given mutation data and a number of predefined gene sets, we first generate a collection of sub-profiles of the mutation data. For each sub-profile, a set of latent representations is constructed by repeatedly applying Latent Semantic Analysis (LSA). Finally, MLSA-MP is built by concatenating the latent representations.
According to the experimental results, MLSA-MP retrieves clinically similar patients more accurately than both NBS and ONMF-MP. In terms of the predictive power of the identified cancer subtypes, the comparison also shows that MLSA-MP identifies and characterizes clinically meaningful tumor subtypes better than both ONMF-MP and NBS.
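The MLSA-MP construction described in the abstract (sub-profiles per gene set, a latent representation per sub-profile, concatenation, then similarity search) might be sketched roughly as follows. This is a simplified illustration and not the thesis implementation: it applies LSA once per gene set as a truncated SVD rather than repeatedly, and the function names (`lsa_embed`, `build_profile`, `top_k`), the `rank` parameter, the cosine-similarity ranking, and the toy data are all assumptions made for the example.

```python
import numpy as np

def lsa_embed(sub_profile, rank):
    """LSA as a truncated SVD: map a patients x genes matrix to patients x rank."""
    u, s, _ = np.linalg.svd(sub_profile, full_matrices=False)
    r = min(rank, s.size)
    return u[:, :r] * s[:r]

def build_profile(mutations, gene_sets, rank=2):
    """Concatenate one latent block per gene set (simplified: one LSA pass each)."""
    blocks = [lsa_embed(mutations[:, genes], rank) for genes in gene_sets]
    return np.hstack(blocks)

def top_k(profile, query, k):
    """Indices of the k patients most cosine-similar to patient `query`."""
    norms = np.linalg.norm(profile, axis=1)
    norms[norms == 0] = 1.0          # avoid dividing by zero for empty profiles
    unit = profile / norms[:, None]
    sims = unit @ unit[query]
    sims[query] = -np.inf            # exclude the query patient itself
    return np.argsort(-sims)[:k]

# Hypothetical toy data: 6 patients x 6 genes, 1 = gene is mutated.
mutations = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
], dtype=float)
gene_sets = [[0, 1, 2], [3, 4, 5]]   # hypothetical gene sets as column indices

profile = build_profile(mutations, gene_sets, rank=2)
nearest = top_k(profile, query=0, k=1)
```

Because each gene set contributes its own block, two patients can be close within one gene set and distant within another, which is the property the abstract attributes to MLSA-MP. In this toy data, patients 0 and 1 carry identical mutations, so patient 1 is the nearest neighbor of patient 0.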
URI
http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002062204
http://oasis.postech.ac.kr/handle/2014.oak/93495
Article Type
Thesis
