Target Identification: A Challenging Step in Forward Chemical Genetics

Investigating gene function in complex biological systems has been a major challenge in recent years. Many research projects have therefore pursued novel ideas for uncovering the unknown functions of genes and proteins, and various approaches have been built to validate those ideas. To date, the most promising and commonly used approach for discovering target proteins in a biological system with small molecules is forward chemical genetics, which is considered more convenient than classical genetics. Although forward chemical genetics consists of three basic components, target identification is the most challenging step for chemical biology researchers. Diverse target identification methods have therefore been developed and adopted to reveal the proteins bound by small molecules. In this review, we briefly describe the first two components, the chemical toolbox and screening, and then describe target identification in forward chemical genetics thoroughly, with illustrative case studies. Biologically active small molecules that are successful examples of target identification are summarized in tabular form.



Introduction
The completion of the Human Genome Project 1 has led to the discovery of numerous novel genes. In most cases, however, their functions in dynamic, complex biological systems remain unknown. Illuminating the functions of these gene targets and working out the underlying mechanistic pathways are therefore poised to be the next challenging steps in the field of chemical genetics. Not surprisingly, the sheer amount of genetic information now available makes analysis a daunting task, a challenge currently being taken up by the field of functional genomics 2 . The significant gap left in functional genomics after the identification of new pathways 3,4 and networks can be filled by the emerging concept of chemical genetics 5 . Depending on the starting point of the investigation, chemical genetics can be divided into two approaches 6 : forward chemical genetics 7 and reverse chemical genetics 8 , which proceed "from effect to cause" and "from cause to effect", or "from phenotype to genotype" and "from genotype to phenotype" 9 , respectively. Screening small-molecule libraries 10,11 for compounds that generate a phenotype of interest is an example of the forward chemical genetic approach, whereas the reverse approach uses small molecules that target a single protein. In contrast to conventional genetics, where gene knock-outs 12 or overexpression 13 are carried out, forward chemical genetics exploits small molecules to generate new phenotypes, which are then used to explain gene function. Forward chemical genetics requires three basic components: a) a chemically diverse library of compounds; b) an assay in which the library is screened for a cellular 14 or organismal 15 phenotype; and c) a method to trace an active compound to its biological target, otherwise known as target identification. It is this third component, target identification, that remains one of the greatest challenges in chemical genetics.

Merit of Forward Chemical Genetics
In recent years, chemical genetics has grown in popularity because small molecules can modulate biological activity in reversible ways 16 . Although genetics can provide a better understanding of complex biological systems because it modulates biological activity with a high degree of specificity, it does have limitations. Directly perturbing genes by sensitive mutation often makes it complicated to identify the effect of modulating the gene products 17 . In genetics, gene delivery is a key problem because of limited cell permeability. Generating mutants one by one is also tedious and time-consuming. Gene mutation does not take post-translational modification or protein function into account. Moreover, with gene mutation it is not feasible to confirm a reliable drug target, and the overall process is slow. In most cases one protein has multiple functions, so modifying it (e.g. by knockdown) does not lead to the expected effects. Finally, other genetic methods, such as antisense technology 18 , mutagenesis or RNA interference (RNAi), act at the level of the genes and cause temporary or even permanent effects.
With these issues in mind, small-molecule-based chemical genetics, and forward chemical genetics in particular, has several advantages. For example, the biological effect of small molecules is typically rapid 19 , which allows their immediate effects to be characterized. Moreover, small molecules can be used to study dynamic processes in a conditional manner 20 : they can be applied at any time point in an experiment and over a range of concentrations to study critical genes at any developmental stage. Small molecules can also be used in multicellular organisms 21 to observe phenotypic changes in vivo. A successful forward chemical genetic study yields knowledge of a novel gene's function as well as a small molecule that has the potential to serve as an on/off switch for controlling biological processes 22 . Such small molecules can therefore be useful biological probes as well as potential new drug candidates.

Devices for Forward Chemical Genetics

1. Chemical toolbox
In the last few decades, various inventive chemical toolboxes have been developed for studying gene function in systems biology. The field of chemical genomics is rapidly expanding and evolving to facilitate the discovery of unknown gene functions with the aid of chemical toolboxes 23 . Inspired by many successful attempts, research groups are racing to uncover novel functions of gene networks by applying diverse chemical libraries. Identifying these new networks would significantly enhance the discovery of new drug molecules from this powerful chemical toolbox. This technology, known as "chemical genomics", not only bridges significant gaps in functional genomics but also has the potential to deliver novel drug-like small molecules. These small molecules, which can be found in large compound libraries from different sources, generally alter the function of their target proteins upon binding. The compounds are derived from natural plant 24 and animal sources and are subsequently extracted and synthesized into small molecules.

1a. The discovery of bioactive compounds
In systems biology, the aim is to choose suitable novel chemical entities that are capable of modulating biological function after binding to their target proteins. Unknown protein functions can be probed with diverse drug-like small molecules, which are available from commercial sources 25 such as ChemBridge Corporation, Maybridge Chemical, Thermo-Fisher Hit-Finder, ChemBridge MicroFormats, and the Spectrum Collection.

1b. Combinatorial libraries
Large numbers of chemical compounds can be synthesized in the laboratory from basic scaffolds belonging to different classes, such as heterocycles, natural products, oligosaccharides, and fluorochromes. Under the diversity-oriented design strategy 26 , large collections of structurally diverse and complex compounds are prepared through parallel and mixture synthesis of small molecules by combinatorial chemistry. The rationale is that a library sharing the same core scaffold, but with various diversity elements/branches attached directly around the core, may respond selectively to a broader range of analytes and thus has a greater likelihood of "hitting" the target 27 . Numerous methods are available for creating such diversity, including skeletal diversity 28,29 , stereochemical diversity, and molecular property diversity 30,31 .
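The idea of one core scaffold decorated with interchangeable diversity elements can be illustrated with a short enumeration. The following is only a minimal sketch: the scaffold name, the positions R1 to R3 and the substituent lists are hypothetical labels chosen for illustration, and the code counts combinations rather than modelling any real chemistry.

```python
from itertools import product

def enumerate_library(core, positions):
    """Return one compound label per combination of substituents.

    core      -- name of the shared scaffold (hypothetical label)
    positions -- dict mapping an attachment position to its candidate
                 diversity elements
    """
    names = sorted(positions)  # fixed order so the output is reproducible
    library = []
    for combo in product(*(positions[n] for n in names)):
        substituents = ", ".join(f"{n}={s}" for n, s in zip(names, combo))
        library.append(f"{core}[{substituents}]")
    return library

# Hypothetical example: one core with three variable positions.
positions = {
    "R1": ["methyl", "ethyl", "benzyl"],
    "R2": ["H", "Cl"],
    "R3": ["NH2", "OH", "OMe", "F"],
}
library = enumerate_library("purine", positions)
print(len(library))  # 3 * 2 * 4 combinations
print(library[0])
```

The library size is the product of the number of options at each position, which is why even a handful of building blocks per position quickly yields a large compound set.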

1c. Natural product-like libraries
Natural products are regarded as a vast source of chemical compounds by most industrial drug research organizations, where they are widely used, explored and modified to construct better derivatives 32 . Because natural products are biologically validated, many drug developers choose to generate libraries from natural product cores, which is a sensible and effective approach [33][34][35] .

1d. Heterocyclic libraries
Heterocyclic compounds are widespread in the protein networks of the cell: they make up many enzyme cofactors and substrates, and they are the main components of DNA in the form of the purine and pyrimidine bases. They are therefore highly desirable structures for developing new drug-like small molecules 36 . In heterocyclic chemistry, cycloaddition and multicomponent reactions (MCRs) are the main tools for constructing complex heterocyclic libraries 37 with different substitution patterns in diversity-oriented synthesis (DOS). Heterocyclic compounds contain more hydrogen bond acceptors and donors than typical compounds and in most cases achieve an appropriate balance of hydrophilic and lipophilic character, which favors bioavailability and membrane permeability.

1e. Oligosaccharides libraries
Oligosaccharides are the main components of glycoproteins and are known to form the extracellular segments of integral membrane proteins. They are attached to compatible amino acid side chains in proteins or to lipid moieties. The varied structures and distribution of these oligosaccharides are responsible for cell regulation and for several associated gene functions 38 . Diversity in the oligosaccharide portion increases the chance of interaction between the receptor and target analytes.

1f. Peptide libraries
The development of solid-phase synthesis by Merrifield in 1963 and the discovery of protection methods for different functional groups led to the rapid design of diverse peptide libraries. Generally, standard Fmoc-protected amino acids are used to build a peptide library within a very short time frame, either by modern microwave-assisted coupling with low-cost coupling reagents or on a more sophisticated automated peptide synthesizer. After the reactions are complete, the peptides are cleaved and purified by HPLC. The peptide library can be further functionalized through different linkers to incorporate fluorochromes for specific applications in in vitro cell analysis or in vivo experiments 39 .

High-Throughput Screenings
Phenotypic screening is one of the most vital steps in a forward chemical genetics study, and high-throughput screening is widely used to discover new biologically active compounds. Active compounds change the phenotype mainly by changing the function of the protein to which they bind. (High-throughput) screening with small molecules is carried out mostly in model organisms, mammalian cells or cell-free systems. The most popular model organisms reported for phenotypic screening are yeast, plants, zebrafish, Drosophila and C. elegans. Two types of yeast 40 (budding and fission yeast) are used. Yeast is suited to screening because it is easy to grow, shows high genetic conservation with humans, and has a known genome sequence. Plants 41 are also commonly used (e.g. Arabidopsis thaliana), as the entire genome has been sequenced and small molecules are readily taken up by plant roots. Drosophila 42 is sometimes used for screening because of its short life cycle, known genome sequence and amenability to RNAi. Zebrafish 43 are widely used because they are vertebrates (with a brain, heart and bones) and hence comparable to the human system. They are also prolific reproducers and are transparent, which makes phenotypic changes easy to visualize. The use of C. elegans 44 for screening is also commonly reported. Its short life span, small size, transparency, known genome sequence and amenability to RNAi make it a useful model organism.
Recent technological advances in liquid handling and robotics have also facilitated high-throughput screening of many individual compounds in a very short time. The screening is performed in living (mammalian) cells or in complex cellular extracts. The cell phenotype can be determined simply and quickly by the cytoblot method, a cell-based ELISA 45 . The compound-treated cells are fixed and stained with an antibody against an epitope that reports on the phenotype of interest. A secondary antibody conjugated to horseradish peroxidase is then added to detect the signal as luminescence.
"Screening by imaging" 46 has recently emerged as an advanced technique in which an automated microscope observes and records changes in cellular phenotype in response to compound addition. The screening is carried out in multiwell plates and the resulting data are analysed by software. Fluorescence spectroscopy 47 and transmitted light microscopy 48 are also routinely used for phenotypic screening.

Target Identification Approaches
Drug discovery research requires broad knowledge of disease-related proteins and their functions, and the forward chemical genetics approach contributes substantially to this field by identifying the target proteins to which small molecules bind. Once the proteins are identified, their functions and cellular signalling pathways can be elucidated, thereby facilitating drug discovery research. However, existing methods for target identification, such as the affinity matrix, have met with limited success. Many researchers encounter difficulty when investigating protein targets because the protein of interest is often expressed at levels too low for effective detection in biological systems. Low binding affinity for the small molecule and non-specific binding are further problems that plague target identification research. To circumvent these problems, researchers have devised advanced methods that use small molecules to identify and discover new protein targets.

Affinity matrix approaches
One of the most powerful techniques for identifying a target protein is the affinity matrix approach 26 . The affinity matrix can be prepared by immobilizing the hit compound on a bead or by attaching a tag to the molecule (photoaffinity, chemical affinity, biotin or fluorescence). In each case, the binding affinity of the proteins for the small molecule is exploited to find the target proteins 49 . After the small molecule has bound the proteins, the resulting complexes are fished out via the immobilized small molecule 50 and identified by gel separation (SDS-PAGE or fluorescence gel) followed by tandem mass spectrometry (MS/MS).

3.1a. On-bead affinity matrix
In this approach, the structure-activity relationship (SAR) of the small molecule of interest must first be studied to find a suitable position for linker modification. Second, a solid-phase matrix (agarose beads) is attached covalently to the small molecule at a specific site that does not affect its original activity 17 . The resin-bound small molecule is then exposed to cell extract to separate the target protein. SDS-PAGE gel electrophoresis is used to analyze the protein mixture retained on the column. The specific target proteins are distinguished by a competition assay, which helps to exclude non-specific binders. The isolated proteins are identified by mass spectrometry after partial tryptic digestion, with the digested peptides searched against a sequence database. The result is the discovery of the target proteins or genes.
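The identification step, tryptic digestion followed by a database search, can be sketched in a few lines. This is a simplified illustration under stated assumptions: it applies the commonly used trypsin rule (cleave after K or R unless the next residue is P), it compares peptide sequences directly instead of measured masses, and the function names and toy protein sequences are invented for the example. Real search engines score observed spectra against theoretical ones.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Cut after K or R unless the next residue is P (usual trypsin rule)."""
    sequence = sequence.upper()
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal peptide
    # Partial digestion: also join up to `missed_cleavages` neighbouring fragments.
    peptides = []
    for n in range(missed_cleavages + 1):
        for j in range(len(fragments) - n):
            peptides.append("".join(fragments[j:j + n + 1]))
    return peptides

def match_candidates(observed_peptides, database):
    """Rank database proteins by how many observed peptides they can produce."""
    observed = set(observed_peptides)
    scores = {}
    for name, seq in database.items():
        theoretical = set(tryptic_digest(seq, missed_cleavages=1))
        scores[name] = len(observed & theoretical)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy sequences, invented for illustration only.
database = {
    "protein_A": "MKTAYIAKQRPLSGRVE",
    "protein_B": "MSDNKLLRGGAKW",
}
peptides = tryptic_digest("MKTAYIAKQRPLSGRVE")
ranking = match_candidates(["TAYIAK", "QRPLSGR"], database)
print(peptides)
print(ranking)
```

The `missed_cleavages` parameter reflects the partial digestion mentioned in the text: incompletely cut proteins yield longer peptides that span one or more uncleaved sites.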

[Case study]
High-throughput screening of library molecules for brain/eye morphological changes in a zebrafish embryo assay revealed that encephalazine can inhibit brain/eye development when added at different time points (1-, 8-, and 1K-cell stages). After the SAR study, encephalazine was attached to agarose beads and used to find the target protein. An affinity-matrix pull-down experiment, followed by separation on 14% SDS-PAGE and silver staining, revealed two strong bands (23 and 18 kDa), which were identified as ribosomal proteins 49 (S5, S13, S18, and L28) and further confirmed by LC-MS/MS.

3.1b. Biotin tags in affinity matrix
The strategy of finding target proteins with a biotin-based affinity matrix is used regularly in chemical biology. After a rigorous SAR study, biologically active biotin-tagged small molecules are prepared and incubated with cell extracts containing the proteome 51 . The small molecule-biotin-protein complex is then fished out with avidin/streptavidin resin beads 52 , and the corresponding target protein (after cleavage from the resin) is identified by gel electrophoresis followed by mass spectrometry analysis.

[Case study]
Image-based high-throughput screening of ESCs (R1 cell line) identified the small molecule stauprimide, which interacts with the NME2 protein and inhibits its nuclear localization; as a result, the efficiency of ESC (human and mouse) differentiation increases. Stauprimide promotes ESC differentiation toward the definitive endoderm fate, which was confirmed by cell imaging with various markers (definitive endoderm specific, mesoderm specific, visceral/parietal endoderm specific) and by RT-PCR for markers of hepatocytes and pancreatic precursor cells (Afp, albumin, Cyp7A1, Pdx1, Ngn3). The results of both experiments indicate that stauprimide promotes ESC differentiation through the definitive endoderm. A biotin-tagged affinity approach to precipitate the target protein, followed by mass spectrometry, identified NME2 as a cellular target of stauprimide 53 . In vitro experiments and gene knockdown experiments (KD1 and KD2 sequences) further confirmed that NME2 is the cellular target of stauprimide.

3.1c. Fluorescent tags in affinity matrix
An affinity matrix with a fluorescent tag can sometimes be used to visualize a target protein quite easily. The approach is similar to the biotin tag approach. With a fluorescent tag, the proteins bound to the small molecule are detected by running fluorescence gels. The visualized fluorescent bands are excised and analyzed by mass spectrometry to identify the target proteins.

[Case study]
Cell-based screening (MDA-MB-231 human breast cancer cells) of a 50-membered natural product-like library revealed the compound MJE3, which inhibits breast cancer cell proliferation (IC50 of 19 µM). Its target was then sought by activity-based protein profiling (ABPP), which is similar to the affinity matrix approach and uses a probe comprising a reactive group, a binding group and an analytical tag. In this study a fluorescent tag was employed for successful identification of the target protein. After MJE3 was incubated with MDA-MB-231 cells, a click reaction was carried out between the MJE3-labeled protein and a trifunctional biotin/rhodamine-azide reporter tag, and the probe-labeled protein was separated by SDS-PAGE and visualized by in-gel fluorescence scanning 25 . The probe-labeled protein was subsequently purified on avidin-agarose beads. The 26-kDa gel band was excised and digested with trypsin, the resulting peptide mixture was analyzed, and an MS-based database search identified the 26-kDa protein as brain-type phosphoglycerate mutase 1 (PGAM1).

3.1d. Photoaffinity tags in affinity matrix
In this method, a photoaffinity moiety and a reporter tag are added to the initial molecule (hit compound). Upon UV irradiation the photoaffinity moiety becomes highly reactive, releasing a reactive carbene that attaches covalently to the specific target proteins. The bound protein can subsequently be fished out of the complex mixture in a biological cell assay 54 . The reporter tag is usually a radioactive isotope 55 or a chemical affinity group such as biotin, which allows isolation and identification 8 . Stable isotopes that give a unique isotopic pattern have mainly been used to identify the binding proteins selectively by mass spectrometry, even in very complex mixtures.

[Case study]
Image-based high-throughput phenotypic screening of neuropathiazol compounds identified several that induce neuronal differentiation of cultured rat hippocampal NPCs. Synthesis of several analogs of the original neuropathiazol structure and a focused structure-activity relationship (SAR) study afforded the molecule with the best neuronal differentiation activity, KHS101. RT-PCR and image-based experiments revealed that KHS101 treatment significantly suppresses astrogenesis while increasing neurogenesis. The protein-KHS101-BP complex was detected in NPC lysates after photocrosslinking and biotin-tag labeling, using two-dimensional SDS-PAGE and western blotting. Finally, mass spectrometry analysis identified the 80-kDa protein as TACC3. Gene knockdown experiments and in vivo imaging studies confirmed that TACC3 is the cellular target of KHS101 27,56 .

Drug western approaches
In the drug western method 57 , bacteriophages are grown in a Petri dish. Lysis caused by the viral infection produces clearings (plaques), each consisting of one member of the library. The proteins from the plaques are transferred to nitrocellulose and screened against the tagged small molecule. Plaques that give hits are isolated, each single virus is purified, and the target protein is identified by DNA sequencing.

[Case study]
Cell-based screening of sulfonamide drug molecules, followed by an SAR study, yielded the small molecule HMN-154, which showed potential anticancer activity against various cancer cell lines. HMN-154 conjugated to BSA was used to identify the target protein by the drug western method. BSA-conjugated HMN-154 was incubated with a nitrocellulose membrane carrying the proteins expressed from a λTriplEx cDNA library. The positive clone expressing the HMN-154-binding protein was detected 57 , and DNA sequencing identified the binding protein as the transcription factor subunit NF-YB. The identified protein was further confirmed by an in vitro inhibition assay and immunoprecipitation experiments.

Three-hybrid system approaches
So far, two types of three-hybrid system have been developed to identify target proteins. Yeast three-hybrid systems are run in yeast cells and mammalian three-hybrid systems in mammalian cells, but the two approaches are otherwise similar. Both systems were initially used to study protein-protein interactions, and their applications were later widened to small molecule-protein interactions.

3.3a. Yeast three-hybrid systems
The yeast three-hybrid (Y3H) system evolved from yeast two-hybrid screens and introduces a dimer of organic small molecules into the screen. It comprises three components: a synthetic hybrid ligand and two hybrid fusion proteins. The synthetic hybrid ligand is formed by covalently linking two small-molecule ligands (a hetero- or homodimer), each of which binds a cellular protein. One half of the hybrid ligand stays bound to one of the two fusion proteins, and the other half brings the second fusion protein into close proximity, reconstituting a functional transcription factor that drives expression of a reporter gene 58 . While the first ligand of the dimer binds the receptor fused to the DNA-binding domain, the second ligand binds the receptor fused to the transactivation domain, allowing selection of yeast cells that harbor the relevant receptors. The two ligand-receptor interactions can be exploited either by screening small-molecule dimers against a known activation-domain fusion or by using a known small molecule to identify target proteins, for example the dexamethasone-glucocorticoid receptor pair to find interacting proteins. An advantage of the Y3H system is that it is carried out in vivo, so phenotype and genotype are closely linked. On the other hand, it is restricted to a unicellular system.
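The logic of the three-hybrid readout can be written as a simple truth condition: the reporter is expressed only when the hybrid ligand bridges both fusion proteins. The sketch below is a toy model under stated assumptions. The dexamethasone-glucocorticoid receptor pair comes from the text, while `compound_X`, `protein_1` to `protein_3` and both function names are hypothetical placeholders.

```python
def reporter_active(hybrid_ligand, dbd_receptor, ad_protein, binds):
    """Reporter is on only if the hybrid ligand bridges both fusion proteins.

    hybrid_ligand -- (anchor, bait) pair, or None when no compound is added
    dbd_receptor  -- receptor fused to the DNA-binding domain
    ad_protein    -- library protein fused to the transactivation domain
    binds         -- set of (ligand, protein) pairs that interact
    """
    if hybrid_ligand is None:
        return False
    anchor, bait = hybrid_ligand
    return (anchor, dbd_receptor) in binds and (bait, ad_protein) in binds

def screen(library, hybrid_ligand, dbd_receptor, binds):
    """Keep clones whose reporter is on with the ligand and off without it."""
    return [
        protein for protein in library
        if reporter_active(hybrid_ligand, dbd_receptor, protein, binds)
        and not reporter_active(None, dbd_receptor, protein, binds)
    ]

# Known interactions in this toy model (compound_X and protein_1 are invented).
binds = {
    ("dexamethasone", "glucocorticoid receptor"),
    ("compound_X", "protein_1"),
}
cdna_library = ["protein_1", "protein_2", "protein_3"]
hits = screen(
    cdna_library,
    ("dexamethasone", "compound_X"),
    "glucocorticoid receptor",
    binds,
)
print(hits)
```

The second condition in `screen` mirrors the compound-dependence check used in practice: a clone that activates the reporter without the hybrid ligand is a false positive.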

[Case study]
Cell-based CDK (cyclin-dependent kinase) inhibition assays and a purine library screen, followed by an SAR study, led to a small molecule named Purvalanol B, which inhibits cyclin-dependent kinase activity in human leukemic cells in the nanomolar concentration range. To identify the target protein, a yeast three-hybrid cDNA library screen with Purvalanol B-MFC (a methotrexate fusion compound coupled to Purvalanol B) was carried out in the following steps. First, yeast cells expressing LexA-DHFR were transformed with the chosen cDNA library. Next, transcriptional expression of the HIS3 auxotrophic marker was selected for in the presence of the MFC. The positive colonies were then picked and arrayed robotically, and the compound dependence of HIS3 reporter expression was reconfirmed. Finally, plasmid isolation, sequencing, retransformation of LexA-DHFR-expressing yeast with the purified plasmids, robotic arraying of the transformed yeast cells, and reconfirmation of specific HIS3 reporter activation by the test MFC in a series of genetic or compound-based counter-screens identified the target proteins bound by the small molecule 58 (CDK1, CDK5, CDK6, CLK3, EPHB2, FLT4, FYN, PAK4, PCTK1, PCTK2, RSK3 and YES), many of which were further confirmed by affinity-matrix pull-down experiments and secondary enzyme assays.

3.3b. Mammalian three-hybrid systems
The mammalian three-hybrid system is similar to the yeast three-hybrid system, except that mammalian cells are used instead of yeast. S. Eyckerman et al. initially developed the MAPPIT 59 (mammalian protein-protein interaction trap) system to identify protein-protein interactions. C. Maureen et al. later expanded this concept into the MASPIT 60 (mammalian small molecule-protein interaction trap) system to investigate small molecule-protein interactions and identify targets.

[Case study]
Synthesis of several pyrido[2,3-d]pyrimidine small molecules, followed by kinase inhibition assays and SAR studies, revealed PD173955 as an SRC kinase inhibitor that can also inhibit several ephrin receptor tyrosine kinases. To identify the target protein, the MASPIT system was employed in mammalian cells, following steps adapted from those used in Y3H. First, a cDNA library was built from HEK293 mRNA in a retroviral vector. The IL5R reporter cells were then infected with the retroviral library and subjected to several cycles of enrichment for MFC-dependent IL5R-positive cells, followed by flow cytometric single-cell sorting into 96-well microtiter plates. Individual cell populations were subsequently screened for MFC- and Epo-dependent reporter activation by fluorescence-activated cell sorting (FACS). The cDNA library screen with RGB-286649 and with the MFC incorporating the ABL tyrosine kinase inhibitor PD173955 uncovered a number of tyrosine kinases as well as one Ser/Thr kinase (cyclin G-associated kinase, ephrin receptor tyrosine kinases, FGFR1 and the SRC kinases FYN and LYN) 60 . These proteins were further confirmed by an in vitro enzymatic assay (competition assay).

Phage display approaches
In recent years, phage display technology has emerged as a popular method that links proteins or polypeptides to their genes in bacteriophages in order to study protein-protein, protein-DNA and protein-peptide interactions. The technology displays proteins or peptides on bacteriophages such as filamentous (e.g. M13), T4 and T7 phage, and it allows proteins to be extracted from a large collection of variants by binding to immobilized DNA or proteins. The resulting phage particles, which contain the genes and display the encoded proteins, connect phenotype to genotype, so that large protein libraries can be screened and then amplified. During screening, phage-displayed proteins that bind the target are retained through the washing step, while non-interacting proteins are removed. The recovered proteins are identified by sequencing, and more phage can be produced by bacterial infection to enrich the interacting proteins for further rounds of selection 61 .
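Why repeated rounds of binding, washing and amplification are effective can be seen from a small numerical model. This is only an illustrative sketch: the retention probabilities and the starting binder frequency are hypothetical numbers, and amplification is assumed to rescale the pool without favouring any clone.

```python
def biopanning(initial_fraction, retain_binder, retain_background, rounds):
    """Binder fraction of the phage pool after each round of panning.

    Each round keeps a share of binders and of background phage through
    the wash; amplification then rescales the pool without changing the ratio.
    """
    fraction = initial_fraction
    history = [fraction]
    for _ in range(rounds):
        kept_binders = fraction * retain_binder
        kept_background = (1.0 - fraction) * retain_background
        fraction = kept_binders / (kept_binders + kept_background)
        history.append(fraction)
    return history

# Hypothetical numbers: 1 binder per million phage; 10% of binders and
# 0.1% of background phage survive each wash.
history = biopanning(1e-6, 0.10, 0.001, rounds=4)
for n, f in enumerate(history):
    print(f"round {n}: binder fraction = {f:.6f}")
```

In this model the odds of picking a binder are multiplied by the ratio of the two retention probabilities in every round, so a clone that starts as a rare member of the library can dominate the pool after a few rounds.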

[Case study]
Screening of a library of 1200 compounds revealed a candidate, IHY-153, that effectively inhibits the proliferation of several human cancer cell lines, including a cervical cancer line (HeLa), a liver carcinoma line (HepG2), a fibrosarcoma line (HT 1080) and a colon carcinoma line (HCT116). Of these, HCT116 colon cancer cells are the most sensitive to IHY-153. Flow cytometric analysis of HCT116 cell cycle progression indicated that IHY-153 inhibits the cell cycle in a dose-dependent manner and induces arrest at the G0-G1 phase. A bacteriophage display biopanning approach was used for target identification. T7 phage particles expressing human cDNA libraries were added to BH1-immobilized wells. After incubation and washing, bound phage particles were eluted with IHY-153, amplified by infection of E. coli strain BLT5615, and used for a second round of biopanning. After the fourth round of biopanning, the eluted phages were used to infect E. coli on LB agar and the resulting plaques were isolated. Amplified phage lysates from the isolated plaques served as PCR templates, and the sequences obtained were compared with a database. The phage-encoded region matched human calmodulin (CaM) with 100% identity. The specificity of IHY-153 for CaM was also tested with a phage binding assay and the phage library. The requirement for Ca2+ was tested as well, and the results demonstrate that Ca2+ is required for IHY-153 binding to CaM 61 .

mRNA display approaches
mRNA display is a well-known in vitro technique that has recently been used to study protein-protein interactions 62 . The technique was initially developed to increase the number of peptides 63 that can be displayed relative to the phage display method. After the cDNA library is amplified by PCR and transcribed, a puromycin-DNA linker is ligated to the resulting mRNA, and this mRNA-DNA complex 64 is used for in vitro translation. Translation generates mRNA-protein fusion molecules, which are then purified and reverse transcribed to generate the cDNA template used for further amplification. The small molecule of interest is then immobilized on a solid support and incubated with the mRNA display molecules. Unbound protein-mRNA complexes are washed away, bound complexes are eluted, and the bound genes are amplified by PCR. Finally, after several iterations, the cDNAs are purified, cloned, sequenced and subjected to a sequence similarity search to identify the target protein.

[Case study]
Tacrolimus (FK-506) 65 is a natural product that is established as an immunosuppressive drug. Its target protein was identified by using biotinylated FK-506 in the mRNA display method. After attaching the biotin tag to FK-506, several steps were carried out in order to identify the molecular target by mRNA display.
In the beginning, PCR-amplified cDNA was generated using primers that introduce the engineered sequences necessary for transcription, ligation of the corresponding mRNA, in vitro translation of the mRNA-puromycin linker conjugate, and epitope-based purification of the mRNA-protein fusion. This engineered PCR product was then transcribed to produce mRNA, which was hybridized to a poly-dA-containing linker carrying a 5′-psoralen moiety and a 3′ terminus containing the transfer-RNA mimic puromycin (Pu). Next, a covalent crosslink between the mRNA and the DNA-puromycin linker was introduced by UV irradiation, and this conjugate (mRNA-DNA-puromycin) was employed as a template for in vitro translation, in which the ribosome translates the open reading frame and stops at the mRNA-DNA junction. The absence of a stop codon hindered the action of release factors and permitted the conjugated puromycin to enter the A-site of the ribosome, where the peptidyl transferase subunit catalyzes amide bond formation between an amine group on the puromycin and the carboxyl terminus of the mature protein to give an mRNA-protein fusion. Next, the fusion was purified (dissociation from the ribosome occurred in the presence of the poly-dA linker with oligo-dT cellulose), and a cDNA strand was synthesized by reverse transcription of the fusion, which protected the mRNA against degradation and served as a template for future PCR. After the initial random priming of cellular mRNA, the library of mRNA display molecules was incubated with the immobilized drug or small molecule (immobilized on streptavidin beads), and unbound material was eliminated by washing. The bound fusion protein was then eluted either specifically, using an excess of the drug, or nonspecifically, using KOH. The identified target protein was found to be FKBP12 51. This was further confirmed with an in vitro protein binding assay.

Protein microarray approaches
High-throughput analysis of interactions between target proteins and small molecules can be performed using protein chip technology, known as the protein microarray 66. Because this approach uses a high-density format, only a few days are required to study the binding profile of a given small molecule against an entire proteome. The proteins of interest are purified and then immobilized on a glass microscope slide or on another derivatized surface 67,68. Small molecules labeled with a fluorescent tag or a radioisotope are incubated with the array, the positions where the small molecule binds protein on the array are determined, and the target protein is subsequently identified.
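The step of deciding which array positions count as bound can be sketched as a simple thresholding of spot signals against background. This is only an illustrative sketch: the cutoff of background mean plus three standard deviations, the function name and the protein labels are assumptions made for the example, not the procedure used in the cited work.

```python
from statistics import mean, stdev


def call_hits(spot_signals, background_signals, k=3.0):
    """Return protein names whose mean spot signal exceeds the background.

    spot_signals       -- dict mapping protein name to replicate intensities
    background_signals -- intensities measured at empty (no-protein) spots
    k                  -- number of background standard deviations above
                          the background mean required to call a hit
                          (an assumed, commonly used style of cutoff)
    """
    threshold = mean(background_signals) + k * stdev(background_signals)
    hits = {}
    for protein, replicates in spot_signals.items():
        average = mean(replicates)
        if average > threshold:
            hits[protein] = average
    # Strongest candidates first.
    return sorted(hits, key=hits.get, reverse=True)


if __name__ == "__main__":
    background = [100, 110, 90, 105, 95]
    spots = {
        "protA": [500, 520],  # hypothetical strong binder
        "protB": [105, 98],   # hypothetical non-binder
        "protC": [130, 140],  # hypothetical weak binder
    }
    print(call_hits(spots, background))
```

Candidates called in this way would still need to be confirmed by an independent assay, as in the case study that follows.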

[Case study]
Yeast growth inhibition studies using a chemical genetic screen of small molecules (16,320 compounds) identified SMIR3 and SMIR4, which can fully suppress rapamycin's anti-proliferative effects in yeast. Both molecules were then biotinylated (in a way that preserved their bioactivity) to identify the target protein.
The biotinylated small molecules (SMIR3-biotin and SMIR4-biotin) were probed on a proteome chip containing almost the entire yeast proteome. After Cy3-labeled streptavidin was added to the proteome chip, 8 candidate proteins binding to SMIR3-biotin and 30 candidate proteins binding to SMIR4-biotin were identified. This was followed by an in vivo study of rapamycin sensitivity and of the ability of the SMIRs to suppress rapamycin's effect on yeast cells, using the yeast deletion strain of each candidate protein found on the protein chip. This study identified the YBR077CP (Nir1P) protein as responsible for the ability of SMIR4 to suppress rapamycin's effect 69.

Drug affinity responsive target stability (DARTS)
A few successful cases have been reported based on the affinity matrix method. However, introducing affinity tags onto effective drug molecules to identify their target proteins remains a major challenge because of the limitations of that method. A newer technique, drug affinity responsive target stability (DARTS), may have great potential for discovering target proteins by exploiting the stabilization of the target protein upon binding of drug-like small molecules 70,71. This methodology is a target identification strategy that requires no modification of the small molecule and relies only on drug-protein binding. After binding drug molecules, target proteins are less susceptible to proteolysis, which helps to identify them readily even in complex biological systems. Generally, among the multiple dynamic conformations of a protein, the ligand-bound states are the thermodynamically favourable structures, owing to hydrogen bonding and hydrophobic and/or electrostatic interactions between the protein and the small molecule. Hence, the target proteins are stabilized because their conformational flexibility is restricted.
To date, the affinity matrix and affinity chromatography methods have been well studied by different research groups, and their major limitations are well defined. Although several good examples have been reported, the major drawbacks (e.g. nonspecific binding to nontarget proteins, time-consuming SAR studies, and the impossibility of incorporating an affinity tag into some drug-like small molecules) limit their application. In general, nonspecifically bound proteins can be eliminated by repeated washing, but at the same time the binding proteins of interest may also be lost if the interaction between the target protein and the small molecule is too weak. Therefore, compared with the affinity matrix, DARTS offers a unique alternative that allows target proteins of small molecules to be detected without further chemical derivatization or extensive washing. Furthermore, this approach is applicable to all bioactive small molecules, including extensive chemical libraries from different sources with diverse structures, as well as natural products. DARTS can be applied to find target proteins in complex biological samples by digesting away nontarget proteins.

[Case study]
Resveratrol, a compound found in grapes and red wine, is known to be responsible for various health benefits. However, its direct molecular target protein had not been reported, owing to the low specific binding affinity of small molecules towards proteins. A potential requirement of the polyphenol groups in resveratrol for its activity has discouraged the generation of affinity reagents for target identification. To overcome this limitation, Lomenick et al. came up with a novel approach that exploits the reduced protease susceptibility of a target protein upon drug binding, named drug affinity responsive target stability (DARTS). As a proof of principle, known small molecule-protein complexes (mTOR-rapamycin and COX-2-celecoxib) were identified by the DARTS method. Eventually, the target protein eIF4A was identified by applying this new tool (DARTS). DARTS with resveratrol-dosed yeast cell lysates revealed two silver-stained bands between the 15- and 20-kDa MW markers that were more intense in the resveratrol-treated lysate after proteolysis than in the control. Mass spectrometry analysis identified the target protein as eIF4A 70. Further mutation studies of the target gene indicate that eIF4A is a vital protein for the various health benefits.
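The comparison at the heart of a DARTS readout, namely which bands remain more intense after proteolysis in the drug-treated lysate than in the control, can be sketched as a ratio calculation. The band labels, intensities, the 1.5-fold cutoff and the function name below are hypothetical values chosen for illustration and do not come from the resveratrol study.

```python
def protected_bands(control, treated, min_ratio=1.5):
    """Flag gel bands that survive proteolysis better with drug present.

    control   -- dict mapping band label to intensity in the control lysate
                 after protease digestion
    treated   -- dict mapping band label to intensity in the drug-treated
                 lysate after the same digestion
    min_ratio -- treated/control ratio required to flag a band
                 (an assumed cutoff for this sketch)
    """
    flagged = []
    for band, control_intensity in control.items():
        # Skip bands that cannot be compared as a ratio.
        if band not in treated or control_intensity <= 0:
            continue
        ratio = treated[band] / control_intensity
        if ratio >= min_ratio:
            flagged.append((band, ratio))
    # Most strongly protected bands first.
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


if __name__ == "__main__":
    control = {"band_17kDa": 100.0, "band_50kDa": 200.0, "band_70kDa": 0.0}
    treated = {"band_17kDa": 300.0, "band_50kDa": 210.0, "band_70kDa": 50.0}
    print(protected_bands(control, treated))
```

Bands flagged in this way are only candidates; in the case study the protein identity was established by mass spectrometry.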

In vitro inhibition assay: guess and check method
Sometimes the target protein can be determined by using a simple in vitro assay. After a hit compound has been detected through a change in the cell or organismal phenotype, or through inspection of cell images, expert researchers can predict candidate target proteins likely to be responsible for the observed alterations. This knowledge-based approach leads one to guess and then to check the presumed proteins in in vitro assays, thereby leading to the identification of the target protein. This is a frequently used method 36 for finding target proteins.
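The "check" part of this approach amounts to comparing the activity of each presumed protein with and without the hit compound. The sketch below ranks candidates by percent inhibition; the enzyme names, activity values and function names are hypothetical and serve only to illustrate the arithmetic.

```python
def percent_inhibition(activity_with_compound, activity_without_compound):
    """Percent loss of activity caused by the compound in an in vitro assay."""
    if activity_without_compound <= 0:
        raise ValueError("uninhibited activity must be positive")
    return 100.0 * (1.0 - activity_with_compound / activity_without_compound)


def rank_candidates(assay_results):
    """Rank presumed target proteins by how strongly the compound inhibits them.

    assay_results -- dict mapping candidate name to a tuple of
                     (activity with compound, activity without compound)
    """
    scored = {
        name: percent_inhibition(with_compound, without_compound)
        for name, (with_compound, without_compound) in assay_results.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical assay readouts for two guessed candidates.
    results = {"enzyme_A": (90.0, 100.0), "enzyme_B": (20.0, 100.0)}
    for name, inhibition in rank_candidates(results):
        print(name, round(inhibition, 1))
```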

[Case study]
Optical density-based high-throughput screening of conditionally essential enzymes identified a small molecule that prevents the growth of a wild-type bacterial strain but does not affect the growth of a mutant strain incapable of initiating polymer synthesis. This approach led to the discovery of 1835F03 72, a molecule that can inhibit wall teichoic acid (WTA) biosynthesis in Staphylococcus aureus. Radiometric in vitro assays testing the inhibitory effects of 1835F03 on TarB, D, F and L, together with overexpression and resistant-mutant analysis, showed that 1835F03 can potentially inhibit the function of TarG, which exports WTAs to the cell surface, with an MIC of 1.3 µg/mL. Assessment of the antibacterial properties of 1835F03 demonstrated that its activity was fully bacteriostatic.

Magnetism-based Interaction Capture (MAGIC)
Magnetism-based interaction capture (MAGIC) 73,74 is an in vivo target identification approach for mammalian cells. The compound specific for the target proteins is exposed to a cDNA library coupled with an EGFP motif. When proteins are attached to the compound, they are separated from the matrix under a magnetic field. Hence, through this magnetism-based approach, the target protein can be extracted and identified.

[Case study]
The magnetism-based interaction capture (MAGIC) method was first developed by Korean scientists (Jaejoon Won and coworkers) and reported in Science 73 as a way to identify target proteins in living cells. High-throughput screening revealed the ATM protein as the cellular target of the small molecule CGK733 74. However, both papers were retracted for data fabrication and misrepresentation of results. Hence, there is no real example of target protein identification by this approach, although the method may have good potential for finding target proteins in complex systems.
In the above sections, we have described different useful strategies for target protein identification in complex biological systems. Moreover, we have highlighted the potential application of these strategies in forward chemical genetics with specific real examples. Several examples are summarized concisely in Table 1 for better understanding.

Conclusion and Prospects
In this review, we described all three components of forward chemical genetics. The approach consists of the collection of chemical compounds from various sources, screening in a high-throughput format, and then the daunting task of target identification. The first two parts were covered briefly, and we then focused mainly on the various target identification methods, with successful examples of discovering the proteins bound by small molecules. Our objective in this review is to bring most of the target identification methods that have succeeded in identifying target proteins into one frame. The most routinely used earlier methods, such as the affinity matrix and related protocols, sometimes present hurdles to the discovery of target proteins. To resolve this problem, more technically advanced target identification methods have been steadily developed in recent years. Our review has included both the earlier and the recent target identification protocols, along with illustrations of several real examples gathered from several research works. Hence, chemical biology researchers can easily follow and apply any of these target identification methods in their own drug discovery research. We anticipate that this review will help researchers to employ forward chemical genetics in accelerating drug discovery research.

Technology and Research A*STAR, Singapore for their financial support.

Figure 1. Schematic representation of affinity matrix: on bead for target identification.

Figure 2. Schematic representation of affinity matrix: biotin tagged target identification method.

Figure 3. Schematic representation of affinity matrix: fluorescence tagged target identification method.

Figure 4. Schematic representation of the affinity matrix: photoaffinity based target identification method.

Figure 5. Schematic representation of the drug western techniques for target identification.

Figure 6. Schematic diagram of Y3H system for target identification.

Figure 7. Schematic diagram of phage display method for identifying target protein.

Figure 8. Schematic representation of mRNA display techniques for identifying target protein.

Figure 9. Schematic representation of protein microarray method for target identification.

Figure 11. Schematic representation of MAGIC method for target identification.

Table 1. Representative examples of successful target identification approaches