(PDF) Predicting High-Impact Pharmacological Targets by Integrating Transcriptome and Text-Mining Features

In this report, we describe a strategy for the optimized selection of protein targets suitable for drug development against neoplastic diseases taking the particular case of breast cancer as an example. We combined human interactome and transcriptome data from malignant and control cell lines because highly connected proteins that are up-regulated in malignant cell lines are expected to be suitable protein targets for chemotherapy with a lower rate of undesirable side effects. We normalized transcriptome data and applied a statistic treatment to objectively extract the sub-networks of down- and up-regulated genes whose proteins effectively interact. We chose the most connected ones that act as protein hubs, most being in the signaling network. We show that the protein targets effectively identified by the combination of protein connectivity and differential expression are known as suitable targets for the successful chemotherapy of breast cancer. Interestingly, we found additional proteins, not generally targeted by drug treatments, which might justify the extension of existing formulation by addition of inhibitors designed against these proteins with the consequence of improving therapeutic outcomes. The molecular alterations observed in breast cancer cell lines represent either driver events and/or driver pathways that are necessary for breast cancer development or progression. However, it is clear that signaling mechanisms of the luminal A, B and triple negative subtypes are different. Furthermore, the up- and down-regulated networks predicted subtype-specific drug targets and possible compensation circuits between up- and down-regulated genes.We believe these results may have significant clinical implications in the personalized treatment of cancer patients allowing an objective approach to the recycling of the arsenal of available drugs to the specific case of each breast cancer given their distinct qualitative and quantitative molecular traits.

J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 Predicting High-Impact Pharmacological Transcriptome and Text-Mining Features Targets by Integrating Anatoly Mayburd1, Ancha Baranova1, 2, 3 School of Systems Biology, College of Science, George Mason University, Fairfax VA 22003. 2 Federal State Budgetary Institution "Research Centre for Medical Genetics," Moskvorechie str, 1 Moscow, Russia. 3 Atlas Biomed Group, Malaya Nikitskaya 31, Moscow, Russia. 1 Received, August 20, 2016; Revised, August 26, 2016; Accepted, November 14, 2016; Published, November 15, 2016. ABSTRACT - Purpose: Novel, “outside of the box” approaches are needed for evaluating candidate molecules, especially in oncology. Throughout the years of 2000-2010, the efficiency of drug development fell to barely acceptable levels, and in the second decade of this century, levels have improved only marginally. This dismal condition continues despite unprecedented progress in the development of a variety of high-throughput tools, computational methods, aggregated databases, drug repurposing programs and innovative chemistries. Here we tested a hypothesis that the economic impact of targeting a particular gene product is predictable a priori by employing a combination of transcriptome profiles and quantitative metrics reflecting existing literature. Methods: To extract classification features, the gene expression patterns of a posteriori high-impact and lowimpact anti-cancer target sets were compared. To minimize the possible bias of text-mining, the number of manuscripts published prior to the first clinical trial or relevant review paper, as well as its first derivative in this interval, were collected and used as quantitative metrics of public interest. Results: By combining the gene expression and literature mining features, a 4-fold enrichment in high-impact targets was produced, resulting in a favourable ROC curve analysis for the top impact targets. The dataset was enriched by the highest impact anticancer targets, while demonstrating drastic differences in economic value between high and low-impact targets. Known anti-cancer products of EGFR, ERBB2, CYP19A1/aromatase, MTOR, PTGS2, tubulin, VEGFA, BRAF, PGR, PDGFRA, SRC, REN, CSF1R, CTLA4 and HSP90AA1 genes received the highest scores for predicted impact, while microsomal steroid sulfatase, anticoagulant protein C, p53, CDKN2A, c-Jun, and TNSFS11 were highlighted as most promising research-stage targets. Conclusions: A significant cost reduction may be achieved by a priori impact assessment of targets and ligands before their development or repurposing. Expanding a suite of combinational treatments could also decrease the costs, while achieving a higher impact per developed ligand. This article is open to POST-PUBLICATION REVIEW. Registered readers (see “For Readers”) may comment by clicking on ABSTRACT on the issue’s contents page. _______________________________________________________________________ INTRODUCTION targeted/chemotherapy remains secondary, while long-term remission is being achieved through a combination of radiotherapy and early radical surgery. However, on average, a ten-year survival rate for all cancer forms increased from 22% in 1971 to 45% in 2007, with the most significant contribution to this increment being novel therapeutics. Healthimprovements in breast cancer that could be _________________________________________ Despite tremendous historical progress in anticancer research and the decrease in mortality rates associated with many forms of cancer (1), including tumors of the prostate, breast, testis, and colon, as well as many forms of leukemia, the outcomes for pancreatic, lung and brain tumors remain dismal. It is difficult to pinpoint individual factors contributing to the organ-specific treatment success rates. For some forms of cancer, including breast, ovary, prostate, uterus, leukemia, thyroid, and testis, there is a tremendous gap in outcomes for patients with treatment-sensitive and treatment-resistant tumors. For other cancers, the role of Corresponding Author: Dr. Ancha Baranova, Associate Professor, Director for the Center for the Study of Chronic Metabolic and Rare Diseases, School of Systems Biology College of Science George Mason University, 4400 University Dr David King Hall, Fairfax VA USA. E-mail: abaranov@gmu.edu; aancha@gmail.com 475 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 attrition at later development stages. Some attempts to evaluate the pharmacological promise of a given target or its ligand have been made before. Ma'ayan et al. introduced graphtheory methods to analyze the FDA-approved drugs and their known molecular targets (17). Zhu et al. in (18) explored multiple factors that collectively contribute to druggability of various targets, including its protein sequence, structural, physicochemical, and systems profiles. Importantly, the techniques to explore each of these profiles for target identification have been developed, but they have not been collectively used. Chen et al. in (19) proposed that a disease-independent property of proteins, "drug-target likeness", can be explored to facilitate the genomic scale target screening. Sakharkar et al. in (20) described quantitative characteristics of the currently explored (those that are not yet associated with any marketed drug) and successful (targeted by at least one marketed drug) biomolecules; these characteristics were translated into simple rules for selecting a target with larger possibility of success. These rules highlight target proteins with 5 or less homologs outside of their own family, proteins encoded by single-exon gene architecture and proteins interacting with more than 3 partners as more likely to be druggable. Bender et al. in (21) reported a success mining approach applied to ligands in the context of in vitro interaction profiles of their targets. According to Bender et al., Preclinical Safety Pharmacology (PSP) approach may anticipate adverse drug reactions (ADRs) during early phases of drug discovery by testing compounds in relatively simple in vitro binding assays. All of these previously implemented methodologies attempt to discriminate the targets that have already acquired a ligand from the targets that are either in the process of ligand acquisition or would never acquire an approved ligand. We find that this approach needs supplementing due to a number of considerations. First, some targets may eventually acquire a somewhat beneficial and a relatively harmless ligand that would pass safety and efficiency criteria, if a sufficient investment is made. Secondly, the targets that have acquired the approved ligand may lose the association with the approval if the ligand is pulled from the market later. Finally, the practical impact of ligands is in proportion to the significance of the target for the pathophysiological mechanism that drives a given pathology. A still developing candidate with attributed to chemotherapy were estimated at between 14% and 27% in 1998 (2). It is likely that the more recently developed regimens show even greater impact. The broadly cited assessment of DeVita et al. attributes 50% of the increase in survival rates to drug-based therapies (3). In this manuscript, we discuss a novel cost-effective strategy for developing anti-cancer therapeutics and their combinations. This strategy is based on a premise that the spectrum of efficient physiological anticancer mechanisms is relatively limited, and that improvement in survival rates is primarily due to the targeting of so-called “super-targets”. Examples of well-known “super-targets” that were drugged since long time ago include DNA (4), folate reductase (5), and microtubules (6); these “supertargets” are still commonly tackled by either combinational therapy (7) or adjuvant therapy (8). Discovery of a novel high-impact anti-cancer “super-target” is a significant event, further increasing cancer survival rates by approximately 1%, in our estimate. However, the existing statistics pertaining to the introduction of novel anti-cancer therapies point to stagnation (9-12). This trend occurs in the backdrop of Food and Drug Administration (FDA)'s attempts to expedite the review process (12). Even more concerning is that the downturn in the number of New Drug Applications (NDA) takes place at the time when the enabling technologies appear to explode (13-15), including the newest investments in personalized genome projects, gene knock-out techniques, toolboxes for biological network modeling, RNAseq, genome-wide association studies along with centralization and dissemination of biomedical information by NCBI and other portals. Another alarming trend is the cost of drug design, reaching $5bn per an approved drug in 2013 (16). The combination of the NDA contraction and the exponentially rising costs of drug development points to a wall of resistance that has to be penetrated. In this report we demonstrate a possibility to predict that anti-cancer molecule would tackle a super-target a priori. In other words, we describe the methodology of the “success mining” that attempts to identify the features of known “winner” molecules at the preclinical stage, then to prioritize current candidate molecules according to relative resemblance of a “winner” profile. This approach would aid in reallocating available funding to the most promising candidates and minimize costly 476 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 expected huge clinical (and market) niche, but without approved associated ligands, may be more valuable than a comparable target with an approved ligand of a more modest impact. In this report we attempt to predict an impact of a pharmacological target candidate using its future market share as a proxy. The more clinical trials and especially advanced clinical trials are conducted around a certain ligand, the more likely the target would eventually be tackled by a high-impact therapeutics that would be inherently successful due to a combination of favorable biology and pharmacology. Aiming at the largest possible target impact significantly differs from using the criterion of being simply FDA approval; such studies have not been conducted yet. Incorporation of the forecasted future impacts in the decision criteria put forth by funding and regulatory agencies may aid in creating a policy instrumental in rolling back the escalating costs of the drug development. gene expression data, collected from normal and cancerous specimens of various tissue origins. Dataset GSE7307 includes 677 normal and diseased human tissues profiled for gene expression using the Affymetrix U133 plus 2.0 array. The target status information was extracted from TTD, where the ligand and clinical trial status for each target candidate is specified as either “successful” (approved ligand is associated with the gene's products) or “research” (no approved ligand is associated with the gene's products). The textmining was performed using PubMed with the gene names and their synonyms extracted from TTD. Definition of target impacts Target impacts were approximated by the number of clinical trials associated with the gene's name. The number of clinical trials associated with the gene was extracted by applying the PubMed filters. Weights 1, 2, 5 and 10 were assigned to the ligands in Phase I, II, III clinical trials and marketed ligands, respectively. The values of these weights were selected in proportion to the attrition rate of the ligands at each trial level. By resources committed, the ligand at the Phase I stage is cheaper by than a ligand that have reached the Phase III stage, and the approximate differences in the costs are reflected in the weights, see Supplemental Table 1 for the data. The number of the ligands in each category was multiplied by weights producing proxy target impacts that reflect the prospective revenue of the target reached in case of successful development. These values were defined as “real-life impacts”, while the predicted impacts were derived from both microarray and text-mining data. The proxies for real-life impacts were designated Y for the purpose of deriving a prediction rule as a linear classifier, see below. METHODS Overview of methodology The proposed impact forecasting technique relies on the target impact predictors available at the preclinical stage. The independent predictors were combined into an Index enabling to gauge the commercial potential of a target candidate before the bulk of investment is committed. Sources of data included in the study The study used GEO Datasets at NCBI at http://www.ncbi.nlm.nih.gov/gds/; PubMed at http://www.ncbi.nlm.nih.gov/PubMed/ and Therapeutic Target Database (TTD) at http://bidd.nus.edu.sg/group/cjttd/. The GEO Datasets was searched for similarly normalized Table 1. Modeled relative impacts for the successful anti-cancer targets. Gene ID Synonyms ERBB, ERBB1, HER1, PIG61, mENA, EGFR, epidermal growth factor receptor, avian EGFR erythroblastic leukemia viral (v-erb-b) oncogene homolog| CD340, HER-2, HER-2/neu, HER2, MLN 19, NEU, NGL, TKR1, ERBB2 v-erb-b2 ERBB2 avian erythroblastic leukemia viral oncogene homolog 2 c-erb B2/neu protein|herstatin| ARO, ARO1, CPV1, CYAR, CYP19, CYPXIX, P-450AROM, CYP19A1 cytochrome CYP19A1/ P450, family 19, subfamily A, polypeptide 1 aromatase, cytochrome P-450AROM, aromatase cytochrome P450 19A1 FRAP, FRAP1, FRAP2, RAFT1, RAPT1, mechanistic target of rapamycin MTOR (serine/threonine kinase) FK506 binding protein 12-rapamycin associated protein 2 COX-2, COX2, GRIPGHS, PGG/HS, PGHS-2, PHS-2, hCox-2, prostaglandinPTGS2 endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase) 477 Target impacts 1254 1202 620 259 246 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 Table 1. Continued….. Tubuline VEGFA BRAF PGR PDGFRA SRC REN CSF1R CTLA4 HSP90AA1 IGF1R RET CD52 IL2RA RRM1 ADA STAT3 TOP2A TYMS CXCR4 KDR PPARG ABL1 TLR7 LCK CCKBR RXRA RRM2 FYN ESR1 DAAP-285E11.4, M40, OK/SW-cl.56, TUBB1, TUBB5, tubulin, beta class I beta 5tubulin|beta Ib tubulin|beta-4 RP1-261G23.1, MVCD1, VEGF, VPF, vascular endothelial growth factor A B-RAF1, BRAF1, NS7, RAFB1, BRAF v-raf murine sarcoma viral oncogene homolog B NR3C3, PR, progesterone receptor nuclear receptor subfamily 3 group C member 3 PDGFRA CD140A, PDGFR-2, PDGFR2, RHEPDGFRA, platelet-derived growth factor receptor, alpha polypeptide RP5-823N20.1, ASV, SRC1, c-SRC, p60-Src, v-src avian sarcoma (Schmidt-Ruppin A2) viral oncogene homolog proto-oncogene c-Src HNFJ2, renin angiotensin-forming enzyme, angiotensinogenase, renin precursor, renal CSF1R C-FMS, CD115, CSF-1R, CSFR, FIM2, FMS, HDLS, M-CSF-R, CSF1R colony stimulating factor 1 receptor CD115 antigen|CSF-1 receptor|FMS protooncogene CD152, CELIAC3, CTLA-4, GRD4, GSE, IDDM12, CTLA4 cytotoxic Tlymphocyte-associated protein 4 CD152 isoform|celiac disease 3|cytotoxic T lymphocyte associated antigen 4 short spliced form|cytotoxic T-lymphocyte antigen 4 EL52, HSP86, HSP89A, HSP90A, HSP90N, HSPC1, HSPCA, HSPCAL1, HSPCAL4, HSPN, Hsp89, Hsp90, LAP2, HSP90AA1 heat shock protein 90kDa alpha (cytosolic), class A member 1 CD221, IGFIR, IGFR, JTK13, IGF1R insulin-like growth factor 1 receptor CDHF12, CDHR16, HSCR1, MEN2A, MEN2B, MTC1, PTC, RET-ELE1, RET51, ret proto-oncogene CAMPATH-1 antigen, CDW52 antigen, HEL-S-171mP, cambridge pathology 1 antigen, epididymal secretory protein E5, epididymis secretory sperm binding protein Li 171mP, human epididymis-specific protein 5 RP1-261G23.1, MVCD1, VEGF, VPF, VEGFA, vascular endothelial growth factor A R1, RIR1, RR1, ribonucleotide reductase M1, ribonucleoside-diphosphate reductase large subunit adenosine deaminase, adenosine aminohydrolase APRF, HIES,signal transducer and activator of transcription 3 (acute-phase response factor) TOP2, TP2A, topoisomerase (DNA) II alpha, DNA gyrase, DNA topoisomerase (ATPhydrolyzing, DNA topoisomerase 2-alpha, DNA topoisomerase II OK/SW-cl.29, HST422, TMS, TS, thymidylate synthetase, TSase CD184, D2S201E, FB22, HM89, HSY3RR, LAP3, LCR1, LESTR, NPY3R, NPYR, NPYRL, NPYY3R, WHIM, CXCR4 chemokine (C-X-C motif) receptor 4 CD309, FLK1, VEGFR, VEGFR2, KDR kinase insert domain receptor CIMT1, GLM1, NR1C3, PPARG1, PPARG2, PPARgamma, peroxisome proliferatoractivated receptor gamma RP11-83J21.1, ABL, JTK7, bcr/abl, c-ABL, c-ABL1, p150, v-abl, ABL1 c-abl oncogene 1, non-receptor tyrosine kinase Abelson tyrosine-protein kinase 1|bcr/c-abl oncogene RP23-139P21.3, toll-like receptor 7 RP4-675E8.4, LSK, YT16, p56lck, pp58lck, lymphocyte-specific protein tyrosine kinase CCK-B, CCK2R, GASR, CCKBR , cholecystokinin B receptor, CCK-B receptor, CCK-BR, CCK2 receptor, CCK2-R, cholecystokinin-2 receptor, gastrin receptor NR2B1, RXRA retinoid X receptor, alpha nuclear receptor subfamily 2 group B member 1, retinoic acid receptor RXR-alpha, retinoid X nuclear receptor alpha R2, RR2, RR2M, ribonucleotide reductase M2, ribonucleoside-diphosphate reductase subunit M2 RP1-66H14.1, SLK, SYN, p59-FYN RP1-130E4.1, ER, ESR, ESRA, ESTRR, Era, NR3A1, ESR1 , estrogen receptor 1, ER-alpha, estradiol receptor, estrogen nuclear receptor alpha 478 227 220 162 123 95 87 84 70 58 53 51 47 39 39 37 33 31 30 25 23 21 21 18 14 8 6 5 4 3 2 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 Table 1. Continued….. GNRH1 SSTR2 VDR BDKRB2 NTRK2 TOP1 ALPL CALM1 CSF2RA DNMT1 EDNRA EDNRB FASN FECH HDAC1 HPSE IFNAR1 IMPDH1 IMPDH2 ITGA2B LHCGR MME OXTR PARP1 PDE4A PTH1R RARA TSPO TXNRD1 Gnrh, Gnrh2, LHRH, Lhrh1, Lnrh, hpg, Gnrh1, gonadotropin releasing hormone 1 somatostatin receptor 2 SRIF-1, SS2R, somatostatin receptor type 2 NR11, VDR vitamin D (1,25- dihydroxyvitamin D3), receptor 1,25dihydroxyvitamin D3 receptor|nuclear receptor subfamily 1 group I member 1| B2R, BK-2, BK2, BKR2, BRB2, BDKRB2, bradykinin receptor B2 , BK-2 receptor GP145-TrkB/GP95-TrkB, Tkrb, trk-B, trkB, neurotrophic tyrosine kinase, receptor, type 2 BDNF/NT-3 growth factors receptor RP3-511B24.1, TOPI, topoisomerase (DNA) I DNA topoisomerase 1 AP-TNAP, APTNAP, HOPS, TNAP, TNSALP, ALPL alkaline phosphatase CALML2, CAMI, CPVT4, DD132, PHKD, caM, CALM1 calmodulin 1 (phosphorylase kinase, delta) CSF2RA CD116, CDw116, CSF2R, CSF2RAX, CSF2RAY, CSF2RX, CSF2RY, GMCSF-R-alpha, GMCSFR, GMR, SMDP4, CSF2RA colony stimulating factor 2 receptor, alpha, low-affinity (granulocyte-macrophage) ADCADN, AIM, CXXC9, DNMT, HSN1E, MCMT, DNMT1, DNA (cytosine-5-)methyltransferase 1, CXXC-type zinc finger protein 9 ET-A, ETA, ETA-R, ETAR, ETRA, hET-AR, EDNRA endothelin receptor type A, G protein-coupled receptor, endothelin receptor subtype A, endothelin-1 receptor RP11-318G21.1, ABCDS, ET-B, ET-BR, ETB, ETBR, ETRB, HSCR, HSCR2, WS4A, EDNRB endothelin receptor type B FAS, OA-519, SDR27X1, FASN fatty acid synthase short chain dehydrogenase/reductase family 27X, member 1 EPP, FCE, FECH ferrochelatase RP4-811H24.2, GON-10, HD1, RPD3, RPD3L1, HDAC1, histone deacetylase 1 HPA, HPA1, HPR1, HPSE1, HSE1, heparanase, endo-glucoronidase, heparanase-1 CD118, Ifar, Ifnar, Ifrc, Infar, Ifnar1 interferon (alpha and beta) receptor 1, IFN-R-1, IFN-alpha/beta receptor 1 IMPD, IMPD1, LCA11, RP10, sWSS2608, IMP (inosine 5'-monophosphate) dehydrogenase 1 hCG_2002013, IMPD2, IMPDH-II, IMPDH2 IMP (inosine 5'-monophosphate) dehydrogenase 2 BDPLT16, BDPLT2, CD41, CD41B, GP2B, GPIIb, GT, GTA, HPA3, ITGA2B integrin, alpha 2b (platelet glycoprotein IIb of IIb/IIIa complex, antigen CD4 HHG, LCGR, LGR2, LH/CG-R, LH/CGR, LHR, LHRHR, LSH-R, ULG5, LHCGR luteinizing hormone/choriogonadotropin receptor CALLA, CD10, NEP, SFE, MME, membrane metallo-endopeptidase atriopeptidase, common acute lymphocytic leukemia antigen OT-R, OXTR oxytocin receptor RP11-125A15.2, ADPRT, ADPRT1, ARTD1, PPOL, pADPRT-1, poly (ADP-ribose) polymerase 1 ADP-ribosyltransferase (NAD+); PDE4A DPDE2, PDE4, PDE46, PDE4A, cAMP-specific 3',5'-cyclic phosphodiesterase 4A PFE, PTHR, PTHR1, PTH1R parathyroid hormone 1 receptor, PTH/PTHr receptor|PTH/PTHrP type I receptor|PTH1 receptor|parathyroid hormone receptor 1 NR1B1, RAR, retinoic acid receptor, alpha RP3-526I14.4, BPBS, BZRP, DBI, IBP, MBR, PBR, PBS, PKBS, PTBR, mDRC, pk18, TSPO translocator protein (18kDa) benzodiazepine peripheral binding site GRIM-12, TR, TR1, TRXR1, TXNR, TXNRD1 thioredoxin reductase 1, KM-102derived reductase-like factor 479 2 2 2 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 Text-mining predictive features The future target impacts were anticipated by extracting the levels of early scientific interest as measured by the number of non-review research publications available prior to the first review published and by first derivative of the research interest. This extraction was accomplished by querying the PubMed with gene name and all its synonyms followed by manual review of the result to ensure that all selected articles are relevant to the biology of the target and the resultant therapeutic avenues. The derivative was measured as the ratio of the Microarray-based predictive features The microarray data were retrieved by GEO NCBI at http://www.ncbi.nlm.nih.gov/gds/. The dataset GSE7307 is described above. Using this dataset, normal and cancer specimen pairs were formed for skin, lung, prostate, liver, uterus and ovary tissue environments. For each paired environment (cancer-norm), differential expression values were computed as: D1 = [cancer]/[norm in the same environment] (1) D2 = [cancer ]/[average norm in all environments] (2) ABS (NT – NR)/(Spacing) Differential expression consistency metric was derived based on these primary data points. The direct data D1 and D2 were transformed in bonuspenalty data using the following rule: (5) where NT – is the number of non-review publications addressing the role of the gene in the disease of interest prior to the date of the first clinical trial inception; NR – is the number of nonreview publications addressing the role of the gene in the disease of interest prior to the date of the first review published; Spacing – is the number of years between first clinical trial and first review. The function ABS is the absolute value operator and it accounts for the fact that the first review and the first clinical trial may follow in any order. If D1 > cutoff1, D'1 (cutoff1) = 2; If D1 < cutoff1, D'1 (cutoff1) = 0 (3) If D2 > cutoff2, D'2 (cutoff2) = 2; If D2 < cutoff2, D'2 (cutoff2) = 0 The transformed indirect values D'1 and D'2 were multiplied for each environment and the products were summed to produce the Differential Expression Consistency Score (DEXCON). In this system, the leading scores were assigned to the genes that show high expression levels in cancer, but low expression levels in the normal environment related to the tumor as well as low expression levels in all unrelated normal environments. It is apparent that the potential targets with such parameters would possess wider therapeutic windows under all other circumstances being equal. The absolute levels of gene expression were measured across the panel of 90 tissue environments and this feature was termed INTENSITY, to reflect intensity of absolute mRNA transcript expression. The parameters DEXCON and INTENSITY became the features embed in an integrated classifier and were designated as X1 and X2 for the future reference. See the specific values of the parameters and corresponding bonus-penalty points in the Supplemental Table 2. The average N = (NT + NR)/2 (6) The average measures the absolute number of peerreviewed research publications related to the gene. All numbers were normalized for the natural growth of PubMed population in time, by the formula: N (T2)/N(T1) = 1.045^(T2-T1) (7) The features (5) and (6) were designated as X3 and X4 for combining them within the integrated classifier. Classifier design To design a linear classifier locating high-impact targets, the values of Y were transformed as Y' = log (Y + 1) (8) The purpose of transform was to smooth the dataset by relatively diminishing the effects of a few very high Y values, dominating the numerical structure. The smoothing allows effective increase of diversity in the training set and is equivalent to 480 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 predicted and real impacts. The populations of the residual errors (10) were compared for the training and testing sets to assess generalization by the prediction rule. We tested for the statistical equality of the error populations in the absence of over-fit. To select a proper T-test form (equal or unequal variance, 2-tail), a preliminary F-test was run to compare variances. The F-test reported equal variances between the error populations and based on these data equal variance T-test was applied for population comparison. The populations of errors were identical, with no over-fitting detected. Based on this conclusion, the training and testing populations were merged for the plotting of Receiver Operating Characteristic (ROC) curve. Ranking of real impacts were used for defining high and low real impact categories and respective labeling of the targets. At the next step, predicted target scores were ranked, and the distribution of real score labels was traced as a function of the predicted score. The bin with high predicted score on the ROC curve provided significant enrichment to the real high impact labels, thus, validating our approach. the increased training set size, facilitating a more objective training process. The correction term + 1 in (8) accounts for zero Y values not amenable to log transform. The distortion introduced by + 1 correction is minimal and does not outweigh the benefit of the smoothing procedure. The features X1-X4 were ranked. The Y' was related to the ranked X1-X4 by a linear regression: YP' = W1X1 + W2X2 + W3X3 + W4X4 + A (9) where W1-W4 are the corresponding weight coefficients to be determined by an error minimization procedure: Sum (Y' – YP')^2 = MIN (10) where YP' are the predicted impacts for the entire population of the training set, Y' are the above defined “true impacts” for the entire population of the training set. The set of training weights [W1W4, A] was determined by minimizing (10) using the Least Square Method. Further steps taken to improve the resolution of the method included the following. The ranked values of X1-X4 were each scored in the following manner: the top rank [0.9-1.0] received a score of 2, the next bin [0.75-0.89] received a score of 1 and all bins below 0.75 received score of 0. The direct column vectors of features were replaced with the bonus-penalty values as defined here. The error minimization procedure was repeated and the proportions between the weight coefficients W remained practically unchanged. The meaning of the bonus-penalty transform is to emphasize the role of informative outliers on the side of the model factors X (as opposed to the outputs Y) and to smooth the effects of the random noise, distributed equally among the members of the training set. To re-phrase, the bonus-penalty system selects only the most informative genes for contributions in the prediction rule and maximizes the signal-to-noise ratio at the given size of the training set. ROC curve plotting and its use for computing relative enrichment The true impacts Y were subdivided based on rank in the “high-impact” bin with the ranks [0.75-1.0] and “low-impact” bin with the ranks [0 – 0.75]. The members of these groups acquired the positive and negative labels respectively. The predicted scores PY' were ranked as well, and the population of true impacts followed the rank of PY', producing a nonideal, but a generally correlating pattern. The true “high-impact” labels were predominantly concentrated in the higher regions of PY' rank. The predicted score ranks were explored from the top (1.0) to the bottom (0.0) values. The fraction of high-impacts f1 was computed by summarizing the positive labels as defined above. The fraction of low-impacts f2 was computed in a similar fashion. The fraction f1 of “high-impact” Y and the fraction f2 of “low-impact” Y were forming the Yaxis and the X-axis of the plot, respectively. Per each 0.1 (10%) increment of “low-impact” count, the fractional increment of “high-impact” targets was also computed. Every point on ROC curve can be represented in the coordinates [summary fraction of “low-impact” values; summary fraction of “highimpact” values], the summary fraction is the sum of Validation of selected approach The list of targets was randomly divided in the training and testing sets of equal size. The testing set was set apart until the prediction rule was derived in the training set. Based on the derived prediction rule, the residual errors were computed in the training and test sets by comparing the 481 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 possibility of predicting high-impact target category a priori, already at the stage of preclinical development and before the onset of the most expensive clinical trial phase. It is very unlikely that the inherent biological mechanism determining the target's future impact at the preclinical development stages remain obscure. This mechanism leaves its signature in a variety of large-scale high-throughput studies as well as in collective research activity patterns. The more diverse sources of information are incorporated and the more the prediction point is shifted away from an onset of active clinical trial stage, the lesser the role of “me-too” bias factor in the emergence of the detected patterns. The Tables 1 and 2, respectively, show the sets of FDA-approved and “still-in a-pipeline” anticancer targets with the predicted impacts identified on the above-described basis. The impact leaders on the side of the targets with the approved ligands are EGFR, ERBB2, CYP19A1, MTOR, PTGS2, tubulin, VEGFA, BRAF, PGR and PDGFRA. The functions and cancer-related status of the genes were explored using database Genes at NCBI. The predicted impact leaders demonstrate favorable biological anti-cancer features due to their central roles in more universal pathophysiological mechanisms. Thus, many forms of cancer depend on overexpression of EGFR and ERBB2 for their survival. Blockade of these kinases synergizes with cytotoxic therapeutics. In normal cells, such dependence is rare or absent; therefore, the combinational regimens based on EGFR and ERBB2 have an access to a broad therapeutic window. The aromatase CYP19A1 is a key enzyme in estrogen synthetic pathway and is selectively important for the cancer subtypes that rely on estrogen stimulation for growth and survival. Mammalian target of rapamycin (mTOR) regulates the functions of cell survival, motility, proliferation, protein synthesis and transcription, making this target extremely important for rapidly propagating cells characterized by destabilized metabolism. PTGS2 (cyclooxygenase-2) is among the most important mediators of inflammation. Consequently, this target is indispensable for the growth stimulation produced by stromal immune cells and for metastasis. Many tumor types directly depend on prostaglandin stimulation. Tubulins are selectively more important for rapidly dividing cells undergoing mitotic process. VEGFA pathway is necessary for neovascularization of tumor foci. VEGFA also produces autocrine stimulation of all increments over the previous intervals. To exemplify, the “low-impact” summary fraction 0.1 + 0.1 + 0.1 = 0.3; the matching value of the “highimpact” summary fraction becomes 0.5 + 0.1 + 0.05 = 0.65. The ratio of the summary fractions characterizes the relative enrichment in the true “high-impact” values as a function of the predicted impact rank. To exemplify, in the highest 0.1 fraction of the predicted impact rank corresponding to the left-most part of the ROC curve and [1.0-0.9] bin of the PY' rank, the summary fraction of the “high-impact” values is 0.5, therefore the relative enrichment is 0.5:0.1 = 5. Considering a predicted impact bin of rank [1.0-0.8], the summary fraction of the “high-impact” values is 0.5 + 0. 1 = 0.6, while the summary fraction of the “low-impact” values is 0.1 + 0.1 = 0.2, therefore the relative enrichment is 0.6:0.2 = 3. On comparative basis, using the top rank bin of the PY' rank produces 5fold higher chance to encounter a true “highimpact” label than using the population before the computational filter was applied. RESULTS A priori evaluation of future economic impact of a putative anti-cancer targets Figure 1 illustrates the attempt to predict target impacts based on the combination of DEXCON (X1) and INTENSITY (X2) gene expression features, extracted from the microarray data by the abovedescribed methodology. Figure 2 illustrates the attempt to predict target impacts based on textmining features. Figure 3 illustrates the attempt to predict target impacts based on the combination of DEXCON and INTENSITY gene expression features as well as the text-mining features. The ROC curves demonstrate non-zero area between the diagonal baseline, which reflects the ratio of false positive to false positive summary functions in each bin, and the thicker upper line which reflects the ratio of true positive and false positive summary functions in each bin. The ratio of true positives to the false positives was significantly higher than the baseline, as it is especially evident for the left corner of the ROC plot that describes the highest range of the predicted scores. The regions with higher predicted scores embed the majority of reallife high impact targets, while the regions with lower predicted scores are depleted in real-life high impact targets. These Figures point to the 482 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 relatively non-toxic ligands explain their observed ranks presented in Table 1. The sets of targets presented in the Table 2 are not yet drugged by suitable ligands. The target STS (steroid sulfatase, EC 3.1.6.2) is a member of a steroid synthesis pathway that is capable to produce in selected cancer cell populations the same dependence as other steroid pathway mediators, explaining its relatively high predicted impact. PROC (protein C) is actively involved in blood anti-coagulation pathways, stimulates cell migration and influences the secretory behavior of tumor cells, while suppressing NK killers and T helper cells of TH2, TH17 and TH21 subtypes. Therapeutic activation of TP53 is intended to restore the powerful tumor-suppressor phenotype mediated by protein, explaining its high predicted impact. multiple pro-cancer survival pathways, thus, its inhibition selectively affects almost all cancers. BRAF (serine-threonine kinase B-Raf) is a protooncogene with a broad transforming potential and, therefore, the blockade of its product is selectively more important for the cells where BRAF is constitutively activated. Progesterone receptor (PGR) is selectively more important for ovarian, mammary and endometrium cell populations that depend on progesterone for their growth and survival. Finally, PDGRFA (platelet-derived growth factor receptor, alpha polypeptide) is the powerful mitogen indispensable in certain tissues. To summarize, all genes demonstrating high predicted impacts also demonstrate highly selective involvement in specific cancer types and lack of such involvement in a majority of normal tissues. These differential roles and the ability to bind Figure 1. ROC curve of high-impact targets vs low-impact targets. The ROC curve was built using DEXCON and INTENSITY parameters, using the following procedure: 1) divide the combined list of targets (successful and research) by impact categories, the top 25% forming the “high-impact” class and the rest forming “low-impact” class. 2) apply bonuspenalty scoring approach to the DEXCON and INTENSITY values for the list of targets and combining the indirect bonuspenalty scores with the optimized weight in a 2-feature classifier. 3) rank the target list by the values of the 2-feature classifier. 4) determine the fractions of the “high-impact” and “low-impact” categories in each 0.2 bin of rank by the 2feature classifier. 5) summarize the differential fractions for each category accrued on the range from 0 (highest ranked 2feature scores) to the given point of rank for the 2-feature score. 6) the sum of differential fractions for “low-impact” categories forms a X-axis coordinate; the sum of differential fractions for “high-impact” categories forms a Y-axis coordinate. The diagonal thin baseline at 45 degrees across the plot reflects the ratio of false positives to false positives for each new point in the form of summary functions, while thicker line reflects the ratio of true positive summary function to the false positive summary function in ROC analysis. At the far right corner, the summary function for the true and false positives are both equal to 1, and the lines cross. The area between the lines is proportional to the resolution quality at multiple possible cut-offs. 483 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 response. Alterations of CD40 function determine probability of cancer emergence and metastasis. Proto-oncogene c-MET is involved in de novo angiogenesis and metastasis; its activation in tumors is correlated with poor prognosis. Being a receptor component adds up to its potential for higher impact upon drugging. Hepatocyte growth factor/scatter factor HGF is activating a tyrosine kinase signaling cascade of c-Met, thus, contributing to the same metastasis-related pathway. In leukemia patients, JAK2 (Janus kinase 2) forms fusions with the TEL(ETV6) (TEL-JAK2) and PCM1 genes providing the targets that do not exist in normal cells. These targets are druggable in the same manner as by well-known Gleevec. To summarize, the genes pinpointed as promising tend to participate in the most central mechanism of tumor cells survival and propagation. Attempts to reactivate CDKN2A (cyclin-dependent kinase inhibitor 2A, multiple tumor suppressor 1) are performed within the same therapeutic paradigm as for TP53. Proto-oncogene c-JUN encodes transcription factor that mediates apoptosis resistance, with good potential of pharmacological inhibition for a significant impact. TNFSF11 (Tumor necrosis factor (ligand) superfamily, member 11) is involved in metastasis and boneremodeling. Remarkably, the high predicted impact was assigned to TNFSF11 based on Therapeutic Target Database definition of the candidate as not yet matching an FDA-approved ligand. However, we found that TNFSF11 ligand denosumab was approved in 2010 under the names of Xgeva and Prolia, thus, validating our approach. CD40 (TNF receptor superfamily member 5) is a co-stimulatory protein found on antigen presenting cells and as such is a key molecule in establishing an immune Figure 2: ROC curve of high-impact targets vs. low-impact targets. The ROC curve was built using text-mining parameters, using the following procedure: 1) divide the combined list of targets (successful and research) by impact categories, the top 25% forming the “high-impact” class and the rest forming “low-impact” class. 2) apply bonus-penalty scoring approach to the N (Number of publications between first review and first clinical) and time-derivative of N for the list of targets and combining the indirect bonus-penalty scores with the optimized weight in a 2-feature classifier. 3) rank the target list by the values of the 2-feature classifier. 4) determine the fractions of the “high-impact” and “low-impact” categories in each 0.2 bin of rank by the 2-feature classifier. 5) summarize the differential fractions for each category accrued on the range from 0 (highest ranked 2-feature scores) to the given point of rank for the 2-feature score. 6) the sum of differential fractions for “low-impact” categories forms a X-axis coordinate; the sum of differential fractions for “high-impact” categories forms a Y-axis coordinate. The diagonal thin baseline at 45 degrees across the plot reflects the ratio of false positives to false positives for each new point in the form of summary functions, while thicker line reflects the ratio of true positive summary function to the false positive summary function in ROC analysis. At the far right corner, the summary functions for the true and false positives are both equal to 1, and the lines cross. The area between the lines is proportional to the resolution quality at multiple possible cut-offs. 484 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 Figure 3: ROC curve of high-impact targets vs low-impact targets. The ROC curve was built using DEXCON, INTENSITY and text-mining parameters, using the following procedure: 1) divide the combined list of targets (successful and research) by impact categories, the top 25% forming the “high-impact” class and the rest forming “low-impact” class. 2) apply bonus-penalty scoring approach to the DEXCON and INTENSITY values for the list of targets. 3) apply bonuspenalty scoring approach to the ranked derivative of early research interest and to the volume N of early research interest. 4) combine the secondary bonus-penalty values for all features with the optimized training weights in a 4-feature classifier. 5) rank the target list by the values of the 4-feature classifier. 6) determine the fractions of the “high-impact” and “lowimpact” categories in each 0.2 bin of rank by the 4-feature classifier. 7) summarize the differential fractions for each category accrued on the range from 0 (highest ranked 4-feature scores) to the given point of rank for the 4-feature score. 8) the sum of differential fractions for “low-impact” categories forms a X-axis coordinate; the sum of differential fractions for “high-impact” categories forms a Y-axis coordinate. The diagonal thin baseline at 45 degrees across the plot reflects the ratio of false positives to false positives for each new point in the form of summary functions, while thicker line reflects the ratio of true positive summary function to the false positive summary function in ROC analysis. At the far right corner, the summary functions for the true and false positives are both equal to 1, and the lines cross. The area between the lines is proportional to the resolution quality at multiple possible cut-offs. patients. MME (MME membrane metalloendopeptidase) shows a strong link to cancer, to both prognosis and metastasis, however targeting of metallo-endopeptidases (MMPs) was historically not successful, despite an investment of a lot of efforts. OXTR (oxytocin receptor) is mostly involved in behavior and social adaptation and its involvement in tumorigenesis is a stretch. PARP1 (poly (ADP-ribose) polymerase is strongly related to malignancy, however its expression is ubiquitous, and only a few clinical trials have been published for the targeting of this gene. While PDE4A (phosphodiesterase 4A, cAMP-specific) is involved in cardiac muscle activity and fibroblast proliferation, the links to malignancy are indirect. PTH1R (parathyroid hormone 1 receptor) functions The genes with the lowest rank in the Table 1 were analyzed in a similar fashion. IMPDH1 (IMP (inosine 5'-monophosphate) dehydrogenase 1) is mostly involved in transplant rejection and retinitis pigmentosa, its link to the cancer is tenuous. IMPDH2 (inosine 5'-monophosphate dehydrogenase 2) is also involved in autograft rejection, and its connections to cancer are not apparent. ITGA2B (integrin, alpha 2b (platelet glycoprotein IIb of IIb/IIIa complex, antigen CD41) is mostly involved in fibrinogen activation and coagulopathies, however, its connections to metastasizing are proven. LHCGR (luteinizing hormone/choriogonadotropin receptor) is involved in a broad diversity of pathways and its levels correlate with survival in ovarian epithelial cancer 485 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 participates in redox processes, apoptosis, membrane raft formation, while its overexpression correlates with glioblastoma multiforme progression. Speaking generally, comparison of high and low-impact score bins indicates that the higher ranking score gene set consistently ignites substantially higher interests of members of research community. The entire list of the high score bin members is associated with prominent cancer-related finding and produces relevant DEXCON signals, exceeding the threshold of interest. The drugging of candidates with highest impact ranks would be the most influential on cancer outcomes and deserves prioritization. are pleiotropic, with known roles in transplant rejection, organ development and bone maturation, with some evidence of its contribution to certain breast cancers. RARA (retinoic acid receptor, alpha) is vital for differentiation of hematopoietic lineages and respective malignancies. However, the utility of RARA agonists is confined to leukemia field, and the number of clinical trials in this area is limited. TSPO (translocator protein) is involved in microglia and retinal inflammation, HIV-1 virus maturation, while overexpression of TSPO correlates with the progress of breast cancer. However, the number of clinical trials for drugs that target this molecule is small, and its expression is not highly specific to cancer samples. TXNRD1 (thioredoxin reductase 1) Table 2. Modeled impacts for anti-cancer targets at the “research” stage Gene ID Synonyms Target impacts STS 155 PROC TP53 CDKN2A c-JUN TNFSF11 CD40 c-MET JAK2 HGF PRKCA PRKCB CXCR2 TGFB1 TLR9 ARSC, ARSC1, ASC, ES, SSDD, XLI, STS steroid sulfatase (microsomal), isozyme S arylsulfatase C, estrone sulfatase, steryl-sulfatase, steryl-sulfate sulfohydrolase APC, PC, PROC1, THPH3, THPH4, protein C (inactivator of coagulation factors Va and VIIIa), anticoagulant protein C, autoprothrombin IIA, blood coagulation factor XIV BCC7, LFS1, P53, TRP53, tumor protein p53 ARF, CDK4I, CDKN2, CMM2, INK4, INK4A, MLM, MTS-1, MTS1, P14, P14ARF, P16, P16-INK4A, P16INK4, P16INK4A, P19, P19ARF, TP16 AP1, V-jun avian sarcoma virus 17 oncogene homolog, activator protein 1, protooncogene c-jun, transcription factor AP-1 RP11-86N24.2, CD254, ODF, OPGL, OPTB2, RANKL, TRANCE, hRANKL2, sOdf, TNFSF11, tumor necrosis factor (ligand) superfamily, member 11 , TNFrelated activation-induced cytokine, osteoclast differentiation factor, osteoprotegerin ligand Bp50, CDW40, TNFRSF5, p50, TNF receptor superfamily member 5, B cell surface antigen CD40 AUTS9, HGFR, RCCP2, c-Met, MET met proto-oncogene HGF receptor, HGF/SF receptor, SF receptor, hepatocyte growth factor receptor, met protooncogene tyrosine kinase JTK10, THCYT3, Janus kinase 2 JAK-2, Janus kinase 2 (a protein tyrosine kinase), tyrosine-protein kinase JAK2 DFNB39, F-TCF, HGFB, HPTA, SF, hepatocyte growth factor (hepapoietin A; scatter factor), fibroblast-derived tumor cytotoxic factor AAG6, PKC-alpha, PKCA, PRKACA, PRKCA protein kinase C, alpha PKC-A, aging-associated gene 6, protein kinase C alpha type PKC-beta, PKCB, PRKCB1, PRKCB2, protein kinase C, beta PKC-B, protein kinase C beta type CD182, CDw128b, CMKAR2, IL8R2, IL8RA, IL8RB, chemokine (C-X-C motif) receptor 2 CED, DPD1, LAP, TGFB, TGFbeta, transforming growth factor, beta 1 , TGFbeta-1 UNQ5798/PRO19605, CD289, toll-like receptor 9 486 108 103 80 72 66 41 41 40 29 27 27 23 22 22 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 Table 2 Continued….. AKT1 CDK4 AKT, CWS6, PKB, PKB-ALPHA, PRKBA, RAC, RAC-ALPHA, v-akt murine thymoma viral oncogene homolog 1 PKB alpha, RAC-PK-alpha, RAC-alpha serine, threonine-protein kinase Bcl-2, PPP1R50, B-cell CLL, lymphoma 2 apoptosis regulator Bcl-2, protein phosphatase 1, regulatory subunit 50 AAG4, APO-J, APOJ, CLI, CLU1, CLU2, KUB1, NA1/NA2, SGP-2, SGP2, SP40, TRPM-2, TRPM2, clusterin, aging-associated protein 4, apolipoprotein J|complement cytolysis inhibitor HIF-1A, HIF-1alpha, HIF1, HIF1-ALPHA, MOP1, PASD8, bHLHe78, hypoxia inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor) BSF2, HGF, HSF, IFNB2, IL-6, interleukin 6 B-cell differentiation factor, B-cell stimulatory factor 2, BSF-2, CDF, CTL differentiation factor EG5, HKSP, KNSL1, MCLMR, TRIP5, kinesin family member 11, TR-interacting protein 5,TRIP-5, kinesin-like protein 1|kinesin-like protein KIF11 APO2, CD261, DR4, TRAILR-1, TRAILR1, TNFRSF10A tumor necrosis factor receptor superfamily, member 10a TNF-related apoptosis-inducing ligand receptor 1 UNQ160/PRO186, CD262, DR5, KILLER, KILLER/DR5, TRAIL-R2, TRAILR2, TRICK2, TRICK2A, TRICK2B, TRICKB, ZTNFR9, tumor necrosis factor receptor superfamily IGF-I, IGF1A, IGFI, insulin-like growth factor 1 (somatomedin C) IGF-IA, IGFIB, MGF, insulin-like growth factor I, insulin-like growth factor IA MRTL, MYCC, bHLHe39, c-Myc, MYC v-myc avian myelocytomatosis viral oncogene homolog AGPT2, ANG2, ANGPT2, angiopoietin 2, ANG-2, Tie2-ligand, angiopoietin-2, angiopoietin-2B, angiopoietin-2a CFC3, MAPKK1, MEK1, MKK1, PRKMK1, mitogen-activated protein kinase kinase 1, ERK activator kinase 1, MAPK/ERK kinase 1 CMM9, DKCA2, DKCB4, EST2, PFBMFT1, TCS1, TP2, TRT, hEST2, hTRT, TERT telomerase reverse transcriptase CMM3, PSK-J3, cyclin-dependent kinase 4 cell division protein kinase 4 MDM2 ACTFS, HDMX, hdm2, MDM2 oncogene, E3 ubiquitin protein ligase 9 NQO1 DHQU, DIA4, DTD, NMOR1, QR1, NQO1, NAD(P)H dehydrogenase, quinone 1, DT-diaphorase, NAD(P)H dehydrogenase [quinone] 1 CLG4, CLG4A, MMP-II, MONA, TBE-1, MMP2 matrix metallopeptidase 2, gelatinase A, 72kDa gelatinase, 72kDa type IV collagenase CLG4B, GELB, MANDP2, MMP-9, matrix metallopeptidase 9 (gelatinase B, 92kDa gelatinase, 92kDa type IV collagenase) CHDS6, MMP-3, SL-1, STMY, STMY1, STR1, matrix metallopeptidase 3 (stromelysin 1, progelatinase) ErbB-3, HER3, LCCS2, MDA-BF-1, c-erbB-3, erbB3-S, p180-ErbB3, p45sErbB3, p85-sErbB3, v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 3 proto-oncogene-like protein 582J2.1, CD124, IL-4RA, IL4RA, interleukin 4 receptor IL-4 receptor subunit alpha, IL4R nirs variant 1, interleukin-4 receptor alpha chain PLK, STPK13, polo-like kinase 1 , cell cycle regulated protein kinase, polo (Drosophia)-like kinase, polo like kinase FAAR, NR1C2, NUC1, NUCI, NUCII, PPARB, PPARD peroxisome proliferatoractivated receptor delta 9 BCL2 CLU HIF1A IL6 KIF11 TNFRSF10A TNFRSF10B IGF1 MYC ANGPT2 MAP2K1 TERT MMP2 MMP9 MMP3 ERBB3 IL4R PLK1 PPARD 487 20 18 18 18 18 17 16 16 15 15 14 14 13 10 8 7 6 5 5 5 5 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 Table 2 Continued….. EPHB4 PLAUR TXN ERBB4 FGFR3 MCL1 NRP1 CDK9 CENPE CHEK2 FGFR1 FN1 MMP1 SMO TEK CCR2 CDK6 CDK7 IKBKB MAPK14 MPO NGFR PRKCD WEE1 MMP14 HTK, MYK1, TYRO11, EPH receptor B4 ephrin receptor EphB4, ephrin type-B receptor 4, hepatoma transmembrane kinase CD87, U-PAR, UPAR, URKR, plasminogen activator, urokinase receptor monocyte activation antigen Mo3, u-plasminogen activator receptor form 2 RP11-427L11.1, TRDX, TRX, TRX1, thioredoxin ADF, ATL-derived factor, SASP, TXN delta 3, surface-associated sulphydryl protein, thioredoxin delta 3 ALS19, HER4, p180erbB4, v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 4 ACH, CD333, CEK2, HSFGFR3EX, JTK4, fibroblast growth factor receptor 3 BCL2L3, EAT, MCL1-ES, MCL1L, MCL1S, Mcl-1, TM, bcl2-L-3, mcl1/EAT, myeloid cell leukemia 1 bcl-2-like protein 3 C530029I03, NP-1, NPN-1, Npn1, Nrp, neuropilin 1 A5 protein, Neuropilin-1 precursor (A5 protein), neuropilin-1 RP11-228B15.5, C-2k, CDC2L4, CTK1, PITALRE, TAK, cyclin-dependent kinase 9, CDC2-related kinase, cell division cycle 2-like protein kinase 4 CENP-E, KIF10, PPP1R61, CENPE centromere protein E, Centromere autoantigen E RP11-436C9.1, CDS1, CHK2, HuCds1, LFS2, PP1425, RAD53, hCds1, CHEK2 , checkpoint homolog BFGFR, CD331, CEK, FGFBR, FGFR-1, FLG, FLT-2, FLT2, HBGFR, HH2, HRTFDS, KAL2, N-SAM, OGD, bFGF-R-1, fibroblast growth factor receptor 1 CIG, ED-B, FINC, FN, FNZ, GFND, GFND2, LETS, MSF, FN1 fibronectin 1 cold-insoluble globulin, fibronectin, migration-stimulating factor MMP1, CLG, CLGN, MMP1 matrix metallopeptidase 1 (interstitial collagenase), fibroblast collagenase, interstitial collagenase, matrix metalloprotease 1 E130215L21Rik, Smoh, bnb, smoothened, Smo smoothened homolog (Drosophila) bent body|smoothened homolog CD202B, TIE-2, TIE2, VMCM, VMCM1, TEK tyrosine kinase, endothelial angiopoietin-1 receptor hCG_14621, CC-CKR-2, CCR-2, CCR2A, CCR2B, CD192, CKR2, CKR2A, CKR2B, CMKBR2, MCP-1-R, CCR2 chemokine (C-C motif) receptor 2 PLSTIRE, CDK6 cyclin-dependent kinase 6, cell division protein kinase 6, serine/threonine-protein kinase PLSTIRE CAK1, CDKN7, HCAK, MO15, STK1, p39MO15, CDK7 cyclin-dependent kinase 7, 39 KDa protein kinase, CAK, CDK-activating kinase 1, TFIIH basal transcription factor complex kinase subunit IKK-beta, IKK2, IKKB, IMD15, NFKBIKB, IKBKB inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta RP1-179N16.5, CSBP, CSBP1, CSBP2, CSPB1, EXIP, Mxi2, PRKM14, PRKM15, RK, SAPK2A, p38, p38ALPHA, mitogen-activated protein kinase 14 myeloperoxidase RP23-67E18.6, LNGFR, Tnfrsf16, p75, p75NGFR, p75NTR, nerve growth factor receptor (TNFR superfamily, member 16) CVID9, MAY1, PKCD, nPKC-delta, protein kinase C, delta protein kinase C delta VIII, protein kinase C delta type, tyrosine-protein kinase PRKCD WEE1A, WEE1hu, WEE1 G2 checkpoint kinase, WEE1+ homolog, wee1-like protein kinase, wee1A kinase MMP-14, MMP-X1, MT-MMP, MT-MMP 1, MT1-MMP, MT1MMP, MTMMP1, WNCHRS, matrix metallopeptidase 14 (membrane-inserted) 488 4 4 4 3 3 3 3 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 0.333333333 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 Table 2 Continued….. ACVRL1 ACVRL1, ACVRLK1, ALK-1, ALK1, HHT, HHT2, ORW2, SKR3, TSR-I, ACVRL1 activin A receptor type II-like 1, TGF-B superfamily receptor type I, activin A receptor BCL-W, BCL2-L-2, BCLW, PPP1R51, BCL2-like 2 apoptosis regulator BCL-W, bcl-2-like protein 2, protein phosphatase 1, regulatory subunit 51 Cmkbr1, Mip-1a-R, Ccr1 chemokine (C-C motif) receptor 1, C-C CKR-1, C-C chemokine receptor type 1, CC-CKR-1, MIP-1 alpha R, MIP-1 alphaR CDC42 RP1-224A6.5, CDC42Hs, G25K, cell division cycle 42 G25K GTPbinding protein, GTP binding protein, 25kDa CDHN, N-cadherin, Ncad, Cdh2, cadherin 2,neural cadherin 0 RP11-195M16.2, CSF-1, MCSF, colony stimulating factor 1 (macrophage), lanimostim, macrophage colony-stimulating factor 1 RP11-363I22.4, CTS02, CTSO, CTSO1, CTSO2, PKND, PYCD, cathepsin K, cathepsin O, cathepsin O1, cathepsin O2, cathepsin X FPTB, farnesyltransferase, CAAX box, beta, CAAX farnesyltransferase subunit beta, FTase-beta protein farnesyltransferase subunit growth hormone secretagogue receptor, GH-releasing peptide receptor, GHRP, GHS-R, ghrelin receptor, growth hormone secretagogue receptor type 1 BB2, gastrin-releasing peptide receptor GRP-R|GRP-preferring bombesin receptor|bombesin receptor 2 glycogen synthase kinase 3 beta GSK-3 beta, GSK3beta isoform, glycogen synthase kinase-3 beta, serine/threonine-protein kinase GSK3B GUCY2C DIAR6, GUC2C, MECIL, MUCIL, STAR, GUCY2C , guanylate cyclase 2C , heat stable enterotoxin receptor, GC-C STA receptor, guanylyl cyclase C, hSTAR AHO3, BDMR, HA6116, HD4, HDAC-4, HDAC-A, HDACA, histone deacetylase 4, histone deacetylase A CD127, CDW127, IL-7R-alpha, IL7RA, ILRA, IL7R interleukin 7 receptor, CD127 antigen, IL-7 receptor subunit alpha, IL-7R subunit alpha, IL-7RA, interleukin 7 receptor alpha chain leukotriene A4 hydrolase, LTA-4 hydrolase, leukotriene A-4 hydrolase 0 LTB4R BLT1, BLTR, CMKRL1, GPR16, LTB4R1, LTBR1, P2RY7, P2Y7, leukotriene B4 receptor G protein-coupled receptor 16, G-protein coupled receptor 16, LTB4-R 1|LTB4-R1|P2Y purinoceptor 7, chemoattractant receptor-like 1 0 MAPK12 ERK3, ERK6, P38GAMMA, PRKM12, SAPK-3, SAPK3, mitogen-activated protein kinase 12, ERK-6, MAP kinase 12, MAP kinase p38 gamma ERK3, HsT17250, PRKM6, p97MAPK, MAPK6 mitogen-activated protein kinase 6, ERK-3, MAP kinase 6, MAP kinase isoform p97 JNK, JNK-46, JNK1, JNK1A2, JNK21B1/2, PRKM8, SAPK1, SAPK1c, MAPK8 mitogen-activated protein kinase 8 , JUN N-terminal kinase METAP2 MAP2, MNPEP, p67, p67eIF2, methionyl aminopeptidase 2, eIF-2associated p67 homolog, initiation factor 2-associated 67 kDa glycoprotein HME, ME, MME, MMP-12, matrix metallopeptidase 12 (macrophage elastase) 0 BCL2L2 CCR1 CDC42 CDH2 CSF1 CTSK FNTB GHSR GRPR GSK3B GUCY2C HDAC4 IL7R LTA4H MAPK6 MAPK8 METAP2 MMP12 MMP13 MMP7 NFKB1 CLG3, MANDP1, MMP-13, matrix metallopeptidase 13 (collagenase 3) collagenase 3|matrix metalloproteinase 13 (collagenase 3) MPSL1, PUMP-1, matrix metallopeptidase 7 (matrilysin, uterine) matrilysin, matrin, matrix metalloproteinase 7 (matrilysin, uterine) EBP-1, KBF1, NF-kB1, NF-kappa-B, NF-kappaB, NFKB-p105, NFKB-p50, NFkappaB, p105, p50 489 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 Table 2 Continued….. NPEPPS NTSR1 P2RY1 PPIA PRKCG PRLR PTPN11 PTPN22 RPS6KB1 SLC44A4 SPHK1 TEP1 YES1 AAP-S, MP100, PSA, aminopeptidase puromycin sensitive cytosol alanyl aminopeptidase NTR, NTSR1 neurotensin receptor 1 (high affinity), NTRH, high-affinity levocabastine-insensitive neurotensin receptor, neurotensin receptor type 1 P2Y1, purinergic receptor P2Y, G-protein coupled, 1 ATP receptor, P2 purinoceptor subtype Y1, P2Y purinoceptor 1, platelet ADP receptor CYPA, CYPH, HEL-S-69p, peptidylprolyl isomerase A, (cyclophilin A), PPIase A, T cell cyclophilin, cyclophilin, cyclosporin A-binding protein PKC-gamma, PKCC, PKCG, SCA14, protein kinase C, gamma protein kinase C gamma type HPRL, MFAB, hPRLrI, prolactin receptor PRL-R, hPRL receptor, secreted prolactin binding protein BPTP3, CFC, NS1, PTP-1D, PTP2C, SH-PTP2, SH-PTP3, SHP2, protein tyrosine phosphatase, non-receptor type 11 PTP-2C, protein-tyrosine phosphatase 1D LYP, LYP1, LYP2, PEP, PTPN8, protein tyrosine phosphatase, non-receptor type 22 (lymphoid) PEST-domain phosphatase|hematopoietic cell protein-tyrosine phosphatase PS6K, S6K, S6K-beta-1, S6K1, STK14A, p70 S6KA, p70(S6K)-alpha, p70-S6K, p70-alpha,ribosomal protein S6 kinase, 70kDa, polypeptide 1 p70 S6 kinase SLC44A4 DAAP-66K18.1, C6orf29, CTL4, NG22, TPPT, solute carrier family 44, member 4, choline transporter-like protein 4 SPHK, sphingosine kinase 1 , SK 1, SPK 1 TLP1, TP1, TROVE1, VAULT2, p240, telomerase-associated protein 1, TROVE domain family, member 1, p80 telomerase homolog, telomerase protein 1 HsT441, P61-YES, Yes, c-yes, v-yes-1 Yamaguchi sarcoma viral oncogene homolog 1 0 0 0 0 0 0 0 0 0 0 0 0 0 of these pathologies should continue unimpeded despite low scores for respective targets. Hence, the model that we propose should be further optimized by introducing an “orphan” disease coefficient that would preclude an attrition of the targets that are highly specific to certain malignancies that ail relatively low number of patients. By contrast, the lower scoring members not always show well-established association with cancer phenotypes as evident from the functional description entries that passed manual curation before linking to respective gene names in “Genes” subdivision of NCBI. For this group of genes, a majority of evidence is based on correlation of mRNA or protein expression levels to cancer phenotypes or outcomes, however, an independent evaluation of the consistency of overexpression findings does not confirm uniformity of this observation. However, retrospective post-prediction analysis revealed an important caveat which is likely to result in future improvements of the impact score concept. Some genes, like RARA or MME, demonstrate strong links to cancer, but produce low scores due to relatively narrow utility, i.e. limiting the applicability of developed ligands to limited spectrum of tumors. As these diseases are generally considered “orphan”, it is important that the development of the ligands aimed at the treatment An impact of targeting “super-targets” as a component of a combinatorial treatment Typically, anti-cancer therapies are combinatorial as they include at least 2 components. Assuming typical three-component therapy approach, n-fold increase in the number of available high-impact targets would result in n3 increase in the number of available drug combinations, prompting further progress in their evaluation and testing. Based on the data in Table 1 and different evaluations (2- 3), the effect of the current pool of therapies can be substantially magnified by this combinatorial expansion. Above we discussed that a conservative 490 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 disproportional, many novel therapies rely on a combinatorial synergy with EGRF and ERBB2 ligands (22-23). Hence, introduction of novel highimpact targets could alter the survival dynamics even further for at least some of the cancer forms. The technique presented in this paper allowed us to evaluate the potential impact of the targets which are currently at the research stage. Our analysis highlighted microsomal steroid sulfatase (estrone sulfatase) at the top of the list which, after some score gap, was followed by anticoagulant protein C, p53, CDKN2A, c-Jun, TNSFS11, CD40, c-MET and JAK2, all of which were highlighted as the most promising research-stage targets. Accoring to our calculations, the relative importance of microsomal steroid sulfatase is at approximately the same level as that of well-known anticancer targets BRAF kinase and progesterone receptor. A PubMed search using term ”estrone sulfatase” or “estrone sulfatase” AND “cancer” returns 193 and 125 manuscripts, respectively. Recent years resulted in the development of a number of potent estrone sulfatase inhibitors aimed at the suppression of the formation of both E1 and breast carcinomapromoting steroid dehydroepiandrosterone (DHEA) from DHEA-sulfate (DHEAS) (24-26). As approximately 40% of breast tumors are estrogendependent, successful advancement of estrone sulfatase inhibitors into clinical practice could potentially lead to sizable global effects. On the other hand, many potential targets were predicted to have minimal impact; deprioritization of these targets may lead to substantial savings and subsequent shift of clinical development efforts toward the most promicing drug candidates. Our impact scoring technique is, in a nutshell, a 4-feature bonus-penalty classifier which comprises two components, the microarray and the literature mining. The INTENSITY feature is the level of absolute transcript expression demonstrated by the target candidate. With all other factors being equal, the candidates with more intensive expression would influence biological signal transduction events more robustly as they produce higher amounts of mRNA, and, therefore, the protein. While the detected correlation of the impacts and the transcript levels is relatively weak (r = 0.2), it is sufficient to boost performance of a classifier of the bonus-penalty type. The DEXCON feature reflects the stability of the differential expression signal in tumors of various tissue origins, and in the tumors of same origin. As it was shown earlier, the estimate of the contribution of therapeutics to the observed doubling of cancer survival is at 40%. Designating this increment as SIDT (Survival Increment Due To Therapies), being equal of 40%, one can draw a model: SIDT(2) = SIDT (1) N nm (12) where SIDT (1) is the survival increment due to therapies at the current level, SIDT(2) is the survival increment at the projected level, N is the number of available therapies at the current level, m is the increment in the number of high-impact targets applied in cancer field. The exponential coefficient 2-3 assumes a formation of two or threecomponent drug cocktails, however this number may be greater or lesser in the future. Based on the model (12), at n = 2 and m = 2, the SIDT(2) would increase 4-fold, which would be coming very close to curing at least some types of cancer (see Table 1), and improving ten year survival rates for lung, brain and pancreatic cancers by at least 30%. At n = 3 and m = 2, the SIDT (2) would increase 9-fold, making many now lethal types of cancer eradicated, producing >60% improvement of the survival rates after 10 years observation for many other malignant disorders. DISCUSSION A significant progress is achieved for treatment of the majority of cancer form, with the average survival rate practically doubling over the last 45 years (Table 3 compiled according to the UK data presented at (1). The rate of progress appears to be constant for most of individual sub-ranges of the plot, however a certain recent acceleration is observed. Likely, increase in the number of available treatments contributes to this improvement, although not as a single factor. Assuming that the level of investment, and, therefore, projected impacts correctly reflect the level of future revenues/sales both for the ‘successful” and for the “research” sets of targets and also assuming that the level of sales correctly reflects the benefit to society, one can argue that the top 10% of the successful targets, or just seven of them, produce 75% of all anti-cancer effects. The top impact target alone, EGFR, mediates 23% of total anti-cancer effect, while the second best target, ERBB2, mediates 22% of total effect. While from purely clinical point of view these numbers seem 491 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 DEXCON score is superior to typical t-test based evaluations of the significance of observed differential expression patterns, as it takes into consideration a consistency of evidence (27). It is important to note that microarray-derived features are capable of serving as predictors even when completely novel target candidates comes into the scope of study; hence, their value is higher than that of text-mining features. The text-mining features rely on pre-existing information regarding a potential action mechanism and perceived value of a target candidate. When a majority of scientific data is collected prior to the major information disseminating event, i.e. publishing of influential review and/or result of a clinical trial, and the “metoo” bias is minimized by pre-dissemination choice of cut-off, the literature mining features become a less-than-obvious predictor, although still inferior to the experimental measurements such as microarrays. It is obvious that the bibliometric aspect of the study may be improved by taking into account the impact factors of the journals, number of patents, relative sizes of each study, total amount of grant support etc. In this initial report, the bibliometric aspects were limited to the number of publications and to the rate of accumulation prior to the critical bias-producing events. Table 3: 10 year survival rates post-diagnosis based on the data collected in UK during 1971-2007 period, survival increments are computed over entire period. Cancer localizations Period of Diagnosis Survival increment % from 1971 to 2007 1971-1972 1980-1981 1990-1991 2007(2) Bladder Bowel Brain 34.6 52.2 50.3 48.9 +14.4 Colon 22.6 33.3 38.3 50.4 +27.8 Rectum (1) 23.9 29.9 33.4 49.3 +25.4 5.7 7.4 8.7 9.4 +2.9 Breast (Female) 38.9 49.3 61.1 77.0 +38.1 Cervix 48.4 52.6 59.5 63.0 +14.6 Hodgkin Lymphoma 49.0 57.8 68.7 77.9 +28.9 Kidney 22.2 27.1 31.7 43.5 +21.3 Larynx (Male) 50.5 56.9 55.1 59.6 +9.1 Leukemia 8.1 13.8 23.5 33.2 +25.2 Lung 3.2 3.9 3.7 5.3 +2.1 Malignant Melanoma 49.3 57.5 69.5 83.2 +32.1 Myeloma 5.3 8.8 9.3 17.1 +11.8 Non-Hodgkin Lymphoma (1) 21.8 29.7 35.0 50.8 +29 Oesophagus 3.6 4.5 4.6 10.0 +6.4 Ovary 18.0 21.9 25.3 35.4 +17.4 Pancreas 1.9 2.1 1.7 2.8 +0.9 Prostate 20.4 30.1 30.9 68.5 +48.1 Stomach 4.6 7.2 8.5 13.5 +8.9 Testis 67.4 84.7 91.7 96.5 +28.6 Uterus 55.2 62.9 64.4 74.5 +18.7 Other Cancers 34.0 31.4 30.1 36.3 +2.3 All Cancers 23.7 30.0 33.7 45.2 +21.5 492 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 approach it. The rationale behind our approach is that both the total number of publications and their accelerating deposition into the PubMed are proxies to research interest, which, in turn, correlates with the objective value of the target. The research interest fundamental also drives publishing in higher impact journals and determines awarded grant support, which is instrumental to perform studies in larger groups of animals or patients cohorts. Hence, introduction of additional bibliometric parameters will also introduce co-correlating variables. Relative weights for each of these parameters should be evaluated by experimenting in silico. On the other hand, target gene expression related features are a-priori independent of the bibliometrics and, therefore, more likely to add an input to the model. The linear classifier for YP’ was selected due to its robustness which aids in prevention of model overfitting. Since the training set was as small as ~ 30 successful anticancer targets, this precaution appears to be warranted, especially if the number of features would be increasing due to inclusion of other information sources, for example, the networks of biomolecules. The least square method was selected for regression modeling, with the ranking-based cutoffs for high and low real impact. High rank was defined as the highest quartile and the low rank was defined as two lowest quartiles. These cut-offs are somewhat arbitrary and, therefore, the results of the study are qualitative rather than quantitative. However, the proposed technique for estimating the commercial promise of still potential targets, which are costly to develop, is inexpensive, and, therefore, of value. In this study, selected cut-offs clearly separated the high and low impact groups of targets while preserving sufficient number of targets in each group and, by that, allowing for ROC plotting. To generate a high proxy impact Y, a target should demonstrate a consistent promise in multiple clinical trials. There is certainly an informational gap between early research interest and clinical trial results. An intriguing discovery of a novel pharmacological mechanism and its experimental confirmations at pre-clinical level may not even acknowledge possible inability of a target to acquire a non-toxic ligand, unfavorable patterns of expression or pharmacodynamics etc. From the point of information theory, a predictor is a function that contributes a quantity of information sufficient to measure the pattern of the future event or I2 = I1 + ΔI (11) Where I2 - is the final state, the completeness of information allows reliably describe the target pattern of the future; I1 - is the initial state, the fragmentary or zero initial information concerning the target pattern is insufficient ΔI – the predictor produces the increment of information rendering knowledge of the future pattern. Based on the formula (11), the microarray setting corresponds to I1 ~ 0 (little is known about any aspect of the target and its behavior prior to the experiment), while text-mining corresponds to I1 > 0 (substantial knowledge about the target and its expected behavior prior to computation of metrics). From these considerations it is apparent that a perfect microarray classifier that allows complete prediction of a future pattern produces a greater informational increment than an equally perfect text-mining classifier. At the same time, the contribution of the text-mining classifier is non-zero, unless the information gap between the present state and the future state is negligible. Thus, we argue that the proposed text-mining approach is at least partially objective and, therefore, provides an added value when coupled with the microarray data. Speaking generally, the fusion of orthogonal sets of features produces a greater summary ฀I to elucidate the final state more reliably than the component set of features in isolation. In that sense, the imperfect (biased) contribution of the textmining features is still useful, due to its informative component permitting to bridge the information gap in the equation sooner (11). The method of extracting text-mining features employed in this report is analogous to consensus forecasting used in economic modeling. In most of cases the expert consensus is correct, but historical record attests that it never should be applied in isolation (28-29). Thus, the text-mining derived features and microarray features act synergistically, supporting each other, and provide an integrated predictor. The current rates of attrition for the ligands and targets are discussed extensively (30, 31). As an example, Hutchinson et al reports an average attrition rate for the anti-cancer therapies as 95-96% (30). As the leads to the loss of all the costs accrued 493 J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 REFERENCES by the rejected ligands, the attrition of more advanced candidates is more damaging event. Implementation of targets evaluation by their potential for eventual success will lead to earlier elimination of some ligands off the development pipelines. The preference towards anti-cancer targets with the widest possible therapeutic window would contract the overall volume of the clinical trials. If we would treat a clinical trial as a test with a certain signal-to-noise ratio, we could apply known statistical observation that the size of the test is smaller if the signal-to-noise ratio is inherently higher. In other words, if the targets are selected on their favorable gene expression pattern with preferential expression in tumors rather that in normal cells, lesser toxicities are expected, and an enrollment of lower numbers of patients into dose escalation trials would be necessary. 1. 2. 3. 4. CONCLUSION 5. The main result of the report is demonstration that the promising behavior of a pharmacological target is predictable early based on their expression signatures and text-mining of the pattern of the early research interest. Considering a very divergent levels of promise displayed by the currently approved targets, we conclude that most of survival increment attributed to anticancer therapies in general is achieved via the high-impact targets. Improvement in predicting the targets with inherently wide therapeutic window may result in clinical trials stage savings and, eventually, in explosion of therapeutic opportunities that would benefit the entire society. 6. 7. 8. 9. CONFLICT OF INTEREST There is no conflict of interest at all stages of the manuscript creation 10. ACKNOWLEDGEMENTS 11. The authors acknowledge the diligent effort by Neeraj Prasad in data mining based on the author’s instructions and a proofread by Jerome Glasser. This study has been funded by Ministry of Science and Education, Russia (Project no. RFMEFI60714X0098). 12. 13. 494 Cancer survival for common cancers. [cited 10 August 2016]. Available from: http://www.cancerresearchuk.org/cancerinfo/cancerstats/survival/common-cancers/#Cancer. Esteva FJ, Hortobagyi GN. Integration of systemic chemotherapy in the management of primary breast cancer. The Oncologist. 1998 Oct 1;3(5):300-13. DeVita VT, Chu E. A history of cancer chemotherapy. Cancer research. 2008 Nov 1;68(21):8643-53. Goodman LS, Wintrobe MM, Dameshek W, Goodman MJ, Gilman A, McLennan MT. Nitrogen mustard therapy: Use of methyl-bis (betachloroethyl) amine hydrochloride and tris (betachloroethyl) amine hydrochloride for Hodgkin's disease, lymphosarcoma, leukemia and certain allied and miscellaneous disorders. Journal of the American Medical Association. 1946 Sep 21;132(3):126-32. Wright JC, Prigot A, Wright BP, Weintraub S, Wright LT. An Evaluation of Folic Acid Antagonists in Adults with Neoplastic Diseases. Journal of the National Medical Association. 1951 Jul;43(4):211. Nathanson L, Hall TC, Schilling A, Miller S. Concurrent combination chemotherapy of human solid tumors: experience with a three-drug regimen and review of the literature. Cancer Res. 1969 Feb;29(2):419-25. Vassilakopoulos TP, Angelopoulou MK. Advanced and relapsed/refractory Hodgkin lymphoma: what has been achieved during the last 50 years. Semin Hematol. 2013;50(1):4-14. Wilson PM, Danenberg PV, Johnston PG, Lenz HJ, Ladner RD. Standing the test of time: targeting thymidylate biosynthesis in cancer therapy. Nat Rev Clin Oncol. 2014;11(5):282-98. doi: 10.1038/nrclinonc.2014.51. LoRusso PM. Phase 0 clinical trials: an answer to drug development stagnation? Journal of clinical oncology. 2009;27(16):2586-8. Stewart DJ, Batist G. Redefining cancer: a new paradigm for better and faster treatment innovation. J Popul Ther Clin Pharmacol. 2014;21(1):e56-65. Juliano RL. Pharmaceutical innovation and public policy: The case for a new strategy for drug discovery and development. Science and Public Policy. 2013: scs125. FDA's shorter approval time for new drugs raises questions. [cited 10 August 2016]. Available from http://www.nbcnews.com/health/fdas-shorterapproval-time-new-drugs-raises-questions8C11484613. Zhang D, Surapaneni S, editors. ADME-enabling technologies in drug design and development. John Wiley & Sons, Incorporated; 2012 Apr 13. J Pharm Pharm Sci (www.cspsCanada.org) 19(4) 475 - 495, 2016 14. Zhou W, Wang Y, Lu A, Zhang G. Systems Pharmacology in Small Molecular Drug Discovery. Int J Mol Sci. 2016 Feb 18;17(2):246. doi: 10.3390/ijms17020246. 15. Manis JP. Knock out, knock in, knock down— genetically manipulated mice and the Nobel Prize. New England Journal of Medicine. 2007 Dec 13;357(24):2426-9. 16. The Cost Of Creating A New Drug Now $5 Billion, Pushing Big Pharma To Change. [cited 10 October 2014]. Available from: http://www.forbes.com/sites/matthewherper/2013/08 /11/how-the-staggering-cost-of-inventing-newdrugs-is-shaping-the-future-of-medicine/ 17. Ma'ayan A, Jenkins SL, Goldfarb J, Iyengar R. Network analysis of FDA approved drugs and their targets. Mount Sinai Journal of Medicine: A Journal of Translational and Personalized Medicine. 2007 Apr 1;74(1):27-32. 18. Zhu F, Han L, Zheng C, Xie B, Tammi MT, Yang S, Wei Y, Chen Y. What are next generation innovative therapeutic targets? Clues from genetic, structural, physicochemical, and systems profiles of successful targets. Journal of Pharmacology and Experimental Therapeutics. 2009 Jul 1;330(1):304-15. 19. Xu H, Fang Y, Yao L, Chen Y, Chen X. Does Drugtarget Have a Likeness? Target. 2007;3000:49. 20. Sakharkar MK, Li P, Zhong Z, Sakharkar KR. Quantitative analysis on the characteristics of targets with FDA approved drugs. Int J Biol Sci. 2008 Jan 1;4(1):15-22. 21. Bender A, Scheiber J, Glick M, Davies JW, Azzaoui K, Hamon J, Urban L, Whitebread S, Jenkins JL. Analysis of pharmacology data and the prediction of adverse drug reactions and off‐target effects from chemical structure. ChemMedChem. 2007 Jun 11;2(6):861-73. 22. Stinchcombe TE, Borghaei H, Barker SS, Treat JA, 23. 24. 25. 26. 27. 28. 29. 30. 31. Obasaju C. Pemetrexed With Platinum Combination as a Backbone for Targeted Therapy in Non-SmallCell Lung Cancer. Clin Lung Cancer. 2016; 17(1):19. doi: 10.1016/j.cllc.2015.07.002. Singla S, Pippin JA, Drebin JA. Dual ErbB1 and ErbB2 receptor tyrosine kinase inhibition exerts synergistic effect with conventional chemotherapy in pancreatic cancer. Oncology reports. 2012 Dec 1;28(6):2211-6. Hong Y, Chen S. Aromatase, estrone sulfatase, and 17β-hydroxysteroid dehydrogenase: structurefunction studies and inhibitor development. Mol Cell Endocrinol. 2011 Jul 4;340(2):120-6. Numazawa M, Tominaga T, Watari Y, Tada Y. Inhibition of estrone sulfatase by aromatase inhibitor-based estrogen 3-sulfamates. Steroids. 2006 May;71(5):371-9. Aidoo-Gyamfi K, Cartledge T, Shah K, Ahmed S. Estrone sulfatase and its inhibitors. Anticancer Agents Med Chem. 2009 Jul;9(6):599-612. Mayburd A, Baranova A. Knowledge-based compact disease models identify new molecular players contributing to early-stage Alzheimer’s disease. BMC systems biology. 2013 Nov 7;7(1):1. Batchelor R. The IMF and OECD versus consensus forecasts. City University Business School, London, August. 2000 Aug. http://www.consensuseconomics.com/ Golinelli R, Parigi G. Real-time squared: A real-time data set for real-time GDP forecasting. International Journal of Forecasting. 2008 Sep 30;24(3):368-85. Hutchinson L, Kirk R. High drug attrition rates— where are we going wrong? Nature Reviews Clinical Oncology. 2011 Apr 1;8(4):189-90. Stifling New Cures: The True Cost of Lengthy Clinical Drug Trials [cited 10 October 2014] Available from: http://www.manhattaninstitute.org/html/fda_05.htm Supplemental Table 1. Cost structure for different stages in drug development process. From A. Roy “Stifling new cures: the true cost of lengthy clinical trials, FDA Project reports, V. 5, 2012 Function Share of Total (expenses) Probability of FDA approval Reciprocal of approval probability (tested ligands per a marketed ligand) Preclinical 28% 8% 12.5 Phase I 9.2% 21% 4.9 Phase II 17.4% 28% 3.8 Phase III 39.8% 58% 1.6 Approval 5% 90% 1.1 495

RELATED PAPERS

RELATED TOPICS

Log In

Predicting High-Impact Pharmacological Targets by Integrating Transcriptome and Text-Mining Features

Predicting High-Impact Pharmacological Targets by Integrating Transcriptome and Text-Mining Features

Related Papers

RELATED PAPERS

RELATED TOPICS