Harnessing Artificial Intelligence, Machine Learning, and Natural Language Processing to Overcome Clinical Trial Challenges

iNDX.Ai is a Silicon Valley-based company that has developed a broad-capability analytical platform that could be a panacea for some of the most common, time-consuming, and costly clinical research problems.

Defining the problem

Clinical research is a highly challenging, costly, and risky endeavor. From its pre-clinical inception to the actual market launch, a drug requires, on average, 7-12 years of intense pre-clinical and clinical research, an investment of about $2.6 billion, and successful navigation through several regulatory gauntlets. The stats are even more dismal for Oncology. According to a collaborative study by the Biotechnology Innovation Organization (BIO), Biomedtracker, and Amplion, which reports the approval rates for drugs across 14 major disease areas, the likelihood of approval (LOA) for Oncology drugs is an abysmal 5.1%. The success rates for solid tumors are even worse, with LOAs struggling to reach half this value. Clinical research pundits suggest that the main reasons for such a high failure rate are:

  1. Poor management of clinical trial data.
  2. Inadequate patient recruitment and retention.
  3. Pharmacogenomic inadequacies in stratifying patients according to validated biomarkers.

Here we describe common problems that plague clinical trials and briefly introduce an innovative tech platform that we have developed to solve them.

Poor management of clinical trial data

Data management and data integrity are the foundation of successful research and clinical trials. According to an analysis conducted by IBM in 2016, the cost of poor data to the U.S. economy is more than 3.1 trillion dollars per annum. Loss of data integrity in clinical trials is all too common. In order to conduct a proper analysis of clinical trial data, the data has to be reliable, complete, relevant, error-free, scalable, flexible, and organized in a manner that enables consistent processing each time it is analyzed.

Since a large majority of clinical trials are conducted in medical centers located across the globe, it becomes challenging to organize and assemble data according to a unified classification scheme. This also generates the need for multiple iterations of data entry, downloads, uploads, and analysis within different clinical trial management platforms dispersed across many medical centers. This can lead to mistakes in labeling, tracking, storage, analysis, and interpretation of clinical data, costing sponsors thousands of wasted man-hours and research dollars. Inherent patient bias and clinical subjectivity in the interpretation of medical data from case report forms (CRFs) can also compromise the quality of the analysis that can be conducted. Mid-study changes in protocols, patient dropouts, switching of biospecimen vendors, transfer of ongoing trials to different clinical research organizations (CROs), and incompatibilities between software platforms used at medical centers can compound these problems and introduce long delays in the process.
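To make the idea of a unified classification scheme concrete, here is a minimal Python sketch of how records arriving from different sites might be mapped onto one set of field names before analysis. The field names, aliases, and function names are hypothetical illustrations only; they are not taken from the iCore platform or from any specific clinical trial management system.

```python
# Hypothetical unified scheme: each unified field lists the site-specific
# column names (aliases) that are assumed to mean the same thing.
UNIFIED_FIELDS = {
    "patient_id": {"patient_id", "subject", "subj_id"},
    "visit_date": {"visit_date", "date_of_visit", "visit"},
    "tumor_stage": {"tumor_stage", "stage", "tnm_stage"},
}


def harmonize_record(raw):
    """Map one site-level record onto the unified field names.

    Returns (clean_record, unmapped_keys) so that unknown columns are
    reported for review instead of being silently dropped.
    """
    alias_to_field = {
        alias: field
        for field, aliases in UNIFIED_FIELDS.items()
        for alias in aliases
    }
    clean, unmapped = {}, []
    for key, value in raw.items():
        # Normalize "Subj ID" -> "subj_id" before looking up the alias.
        normalized = key.strip().lower().replace(" ", "_")
        field = alias_to_field.get(normalized)
        if field is None:
            unmapped.append(key)
        else:
            clean[field] = value.strip() if isinstance(value, str) else value
    return clean, unmapped


def missing_fields(clean):
    """List unified fields that the harmonized record does not contain."""
    return sorted(set(UNIFIED_FIELDS) - set(clean))


if __name__ == "__main__":
    site_record = {"Subj ID": " 001 ", "Date of Visit": "2017-03-01", "Notes": "x"}
    record, unmapped = harmonize_record(site_record)
    print(record, unmapped, missing_fields(record))
```

Reporting unmapped and missing fields, rather than guessing at them, is one way to surface labeling mistakes early instead of at analysis time.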

As the target lead or a therapeutic combination moves from preclinical testing to more advanced phases of clinical trials, investigators invariably generate increasingly large and complex data sets. The depth and breadth of this information can become extremely complex as data points pour in from genomic, proteomic, cellular, imaging, and clinical sources. Although disparate, sequential analyses are fairly common on such data sets, the inherent genetic heterogeneity within patient populations limits complex molecular analysis across different patient demographics. This leads to the loss of precious opportunities to analyze the data and derive useful trends for guiding clinical trial design. Bioinformatics platforms that enable superimposition of clinical data on “multiomics” data require immense storage and processing capabilities. This stems from the highly dynamic nature of clinical trial data, which continuously adapts to the vagaries of enrollments (and dropouts) of thousands of patients, frequent changes in study protocols, and the constant influx of “omics” and safety data. This can complicate analysis until all data points are acquired. Since pharmacogenomic guiding principles can improve the stratification of patients in subsequent stages of the study protocol, real-time analysis of such dynamic data becomes important.

Inadequate patient recruitment and retention

Meeting patient recruitment and retention standards presents a major hurdle that must be overcome to ensure the success of clinical trials. Almost 25% of cancer trials fail to recruit an adequate number of patients, and 18% of trials are shut down within 3 years because they fail to enroll enough patients. A patient recruitment report by Cognizant showed that almost 30% of Phase III clinical study closeouts happen due to enrollment difficulties. These dismal stats highlight one of the most ironic problems in clinical research. On one hand, patients are actively scouting for trials in which they can enroll, and on the other, clinical investigators and pharma companies struggle to recruit enough patients to conduct a powerful statistical analysis.

It is estimated that the pharmaceutical industry spends about $5.9 billion on recruitment expenses alone. Artificial intelligence (AI)-based strategies are now being touted as an elegant solution to these problems. A prime example showcasing the success of this approach in solving the patient recruitment problem is the collaboration between Mayo Clinic and IBM Watson Health that resulted in the development of a cognitive computing system. The partnership led to an 80% increase in enrollment into breast cancer trials within a year of its implementation.

Pharmacogenomic inadequacies in stratifying patients

The recent first-of-its-kind approval of Merck’s flagship drug Keytruda (pembrolizumab) across all microsatellite instability-high (MSI-H) solid tumors highlights the value of pharmacogenomic targeting. Traditionally, clinical trials were classified based on the type, stage, and aggressiveness of cancer. However, the high-throughput “omics” advances of the last decade have brought pharmacogenomic classification of patients to the forefront, enabling highly tailored and efficacious therapies without the risk of unacceptable adverse events.

However, in order to derive useful and therapeutically relevant predictive biomarkers, it is not only important to analyze smaller demographics of patients, but also crucial to develop capabilities that enable us to view a single patient’s genetic and molecular signatures. This requires accurate and reliable “single-patient” identification of “omic” alterations that can influence the response rate to cancer treatments being evaluated during different phases of clinical research.

The Solution: iNDX.Ai iCore platform

So how can we solve these problems? Recent successes from studies powered by artificial intelligence and machine learning indicate the tremendous potential of this technology. It is being increasingly employed by sponsors to track patient compliance through biological trackers, match patients with clinical trials based on their electronic health records (EHRs), and mine unique and unexplored genetic signatures that enable early and accurate pharmacogenomic stratification of patients with limited enrollment options.
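As a rough illustration of matching patients with trials based on EHR-derived data, the Python sketch below applies simple structured eligibility rules. It is deliberately rule-based rather than machine-learned, and every class, field, and criterion in it is a hypothetical assumption for illustration; it does not describe how iNDX.Ai, IBM Watson Health, or any other system actually performs matching. A production system would first need to extract such criteria from free-text protocols and records, for example with natural language processing.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical structured eligibility criteria for one trial."""
    trial_id: str
    diagnosis: str
    min_age: int
    max_age: int
    required_biomarkers: frozenset = frozenset()


@dataclass
class Patient:
    """Hypothetical fields assumed to be extracted from a patient's EHR."""
    patient_id: str
    diagnosis: str
    age: int
    biomarkers: frozenset = frozenset()


def is_eligible(patient, trial):
    """Check diagnosis, age range, and that all required biomarkers are present."""
    return (
        patient.diagnosis == trial.diagnosis
        and trial.min_age <= patient.age <= trial.max_age
        and trial.required_biomarkers <= patient.biomarkers  # subset test
    )


def match_trials(patient, trials):
    """Return the IDs of all trials whose criteria the patient satisfies."""
    return [trial.trial_id for trial in trials if is_eligible(patient, trial)]


if __name__ == "__main__":
    trials = [
        Trial("T1", "breast cancer", 18, 75, frozenset({"HER2+"})),
        Trial("T2", "breast cancer", 18, 65),
        Trial("T3", "melanoma", 18, 80),
    ]
    patient = Patient("P1", "breast cancer", 70, frozenset({"HER2+"}))
    print(match_trials(patient, trials))
```

Keeping the criteria as explicit data rather than burying them in code makes it easier to audit why a given patient was or was not matched to a trial.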

Recognizing the major barriers within the clinical trial process, iNDX.Ai has developed a bioinformatics platform that utilizes state-of-the-art machine learning, artificial intelligence, and natural language processing capabilities. This empowers clinical trial stakeholders to track, view, manage, and analyze data within a single cloud-based platform. The platform enables investigators to conduct complex and cross-functional interrogations of “omic” and clinical data. Since data is managed on a unified and “endless” cloud-based platform, these tools:

  • Seamlessly overcome several geographical, logistical, and storage constraints; provide freedom from hardware/software upgrades; and enable real-time gathering and sharing of data among all stakeholders.
  • Enable rapid integration, organization, sharing, compliance, and correlative analysis of a broad set of data generated during clinical and translational studies, encompassing data points from whole exome sequencing (WES), RNA-Seq, Nanostring, T-cell receptor (TCR) Seq, qPCR, Flow Cytometry, Immunohistochemistry (IHC), cytokine profiles, Radiology (including images), multiplex ELISA, LC/MS, and clinical endpoints from electronic data captures (EDC)/Argus.

In this manner, the iNDX.Ai iCore platform allows the generation of therapeutic and patient stratification insights based on a rational correlative analysis, overcoming some of the most difficult, time-consuming, and costly clinical research constraints.


Vinayak Khattar, Consultant, iNDX.Ai 

Mohan Uttarwar, CEO, iNDX.Ai 

Dr. Gowhar Shafi, VP Translational Research & Computational Biology, iNDX.Ai 

About us

Based in Silicon Valley, we are a highly motivated team of Entrepreneurs, Doctors, Engineers and Data Scientists with the mission to develop innovative software products, help our collaborators find a Cure-for-Cancer and make a societal impact, globally.

Contact us

