An Overview of Drug Discovery and Development: From Disease to Medicine

A drug discovery program is often initiated because no medicines are available for a given condition or because existing medicines have efficacy or safety limitations, or both. The drug discovery process is a systematic scientific endeavor and, as such, rigorously applies the principles of scientific research. The discovery and development of new medicines are complex processes involving distinct steps and a variety of scientific disciplines. From identifying a pressing medical need to treating patients in the clinic, the typical drug discovery and the subsequent development processes can be summarized as follows.

Disease Prioritization for Drug Discovery and Development

In principle, diseases with the most pressing unmet medical need ought to be the highest priority for the discovery and development of new medicines. In practice, however, many factors determine which diseases are prioritized, and there are several important reasons why a disease with a pressing unmet medical need might not be the highest priority. First, for complex diseases with complicated or unclear mechanisms, there simply might not be a clear path for therapeutic intervention. Second, as outlined in the following sections, the availability of a relevant disease model is paramount for drug discovery and development, and the lack of such models for a particular disease can preclude the discovery of new medicines. Third, potential drug targets might have a theoretical connection to a disease, but without strong experimental evidence for that connection there might not be a clear path forward. Because the discovery and development of new drugs are capital-intensive and time-consuming, there is often a compromise between the diseases that should be addressed in principle and the diseases that can be addressed in practice.

Target Identification

The drug discovery process often begins with the identification of a disease condition that can be addressed via intervention. The disease condition is often related to a specific biological target, usually a protein or a nucleic acid. There are two main groups of approaches to identify targets. In target-based approaches, scientists first analyze the disease and the underlying mechanisms and uncover potential targets that could be manipulated for treatment. To this end, scientists often rely on data mining through scientific literature and databases, comparing RNA and protein expression patterns in healthy and diseased states, or looking for associations between genetic differences and disease risk and progression. Target-based approaches often rely on biochemical assays but cell-based or even tissue-based assays can also be used. In phenotype-based approaches (sometimes referred to as “phenotypic screening approaches”), a large number of chemical compounds are tested in a relevant disease model, e.g., a cell line, to identify compounds with desired therapeutic effects. These compounds hold the potential to become promising medicines, and the biological target is identified later. In a hybrid approach, systematic silencing of gene expression in a relevant disease model and subsequent phenotypic readout can identify potential targets.
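As a rough illustration of comparing expression patterns between healthy and diseased states, the sketch below computes a log2 fold change per gene and flags genes whose mean expression differs at least two-fold. The gene names, expression values, pseudocount, and threshold are all hypothetical; real analyses rely on replicated measurements and statistical testing rather than a simple ratio of means.

```python
import math
from statistics import mean


def log2_fold_change(diseased, healthy, pseudocount=1.0):
    """Return log2 of the ratio of mean expression (diseased / healthy).

    The pseudocount is an assumption of this sketch; it avoids division by zero.
    """
    return math.log2((mean(diseased) + pseudocount) / (mean(healthy) + pseudocount))


# Hypothetical expression values in arbitrary units (not real data).
expression = {
    "GENE_A": {"diseased": [31.0, 31.0, 31.0], "healthy": [7.0, 7.0, 7.0]},
    "GENE_B": {"diseased": [9.0, 9.0, 9.0], "healthy": [9.0, 9.0, 9.0]},
    "GENE_C": {"diseased": [3.0, 3.0, 3.0], "healthy": [15.0, 15.0, 15.0]},
}

fold_changes = {
    gene: log2_fold_change(values["diseased"], values["healthy"])
    for gene, values in expression.items()
}

# Flag genes with at least a two-fold change in either direction (|log2FC| >= 1).
candidates = sorted(gene for gene, lfc in fold_changes.items() if abs(lfc) >= 1.0)
print(candidates)
```

Genes flagged this way are only starting points: differential expression suggests an association with the disease, not a causal role, which is why validation follows.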

What makes a good drug target? A drug target is generally considered promising when several criteria are met. There is strong evidence that the target is directly related to the disease condition and is involved in the disease mechanism. Further, the target is either enriched in the diseased tissue or organ (non-uniform body distribution), or modulation of the target’s function in healthy tissues or organs is relatively inconsequential. The target can be modulated and is amenable to high-throughput screening. Ideally, there are good disease models (e.g., human cell lines, tissues, or animal models) and biomarkers that could inform on efficacy. Finally, there are no obvious potential safety concerns and the intellectual property landscape indicates freedom to operate.

Target Validation

During the process of target validation, drug discovery scientists look for strong evidence that the potential biological target is highly relevant to the disease and that modulating it will have the desired therapeutic outcome: the stronger the evidence for a direct involvement of the potential target, the higher the likelihood that modulating the target will lead to clinical efficacy. Because the target needs to be validated experimentally, the availability of physiologically relevant disease models is paramount. Further, these models also need to be predictive of human physiology. To validate potential targets, scientists often apply genetic approaches to manipulate the abundance of the target in a relevant disease model, with the desired outcome being an impact on the disease’s phenotype. Ideally, manipulating the target’s abundance in the disease model does not cause any toxicity. Successful target validation requires a mechanistic understanding of the disease and the specific function of the target. The importance of effective target identification and validation is underscored by the observation that most drug candidates fail during clinical trials because of insufficient efficacy or safety. The more diverse the approaches used for target validation, the higher the confidence in the relevance of the target. Further, target validation often continues throughout the discovery process, as the growing body of experimental data can further increase confidence in the relevance of the target.

What is clinical efficacy? A drug’s clinical efficacy, or efficacy for short, refers to the ability of a drug to produce the desired therapeutic effect or beneficial outcome. It assesses whether a drug is effective in treating a specific disease condition, typically based on evidence from clinical trials and real-world patient experiences. Efficacy is a fundamental concept in pharmacology and drug development, and it plays a central role in assessing the potential of a candidate drug to improve patient outcomes and quality of life.

Hit Discovery

Once a target is validated, drug discovery scientists begin the search for chemical compounds that can interact with the validated target and potentially alter its function in a way that directly impacts the disease mechanism. In essence, scientists are looking for a starting point for the development of drug candidates. There are two main approaches for hit discovery:

  • Physical high-throughput screening (HTS). A physical HTS requires establishing a robust experimental screening assay that can identify compounds that interact with the target and have the potential to interfere with the disease mechanism. HTS assays are typically designed to experimentally evaluate thousands to hundreds of thousands of compounds. The need for physical HTS assays in drug discovery is exactly why a good drug target should be amenable to HTS.

  • Virtual HTS, or virtual screening (VS) for short. VS is an in silico approach in which the chemical structures of hundreds of thousands to millions of compounds are fitted to a structural model of the target and the potential interactions between the compounds and the target are evaluated computationally. Compounds predicted to interact strongly with the target are classified as hits. VS requires a high-quality structural model of the target because model quality directly impacts the likelihood of accurately identifying target-interacting compounds.

Each approach has advantages and disadvantages and, in practice, both approaches are often used in parallel to increase the likelihood of identifying potential hits.
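To make the idea of calling hits from a physical screen more concrete, the sketch below normalizes raw assay signals to percent inhibition using control wells and flags compounds above a cutoff. It assumes an inhibition assay in which a lower signal means stronger inhibition; the compound names, signal values, and the 50% threshold are hypothetical choices for illustration, not values from any particular screen.

```python
from statistics import mean


def percent_inhibition(signal, neg_ctrl_mean, pos_ctrl_mean):
    """Scale a raw signal so that the negative control (no inhibition) maps to 0%
    and the positive control (full inhibition) maps to 100%."""
    return 100.0 * (neg_ctrl_mean - signal) / (neg_ctrl_mean - pos_ctrl_mean)


def call_hits(raw_signals, neg_ctrl, pos_ctrl, threshold=50.0):
    """Return per-compound percent inhibition and the compounds at or above the threshold."""
    neg = mean(neg_ctrl)
    pos = mean(pos_ctrl)
    inhibition = {
        name: percent_inhibition(signal, neg, pos)
        for name, signal in raw_signals.items()
    }
    hits = sorted(name for name, value in inhibition.items() if value >= threshold)
    return inhibition, hits


# Hypothetical plate data (arbitrary signal units).
neg_ctrl = [1000.0, 1000.0]   # wells without inhibitor
pos_ctrl = [200.0, 200.0]     # wells with a known, fully inhibiting reference
raw_signals = {"cmpd_1": 900.0, "cmpd_2": 400.0, "cmpd_3": 600.0}

inhibition, hits = call_hits(raw_signals, neg_ctrl, pos_ctrl)
print(inhibition)
print(hits)
```

The same thresholding idea applies to virtual screening, where a predicted interaction score would take the place of the measured signal.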

Hit Validation

In the same way that target validation is essential to confirm target relevance to a particular disease condition and select the most promising biological targets for drug discovery, hit validation is critical to select the most promising hits. The main objective of hit validation is to sift through the apparent hits and distinguish real hits from false positives. This is necessary because certain compounds may appear promising in a specific assay due to peculiarities of the assay itself and not because they genuinely work. Regardless of the HTS approach used for hit discovery, apparent hits need to be validated experimentally, which requires working with actual physical samples. Typically, hit validation is done using an experimental assay different from the assay used in hit identification. These aren't just double-checks; they are like having multiple pairs of expert eyes scrutinizing each compound: different assay types are unlikely to yield the same false positives. This approach not only helps drug discovery scientists weed out false positives but also reduces false negatives, which occur when potentially useful compounds don't show up as hits because of peculiarities of the assay. Using a different, orthogonal assay increases the likelihood of these compounds showing up. Overall, hit validation can be regarded as a type of quality control, making sure that only the most promising compounds move forward.
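The logic of cross-checking a primary assay against an orthogonal one can be sketched with simple set operations. The function name, category labels, and compound identifiers below are hypothetical, and the sketch assumes the orthogonal assay was run on more compounds than just the primary hits, which is what allows possible false negatives to surface.

```python
def validate_hits(primary_hits, orthogonal_actives):
    """Compare hit lists from a primary assay and an orthogonal assay."""
    primary = set(primary_hits)
    orthogonal = set(orthogonal_actives)
    return {
        # Active in both assays: the most credible hits.
        "confirmed": sorted(primary & orthogonal),
        # Active only in the primary assay: possibly an assay artifact.
        "likely_false_positives": sorted(primary - orthogonal),
        # Active only in the orthogonal assay: possibly missed by the primary assay.
        "possible_false_negatives": sorted(orthogonal - primary),
    }


# Hypothetical compound identifiers.
result = validate_hits(
    primary_hits=["cmpd_2", "cmpd_3", "cmpd_7"],
    orthogonal_actives=["cmpd_2", "cmpd_7", "cmpd_9"],
)
print(result)
```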

From Initial Hits to Preclinical Candidates

Chemical compounds with activity at the target often need to undergo extensive modifications before their properties meet the stringent requirements for preclinical candidates. The initial hits go through several overlapping phases: hit optimization, hit-to-lead, and lead optimization. The different phases are generally concerned with optimizing different properties.

First, hit compounds undergo optimization cycles in which analogs are synthesized by making small structural changes. The new analogs are then tested for activity against the target. If the structural changes improve the compound’s activity, these changes are kept and additional changes are introduced before another cycle of activity evaluation. If the changes do not improve the compound, then the changes are discarded and new changes are introduced. During these optimizations, drug discovery scientists use structure-activity relationships to relate changes in chemical structure to changes in activity.
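The keep-or-discard cycle described above can be sketched as a simple greedy loop. Everything in the example is a stand-in: a compound is represented as a set of substituent labels, the proposed changes are a fixed list, and `toy_activity` is a made-up additive scoring function that takes the place of an actual synthesis and assay step.

```python
def optimize_hit(start, proposed_changes, measure_activity):
    """Apply proposed structural changes one at a time, keeping only those
    that improve the measured activity of the current best compound."""
    current = frozenset(start)
    best = measure_activity(current)
    history = []
    for change in proposed_changes:
        analog = current | {change}          # make a small structural change
        activity = measure_activity(analog)  # test the new analog
        kept = activity > best
        if kept:                             # improvement: keep the change
            current, best = analog, activity
        history.append((change, activity, kept))
    return current, best, history


# Hypothetical additive contributions of each substituent (arbitrary units).
CONTRIBUTIONS = {"F": 5, "OMe": -3, "Cl": 2}


def toy_activity(compound, base=50):
    """Made-up activity model standing in for an experimental assay."""
    return base + sum(CONTRIBUTIONS[s] for s in compound)


final, activity, history = optimize_hit(set(), ["F", "OMe", "Cl"], toy_activity)
print(sorted(final), activity)
print(history)
```

Real optimization is rarely this linear: changes interact with one another, and several properties are usually tracked at once rather than a single activity value.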

What is a structure-activity relationship (SAR)? SAR refers to the connection between the chemical structure of a compound and its biological activity. Scientists investigate these relationships to optimize and design new compounds by tweaking their structures to enhance desired effects or minimize undesired properties. Essentially, SAR serves as a crucial tool, guiding drug discovery scientists in crafting better drug candidates based on the interplay between molecular structure and biological function.

Identifying a potent compound with activity against a target of interest is the beginning of a series of optimizations to balance the different properties required for the hit to become a lead and eventually a successful drug candidate. These optimizations include improving a compound’s potency, selectivity, metabolic and chemical stability, bioavailability, permeability, and solubility, among others. These optimizations, together with structure-activity relationships, are embedded in screening cascades in which compounds need to satisfy particular requirements for a given property before passing to the following stage, where a different property is optimized. Screening cascades often use primary and secondary assays. Primary assays are often biochemical assays based on recombinant proteins, while secondary assays are most often cell-based and provide a more complete and physiologically relevant context for evaluating a compound’s properties.

What is drug potency? Drug potency refers to the amount of a drug needed to produce a specific biological response or desired effect. It is typically expressed as the concentration or dose required to reach a given level of effect. A more potent drug achieves its desired outcome at lower concentrations, which can also help minimize potential side effects. Assessing potency involves evaluating the drug's activity against a biological target in relevant assays. This information is pivotal in selecting lead compounds during drug development, helping scientists focus on candidates with the highest therapeutic potential while considering safety and efficacy.
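One common way to put numbers on potency, not spelled out in the text above, is a concentration-response model such as the Hill equation, in which potency is summarized by the concentration giving a half-maximal effect (often written EC50). The sketch below assumes that model; the two compounds and their EC50 values are hypothetical.

```python
def fractional_response(concentration, ec50, hill=1.0):
    """Fraction of the maximal effect at a given concentration (Hill equation).

    `ec50` is the concentration producing a half-maximal effect; `hill`
    controls the steepness of the curve. Both use the same units as
    `concentration`.
    """
    return concentration**hill / (ec50**hill + concentration**hill)


# Two hypothetical compounds compared at the same concentration (nM).
potent_ec50 = 10.0      # more potent: half-maximal effect at 10 nM
weak_ec50 = 1000.0      # less potent: half-maximal effect at 1000 nM
test_concentration = 100.0

potent_effect = fractional_response(test_concentration, potent_ec50)
weak_effect = fractional_response(test_concentration, weak_ec50)
print(round(potent_effect, 3), round(weak_effect, 3))
```

At the same concentration, the compound with the lower EC50 produces the larger fraction of the maximal effect, which is the sense in which it is the more potent of the two.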

What is a screening cascade? A screening cascade is a logical process designed to sift through numerous compounds systematically, aiming to identify the compounds with the highest chances of becoming drugs. These cascades incorporate "gates", which act as checkpoints representing specific assays or tests. The gates assess various aspects of a compound, such as its interaction with a target, safety, and efficacy. They serve as selective filters, allowing only compounds that meet predefined criteria to progress further in the drug development process. This method enhances the efficiency of drug discovery by prioritizing compounds with the highest potential for success, streamlining the selection process and optimizing resources.
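A screening cascade can be pictured as an ordered list of gates applied one after another. The sketch below is a minimal illustration under assumed conditions: the compound properties, gate names, and cutoff values are all hypothetical and would differ from program to program.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    biochemical_ic50_nm: float  # primary assay readout (lower is better)
    cell_ic50_nm: float         # secondary, cell-based readout (lower is better)
    solubility_um: float        # higher is better


def run_cascade(compounds, gates):
    """Pass compounds through the gates in order.

    Returns the survivors and, for each dropped compound, the gate it failed.
    """
    survivors = list(compounds)
    dropped = {}
    for gate_name, passes in gates:
        next_round = []
        for compound in survivors:
            if passes(compound):
                next_round.append(compound)
            else:
                dropped[compound.name] = gate_name
        survivors = next_round
    return survivors, dropped


# Hypothetical gates with illustrative cutoffs.
gates = [
    ("biochemical potency", lambda c: c.biochemical_ic50_nm <= 100),
    ("cellular potency", lambda c: c.cell_ic50_nm <= 1000),
    ("solubility", lambda c: c.solubility_um >= 10),
]

compounds = [
    Compound("cmpd_A", 50, 400, 80),
    Compound("cmpd_B", 5000, 9000, 120),
    Compound("cmpd_C", 20, 8000, 60),
    Compound("cmpd_D", 80, 500, 2),
]

survivors, dropped = run_cascade(compounds, gates)
print([c.name for c in survivors])
print(dropped)
```

Ordering the gates from cheap, high-capacity assays to more expensive ones means most compounds are eliminated before the costlier tests are run.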

Preclinical Testing

Preclinical testing refers to the last stages that occur before a potential drug candidate progresses to human clinical trials. These stages usually involve a thorough evaluation of the efficacy and safety of any compounds intended for human (clinical) trials. For a drug candidate to be advanced to clinical trials, an Investigational New Drug (IND) application has to be submitted to and reviewed by the relevant agency (the Food and Drug Administration (FDA) in the United States). In addition to showing adequate efficacy and safety profiles in appropriate in vivo models, the IND application also contains information about the production methods that will be used to manufacture a pharmaceutical-grade active pharmaceutical ingredient (API), that is, the drug, as well as detailed plans for the clinical trials.

Clinical Trials

A pivotal phase in drug development, clinical trials unfold in four progressive stages.

In phase I, the primary focus is establishing that the drug candidate is safe for use in humans. Typically involving 20-100 subjects, phase I often spans several months, with subjects carefully monitored for adverse effects. Further, phase I studies aim to determine the maximum tolerated dose (MTD), the optimal dose, and dose-limiting toxicities (DLTs), parameters crucial for subsequent studies. While primarily conducted on healthy volunteers, phase I studies can offer limited early indications of clinical efficacy through the use of biomarkers. If phase I studies demonstrate an adequate safety profile, the drug candidate can progress to phase II studies.

Moving to phase II, the spotlight shifts to assessing the clinical efficacy of compounds in patients with the target disease and their potential to improve patients’ disease condition. This phase, typically lasting several months to a few years, often involves 100-300 subjects. Because phase II studies also aim to determine the dose to use in subsequent phase III studies (should phase II be successful), patients are typically divided into multiple cohorts that receive different doses in order to define an efficacious dose. During phase II studies, safety continues to be monitored.

Phase III, the most resource-intensive and lengthy stage, typically spans years. With a larger population, often in the range of 300-3,000 subjects, the primary objective is to confirm safety and efficacy on a broader scale. Various study designs, such as superiority or equivalency trials, evaluate the drug against existing standards. This phase is critical as successful phase III studies are a prerequisite for regulatory approval.

Regulatory Approval. Regulatory bodies, such as the FDA in the United States and the European Medicines Agency (EMA) in Europe, play a central role in the drug development process. They meticulously assess all data from preclinical and clinical trials to determine whether a drug candidate is safe and effective for broad use.

After a successful phase III trial, a New Drug Application (NDA) can be prepared and submitted to the relevant regulatory agency (the FDA in the United States) for review and regulatory approval. Phase IV trials are designed to monitor the drug's safety and effectiveness in a larger and more heterogeneous population after market entry. Their primary goals include detecting rare or long-term adverse effects and comparing the drug with the standard of care. Pharmacoeconomic studies, which assess the cost-effectiveness and economic impact of medicines, are also part of phase IV studies, ensuring a comprehensive evaluation of the drug's post-marketing performance.

Machine Learning (ML) and Artificial Intelligence (AI) in Drug Discovery and Development

ML/AI hold the potential for a deep and broad impact across various phases of drug discovery and development. In the early stages, ML/AI can facilitate target identification by analyzing large datasets to help drug discovery scientists find potential drug targets more efficiently. In hit optimization, ML/AI can be deployed to optimize various parameters simultaneously instead of sequentially. In the preclinical phase, ML/AI can dramatically accelerate the analysis of vast datasets to predict drug interactions, toxicity, and efficacy, thus streamlining drug candidate selection. During clinical trials, ML/AI can enhance patient recruitment through predictive analytics, help design more effective clinical trials, and be deployed to monitor and manage trial data, improving overall efficiency. Finally, ML/AI can contribute to post-marketing surveillance by continuously assessing real-world data to identify potential long-term effects and inform regulatory decisions.

Overall, ML/AI can accelerate all stages of the drug discovery and development process, reduce costs, and increase the precision and success rates of discovering and developing new medicines. As ML/AI methods are increasingly used in drug discovery and development, it is essential that they are used responsibly. Use of ML/AI has to adhere to the highest standards of confidentiality and security when it comes to sensitive patient information. There should be a clear understanding of how the ML/AI algorithms work. Ethical principles need to be incorporated in ML/AI models to prevent biases and address societal concerns. Validation, verification, and continuous monitoring via human oversight are also essential to ensure that the ML/AI models used are reliable and accurate, and that potential biases are detected and eliminated prior to decision-making.

References and Further Reading

  1. Arrowsmith, J., & Miller, P. (2013). Phase II and Phase III attrition rates 2011–2012. Nature Reviews Drug Discovery.

  2. Ha, J., Park, H., Park, J., & Park, S. B. (2021). Recent advances in identifying protein targets in drug discovery. Cell Chemical Biology.

  3. Gashaw, I., Ellinghaus, P., Sommer, A., & Asadullah, K. (2012). What makes a good drug target? Drug Discovery Today.

  4. Hughes, J. P., Rees, S., Kalindjian, S. B., & Philpott, K. L. (2011). Principles of early drug discovery. British Journal of Pharmacology.

  5. Paul, D., Sanap, G., Shenoy, S., Kalyane, D., Kalia, K., & Tekade, R. K. (2020). Artificial intelligence in drug discovery and development. Drug Discovery Today.

  6. United States Food and Drug Administration. The drug development process.