Notes on the YouTube lecture: Computational Drug Discovery: Machine Learning for Making Sense of Big Data in Drug Discovery

- May 30, 2020

Link: https://youtu.be/uoVAd_zd-90

Drug

1. Biological entities: biologics

2. Chemical-based drugs: synthetic drugs, natural products, small molecules

Drug discovery – for one particular drug

- 10-15 years

- Failure rate -- >90%

- Cost ~2 billion USD

Drug discovery process (millions of compounds are narrowed down to one, and that one can still fail)

1. Identify the target; ~30,000 proteins (not including post-translational modification (PTM) processes)

2. Screen for hit compounds – molecules that disrupt the activity of a particular protein

3. Optimize the hit compounds – this is medicinal chemistry: scaffold hopping, bioisosteres, structure-activity relationships – to obtain a potent candidate compound

4. ADMET (absorption, distribution, metabolism, excretion, toxicity): balance between potency and toxicity

Computational drug discovery

- Green chemistry; a safe way to generate compounds: environmentally safe, with fewer process steps (meaning less chemical waste)

Where do we look for compounds?

- Natural resources

- Computational approach – training a computer to learn organic reactions; GDB-13 and GDB-17 databases

- Known compounds; PubChem, ChEMBL

Drug discovery toolbox

Combinatorial chemistry

- Scaffold

- Functional group

Chemical libraries

Chemical space – diversity of compounds, approximately 10^60 molecules with MW <500 Da

High-throughput screening (HTS)

Property filters

Computational chemistry

Machine learning

QSAR

Proteochemometrics

Molecular modeling

Molecular dynamics

Molecular docking
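To make the combinatorial chemistry item above concrete, here is a minimal sketch of how one scaffold combined with a few functional groups multiplies into a library. The scaffold template, the R1/R2 placeholders, and the functional group lists are made up for illustration and are not from the lecture; the code only builds SMILES-like strings and does not check chemical validity.

```python
from itertools import product

# Hypothetical scaffold with two attachment points, written as a
# SMILES-like template. R1 and R2 are placeholders, not real atoms.
scaffold = "c1ccc({R1})cc1{R2}"

# Made-up functional groups for each attachment point.
r1_groups = ["O", "N", "F", "Cl"]
r2_groups = ["C(=O)O", "C#N", "OC"]


def enumerate_library(scaffold, r1_groups, r2_groups):
    """Return every scaffold/functional-group combination as a string."""
    return [
        scaffold.format(R1=r1, R2=r2)
        for r1, r2 in product(r1_groups, r2_groups)
    ]


library = enumerate_library(scaffold, r1_groups, r2_groups)
print(len(library))  # 4 choices for R1 times 3 choices for R2
print(library[0])
```

With k attachment points and n groups per point the library grows as n^k, which is why chemical libraries become large so quickly.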

Computational model in drug discovery

- Linking the chemical library to bioactivity

- Training the computer to learn

- Using this approach:

o It serves as a guideline for generating compounds with good potency

- Chemists generate many compounds

o Which proteins could bind a particular compound

o Off-target effects

o Similar compounds have similar binding

o How the compound binds to the protein
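The idea that similar compounds tend to bind similarly is usually made quantitative with a similarity score between molecular fingerprints. The sketch below uses the Tanimoto (Jaccard) coefficient on sets of "on" bits. The fingerprints and compound names are invented for illustration; in practice they would come from a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


# Made-up fingerprints: each set holds the indices of the bits that are on.
query = {1, 4, 7, 9}
known_actives = {
    "cmpd_A": {1, 4, 7, 8},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 12},
}

# Rank the known compounds by similarity to the query, most similar first.
ranked = sorted(
    known_actives,
    key=lambda name: tanimoto(query, known_actives[name]),
    reverse=True,
)
for name in ranked:
    print(name, round(tanimoto(query, known_actives[name]), 2))
```

A compound that ranks close to a known active is a reasonable candidate for the same target, and one that is similar to a known off-target binder is a warning sign.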

Quantum chemistry

- Translates the electron distribution into quantitative descriptors

- Thus, ligand-based drug design is feasible

Fragment-based drug design

- 13 heavy atoms – gives a fragment

- Fragment + fragment – larger compounds – more diverse

Lipinski’s rule of 5

- MW ≤500 Da, ≤5 hydrogen bond donors, ≤10 hydrogen bond acceptors, partition coefficient (logP) ≤5 [PubMed:11259830]

- Collected 2000 FDA drugs

- Orally administered drugs

- Analyzed the data to find common properties (safer chemical profiles for human use)

- What about chemicals that pass the rule but are still toxic?
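The rule of 5 is simple enough to write as a filter. The sketch below assumes the descriptors have already been computed elsewhere (for example by a cheminformatics toolkit) and uses the commonly cited "no more than" form of the thresholds. The Descriptors class and the function name are illustrative, the aspirin values are approximate, and the second compound is made up.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties of one compound (illustrative container)."""
    mol_weight: float      # molecular weight in Da
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float           # octanol-water partition coefficient


def lipinski_violations(d):
    """Return the list of rule-of-5 criteria that the compound breaks."""
    checks = {
        "MW <= 500 Da": d.mol_weight <= 500,
        "H-bond donors <= 5": d.h_bond_donors <= 5,
        "H-bond acceptors <= 10": d.h_bond_acceptors <= 10,
        "logP <= 5": d.log_p <= 5,
    }
    return [rule for rule, passed in checks.items() if not passed]


# Approximate values for aspirin, plus a made-up large lipophilic compound.
aspirin = Descriptors(mol_weight=180.16, h_bond_donors=1,
                      h_bond_acceptors=4, log_p=1.2)
big_greasy = Descriptors(mol_weight=720.0, h_bond_donors=6,
                         h_bond_acceptors=12, log_p=6.1)

print(lipinski_violations(aspirin))     # no criteria broken
print(lipinski_violations(big_greasy))  # all four criteria broken
```

Passing this filter says nothing about toxicity, which is exactly the open question in the last bullet: the rule describes oral drug-likeness, not safety.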

Lead-like rule of 3

- Compound should be <300 Da

- A lead should be as small as possible because more atoms will be added during the modification step

Biological space

- List of proteins which are druggable

- Small molecules which are active on specific targets

- Amino acid (AA) sequences are not random

Structural classification of natural products

- Arranges the scaffolds of the natural products in a tree-like fashion

- Provides a viable analysis- and hypothesis-generating tool for the design of natural product-derived compound collections

Chemical space > Biological space

- Privileged substructures – substructures that are present in many drugs and are predisposed to bioactivity.

Polypharmacology

- One drug --> multiple targets

- Basic idea --> use one fragment for target 1 and another fragment for target 2 --> link the two fragments --> make sure the result still fits the rules above (rule of 3 and rule of 5) --> check whether it is feasible to synthesize

QSAR

- Seeking the relationship between structure and activity

- Chemical structure (functional group) – Activity (biological property, IC50, EC50, Ki, Km, MIC)

- Multiple linear regression – Y (biological activity) = F(energy, Qm, dipole moment….)

- Regression coefficient – indicates whether a particular feature (factor, e.g., energy or dipole moment) has more or less effect on the bioactivity, i.e., how strongly the activity depends on each factor

- Application --> we can predict biological properties
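A multiple linear regression QSAR model can be sketched in a few lines. The example below fits activity = b0 + b1*x1 + b2*x2 by least squares through the normal equations, using only the standard library. The two descriptors stand in for quantities such as energy and dipole moment, and all numbers are synthetic (generated from known coefficients so the fit can be checked). In practice a library such as scikit-learn or NumPy would do the fitting.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(descriptors, activity):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    X = [[1.0] + list(row) for row in descriptors]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, activity)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Predicted activity for one compound's descriptor row."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


# Synthetic training set: activity = 2.0 + 0.5*x1 - 1.0*x2 (no noise).
descriptors = [(1.0, 0.5), (2.0, 1.5), (3.0, 0.2), (4.0, 2.2), (5.0, 1.0)]
activity = [2.0, 1.5, 3.3, 1.8, 3.5]

coefs = fit_mlr(descriptors, activity)
print([round(c, 3) for c in coefs])          # intercept, then one weight per descriptor
print(round(predict(coefs, (2.5, 1.0)), 3))  # prediction for an untested compound
```

The sign and size of each fitted coefficient is what the regression-coefficient bullet refers to: here the first descriptor raises the predicted activity and the second lowers it. Coefficients are only comparable in size when the descriptors are on similar scales.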

QSAR vs Proteochemometrics

- QSAR

o Multiple chemical compounds --> a single target protein

o Each prediction has a confidence score: the tested compound is compared with the training set. Higher similarity --> more confidence that the prediction is correct; lower similarity --> less confidence

- Proteochemometrics

o Multiple compounds --> multiple target proteins --> we can do drug repurposing

o Like a meta-analysis: observing the relationships between many factors and many results

o We can study the selectivity of a particular compound across many proteins

o With this approach we can find orphan receptors
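The difference between the two approaches shows up in how the training table is built. In the sketch below, a QSAR table holds compound descriptors for one target, while a proteochemometric table holds one row per (compound, protein) pair with the compound and protein descriptors concatenated. All names, descriptor values, and activities are made up for illustration.

```python
# Made-up descriptor vectors for compounds and proteins.
compounds = {"cmpd_1": [0.2, 1.1], "cmpd_2": [0.9, 0.3]}
proteins = {"prot_A": [1.0, 0.0, 0.5], "prot_B": [0.0, 1.0, 0.7]}

# Made-up measured activities for (compound, protein) pairs.
measurements = {
    ("cmpd_1", "prot_A"): 6.5,
    ("cmpd_1", "prot_B"): 4.8,
    ("cmpd_2", "prot_A"): 5.1,
}


def qsar_rows(target):
    """QSAR: one model per target, features are compound descriptors only."""
    return [
        (compounds[c], y)
        for (c, p), y in measurements.items()
        if p == target
    ]


def pcm_features(compound, protein):
    """PCM: compound and protein descriptors joined into one feature vector."""
    return compounds[compound] + proteins[protein]


def pcm_rows():
    """PCM: one model over all measured compound-protein pairs."""
    return [(pcm_features(c, p), y) for (c, p), y in measurements.items()]


print(len(qsar_rows("prot_A")))  # rows available for a single-target model
print(len(pcm_rows()))           # rows available for the joint model

# An unmeasured pair still has a feature vector, so a trained PCM model
# could score it. That is the basis for repurposing and selectivity studies.
print(pcm_features("cmpd_2", "prot_B"))
```

Because the protein enters as features, a single PCM model can compare one compound across many proteins, which a per-target QSAR model cannot do.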
