Notes on the YouTube lecture: Computational Drug Discovery: Machine Learning for Making Sense of Big Data in Drug Discovery
Link: https://youtu.be/uoVAd_zd-90
Drug
1. Biological entity: biologic
2. Chemical-based drugs: synthetic drugs, natural products, small molecules
Drug discovery for one particular drug
- 10-15 years
- Failure rate: >90%
- Cost: ~2 billion USD
Drug discovery process (millions of compounds are narrowed down to one, and it can still fail!)
1. Identify the target: ~30,000 proteins (not including post-translational modifications, PTMs)
2. Screen for hit compounds: molecules that disrupt the activity of a particular protein
3. Optimize the hit compounds (medicinal chemistry): scaffold hopping, bioisosteres, structure-activity relationships, to obtain a potent compound
4. ADMET (balance between potency and toxicity)
Computational drug discovery
- Green chemistry: a safe way to generate compounds; environmentally safe, with fewer process steps (meaning less chemical waste)
How do we search for compounds?
- Natural resources
- Computational approach: training a computer to learn organic reactions; GDB-13 and GDB-17 databases
- Known compounds: PubChem, ChEMBL
Drug discovery toolbox
Combinatorial chemistry
- Scaffold
- Functional group
Chemical libraries
Chemical space: the diversity of possible compounds, approximately 10^60 molecules under 500 Da
HTS (high-throughput screening)
Property filters
Computational chemistry
Machine learning
QSAR
Proteochemometrics
Molecular modeling
Molecular dynamics
Molecular docking
Computational model in drug discovery
- Links the chemical library to bioactivity
- Trains the computer to learn
- By using this approach:
  o It serves as a guideline for generating compounds with good potency
- Chemists generate many compounds
  o Which proteins could bind to a particular compound?
  o Off-targets
  o Similar compounds have similar binding
  o How does the compound bind to the protein?
Quantum chemistry
- Translates the electron distribution into quantitative terms
- Thus, ligand-based drug design is feasible
Fragment-based drug design
- 13 heavy atoms: gives a fragment
- Fragment + fragment: larger compounds, more diverse
- Rule of five: MW ≤500 Da, ≤5 hydrogen bond donors, ≤10 hydrogen bond acceptors, partition coefficient (logP) ≤5 [PubMed:11259830]
- Collect 2000 FDA drugs
- Orally administered drugs
- Analyze the data and come up with common properties (safer chemical profiles for human use)
- What about chemicals that pass the rule but are still toxic?
- Rule of three: the compound should be <300 Da
- The lead should be as small as possible, because more atoms will be added during the modification steps
- List of proteins which are druggable
- Small molecules which are active on specific targets
- Amino acid (AA) sequence is not random
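The rule-of-five and rule-of-three filters noted above can be sketched as a small property check. This is only an illustration: the `Descriptors` class, the function names, and the three example compounds with their descriptor values are hypothetical placeholders, not data from the lecture. The notes only give the <300 Da limit for the rule of three; the donor, acceptor, and logP limits of 3 used here are the commonly cited form of that rule and are an assumption relative to the notes.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed properties of one compound (illustrative values only)."""
    mol_weight: float   # molecular weight in Da
    h_donors: int       # number of hydrogen bond donors
    h_acceptors: int    # number of hydrogen bond acceptors
    log_p: float        # partition coefficient (logP)


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.h_donors > 5,
        d.h_acceptors > 10,
        d.log_p > 5,
    ]
    return sum(checks)


def passes_rule_of_three(d: Descriptors) -> bool:
    """Fragment-like filter: <300 Da plus limits of 3 (assumed, see lead-in)."""
    return (
        d.mol_weight < 300
        and d.h_donors <= 3
        and d.h_acceptors <= 3
        and d.log_p <= 3
    )


# Hypothetical compounds; the numbers are placeholders, not measured data.
fragment = Descriptors(mol_weight=150.0, h_donors=1, h_acceptors=2, log_p=1.0)
lead = Descriptors(mol_weight=420.0, h_donors=2, h_acceptors=6, log_p=3.5)
bulky = Descriptors(mol_weight=650.0, h_donors=6, h_acceptors=12, log_p=6.2)

for name, d in [("fragment", fragment), ("lead", lead), ("bulky", bulky)]:
    print(name, rule_of_five_violations(d), passes_rule_of_three(d))
```

Returning a violation count rather than a yes/no answer leaves it to the user how strict to be. As the notes point out, passing the filter does not rule out toxicity.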
Structural classification of natural products
- Arranges the scaffolds of natural products in a tree-like fashion
- Provides a viable analysis and hypothesis-generating tool for the design of natural-product-derived compound collections
Chemical space > Biological space
- Privileged substructures: substructures that are present in many drugs and predisposed to bioactivity.
Polypharmacology
- One drug --> multiple targets
- Basic idea --> use one fragment for target 1 and another fragment for target 2 --> link the 2 fragments --> make sure the result fits the rules suggested earlier (rule of 3 and rule of 5) --> check whether it is feasible to synthesize
QSAR
- Seeks the relationship between structure and activity
- Chemical structure (functional group) vs. activity (biological property: IC50, EC50, Ki, Km, MIC)
- Multiple linear regression: Y (biological activity) = F(energy, Qm, dipole moment, ...)
- Regression coefficient: indicates whether a particular feature (e.g., energy, dipole moment) has more or less effect on the bioactivity (the dependent variable)
- Application --> we can predict biological properties
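The multiple linear regression idea above can be sketched in plain Python. This is a minimal illustration, not the lecturer's workflow: the two descriptors (labelled "energy" and "dipole"), the coefficients in `TRUE_COEFFS`, and the data are all synthetic placeholders, generated so that the fit has a known answer. The helper names `solve_linear_system`, `fit_mlr`, and `predict` are likewise hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_mlr(rows, targets):
    """Least-squares fit via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coef_1, coef_2, ...].
    """
    design = [[1.0, *row] for row in rows]  # prepend the intercept column
    n = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(n)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Predicted activity for one compound's descriptor values."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


# Synthetic data: activity = 2.0 + 0.5 * energy - 1.5 * dipole (made up).
TRUE_COEFFS = [2.0, 0.5, -1.5]
descriptors = [(1.0, 0.2), (2.0, 0.5), (3.0, 0.1), (4.0, 0.9), (5.0, 0.4)]
activity = [predict(TRUE_COEFFS, row) for row in descriptors]

coeffs = fit_mlr(descriptors, activity)
print("fitted coefficients:", [round(c, 6) for c in coeffs])
print("predicted activity for a new compound:", round(predict(coeffs, (2.5, 0.3)), 6))
```

The sign and size of each fitted coefficient is what the notes refer to: here the "dipole" term lowers the predicted activity while the "energy" term raises it. Coefficients are only directly comparable when the descriptors are on similar scales, and real QSAR data are noisy, so the fit would not be exact.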
QSAR vs Proteochemometrics
- QSAR
  o Multiple chemical compounds --> single target protein
  o The prediction has a confidence score: the tested compound is compared with the training set. Higher similarity --> more confidence that the prediction is correct; less similarity with the training set --> less confidence
- Proteochemometrics
  o Multiple compounds --> multiple target proteins --> we can do drug repurposing
  o Just like doing a meta-analysis: observing the relationship between many factors and many results
  o We can study the selectivity of a particular compound across many proteins
  o This approach can also be applied to orphan receptors
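The confidence idea in the QSAR bullet above (a prediction is more trustworthy when the query compound resembles the training set) can be sketched with a similarity score. The notes do not say which similarity measure the lecture used. Tanimoto similarity on fingerprint bit sets is one common choice and is an assumption here, as are the function names and the toy fingerprints.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto similarity: shared on-bits divided by all on-bits."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(union)


def nearest_training_similarity(query: set, training: list) -> float:
    """Similarity of the query to its closest training compound.

    Used as a rough confidence proxy: the higher the value, the more
    the query looks like something the model has already seen.
    """
    return max(tanimoto(query, fp) for fp in training)


# Toy fingerprints as sets of on-bit indices (hypothetical, not real data).
training_set = [{1, 2, 3, 4}, {2, 3, 5}]
query_close = {1, 2, 3}   # shares most bits with the first training compound
query_far = {7, 8, 9}     # shares no bits with any training compound

print(nearest_training_similarity(query_close, training_set))
print(nearest_training_similarity(query_far, training_set))
```

A threshold on this value (chosen by the user) could then decide whether a prediction is reported as reliable or flagged as outside the model's applicability domain.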