Ligand-Protein Docking
Introduction
Ligand-protein docking methods attempt to identify optimal positions, orientations and conformations of a ligand or small molecule with respect to a given protein receptor or enzyme. InhibOx offers extensive expertise and a range of solutions in ligand-protein docking.
AutoDock
AutoDock, one of the most highly cited ligand-protein docking programs in the literature, treats the ligand and, optionally, sidechains in the protein as flexible. This permits the parts of these molecules with rotatable bonds to change conformation during docking. Because the number of possible conformations grows combinatorially with the number of rotatable bonds, flexible docking is a computationally hard search problem (it is commonly regarded as NP-hard). In practice, ligands with up to about 8 rotatable bonds can reasonably be expected to dock successfully using the Lamarckian Genetic Algorithm. This limit covers a large proportion of known drugs and most lead-like small molecules.
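To give a feel for how quickly conformational flexibility grows with the number of rotatable bonds, the short sketch below counts torsion combinations when every bond is sampled independently. The choice of 6 settings per bond (60° steps) is an assumption made for illustration only; it is not an AutoDock parameter, and AutoDock itself does not enumerate conformations in this way.

```python
def conformer_count(rotatable_bonds: int, settings_per_bond: int = 6) -> int:
    """Number of torsion combinations if each bond is sampled independently.

    settings_per_bond is an illustrative assumption (6 means 60-degree steps).
    """
    return settings_per_bond ** rotatable_bonds


if __name__ == "__main__":
    for n in (2, 4, 8, 12):
        print(f"{n:2d} rotatable bonds: {conformer_count(n):,} combinations")
```

Under this assumption, a ligand with 8 rotatable bonds already has more than a million torsion combinations before position and orientation are considered.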
AutoDock can be used to identify the binding sites of ligands without any a priori knowledge of the active site: there are several examples in the literature of its use for "blind docking".
The great advantage of ligand-protein docking methods is that they can propose structural hypotheses for how a given small molecule may interact with its target macromolecule, something that ligand-based virtual screening methods cannot do. Indeed, a recent perspective on molecular shape by Nicholls et al. points out that while the shape of an active ligand can be extremely valuable for the discovery of novel inhibitors, when the target protein is flexible, the shapes adopted by active ligands may be sufficiently different that a match could never be found using shape-based ligand-based VS.
The challenge of 'induced fit' can be overcome using multiple conformations of the target protein in the ligand dockings; the "Relaxed Complex Scheme" is one such approach, in which carefully selected, diverse, representative snapshots from Molecular Dynamics simulations of the apo protein are used with a ligand-protein docking program such as AutoDock to dock the ligands to the ensemble of conformations of the target protein. This approach increases the chances of identifying hits in virtual screening, avoiding false negatives that would otherwise arise with an incompatible protein conformation.
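The bookkeeping behind docking against an ensemble of protein conformations can be sketched as follows. This is a minimal illustration, not the Relaxed Complex Scheme itself: it assumes the dockings have already been run, that each one produced a single energy per ligand and snapshot, and that lower (more negative) energies are better. The function name, ligand names, snapshot names and energies are all made up for the example.

```python
from typing import Dict, Tuple


def best_snapshot_per_ligand(
    scores: Dict[str, Dict[str, float]]
) -> Dict[str, Tuple[str, float]]:
    """For each ligand, keep the snapshot that gave the lowest docked energy."""
    best = {}
    for ligand, per_snapshot in scores.items():
        snapshot, energy = min(per_snapshot.items(), key=lambda item: item[1])
        best[ligand] = (snapshot, energy)
    return best


# Made-up energies (kcal/mol), for illustration only.
example_scores = {
    "ligand_A": {"snap_01": -6.1, "snap_02": -6.4, "snap_03": -5.9},
    "ligand_B": {"snap_01": -3.2, "snap_02": -3.0, "snap_03": -9.1},
}

ranked = best_snapshot_per_ligand(example_scores)
for ligand, (snapshot, energy) in ranked.items():
    print(ligand, snapshot, energy)
```

In this invented data set, ligand_B scores poorly against two of the three snapshots and would be a false negative if only one of those conformations had been used.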
InhibOx can apply its technologies to model this protein flexibility and perform virtual screening using DOx and AutoDock to identify novel inhibitors, and potentially to discover novel binding pockets and even allosteric pockets, as has recently been reported by the AutoDock-based World Community Grid project, FightAIDS@Home.
DOx

DOx is InhibOx's in-house ligand docking and scoring software. In common with most docking programs, it is composed of two main components:
- search: explores position and orientation of the ligand with respect to the protein.
- scoring: evaluates each generated molecular configuration.
The Search Module
There are two main types of search modules for positioning ligands with respect to a protein target: exhaustive and stochastic.
Exhaustive or systematic search methods move and rotate the ligand into every possible position and orientation within the search space at a given “granularity” of search. The success of such programs is often limited by efficiency considerations, because of the complexity and scale associated with large proteins and receptors. Virtual high-throughput screening can therefore be hampered by long processing times. Such methods also encounter problems when presented with flexible molecules, because the size of the search space increases exponentially with the number of rotatable bonds.
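The cost of a systematic search can be estimated by counting the rigid-body poses it must visit. The sketch below assumes a rectangular search box sampled on a regular translational grid and a fixed number of orientations tried at every grid point; the box size, step and orientation count are illustrative values, not settings from any particular docking program.

```python
import itertools
import math


def grid_axis(length: float, step: float) -> list:
    """Grid coordinates along one box edge, from 0 to length inclusive."""
    n_points = int(math.floor(length / step)) + 1
    return [i * step for i in range(n_points)]


def translations(box, step):
    """Iterate over every (x, y, z) grid position inside the box."""
    return itertools.product(*(grid_axis(length, step) for length in box))


def pose_count(box, step, n_orientations: int) -> int:
    """Total rigid-body poses: grid positions times orientations per position."""
    n_positions = sum(1 for _ in translations(box, step))
    return n_positions * n_orientations


if __name__ == "__main__":
    box = (10.0, 10.0, 10.0)  # assumed search box edges in angstroms
    for step in (1.0, 0.5):
        print(step, pose_count(box, step, n_orientations=100))
```

Halving the grid step in this example multiplies the number of poses roughly sevenfold, which is why the granularity of an exhaustive search has to be chosen carefully.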
Stochastic methods instead sample the search space using random moves rather than visiting every point. A variety of stochastic and systematic search techniques are used by docking programs. For example, AutoDock and GOLD use variations [1, 2] of the Genetic Algorithm (GA) [3] method for this purpose. DOCK uses an incremental construction and random conformation search-based method [4] to search for optimal poses. FlexX [5, 6] incrementally constructs the molecule to sample the conformation space and iteratively places it within the active site. Earlier versions of AutoDock used the simulated annealing [7] method to perform the search. All such methods attempt to achieve the correct balance between the efficiency and accuracy of the search.
In principle, a docking search is well suited to implementation within a GA, as the search space can be represented using 6 or more real numbers which describe the position, orientation and torsion angles of the ligand with respect to the protein. A GA should then be able to identify the optimal configuration for a given optimisation problem. In practice, however, this is limited by the design of the scoring function and by the parameters used when running the GA. Efficiency concerns normally do not allow for long execution times, and small population sizes and limited execution times can force the algorithm to settle within local minima. Modern GAs usually operate by performing large variations (and therefore large configuration changes) at the early stages of execution, and then performing the final optimisations by incrementally changing the mutation rate, crossover rate and level of elitism. The use of an optimal GA configuration for the problem space is very important and was a main consideration in the design of the DOx search component.
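The ideas in this paragraph can be illustrated with a deliberately small real-valued GA. This is a generic sketch, not the DOx or AutoDock implementation: the chromosome holds 7 numbers (assumed here to be 3 for position, 3 for orientation and 1 torsion), `toy_score` is a hypothetical stand-in for a docking scoring function, and only the mutation step size is reduced over time, whereas a production GA might also adapt the crossover rate and the level of elitism.

```python
import random


def toy_score(genes):
    """Hypothetical stand-in for a docking score: lower is better, minimum at 0."""
    return sum(g * g for g in genes)


def run_ga(n_genes=7, pop_size=30, generations=60, seed=0):
    """Minimise toy_score with elitism, one-point crossover and shrinking mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)] for _ in range(pop_size)]
    start_best = min(toy_score(ind) for ind in pop)

    for gen in range(generations):
        pop.sort(key=toy_score)
        elite = pop[: pop_size // 5]  # elitism: the best 20% survive unchanged
        # Large moves early, small moves late (assumed linear schedule).
        sigma = 2.0 * (1.0 - gen / generations) + 0.01
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Each gene mutates with probability 0.3 (an arbitrary choice).
            child = [
                g + rng.gauss(0.0, sigma) if rng.random() < 0.3 else g
                for g in child
            ]
            children.append(child)
        pop = elite + children

    best = min(pop, key=toy_score)
    return start_best, toy_score(best), best


if __name__ == "__main__":
    start, final, best = run_ga()
    print(f"best score: {start:.3f} at start, {final:.3f} at end")
```

Because the elite individuals are carried over unchanged, the best score can never get worse from one generation to the next; whether it reaches the global optimum still depends on the population size, the number of generations and the mutation schedule, as noted above.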
DOx uses a GA-based search method with a gradient-based optimisation module. AutoDock uses a similar hybrid GA, incorporating Lamarckian rules into the operation of the algorithm. DOx also uses a novel chromosome design which fragments the translational and rotational coordinates of a new configuration into several values of varying magnitude, allowing the GA to perform its search using different step sizes.
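One way to picture a chromosome that splits each coordinate into values of varying magnitude is sketched below. The actual DOx encoding is not described here, so everything in this example is an assumption: each coordinate is taken to be the sum of three genes weighted by the scales 1.0, 0.1 and 0.01, and the names `SCALES` and `decode` are hypothetical.

```python
SCALES = (1.0, 0.1, 0.01)  # assumed magnitudes, from coarse to fine


def decode(chromosome, scales=SCALES):
    """Turn a flat list of genes into coordinates.

    Consecutive groups of len(scales) genes describe one coordinate; each gene
    contributes at its own magnitude, so mutating a coarse gene makes a large
    move and mutating a fine gene makes a small one.
    """
    k = len(scales)
    if len(chromosome) % k != 0:
        raise ValueError("chromosome length must be a multiple of len(scales)")
    return [
        sum(gene * scale for gene, scale in zip(chromosome[i:i + k], scales))
        for i in range(0, len(chromosome), k)
    ]


if __name__ == "__main__":
    # Two coordinates, three genes each (illustrative values).
    print(decode([3, 2, 5, -1, 0, 4]))
```

With this kind of encoding, a single mutation operator applied uniformly to all genes produces steps of about 1, 0.1 or 0.01 units depending on which gene it hits, which is one possible way for a GA to mix coarse and fine moves.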
The Scoring Module
The scoring module is used to evaluate the favourability of a generated molecular configuration. A variety of scoring functions have been developed over the past decade. Several recent studies have also evaluated many collections of these scoring functions for accuracy. These studies [8, 9] have indicated the effectiveness of the XScore, DrugScore, PLP and G-Score scoring functions. The PLP and XScore functions have been implemented in DOx.
However, many of the available scoring functions have been developed, tested and evaluated against distinct classes of proteins and may therefore return different results for generalised cases. The best scoring function to use for a particular class of target protein can be difficult to predict. Therefore, a more recent approach to the construction of the scoring module involves the use of consensus scoring. Consensus scoring simply involves the use of two or more scoring functions for the prediction of the binding affinity. The final score can be constructed in many ways. The simplest involves normalising two scores (e.g. PLP and XScore) and taking the larger or the smaller of the two. A more refined approach involves scoring the ligand with a collection of scoring functions and constructing the final score as a weighted sum of the individual scores. The value of each weight depends on the class of protein being used.
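Both consensus schemes described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the DOx scoring module: scores are min-max normalised across the set of ligands being compared, each scoring function's direction (whether higher or lower is better) is passed in explicitly, and the example scores and equal weights are invented.

```python
def normalise(values, higher_is_better):
    """Min-max scale scores to [0, 1] so that 1 is always the best ligand."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # no spread: every ligand ties
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def consensus_extreme(scores_a, scores_b, pick=max):
    """Simplest consensus: per ligand, take the larger (or smaller) of two scores."""
    return [pick(a, b) for a, b in zip(scores_a, scores_b)]


def consensus_weighted(score_columns, weights):
    """Weighted sum of several normalised scores per ligand."""
    n_ligands = len(score_columns[0])
    return [
        sum(w * column[i] for w, column in zip(weights, score_columns))
        for i in range(n_ligands)
    ]


# Invented scores for three ligands; the directions are assumptions of the example.
energy_like = normalise([-80.0, -100.0, -60.0], higher_is_better=False)
affinity_like = normalise([5.0, 7.0, 6.0], higher_is_better=True)

optimistic = consensus_extreme(energy_like, affinity_like, pick=max)
pessimistic = consensus_extreme(energy_like, affinity_like, pick=min)
weighted = consensus_weighted([energy_like, affinity_like], weights=[0.5, 0.5])
```

In practice, the weights would be fitted separately for each class of target protein, as described above; the equal weights used here are only placeholders.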
DOxL
DOxL is a simpler and faster version of DOx which is used to dock a given molecule within a pre-defined region of the docking site. DOxL differs from DOx in that it only attempts to optimise the position and orientation of a given molecule within a small box (typically with a volume of around 8 Å³) containing its centroid. DOxL therefore does not attempt a full docking, but optimises the positioning of a molecule against a selected scoring function. The current version of DOxL can use either the PLP or XScore scoring functions.
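The idea of a search confined to a small box can be sketched as a simple constrained hill climb. This is not the DOxL algorithm: it moves only the centroid position (orientation is ignored), uses random trial moves rather than any particular optimiser, and `toy_score` is a hypothetical stand-in for a scoring function such as PLP or XScore. A cube with a volume of 8 Å³ has edges of 2 Å, so the centroid is allowed to move at most 1 Å from its starting value along each axis.

```python
import random


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def toy_score(position, target=(3.0, 0.0, 0.0)):
    """Hypothetical score: squared distance to a target point (lower is better)."""
    return sum((p - t) ** 2 for p, t in zip(position, target))


def local_optimise(start, score, half_side=1.0, steps=200, step_size=0.2, seed=0):
    """Hill-climb the centroid while keeping it inside a cube centred on start."""
    rng = random.Random(seed)
    best, best_score = list(start), score(start)
    for _ in range(steps):
        trial = [
            clamp(b + rng.uniform(-step_size, step_size), s - half_side, s + half_side)
            for b, s in zip(best, start)
        ]
        trial_score = score(trial)
        if trial_score < best_score:  # accept only improving moves
            best, best_score = trial, trial_score
    return best, best_score


if __name__ == "__main__":
    start = (0.0, 0.0, 0.0)
    position, value = local_optimise(start, toy_score)
    print(position, value)
```

In this toy example the optimum lies outside the box, so the search improves the score but stops at the box boundary, which mirrors the point that a box-restricted optimisation is a refinement of an existing placement rather than a full docking.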
References
[1] Garrett M. Morris, David S. Goodsell, Robert S. Halliday, Ruth Huey, William E. Hart, Richard K. Belew, and Arthur J. Olson. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. Journal of Computational Chemistry, 19(14):1639–1662, 1998.
[2] G. Jones, P. Willett, R. C. Glen, A. R. Leach, and R. Taylor. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol., 267:727–748, 1997.
[3] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing Co., Inc., 1989.
[4] Todd J. Ewing, Shingo Makino, Geoffrey A. Skillman, and Irwin D. Kuntz. DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. Journal of Computer-Aided Molecular Design, 15(5):411–428, May 2001.
[5] B. Kramer, M. Rarey, and T. Lengauer. Evaluation of the FlexX incremental construction algorithm for protein-ligand docking. Proteins, 37(2):228–241, November 1999.
[6] I. Schellhammer and M. Rarey. FlexX-Scan: Fast, structure-based virtual screening. Proteins, 57(3):504–517, November 2004.
[7] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
[8] R. Wang, Y. Lu, X. Fang, and S. Wang. An extensive test of 14 scoring functions using the PDBbind refined set of 800 protein-ligand complexes. J. Chem. Inf. Comput. Sci., 44(6):2114–2125, 2004.
[9] Renxiao Wang, Yipin Lu, and Shaomeng Wang. Comparative evaluation of 11 scoring functions for molecular docking. J. Med. Chem., 46(12):2287–2303, 2003.
