
Synthetically Accessible Virtual Inventory (SAVI)

Marc Nicklaus

2 Collaborator(s)

Funding source

National Cancer Institute (NIH)
The SAVI Project is based on:
(a) a set of transforms with rich chemical context annotation, including functional group reactivity data (LHASA, LLC, U.S.; and Lhasa Limited, UK);
(b) a set of highly annotated building blocks (Sigma-Aldrich, Global Strategic Services);
(c) the chemoinformatics toolkit CACTVS with custom development (Xemistry GmbH, Germany).

The transforms are a set of more than 1,500 rules written in the CHMTRN/PATRAN language for encoding chemical transformations, with chemical context and quality criteria added, based ultimately on the pioneering work of E. J. Corey. In contrast to simple SMIRKS transforms, these rules provide:
- Computation of whether a reaction will work at all, given the overall structural features of the target.
- Scoring: if the reaction works, how robust it is, taking into account overall structural features.
- Detection of whether protection of interfering groups is required; this can then be integrated into the final starting material queries to prioritize pre-protected starting materials.
- Proposal of suitable context-dependent reaction conditions.
- Textual warnings in specific circumstances, such as potential multiple products, borderline conditions, etc.

Ancillary to the rules is a set of functional group reactivity data, i.e. a table describing whether any of the standard functional groups in the rule set is unstable under any of the standard conditions.

The building blocks are a set of several hundred thousand compounds available in gram quantities, and with high reliability, from or through Sigma-Aldrich. This set has been annotated with pricing information and other business-intelligence data useful for this project.

The chemoinformatics toolkit CACTVS has been expanded in various ways, e.g. with the capability to read the CHMTRN/PATRAN transforms.
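As a rough illustration of the functional group reactivity table described above, the following Python sketch shows how such a table could be consulted to decide which groups in a building block would need protection under a given condition. The table entries, the condition names, and the function `groups_needing_protection` are all hypothetical and chosen only for illustration; they are not taken from the LHASA data or from CACTVS.

```python
# Hypothetical reactivity table: each (functional group, condition) pair listed
# here is treated as unstable. Entries are illustrative placeholders only and
# are not taken from the actual LHASA reactivity data.
UNSTABLE_PAIRS = {
    ("aldehyde", "strong_base"),
    ("primary_amine", "acylating_agent"),
    ("alcohol", "acylating_agent"),
}


def groups_needing_protection(groups, condition):
    """Return, sorted, the functional groups flagged unstable under `condition`."""
    return sorted(g for g in set(groups) if (g, condition) in UNSTABLE_PAIRS)


# Example: a building block carrying three functional groups.
building_block_groups = ["alcohol", "ester", "primary_amine"]
print(groups_needing_protection(building_block_groups, "acylating_agent"))
```

In a real rule system the lookup result would feed into the starting material queries, for example to prefer building blocks in which the flagged groups are already protected.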
An important feature that needed to be implemented in CACTVS was handling the reversal of the original LHASA transform direction, without rewriting rules, for the strictly forward-synthetic SAVI project. Another important capability was the handling of initial and final starting material (SM) queries, i.e. the four steps:
- initial SM query extraction from the 2D patterns in the rules;
- forward reaction from the 2D patterns;
- scoring (the only original LHASA functionality);
- final SM query expansion (R-groups, protecting groups, etc.).

To filter out structures with less-than-desirable attributes in the drug development context, several additional computed properties regarded as important in current drug design have been implemented, such as the demerit scores based on 275 rules for identifying potentially reactive or promiscuous compounds, published by Bruns and Watson (J. Med. Chem. 2012, 55, 9763–9772; dx.doi.org/10.1021/jm301008n).

In the current, very early alpha stage of this project, only 11 of the more than 1,500 available transforms were used, applied to approx. 230,000 building blocks in one-step reactions only. The 610,000 resulting products have been annotated but not yet filtered by any of the computed or associated molecular properties. To limit the file size, only on the order of one percent of the theoretically possible one-step products have been sampled. A current task in the SAVI project is the generation of schematic graphical representations of the transforms.

We are ultimately aiming to create a database of one billion high-quality screening samples that should be easily and cheaply synthesizable. These novel molecules will all be annotated with a proposed simple and high-yield synthetic route, and will have been filtered by all the molecular properties generally recognized as important in cutting-edge drug design that we will have implemented by then.
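To make the idea of demerit-based filtering more concrete, here is a minimal Python sketch of how annotated products might be filtered once such a step is applied. It assumes that each product carries a list of matched rule names, that each rule contributes a fixed number of demerits, and that a product is rejected when the sum reaches a cutoff. The rule names, the demerit values, the `Product` class, and the default cutoff are hypothetical placeholders, not the published rule set or the SAVI implementation.

```python
from dataclasses import dataclass, field

# Illustrative demerit values per matched rule (hypothetical, not the
# published Bruns and Watson rules).
DEMERITS = {"michael_acceptor": 60, "nitro": 50, "long_alkyl_chain": 30}


@dataclass
class Product:
    name: str
    rule_hits: list = field(default_factory=list)  # names of matched rules


def total_demerits(product):
    """Sum the demerits of all rules the product matched."""
    return sum(DEMERITS.get(hit, 0) for hit in product.rule_hits)


def passes_filter(product, cutoff=100):
    """Keep a product only if its demerit total stays below the cutoff."""
    return total_demerits(product) < cutoff


products = [
    Product("P1", ["nitro"]),
    Product("P2", ["michael_acceptor", "nitro"]),
    Product("P3"),
]
kept = [p.name for p in products if passes_filter(p)]
print(kept)
```

Keeping the cutoff as a parameter makes it easy to tighten or relax the filter without touching the per-rule values.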
A web GUI is planned that will allow users free access to this database via searches by various criteria including substructure searches. It will also present links to pages where users can place requests for having the molecule(s) synthesized by commercial entities.

Related projects