Drug Discovery

Insightful has a number of specialized solutions and S-PLUS libraries targeted at drug discovery applications. These are available through Insightful Consulting Pack solutions. Solutions include:


CHEMINFORMATICS

Discovery researchers are using the following process, based on S-PLUS and I-Miner, for sophisticated predictive high-throughput screening. They first develop a space-filling (e.g. D-optimal) design on the space of chemical structure properties. This design forms the basis of an initial sample of compounds. The compounds are then screened with a number of assays, including confirmatory screens.

The assay activity is then modeled in S-PLUS and I-Miner as a function of the chemical structure properties, using a variety of statistical models such as recursive partitioning, neural nets and naïve Bayes. Some of these models may be combined into ensembles. Recursive partitioning models may be assembled into forests of trees using boosting, bagging, random forests or boosted random forests. Model performance may be assessed using cross-validation or out-of-bag samples, so that the assessment is done on data not used in model training. The best models are then used to predict which compounds to choose from the compound libraries for the next round of screening.
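The modeling and selection steps described above can be illustrated with a small sketch. It is written in plain Python rather than S-PLUS or I-Miner, and it is not the Insightful implementation: the descriptor data are synthetic, one-split trees ("stumps") stand in for full recursive partitioning, and every function and compound name is hypothetical. It shows bagging, an out-of-bag error estimate computed only from trees that did not see a given compound, and ranking of a library by predicted activity.

```python
import random

def fit_stump(rows, labels):
    """Pick the single-descriptor threshold split with the fewest training errors."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            for polarity in (0, 1):
                # polarity 1: predict active when descriptor j > t; polarity 0: the reverse
                preds = [int((r[j] > t) == bool(polarity)) for r in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    return best[1:]

def predict_stump(stump, row):
    j, t, polarity = stump
    return int((row[j] > t) == bool(polarity))

def bagged_stumps(rows, labels, n_trees=50, seed=0):
    """Fit stumps on bootstrap samples and estimate error from out-of-bag votes."""
    rng = random.Random(seed)
    n = len(rows)
    stumps = []
    oob_votes = [[0, 0] for _ in range(n)]  # per compound: [votes inactive, votes active]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        stump = fit_stump([rows[i] for i in idx], [labels[i] for i in idx])
        stumps.append(stump)
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # only trees that never saw compound i may vote on it
                oob_votes[i][predict_stump(stump, rows[i])] += 1
    scored = [(v, y) for v, y in zip(oob_votes, labels) if sum(v) > 0]
    oob_error = sum((v[1] > v[0]) != bool(y) for v, y in scored) / len(scored)
    return stumps, oob_error

def predicted_activity(stumps, row):
    """Fraction of stumps in the ensemble that vote 'active'."""
    return sum(predict_stump(s, row) for s in stumps) / len(stumps)

# Synthetic screening round: two made-up descriptors, active when the first exceeds 0.5.
data_rng = random.Random(42)
descriptors = [(data_rng.random(), data_rng.random()) for _ in range(60)]
activity = [int(d[0] > 0.5) for d in descriptors]

stumps, oob_error = bagged_stumps(descriptors, activity)

# Hypothetical compound library to rank for the next screening round.
library = {"cmpd_A": (0.9, 0.2), "cmpd_B": (0.1, 0.8), "cmpd_C": (0.7, 0.5)}
ranked = sorted(library, key=lambda name: predicted_activity(stumps, library[name]),
                reverse=True)
print("out-of-bag error:", round(oob_error, 3))
print("screening priority:", ranked)
```

In a real setting the stumps would be replaced by full trees or other learners, and the out-of-bag estimate would be compared across candidate models before any of them is used to pick compounds.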

Figure 1: Visual QC and hit detection from a high-throughput screening application based on S-PLUS

MICROARRAY ANALYSIS

Discovery researchers in many biopharmaceutical companies are using S+ArrayAnalyzer for analyzing microarray data. Gene expression microarrays are a powerful experimental platform for studies of functional genomics, toxicogenomics, cancer subtyping and a host of other applications.

Differential expression analysis is a key analysis component in most studies. A major goal of such analyses is to identify genes that are differentially expressed between experimental conditions, while keeping the probability of false discoveries acceptably low. Given the many sources of variability in microarray measurements, the need for experimental design, replication and statistically rigorous pre-processing and differential expression analysis is now widely recognized.

S+ArrayAnalyzer includes components for experimental design, data access, data preparation (probe-level analysis, normalization, etc.), differential expression testing, multiple-testing correction (control of the false discovery rate, FDR, or the family-wise error rate, FWER), and interactive visualization and annotation of results.
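To make the multiple-testing step concrete, here is a minimal sketch of two standard corrections: Benjamini-Hochberg adjustment for FDR control and Bonferroni adjustment for FWER control. It is plain Python, not S+ArrayAnalyzer code; the gene names and p-values are invented for illustration, and the source does not say which procedures the product applies by default.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def bonferroni(p_values):
    """Return Bonferroni adjusted p-values (controls the family-wise error rate)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Invented per-gene p-values from a differential expression test.
genes = ["gene_1", "gene_2", "gene_3", "gene_4", "gene_5"]
raw_p = [0.01, 0.04, 0.03, 0.005, 0.5]

bh_p = benjamini_hochberg(raw_p)
bonf_p = bonferroni(raw_p)

threshold = 0.10
fdr_hits = [g for g, p in zip(genes, bh_p) if p <= threshold]
fwer_hits = [g for g, p in zip(genes, bonf_p) if p <= threshold]
print("FDR-controlled hits: ", fdr_hits)
print("FWER-controlled hits:", fwer_hits)
```

On this toy input the FDR procedure flags more genes than the FWER procedure, which reflects the usual trade-off: FWER control is stricter, while FDR control tolerates a bounded expected proportion of false discoveries in exchange for more power.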

Figure 2: Insightful S+ArrayAnalyzer Workflow

TAQMAN ANALYSIS

Non-clinical statisticians at Merck have designed and deployed a Taqman data analysis system, built on S-PLUS and the S-PLUS Server, that gives discovery chemists and biologists a standardized analysis of gene expression data from Taqman instruments. Taqman output data are loaded into the S-PLUS Server in standardized formats. Relative expression is estimated along with measures of variability. Built-in outlier routines are included, and 'safe statistics' are used in estimating and testing relative expression. The user interface to the Merck application is written as a combination of Visual Basic and Excel, and the input selections for comparisons are conveniently presented to the discovery scientists. The S-PLUS Server has become the cornerstone of biometrics operations at Merck: it allows the non-clinical statistics group to serve a large population of discovery chemists and biologists throughout the organization, and the Taqman application is widely used by them.
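The source does not say which estimator the Merck system uses. As a rough illustration of what "relative expression with measures of variability" can look like, the sketch below applies the widely used comparative Ct (2^-ΔΔCt) calculation, which assumes roughly 100% amplification efficiency, and adds a median-based variant as one possible reading of 'safe statistics'. It is plain Python, the Ct values are invented, and all function names are hypothetical.

```python
import math
import statistics

def delta_ct(target_ct, reference_ct):
    """Per-sample delta Ct: target gene Ct minus reference (housekeeping) gene Ct."""
    return [t - r for t, r in zip(target_ct, reference_ct)]

def relative_expression(control_dct, treated_dct):
    """Comparative Ct estimate of treated vs. control expression, with a standard error."""
    ddct = statistics.mean(treated_dct) - statistics.mean(control_dct)
    se = math.sqrt(statistics.variance(control_dct) / len(control_dct)
                   + statistics.variance(treated_dct) / len(treated_dct))
    return {
        "ddct": ddct,
        "se_ddct": se,
        "fold_change": 2 ** (-ddct),        # assumes ~100% amplification efficiency
        "fold_low": 2 ** (-(ddct + se)),    # fold change at ddct plus one standard error
        "fold_high": 2 ** (-(ddct - se)),   # fold change at ddct minus one standard error
    }

def robust_ddct(control_dct, treated_dct):
    """Median-based delta-delta Ct, less sensitive to a single outlying replicate."""
    return statistics.median(treated_dct) - statistics.median(control_dct)

# Invented triplicate Ct readings for one target gene and one reference gene.
control = delta_ct([25.0, 25.2, 24.8], [18.0, 18.1, 17.9])
treated = delta_ct([23.0, 23.1, 22.9], [18.0, 18.0, 18.0])

result = relative_expression(control, treated)
print("fold change:", round(result["fold_change"], 2))

# One aberrant replicate shifts the mean-based estimate much more than the median-based one.
treated_with_outlier = treated + [9.0]
mean_shifted = relative_expression(control, treated_with_outlier)["ddct"]
median_based = robust_ddct(control, treated_with_outlier)
print("mean-based ddCt with outlier:  ", round(mean_shifted, 2))
print("median-based ddCt with outlier:", round(median_based, 2))
```

Working on the ΔCt scale and converting to a fold change only at the end keeps the variability estimate on the scale where replicate measurements are approximately symmetric.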

Figure 3: Slide from "Enterprise Solutions for Statistical Analysis of Gene Expression Data" by Bill Pikounis, Merck Research Laboratories

GENOTYPE-RESPONSE ASSOCIATION (PHARMACOGENOMICS)

Researchers at the Mayo Clinic are using S-PLUS to analyze genotype-response association with an S-PLUS library called haplo.score. The library computes score statistics to test associations between haplotypes and a wide variety of traits, including binary, ordinal, quantitative and Poisson traits. The methods assume that all subjects are unrelated and that haplotypes are ambiguous (because the linkage phase of the genetic markers is unknown). They provide several global and haplotype-specific tests for association, adjustment for non-genetic covariates, and computation of simulation p-values (which may be needed for sparse data). Details on the background and theory of the score statistics can be found in the following reference:

Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. Score tests for association of traits with haplotypes when linkage phase is ambiguous. American J Human Genetics, February 2002.

 
