What's New in S+ArrayAnalyzer 2.0

Additional Experimental Designs and Associated Linear Model Analyses

S+ArrayAnalyzer now handles many experimental designs. For 2-channel arrays these include multifactor reference and loop designs, with or without dye-swap. For single-channel oligonucleotide arrays they include multifactor designs. Designs are specified directly from the user interface, and all of the chip data files in a design can be imported in one step through the 'Read Design' interface. The imported data are stored in the internal S-PLUS Object Database, where they can be managed visually and analyzed through the S-PLUS Object Explorer. All of the experimental designs can be analyzed within a linear model framework, for example as ANOVA or nested models.
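The sketch below shows roughly how a 2-channel design can be cast as a linear model. It is written in Python/NumPy, builds a design matrix for a common-reference design with dye-swap, and fits that matrix by least squares for one gene. The helper `reference_design_matrix`, the label "ref" and the log-ratios are hypothetical illustrations, not part of the S+ArrayAnalyzer interface.

```python
import numpy as np

def reference_design_matrix(arrays, conditions):
    """Design matrix for a 2-channel design (hypothetical helper).

    arrays: one (cy5_sample, cy3_sample) pair per array; "ref" marks the
    common reference. Assumes each array's M = log2(Cy5/Cy3), so a sample
    in Cy5 gets +1 and a sample in Cy3 gets -1 (the dye-swap sign flip).
    """
    X = np.zeros((len(arrays), len(conditions)))
    for i, (cy5, cy3) in enumerate(arrays):
        if cy5 != "ref":
            X[i, conditions.index(cy5)] += 1.0
        if cy3 != "ref":
            X[i, conditions.index(cy3)] -= 1.0
    return X

# Hypothetical design: conditions A and B, each hybridized against the
# reference twice, once in each dye orientation.
conditions = ["A", "B"]
arrays = [("A", "ref"), ("ref", "A"), ("B", "ref"), ("ref", "B")]
X = reference_design_matrix(arrays, conditions)

# Hypothetical log-ratios for one gene on the four arrays.
M = np.array([1.1, -0.9, -0.4, 0.6])

# Least-squares estimates of each condition's log-ratio vs. the reference.
coef, *_ = np.linalg.lstsq(X, M, rcond=None)
contrast_B_minus_A = coef[1] - coef[0]
```

The same helper also encodes direct comparisons such as `("A", "B")` in a loop design. A pure loop design without a reference needs one condition fixed as the baseline, because otherwise only differences between conditions are estimable.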

Quality Control Diagnostics and Filtering

S+ArrayAnalyzer provides an assortment of graphical tools for assessing the quality of your experimental data. The tools allow you to consider the quality of chips from several perspectives and to filter genes and chips based on these assessments. Diagnostic plots include:

  • Color image plot of the entire array.
  • M vs. A plot as either a scatter plot or a hexbin plot.
  • Genes Present plot.
  • Intensity boxplot.
  • RNA degradation plot.
  • Principal components plot.
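The M vs. A plot rests on two standard quantities for red (R) and green (G) intensities: the log-ratio M = log2(R/G) and the average log-intensity A = (1/2) log2(R·G). The Python/NumPy sketch below computes them. The function name and the intensities are hypothetical, and plotting is left out.

```python
import numpy as np

def ma_values(red, green):
    """Return (M, A) for two-channel intensities.

    M = log2(R / G) is the log-ratio (differential signal);
    A = 0.5 * log2(R * G) is the average log-intensity.
    """
    log_r = np.log2(np.asarray(red, dtype=float))
    log_g = np.log2(np.asarray(green, dtype=float))
    M = log_r - log_g
    A = 0.5 * (log_r + log_g)
    return M, A

# Hypothetical intensities for three spots.
red = np.array([1024.0, 256.0, 64.0])
green = np.array([256.0, 256.0, 256.0])
M, A = ma_values(red, green)
```

The resulting arrays could be passed to a plotting routine such as matplotlib's `hexbin(A, M)` to get the hexbin variant of the plot.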


GC-RMA

In the Affymetrix system, each gene is represented by 11 to 20 probe pairs. Each pair consists of a perfect-match (PM) and a mismatch (MM) probe and targets a different region of the mRNA transcript, typically within 600 base pairs of the 3' end. The RMA method of Irizarry et al. (2003) models PM intensity as the sum of an exponentially distributed signal and a Gaussian background. It then applies quantile normalization (Bolstad et al., 2003) and robustly fits (by median polish) a log-scale model of expression effect plus probe effect. The result is the robust multi-array average (RMA) expression estimate for each gene. The GC-RMA method of Wu et al. (2004) is similar to RMA but incorporates the MM intensities through a model based on probe GC content.
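The sketch below covers two of the RMA steps described above, quantile normalization followed by median polish, in Python/NumPy. It is a simplified illustration. The exponential-plus-Gaussian background correction and GC-RMA's GC-content adjustment are omitted, and the function names are our own rather than an S+ArrayAnalyzer API. In RMA, quantile normalization is applied to all probes on each chip and median polish is then fitted to each probe set separately. The toy matrix below stands in for both.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns (chips) of a probes x chips matrix.

    The value at each rank in a column is replaced by the mean, across
    chips, of the values at that rank, so every chip ends up with the
    same distribution. Ties keep their original order (a simplification).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")
    mean_sorted = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = mean_sorted
    return out

def median_polish(y, max_iter=10, tol=1e-6):
    """Tukey median polish of a probes x chips matrix of log intensities.

    Fits y[i, j] = overall + probe[i] + chip[j] + residual robustly and
    returns overall + chip[j], the per-chip expression estimate.
    """
    resid = np.array(y, dtype=float)
    overall = 0.0
    probe = np.zeros(resid.shape[0])
    chip = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        row_med = np.median(resid, axis=1)   # sweep out probe medians
        resid -= row_med[:, None]
        probe += row_med
        shift = np.median(chip)
        chip -= shift
        overall += shift
        col_med = np.median(resid, axis=0)   # sweep out chip medians
        resid -= col_med[None, :]
        chip += col_med
        shift = np.median(probe)
        probe -= shift
        overall += shift
        if np.all(np.abs(row_med) < tol) and np.all(np.abs(col_med) < tol):
            break
    return overall + chip

# Hypothetical PM intensities: 4 probes (rows) x 3 chips (columns),
# treated here as a single probe set for illustration.
pm = np.array([[5.0, 4.0, 3.0],
               [2.0, 1.0, 4.0],
               [3.0, 4.0, 6.0],
               [4.0, 2.0, 8.0]])
normalized = quantile_normalize(pm)
expression = median_polish(np.log2(normalized))
```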

Improvements in Within-Chip and Between-Chip Normalization for 2-channel Arrays

S+ArrayAnalyzer has a rich family of methods for within-chip and between-chip normalization of 2-channel arrays.

Within-chip normalization methods include loess, print-tip loess and methods that take account of the spatial arrangement of probes on the chip. Within-chip normalization is essential for balancing the red and green intensities and for removing print-tip effects and other sources of extraneous noise on the chip.

Between-chip normalization methods include quantile normalization, which aligns the distributions of either the individual channels or the channel average on each chip. Between-chip normalization is essential in multifactor experiments in which one or more factors vary across chips.

S+ArrayAnalyzer also now handles unbalanced and ragged arrays within chips: no matter how your chip has been printed, S+ArrayAnalyzer can read it and keep track of every spot. This is required for the powerful spatial normalization methods available in S+ArrayAnalyzer.
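Loess normalization of 2-channel data fits a smooth curve of M = log2(R/G) against A = (1/2) log2(R·G) and subtracts it from M. Print-tip loess does this separately within each print-tip group (Yang et al., 2002). The Python/NumPy sketch below is a bare-bones version built on simplifying assumptions: a local linear fit with tricube weights, no robustness iterations, and an illustrative span of 0.4. Production loess implementations add robustness weighting and faster interpolation, and the function names here are hypothetical.

```python
import numpy as np

def loess_fit(x, y, span=0.4):
    """Local linear regression with tricube weights (no robustness steps).

    For each point, the ceil(span * n) nearest neighbours in x are
    weighted by the tricube of their scaled distance, and a weighted
    straight line is fitted; its value at that point is returned.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(span * n))))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]
        h = dist[idx].max()
        if h == 0:  # all neighbours share this x value
            fitted[i] = y[idx].mean()
            continue
        w = (1.0 - (dist[idx] / h) ** 3) ** 3
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(k), x[idx]])
        # Weighted least squares via row scaling by sqrt(weights).
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted

def loess_normalize(M, A, groups=None, span=0.4):
    """Remove intensity-dependent dye bias: M minus loess(M ~ A).

    If groups (e.g. print-tip IDs) are given, a separate curve is fitted
    within each group, as in print-tip loess.
    """
    M = np.asarray(M, dtype=float)
    A = np.asarray(A, dtype=float)
    groups = np.zeros(len(M), dtype=int) if groups is None else np.asarray(groups)
    normalized = np.empty_like(M)
    for g in np.unique(groups):
        mask = groups == g
        normalized[mask] = M[mask] - loess_fit(A[mask], M[mask], span)
    return normalized
```

For example, `loess_normalize(M, A, groups=print_tip_ids)` would apply print-tip loess, where `print_tip_ids` is a hypothetical array giving each spot's print-tip group.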

Linear Models and ANOVA Methods

For more than two experimental conditions, linear models such as ANOVA and nested models can be used effectively. For one-way data, the linear model computations are done by a set of C functions operating on the rows. These functions are very fast. Fitting more than 200 chips, including estimation of (orthogonal) contrasts, takes just a few seconds, and the functions scale comfortably to many hundreds of chips. For two-way data with the same design structure, the computations are also rapid and scalable thanks to an optimized model matrix implementation. These models are fit in approximately 10-20 seconds for about 100 chips and also scale comfortably to many hundreds of chips.
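Row-wise fitting can be fast because every gene shares the same design, so the model can be fitted to all genes at once with a few matrix operations instead of a per-gene loop. The Python/NumPy sketch below shows this for a one-way ANOVA F statistic. It is an illustration of the general technique, not the C functions used inside S+ArrayAnalyzer. The function name and data are hypothetical.

```python
import numpy as np

def rowwise_oneway_anova(y, groups):
    """One-way ANOVA F statistic for every row (gene) of a genes x chips matrix."""
    y = np.asarray(y, dtype=float)
    labels, g = np.unique(groups, return_inverse=True)
    n, k = y.shape[1], len(labels)
    Z = np.zeros((n, k))
    Z[np.arange(n), g] = 1.0                 # chips x groups indicator matrix
    counts = Z.sum(axis=0)
    group_means = (y @ Z) / counts           # genes x groups, all genes at once
    grand_mean = y.mean(axis=1, keepdims=True)
    ss_between = ((group_means - grand_mean) ** 2 * counts).sum(axis=1)
    ss_within = ((y - group_means @ Z.T) ** 2).sum(axis=1)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical data: 2 genes x 6 chips, three conditions in duplicate.
y = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
              [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]])
F = rowwise_oneway_anova(y, ["a", "a", "b", "b", "c", "c"])
```

If SciPy is available, p-values could then be obtained with `scipy.stats.f.sf(F, k - 1, n - k)`, where `k` is the number of groups and `n` the number of chips.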

Resampling for FDR Control in LPE

Controlling the false discovery rate (FDR) is a complex issue. It is one thing to set a false discovery rate, but quite another to hold that rate in an analysis. In research underway at Insightful and UVa, we have found in simulation studies that resampling-based methods such as those proposed by Reiner et al. (2003) generally do better than other methods at holding the FDR. We have included resampling as an FDR option for our local pooled error (LPE) test, and we are looking to offer it for other testing methods in the future.
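Resampling-based FDR estimation replaces analytic null distributions with null distributions obtained by permuting the condition labels. The Python/NumPy sketch below is a generic permutation estimate in that spirit. It uses a pooled two-sample t statistic as a stand-in, so it is neither the LPE statistic nor the exact procedure of Reiner et al. (2003). The function names, threshold and simulated data are hypothetical.

```python
import numpy as np

def row_t(y, labels):
    """Pooled-variance two-sample t statistic for every row (gene)."""
    a, b = y[:, labels == 0], y[:, labels == 1]
    na, nb = a.shape[1], b.shape[1]
    sp2 = ((na - 1) * a.var(axis=1, ddof=1)
           + (nb - 1) * b.var(axis=1, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(sp2 * (1 / na + 1 / nb))

def permutation_fdr(y, labels, threshold, n_perm=200, seed=0):
    """Estimate the FDR of calling genes with |t| >= threshold.

    The expected number of false discoveries is estimated as the average
    number of genes exceeding the threshold under permuted labels.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = np.abs(row_t(y, labels))
    n_called = int((observed >= threshold).sum())
    null_counts = [
        (np.abs(row_t(y, rng.permutation(labels))) >= threshold).sum()
        for _ in range(n_perm)
    ]
    fdr = min(1.0, float(np.mean(null_counts)) / max(n_called, 1))
    return n_called, fdr

# Hypothetical data: 200 genes x 10 chips (5 per condition); the first
# 20 genes are shifted by 5 units in condition 0.
rng = np.random.default_rng(1)
y = rng.normal(size=(200, 10))
y[:20, :5] += 5.0
labels = np.array([0] * 5 + [1] * 5)
n_called, fdr = permutation_fdr(y, labels, threshold=4.0)
```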

Annotation and Gene List Management

S+ArrayAnalyzer now includes flexible and rich annotation of the gene lists derived from statistical analyses. S+ArrayAnalyzer uses annotation metadata in four main ways:

  1. Annotate graphical and tabular reports from statistical analyses using gene lookup metadata sites, such as LocusLink and Entrez.
  2. Annotate gene lists derived from the statistical analyses via metadata repositories such as LocusLink, Entrez, PubMed, AmiGO and SOURCE.
  3. Connect to gene list analysis sites such as Onto-Express and DAVID/EASE, and initiate gene list analyses (e.g., gene function enrichment and identification of GO categories that are overrepresented in gene lists derived from statistical analyses).
  4. Subset microarray datasets according to GO categories prior to differential expression analysis.
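Over-representation of a GO category in a gene list is commonly assessed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The Python sketch below shows that calculation in its simplest form. It is a generic illustration, not necessarily the statistic used by Onto-Express or DAVID/EASE, which may use different or modified scores. The function name and counts are hypothetical.

```python
from math import comb

def go_enrichment_pvalue(n_genes, n_in_category, n_list, n_list_in_category):
    """One-sided hypergeometric p-value for over-representation.

    n_genes: genes on the array (the universe)
    n_in_category: array genes annotated to the GO category
    n_list: genes in the list from the statistical analysis
    n_list_in_category: list genes annotated to the category
    Returns P(X >= n_list_in_category) when n_list genes are drawn at random.
    """
    total = comb(n_genes, n_list)
    upper = min(n_in_category, n_list)
    tail = sum(
        comb(n_in_category, k) * comb(n_genes - n_in_category, n_list - k)
        for k in range(n_list_in_category, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 10,000 genes on the array, 200 in a GO category,
# and a 100-gene list containing 12 genes from that category.
p = go_enrichment_pvalue(10_000, 200, 100, 12)
```

The same tail probability is available from SciPy as `scipy.stats.hypergeom.sf(k - 1, M, n, N)`, with `M` genes in total, `n` in the category, `N` in the list, and `k` list genes in the category.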

S+ArrayAnalyzer also now includes flexible methods for gene list management, including tools for combining and comparing gene lists. Standard Venn diagrams provide a helpful visual aid in this process but represent only a small part of the functionality available.
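Combining and comparing gene lists comes down to set operations. The minimal Python sketch below assigns each gene to the Venn diagram region defined by exactly which lists contain it. The function name and the two gene lists are hypothetical, and the same idea extends to any number of lists.

```python
def venn_regions(lists):
    """Group genes by the exact combination of named lists containing them.

    Returns a dict mapping a frozenset of list names (one Venn region)
    to the set of genes that appear in exactly those lists.
    """
    membership = {}
    for name, genes in lists.items():
        for gene in set(genes):
            membership.setdefault(gene, set()).add(name)
    regions = {}
    for gene, names in membership.items():
        regions.setdefault(frozenset(names), set()).add(gene)
    return regions

# Hypothetical gene lists from two analyses.
lists = {
    "LPE": ["g1", "g2", "g3", "g4"],
    "ANOVA": ["g3", "g4", "g5"],
}
regions = venn_regions(lists)
# Plain set operations give unions and intersections directly, e.g.
# set(lists["LPE"]) & set(lists["ANOVA"]) is the genes common to both.
```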


References to Methods Included in S+ArrayAnalyzer

Please cite these methods in papers that use them. S+ArrayAnalyzer itself can be cited as follows:

O'Connell, M. (2003). Differential expression, class discovery and class prediction using S-PLUS and S+ArrayAnalyzer. SIGKDD Explorations 5(2), December 2003.

Normalization

Bolstad, B. M., Irizarry, R. A., Astrand, M. and Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on bias and variance. Bioinformatics 19(2): 185-193.

Durbin, B., Hardin, J., Hawkins, D. and Rocke, D. (2002). A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics 18(Suppl. 1): S105-S110.

Huber, W., von Heydebreck, A., Sultmann, H., Poustka, A. and Vingron, M. (2002). Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18(Suppl. 1): S96-S104.

Irizarry, R., Hobbs, B., Collin, F., Beazer-Barclay, Y., Antonellis, K., Scherf, U. and Speed, T. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4(2): 249-264.

Li, C. and Wong, W. (2001). Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proceedings of the National Academy of Sciences USA 98: 31-36.

Wu, Z., LeBlanc, R. and Irizarry, R. A. (2004). Stochastic models based on molecular hybridization theory for short oligonucleotide microarrays. Technical report, Johns Hopkins University, Dept. of Biostatistics Working Papers. (www.bepress.com/jhubiostat/paper4/)

Yang, Y. H., Dudoit, S., Luu, P., Lin, D. M., Peng, V., Ngai, J. and Speed, T. (2002). Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 30(4): e15.

Differential Expression Testing

Dudoit, S., Yang, Y. H., Callow, M. J. and Speed, T. P. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica 12(1): 111-139.

Jain, N., Thatte, J., Braciale, T., Ley, K., O'Connell, M. and Lee, J.K. (2003). Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics 19: 1945-1951.

False Discovery Rate and Family-wise Error Rate Control

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological) 57: 289-300.

Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple hypothesis testing under dependency. Annals of Statistics 29(4): 1165-1188.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6: 65-70.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75: 800-802.

Reiner, A., Yekutieli, D. and Benjamini, Y. (2003). Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19: 368-375.

Westfall, P. H. and Young, S. S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. John Wiley & Sons.

Class Discovery and Cluster Analysis

Fraley, C. and Raftery, A. E. (2002). MCLUST: software for model-based clustering, discriminant analysis and density estimation. Technical Report no. 415, Department of Statistics, University of Washington.

Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, NY.

Kohonen, T. (1995). Self-Organizing Maps. Springer, NY.

Venables, W.N. and Ripley, B.D. (2002). Modern Applied Statistics with S. Springer, NY.

Annotation

Draghici, S. (2003). Data Analysis Tools for DNA Microarrays. Chapman and Hall, London.
Gentleman