S+ArrayAnalyzer 2.0 Key Features

Integrated Data Access

S+ArrayAnalyzer includes flexible data access methods allowing you to load data via a graphical user interface or in batch mode using the 'Read Design' interface. The data import dialog allows you to specify different experimental designs and handles both Affymetrix and 2-color microarray data including:
  • Affymetrix GeneChip® MAS 4/5 summary data
  • Affymetrix probe-level (CEL, CDF and Probe) data
  • Two-channel data including GenePix, Spot, ScanAlyze, and Agilent

S+ArrayAnalyzer includes the Affymetrix File and GCOS application programming interface (API), which allows you to rapidly read Affymetrix CEL and CHP binary formats and to import directly from Affymetrix LIMS/GCOS. The Affymetrix File and GCOS API provides an intermediate layer, so that whenever Affymetrix updates its data formats, S+ArrayAnalyzer adapts immediately, resulting in no downtime for users of the S+ArrayAnalyzer system.

S+ArrayAnalyzer can also be easily configured to read data directly from microarray databases such as the Affymetrix AADM database, the Iobion Gene Traffic database and the Rosetta Resolver database.

Imported data is stored in the S-PLUS object database and managed visually through the S-PLUS object explorer.

Quality Control Diagnostics and Filtering

S+ArrayAnalyzer provides an assortment of graphical tools for assessing the quality of your experimental data. The tools allow you to consider quality of chips from several perspectives and to filter genes and chips based on these assessments. Diagnostic plots include:

  • Color image plot of the entire array
  • M vs. A plot as either a scatter plot or a hexbin plot
  • Genes Present plot
  • Intensity boxplot
  • RNA degradation plot
  • Principal components plot

Advanced Normalization Methods

Normalization is the key to reducing variation in the measured gene expression levels. S+ArrayAnalyzer includes many advanced methods for normalization, including both within-chip and between-chip methods for two-channel data, and advanced methods for Affymetrix probe-level (CEL) and summary (CHP) data using non-linear methods such as quantiles.

In two-channel arrays, the main pre-processing required is within-slide normalization, which balances intensities between channels/dyes. The standard method is to normalize using a smooth function of intensity, e.g. the loess() function in S-PLUS. This approach may also be used to remove spatial effects of print-tips by fitting a separate loess() function for each print-tip. Two-channel, within-chip normalization methods comprise: median, loess, 2-D loess, print-tip loess, MAD, global MAD, print-tip, MAD 2-channel. Between-chip methods comprise: vsn, quantiles on R/G, quantiles on A.
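As a rough illustration of within-slide normalization, the Python sketch below computes the M (log-ratio) and A (average log-intensity) values for a two-channel slide and applies the simplest method in the list, median normalization. It is not S+ArrayAnalyzer code: the function names and the intensity values are made up for this example, and a loess-based method would replace the single median shift with a smooth curve fitted to M as a function of A.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities.

    M = log2(R / G) is the log-ratio, A = 0.5 * log2(R * G) is the
    average log-intensity of each spot.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def median_normalize(m):
    """Shift the log-ratios so that their median becomes zero."""
    center = median(m)
    return [value - center for value in m]


# Hypothetical intensities: the red channel is about twice as bright overall.
red = [1200.0, 800.0, 3000.0, 150.0, 640.0]
green = [600.0, 400.0, 1500.0, 75.0, 160.0]

m, a = ma_values(red, green)
m_norm = median_normalize(m)
print(m_norm)
```

After the shift, most spots have a log-ratio of zero, and only the last spot, whose red/green ratio differs from the rest, stands out.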

In the Affymetrix system the goal is to line up the distribution of values from individual chips. Methods for CEL data comprise: quantiles, quantiles with robust option, invariant set, constant, contrasts, loess and vsn. Methods for MAS summary data comprise: median, inter-quartile range, vsn, quantiles and scale.
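The idea of lining up distributions can be sketched with a basic quantile normalization in Python. This is a minimal illustration, not the S+ArrayAnalyzer implementation: the function name and data are hypothetical, tied values are broken arbitrarily by sort order, and the robust option mentioned above is not covered.

```python
def quantile_normalize(chips):
    """Give every chip the same distribution of values.

    chips: list of equal-length lists, one list of intensities per chip.
    Returns new lists; the inputs are not modified.
    """
    n_genes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value across chips.
    reference = [
        sum(s[k] for s in sorted_chips) / len(chips) for k in range(n_genes)
    ]
    normalized = []
    for chip in chips:
        # Gene indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n_genes), key=lambda i: chip[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for three genes on two chips.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
print(normalized)
```

Each gene keeps its rank within its own chip, but every chip ends up with an identical set of values.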

Precise and Powerful Statistical Tests

A key goal of microarray experiments is to identify genes that are differentially expressed. S+ArrayAnalyzer includes the leading statistical methods for identifying differentially expressed genes, as well as many methods for class discovery and prediction. Methods for differential expression include:

  • Two sample and paired t-tests
  • Wilcoxon test
  • Distribution and Permutation based tests
  • One-way and two-way ANOVA (fast, scalable linear model methods)
  • Local pooled error testing (LPE)

The local pooled error test (LPE) is designed specifically for low-replicate microarray experiments. The LPE test statistic for each gene is formed by pooling variance estimates locally (i.e. just for genes with very similar expression intensities) from replicated arrays within experimental conditions. The LPE approach handles the situation where a gene with low expression may have very low variance by chance and the resulting signal-to-noise ratio is unrealistically large. The LPE method works very well in cases where RNA is limited or the budget doesn't allow many replicate chips to be run. In combination with a resampling FDR correction, the LPE method has been shown to outperform other 2-sample comparison methods.
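The local pooling idea can be illustrated with a deliberately simplified Python sketch. It is not the published LPE estimator and not S+ArrayAnalyzer code: here genes are simply grouped into equal-sized bins by mean intensity, the per-gene sample variances within each bin are averaged, and a z-like statistic is formed from condition means. The function names, the binning scheme and the data are all assumptions made for this example.

```python
import math
from statistics import mean, variance


def local_pooled_variance(replicates, n_bins=2):
    """Return one pooled variance per gene for a single condition.

    replicates: list of per-gene lists of replicate log-intensities.
    Genes are binned by mean intensity and share their bin's mean variance.
    """
    means = [mean(g) for g in replicates]
    variances = [variance(g) for g in replicates]
    order = sorted(range(len(replicates)), key=lambda i: means[i])
    bin_size = -(-len(order) // n_bins)  # ceiling division
    pooled = [0.0] * len(replicates)
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        bin_variance = mean(variances[i] for i in members)
        for i in members:
            pooled[i] = bin_variance
    return pooled


def lpe_like_z(cond1, cond2, n_bins=2):
    """z-like statistic per gene using locally pooled variances."""
    v1 = local_pooled_variance(cond1, n_bins)
    v2 = local_pooled_variance(cond2, n_bins)
    z = []
    for g1, g2, a, b in zip(cond1, cond2, v1, v2):
        standard_error = math.sqrt(a / len(g1) + b / len(g2))
        z.append((mean(g1) - mean(g2)) / standard_error)
    return z


# Hypothetical log-intensities: 4 genes, 2 replicates per condition.
# The first gene has zero variance in condition 1 purely by chance.
cond1 = [[6.0, 6.0], [6.5, 6.1], [10.0, 10.4], [11.0, 11.2]]
cond2 = [[6.4, 6.2], [6.2, 6.6], [12.0, 12.2], [11.1, 10.9]]

pooled1 = local_pooled_variance(cond1)
z = lpe_like_z(cond1, cond2)
print(pooled1)
print(z)
```

Because the first gene borrows variance from a neighbour with similar intensity, its zero within-gene variance no longer produces an inflated signal-to-noise ratio.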

The linear model methods in S+ArrayAnalyzer, e.g. ANOVA and nested models, use fast, scalable algorithms optimized for the high-throughput array format of microarray data.

Leading Clustering Methods

S+ArrayAnalyzer includes a vast set of partitioning and hierarchical cluster analysis methods. Hierarchical methods allow complete, average and single linkage, and a variety of distance metrics, e.g. Euclidean, Manhattan, maximum and binary. Partitioning methods include k-means and a robust partitioning around medoids method. Model-based clustering, whereby a set of multivariate Gaussian mixtures is fit in a Bayesian context, is also available. A number of other unsupervised learning methods are available in S-PLUS, including self-organizing maps, fuzzy clustering and additional agglomerative methods (agnes) and divisive methods (diana and mona).
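To make the distance metrics concrete, the Python sketch below implements three of them (Euclidean, Manhattan and maximum) and finds the closest pair of expression profiles, which is the first merge an agglomerative method would make under complete, average or single linkage. The gene names, profile values and function names are hypothetical and are not part of S+ArrayAnalyzer.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def maximum(x, y):
    """Largest absolute difference in any single coordinate."""
    return max(abs(a - b) for a, b in zip(x, y))


def closest_pair(profiles, metric):
    """Return the two profile names with the smallest distance."""
    pairs = combinations(sorted(profiles), 2)
    return min(pairs, key=lambda p: metric(profiles[p[0]], profiles[p[1]]))


# Hypothetical expression profiles across three arrays.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.5, 2.5, 3.5],
    "geneC": [4.0, 0.0, 7.0],
}

print(closest_pair(profiles, euclidean))
print(manhattan(profiles["geneA"], profiles["geneC"]))
print(maximum(profiles["geneA"], profiles["geneC"]))
```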

Control of Family Wise Error Rate and False Discovery Rate

S+ArrayAnalyzer includes many methods for controlling the family wise error rate (FWER) and the false discovery rate (FDR). The FWER is controlled by using adjusted p-values for each gene so the overall Type I error rate is maintained at a desired level. Methods available for controlling family wise error rate include:

  • Bonferroni
  • Hochberg (1988)
  • Holm (1979)
  • Westfall & Young (1993)

Methods available for controlling FDR include:

  • Benjamini and Hochberg (1995)
  • Benjamini and Yekutieli (2001)
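To show what adjusted p-values look like, here is a minimal Python sketch of three of the methods listed above: Bonferroni and Holm for the FWER, and Benjamini and Hochberg for the FDR. It follows the standard textbook definitions rather than the S+ArrayAnalyzer implementation, and the function names and p-values are made up for illustration.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjusted p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvalues):
    """Benjamini and Hochberg step-up adjusted p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]

print(bonferroni(raw))
print(holm(raw))
print(benjamini_hochberg(raw))
```

A gene is then declared significant when its adjusted p-value falls below the chosen error level. For these values the FDR adjustment is the least conservative of the three and Bonferroni the most.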

Annotation and Gene List Management

The gene list represents the transition from the statistical analysis to the biological interpretation. A great deal of annotation metadata is available to help with the inferential and interpretive process. S+ArrayAnalyzer uses annotation metadata in four main ways:

  1. Annotate graphical and tabular reports from statistical analyses using gene lookup metadata sites, such as LocusLink and Entrez.
  2. Annotate gene lists derived from the statistical analyses via metadata repositories such as LocusLink, Entrez, Pubmed, AmiGO and Source.
  3. Connect to gene list analysis sites such as Onto-Express and DAVID/EASE, and initiate gene list analyses (e.g., gene function enrichment and identification of GO categories that are overrepresented in gene lists derived from statistical analyses).
  4. Subset microarray datasets according to GO categories prior to (differential expression) analysis.

S+ArrayAnalyzer also includes flexible methods for gene list management including tools for combining and comparing gene lists. Standard Venn diagrams provide a helpful visual in this process but represent only the tip of the underlying functionality available.
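Combining and comparing gene lists comes down to set operations, and the regions of a two-list Venn diagram correspond to the intersection and the two set differences. The Python sketch below illustrates this; the function name and the example gene symbols are chosen only for illustration and do not come from S+ArrayAnalyzer.

```python
def compare_gene_lists(list_a, list_b):
    """Return the regions of a two-list Venn diagram plus the union."""
    a, b = set(list_a), set(list_b)
    return {
        "both": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "union": sorted(a | b),
    }


# Hypothetical gene lists from two differential expression analyses.
t_test_hits = ["TP53", "BRCA1", "MYC", "EGFR"]
lpe_hits = ["MYC", "EGFR", "KRAS"]

regions = compare_gene_lists(t_test_hits, lpe_hits)
for name, genes in regions.items():
    print(name, genes)
```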

Graphical and Tabular Reports

S+ArrayAnalyzer includes a rich palette of interactive and publication-quality graphical and tabular reports. Graphics include volcano plots, parallel coordinate plots, whole genome plots, heat maps, silhouette plots, principal component biplots and Venn diagrams. Interactive reports are hyperlinked to gene annotation metadata and summary information, e.g. LocusLink, Entrez, Pubmed, AmiGO and Source.

Open and Extensive Development Environment

S+ArrayAnalyzer leverages the S-PLUS language, a full-featured object-oriented language for the analysis of data. Every feature available via the graphical user interface has an accessible programmatic command (function). You can use these functions to build scripts for automated analysis, batch analysis, or prototyping and implementing new methods. This gives you full control over the analysis, unlike many black-box applications. In addition to the S-PLUS language, S+ArrayAnalyzer also exposes Java and C++ application programming interfaces (APIs). These APIs allow you to further extend S+ArrayAnalyzer by creating custom interfaces, connecting to other software, or integrating it within your customized workflow.

Flexible Deployment

S+ArrayAnalyzer adapts to your needs and can be deployed in a variety of ways. Typically these decisions are made by taking into account factors such as the number of users, size of data, analysis workflow, reporting requirements, and geographic locations of users. The following descriptions of editions and deployment examples will help you understand which solution fits your needs.

S+ArrayAnalyzer Desktop
The desktop edition is a single-user license available for PCs. It is typically used by a scientist or statistician to analyze microarray data, conduct exploratory analysis, and develop new methods. The desktop edition gives you full access to the complete S-PLUS environment, allowing for more individual control over your analysis options.
The desktop version also works in concert with the Enterprise edition as a development system and prototyping environment.

S+ArrayAnalyzer Network
The network edition is a license-managed concurrent-user license. Like the desktop edition, the network edition gives you complete access to the S-PLUS environment, but can accommodate multiple users.

S+ArrayAnalyzer Enterprise
The Enterprise solution is licensed by CPU. Based on S-PLUS Server, the Enterprise edition is designed to be extensible and easy to integrate. The easy-to-use web-based interface jump-starts your analysis by providing out-of-the-box access to rigorous statistical analysis. Using the included development tools, you can customize your interface, helping you to expand to meet new needs or enforce best practices.

The S+ArrayAnalyzer Enterprise solution can also serve as an engine for automated analysis that can be easily integrated with existing tools or databases. It can also integrate with other popular software packages such as Spotfire DecisionSite.

The CPU-based license model makes it easy to deploy to many users simultaneously or to run many automated batch processes at a time without ever running out of licenses, and is the most flexible of all deployments.