StarNet is a visual data mining front end for exploring correlation networks constructed from microarray data.
This software serves two main functions: (1) it readily provides new hypotheses via the standard “guilt by association model,”
where genes that participate in the same pathways frequently have similar expression profiles, and (2) it provides a starting place for
reconstructing biological networks using modeling approaches such as dynamic Bayesian networks, by providing lists of candidate genes to use in such approaches.
StarNet asks the user for a gene of interest, parameters that specify how the network will be constructed, and drawing preferences.
StarNet then draws correlation networks local to the gene of interest. Users can specify (1) that only one network be drawn, (2) that two networks from the same
species be drawn, or (3) that two networks from different species be drawn.
StarNet offers several useful features,
including: network gene lists linked to Entrez Gene, with flagging and listing of genes found in both networks, if two networks are drawn (homologous genes are highlighted, if networks are from different species); edge (correlation) lists with 95% and 99% confidence intervals;
optional highlighting and listing of nodes with specified Gene Ontology (GO) keyword matching (default=‘transcription’); lists of GO terms (and associated genes)
that are enriched in the network compared with the entire array; and finally, to easily compare the correlation network to current knowledge, networks of
known interactions (from Entrez’s Gene RIFs) involving genes in the drawn correlation networks. A recently added tool, HeatSeeker, will also draw false color maps comparing
the two networks (again, only when two networks have been drawn). Specifically, HeatSeeker draws false color maps of correlations between genes in the network, for each cohort.
The genes in each heatmap are hierarchically clustered using complete linkage, by correlation distance, within their cohort, and the appropriate dendrogram is drawn on the heatmap.
Each cohort’s heatmap is redrawn using the clustering of the other cohort, for comparison purposes. HeatSeeker also draws false color maps of the difference between the correlations
in the first and the second cohort. One heatmap of the difference is drawn for each of the two cohorts’ clusterings.
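The clustering step described above can be illustrated with a minimal sketch. It assumes correlation distance is defined as 1 - r and uses a naive pure-Python agglomerative procedure; the function name `complete_linkage_order` and the input format are hypothetical, and this is not HeatSeeker's actual code.

```python
from itertools import combinations


def complete_linkage_order(names, corr):
    """Cluster genes by complete linkage on the distance 1 - r.

    names: list of gene labels (hypothetical input format, must be non-empty)
    corr:  dict mapping frozenset({a, b}) -> Pearson correlation r
    Returns (merges, leaf_order), where merges is a list of
    (left_cluster, right_cluster, height) tuples in merge order.
    """
    def dist(a, b):
        # Assumed definition of correlation distance.
        return 1.0 - corr[frozenset((a, b))]

    # Each cluster is a tuple of leaves kept in dendrogram order.
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: the distance between two clusters is the
            # largest distance between any pair of their members.
            d = max(dist(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges, list(clusters[0])


if __name__ == "__main__":
    genes = ["A", "B", "C", "D"]
    r = {
        frozenset(("A", "B")): 0.9, frozenset(("C", "D")): 0.8,
        frozenset(("A", "C")): 0.1, frozenset(("A", "D")): 0.0,
        frozenset(("B", "C")): 0.2, frozenset(("B", "D")): -0.1,
    }
    merges, order = complete_linkage_order(genes, r)
    print(order)
```

The leaf order from one cohort can then be used to arrange the rows and columns of the other cohort's correlation matrix, which is the idea behind redrawing each heatmap with the other cohort's clustering.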
The set of genes used in this procedure is the union of all genes in both networks. (Only genes with homologs in both networks are considered
in a two-species analysis.) HeatSeeker allows retrieval of data in a tabular format, one table per image. Each table presents numerical values of correlation
distances, or differences in correlation distances, as appropriate. Differences are tested for statistical significance: correlations are transformed
(Fisher r to Z transform) and differences compared against a standard normal. Significant differences are flagged in the text files.
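As an illustration of this test, here is a minimal sketch of the usual two-sample comparison of correlations after the Fisher transform. The standard error formula (independent samples of sizes `n1` and `n2`, taken here to be the number of arrays in each cohort) and the function name are assumptions; the text above states only that transformed differences are compared against a standard normal.

```python
import math


def correlation_difference_test(r1, n1, r2, n2):
    """Compare two Pearson correlations using Fisher's r-to-Z transform.

    r1, r2: correlations for the same gene pair in cohorts 1 and 2
    n1, n2: number of samples in each cohort (assumed independent, > 3)
    Returns (z, p) where p is the two-sided standard normal p-value.
    """
    z1 = math.atanh(r1)  # Fisher transform: 0.5 * ln((1 + r) / (1 - r))
    z2 = math.atanh(r2)
    # Standard error of the difference of two transformed correlations.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided tail probability under the standard normal.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    z, p = correlation_difference_test(0.8, 103, 0.2, 103)
    print(f"z = {z:.2f}, p = {p:.2e}")
```

Correlations of exactly 1 or -1 cannot be transformed (the transform diverges), so a real implementation would need to handle those values separately.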
Jupiter, D.C. and VanBuren, V. A Visual Data Mining Tool that Facilitates Reconstruction of Transcription Regulatory Networks. PLoS ONE 3(3): e1717. doi:10.1371/journal.pone.0001717 [PDF] [Link to StarNet]
Jupiter, D.C., Chen, H. and VanBuren, V. StarNet 2.0: A Web-based tool for accelerating discovery of gene regulatory networks using microarray co-expression data. BMC Bioinformatics 10:332, 2009. PMID: 19828039 [PDF] [Link to StarNet]
The easiest way to see what StarNet does is to enter the symbol for your favorite gene, and your favorite species, and click submit.
This will draw networks using our default parameters.
If you don’t know the official symbol or Entrez ID for a gene, you can use the Gene ID lookup tool on the StarNet front page to search by keyword or by partial symbols.
The lookup tool can be found at [Gene Lookup].
The data used was collected from NCBI's Gene Expression Omnibus (GEO), a repository of publicly available microarray data. We chose ten species to examine,
and for each chose a suitable Affymetrix microarray platform. Microarray samples were chosen for each species.
Within a species, where possible, we have selected a subset of arrays pertaining to development;
thus users can compare networks drawn from a full cohort and from a development-specific cohort of array samples. The number of array samples per species ranges from
100 to 3,000, with each array platform containing between 5,000 and 23,000 genes. A list of species, with numbers of full and development cohort samples,
Gene Expression Omnibus (GEO) identifiers, and numbers of genes, is available on this page [Species]. Further information regarding
which GEO series were chosen for each platform and each cohort is available on this page [GSE descriptions].
Lists of specific samples chosen for each species and cohort can be found at [.cel files used].
Raw microarray data were normalized using the RMA normalization method available in BioConductor, and pairwise Pearson correlation coefficients
were calculated between the expression patterns of the genes within each array platform, using Octave. The resulting correlations were loaded into a MySQL database for further processing,
to obtain more tractable and meaningful subsets of correlations. For each species and each cohort, we first chose the 100,000 strongest negative and the 100,000 strongest positive
correlations. We also combined these two tails into a single distribution. Because many genes
are highly 'connected' in these tails, the 100K tails contain only a small fraction of the genes that actually appear on an array. To get complete coverage of features on the array, we created a
'Genecentric' distribution, which contains the top 10 positive and top 10 negative correlations for every feature. We used the same 'top 10 positive/negative' approach to
construct two specialty distributions: one where both genes have a GO annotation matching 'transcription', and one where each gene matches either 'transcription' or 'signal'.
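To make the 'Genecentric' selection concrete, the following is a minimal sketch in pure Python. The actual pipeline used Octave and MySQL; the function names, the `expr` input format, and the toy data are hypothetical and serve only to illustrate the selection rule.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Raises ZeroDivisionError for a constant profile (zero variance).
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(expr):
    """expr: dict gene -> list of normalized expression values per array."""
    return {(a, b): pearson(expr[a], expr[b])
            for a, b in combinations(sorted(expr), 2)}


def gene_centric(corr, k=10):
    """Keep, for every gene, its k strongest positive and k strongest
    negative correlations, so that every gene on the array is covered."""
    partners = {}
    for (a, b), r in corr.items():
        partners.setdefault(a, []).append((r, b))
        partners.setdefault(b, []).append((r, a))
    kept = {}
    for g, rs in partners.items():
        pos = sorted((p for p in rs if p[0] > 0), reverse=True)[:k]
        neg = sorted(p for p in rs if p[0] < 0)[:k]
        for r, h in pos + neg:
            kept[tuple(sorted((g, h)))] = r
    return kept


if __name__ == "__main__":
    expr = {
        "g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8],
        "g3": [4, 3, 2, 1], "g4": [1, 3, 2, 4],
    }
    for pair, r in sorted(gene_centric(pairwise_correlations(expr), k=1).items()):
        print(pair, round(r, 3))
```

Unlike the 100K tails, this selection cannot be dominated by a few highly connected genes, because each gene contributes at most 2k edges of its own.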
StarNet draws sub-graphs of larger correlation networks starting with your gene of interest in the center of the graph.
Networks are drawn in concentric ‘levels’, where the first level consists of genes that are directly connected to the gene of interest.
The second level consists of genes directly connected to genes in the first level, and so on.
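The level structure described above corresponds to a breadth-first traversal of the correlation graph. Here is a minimal sketch, assuming the chosen distribution is available as a list of undirected gene pairs; the function name `network_levels` and the toy edges are hypothetical.

```python
from collections import defaultdict


def network_levels(edges, seed, n_levels):
    """Group genes into concentric levels around a seed gene.

    edges:    iterable of (gene_a, gene_b) pairs, treated as undirected
    seed:     gene of interest (level 0, the center of the graph)
    n_levels: maximum number of levels to expand
    Returns a list of sorted gene lists, one per level.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    levels = [[seed]]
    seen = {seed}
    for _ in range(n_levels):
        # Genes adjacent to the previous level that were not placed earlier.
        nxt = sorted({nb for g in levels[-1] for nb in adj[g]} - seen)
        if not nxt:
            break
        seen.update(nxt)
        levels.append(nxt)
    return levels


if __name__ == "__main__":
    toy = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"), ("B", "C")]
    print(network_levels(toy, "A", 2))
```

In this toy graph the B-C edge joins two genes of the same level, so it affects which edges are drawn but not how genes are assigned to levels.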
StarNet can draw the following types of networks:
- this draws every connection in the specified distribution for N levels, starting from your gene
- same as above, but connections within a level or back to a lower level are allowed
- the user supplies a cutoff; the product of the coefficients in the path from your selected gene to any other gene must be higher than this cutoff; up to N levels are drawn, where N is user specified
- draws the top n connections for your specified gene, then does the same for the next level, and so on; up to N levels are drawn
- same as above, but connections within a level or back to a lower level are allowed
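Two of the strategies above, the path-product cutoff and the top-n expansion, can be sketched as follows. This is one possible reading of the descriptions, not StarNet's implementation: it assumes that absolute values of the coefficients are multiplied along a path, that "top n" ranks by absolute correlation, and that only edges leading outward to new genes are followed. All function names are hypothetical.

```python
def _adjacency(corr):
    """corr: dict (gene_a, gene_b) -> r, treated as undirected."""
    adj = {}
    for (a, b), r in corr.items():
        adj.setdefault(a, {})[b] = r
        adj.setdefault(b, {})[a] = r
    return adj


def product_cutoff_network(corr, seed, cutoff, n_levels):
    """Return gene -> best path product (assumed: product of |r|) for genes
    whose path from the seed keeps the product above the cutoff."""
    adj = _adjacency(corr)
    score = {seed: 1.0}
    frontier = [seed]
    for _ in range(n_levels):
        reached = {}
        for g in frontier:
            for h, r in adj.get(g, {}).items():
                if h in score:
                    continue  # follow outward edges only
                p = score[g] * abs(r)
                if p > cutoff and p > reached.get(h, 0.0):
                    reached[h] = p
        if not reached:
            break
        score.update(reached)
        frontier = sorted(reached)
    return score


def top_n_network(corr, seed, n, n_levels):
    """Expand each gene through its n strongest connections (by |r|),
    keeping only those that lead to genes not yet in the network."""
    adj = _adjacency(corr)
    levels = [[seed]]
    seen = {seed}
    for _ in range(n_levels):
        found = set()
        for g in levels[-1]:
            ranked = sorted(adj.get(g, {}).items(), key=lambda kv: -abs(kv[1]))
            found.update(h for h, _r in ranked[:n] if h not in seen)
        if not found:
            break
        seen |= found
        levels.append(sorted(found))
    return levels


if __name__ == "__main__":
    toy = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "D"): 0.5, ("C", "E"): -0.9}
    print(product_cutoff_network(toy, "A", 0.6, 3))
    print(top_n_network(toy, "A", 1, 3))
```

The variants that allow connections within a level or back to a lower level would keep the same level assignment and additionally draw the edges that this sketch skips.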
Clicking on a graph drawn by StarNet will spawn a new page for that cohort, where the nodes are linked to NCBI's gene description.
Below the graphs, on both the main page and cohort-specific pages, there is supporting information and analysis, including a gene list; an edge list with confidence intervals; a list of the genes in the network that
match the user-supplied GO search terms (default = ‘transcription’), with matching nodes also highlighted in red on the graph; GO terms enriched
in the network compared with the whole array platform; and small networks of known interactions for genes in the correlation networks. In addition, the main page lists genes common to both networks drawn, or homologous
genes in the case of two species, and provides a link to HeatSeeker if two networks were drawn.
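For readers who want to reproduce interval estimates like those in the edge list, a common way to compute a confidence interval for a Pearson correlation uses the Fisher transform. The sketch below assumes that approach and that `n` is the number of array samples in the cohort; the method StarNet actually uses is not stated here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation.

    r:     observed correlation, strictly between -1 and 1
    n:     number of samples the correlation was computed from (n > 3)
    level: confidence level, e.g. 0.95 or 0.99
    """
    z = math.atanh(r)                 # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Back-transform the interval endpoints to the correlation scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


if __name__ == "__main__":
    for level in (0.95, 0.99):
        lo, hi = correlation_ci(0.5, 103, level)
        print(f"{int(level * 100)}% CI: ({lo:.3f}, {hi:.3f})")
```

Because the interval is symmetric on the transformed scale, it is generally asymmetric around r after back-transformation, and it narrows as the number of samples grows.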
Complete documentation can be found in the User manual.
Apache-Style Software License for ColorBrewer software and ColorBrewer Color Schemes, Version 1.1
Copyright (c) 2002 Cynthia Brewer, Mark Harrower, and The Pennsylvania State University. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions as source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. The end-user documentation included with the redistribution, if any, must include the following acknowledgment: This product includes color specifications and designs developed by Cynthia Brewer (http://colorbrewer.org/). Alternately, this acknowledgment may appear in the software itself, if and wherever such third-party acknowledgments normally appear.
3. The name "ColorBrewer" must not be used to endorse or promote products derived from this software without prior written permission. For written permission, please contact Cynthia Brewer at cbrewer@psu.edu.
4. Products derived from this software may not be called "ColorBrewer", nor may "ColorBrewer" appear in their name, without prior written permission of Cynthia Brewer.
THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CYNTHIA BREWER, MARK HARROWER, OR THE PENNSYLVANIA STATE UNIVERSITY BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.