The simultaneous study of the expression of thousands of genes
in a single experiment is made possible by hybridization to immobilized
oligonucleotides or cDNAs. We present a statistical analysis of high-density
cDNA arrays carrying clones from a mouse UniGene library and calibration
spots from A. thaliana genes.
Using control spots and dilution series, we quantify the reproducibility
of the measurements. Overshining effects (signal from intense spots
spilling onto neighboring spots) turn out to be weak, and the measurements
show multiplicative errors of about 10% across three orders of magnitude
in intensity. We introduce normalization strategies that remove systematic
variations in pin performance and make different chips comparable.
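As a minimal sketch of the two calculations mentioned above, assume intensities follow a multiplicative model x = mu * exp(eps) with Gaussian eps. The relative error can then be estimated from the spread of log ratios between replicate spots, and a per-pin bias can be removed by median-centering log intensities within each pin. The function names and the median-centering scheme are illustrative assumptions, not necessarily the normalization used in this work.

```python
import math
import statistics

def multiplicative_error(rep_a, rep_b):
    """Relative error from paired replicate intensities.

    Assumes x = mu * exp(eps) with eps ~ N(0, s**2) independently per spot,
    so log(a) - log(b) has standard deviation sqrt(2) * s. For small s,
    s is close to the coefficient of variation (s = 0.1 means roughly 10%).
    """
    log_ratios = [math.log(a) - math.log(b) for a, b in zip(rep_a, rep_b)]
    return statistics.stdev(log_ratios) / math.sqrt(2)

def normalize_by_pin(intensities, pins):
    """Median-center log intensities within each pin (hypothetical scheme)."""
    logs = [math.log(x) for x in intensities]
    groups = {}
    for value, pin in zip(logs, pins):
        groups.setdefault(pin, []).append(value)
    medians = {pin: statistics.median(vals) for pin, vals in groups.items()}
    return [value - medians[pin] for value, pin in zip(logs, pins)]

# A pin that deposits twice as much material yields, up to rounding,
# the same normalized values as the other pin.
print(normalize_by_pin([100, 200, 400, 200, 400, 800], pins=[0, 0, 0, 1, 1, 1]))
```

Working on the log scale turns the multiplicative error into an additive one, which is why a single spread estimate can hold over several orders of magnitude of intensity.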
To reconstruct gene regulatory networks from expression data, clusters
of coregulated genes must first be identified. Using randomized data, we
derive tests to assess the significance of the resulting clusters.
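One way to build a randomization test of this kind, sketched here under assumptions: score a cluster by the mean pairwise Pearson correlation of its members' profiles, then compare that score with clusters whose profiles have each been shuffled independently across conditions. Shuffling keeps each gene's values but destroys coregulation. The statistic, the shuffling scheme, and all names are hypothetical choices for illustration.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two profiles (undefined for constant profiles)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cluster_score(profiles):
    """Mean pairwise correlation among the members of a cluster."""
    n = len(profiles)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(pearson(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

def cluster_p_value(profiles, n_perm=1000, seed=0):
    """Empirical p-value of the cluster score against shuffled profiles."""
    rng = random.Random(seed)
    observed = cluster_score(profiles)
    hits = 0
    for _ in range(n_perm):
        # Shuffle each gene separately: same values, no shared pattern.
        shuffled = [rng.sample(p, len(p)) for p in profiles]
        if cluster_score(shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed cluster as one draw from the null.
    return observed, (hits + 1) / (n_perm + 1)

base = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 8.0]
cluster = [[k * v for v in base] for k in (1.0, 1.5, 2.0, 3.0)]
print(cluster_p_value(cluster, n_perm=200))
```

For a perfectly coregulated toy cluster, no shuffled cluster reaches the observed score, so the p-value sits at its floor of 1 / (n_perm + 1).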
Using string search and multiple alignment via the
"Gibbs sampler", we identify regulatory sequences in the upstream regions
of gene clusters.
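The alignment step can be illustrated with a basic site-sampling Gibbs sampler. This sketch assumes exactly one motif occurrence per sequence, a uniform base background, a fixed motif width, and toy sequences with a planted motif. The greedy whole-alignment shift and the random restarts are simplifications added here to avoid alignments stuck a few bases off; all names are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"

def profile(seqs, sites, w, skip=None, pseudo=0.5):
    """Column-wise base frequencies of the aligned w-mers, optionally leaving one out."""
    counts = [dict.fromkeys(ALPHABET, pseudo) for _ in range(w)]
    for k, (seq, p) in enumerate(zip(seqs, sites)):
        if k != skip:
            for j in range(w):
                counts[j][seq[p + j]] += 1
    return [{b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts]

def site_score(seq, p, pwm):
    """Log-likelihood ratio of the w-mer starting at p against a uniform background."""
    return sum(math.log(col[seq[p + j]] / 0.25) for j, col in enumerate(pwm))

def total_score(seqs, sites, w):
    pwm = profile(seqs, sites, w)
    return sum(site_score(seq, p, pwm) for seq, p in zip(seqs, sites))

def gibbs_motif(seqs, w, sweeps=40, restarts=10, seed=0):
    """Site sampler assuming exactly one motif occurrence per sequence."""
    rng = random.Random(seed)
    best_sites, best_score = None, -math.inf
    for _ in range(restarts):
        sites = [rng.randrange(len(seq) - w + 1) for seq in seqs]
        for _ in range(sweeps):
            for k, seq in enumerate(seqs):
                # Build the profile from all other sequences, then resample this
                # sequence's site in proportion to its likelihood ratio.
                pwm = profile(seqs, sites, w, skip=k)
                weights = [math.exp(site_score(seq, p, pwm))
                           for p in range(len(seq) - w + 1)]
                sites[k] = rng.choices(range(len(weights)), weights=weights)[0]
            # Simplified phase shift: move all sites by the offset with the best score.
            shifts = [d for d in range(1 - w, w)
                      if all(0 <= p + d <= len(seq) - w for seq, p in zip(seqs, sites))]
            d = max(shifts, key=lambda d: total_score(seqs, [p + d for p in sites], w))
            sites = [p + d for p in sites]
        score = total_score(seqs, sites, w)
        if score > best_score:
            best_sites, best_score = list(sites), score
    return best_sites

# Toy data: one copy of a hypothetical motif planted in each random sequence.
rng = random.Random(1)
motif = "TTGACGCA"
seqs = []
for _ in range(10):
    bases = [rng.choice(ALPHABET) for _ in range(20)]
    start = rng.randrange(20 - len(motif) + 1)
    bases[start:start + len(motif)] = motif
    seqs.append("".join(bases))
sites = gibbs_motif(seqs, w=len(motif))
print([seq[p:p + len(motif)] for seq, p in zip(seqs, sites)])
```

Real upstream regions are longer, may contain zero or several sites, and have a non-uniform base composition, so a practical version would need a background model and a way to handle missing sites.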
The ultimate goal of expression data analysis is the reconstruction
of the underlying genetic networks. We apply "reverse engineering" techniques
to toy models of genetic networks and find that mutual information
analysis allows the network structure to be reconstructed.
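A minimal sketch of mutual-information-based reconstruction, in the style of a relevance network: estimate the pairwise mutual information of discretized profiles and link every pair above a threshold. The toy network (A activates B, B represses C, D is unconnected), the binary states, and the 0.5-bit threshold are assumptions for illustration. Because C here is a deterministic function of A, the pair A and C is also linked; separating direct from indirect edges (for example by data-processing-inequality pruning, as in ARACNE) is left out of this sketch.

```python
import math
import random
from collections import Counter
from itertools import combinations

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from two equally long discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b)).
    return sum(c / n * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def reconstruct_edges(states, threshold=0.5):
    """Link every pair of genes whose profiles share more than `threshold` bits."""
    return {(g, h) for g, h in combinations(sorted(states), 2)
            if mutual_information(states[g], states[h]) > threshold}

def simulate_toy_network(n_samples=200, seed=0):
    """Hypothetical toy model: A activates B, B represses C, D is unconnected."""
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(n_samples)]
    d = [rng.randint(0, 1) for _ in range(n_samples)]
    b = list(a)               # B follows its activator A
    c = [1 - v for v in b]    # C is off whenever B is on
    return {"A": a, "B": b, "C": c, "D": d}

edges = reconstruct_edges(simulate_toy_network())
print(sorted(edges))  # prints [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

The unconnected gene D shares almost no information with the others, so it drops out at any reasonable threshold, while the regulated chain A, B, C is recovered together with its indirect A-C link.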