Prostate cancer is the most commonly diagnosed malignancy and the second most common cause of cancer death among men in North America. It is largely unknown why some tumors progress to aggressive, potentially life-threatening disease whereas others remain latent for decades. Although gene-expression-based classification has been successful for several types of malignancy, expression analysis in prostate cancer is complicated by tumor heterogeneity (even within a single specimen) and by contaminating normal cells. We have developed an approach that combines trimming and sectioning of frozen samples with histopathological review to determine and verify the tumor percentage and grade within each specimen. Messenger RNA from more than 40 samples of normal prostate, benign prostatic hyperplasia, and low- and high-grade prostate tumor was isolated and compared against a single reference, generating over 300,000 data points. Gene expression profiles of all prostate samples were correlated with Gleason score, clinical stage, and other phenotypic and pathologic descriptors. Using a series of statistical tools, including multidimensional scaling, we identified distinct clusters of samples that appear to correlate with tumor aggressiveness. Statistical analysis also identified a subset of genes that distinguish tumor specimens from one another as well as from normal prostate tissue. Random permutation of the sample labels indicates that the number of genes separating tumor subgroups far exceeds that expected by chance alone (P=0.002). This work indicates that gene-expression-based investigation of prostate cancer can identify genes that may help distinguish more aggressive forms of the disease.
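The label-permutation argument can be illustrated with a minimal sketch. The abstract does not state which per-gene statistic or cutoff was used, so this example assumes a Welch t-statistic between two tumor subgroups and counts genes whose |t| exceeds a fixed threshold. The function names (`welch_t`, `count_separating_genes`, `permutation_p_value`) and the default `threshold` and `n_perm` values are hypothetical illustrations, not the authors' method. The idea: compute the observed number of separating genes, then repeatedly shuffle the subgroup labels and record how often a random labeling separates at least as many genes.

```python
import math
import random


def _mean_var(xs):
    """Return the sample mean and unbiased sample variance of xs (len >= 2)."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v


def welch_t(a, b):
    """Welch's t-statistic comparing expression values in groups a and b."""
    ma, va = _mean_var(a)
    mb, vb = _mean_var(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0


def count_separating_genes(expr, labels, threshold):
    """Count genes whose |t| between label-0 and label-1 samples exceeds threshold.

    expr is a list of genes; each gene is a list with one value per sample.
    labels holds a 0/1 subgroup label per sample (hypothetical encoding).
    """
    count = 0
    for gene in expr:
        g0 = [x for x, lab in zip(gene, labels) if lab == 0]
        g1 = [x for x, lab in zip(gene, labels) if lab == 1]
        if abs(welch_t(g0, g1)) > threshold:
            count += 1
    return count


def permutation_p_value(expr, labels, threshold=4.0, n_perm=199, seed=0):
    """Return (observed count, permutation p-value) for the separating-gene count."""
    rng = random.Random(seed)
    observed = count_separating_genes(expr, labels, threshold)
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random relabeling; group sizes are preserved
        if count_separating_genes(expr, shuffled, threshold) >= observed:
            as_extreme += 1
    # Add-one correction: the observed labeling counts as one permutation,
    # so the estimate is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)
```

Because shuffling preserves the subgroup sizes, each permutation reflects the same study design with any real label-expression association destroyed. The smallest p-value this procedure can report is 1/(n_perm + 1), so the number of permutations bounds the attainable resolution.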