We determined the molecular profiles of seven human lung cancer cell lines using commercially available large-scale DNA arrays (Incyte Genomics) composed of 60,000 elements (= 45,000 genes). Relative gene expression was determined by comparing the expression of normal primary epithelial lung cell lines to that of three SCLC cell lines and four NSCLC cell lines. Few genes were differentially expressed when independently derived primary lung cell lines (NHBE and SAEC) were compared with each other. However, we detected substantial differences in gene expression when comparing tumor cell lines with normal cell lines. The largest gene expression changes occurred in cell-surface markers and cytoskeletal elements. Of genes with greater than eightfold differential expression in at least one cell line, over 15% were members of five gene families: cytokeratins, laminin 5, fibronectin, integrins and annexins. A hierarchical clustering algorithm was used to analyze gene expression changes (threefold or greater) across eight probe pairs. We grouped probe pairs into clusters to categorize relationships among cell lines. The three SCLC cell lines formed one cluster and the four NSCLC cell lines clustered together. The normal cell lines seemed to be distinct from both SCLC and NSCLC. Using cluster analysis of individual genes we identified a cluster containing genes involved in mitotic pathways and up-regulated in most tumor cell lines. Four genes were represented two or more times within this cluster and placed in adjacent rows. The capacity of this method of statistical analysis to group these genes within the same cluster and as adjacent records supports the validity and reproducibility of our experimental approach.
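The filtering and clustering steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the distance metric or linkage rule, so Euclidean distance and average linkage are assumed here. The gene names, probe-pair labels, and log2 ratios are all made up for illustration, and the eighth probe pair is assumed to be the normal-versus-normal comparison.

```python
import math
from itertools import combinations

# Hypothetical labels: seven tumor-vs-normal probe pairs plus one
# normal-vs-normal pair (an assumption about how the eight pairs arise).
PROBE_PAIRS = ["SCLC_1", "SCLC_2", "SCLC_3",
               "NSCLC_1", "NSCLC_2", "NSCLC_3", "NSCLC_4", "normal_pair"]

# Made-up log2 expression ratios, one value per probe pair, in the order above.
LOG2_RATIOS = {
    "gene_a": [-4.0, -3.5, -4.2, 2.0, 2.5, 1.8, 2.2, 0.1],
    "gene_b": [-3.0, -3.2, -2.8, 1.5, 1.7, 2.0, 1.6, -0.2],
    "gene_c": [2.5, 2.8, 2.2, -3.0, -2.6, -3.3, -2.9, 0.0],
    "gene_d": [3.1, 2.9, 3.4, 3.0, 3.3, 2.7, 3.1, 0.2],
    "gene_e": [0.3, -0.2, 0.1, 0.2, -0.1, 0.4, 0.0, 0.1],
    "gene_f": [1.8, 2.0, 1.7, -1.9, -2.1, -1.6, -1.8, -0.1],
}


def filter_genes(log2_ratios, fold=3.0):
    """Keep genes changed at least `fold` (up or down) in any probe pair."""
    cutoff = math.log2(fold)
    return {gene: values for gene, values in log2_ratios.items()
            if max(abs(v) for v in values) >= cutoff}


def probe_pair_profiles(log2_ratios, labels):
    """Transpose gene rows into one expression vector per probe pair."""
    genes = sorted(log2_ratios)
    return {label: [log2_ratios[g][i] for g in genes]
            for i, label in enumerate(labels)}


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[label] for label in profiles]

    def distance(c1, c2):
        # Mean Euclidean distance over all cross-cluster pairs of members.
        total = sum(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


kept = filter_genes(LOG2_RATIOS, fold=3.0)
clusters = average_linkage(probe_pair_profiles(kept, PROBE_PAIRS), n_clusters=3)
print(clusters)
```

On this toy data the threefold filter drops the one gene that barely changes, and cutting the tree at three clusters separates the SCLC pairs, the NSCLC pairs, and the normal pair. Real array data would usually also need normalization and a check of how stable the clusters are, neither of which is shown here.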