Numerical Classification Method for deriving Natural Classes

WISHART, DAVID

doi:10.1038/221097a0

Letter
Published: 04 January 1969

Nature volume 221, pages 97–98 (1969)

Abstract

The current approach in numerical taxonomy is directed towards the so-called “minimum-variance” solution, for which it is argued that a population should be partitioned into cluster subsets by minimizing the total within-group variation. Several classification methods have been compared¹ and shown to possess related variance constraints, and a case has been made¹⁻³ that such methods are not ideally suited to the taxonomic problem of resolving “natural” classes. Implicit in the minimum-variance approach is the concept that a cluster should have no significant overall variance or spread, and this implies that in the case of a unimodal swarm the distribution should be split into an arbitrary number of compact sections.

By contrast, Forgey has argued²,³ that for a “natural” classification, clusters should correspond to data modes, and there can only be as many classes as there are distinct modes. No variance constraint is implied, or should be induced, for when a mode is elongated rather than spherical the distribution merely reflects some internal factor of variation for the corresponding class. Such factors will be present to some extent, depending on data transformations and the quality of the selected character set, and therefore a subsequent variable search is necessary to discover the hidden constant characteristics of the class. Furthermore, those characters which are non-constant for a cluster mode may be inter-correlated, suggesting that the original character choice was poor, and in such cases the consideration of correlations, ratio variables and regression coefficients is indicated. Forgey interprets²,³ a data mode as a continuous dense swarm of points, separated from other such modes by either empty space or a scattering of “noise” data.
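
The remark about a unimodal swarm can be checked numerically. The following is a minimal sketch, not taken from the letter: it draws a single Gaussian swarm (the sample size, seed and function names are illustrative assumptions) and shows that a two-class partition chosen to minimize the within-group sum of squares always scores better than leaving the swarm whole, even though only one mode is present.

```python
import random


def sum_of_squares(values):
    """Sum of squared deviations from the mean of one group."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_two_way_split(values):
    """Try every cut of the sorted values and keep the one that gives
    the smallest total within-group sum of squares for two groups."""
    ordered = sorted(values)
    best_cut, best_wss = None, float("inf")
    for cut in range(1, len(ordered)):
        wss = sum_of_squares(ordered[:cut]) + sum_of_squares(ordered[cut:])
        if wss < best_wss:
            best_cut, best_wss = cut, wss
    return ordered[:best_cut], ordered[best_cut:], best_wss


# Hypothetical data: one unimodal swarm, so there is a single "natural" class.
rng = random.Random(0)
swarm = [rng.gauss(0.0, 1.0) for _ in range(200)]

one_group = sum_of_squares(swarm)
left, right, two_groups = best_two_way_split(swarm)

print(f"within-group SS, 1 class:   {one_group:.1f}")
print(f"within-group SS, 2 classes: {two_groups:.1f}")
```

Because the total sum of squares equals the within-group part plus a non-negative between-group part, the criterion rewards the split regardless of whether a second mode exists, which is the behaviour the text objects to.
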
It has been suggested that “noise” data usually result from sampling errors, and while this is true, they can also be interpreted as those natural phenomena associated with the intersecting tails of disjoint continuous distributions. We can therefore expect a “natural” cluster to exhibit a dense centre (of any shape) which is surrounded by a haze or cloud of points, and the problem is to isolate the dense centres irrespective of this interference.
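
One way to picture the task of isolating dense centres from a surrounding haze is a simple density threshold. The sketch below is an illustration only and is not presented as the procedure of the letter, whose details are not given in this preview: the radius `r`, the neighbour count `k`, the function name `dense_centres` and the sample points are all assumptions. A point counts as dense when at least `k` other points lie within distance `r`; dense points within `r` of one another are joined into one cluster, and everything else is left unassigned as noise.

```python
import math


def dense_centres(points, r, k):
    """Group points that have at least k neighbours within distance r.

    Returns a list of clusters, each a sorted list of point indices.
    Points that are not dense are left out and treated as noise.
    """
    n = len(points)
    neighbours = [
        [j for j in range(n) if j != i and math.dist(points[i], points[j]) <= r]
        for i in range(n)
    ]
    dense = {i for i in range(n) if len(neighbours[i]) >= k}

    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        # Flood-fill through dense points lying within r of each other.
        stack, members = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            members.append(i)
            for j in neighbours[i]:
                if j in dense and j not in seen:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


# Hypothetical data: two tight 3 x 3 grids plus three stray "noise" points.
grid_a = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
grid_b = [(5.0 + 0.1 * i, 5.0 + 0.1 * j) for i in range(3) for j in range(3)]
noise = [(2.5, 2.5), (10.0, 0.0), (-4.0, 6.0)]
points = grid_a + grid_b + noise

clusters = dense_centres(points, r=0.15, k=2)
print(len(clusters), [len(c) for c in clusters])
```

With these values the two grids come out as separate clusters and the three stray points are left unassigned. No variance constraint enters the rule, so an elongated dense region would be kept whole as long as its points stay within `r` of one another.
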

References

1. Wishart, D., Proc. St Andrews Coll. in Numerical Taxonomy (1968).
2. Forgey, E. W., Amer. Psychol. Assoc. Meetings, Los Angeles (1964).
3. Forgey, E. W., AAAS—Biometric Soc. Meetings, Calif. (1965).
4. Williams, W. T., Lambert, J. M., and Lance, G. N., J. Ecol., 54, 427 (1966).
5. Lance, G. N., and Williams, W. T., Comp. J., 9, 373 (1967).
6. Sneath, P. H. A., Comp. J., 8, 383 (1966).
7. Wishart, D., A Fortran II Programme for Numerical Classification (St Andrews, 1968).

Author information

Authors and Affiliations

Computing Laboratory, Mathematical Institute, University of St Andrews,
DAVID WISHART

About this article

Cite this article

WISHART, D. Numerical Classification Method for deriving Natural Classes. Nature 221, 97–98 (1969). https://doi.org/10.1038/221097a0

Received: 06 September 1968
Issue Date: 04 January 1969
DOI: https://doi.org/10.1038/221097a0

This article is cited by

Student college choice sets: Toward an empirical characterization
- Michael L. Tierney
Research in Higher Education (1983)
Grid – A space density analysis for recognition of noda in vegetation samples
- Otto Wildi
Vegetatio (1980)
Clusteranalyse — Überblick und neuere Entwicklungen
- Hans-Hermann Bock
Operations-Research-Spektrum (1980)
Qualitative and quantitative study of the growth and cell surface properties of Huntington's disease fibroblasts and age-matched controls
- J. J. Cassiman
- J. Verlinden
- H. Van den Berghe
Human Genetics (1979)