| Version: | 1.0-1 |
| Date: | 2013-10-28 |
| Author: | Frederick Novomestky <fnovomes@poly.edu> |
| Maintainer: | Frederick Novomestky <fnovomes@poly.edu> |
| Depends: | R (≥ 2.0.1) |
| Description: | A collection of data sets for teaching cluster analysis. |
| Title: | Cluster Analysis Data Sets |
| License: | GPL-2 or GPL-3 [expanded from: GPL (≥ 2)] |
| Packaged: | 2013-10-28 22:49:31 UTC; fred |
| NeedsCompilation: | no |
| Repository: | CRAN |
| Date/Publication: | 2013-10-29 07:55:17 |
Hartigan (1975) Acidosis Patients
Description
The table contains measures of various compounds in cerebrospinal fluid and blood for acidosis patients. This is Table 14.11 in Chapter 14 of Hartigan (1975) on page 265.
Usage
data(acidosis.patients)
Format
A data frame with 40 observations on the following 6 variables.
| ph.cerebrospinal.fluid | a numeric vector |
| ph.blood | a numeric vector |
| hco3.cerebrospinal.fluid | a numeric vector |
| hco3.blood | a numeric vector |
| co2.cerebrospinal.fluid | a numeric vector |
| co2.blood | a numeric vector |
Details
Hartigan suggests the use of the direct splitting algorithm with this data set.
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(acidosis.patients)
Hartigan (1975) Airline Distance Between Principal Cities of the World
Description
The table contains the airline distances in hundreds of miles between the principal cities of the world. This is Table 11.1 in Chapter 11 of Hartigan (1975) on page 192.
Usage
data(airline.distances.1966)
Format
A data frame with 30 observations on the following 31 variables.
| code | a character vector for the cities |
| AZ | a numeric vector for Azores |
| BD | a numeric vector for Baghdad |
| BN | a numeric vector for Berlin |
| BY | a numeric vector for Bombay |
| BS | a numeric vector for Buenos Aires |
| CO | a numeric vector for Cairo |
| CN | a numeric vector for Cape Town |
| CH | a numeric vector for Chicago |
| GM | a numeric vector for Guam |
| HU | a numeric vector for Honolulu |
| IL | a numeric vector for Istanbul |
| JU | a numeric vector for Juneau |
| LN | a numeric vector for London |
| MA | a numeric vector for Manila |
| ME | a numeric vector for Melbourne |
| MY | a numeric vector for Mexico City |
| ML | a numeric vector for Montreal |
| MW | a numeric vector for Moscow |
| NS | a numeric vector for New Orleans |
| NY | a numeric vector for New York |
| PY | a numeric vector for Panama City |
| PS | a numeric vector for Paris |
| RO | a numeric vector for Rio de Janeiro |
| RE | a numeric vector for Rome |
| SF | a numeric vector for San Francisco |
| SO | a numeric vector for Santiago |
| SE | a numeric vector for Seattle |
| SI | a numeric vector for Shanghai |
| SY | a numeric vector for Sydney |
| TO | a numeric vector for Tokyo |
Details
Hartigan uses this data set with the single linkage algorithm.
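As a minimal sketch of single linkage on a distance table of this shape, the following uses base R's hclust. The four-city matrix is made up for illustration and is not taken from the data set; with the package installed, the numeric columns of airline.distances.1966 would presumably take its place.

```r
# Hypothetical symmetric distance matrix (hundreds of miles); values are illustrative only.
cities <- c("A", "B", "C", "D")
d <- matrix(c( 0,  3, 40, 42,
               3,  0, 38, 41,
              40, 38,  0,  5,
              42, 41,  5,  0),
            nrow = 4, byrow = TRUE, dimnames = list(cities, cities))

# Single linkage merges the two clusters whose closest members are nearest.
fit <- hclust(as.dist(d), method = "single")

# Cut the tree into two clusters and show the membership of each city.
groups <- cutree(fit, k = 2)
print(groups)
```

With these toy values, A and B end up in one group and C and D in the other, because the within-pair distances are far smaller than any distance between the pairs.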
Source
The World Almanac (1966).
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(airline.distances.1966)
Hartigan (1975) Mammal's Milk
Description
The table contains a list of animals and the constituents of their milk. A shorter version appears in jh.table.1.2. This is Table 16.3 in Chapter 16 of Hartigan (1975) on page 304.
Usage
data(all.mammals.milk.1956)
Format
A data frame with 25 observations on the following 6 variables.
| name | a character vector for the animal name |
| water | a numeric vector for the percentage of water |
| protein | a numeric vector for the percentage of protein |
| fat | a numeric vector for the percentage of fat |
| lactose | a numeric vector for the percentage of lactose |
| ash | a numeric vector for the percentage of ash |
Details
Hartigan suggests the use of a joiner-scaler algorithm on this data set.
Source
Spector, W. S. (1956) Handbook of Biological Data, Saunders.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(all.mammals.milk.1956)
Hartigan (1975) City Crime
Description
The table records city crime along with population statistics. This is Table 18.6 in Chapter 18 of Hartigan (1975) on page 342.
Usage
data(all.us.city.crime.1970)
Format
A data frame with 24 observations on the following 10 variables.
| city | a character vector for the city name |
| population | a numeric vector for the population in thousands |
| white.change | a numeric vector for the percent change in inner city white population from 1960 to 1970 |
| black.population | a numeric vector for the black population in thousands |
| murder | a numeric vector for the murder rate |
| rape | a numeric vector for the rape rate |
| robbery | a numeric vector for the robbery rate |
| assault | a numeric vector for the assault rate |
| burglary | a numeric vector for the burglary rate |
| car.theft | a numeric vector for the car theft rate |
Details
All rate variables are per 100,000 population. Hartigan suggests using the AID algorithm on this data set.
Source
The Statistical Abstract of the United States (1971), Bureau of Census, Department of Commerce, Grosset and Dunlap, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(all.us.city.crime.1970)
Hartigan (1975) Amino Acid Sequence for Vertebrates
Description
The table defines the position of amino acids for Cytochrome-c. This is Table 13.4 in Chapter 13 of Hartigan (1975) on page 240.
Usage
data(amino.acid.sequence.1972)
Format
A data frame with 17 observations on the following 37 variables.
| species | a character vector for the species names |
| p.1 | a factor for position 1 with levels I V |
| p.2 | a factor for position 2 with levels A E |
| p.3 | a factor for position 3 with levels I T V |
| p.4 | a factor for position 4 with levels I T V |
| p.5 | a factor for position 5 with levels M Q |
| p.6 | a factor for position 6 with levels A S |
| p.7 | a factor for position 7 with levels C V |
| p.8 | a factor for position 8 with levels K N |
| p.9 | a factor for position 9 with levels T V |
| p.10 | a factor for position 10 with levels H N S W Y |
| p.11 | a factor for position 11 with levels F I |
| p.12 | a factor for position 12 with levels A E P Q V |
| p.13 | a factor for position 13 with levels F Y |
| p.14 | a factor for position 14 with levels S T |
| p.15 | a factor for position 15 with levels A D E |
| p.16 | a factor for position 16 with levels N S |
| p.17 | a factor for position 17 with levels I T V |
| p.18 | a factor for position 18 with levels G K N Q |
| p.19 | a factor for position 19 with levels E N Q |
| p.20 | a factor for position 20 with levels D E |
| p.21 | a factor for position 21 with levels M R |
| p.22 | a factor for position 22 with levels E I |
| p.23 | a factor for position 23 with levels I V |
| p.24 | a factor for position 24 with levels T V |
| p.25 | a factor for position 25 with levels I L |
| p.26 | a factor for position 26 with levels K S |
| p.27 | a factor for position 27 with levels K |
| p.28 | a factor for position 28 with levels A D E G K S T |
| p.29 | a factor for position 29 with levels A E Q T V |
| p.30 | a factor for position 30 with levels D N |
| p.31 | a factor for position 31 with levels I V |
| p.32 | a factor for position 32 with levels D E K Q S |
| p.33 | a factor for position 33 with levels A K T |
| p.34 | a factor for position 34 with levels A C T |
| p.35 | a factor for position 35 with levels A K N S |
| p.36 | a factor for position 36 with levels - A E K S |
Details
The factor levels differ across the 36 positions. Hartigan uses the reduced mutation algorithm with this data set.
Source
Dickerson, R. E. (1972). The structure and history of an ancient protein, Scientific American, 226(4), 58-72.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(amino.acid.sequence.1972)
Hartigan (1975) Cluster of Animals Forming a Tree
Description
The table is a binary table that identifies which animals are in a given cluster. This is Table 8.1 in Chapter 8 of Hartigan (1975) on page 155.
Usage
data(animal.cluster.trees)
Format
A data frame with 13 observations on the following 11 variables.
| symbol | a character vector for the animal symbol |
| name | a character vector for the animal name |
| c.1 | a numeric vector for a binary variable. A value of 1 means the animal is in cluster 1, while 0 means that it is not |
| c.2 | a numeric vector for a binary variable. A value of 1 means the animal is in cluster 2, while 0 means that it is not |
| c.3 | a numeric vector for a binary variable. A value of 1 means the animal is in cluster 3, while 0 means that it is not |
| c.4 | a numeric vector for a binary variable. A value of 1 means the animal is in cluster 4, while 0 means that it is not |
| c.5 | a numeric vector for a binary variable. A value of 1 means the animal is in cluster 5, while 0 means that it is not |
| c.6 | a numeric vector for a binary variable. A value of 1 means the animal is in cluster 6, while 0 means that it is not |
| c.7 | a numeric vector for a binary variable. A value of 1 means the animal is in cluster 7, while 0 means that it is not |
| c.8 | a numeric vector for a binary variable. A value of 1 means the animal is in cluster 8, while 0 means that it is not |
| c.9 | a numeric vector for a binary variable. A value of 1 means the animal is in cluster 9, while 0 means that it is not |
Details
This table is used to construct and present a cluster tree as defined in Hartigan (1975).
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(animal.cluster.trees)
Hartigan (1975) Birth and Death Rates Per 1000
Description
A table with birth and death rates per 1000 persons for selected countries. This is Table 11.6 in Chapter 11 of Hartigan (1975) on page 197.
Usage
data(birth.death.rates.1966)
Format
A data frame with 70 observations on the following 3 variables.
| country | a character vector for the country name |
| birth | a numeric vector for the birth rates per 1000 persons |
| death | a numeric vector for the death rates per 1000 persons |
Details
Hartigan recommends that the spiral search algorithm be applied to this data set.
Source
Reader's Digest Almanac (1966)
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(birth.death.rates.1966)
Hartigan (1975) Times of Appearance of British Butterflies
Description
The table defines the metamorphosis sequences of British butterflies. This is Table 7.6 in Chapter 7 of Hartigan (1975) on page 150.
Usage
data(british.butterfly.appearance)
Format
A data frame with 27 observations on the following 13 variables.
| name | a character vector for the species |
| jan | a factor for January occurrences with levels ILOP |
| feb | a factor for February occurrences with levels ILOP |
| mar | a factor for March occurrences with levels ILOP |
| apr | a factor for April occurrences with levels ILLPOOLPPI |
| may | a factor for May occurrences with levels ILLILPLPIPPI |
| jun | a factor for June occurrences with levels IILIOLLLILPLPIPPI |
| jul | a factor for July occurrences with levels ILLILPLPIOPPI |
| aug | a factor for August occurrences with levels ILLILPIOPPI |
| sep | a factor for September occurrences with levels ILLILPLPIOPPI |
| oct | a factor for October occurrences with levels ILLPLPIOP |
| nov | a factor for November occurrences with levels ILOP |
| dec | a factor for December occurrences with levels ILOP |
Details
Hartigan suggests using this data set to test the ditto algorithm.
Source
Ford, T. L. E. (1963). Practical Entomology, Warne, London, p. 181.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(british.butterfly.appearance)
Hartigan (1975) Ingredients in Cakes
Description
The table identifies for each cake which ingredient is used and the quantity. This is Table 12.8 in Chapter 12 of Hartigan (1975) on page 229.
Usage
data(cake.ingredients.1961)
Format
A data frame with 18 observations on the following 35 variables.
| Cake | a character vector for the name of the cake |
| AE | a numeric vector for the amount of Almond essence in teaspoons |
| BM | a numeric vector for the amount of Buttermilk in cups |
| BP | a numeric vector for the amount of Baking powder in teaspoons |
| BR | a numeric vector for the amount of Butter in cups |
| BS | a numeric vector for the amount of Bananas in whole bananas |
| CA | a numeric vector for the amount of Cocoa in tablespoons |
| CC | a numeric vector for the amount of Cottage Cheese in pounds |
| CE | a numeric vector for the amount of Chocolate in ounces |
| CI | a numeric vector for the amount of Crushed Ice in cups |
| CS | a numeric vector for the amount of Crumbs in cups |
| CT | a numeric vector for the amount of Cream of tartar in teaspoons |
| DC | a numeric vector for the amount of Dried currants in tablespoons |
| EG | a numeric vector for the amount of Eggs in whole eggs |
| EY | a numeric vector for the amount of Egg yolk in whole eggs |
| EW | a numeric vector for the amount of Egg white in whole eggs |
| FR | a numeric vector for the amount of Sifted flour in cups |
| GN | a numeric vector for the amount of Gelatin in tablespoons |
| HC | a numeric vector for the amount of Heavy cream in cups |
| LJ | a numeric vector for the amount of Lemon juice in tablespoons |
| LR | a numeric vector for the amount of Lemon rind in teaspoons |
| MK | a numeric vector for the amount of Milk in cups |
| NG | a numeric vector for the amount of Nutmeg in teaspoons |
| NS | a numeric vector for the amount of Nuts in cups |
| RM | a numeric vector for the amount of Rum in ounces |
| SA | a numeric vector for the amount of Soda in teaspoons |
| SC | a numeric vector for the amount of Sour cream in cups |
| SG | a numeric vector for the amount of Shortening in tablespoons |
| SR | a numeric vector for the amount of Granulated sugar in cups |
| SS | a numeric vector for the amount of Strawberries in quarts |
| ST | a numeric vector for the amount of Salt in teaspoons |
| VE | a numeric vector for the amount of Vanilla extract in teaspoons |
| WR | a numeric vector for the amount of Water in cups |
| YT | a numeric vector for the amount of Yeast in ounces |
| ZH | a numeric vector for the amount of Zwieback in ounces |
Details
For each cake and ingredient, the data frame contains NA if the ingredient is not required or a numeric value.
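Because NA here encodes "ingredient not required" rather than a missing measurement, it may help to recode it before clustering. The sketch below uses a made-up three-cake excerpt (the cake names and quantities are hypothetical, not taken from the data set) and shows two possible recodings: a presence/absence table and a quantity table with zeros.

```r
# Hypothetical two-ingredient excerpt; names and quantities are illustrative only.
cakes <- data.frame(
  Cake = c("cake 1", "cake 2", "cake 3"),
  BP = c(2, NA, 1),
  EG = c(NA, 3, 2),
  stringsAsFactors = FALSE
)

# Drop the name column so only ingredient columns remain.
ingredients <- cakes[, -1]

# Presence/absence table: TRUE where the recipe calls for the ingredient.
uses <- !is.na(ingredients)

# Quantity table with "not required" recoded as zero.
amounts <- ingredients
amounts[is.na(amounts)] <- 0

print(uses)
print(amounts)
```

Which recoding is appropriate depends on the clustering method; the zero recoding treats quantities in different units as comparable, which is an assumption worth checking before computing distances.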
Source
Claiborne, C. (1961) The New York Times Cookbook, Harper and Row, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(cake.ingredients.1961)
Hartigan (1975) Oxidation-Fermentation Patterns
Description
The table contains the oxidation-fermentation patterns for a sample of species of Candida in terms of acid production. This is Table 15.1 in Chapter 15 of Hartigan (1975) on page 279.
Usage
data(candida.oxidation.fermentation)
Format
A data frame with 8 observations on the following 13 variables.
| name | a character vector for the species name |
| glucose | a factor for glucose with levels + |
| maltose | a factor for maltose with levels - + |
| sucrose | a factor for sucrose with levels - + |
| lactose | a factor for lactose with levels - + |
| galactose | a factor for galactose with levels - + |
| melibiose | a factor for melibiose with levels - + |
| cellobiose | a factor for cellobiose with levels - + |
| inositol | a factor for inositol with levels - |
| xylose | a factor for xylose with levels - + |
| raffinose | a factor for raffinose with levels - + |
| trehalose | a factor for trehalose with levels - + |
| dulcitol | a factor for dulcitol with levels - + |
Details
A '+' level means oxidative production of acid, whereas a '-' level means no acid production. Hartigan suggests using direct joining on this data set.
Source
Hall, T. C., Webb, C. D. and Papageorge, C. (1972) Use of oxidation-fermentation medium in the identification of yeasts, HSMHA Health Reports, 87, 172-176.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(candida.oxidation.fermentation)
Hartigan (1975) Presence of Cerci in Insects
Description
The table defines the hierarchy of insects classified according to cerci or tail appendages. This is Table 13.1 in Chapter 13 of Hartigan (1975) on page 234.
Usage
data(cerci.tail.presence)
Format
A data frame with 38 observations on the following 4 variables.
| index | a numeric vector for the insect index |
| code | a character vector for the insect code |
| name | a character vector for the name of the index or family |
| parent | a numeric vector for the index of the parent insect |
Details
Hartigan applies the minimum mutation method to this data set.
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(cerci.tail.presence)
Hartigan (1975) Connecticut Votes for President
Description
The table contains presidential votes recorded over 12 elections and for 8 counties in Connecticut. This is Table 14.13 in Chapter 14 of Hartigan (1975) on page 267.
Usage
data(ct.president.vote.1920.1964)
Format
A data frame with 36 observations on the following 10 variables.
| year | a numeric vector for the election year |
| party | a character vector for the political party |
| fairfield | a numeric vector for Fairfield county |
| hartford | a numeric vector for Hartford county |
| litchfield | a numeric vector for Litchfield county |
| middlesex | a numeric vector for Middlesex county |
| new.haven | a numeric vector for New Haven county |
| new.london | a numeric vector for New London county |
| tolland | a numeric vector for Tolland county |
| windham | a numeric vector for Windham county |
Details
Hartigan recommends the use of the two-way direct splitting algorithm on this data set.
Source
Scammon, R. M. (1965) America at the Polls, University of Pittsburgh, Pittsburgh.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(ct.president.vote.1920.1964)
Hartigan (1975) European Food
Description
The table contains, by country, the percentage of all households with various foods in the house at the time of the questionnaire. This is Table 15.9 in Chapter 15 of Hartigan (1975) on page 289.
Usage
data(european.foods)
Format
A data frame with 20 observations on the following 18 variables.
| code | a character vector for the food code |
| name | a character vector for the food name |
| wg | a numeric vector for West Germany |
| it | a numeric vector for Italy |
| fr | a numeric vector for France |
| ns | a numeric vector for Netherlands |
| bm | a numeric vector for Belgium |
| lg | a numeric vector for Luxembourg |
| gb | a numeric vector for Great Britain |
| pl | a numeric vector for Portugal |
| aa | a numeric vector for Austria |
| sd | a numeric vector for Switzerland |
| sw | a numeric vector for Sweden |
| dk | a numeric vector for Denmark |
| ny | a numeric vector for Norway |
| fd | a numeric vector for Finland |
| sp | a numeric vector for Spain |
| id | a numeric vector for Ireland |
Details
Hartigan suggests applying two-way direct joining to this data set.
Source
A Survey of Europe Today, The Reader's Digest Association Ltd, London.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(european.foods)
Hartigan (1975) Triads Based on Hardware
Description
The table defines pairs of hardware objects that are most similar along with a dissimilar object. This is Table 10.1 in Chapter 10 of Hartigan (1975) on page 178.
Usage
data(hardware.triads)
Format
A data frame with 20 observations on the following 4 variables.
| case | a character vector |
| similar.1 | a factor for the first object of the similar pair with levels B N P T |
| similar.2 | a factor for the second object of the similar pair with levels B F S T |
| odd | a factor for the different object with levels B F N P S T |
Details
Six pieces of hardware were considered. Every possible set of three distinct pieces of hardware was examined, and a judgment was made about which two pieces were most similar. The results were reported by listing the closest pair with parentheses surrounding them, followed by the "odd" item. The hardware objects are identified as follows
"N" is a nail
"P" is a Phillips head screw
"B" is a bolt
"T" is a tack
"F" is a finishing nail
"S" is a screw
These data are used to test the triads algorithm.
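Six objects taken three at a time give choose(6, 3) = 20 triads, which matches the 20 observations. As a rough illustration of how such judgments can be summarised, the sketch below tallies how often each pair was judged most similar. This is a simple vote count, not Hartigan's triads algorithm, and the three judgments shown are made up rather than taken from the data set.

```r
# Hypothetical judgments in the same layout as the data set; not the actual data.
triads <- data.frame(
  similar.1 = c("N", "N", "B"),
  similar.2 = c("T", "F", "S"),
  odd       = c("B", "S", "T"),
  stringsAsFactors = FALSE
)

objects <- c("B", "F", "N", "P", "S", "T")
votes <- matrix(0, 6, 6, dimnames = list(objects, objects))

# Each triad adds one vote to its most-similar pair, in both orientations.
for (i in seq_len(nrow(triads))) {
  a <- triads$similar.1[i]
  b <- triads$similar.2[i]
  votes[a, b] <- votes[a, b] + 1
  votes[b, a] <- votes[b, a] + 1
}

print(votes)
```

When applying this to the packaged data frame, the factor columns should first be converted with as.character so that the matrix is indexed by object letter rather than by integer level code.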
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(hardware.triads)
Hartigan (1975) Data Sets
Description
This data frame contains the directory of data sets from Hartigan (1975) that are available in this package.
Usage
data(hartigan.datasets)
Format
A data frame with 53 observations on the following 4 variables.
| table.name | a character vector with the table name |
| chapter | a numeric vector with the chapter containing the table |
| page | a numeric vector with the page on which the table appears |
| data.set.name | a character vector with the data set name in this package |
Details
Chapter number 0 is associated with the Introduction of the book.
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(hartigan.datasets)
Hartigan (1975) Indian Caste Measurements
Description
The table contains the correlations multiplied by 10000 for 22 caste groups each with 67 to 196 individuals. This is Table 17.6 in Chapter 17 of Hartigan (1975) on page 324.
Usage
data(indian.caste.measures)
Format
A data frame with 9 observations on the following 9 variables.
| st | a numeric vector for the correlations with stature |
| sh | a numeric vector for the correlations with sitting height |
| nd | a numeric vector for the correlations with nasal depth |
| nh | a numeric vector for the correlations with nasal height |
| hl | a numeric vector for the correlations with head length |
| fb | a numeric vector for the correlations with frontal breadth |
| bb | a numeric vector for the correlations with bizygomatic breadth |
| hb | a numeric vector for the correlations with head breadth |
| nb | a numeric vector for the correlations with nasal breadth |
Details
The data frame has as row names the variable names. The actual correlations are recovered by dividing the data frame by 10000. Hartigan suggests performing a factor analysis on the data set as well as performing a joining algorithm.
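The rescaling described above can be sketched as follows. The 3 x 3 excerpt is made up for illustration (only the layout, with variable names as both row and column names, follows the description), and using 1 minus the correlation as a dissimilarity is one common choice, assumed here rather than taken from Hartigan.

```r
# Hypothetical 3 x 3 excerpt of scaled correlations; values are illustrative only.
scaled <- data.frame(
  st = c(10000, 5000, 2500),
  sh = c(5000, 10000, 3000),
  nd = c(2500, 3000, 10000),
  row.names = c("st", "sh", "nd")
)

# Recover the correlations by undoing the factor of 10000.
r <- as.matrix(scaled) / 10000

# Assumed conversion: treat 1 - r as a dissimilarity between variables.
dissimilarity <- as.dist(1 - r)

print(r)
print(dissimilarity)
```

The resulting dist object can be passed to a joining routine such as hclust, and the correlation matrix r to a factor analysis routine that accepts a covariance or correlation matrix.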
Source
Rao, C. R. (1948). The utilization of multiple measurements in problems of biological classification, J. Royal Stat. Soc. B, 10, 159-193.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(indian.caste.measures)
Hartigan (1975) Indo-European Languages
Description
The table contains the foreign language equivalents of the English words used as column names. This is Table 13.8 in Chapter 13 of Hartigan (1975) on page 243.
Usage
data(indo.european.languages)
Format
A data frame with 13 observations on the following 17 variables.
| language | a character vector for the foreign language |
| all | a character vector for the foreign language equivalent |
| bad | a character vector for the foreign language equivalent |
| belly | a character vector for the foreign language equivalent |
| black | a character vector for the foreign language equivalent |
| bone | a character vector for the foreign language equivalent |
| day | a character vector for the foreign language equivalent |
| die | a character vector for the foreign language equivalent |
| drink | a character vector for the foreign language equivalent |
| ear | a character vector for the foreign language equivalent |
| eat | a character vector for the foreign language equivalent |
| egg | a character vector for the foreign language equivalent |
| eye | a character vector for the foreign language equivalent |
| father | a character vector for the foreign language equivalent |
| fish | a character vector for the foreign language equivalent |
| five | a character vector for the foreign language equivalent |
| foot | a character vector for the foreign language equivalent |
Details
Hartigan suggests that the minimum mutation algorithm be applied to this data set.
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(indo.european.languages)
Hartigan (1975) Combat Deaths in Indochina
Description
The table contains the number of monthly combat deaths for US troops, South Vietnamese troops, third party troops and enemy troops. This is Table 6.4 in Chapter 6 of Hartigan (1975) on page 139.
Usage
data(indochina.combat.deaths)
Format
A data frame with 72 observations on the following 5 variables.
| month.year | a character vector for the month and year |
| us | a numeric vector for the number of US combat deaths |
| svn | a numeric vector for the number of South Vietnamese combat deaths |
| third | a numeric vector for the number of third party combat deaths |
| enemy | a numeric vector for the number of enemy combat deaths |
Details
None.
Source
Unclassified Statistics on Southeast Asia (1972), Department of Defense, OASD (Comptroller), Directorate for Information Operations.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(indochina.combat.deaths)
Hartigan (1975) Ivy League Football 1965
Description
The table contains the scores for the first half of the 1965 season of the Ivy League football games. This is Table 12.1 in Chapter 12 of Hartigan (1975) on page 217.
Usage
data(ivy.league.football.1965)
Format
A data frame with 40 observations on the following 4 variables.
| home.team | a character vector for the home team code |
| opponent.team | a character vector for the opponent team code |
| home.score | a numeric vector for the home team score |
| opponent.score | a numeric vector for the opponent team score |
Details
The following teams are represented in the table
| Brown | BN |
| Bucknell | BL |
| Colgate | CE |
| Connecticut | CT |
| Columbia | CA |
| Dartmouth | DN |
| Harvard | HD |
| New Hampshire | NH |
| Holy Cross | HO |
| Lafayette | LE |
| Pennsylvania | PA |
| Princeton | PN |
| Rhode Island | RI |
| Rutgers | RS |
| Tufts | TS |
| Yale | YE |
Hartigan applies a joining algorithm to this data set.
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(ivy.league.football.1965)
Hartigan (1975) Jigsaw Puzzle Measurements
Description
A table of measurements for each piece in a jigsaw puzzle. This is Table 3.1 in Chapter 3 of Hartigan (1975) on page 76.
Usage
data(jigsaw.puzzle.measures)
Format
A data frame with 20 observations on the following 13 variables.
| piece | a numeric vector for the number of the piece |
| L1 | a numeric vector for the length of the line between the corners of edge 1 |
| I1 | a numeric vector for the maximum deviation of the line into the piece on edge 1 |
| O1 | a numeric vector for the maximum deviation of the line out of the piece on edge 1 |
| L2 | a numeric vector for the length of the line between the corners of edge 2 |
| I2 | a numeric vector for the maximum deviation of the line into the piece on edge 2 |
| O2 | a numeric vector for the maximum deviation of the line out of the piece on edge 2 |
| L3 | a numeric vector for the length of the line between the corners of edge 3 |
| I3 | a numeric vector for the maximum deviation of the line into the piece on edge 3 |
| O3 | a numeric vector for the maximum deviation of the line out of the piece on edge 3 |
| L4 | a numeric vector for the length of the line between the corners of edge 4 |
| I4 | a numeric vector for the maximum deviation of the line into the piece on edge 4 |
| O4 | a numeric vector for the maximum deviation of the line out of the piece on edge 4 |
Details
A jigsaw puzzle comprises 20 pieces, arranged in a regular array and numbered as follows:
| 1 | 2 | 3 | 4 |
| 5 | 6 | 7 | 8 |
| 9 | 10 | 11 | 12 |
| 13 | 14 | 15 | 16 |
| 17 | 18 | 19 | 20 |
Each piece is roughly rectangular. The corners of the piece are called its vertices, and the sides are called its edges. The four edges of each piece are numbered consecutively, starting from the top and moving clockwise.
For each piece, three measurements were made on each of the four edges, estimating the length of the side, and the amount by which the edge cuts into or juts out of the line joining the two vertices on that side. The measurements are in hundredths of an inch.
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(jigsaw.puzzle.measures)
Hartigan (1975) Languages Spoken in Europe
Description
The table presents the percentage of the population who claimed to speak a language well enough to be understood. This is Table 15.10 in Chapter 15 of Hartigan (1975) on page 290.
Usage
data(languages.spoken.europe)
Format
A data frame with 16 observations on the following 13 variables.
| country | a character vector for the country |
| finnish | a numeric vector for speakers of Finnish |
| swedish | a numeric vector for speakers of Swedish |
| danish | a numeric vector for speakers of Danish |
| norwegian | a numeric vector for speakers of Norwegian |
| english | a numeric vector for speakers of English |
| german | a numeric vector for speakers of German |
| dutch | a numeric vector for speakers of Dutch |
| flemish | a numeric vector for speakers of Flemish |
| french | a numeric vector for speakers of French |
| italian | a numeric vector for speakers of Italian |
| spanish | a numeric vector for speakers of Spanish |
| portuguese | a numeric vector for speakers of Portuguese |
Details
Hartigan suggests the use of direct joining for this data set.
Source
A Survey of Europe Today, The Reader's Digest Association Ltd, London.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(languages.spoken.europe)
Hartigan (1975) Mortality Rates from Leukemia Among Children
Description
The table contains the mortality rates from leukemia per million children aged 0 to 14 for the years 1956 to 1967. This is Table 18.1 in Chapter 18 of Hartigan (1975) on page 334.
Usage
data(leukemia.youth.mortality.1956.1967)
Format
A data frame with 18 observations on the following 13 variables.
| country | a character vector for the country name |
| y.1956 | a numeric vector for the mortality rates in 1956 |
| y.1957 | a numeric vector for the mortality rates in 1957 |
| y.1958 | a numeric vector for the mortality rates in 1958 |
| y.1959 | a numeric vector for the mortality rates in 1959 |
| y.1960 | a numeric vector for the mortality rates in 1960 |
| y.1961 | a numeric vector for the mortality rates in 1961 |
| y.1962 | a numeric vector for the mortality rates in 1962 |
| y.1963 | a numeric vector for the mortality rates in 1963 |
| y.1964 | a numeric vector for the mortality rates in 1964 |
| y.1965 | a numeric vector for the mortality rates in 1965 |
| y.1966 | a numeric vector for the mortality rates in 1966 |
| y.1967 | a numeric vector for the mortality rates in 1967 |
Details
Hartigan suggests using the adding algorithm on this data set to make a prediction.
Source
Spier (1972). Relationship between age of death to calendar year of estimated maximum leukemia mortality rate, HSMHA Health Reports, 87, 61-70.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(leukemia.youth.mortality.1956.1967)
Hartigan (1975) Expectations of Life by Country, Age and Sex
Description
A table with remaining life expectancies for males and females of sampled ages. This is Table 4.10 in Chapter 4 of Hartigan (1975) on page 101.
Usage
data(life.expectancy.1971)
Format
A data frame with 31 observations on the following 10 variables.
countrya character vector for the country
yeara numeric vector for the year in which the data were computed
m0a numeric vector for the remaining life expectancies for a male of age 0
m25a numeric vector for the remaining life expectancies for a male of age 25
m50a numeric vector for the remaining life expectancies for a male of age 50
m75a numeric vector for the remaining life expectancies for a male of age 75
f0a numeric vector for the remaining life expectancies for a female of age 0
f25a numeric vector for the remaining life expectancies for a female of age 25
f50a character vector for the remaining life expectancies for a female of age 50
f75a numeric vector for the remaining life expectancies for a female of age 75
Details
None.
Source
Keyfitz, N. and Flieger, W. (1971), Population, Freeman.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(life.expectancy.1971)
Hartigan (1975) Expectation of Life in Various Cities by Age and Sex
Description
The table gives life expectancy by attained age and sex in various cities in the specified years. This is Table 10.3 in Chapter 10 of Hartigan (1975) on page 182.
Usage
data(life.expectancy.age.sex.1971)
Format
A data frame with 16 observations on the following 10 variables.
citya character vector for the city
yeara numeric vector for the year of census
m00a numeric vector for the male expectancy with attained age 0
m25a numeric vector for the male expectancy with attained age 25
m50a numeric vector for the male expectancy with attained age 50
m75a numeric vector for the male expectancy with attained age 75
f00a numeric vector for the female expectancy with attained age 0
f25a numeric vector for the female expectancy with attained age 25
f50a numeric vector for the female expectancy with attained age 50
f75a numeric vector for the female expectancy with attained age 75
Details
This data set can be applied to the triads-leader algorithm.
Source
Keyfitz, N. and Flieger, W. (1971) Population, Freeman, San Francisco.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(life.expectancy.age.sex.1971)
Hartigan (1975) Relatedness Values of Selected Words
Description
Frequencies with which a pair is judged more highly related than other pairs, over many triads and subjects. This is Table 10.4 in Chapter 10 of Hartigan (1975) on page 184.
Usage
data(linguistic.relatedness)
Format
A data frame with 6 observations on the following 7 variables.
worda character vector for the word
thea numeric vector for the frequency with which words are related to 'the'
boya numeric vector for the frequency with which words are related to 'boy'
hasa numeric vector for the frequency with which words are related to 'has'
losta numeric vector for the frequency with which words are related to 'lost'
aa numeric vector for the frequency with which words are related to 'a'
dollara numeric vector for the frequency with which words are related to 'dollar'
Details
This is an unusual data set to be used with the triads-leader algorithm.
Source
Levelt, W. J. M (1967). Psychological representations of syntactic structures, in The Structure and Psychology of Language, T. G. Bever and W. Weksel, eds, Holt, Rinehart and Winston, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(linguistic.relatedness)
Hartigan (1975) Dentition of Animals
Description
The table contains for each animal the number of teeth in each major grouping. This is Table 9.1 in Chapter 9 of Hartigan (1975) on page 170.
Usage
data(mammal.dentition)
Format
A data frame with 66 observations on the following 9 variables.
namea character vector for the name of the animal
top.ia numeric vector for the number of top incisors
bottom.ia numeric vector for the number of bottom incisors
top.ca numeric vector for the number of top canines
bottom.ca numeric vector for the number of bottom canines
top.pma numeric vector for the number of top premolars
bottom.pma numeric vector for the number of bottom premolars
top.ma numeric vector for the number of top molars
bottom.ma numeric vector for the number of bottom molars
Details
Hartigan uses this table to illustrate a tree-leader algorithm.
Source
Palmer, E. I. (1957). Fieldbook of Mammals, Dutton, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(mammal.dentition)
Hartigan (1975) Minor Planets
Description
Some minor planets may have been sighted more than once. In the data frame, sightings thought to be of the same planet are listed together. This is Table 1.1 in the Introduction of Hartigan (1975) on page 2.
Usage
data(minor.planets.1961)
Format
A data frame with 19 observations on the following 4 variables.
namea character vector for the year of sighting and astronomer initials
nodea numeric vector for the angle in degrees in the earth plane at which the minor planet crosses the earth's orbit
inclinationa numeric vector for the angle in degrees between the plane of the earth's orbit and the plane of the planet's orbit
axisa numeric vector for the maximum distance of the minor planet from the sun in astronomical units
Details
None.
Source
Elements of Minor Planets (1961), University of Cincinnati Observatory
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(minor.planets.1961)
Hartigan (1975) Mutation Distances
Description
The table contains mutation distances between pairs of species. This is Table 11.12 in Chapter 11 of Hartigan (1975) on page 209.
Usage
data(mutation.distances.1967)
Format
A data frame with 20 observations on the following 22 variables.
codea character vector for the species identifier
speciesa character vector for the species name
s.1a numeric vector for distance to species 1
s.2a numeric vector for distance to species 2
s.3a numeric vector for distance to species 3
s.4a numeric vector for distance to species 4
s.5a numeric vector for distance to species 5
s.6a numeric vector for distance to species 6
s.7a numeric vector for distance to species 7
s.8a numeric vector for distance to species 8
s.9a numeric vector for distance to species 9
s.10a numeric vector for distance to species 10
s.11a numeric vector for distance to species 11
s.12a numeric vector for distance to species 12
s.13a numeric vector for distance to species 13
s.14a numeric vector for distance to species 14
s.15a numeric vector for distance to species 15
s.16a numeric vector for distance to species 16
s.17a numeric vector for distance to species 17
s.18a numeric vector for distance to species 18
s.19a numeric vector for distance to species 19
s.20a numeric vector for distance to species 20
Details
The distance is defined as the number of positions in the protein molecule cytochrome-c where the two species have different amino acids. Hartigan uses the single-linkage algorithm on this data set.
Source
Fitch and Margoliash (1967) Science
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(mutation.distances.1967)
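The single-linkage step mentioned in Details can be sketched with base R's hclust. This is only an illustration, not Hartigan's own procedure: it assumes the cluster.datasets package is installed and that columns s.1 to s.20 hold a complete (or at least lower-triangular) table of distances. The helper name single_linkage_tree is hypothetical.

```r
# Hypothetical helper: single-linkage tree from a square table of distances.
single_linkage_tree <- function(d, labels = NULL) {
  m <- as.matrix(d)
  if (!is.null(labels)) dimnames(m) <- list(labels, labels)
  # as.dist() reads only the lower triangle of the square matrix
  hclust(as.dist(m), method = "single")
}

# Assumption: the package is installed and the columns are named s.1 ... s.20.
if (requireNamespace("cluster.datasets", quietly = TRUE)) {
  data(mutation.distances.1967, package = "cluster.datasets")
  md <- mutation.distances.1967
  tree <- single_linkage_tree(md[, paste0("s.", 1:20)],
                              labels = as.character(md$species))
  plot(tree, main = "Single linkage on mutation distances")
}
```

If the lower triangle of the table contains missing values, hclust will stop with an error, so the table may need to be symmetrized first.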
Hartigan (1975) Nails and Screws
Description
The table contains the attributes for a sample of nails and screws. This is Table 12.7 in Chapter 12 of Hartigan (1975) on page 228.
Usage
data(nails.screws)
Format
A data frame with 24 observations on the following 7 variables.
namea character vector for the name of the object
threadeda factor for the presence of threads with levels N Y
heada factor for the type of head with levels F O R U Y
indentationa factor for the head indentation with levels L N T
bottoma factor for the type of bottom with levels F S
lengtha numeric vector for the length in half inches
brassa factor that determines if the object is made of brass with levels N Y
Details
All the attributes, with the exception of length, are factors. The factor values for the threaded variable are as follows.
| Y | yes |
| N | no |
The factor values for the head variable are as follows.
| F | flat |
| U | cut |
| O | cone |
| R | round |
| Y | cylinder |
The factor values for the head indentation variable are as follows.
| N | none |
| T | star |
| L | slit |
The factor values for the brass variable are as follows.
| Y | yes |
| N | no |
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(nails.screws)
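The one-letter codes listed in Details can be expanded into readable labels before tabulating. The sketch below copies the code tables from Details; the helper decode is hypothetical, and the column names head and threaded are assumed from the Format section.

```r
# Code tables copied from the Details section.
head_labels        <- c(F = "flat", U = "cut", O = "cone", R = "round", Y = "cylinder")
indentation_labels <- c(N = "none", T = "star", L = "slit")
yes_no             <- c(Y = "yes", N = "no")

# Hypothetical helper: replace one-letter codes with their labels.
# Codes that are not in the table become NA.
decode <- function(x, labels) {
  factor(unname(labels[as.character(x)]), levels = unname(labels))
}

# Assumption: the package is installed and the columns are named as in Format.
if (requireNamespace("cluster.datasets", quietly = TRUE)) {
  data(nails.screws, package = "cluster.datasets")
  ns <- nails.screws
  ns$head <- decode(ns$head, head_labels)
  ns$threaded <- decode(ns$threaded, yes_no)
  print(table(ns$head, ns$threaded))
}
```

Details gives no code table for the bottom variable, so it is left undecoded here.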
Hartigan (1975) Achievement Test Scores, New Haven Schools
Description
The measurements are in years and months of national averages. There are ten months in the school year. At the beginning of fourth grade, the national average score is 4.0. This is Table 5.1 in Chapter 5 of Hartigan (1975) on page 118.
Usage
data(new.haven.school.scores)
Format
A data frame with 25 observations on the following 5 variables.
schoola character vector for the name of the school
reading.4a numeric vector for the reading scores for fourth grade
arithmetic.4a numeric vector for the arithmetic scores for fourth grade
reading.6a numeric vector for the reading scores for sixth grade
arithmetic.6a numeric vector for the arithmetic scores for sixth grade
Details
None.
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(new.haven.school.scores)
Hartigan (1975) Nutrients in Meat, Fish and Fowl
Description
A table with the nutrient levels in meat, fish and fowl. Nutrient levels were measured in a 3 ounce portion of various foods. This is Table 4.1 in Chapter 4 of Hartigan (1975) on page 86.
Usage
data(nutrients.meat.fish.fowl.1959)
Format
A data frame with 27 observations on the following 6 variables.
namea character vector for the food
energya numeric vector for the number of calories
proteina numeric vector for the amount of protein in grams
fata numeric vector for the amount of fat in grams
calciuma numeric vector for the amount of calcium in milligrams
irona numeric vector for the amount of iron in milligrams
Details
None.
Source
The Yearbook of Agriculture (1959), The United States Department of Agriculture, Washington, DC.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(nutrients.meat.fish.fowl.1959)
Hartigan (1975) Ohio Croplands
Description
The table presents the percentage of cropland devoted to various crops in Ohio counties. This is Table 15.7 in Chapter 15 of Hartigan (1975) on page 287.
Usage
data(ohio.croplands.1949)
Format
A data frame with 15 observations on the following 8 variables.
countya character vector for the county
corna numeric vector for the percentage of cropland devoted to corn
mixeda numeric vector for the percentage of cropland devoted to mixed crop
wheata numeric vector for the percentage of cropland devoted to wheat
oatsa numeric vector for the percentage of cropland devoted to oats
barleya numeric vector for the percentage of cropland devoted to barley
soya numeric vector for the percentage of cropland devoted to soy
haya numeric vector for the percentage of cropland devoted to hay
Details
Hartigan suggests the use of direct joining with this data set.
Source
U.S. Census of Agriculture, 1949.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(ohio.croplands.1949)
Hartigan (1975) Olympic Track 1896 to 1964
Description
Olympic track times, in tenths of a second, were recorded over the years. This is Table 6.1 in Chapter 6 of Hartigan (1975) on page 131.
Usage
data(olympic.track.1896.1964)
Format
A data frame with 16 observations on the following 8 variables.
yeara character vector for the year
t.100ma numeric vector for the winning time in the 100 m
t.200ma numeric vector for the winning time in the 200 m
t.400ma numeric vector for the winning time in the 400 m
t.800ma numeric vector for the winning time in the 800 m
t.1500ma numeric vector for the winning time in the 1500 m
t.5000ma numeric vector for the winning time in the 5000 m
t.10000ma numeric vector for the winning time in the 10000 m
Details
None.
Source
The World Almanac (1966), New York World-Telegram, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(olympic.track.1896.1964)
Hartigan (1975) Correlation Between Physical Measurements
Description
The table contains the correlations between various body parts. This is Table 17.1 in Chapter 17 of Hartigan (1975) on page 314.
Usage
data(physical.measure.correlations)
Format
A data frame with 7 observations on the following 7 variables.
hla numeric vector for the correlations with head length
hba numeric vector for the correlations with head breadth
fba numeric vector for the correlations with face breadth
fta numeric vector for the correlations with foot
fma numeric vector for the correlations with forearm
hta numeric vector for the correlations with height
fla numeric vector for the correlations with finger length
Details
Hartigan suggests performing factor analysis on this data set to determine the minimum number of principal components. In addition, a joining algorithm can be performed on the data set. Note that the data frame has the variable names as row names. It can be used directly by the eigen function.
Source
Pearson, K. (1901). On lines and planes of closest fit to points in space. Philosophical Magazine, 559 - 572.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(physical.measure.correlations)
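Details notes that the data frame can be passed to eigen. A minimal sketch of that idea follows: it computes the share of total variance carried by each principal component of a correlation matrix. The helper name variance_shares is hypothetical, and the usage part assumes the cluster.datasets package is installed.

```r
# Hypothetical helper: share of total variance per principal component
# of a symmetric correlation matrix.
variance_shares <- function(r) {
  ev <- eigen(as.matrix(r), symmetric = TRUE, only.values = TRUE)$values
  ev / sum(ev)
}

# Assumption: the package is installed and the data frame is a full 7 x 7 matrix.
if (requireNamespace("cluster.datasets", quietly = TRUE)) {
  data(physical.measure.correlations, package = "cluster.datasets")
  shares <- variance_shares(physical.measure.correlations)
  print(round(cumsum(shares), 3))  # cumulative share, largest component first
}
```

The cumulative shares give one informal way to judge how many components are needed; the cutoff is a matter of judgment.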
Hartigan (1975) Planets and Moons
Description
From astronomical knowledge of 1970, a table of planetary moons was compiled. This is the bottom portion of Table 5.5 in Chapter 5 of Hartigan (1975) on page 122.
Usage
data(planet.earth.distances.1970)
Format
A data frame with 8 observations on the following 5 variables.
namea character vector for the name of the planet
distancea numeric vector for its distance from the sun in thousands of miles
diametera numeric vector for its diameter in miles
perioda numeric vector for the period of its orbit in hours
massa numeric vector for the mass, relative to the earth
Details
None.
Source
Moore, P. (1970). The Atlas of the Universe, Rand McNally, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(planet.earth.distances.1970)
Hartigan (1975) Planets and Moons
Description
From astronomical knowledge of 1970, a table of planetary moons was compiled. This is the top portion of Table 5.5 in Chapter 5 of Hartigan (1975) on page 122.
Usage
data(planets.moons.1970)
Format
A data frame with 31 observations on the following 4 variables.
planet.moona character vector for the planet and the number of the moon
distancea numeric vector for the distance in thousands of miles between the moon and the planet
diametera numeric vector for the diameter in miles of the moon
perioda numeric vector for the period, in days, of the orbit of the moon about the planet
Details
None.
Source
Moore, P. (1970). The Atlas of the Universe, Rand McNally, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(planets.moons.1970)
Hartigan (1975) Portable Typewriters
Description
The table contains the features in a collection of portable typewriters. This is Table 10.5 in Chapter 10 of Hartigan (1975) on page 186.
Usage
data(portable.typewriters)
Format
A data frame with 20 observations on the following 21 variables.
modela character vector for the typewriter model
HTa numeric vector for the height in inches
WHa numeric vector for the width in inches
DHa numeric vector for the depth in inches
WTa numeric vector for the weight in pounds
PLa numeric vector for the platen length
KSa numeric vector for the number of keys
PEa factor for the pica or elite type with levels 1
TAa factor for the availability of tabulator with levels 0 1
TPa factor for the availability of touch pressure control with levels 0 1
PRa factor for the availability of platen release with levels 0 1
HHa factor for the availability of horizontal half spacing with levels 0 1
VHa factor for the availability of vertical half spacing with levels 0 1
PIa factor for the availability of page end indicator with levels 0 1
PGa factor for the availability of paper guide with levels 0 1
PBa factor for the availability of paper bail with levels 0 1
PSa factor for the availability of paper support with levels 0 1
EPa factor for the availability of erasure plate with levels 0 1
TCa factor for the availability of two carriage releases with levels 0 1
MRa factor for the availability of margin release with levels 0 1
CLa factor for the availability of carriage lock with levels 0 1
Details
Hartigan suggests that the triads algorithm be used with this data set. The factor variables are binary variables. If the value is 1, then the associated feature is available. If the value is 0, then the associated feature is not available.
Source
Consumers' Reports Buying Guide (1967), Consumers' Union, Mount Vernon, NY.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(portable.typewriters)
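Because the feature columns are factors with levels 0 and 1, they need to be converted through their labels before any numeric work; as.integer applied directly to a factor returns level positions (1 and 2), not the recorded values. The sketch below shows one way to build a 0/1 matrix. The helper binary_matrix is hypothetical, the column names are taken from the Format section, and the binary (Jaccard) distance from dist is an illustrative choice, not the triads algorithm.

```r
# Hypothetical helper: convert factors with levels "0"/"1" into a 0/1 integer matrix.
binary_matrix <- function(df) {
  cols <- lapply(df, function(f) as.integer(as.character(f)))
  do.call(cbind, cols)
}

# Assumption: the package is installed and the columns are named as in Format.
if (requireNamespace("cluster.datasets", quietly = TRUE)) {
  data(portable.typewriters, package = "cluster.datasets")
  pt <- portable.typewriters
  features <- c("TA", "TP", "PR", "HH", "VH", "PI", "PG",
                "PB", "PS", "EP", "TC", "MR", "CL")
  x <- binary_matrix(pt[, features])
  rownames(x) <- as.character(pt$model)
  d <- dist(x, method = "binary")  # illustrative distance between feature sets
  print(round(as.matrix(d)[1:3, 1:3], 2))
}
```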
Hartigan (1975) Nutrients in Meat, Fish and Fowl Percent RDA
Description
A table with the nutrient levels in meat, fish and fowl. Nutrient levels were measured in a 3 ounce portion of various foods. Values are percentages of recommended daily allowances. This is Table 4.2 in Chapter 4 of Hartigan (1975) on page 87.
Usage
data(rda.meat.fish.fowl.1959)
Format
A data frame with 27 observations on the following 6 variables.
namea character vector for the food
energya numeric vector for the number of calories
proteina numeric vector for the amount of protein
fata numeric vector for the amount of fat
calciuma numeric vector for the amount of calcium
irona numeric vector for the amount of iron
Details
None.
Source
The Yearbook of Agriculture (1959), The United States Department of Agriculture, Washington, DC.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(rda.meat.fish.fowl.1959)
Hartigan (1975) Mammals Milk
Description
Selected animals have been clustered by similarity of percentage constituents in milk. This is Table 1.2 in the Introduction of Hartigan (1975) on page 6.
Usage
data(sample.mammals.milk.1956)
Format
A data frame with 16 observations on the following 5 variables.
namea character vector for the name of the animals
watera numeric vector for the water content in the milk sample
proteina numeric vector for the amount of protein in the milk sample
fata numeric vector for the fat content in the milk sample
lactosea numeric vector for the amount of lactose in the milk sample
Details
None.
Source
Spector, W. S. (1956). Handbook of Biological Data, Saunders, London.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(sample.mammals.milk.1956)
Hartigan (1975) Yield of Stocks
Description
The table contains the dividend by average price for each year and for a sample of stocks. This is Table 11.13 in Chapter 11 of Hartigan (1975) on page 210.
Usage
data(sample.stock.yields.1959.1969)
Format
A data frame with 34 observations on the following 12 variables.
stocka character vector for the company name
y.1959a numeric vector for the dividend yield in 1959
y.1960a numeric vector for the dividend yield in 1960
y.1961a numeric vector for the dividend yield in 1961
y.1962a numeric vector for the dividend yield in 1962
y.1963a numeric vector for the dividend yield in 1963
y.1964a numeric vector for the dividend yield in 1964
y.1965a numeric vector for the dividend yield in 1965
y.1966a numeric vector for the dividend yield in 1966
y.1967a numeric vector for the dividend yield in 1967
y.1968a numeric vector for the dividend yield in 1968
y.1969a numeric vector for the dividend yield in 1969
Details
Hartigan proposes applying the single linkage algorithm to this data set.
Source
Moody's Handbook of Common Stocks.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(sample.stock.yields.1959.1969)
Hartigan (1975) City Crime
Description
A list of cities and the number of crimes per 100,000 population, as of 1970. This is Table 1.1 in Chapter 1 of Hartigan (1975) on page 28.
Usage
data(sample.us.city.crime.1970)
Format
A data frame with 16 observations on the following 8 variables.
citya character vector for the names of the cities
murdera numeric vector for the murder rates
rapea numeric vector for the rape rates
robberya numeric vector for the robbery rates
assaulta numeric vector for the assault rates
burglarya numeric vector for the burglary rates
larcenya numeric vector for the larceny rates
autoa numeric vector for the auto crime rates
Details
None.
Source
United States Statistical Abstracts (1970).
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(sample.us.city.crime.1970)
Hartigan (1975) Student Questionnaire
Description
The table contains student responses to a questionnaire about a data analysis course. This is Table 12.4 in Chapter 12 of Hartigan (1975) on page 224.
Usage
data(student.questionnaire)
Format
A data frame with 31 observations on the following 10 variables.
questiona numeric vector for the question number
texta character vector for the question text
s.1a numeric vector for the response from student 1
s.2a numeric vector for the response from student 2
s.3a numeric vector for the response from student 3
s.4a numeric vector for the response from student 4
s.5a numeric vector for the response from student 5
s.6a numeric vector for the response from student 6
s.7a numeric vector for the response from student 7
s.8a numeric vector for the response from student 8
Details
Student responses to the questionnaires are evaluated using the following scores.
| 1 | strongly disagree |
| 2 | disagree |
| 3 | neutral |
| 4 | agree |
| 5 | strongly agree |
Hartigan applies the adding algorithm to this data set.
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(student.questionnaire)
Hartigan (1975) Selected Votes in the United Nations
Description
The table contains the votes for selected propositions by country in the United Nations between 1969 and 1970. This is Table 16.5 in Chapter 16 of Hartigan (1975) on page 306.
Usage
data(un.votes.1969.1970)
Format
A data frame with 23 observations on the following 11 variables.
countrya character vector for the country name
p.1a factor for proposition 1 with levels A N Y
p.2a factor for proposition 2 with levels A N Y
p.3a factor for proposition 3 with levels A N Y
p.4a factor for proposition 4 with levels A N Y
p.5a factor for proposition 5 with levels A N Y
p.6a factor for proposition 6 with levels A N Y
p.7a factor for proposition 7 with levels A N Y
p.8a factor for proposition 8 with levels A N Y
p.9a factor for proposition 9 with levels A N Y
p.10a factor for proposition 10 with levels A N Y
Details
The propositions that were voted on were as follows.
| p.1 | to adopt USSR proposal to delete item on Korean unification |
| p.2 | to call upon the UK to use force against Rhodesia |
| p.3 | to declare the China admission question an important question |
| p.4 | to recognize mainland China and expel Formosa |
| p.5 | to make a study commission on China admission important |
| p.6 | to form a study commission on Portuguese colonialism |
| p.7 | convention on no statutory limit on war crimes |
| p.8 | condemn Portuguese colonialism |
| p.9 | to defer consideration of South Africa expulsion |
| p.10 | South Africa expulsion is important question |
The factor levels are the outcomes for the proposition: Y is yes, N is no and A is abstain.
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(un.votes.1969.1970)
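One simple way to compare countries on these Y/N/A factors is the fraction of propositions on which two countries cast the same vote. The sketch below is an illustration only and is not the method used in Hartigan's Chapter 16; the helper vote_agreement is hypothetical, and the column names are taken from the Format section.

```r
# Hypothetical helper: fraction of propositions on which each pair of rows voted alike.
vote_agreement <- function(votes) {
  v <- as.matrix(votes)  # character matrix of "Y", "N", "A"
  n <- nrow(v)
  agree <- matrix(NA_real_, n, n, dimnames = list(rownames(v), rownames(v)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      agree[i, j] <- mean(v[i, ] == v[j, ])
    }
  }
  agree
}

# Assumption: the package is installed and country names are unique.
if (requireNamespace("cluster.datasets", quietly = TRUE)) {
  data(un.votes.1969.1970, package = "cluster.datasets")
  un <- un.votes.1969.1970
  votes <- un[, paste0("p.", 1:10)]
  rownames(votes) <- as.character(un$country)
  a <- vote_agreement(votes)
  print(round(a[1:4, 1:4], 2))
}
```

Treating an abstention as just another category is a design choice; an analysis could equally score A as halfway between Y and N.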
Hartigan (1975) Frequency of Car Repairs
Description
The table contains the frequency of car repairs in 1969. Plus means above average. Minus means below average. This is Table 9.4 in Chapter 9 of Hartigan (1975) on page 174.
Usage
data(us.car.repair.1969)
Format
A data frame with 33 observations on the following 14 variables.
modela character vector for the model of the vehicle
BRa factor for brake system with levels - +
FUa factor for fuel system with levels - +
ELa factor for electrical with levels - +
EXa factor for exhaust with levels - +
STa factor for steering with levels - +
EMa factor for engine, mechanical with levels - +
RSa factor for rattles and squeaks with levels - +
RAa factor for rear axle with levels - +
RUa factor for rust with levels - +
SAa factor for shock absorbers with levels - +
TCa factor for transmission, clutch with levels - +
WAa factor for wheel alignment with levels - +
OTa factor for other with levels - +
Details
This table is used to illustrate the tree-leader algorithm.
Source
Consumer Reports Buying Guide (1969)
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(us.car.repair.1969)
Hartigan (1975) Civil War Battles in Chronological Order
Description
This table contains the Union and Confederate forces and numbers shot. This is Table 5.4 in Chapter 5 of Hartigan (1975) on page 121.
Usage
data(us.civil.war.battles)
Format
A data frame with 46 observations on the following 5 variables.
battlea character vector for the battle names
union.forcesa numeric vector for the Union forces deployed
union.shota numeric vector for the Union soldiers shot
confederate.forcesa numeric vector for the Confederate forces deployed
confederate.shota numeric vector for the Confederate soldiers shot
Details
The data are in chronological order.
Source
Livermore, T. L. (1957). Numbers and Losses in the Civil War, Indiana University Press, Bloomington.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(us.civil.war.battles)
Hartigan (1975) Congressman by Bills
Description
The table contains the behavior of various bill sponsors in the 90th Congress. This is Table 13.7 in Chapter 13 of Hartigan (1975) on page 242.
Usage
data(us.congressional.bills)
Format
A data frame with 17 observations on the following 16 variables.
sponsora character vector for the congressman sponsor
b.1a factor for the congressman behavior for bill 1 with levels 1 5 7 8
b.2a factor for the congressman behavior for bill 2 with levels 1 5 6 7
b.3a factor for the congressman behavior for bill 3 with levels 1 5 6 7
b.4a factor for the congressman behavior for bill 4 with levels 1 7
b.5a factor for the congressman behavior for bill 5 with levels 1 6 7
b.6a factor for the congressman behavior for bill 6 with levels 1 6 7
b.7a factor for the congressman behavior for bill 7 with levels 1 6 7
b.8a factor for the congressman behavior for bill 8 with levels 1 6 7
b.9a factor for the congressman behavior for bill 9 with levels 1 6 9
b.10a factor for the congressman behavior for bill 10 with levels 1 6 9
b.11a factor for the congressman behavior for bill 11 with levels 1 6 9
b.12a factor for the congressman behavior for bill 12 with levels 1 6 9
b.13a factor for the congressman behavior for bill 13 with levels 1 6 9
b.14a factor for the congressman behavior for bill 14 with levels 1 6 9
b.15a factor for the congressman behavior for bill 15 with levels 1 6 9
Details
The bills, sponsoring congressmen and bill titles are as follows.
| b.1 | Aspinall | Authorize Biscayne National Monument in Florida |
| b.2 | Perkins | Promote health and safety in building trades |
| b.3 | Patman | Sr extend 2 years auth. reg. interest and dividend rates |
| b.4 | Dingell | Rel Dev fish protein concentrate |
| b.5 | Perkins | Establish commission on Negro history and culture |
| b.6 | Aspinall | Designate parts of Morris City, NJ, as wilderness |
| b.7 | Udall | Provide overtime and standby pay for transportation department |
| b.8 | Edwards | Amend bill for relief of sundry claimants |
| b.9 | Gross | Amend omnibus claims bill |
| b.10 | Gross | Strike title 8 of omnibus claims bill |
| b.11 | Hall | Strike title 9 of omnibus claims bill |
| b.12 | Gross | Strike title 10 of omnibus claims bill |
| b.13 | Hall | Strike title 11 of omnibus claims bill |
| b.14 | Talcott | Strike title 14 of omnibus claims bill |
| b.15 | Poage | Take FD and AG ACT AMD SPKRS TBLE AGREE S CONF |
The behavior is represented by a factor with the following values
| 1 | yes |
| 2 | pair yes |
| 3 | announced yes |
| 4 | announced no |
| 5 | pair no |
| 6 | no |
| 7 | general pair |
| 8 | abstain |
| 9 | absent |
| 0 | sponsor absent |
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(us.congressional.bills)
Hartigan (1975) Cost and Nutrient Contribution for Selected Foods
Description
The table contains the cost and nutrient content, in percent daily allowance, of various foods reported in 1959. This is Table 8.5 in Chapter 8 of Hartigan (1975) on page 160.
Usage
data(us.food.cost.nutrients.1959)
Format
A data frame with 10 observations on the following 8 variables.
fooda character vector for the food name
costa numeric vector for the cost of serving in U.S. cents
sizea character vector for the portion size
proteina numeric vector for % recommended daily allowance of protein
irona numeric vector for % recommended daily allowance of iron
thiaminea numeric vector for % recommended daily allowance of thiamine
riboflavina numeric vector for % recommended daily allowance of riboflavin
niacina numeric vector for % recommended daily allowance of niacin
Details
The table is used to construct trees and distances as described in Hartigan (1975).
Source
Yearbook of Agriculture (1959).
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(us.food.cost.nutrients.1959)
Hartigan (1975) Links Between States
Description
The table defines the neighbors for each state. This is Table 11.10 in Chapter 11 of Hartigan (1975) on page 207.
Usage
data(us.links.between.states)
Format
A data frame with 50 observations on the following 11 variables.
code: a character vector for the state code
name: a character vector for the state name
neighbors: a numeric vector for the number of neighboring states
n.1: a character vector for the first neighbor
n.2: a character vector for the second neighbor
n.3: a character vector for the third neighbor
n.4: a character vector for the fourth neighbor
n.5: a character vector for the fifth neighbor
n.6: a character vector for the sixth neighbor
n.7: a character vector for the seventh neighbor
n.8: a character vector for the eighth neighbor
Details
Hartigan combines this data set with the per capita income data in Table 11.9 and applies the single linkage algorithm.
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(us.links.between.states)
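The neighbor columns n.1 to n.8 can be turned into a symmetric adjacency matrix, which is a convenient form for contiguity-based clustering. The sketch below assumes that unused neighbor slots are stored as NA, which this manual does not state. Both `toy` and `adjacency_from_links` are hypothetical names, and the rows are placeholders rather than actual states.

```r
# Placeholder rows in the documented layout (only n.1 and n.2 for brevity)
toy <- data.frame(
  code = c("S1", "S2", "S3"),
  name = c("State 1", "State 2", "State 3"),
  neighbors = c(1, 2, 1),
  n.1 = c("S2", "S1", "S2"),
  n.2 = c(NA, "S3", NA),
  stringsAsFactors = FALSE
)

# Build a 0/1 adjacency matrix from the n.* columns (hypothetical helper)
adjacency_from_links <- function(links) {
  codes <- as.character(links$code)
  adj <- matrix(0L, length(codes), length(codes),
                dimnames = list(codes, codes))
  neighbor_cols <- grep("^n\\.[0-9]+$", names(links), value = TRUE)
  for (i in seq_along(codes)) {
    nb <- as.character(unlist(links[i, neighbor_cols], use.names = FALSE))
    nb <- nb[!is.na(nb) & nb %in% codes]  # drop empty or unknown slots
    adj[codes[i], nb] <- 1L
  }
  adj
}

adj <- adjacency_from_links(toy)
adj
```

With the package installed, the same helper could be tried on us.links.between.states; comparing rowSums(adj) with the neighbors column is a quick consistency check.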
Hartigan (1975) U.S. Per Capita Income in Dollars 1964
Description
The table contains the per capita income in the United States in 1964. This is Table 11.9 in Chapter 11 of Hartigan (1975) on page 206.
Usage
data(us.per.capita.income.1964)
Format
A data frame with 50 observations on the following 3 variables.
code: a character vector for the state codes
name: a character vector for the state names
income: a numeric vector for the income per capita
Details
Hartigan applies density contour trees and single linkage clustering to this data set.
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(us.per.capita.income.1964)
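A minimal sketch of single linkage clustering on one numeric variable, using base R's dist(), hclust() and cutree(). The data frame `toy` is a made-up stand-in with the documented columns (code, name, income); its values are placeholders, not the 1964 figures. With the package installed, us.per.capita.income.1964 could be substituted for it. This is ordinary single linkage and does not attempt to reproduce Hartigan's density contour trees.

```r
# Placeholder rows with the documented columns; values are NOT real incomes
toy <- data.frame(
  code   = c("S1", "S2", "S3", "S4", "S5"),
  name   = c("State 1", "State 2", "State 3", "State 4", "State 5"),
  income = c(1000, 1100, 2000, 2100, 5000),
  stringsAsFactors = FALSE
)

# Pairwise distances on income, labelled by state code
d <- dist(setNames(toy$income, toy$code))

# Single linkage: merge the two clusters with the smallest minimum distance
hc <- hclust(d, method = "single")

# Cut the tree into three groups
groups <- cutree(hc, k = 3)
groups
```

Calling plot(hc) draws the corresponding dendrogram.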
Hartigan (1975) Republican Vote for President
Description
The table contains the Republican percentage of the Presidential vote over 18 elections for southern states. This is Table 14.1 in Chapter 14 of Hartigan (1975) on page 252.
Usage
data(us.president.vote.1900.1968)
Format
A data frame with 16 observations on the following 20 variables.
code: a character vector for the state code
state: a character vector for the state name
y.1900: a numeric vector for the Republican percentage in 1900
y.1904: a numeric vector for the Republican percentage in 1904
y.1908: a numeric vector for the Republican percentage in 1908
y.1912: a numeric vector for the Republican percentage in 1912
y.1916: a numeric vector for the Republican percentage in 1916
y.1920: a numeric vector for the Republican percentage in 1920
y.1924: a numeric vector for the Republican percentage in 1924
y.1928: a numeric vector for the Republican percentage in 1928
y.1932: a numeric vector for the Republican percentage in 1932
y.1936: a numeric vector for the Republican percentage in 1936
y.1940: a numeric vector for the Republican percentage in 1940
y.1944: a numeric vector for the Republican percentage in 1944
y.1948: a numeric vector for the Republican percentage in 1948
y.1952: a numeric vector for the Republican percentage in 1952
y.1956: a numeric vector for the Republican percentage in 1956
y.1960: a numeric vector for the Republican percentage in 1960
y.1964: a numeric vector for the Republican percentage in 1964
y.1968: a numeric vector for the Republican percentage in 1968
Details
Hartigan suggests that the direct splitting algorithm be applied to this data set.
Source
Peterson, S. (1969). A Statistical History of the American Presidential Elections, Ungar, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(us.president.vote.1900.1968)
Hartigan (1975) Profitability of U.S. Economic Sectors
Description
The table contains the profit as a percentage of stockholders' equity for various economic sectors for the years 1959 through 1968. This is Table 14.12 in Chapter 14 of Hartigan (1975) on page 266.
Usage
data(us.sector.profitability.1959.1968)
Format
A data frame with 24 observations on the following 12 variables.
code: a character vector for the sector code
sector: a character vector for the sector name
y.1959: a numeric vector for the profits in year 1959
y.1960: a numeric vector for the profits in year 1960
y.1961: a numeric vector for the profits in year 1961
y.1962: a numeric vector for the profits in year 1962
y.1963: a numeric vector for the profits in year 1963
y.1964: a numeric vector for the profits in year 1964
y.1965: a numeric vector for the profits in year 1965
y.1966: a numeric vector for the profits in year 1966
y.1967: a numeric vector for the profits in year 1967
y.1968: a numeric vector for the profits in year 1968
Details
Hartigan suggests that the direct splitting algorithm be applied to this data set.
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(us.sector.profitability.1959.1968)
Hartigan (1975) Demographic Data for the South
Description
A table of demographic information for southern states for the period 1960 to 1965. This is Table 2.2 in Chapter 2 of Hartigan (1975) on page 59.
Usage
data(us.south.demographics.1965)
Format
A data frame with 16 observations on the following 24 variables.
state: a character vector for an abbreviation for the states
mean.altitude: a numeric vector for the mean altitude above sea level, in tens of feet
mean.temperature: a numeric vector for the mean annual temperature, in degrees Fahrenheit
mean.precipitation: a numeric vector for the mean annual precipitation, in inches
population.density: a numeric vector for the number of persons per square mile
african.americans: a numeric vector for the percentage of African-Americans
median.age: a numeric vector for the median age in years
urban.population: a numeric vector for the percentage urban population
births: a numeric vector for the number of births per 1000 population
rural.population: a numeric vector for the percentage rural farm population
manufacturing.employment: a numeric vector for the percentage of employment in manufacturing
automobiles: a numeric vector for the number of automobiles per 100 population
telephones: a numeric vector for the number of telephones per 100 population
income: a numeric vector for the average income in hundreds of dollars
federal.revenue: a numeric vector for the federal revenue per 100 dollars of state and local revenue
lawyers: a numeric vector for the number of lawyers per 100,000 population
doctors: a character vector for the number of doctors per 100,000 population
white.infant.mortality: a numeric vector for the white infant mortality per 1000 births
school.years: a numeric vector for the school years completed, in tenths of a year
education.expense: a numeric vector for the education expenditure per pupil in tens of dollars
sound.plumbing: a numeric vector for the percentage of houses with sound plumbing
gop.1960.president: a numeric vector for the percentage Republican vote in the 1960 presidential election
gop.1964.president: a numeric vector for the percentage Republican vote in the 1964 presidential election
gop.1962.1964.governor: a numeric vector for the percentage Republican vote in the 1962/1964 governor elections
Details
None.
Source
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(us.south.demographics.1965)
Hartigan (1975) Vervet Sleeping Groups
Description
The table defines vervet sleeping groups measured over a set of dates. This is Table 7.5 in Chapter 7 of Hartigan (1975) on page 149.
Usage
data(vervet.sleeping.groups)
Format
A data frame with 22 observations on the following 18 variables.
date: a character vector for the date in yy/mm/dd format
I: a factor for adult males with levels A B C D E
II: a factor for older adult males with levels A B C D
III: a factor for adult males with levels A B C D
IV: a factor for adult females with levels A B C D E F
V: a factor for juvenile males with levels A B C D F
VI: a factor for adult females with levels A B C D E
VII: a factor for young juvenile females with levels A B C D E
VIII: a factor for young juvenile females with levels A B C D E
IX: a factor for young juvenile females with levels A B C D E
X: a factor for juvenile females with levels A B C D E F G
XI: a factor for subadult females with levels A B C D E
XII: a factor for adult females with levels A B C D E
XIII: a factor with levels A B C D E F
XIV: a factor for infant male, son of IV, with levels A B C D E F
XV: a factor for infant male, son of XII, with levels A B C D E F
XVI: a factor for infant female from IV with levels A B C D E
XVII: a factor with levels A B C D E
Details
Hartigan suggests using this data set to test the ditto algorithm.
Source
Struhsaker, T. T. (1967). Behavior of vervet monkeys and other cercopithecines, Science 156, 1197-1203.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(vervet.sleeping.groups)
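Because the date column is stored as yy/mm/dd text, it needs converting before it can be sorted or differenced as a date. The sketch below uses made-up date strings, not values from the data set, and assumes every observation falls in the 1900s. The century is prefixed explicitly because the `%y` conversion in as.Date() maps two-digit years 00 to 68 onto 20xx.

```r
# Hypothetical date strings in the documented yy/mm/dd layout
raw <- c("64/01/15", "64/02/03")

# Prefix the century (assumed 19xx) and parse with a four-digit year
parsed <- as.Date(paste0("19", raw), format = "%Y/%m/%d")
parsed

# Number of days between consecutive observations
diff(parsed)
```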
Hartigan (1975) Evaluation of Wines
Description
The table contains the evaluations of various wines from 1961 to 1970. This is Table 7.1 in Chapter 7 of Hartigan (1975) on page 144.
Usage
data(wine.evaluation.1961.1970)
Format
A data frame with 15 observations on the following 12 variables.
code: a character vector
name: a character vector
r.61: a factor with levels A E G
r.62: a factor with levels A G P
r.63: a factor with levels A D P
r.64: a factor with levels D E G P
r.65: a factor with levels A D G P
r.66: a factor with levels A G
r.67: a factor with levels A G
r.68: a factor with levels A D G P
r.69: a factor with levels A G
r.70: a factor with levels G
Details
Hartigan uses this data set to illustrate the ditto algorithm.
Source
Gourmet Magazine (August 1971) pp 30-33.
SPAETH2 Cluster Analysis Datasets http://people.sc.fsu.edu/~jburkardt/datasets/spaeth2/spaeth2.html
References
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Examples
data(wine.evaluation.1961.1970)