The clustering_method used
Mandatory Code no
Data Type Code ucode
|| (CA) uses the Chi-squared distance. This measure is superior
because it ignores differences in exposure between images,
which eliminates the need to rescale the images against each other.
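As a sketch (pure Python, not from the source; the function and data names are illustrative), the Chi-squared distance first normalizes each vector to a profile, which is why a uniform exposure change cancels out:

```python
import math

def chi2_distance(x, y, col_mass):
    """Chi-squared distance between two row profiles.

    Each vector is normalized so it sums to 1 (its "profile"), which is
    why a uniform exposure change cancels out; each squared difference
    is then weighted by the inverse of the column mass.
    """
    px = [v / sum(x) for v in x]
    py = [v / sum(y) for v in y]
    return math.sqrt(sum((a - b) ** 2 / m
                         for a, b, m in zip(px, py, col_mass)))

# Two "images" that differ only by a constant exposure factor:
a = [10.0, 20.0, 30.0]
b = [20.0, 40.0, 60.0]            # same scene at twice the exposure
totals = [p + q for p, q in zip(a, b)]
col_mass = [t / sum(totals) for t in totals]
print(chi2_distance(a, b, col_mass))   # 0.0: the exposure change is ignored
```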
|Principal Component Analysis
|| (PCA) computes the distance between data vectors
using the Euclidean distance.
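For contrast, a minimal sketch (illustrative names, not from the source) of the Euclidean distance applied to the same exposure-doubled pair, where the difference does not cancel:

```python
import math

def euclidean(x, y):
    """Plain Euclidean distance between two data vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

a = [10.0, 20.0, 30.0]
b = [20.0, 40.0, 60.0]   # same scene at twice the exposure
# Unlike the Chi-squared distance, the Euclidean distance counts a
# uniform exposure change as a real difference, so images would have
# to be rescaled before comparison:
print(euclidean(a, b))   # sqrt(1400), about 37.4
```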
|| A disadvantage of the K-means method is that the
final grouping depends heavily on which seeds are
initially chosen. Diday overcame this by applying
the K-means technique multiple times with different
seeds, then cross-tabulating the results and keeping
only the clusters that were repeatedly formed.
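A sketch of Diday's idea in Python (the helper names and the tiny 1-D data set are illustrative assumptions, not from the source): K-means is run several times with different random seeds, the labelings are cross-tabulated, and only groups of points that stay together in every run are kept.

```python
import random

def kmeans(points, k, seeds, iters=20):
    """Minimal 1-D K-means: assign each point to its nearest center,
    recompute each center as the mean of its group, and repeat."""
    centers = list(seeds)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: abs(p - centers[j]))].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]

def strong_forms(points, k, runs=5, rng=random.Random(0)):
    """Run K-means with several different seed choices, cross-tabulate
    the labelings, and keep only groups that form in every run."""
    labelings = [kmeans(points, k, rng.sample(points, k)) for _ in range(runs)]
    forms = {}
    for idx, p in enumerate(points):
        key = tuple(lab[idx] for lab in labelings)  # label pattern across runs
        forms.setdefault(key, []).append(p)
    return list(forms.values())

points = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2]
print(sorted(strong_forms(points, k=2), key=min))
# Two stable groups: [1.0, 1.1, 1.2] and [5.0, 5.1, 5.2]
```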
|Automatic Clustering and Hierarchical Ascendant Classifications (HAC)
|| HAC uses only Ward's criterion, which states that
HAC clusters should be merged so as to minimize the
added within-cluster (intraclass) variance. The two
clusters that differ the least from each other are
merged into a new group, one "level" higher.
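A sketch of the merge step under Ward's criterion (pure Python on 1-D points; all names are illustrative assumptions): at each level the pair of clusters whose merge adds the least within-cluster variance is joined.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares when clusters a and b
    (lists of 1-D points) merge: n_a*n_b/(n_a+n_b) * (mean_a-mean_b)**2."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return len(a) * len(b) / (len(a) + len(b)) * (ma - mb) ** 2

def hac_ward(points):
    """Start from singleton clusters; repeatedly merge the cheapest pair
    under Ward's criterion, recording the clustering at each "level"."""
    clusters = [[p] for p in points]
    levels = [clusters]
    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters = ([c for k, c in enumerate(clusters) if k not in (i, j)]
                    + [clusters[i] + clusters[j]])
        levels.append(clusters)
    return levels

levels = hac_ward([1.0, 1.2, 5.0])
print(levels[1])   # [[5.0], [1.0, 1.2]] -- the closest pair merged first
```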
|| K-Means is a clustering method that divides the data
into a user-defined number of groups. Two random images
("seeds") are chosen, and their centers of gravity are
computed. A partition is drawn midway between the
centers, the new centers of gravity are computed,
and the process is repeated for a given number of iterations.
The final result is very dependent on which image seeds
are chosen first. Because our faces data set is
manufactured, we know exactly which images are identical
(apart from the random noise) and the exact number of groups.
The output discussed was obtained with 8 classes, using
factors 1-3 with an even factor weight of 1.0 across
those three factors.
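The seed dependence mentioned above can be shown with a tiny deterministic sketch (1-D data, illustrative names, not from the source): the same K-means routine, started from two different seed pairs, settles on two different partitions.

```python
def kmeans(points, k, seeds, iters=20):
    """Minimal 1-D K-means: each point goes to the nearest center (the
    partition drawn "down the middle"), centers are recomputed as group
    means, and the process repeats."""
    centers = list(seeds)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: abs(p - centers[j]))].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]

points = [0.0, 1.0, 2.0]
print(kmeans(points, 2, seeds=[0.0, 1.0]))   # [0, 1, 1]: split {0} | {1, 2}
print(kmeans(points, 2, seeds=[1.0, 2.0]))   # [0, 0, 1]: split {0, 1} | {2}
```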