- Item name
- _em_classes.clustering_method
- Category name
- em_classes
- Attribute name
- clustering_method
- Required in PDB entries
- no
- Used in current PDB entries
- No

Allowed Value | Details |
---|---|

Automatic Clustering and Hierarchical Ascendant Classifications (HAC) | HAC uses only Ward's criterion. Ward's criterion states that HAC clusters should be merged so as to minimize the added intraclass (within-cluster) variance. The two clusters that differ least from each other are merged to create a new group, one "level" higher. |

Correspondence Analysis | Correspondence analysis (CA) uses the chi-squared distance. This is an advantage because it ignores differences in exposure between images, eliminating the need to rescale them. |

Didays method | A disadvantage of the K-means method is that the final grouping depends strongly on which seeds are initially chosen. Diday addressed this by applying the K-means technique multiple times with different seeds, cross-tabulating the results, and keeping only the clusters that formed repeatedly. |

K-Means Clustering | K-Means is a clustering method that divides the data into a user-defined number of groups. Two random image "seeds" are chosen, and their centers of gravity are computed. A partition is drawn midway between the centers, the new centers of gravity are computed, and the process is repeated a given number of times. The final result depends strongly on which image seeds are chosen first. Because our faces data set is manufactured, we know exactly which images are identical apart from random noise, as well as the exact number of groups. The output discussed was obtained with 8 classes, using factors 1-3 and an equal weight of 1.0 for each of those three factors. |

Principal Component Analysis | Principal component analysis (PCA) measures the distance between data vectors with the Euclidean distance. |

Wards method | |

average linkage | |

centroid method | |

complete linkage | |

single linkage | |
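
The seed dependence of K-means and the repeated-run idea described for `Didays method` can be sketched in a few lines. This is only an illustration of the description in the table, not Diday's published algorithm or the code of any particular EM package; the function names `kmeans` and `stable_groups`, the 2-D points, and the iteration count are all assumptions made for the example.

```python
import random


def kmeans(points, k, seed, iterations=20):
    """Plain K-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k randomly chosen points act as seeds
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2
                + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Move each center to the center of gravity of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


def stable_groups(points, k, seeds):
    """Cross-tabulate several runs: points stay grouped only if they
    shared a cluster in every run."""
    runs = [kmeans(points, k, s) for s in seeds]
    cells = {}
    for i in range(len(points)):
        # The labels a point received across all runs identify its cell
        # in the cross-tabulation.
        signature = tuple(run[i] for run in runs)
        cells.setdefault(signature, []).append(i)
    return sorted(cells.values())


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    for seed in range(3):
        print(seed, kmeans(pts, 2, seed))
    print(stable_groups(pts, 2, seeds=range(3)))
```

Each returned group lists the indices of points that were clustered together in every run. A caller who wants only the repeatedly formed clusters could, for instance, discard groups below some minimum size; that threshold is a design choice not specified in the table.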

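
The difference between the chi-squared distance used by correspondence analysis and the Euclidean distance used by principal component analysis can be illustrated with a small sketch. It assumes the usual correspondence-analysis definition, in which each image is first divided by its own total intensity (its row profile) and each pixel position is weighted by the inverse of its share of the grand total. The function names and the toy "images" are hypothetical, and all pixel values are assumed non-negative with non-zero row and column sums.

```python
import math


def euclidean_distance(a, b):
    """Ordinary Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi_squared_distance(images, i, j):
    """Chi-squared distance between rows i and j of a non-negative table."""
    total = sum(sum(row) for row in images)
    n_pixels = len(images[0])
    # Column masses: share of the grand total carried by each pixel position.
    col_mass = [sum(row[p] for row in images) / total for p in range(n_pixels)]
    # Row profiles: each image divided by its own total intensity.
    prof_i = [v / sum(images[i]) for v in images[i]]
    prof_j = [v / sum(images[j]) for v in images[j]]
    return math.sqrt(
        sum((a - b) ** 2 / m for a, b, m in zip(prof_i, prof_j, col_mass))
    )


if __name__ == "__main__":
    # Image 1 is image 0 at twice the exposure; image 2 is a different image.
    images = [[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [4.0, 3.0, 2.0, 1.0]]
    print(euclidean_distance(images[0], images[1]))
    print(chi_squared_distance(images, 0, 1))
    print(chi_squared_distance(images, 0, 2))
```

Because the two row profiles coincide when one image is a constant multiple of the other, the chi-squared distance between them vanishes, whereas the Euclidean distance between the raw vectors does not.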