eolearn.features.clustering
Module for computing clusters in EOPatch
- class eolearn.features.clustering.ClusteringTask(*args, **kwargs)
Bases: eolearn.core.eotask.EOTask
Task that computes clusters on selected features using sklearn.cluster.AgglomerativeClustering.
The algorithm produces a timeless data feature in which each cell holds a natural number identifying the cluster it belongs to. Cells marked with -1 do not belong to any cluster: they were either excluded by the mask or removed afterwards because their cluster was smaller than the ‘remove_small’ threshold.
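The underlying clustering step can be sketched directly with scikit-learn. This is a minimal illustration under stated assumptions, not the task's actual implementation: the 4x4 single-band array, the threshold value and all variable names are invented for the example, and the pixel-adjacency graph is built with sklearn.feature_extraction.image.grid_to_graph.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.image import grid_to_graph

# Toy 4x4 single-band "image": left half is 0, right half is 10 (illustrative data).
height, width = 4, 4
image = np.zeros((height, width, 1))
image[:, width // 2:, 0] = 10.0

# One row per pixel, one column per band.
samples = image.reshape(-1, image.shape[-1])

# Graph that connects each pixel to its adjacent pixels on the grid.
connectivity = grid_to_graph(height, width)

model = AgglomerativeClustering(
    n_clusters=None,         # must be None when distance_threshold is set
    distance_threshold=5.0,  # assumed value, chosen for this toy input
    compute_full_tree=True,  # required together with distance_threshold
    linkage="ward",
    connectivity=connectivity,
)

# Reshape the per-pixel labels back to the image grid.
labels = model.fit_predict(samples).reshape(height, width)
```

On this toy input the two halves of the image should end up in two separate clusters, and labels has the same height and width as the input.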
Class constructor
- Parameters
features (dict(FeatureType.DATA_TIMELESS: set(str))) – A collection of features used for clustering. The features need to be of type DATA_TIMELESS
new_feature_name (str) – Name of feature that is the result of clustering
distance_threshold (float or None) – The linkage distance threshold above which clusters will not be merged. If not None, n_clusters must be None and compute_full_tree must be True
n_clusters (int or None) – The number of clusters for the algorithm to find. If distance_threshold=None, the number of clusters found equals the given n_clusters
affinity (str) – Metric used to compute the linkage. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”.
linkage ({“ward”, “complete”, “average”, “single”}) – Which linkage criterion to use. The linkage criterion determines which distance to use between sets of observations. The algorithm will merge the pair of clusters that minimizes this criterion.
- ward minimizes the variance of the clusters being merged.
- average uses the average of the distances between each observation of the two sets.
- complete or maximum linkage uses the maximum distance between all observations of the two sets.
- single uses the minimum of the distances between all observations of the two sets.
remove_small (int) – If greater than 0, removes all clusters that have fewer points than “remove_small”
connectivity (array-like, callable or None) – Connectivity matrix. Defines for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix, such as one derived from kneighbors_graph. If set to None, a graph in which adjacent pixels are connected is used.
mask_name (str) – An optional mask feature used to exclude areas from clustering
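The effect of remove_small on the output can be illustrated with a small NumPy sketch. The helper drop_small_clusters below is hypothetical and written only for this example (it is not part of eo-learn); it assumes that cells already excluded carry the value -1, as described above.

```python
import numpy as np


def drop_small_clusters(labels, remove_small):
    """Hypothetical helper: set clusters with fewer than `remove_small` cells to -1."""
    labels = np.asarray(labels).copy()
    if remove_small <= 0:
        return labels
    # Count the cells in each cluster, ignoring cells already marked -1.
    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    small_ids = ids[counts < remove_small]
    labels[np.isin(labels, small_ids)] = -1
    return labels


# Illustrative label grid: cluster 0 has 3 cells, cluster 1 has 1, cluster 2 has 4.
labels = np.array([[0, 0, 1],
                   [0, 2, 2],
                   [-1, 2, 2]])
cleaned = drop_small_clusters(labels, remove_small=2)
```

With remove_small=2 only the single-cell cluster 1 is reset to -1; the cell that was already -1 (for example, excluded by a mask) is left unchanged.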