Pairwise constraint selection (skquery.pairwise)

This module regroups methods that query pairwise (i.e. must-link/cannot-link) constraints.

class skquery.pairwise.AIPC(epsilon=0.0)[source]

Active Informative Pairwise Constraint Formulation algorithm [1].

Selects pairwise constraints between strong and weak samples, where sample strength is defined by Shannon entropy computed from a fuzzy clustering.

Parameters

epsilon : float, default=0.0

Threshold value for the entropy of weak samples. By default, epsilon is computed during fit.

Attributes

epsilon : float

Threshold value for the entropy of weak samples. The value can be given at initialization, otherwise it is computed during fit.

fuzzy_partition : array-like

Soft partition of the dataset computed by fuzzy c-means clustering.

References

csts_to_file(constraints, filename='constraints')

Write the contents of a constraint dictionary into a text file.

Parameters

constraints : dict of list

Constraints to write.

filename : string, default="constraints"

Name of the output file.
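
An illustrative sketch of persisting a constraint set with csts_to_file. The dictionary keys and pair format shown here are assumptions made for the example, not guaranteed by the library; adapt them to what fit actually returns.

>>> from skquery.pairwise import AIPC
>>> qs = AIPC()
>>> # Hypothetical constraint dictionary of index pairs (keys are an assumption).
>>> constraints = {"ml": [(0, 1), (2, 3)], "cl": [(0, 4)]}
>>> qs.csts_to_file(constraints, filename="my_constraints")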

fit(X, oracle, partition=None, n_clusters=None)[source]

Select pairwise constraints with AIPC.

Parameters

X : array-like

Instances to use for query.

oracle : callable

Source of background knowledge able to answer the queries.

partition : array-like, default=None

Existing partition of the data. Used to define the number of clusters.

n_clusters : int, default=None

Number of clusters to find.

Returns

constraints : dict of lists

ML and CL constraints between strong and weak samples.
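
A minimal usage sketch. The oracle below is a plain callable answering from ground-truth labels; this interface is an assumption made for illustration, and the library may expect a dedicated oracle object.

>>> from sklearn.datasets import make_blobs
>>> from skquery.pairwise import AIPC
>>> X, y = make_blobs(n_samples=150, centers=3, random_state=0)
>>> oracle = lambda i, j: y[i] == y[j]  # assumed interface: answers whether i and j share a cluster
>>> qs = AIPC()                         # epsilon will be computed during fit
>>> constraints = qs.fit(X, oracle, n_clusters=3)  # dict of ML/CL constraints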

class skquery.pairwise.FFQS(neighborhoods=None, distances=None)[source]

Farthest First Query Selection [1].

Explores the data with farthest-first traversal to discover the cluster structure, then consolidates the skeleton with random selection.

Parameters

neighborhoods : list of lists, default=None

Optional precomputed neighborhoods of the data. This can be used as a warm start, e.g. to skip the Explore phase.

distances : array-like, default=None

Euclidean pairwise distance matrix of the data. If none given, it will be computed during fit.

Attributes

neighborhoods : list of lists

Points whose cluster assignment has been certified by the oracle's answers to the queries. Each list contains points belonging to the same cluster.

p_dists : array-like

Euclidean pairwise distance matrix of the data.

unknown : set

Points whose assignment has been queried but remains unknown, i.e. the oracle couldn't answer the query.

References

[1] Basu, S., Banerjee, A., Mooney, R. Active Semi-Supervision for Pairwise Constrained Clustering. 2004. Proceedings of the 2004 SIAM International Conference on Data Mining. ISBN 978-1-61197-274-0, pp. 333-344.

csts_to_file(constraints, filename='constraints')

Write the contents of a constraint dictionary into a text file.

Parameters

constraints : dict of list

Constraints to write.

filename : string, default="constraints"

Name of the output file.

fit(X, oracle, partition=None, n_clusters=None)[source]

Select pairwise constraints with the FFQS scheme.

Parameters

X : array-like

Instances to use for query.

oracle : callable

Source of background knowledge able to answer the queries.

partition : array-like, default=None

Existing partition of the data that provides the number of clusters to find. It has priority over n_clusters, i.e. n_clusters will not be taken into account if a partition is passed.

n_clusters : int, default=None

Number of clusters to find. If none given, only the Explore phase will be used.

Returns

constraints : dict of lists

ML and CL constraints derived from the neighborhoods.
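
A usage sketch along the same lines (the plain-callable oracle is again an assumption made for illustration):

>>> from sklearn.datasets import make_blobs
>>> from skquery.pairwise import FFQS
>>> X, y = make_blobs(n_samples=150, centers=3, random_state=0)
>>> oracle = lambda i, j: y[i] == y[j]             # assumed oracle interface
>>> qs = FFQS()
>>> constraints = qs.fit(X, oracle, n_clusters=3)  # Explore, then Consolidate
>>> skeleton = qs.neighborhoods                    # one list of certified points per discovered cluster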

get_constraints_from_neighborhoods()[source]

Derive constraints from the neighborhood structure: ML between all pairs of points in the same neighborhood and CL between all pairs of points in separate neighborhoods.

Returns

ml, cl : list of tuples

Pairwise constraints derived from the neighborhoods.
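
The derivation itself is plain pair enumeration. A standalone sketch of the rule described above, independent of the library's internal representation (the function name and layout are illustrative):

from itertools import combinations, product

def constraints_from_neighborhoods(neighborhoods):
    # ML between all pairs of points within the same neighborhood
    ml = [pair for hood in neighborhoods for pair in combinations(hood, 2)]
    # CL between all pairs of points taken from two different neighborhoods
    cl = [pair
          for hood_a, hood_b in combinations(neighborhoods, 2)
          for pair in product(hood_a, hood_b)]
    return ml, cl

# constraints_from_neighborhoods([[0, 1], [2]]) -> ([(0, 1)], [(0, 2), (1, 2)])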

class skquery.pairwise.MinMax(neighborhoods=None, distances=None)[source]

Min-max Farthest First Query Selection [1].

FFQS with a modified Consolidate step that uses a similarity criterion for constraint selection.

Parameters

neighborhoods : list of lists, default=None

Optional precomputed neighborhoods of the data. This can be used as a warm start, e.g. to skip the Explore phase.

distances : array-like, default=None

Euclidean pairwise distance matrix of the data. If none given, it will be computed during fit.

Attributes

neighborhoods : list of lists

Points whose cluster assignment has been certified by the oracle's answers to the queries. Each list contains points belonging to the same cluster.

p_dists : array-like

Euclidean pairwise distance matrix of the data.

unknown : set

Points whose assignment has been queried but remains unknown, i.e. the oracle couldn't answer the query.

References

csts_to_file(constraints, filename='constraints')

Write the contents of a constraint dictionary into a text file.

Parameters

constraints : dict of list

Constraints to write.

filename : string, default="constraints"

Name of the output file.

fit(X, oracle, partition=None, n_clusters=None)

Select pairwise constraints with the FFQS scheme.

Parameters

X : array-like

Instances to use for query.

oracle : callable

Source of background knowledge able to answer the queries.

partition : array-like, default=None

Existing partition of the data that provides the number of clusters to find. It has priority over n_clusters, i.e. n_clusters will not be taken into account if a partition is passed.

n_clusters : int, default=None

Number of clusters to find. If none given, only the Explore phase will be used.

Returns

constraints : dict of lists

ML and CL constraints derived from the neighborhoods.

get_constraints_from_neighborhoods()

Derive constraints from the neighborhood structure: ML between all pairs of points in the same neighborhood and CL between all pairs of points in separate neighborhoods.

Returns

ml, cl : list of tuples

Pairwise constraints derived from the neighborhoods.
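
A usage sketch that also passes a precomputed distance matrix, as allowed by the distances parameter (the plain-callable oracle remains an assumption made for illustration):

>>> from sklearn.datasets import make_blobs
>>> from sklearn.metrics import pairwise_distances
>>> from skquery.pairwise import MinMax
>>> X, y = make_blobs(n_samples=150, centers=3, random_state=0)
>>> oracle = lambda i, j: y[i] == y[j]       # assumed oracle interface
>>> dists = pairwise_distances(X)            # Euclidean by default
>>> qs = MinMax(distances=dists)             # avoids recomputing the matrix during fit
>>> constraints = qs.fit(X, oracle, n_clusters=3)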

class skquery.pairwise.NPU(neighborhoods=None, cc_alg=None)[source]

Normalized Point-based Uncertainty algorithm [1].

Uncertainty sampling using Shannon entropy based on an existing partition, normalized by the expected number of queries for a given point.

Parameters

neighborhoods : list of lists, default=None

Optional precomputed neighborhoods of the data. This can be used as a warm start, e.g. for incremental constrained clustering.

cc_alg : callable, default=None

Constrained clustering algorithm to use in tandem with the selection procedure. Not needed if a partition is given during fit.

Attributes

partition : array-like

Partition of the dataset.

neighborhoods : list of lists

Points whose cluster assignment has been certified by the oracle's answers to the queries. Each list contains points belonging to the same cluster.

cc_alg : callable

Constrained clustering algorithm used to compute a partition using the selected constraints.

unknown : set

Points whose assignment has been queried but remains unknown, i.e. the oracle couldn't answer the query.

Notes

This implementation of NPU works either with a constrained clustering algorithm or with a partition passed as an argument to the fit method. In the former case, only algorithms from the active-semi-supervised-clustering library are supported.

References

csts_to_file(constraints, filename='constraints')

Write the contents of a constraint dictionary into a text file.

Parameters

constraints : dict of list

Constraints to write.

filename : string, default="constraints"

Name of the output file.

fit(X, oracle, partition=None, n_clusters=None)[source]

Select pairwise constraints with NPU.

Parameters

X : array-like

Instances to use for query.

oracle : callable

Source of background knowledge able to answer the queries.

partition : array-like, default=None

Existing partition of the data. Not used if a clustering algorithm has been defined at init.

n_clusters : Ignored

Not used, present for API consistency.

Returns

constraints : dict of lists

ML and CL constraints derived from the neighborhoods.
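
As noted above, NPU can be driven by a precomputed partition instead of a constrained clustering algorithm. A sketch under the same plain-callable oracle assumption:

>>> from sklearn.cluster import KMeans
>>> from sklearn.datasets import make_blobs
>>> from skquery.pairwise import NPU
>>> X, y = make_blobs(n_samples=150, centers=3, random_state=0)
>>> oracle = lambda i, j: y[i] == y[j]   # assumed oracle interface
>>> labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
>>> qs = NPU()                           # no constrained clustering algorithm given
>>> constraints = qs.fit(X, oracle, partition=labels)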

class skquery.pairwise.RandomMLCL[source]

Random sampling of pairwise constraints.

csts_to_file(constraints, filename='constraints')

Write the contents of a constraint dictionary into a text file.

Parameters

constraints : dict of list

Constraints to write.

filename : string, default="constraints"

Name of the output file.

fit(X, oracle, partition=None, n_clusters=None)[source]

Select pairwise constraints at random.

Parameters

X : array-like

Instances to use for query.

oracle : callable

Source of background knowledge able to answer the queries.

partition : Ignored

Not used, present for API consistency.

n_clusters : Ignored

Not used, present for API consistency.
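
A sketch of the random baseline (same plain-callable oracle assumption as in the other examples):

>>> from sklearn.datasets import make_blobs
>>> from skquery.pairwise import RandomMLCL
>>> X, y = make_blobs(n_samples=150, centers=3, random_state=0)
>>> oracle = lambda i, j: y[i] == y[j]   # assumed oracle interface
>>> qs = RandomMLCL()
>>> constraints = qs.fit(X, oracle)      # partition and n_clusters are ignored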

class skquery.pairwise.SASC(alpha=0.5)[source]

Sequential approach for Active Constraint Selection algorithm [1].

Selects a subset of boundary instances with Support Vector Data Description (SVDD), then computes individual constraint utility and sequential utility of constraints.

Parameters

alpha : float, default=0.5

Proportion of constraints selected based on Assumption A.

Attributes

alpha : float

Proportion of constraints selected based on Assumption A.

support_vectors : array-like

Indexes of support vectors found through SVDD.

p_dists_sv : array-like

Euclidean pairwise distance matrix of support vectors.

References

csts_to_file(constraints, filename='constraints')

Write the contents of a constraint dictionary into a text file.

Parameters

constraints : dict of list

Constraints to write.

filename : string, default="constraints"

Name of the output file.

fit(X, oracle, partition=None, n_clusters=None)[source]

Select pairwise constraints with SASC.

Parameters

X : array-like

Instances to use for query.

oracle : callable

Source of background knowledge able to answer the queries.

partition : Ignored

Not used, present for API consistency.

n_clusters : Ignored

Not used, present for API consistency.

Returns

constraints : dict of lists

ML and CL sequentially selected constraints.
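
A usage sketch (the plain-callable oracle is an assumption made for illustration):

>>> from sklearn.datasets import make_blobs
>>> from skquery.pairwise import SASC
>>> X, y = make_blobs(n_samples=150, centers=3, random_state=0)
>>> oracle = lambda i, j: y[i] == y[j]   # assumed oracle interface
>>> qs = SASC(alpha=0.5)                 # half of the constraints selected under Assumption A
>>> constraints = qs.fit(X, oracle)      # support vectors found by SVDD are in qs.support_vectors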

is_between(X, a, b, c)[source]

u_t(c_t, u_t_ij, cl, ml, link)[source]

The u_t function handles constraint selection based on the utility measures computed by the u_indA and u_indB functions.

In summary, the function works as follows:

1. It first identifies the constraint with the highest utility in the utility matrix u_t_ij.

2. It then checks whether this pair has already been selected. If not, it adds the pair to the set of selected constraints c_t, as well as to the appropriate list of must-link (ml) or cannot-link (cl) constraints, depending on the link parameter.

3. After adding a new constraint, it updates the utility value of that constraint in the utility matrix u_t_ij so that it cannot be selected again.

4. Finally, it updates the utility values of all remaining constraints in u_t_ij according to their distance to the set of constraints already selected. This update multiplies the current utility of each constraint by the minimum distance between that constraint and the set of already selected constraints (via the djc_t() method), and divides the result by the maximum distance among all constraints.

In short, the function selects constraints one by one according to their utility, updating the utility of the remaining constraints at each step to account for the information already captured by the selected constraints.
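
A standalone sketch of this greedy loop, with a hypothetical pair_distance function standing in for djc_t() (names and data layout are illustrative assumptions, not the method's actual internals; the ml/cl bookkeeping is omitted):

import numpy as np

def greedy_select(utilities, pairs, n_constraints, pair_distance):
    """Greedily pick constraint pairs by utility, down-weighting candidates
    that are close to pairs already selected."""
    u = np.asarray(utilities, dtype=float).copy()
    selected = []
    for _ in range(n_constraints):
        remaining = [i for i in range(len(pairs)) if i not in selected]
        best = max(remaining, key=lambda i: u[i])   # step 1: highest-utility constraint
        selected.append(best)                       # step 2: add it to the selected set
        # steps 3-4: rescale utilities by the minimum distance to the selected set,
        # so constraints close to what is already selected become less attractive
        d = np.array([min(pair_distance(pairs[i], pairs[j]) for j in selected)
                      for i in range(len(pairs))])
        d_max = d.max() if d.max() > 0 else 1.0
        u = u * d / d_max                           # already-selected pairs drop to 0 utility
    return [pairs[i] for i in selected]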