Informative subset selection (skquery.select)
This module groups preprocessing methods that select informative subsets of a dataset, on which query strategies can then be applied.
- class skquery.select.EntropySelection(m)
Selects the most informative samples according to the entropy of their soft partition. The strategy is based on Chen and Jin (2020).
- csts_to_file(constraints, filename='constraints')
Write the contents of a constraint dictionary into a text file.
Parameters
- constraints: dict of list
Constraints to write.
- filename: string, default="constraints"
Name of the output file.
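The on-disk format produced by `csts_to_file` is not documented here. The sketch below assumes that the constraint dictionary maps a constraint type (for example `"ml"` for must-link and `"cl"` for cannot-link) to a list of sample-index pairs, and writes one pair per line. Both the line format and the helper names `write_constraints` and `read_constraints` are hypothetical, shown only to illustrate a dict-of-lists round trip through a text file.

```python
import os
import tempfile


def write_constraints(constraints, filename="constraints"):
    # Hypothetical format (not skquery's): one "i j kind" line per constrained pair
    with open(filename, "w") as f:
        for kind, pairs in constraints.items():
            for i, j in pairs:
                f.write(f"{i} {j} {kind}\n")


def read_constraints(filename="constraints"):
    # Inverse of write_constraints: rebuild the dict of lists of index pairs
    constraints = {}
    with open(filename) as f:
        for line in f:
            i, j, kind = line.split()
            constraints.setdefault(kind, []).append((int(i), int(j)))
    return constraints


# Round trip through a temporary directory so no file is left in the working directory
csts = {"ml": [(0, 1), (2, 3)], "cl": [(0, 4)]}
path = os.path.join(tempfile.mkdtemp(), "constraints")
write_constraints(csts, path)
print(read_constraints(path))
```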
- select(X, partition, return_best=False, fuzziness=2, tolerance=1e-05, max_iter=100)
Selects the most informative samples according to the entropy of their soft partition.
Parameters
- X: array-like
Dataset to select from.
- partition: array-like
Partition of the dataset.
- return_best: bool, default=False
Whether to return the best sample along with the set of selected samples.
- fuzziness: float, default=2
Fuzziness parameter for the fuzzy c-means algorithm.
- tolerance: float, default=10**-5
Tolerance parameter for the fuzzy c-means algorithm.
- max_iter: int, default=100
Maximum number of iterations for the fuzzy c-means algorithm.
Returns
- selected: list
List of selected samples.
- best: int
Index of the best sample, if return_best is True.
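A minimal pure-Python sketch of entropy-based selection, assuming that the soft partition comes from fuzzy c-means initialised at the cluster means of `partition`, and that a sample's informativeness is the Shannon entropy of its membership vector. The helper names (`fcm_memberships`, `fuzzy_c_means`, `entropy_select`), the initialisation, and the reading of `m` as "number of samples to return" are assumptions; this is not skquery's implementation. The membership update divides by `fuzziness - 1`, so `fuzziness` must be greater than 1.

```python
import math


def _dist(a, b):
    # Euclidean distance between two points given as sequences of numbers
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fcm_memberships(X, centers, fuzziness):
    # Fuzzy c-means membership: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (fuzziness - 1))
    power = 2.0 / (fuzziness - 1.0)
    U = []
    for x in X:
        d = [_dist(x, c) for c in centers]
        if any(dj == 0.0 for dj in d):
            # The point coincides with a center: give that center all its membership
            row = [1.0 if dj == 0.0 else 0.0 for dj in d]
        else:
            row = [1.0 / sum((dj / dl) ** power for dl in d) for dj in d]
        total = sum(row)
        U.append([u / total for u in row])  # renormalise so each row sums to 1
    return U


def fuzzy_c_means(X, partition, fuzziness=2.0, tolerance=1e-5, max_iter=100):
    # Hypothetical initialisation: one center per cluster label, at the mean of its members
    labels = sorted(set(partition))
    dim = len(X[0])
    centers = []
    for lab in labels:
        members = [x for x, p in zip(X, partition) if p == lab]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    for _ in range(max_iter):
        U = fcm_memberships(X, centers, fuzziness)
        new_centers = []
        for j in range(len(centers)):
            # Center update: mean of all points weighted by u_ij ** fuzziness
            w = [row[j] ** fuzziness for row in U]
            total = sum(w)
            new_centers.append([sum(wi * x[k] for wi, x in zip(w, X)) / total for k in range(dim)])
        shift = max(_dist(c, nc) for c, nc in zip(centers, new_centers))
        centers = new_centers
        if shift < tolerance:  # stop once the centers have stabilised
            break
    return fcm_memberships(X, centers, fuzziness)


def entropy_select(X, partition, m, return_best=False, fuzziness=2.0, tolerance=1e-5, max_iter=100):
    # Hypothetical stand-in for EntropySelection.select; m = number of samples returned (assumption)
    U = fuzzy_c_means(X, partition, fuzziness, tolerance, max_iter)
    # Shannon entropy of each membership vector (0 * log 0 is treated as 0)
    H = [-sum(u * math.log(u) for u in row if u > 0) for row in U]
    # Highest entropy = most ambiguous cluster assignment = ranked first
    order = sorted(range(len(X)), key=lambda i: H[i], reverse=True)
    selected = order[:m]
    if return_best:
        return selected, selected[0]
    return selected
```

On two well-separated groups plus one point halfway between them, the halfway point has nearly equal memberships in both clusters and is therefore ranked first.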
- class skquery.select.NearestNeighborsSelection(m, distances=None)
- csts_to_file(constraints, filename='constraints')
Write the contents of a constraint dictionary into a text file.
Parameters
- constraints: dict of list
Constraints to write.
- filename: string, default="constraints"
Name of the output file.
- select(X, partition, return_best=False)
Selects the most informative samples according to the Cai et al. strategy.
Parameters
- X: array-like
Dataset to select from.
- partition: array-like
Partition of the dataset.
- return_best: bool, default=False
Whether to return the best sample along with the set of selected samples.
Returns
- selected: list
List of selected samples.
- best: int
Index of the best sample, if return_best is True.
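The Cai et al. criterion is not described in this reference, so the sketch below swaps in a simpler and plainly different heuristic, only to illustrate how an optional precomputed `distances` matrix can drive nearest-neighbour-based selection: each sample is scored by the fraction of its `k` nearest neighbours that belong to a different cluster, and the highest-scoring samples are returned. The function name `knn_disagreement_select`, the `k` parameter, and the scoring rule are assumptions, not the library's method.

```python
import math


def pairwise_distances(X):
    # Full Euclidean distance matrix, computed only when none is supplied
    return [[math.dist(a, b) for b in X] for a in X]


def knn_disagreement_select(X, partition, m, k=3, distances=None, return_best=False):
    # Hypothetical heuristic, NOT the Cai et al. strategy used by NearestNeighborsSelection
    if distances is None:
        distances = pairwise_distances(X)
    n = len(X)
    scores = []
    for i in range(n):
        # k nearest neighbours of sample i, excluding i itself (ties keep index order)
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: distances[i][j])[:k]
        # Fraction of those neighbours assigned to a different cluster than i
        scores.append(sum(partition[j] != partition[i] for j in neighbours) / len(neighbours))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    selected = order[:m]
    return (selected, selected[0]) if return_best else selected
```

Accepting a precomputed matrix avoids recomputing all pairwise distances when selection is run repeatedly on the same dataset.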
- class skquery.select.RandomSelection(m)
Random selection of points.
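A minimal sketch of uniform random selection, assuming `m` is the number of sample indices to draw without replacement. The name `random_select` and its `seed` argument are hypothetical; fixing the seed makes the draw reproducible.

```python
import random


def random_select(X, m, seed=None):
    # Hypothetical helper: draw m distinct indices uniformly at random, without replacement
    rng = random.Random(seed)
    return rng.sample(range(len(X)), m)
```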