Informative subset selection (skquery.select)

This module groups preprocessing methods that select informative subsets of a dataset, to which query strategies can then be applied.

class skquery.select.EntropySelection(m)

Selects the most informative samples according to the entropy of their soft partition. Strategy based on Chen and Jin (2020).
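
As an illustration of the idea, the sketch below ranks samples by the Shannon entropy of their cluster membership degrees and keeps the highest-ranked ones. It is not skquery's implementation: the function names are hypothetical, the soft partition is supplied by hand, and `m` is assumed to be the number of samples to keep.

```python
import math

def membership_entropy(memberships):
    """Shannon entropy of one sample's cluster membership degrees."""
    return -sum(u * math.log(u) for u in memberships if u > 0)

def select_by_entropy(soft_partition, m):
    """Return the indices of the m samples with the highest membership entropy."""
    entropies = [membership_entropy(row) for row in soft_partition]
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return ranked[:m]

# Each row holds one sample's membership degrees over two clusters (hypothetical data).
soft_partition = [
    [0.98, 0.02],  # confidently in cluster 0: low entropy
    [0.50, 0.50],  # maximally ambiguous: highest entropy
    [0.10, 0.90],
    [0.60, 0.40],
]
selected = select_by_entropy(soft_partition, m=2)
print(selected)
```

Samples whose memberships are spread evenly across clusters score highest, which is why they are treated as the most informative ones to query.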

csts_to_file(constraints, filename='constraints')

Write the contents of a constraint dictionary into a text file.

Parameters

constraints: dict of list

Constraints to write.

filename: string, default="constraints"

Name of the output file.
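
The on-disk layout is not specified here, so the following is only a sketch of what writing a constraint dictionary to a text file could look like. The function name, the "ml"/"cl" keys, and the one-constraint-per-line layout are all assumptions made for illustration, not the format produced by csts_to_file.

```python
import os
import tempfile

def write_constraints(constraints, filename="constraints"):
    """Write one constraint per line as '<type> <index> <index> ...' (assumed layout)."""
    with open(filename, "w") as f:
        for kind, items in constraints.items():
            for item in items:
                f.write(kind + " " + " ".join(str(i) for i in item) + "\n")

# Hypothetical constraint dictionary: each key maps to a list of index tuples.
constraints = {"ml": [(0, 1), (2, 3)], "cl": [(0, 4)]}

path = os.path.join(tempfile.mkdtemp(), "constraints")
write_constraints(constraints, path)

with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```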

select(X, partition, return_best=False, fuzziness=2, tolerance=1e-05, max_iter=100)

Selects the most informative samples according to the entropy of their soft partition.

Parameters

X: array-like

Dataset to select from.

partition: array-like

Partition of the dataset.

return_best: bool, default=False

Whether to return the best sample along with the set of selected samples.

fuzziness: float, default=2

Fuzziness parameter for the fuzzy c-means algorithm.

tolerance: float, default=1e-05

Tolerance parameter for the fuzzy c-means algorithm.

max_iter: int, default=100

Maximum number of iterations for the fuzzy c-means algorithm.

Returns

selected: list

List of selected samples.

best: int

Index of the best sample, if return_best is True.
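
To show what fuzziness, tolerance and max_iter typically control, here is a minimal pure-Python sketch of the standard fuzzy c-means iteration. It is not skquery's code: the function names are hypothetical, and initialising the centers from the means of the hard partition is an assumption of this sketch.

```python
import math

def _memberships(x, centers, exponent):
    """Membership degrees of one point: u_j = 1 / sum_k (d_j / d_k) ** exponent."""
    dists = [math.dist(x, c) for c in centers]
    if min(dists) == 0:
        # The point sits exactly on one or more centers: share membership among them.
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]

def fuzzy_c_means(X, partition, fuzziness=2, tolerance=1e-5, max_iter=100):
    """Return an n x c membership matrix (list of rows) for the points in X."""
    dim = len(X[0])
    # Assumption of this sketch: initial centers are the means of the hard clusters.
    centers = []
    for label in sorted(set(partition)):
        members = [x for x, p in zip(X, partition) if p == label]
        centers.append([sum(x[d] for x in members) / len(members) for d in range(dim)])

    exponent = 2.0 / (fuzziness - 1)  # requires fuzziness > 1
    U = None
    for _ in range(max_iter):
        new_U = [_memberships(x, centers, exponent) for x in X]
        # Converged once no membership degree changed by `tolerance` or more.
        converged = U is not None and max(
            abs(a - b) for new, old in zip(new_U, U) for a, b in zip(new, old)
        ) < tolerance
        U = new_U
        if converged:
            break
        # Center update: mean of the points weighted by u ** fuzziness.
        weights = [[u ** fuzziness for u in row] for row in U]
        centers = [
            [sum(w[j] * x[d] for w, x in zip(weights, X)) / sum(w[j] for w in weights)
             for d in range(dim)]
            for j in range(len(centers))
        ]
    return U

# Two tight groups plus one point halfway between them (hypothetical data).
X = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9], [2.0, 2.0]]
partition = [0, 0, 1, 1, 0]
U = fuzzy_c_means(X, partition)
most_ambiguous = min(range(len(X)), key=lambda i: max(U[i]))
```

A larger fuzziness flattens the memberships toward uniform, while tolerance and max_iter bound how long the alternating updates run. In this sketch the in-between point ends up with the least decisive memberships, which is the kind of sample an entropy-based selection would favour.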

class skquery.select.NearestNeighborsSelection(m, distances=None)

csts_to_file(constraints, filename='constraints')

Write the contents of a constraint dictionary into a text file.

Parameters

constraints: dict of list

Constraints to write.

filename: string, default="constraints"

Name of the output file.

select(X, partition, return_best=False)

Selects the most informative samples according to the Cai et al. strategy.

Parameters

X: array-like

Dataset to select from.

partition: array-like

Partition of the dataset.

return_best: bool, default=False

Whether to return the best sample along with the set of selected samples.

Returns

selected: list

List of selected samples.

best: int

Index of the best sample, if return_best is True.

class skquery.select.RandomSelection(m)

Random selection of points.

csts_to_file(constraints, filename='constraints')

Write the contents of a constraint dictionary into a text file.

Parameters

constraints: dict of list

Constraints to write.

filename: string, default="constraints"

Name of the output file.

select(X, partition=None, return_best=False)

Selects a random subset of the dataset.

Parameters

X: array-like

Dataset to select from.

partition: Ignored

Not used, present for API consistency.

Returns

selected: list

List of selected samples.
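
For comparison with the informed strategies, random selection can be sketched in a few lines. The function name and the seed argument are hypothetical, and the sketch assumes that `m` is the subset size and that the selection is returned as a list of sample indices.

```python
import random

def random_selection(X, m, seed=None):
    """Return the sorted indices of m distinct samples drawn uniformly at random."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    return sorted(rng.sample(range(len(X)), m))

X = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9], [2.0, 2.0]]
selected = random_selection(X, m=3, seed=0)
```

Passing the same seed reproduces the same subset, which is convenient when random selection serves as a baseline in experiments.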