Releases · webis-de/small-text

17 Aug 13:41

v2.0.0.dev3 (Latest)

cdfdb6e

This intermediate release serves as a preliminary version of the upcoming v2.0.0. Consider it an alpha release, where interface changes are still possible.

Because this release overlaps with the previous v2.0.0.dev* releases, its changes are not listed here; please refer to the CHANGELOG file instead.

25 May 14:21

chschroeder

v2.0.0.dev2

d6b2ee3

This intermediate release serves as a preliminary version of the upcoming v2.0.0. Consider it an alpha release, where interface changes are still possible.

Because this release overlaps with v2.0.0.dev1, its changes are not listed here; please refer to the CHANGELOG file instead.

24 Nov 19:15

chschroeder

v2.0.0.dev1

a801c75

This intermediate release serves as a preliminary version of the upcoming v2.0.0. Consider it an alpha release, where interface changes are still possible.

Added

General
- The minimum required Python version has been raised to 3.8, since Python 3.7 reached end of life on 2023-06-27.
- Dropped torchtext as an integration dependency. For individual use cases it can of course still be used.
- Added environment variables SMALL_TEXT_PROGRESS_BARS and SMALL_TEXT_OFFLINE to control the default behavior for progress bars and model downloading.
PoolBasedActiveLearner:
- initialize_data() has been replaced by initialize() which can now also be used to provide an initial model in cold start scenarios. (#10)
Classification:
- All PyTorch classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) now support torch.compile(), which can be enabled on demand (requires PyTorch >= 2.0.0).
- All PyTorch classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) now support Automatic Mixed Precision.
- SetFitClassification.__init__() now has a verbosity parameter (similar to TransformerBasedClassification) through which you can control the progress bar output of SetFitClassification.fit().
- TransformerBasedClassification:
  - Removed unnecessary token_type_ids keyword argument in model call.
  - Additional keyword args for config, tokenizer, and model can now be configured.
Embeddings:
- Prevented unnecessary gradient computations for some embedding types and unified code structure.
PyTorch:
- Added an inference_mode() context manager that applies torch.inference_mode, or torch.no_grad for older PyTorch versions.
Query Strategies:
- New strategies: DiscriminativeRepresentationLearning, LabelCardinalityInconsistency, ClassBalancer, and ProbCover.
- Query strategies now have a tie-breaking mechanism that randomly permutes candidates with tied scores.
- Added ScoringMixin to enable a reusable scoring mechanism for query strategies.
- LightweightCoreset can now process input in batches. (#23)
Vector Index Functionality:
- A new vector index API provides a unified interface over different k-nearest neighbor search implementations.
- Existing strategies that used a hard-coded vector search (ContrastiveActiveLearning, SEALS, AnchorSubsampling) have been adapted and can now be used with different vector index implementations.
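Random tie-breaking of query scores can be illustrated in plain Python. This is a minimal sketch, not small-text's implementation: the helper name `select_top_k_with_ties` is hypothetical, and it assumes that higher scores are preferred. It shuffles the candidate indices with a seeded RNG and then relies on Python's stable sort, so candidates with equal scores end up in a random relative order.

```python
import random


def select_top_k_with_ties(scores, k, seed=None):
    """Return the indices of the k highest scores, breaking ties randomly.

    Hypothetical helper (not part of small-text). The shuffle is the random
    permutation; the subsequent sort is stable (also with reverse=True), so
    tied indices keep their shuffled order.
    """
    rng = random.Random(seed)
    indices = list(range(len(scores)))
    rng.shuffle(indices)  # random permutation decides the order among ties
    indices.sort(key=lambda i: scores[i], reverse=True)
    return indices[:k]
```

For example, `select_top_k_with_ties([0.9, 0.5, 0.9, 0.1], k=2)` always returns indices 0 and 2, but their order depends on the seed.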

Fixed

Fixed a bug where the clone() operation wrapped the labels, which then raised an error. This affected the single-label scenario for PytorchTextClassificationDataset and TransformersDataset. (#35)
Fixed a bug where the batching in greedy_coreset() and lightweight_coreset() resulted in incorrect batch sizes. (#50)
Fixed a bug where lightweight_coreset() failed when computing the norm of the elementwise mean vector.
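For context on the batching fix, the greedy (k-center) coreset selection with batch-wise distance updates can be sketched as follows. This is a pure-Python illustration under our own assumptions (function names, a list-of-vectors input, Euclidean distance) and does not reproduce the signature of small-text's greedy_coreset(). The `batched` helper shows the intended batch boundaries: every batch has exactly `batch_size` items except possibly the last one.

```python
import math


def batched(items, batch_size):
    """Yield consecutive chunks of at most batch_size items; only the last may be shorter."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def greedy_coreset(points, labeled, n, batch_size=64):
    """k-center greedy selection of up to n unlabeled points (hypothetical sketch).

    points: list of equal-length numeric vectors; labeled: indices already labeled.
    The distance of each unlabeled point to its nearest center is updated batch
    by batch; a real implementation would vectorize the work within each batch.
    """
    labeled_set = set(labeled)
    unlabeled = [i for i in range(len(points)) if i not in labeled_set]
    min_dist = {i: math.inf for i in unlabeled}

    def add_center(center):
        for batch in batched(unlabeled, batch_size):
            for i in batch:
                d = math.dist(points[i], points[center])
                min_dist[i] = min(min_dist[i], d)

    for center in labeled:
        add_center(center)

    selected = []
    for _ in range(min(n, len(unlabeled))):
        # pick the unlabeled point farthest from all current centers
        candidates = [i for i in unlabeled if i not in selected]
        best = max(candidates, key=lambda i: min_dist[i])
        selected.append(best)
        add_center(best)
    return selected
```

With points at 0, 1, 2 and 10 on a line and index 0 labeled, the sketch first selects the point at 10 and then the point at 2.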

Changed

General
- Moved split_data() method from small_text.data.datasets to small_text.data.splits.
Dependencies
- Raised setfit version to 1.1.0.
Classification:
- The initialize() methods of all PyTorch classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) are now more unified. (#57)
- KimCNNClassifier / TransformerBasedClassification: model selection is now disabled by default. When disabled, models are no longer saved, which greatly reduces the runtime.
Utils
- init_kmeans_plusplus_safe() now supports weighted kmeans++ initialization for scikit-learn>=1.3.0.
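Weighted k-means++ seeding draws the first center proportionally to the sample weights and each further center proportionally to weight times squared distance to the nearest chosen center. The scikit-learn version bound suggests that init_kmeans_plusplus_safe() relies on scikit-learn's own kmeans_plusplus; the following is only a self-contained pure-Python sketch of the technique, with a hypothetical function name and signature.

```python
import math
import random


def weighted_kmeans_plusplus(points, weights, k, seed=None):
    """Return indices of up to k initial centers (hypothetical sketch).

    Points with weight 0 are never chosen as centers.
    """
    rng = random.Random(seed)
    n = len(points)
    # first center: drawn proportionally to the sample weights alone
    first = rng.choices(range(n), weights=weights, k=1)[0]
    centers = [first]
    sq_dist = [math.dist(p, points[first]) ** 2 for p in points]
    while len(centers) < k:
        # D^2 sampling, scaled by each point's weight
        probs = [w * d for w, d in zip(weights, sq_dist)]
        if sum(probs) == 0:
            break  # every positively weighted point already coincides with a center
        c = rng.choices(range(n), weights=probs, k=1)[0]
        centers.append(c)
        sq_dist = [min(d, math.dist(p, points[c]) ** 2) for d, p in zip(sq_dist, points)]
    return centers
```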

Removed

Deprecated functionality
- Removed default_tensor_type() method.
- Removed small_text.utils.labels.get_flattened_unique_labels().
- Removed small_text.integrations.pytorch.utils.labels.get_flattened_unique_labels().
- Classification
  - Removed early stopping legacy arguments in __init__() for KimCNN and TransformerBasedClassification. (Use fit() keyword arguments instead.)
  - Removed model selection legacy argument in TransformerBasedClassification.__init__().
The explicit installation instructions for conda were removed, but the small-text conda-forge package will remain available.

18 Aug 16:02

chschroeder

v1.4.1

c8a19ba

Bugfix release.

Fixed

Fixed an out-of-bounds error that occurred when DiscriminativeActiveLearning queried all remaining unlabeled data.
Fixed typos/wording in PoolBasedActiveLearner docstrings.
Pinned SetFit version in notebook example. (#64)
Fixed an out-of-bounds error that could occur in SetFitClassification on both 32-bit systems and Windows. (#66)
Fixed errors in notebook examples that occurred with more recent seaborn / matplotlib versions.

Changed

Documentation: added links to bibliography. (#65)

09 Jun 12:14

chschroeder

v1.4.0

b22b200

Fixes SetFit seed control and adds the AnchorSubsampling query strategy.

Added

New query strategy: AnchorSubsampling.

Fixed

Changed how the seed is controlled in SetFitClassification: previously, the seed was fixed unless it was explicitly set via the respective trainer keyword argument.

Changed

Documentation: Added a section where compatible transformer models are listed.
Documentation: Updated showcase section.

29 Dec 21:23

chschroeder

v1.3.3

53b22ad

Bugfix release.

Changed

An errata section was added to the documentation.

Fixed

Fixed a deviation from the paper: DeltaFScore also considered negative label predictions when computing the agreement. (#51)
Fixed a bug in KappaAverage that affected the stopping behavior. (#52)
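To illustrate the DeltaFScore fix: the ΔF stopping criterion measures agreement between the predictions of two successive models as an F-measure over positive predictions, so instances that both models label negative do not contribute. The sketch below covers binary predictions (1 = positive) and adds Cohen's kappa, the agreement measure that KappaAverage builds on, for comparison. Function names are hypothetical, this is not small-text's code, and returning 0.0 when neither model predicts any positive is our own assumption.

```python
def delta_f(prev, curr):
    """1 - F-agreement between two binary prediction lists.

    a = both positive, disagreements = b + c; both-negative cases are ignored.
    """
    a = sum(1 for p, c in zip(prev, curr) if p == 1 and c == 1)
    disagreements = sum(1 for p, c in zip(prev, curr) if p != c)
    if a + disagreements == 0:
        return 0.0  # assumption: no positive predictions at all counts as full agreement
    return 1.0 - (2 * a) / (2 * a + disagreements)


def cohens_kappa(prev, curr):
    """Cohen's kappa between two label lists (chance-corrected agreement)."""
    n = len(prev)
    observed = sum(p == c for p, c in zip(prev, curr)) / n
    labels = set(prev) | set(curr)
    expected = sum((prev.count(l) / n) * (curr.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both lists contain a single identical label
    return (observed - expected) / (1 - expected)
```

Appending further instances that both models predict as negative leaves `delta_f` unchanged, which is exactly the behavior the fix restores.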

Contributors

@zakih2 @vmanc

19 Aug 18:16

chschroeder

v1.3.2

60cddaf

Bugfix release.

Fixed

Fixed a bug in TransformerBasedClassification where validations_per_epoch>=2 left the model in eval mode. (#40)
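This class of bug appears in any training loop that validates mid-epoch: validation switches the model to eval mode, and unless the previous mode is restored, the remaining training steps of that epoch run in eval mode (for example without dropout). A minimal sketch of the restore-on-exit pattern follows; all names are hypothetical, and `StubModel` only mimics the `training` / `train(mode)` / `eval()` interface of torch.nn.Module so that the example runs without PyTorch.

```python
from contextlib import contextmanager


@contextmanager
def evaluation_mode(model):
    """Temporarily switch a model to eval mode and restore its previous mode on exit."""
    was_training = model.training
    model.eval()
    try:
        yield model
    finally:
        model.train(was_training)


class StubModel:
    """Stand-in for torch.nn.Module's train/eval switching."""

    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)


def train_epoch(model, num_batches, validations_per_epoch):
    """Hypothetical loop: validate several times per epoch, record the mode at each step."""
    validate_every = max(1, num_batches // validations_per_epoch)
    modes = []
    for step in range(1, num_batches + 1):
        modes.append(model.training)  # each training step should see training mode
        if step % validate_every == 0:
            with evaluation_mode(model):
                pass  # validation forward passes would run here
    return modes
```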

22 Jul 19:55

chschroeder

v1.3.1

7186796

Bugfix release.

Fixed

Fixed a bug where parameter groups were omitted when using TransformerBasedClassification's layer-specific fine-tuning functionality. (#36, #38)
Fixed a bug where class weighting resulted in NaN values. (#39)
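One common way for class weighting to produce NaN values is a class without any labeled instance, which is frequent early in active learning: inverse-frequency weights then divide by zero, and the resulting infinity can turn into NaN in the loss. Whether this was the cause of #39 is not stated in these notes. The following is a minimal sketch of "balanced" inverse-frequency weights that guards against this case; the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency class weights n / (num_classes * count) (hypothetical sketch).

    Classes without any labeled instance get weight 0.0 instead of an
    infinite value. Such classes never occur as targets, so the weight is unused.
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] > 0 else 0.0
            for c in range(num_classes)]
```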

Contributors

@JP-SystemsX

21 Feb 21:15

chschroeder

v1.3.0

3d99fb5

SetFitClassification now also supports dropout sampling (like KimCNNClassifier and TransformerBasedClassification).

Added

Added dropout sampling to SetFitClassification.

Fixed

Fixed broken link in README.md.
Fixed typo in README.md. (#26)

Changed

Stopping Criteria

The ClassificationChange stopping criterion now supports multi-label classification.
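The idea behind a classification-change criterion is to stop once predictions on the unlabeled pool no longer change between iterations. The sketch below shows how the comparison extends to multi-label predictions by comparing label sets, so that the order of labels does not matter. The function name and the `threshold` semantics (stop if the fraction of changed predictions is at most `threshold`) are assumptions, not small-text's API.

```python
def classification_change(prev_predictions, curr_predictions, threshold=0.0):
    """Return True (stop) if the fraction of changed predictions is <= threshold.

    Predictions may be single labels (ints) or multi-label collections; in the
    multi-label case an instance counts as changed if its label set differs.
    Both prediction lists are assumed to be non-empty and of equal length.
    """
    def normalize(pred):
        if isinstance(pred, (set, frozenset, list, tuple)):
            return frozenset(pred)
        return pred

    changed = sum(normalize(p) != normalize(c)
                  for p, c in zip(prev_predictions, curr_predictions))
    return changed / len(prev_predictions) <= threshold
```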

Documentation

Updated the active learning setup figure.
The documentation of integrations has been reorganized.

Contributors

@rmitsch

04 Feb 21:44

chschroeder

v1.2.0

36ba0e3

This release adds a SetFit classifier, the BALD query strategy, and two new example notebooks.

Added

Active Learning

PoolBasedActiveLearner now handles keyword arguments passed to the classifier's fit() during the update() step.
New strategy: BALD.
SubsamplingQueryStrategy now uses the remaining unlabeled pool when more samples are requested than are available.
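BALD scores each instance by the mutual information between its predicted label and the model parameters, estimated from several stochastic forward passes (e.g. with dropout enabled): the entropy of the mean prediction minus the mean entropy of the individual predictions. A minimal pure-Python sketch of the score computation follows. The input layout `mc_probs[t][i]` is our assumption, and the selection step (taking the highest-scoring instances) is omitted.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def bald_scores(mc_probs):
    """BALD score per instance from Monte Carlo samples (hypothetical sketch).

    mc_probs[t][i] is the class distribution of instance i in stochastic pass t.
    score_i = H[mean_t p_t(i)] - mean_t H[p_t(i)]
    """
    num_samples = len(mc_probs)
    num_instances = len(mc_probs[0])
    scores = []
    for i in range(num_instances):
        samples = [mc_probs[t][i] for t in range(num_samples)]
        mean = [sum(col) / num_samples for col in zip(*samples)]
        expected_entropy = sum(entropy(s) for s in samples) / num_samples
        scores.append(entropy(mean) - expected_entropy)
    return scores
```

An instance on which all passes agree scores 0, even if every single prediction is uncertain. Confident but mutually contradicting passes give the highest score.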

Classification

Added new classifier: SetFitClassification which wraps huggingface/setfit.

Examples

Revised both existing notebook examples.
Added a notebook example for active learning with SetFit classifiers.
Added a notebook example for cold start initialization with SetFit classifiers.

Documentation

A showcase section has been added to the documentation.

Fixed

Distances in lightweight_coreset were not correctly projected onto the [0, 1] interval (but ranking was unaffected).

Changed

Coreset implementations now use the distance-based (as opposed to the similarity-based) formulation.
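In the distance-based formulation of a lightweight coreset, each point's sampling probability mixes a uniform term with its squared distance to the elementwise mean of the data: q(x) = 1/(2n) + d(x, μ)² / (2 Σ d(x', μ)²). A minimal sketch of this distribution for small in-memory inputs follows. The function name is hypothetical, and small-text's lightweight_coreset() may differ in signature and in how it draws the final sample from q.

```python
import math


def lightweight_coreset_distribution(points):
    """Sampling distribution q of a lightweight coreset (hypothetical sketch).

    q(x) = 1 / (2n) + d(x, mu)^2 / (2 * sum_x' d(x', mu)^2),
    where mu is the elementwise mean of all points. The values of q sum to 1.
    """
    n = len(points)
    mu = [sum(col) / n for col in zip(*points)]
    sq_dists = [math.dist(p, mu) ** 2 for p in points]
    total = sum(sq_dists)
    if total == 0:  # all points identical: fall back to the uniform distribution
        return [1.0 / n] * n
    return [0.5 / n + 0.5 * d / total for d in sq_dists]
```

A coreset of size m would then be drawn by sampling points with probability q; points far from the mean are favored, while the uniform term keeps every point eligible.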
