distilled counterfactual data

Automated generation of high-quality counterfactual data.

DISCO (DIStilled COunterfactual Data) is a method for automatically generating high-quality counterfactual data at scale. It uses a large general language model to generate phrasal perturbations, which are then filtered by a task-specific teacher model to distill high-quality counterfactual data. The method has been applied to natural language inference tasks, demonstrating improved robustness and generalization across distributions.
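
The generate-then-filter loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `NLIExample`, `distill_counterfactuals`, `toy_generate`, and `toy_teacher` are hypothetical, the generator and teacher are hard-coded stand-ins for a large language model and a trained NLI classifier, and the filtering rule (keep a perturbation only if the teacher assigns the intended new label a probability at or above a threshold) is an assumed simplification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

LABELS = ("entailment", "neutral", "contradiction")


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    label: str


def distill_counterfactuals(
    examples: Iterable[NLIExample],
    generate: Callable[[NLIExample, str], List[str]],
    teacher: Callable[[str, str], Dict[str, float]],
    threshold: float = 0.5,
) -> List[NLIExample]:
    """Keep perturbed premises that the teacher assigns to the target label."""
    kept = []
    for ex in examples:
        for target in LABELS:
            if target == ex.label:
                continue  # a counterfactual must change the label
            for candidate in generate(ex, target):
                probs = teacher(candidate, ex.hypothesis)
                if probs.get(target, 0.0) >= threshold:
                    kept.append(NLIExample(candidate, ex.hypothesis, target))
    return kept


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_generate(ex: NLIExample, target: str) -> List[str]:
    """Stand-in for an LLM proposing perturbed premises for a target label."""
    canned = {
        "neutral": ["A man is standing on stage."],
        "contradiction": [
            "A man is sleeping on stage.",
            "A man is playing guitar at home.",  # fails to flip the label
        ],
    }
    return canned.get(target, [])


def toy_teacher(premise: str, hypothesis: str) -> Dict[str, float]:
    """Stand-in for a task-specific NLI teacher; uses keyword rules only."""
    if "playing guitar" in premise:
        return {"entailment": 0.9, "neutral": 0.05, "contradiction": 0.05}
    if "sleeping" in premise:
        return {"entailment": 0.05, "neutral": 0.15, "contradiction": 0.8}
    return {"entailment": 0.2, "neutral": 0.7, "contradiction": 0.1}


if __name__ == "__main__":
    seed = NLIExample(
        premise="A man is playing guitar on stage.",
        hypothesis="A man is performing music.",
        label="entailment",
    )
    for cf in distill_counterfactuals([seed], toy_generate, toy_teacher):
        print(cf.label, "|", cf.premise)
```

In this toy run, the candidate that still mentions playing guitar is discarded because the stand-in teacher keeps predicting entailment for it. In a realistic setting, the two callables would wrap an actual language model and an actual NLI classifier.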

Adversarial, Machine Learning, Natural Language
Key facts
Maturity: Technical
Support: C4DT (Inactive), Lab (Active)

Natural Language Processing Lab

Prof. Antoine Bosselut

The NLP lab is focused on advanced NLP research areas like knowledge representations, reasoning, narrative understanding, and biomedical NLP.

This page was last edited on 2024-02-20.