Inactive

CENTER FOR DIGITAL TRUST

Distilled Counterfactual Data

Automated generation of high-quality counterfactual data.

DISCO (DIStilled COunterfactual Data) is a method for automatically generating high-quality counterfactual data at scale. It prompts a large general-purpose language model to generate phrasal perturbations of existing examples, then uses a task-specific teacher model to filter those perturbations, keeping only the high-quality counterfactuals. The method has been applied to natural language inference, where models trained on the generated data showed improved robustness and better generalization across distributions.
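The generate-then-filter idea described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `NLIExample`, `distill_counterfactuals`, `toy_generate`, and `toy_teacher` are hypothetical, the two toy functions stand in for a real language model and a real teacher classifier, and the confidence-threshold filtering rule is an assumption rather than the rule DISCO is confirmed to use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    label: str


# Hypothetical interfaces, for illustration only.
# Generator: (seed example, target label) -> candidate perturbed premises.
# Teacher: (premise, hypothesis) -> probability per label.
Generator = Callable[[NLIExample, str], List[str]]
Teacher = Callable[[str, str], Dict[str, float]]


def distill_counterfactuals(
    example: NLIExample,
    target_label: str,
    generate: Generator,
    teacher: Teacher,
    threshold: float = 0.5,
) -> List[NLIExample]:
    """Keep only perturbations the teacher assigns to the target label.

    The threshold rule is an assumed filter, chosen for simplicity.
    """
    kept = []
    for new_premise in generate(example, target_label):
        if new_premise == example.premise:
            continue  # an unchanged premise is not a counterfactual
        probs = teacher(new_premise, example.hypothesis)
        if probs.get(target_label, 0.0) >= threshold:
            kept.append(NLIExample(new_premise, example.hypothesis, target_label))
    return kept


def toy_generate(example: NLIExample, target_label: str) -> List[str]:
    # Stand-in for prompting a large language model for phrasal perturbations.
    return [
        example.premise.replace("sleeping", "running"),
        example.premise.replace("sleeping", "resting"),
        example.premise,
    ]


def toy_teacher(premise: str, hypothesis: str) -> Dict[str, float]:
    # Stand-in for a trained NLI classifier; the scores are hard-coded.
    if "running" in premise:
        return {"entailment": 0.05, "neutral": 0.05, "contradiction": 0.9}
    return {"entailment": 0.8, "neutral": 0.15, "contradiction": 0.05}


seed = NLIExample("A dog is sleeping on the couch.", "The dog is asleep.", "entailment")
result = distill_counterfactuals(seed, "contradiction", toy_generate, toy_teacher)
for ex in result:
    print(ex.premise, "|", ex.hypothesis, "|", ex.label)
```

In this toy run only the "running" perturbation passes the filter, because the stand-in teacher gives the other candidate a low contradiction score. With a real generator and teacher, the threshold would trade off the amount of data kept against its quality.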

Adversarial, Machine Learning, Natural Language

Maturity

Support

C4DT

Lab

Technical

Source code: Personal GitHub
Last commit: 2023-07-27

Natural Language Processing Lab

Prof. Antoine Bosselut

The NLP lab focuses on advanced NLP research areas such as knowledge representation, reasoning, narrative understanding, and biomedical NLP.

This page was last edited on 2024-02-20.