CRoW

Benchmarking Commonsense Reasoning in Real-World Tasks

CRoW is a manually curated, multi-task benchmark that evaluates the ability of models to apply commonsense reasoning in the context of six real-world NLP tasks. It is constructed using a multi-stage data collection pipeline that rewrites examples from existing datasets using commonsense-violating perturbations. Evaluations on CRoW reveal a significant performance gap between NLP systems and humans, indicating that commonsense reasoning is far from solved in real-world task settings.

Benchmark · Machine Learning · Natural Language
Key facts
  • Maturity: Technical
  • Support: C4DT: Inactive; Lab: Active

Natural Language Processing Lab

Prof. Antoine Bosselut

The NLP lab focuses on advanced NLP research areas such as knowledge representation, reasoning, narrative understanding, and biomedical NLP.

This page was last edited on 2024-02-20.