CRoW is a manually curated, multi-task benchmark that evaluates whether models can apply commonsense reasoning in the context of six real-world NLP tasks. It is constructed with a multi-stage data collection pipeline that rewrites examples from existing datasets using commonsense-violating perturbations. The study reveals a significant performance gap between NLP systems and humans on CRoW, indicating that commonsense reasoning is far from solved in real-world task settings.
This page was last edited on 2024-02-20.