Quotebank is a large-scale dataset of 235 million quotations extracted from 162 million English news articles published between 2008 and 2020. Speaker attribution combines an entity-linking pipeline that resolves speakers to Wikidata entries with a probabilistic model trained on Wikipedia-derived supervision. The result is a quotation-to-speaker corpus suitable for research in NLP, social science, and computational journalism.
This page was last edited on 2024-04-16.