SynthIE

Exploiting Asymmetry for Synthetic Training Data Generation

SynthIE exploits the asymmetry between the two directions of closed information extraction: extracting (subject, relation, object) triples from text is hard, while generating fluent text from Wikidata triples is comparatively easy for a large language model. SynthIE therefore runs an LLM in the easy direction, verbalizing sets of triples into text, to create a large synthetic corpus for training information-extraction models. The resulting dataset bootstraps high-quality IE models without expensive manual annotation, and models trained on it outperform models trained on existing human-labeled corpora.
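The generation loop can be sketched as follows. This is an illustrative sketch, not SynthIE's actual pipeline: the deterministic `verbalize` function stands in for the prompted LLM, and the `[s]`/`[r]`/`[o]` linearization format, function names, and example triples are assumptions made for the example.

```python
# Sketch of SynthIE-style synthetic data generation: run the "easy"
# direction (triples -> text) to build training pairs for the "hard"
# direction (text -> triples). Names and formats are hypothetical.

from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def linearize(triples: List[Triple]) -> str:
    """Serialize triples into the target string an IE model learns to emit."""
    return " ".join(f"[s] {s} [r] {r} [o] {o}" for s, r, o in triples)

def verbalize(triples: List[Triple]) -> str:
    """Stand-in for the LLM: turn triples into (roughly) fluent text.
    In SynthIE this step is performed by a prompted large language model."""
    return " ".join(f"{s} {r.replace('_', ' ')} {o}." for s, r, o in triples)

def make_example(triples: List[Triple]) -> dict:
    # Each synthetic example pairs the generated text (model input) with
    # the triples that produced it (gold output): no human annotation.
    return {"input": verbalize(triples), "target": linearize(triples)}

example = make_example([
    ("SynthIE", "developed_by", "Data Science Lab"),
    ("Data Science Lab", "located_in", "EPFL"),
])
```

Because the text is generated *from* the triples, the gold annotation is exact by construction, which is the core benefit of running the task in its easy direction.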

Category: Natural Language
Maturity / Support: C4DT: Inactive, Lab: Unknown

Data Science Lab

Prof. Robert West

Our research aims to make sense of large amounts of data. Frequently, the data we analyze is collected on the Web, e.g., using server logs, social media, wikis, online news, online games, etc. We distill heaps of raw data into meaningful insights by developing and applying algorithms and techniques in areas including social and information network analysis, machine learning, computational social science, data mining, natural language processing, and human computation.

This page was last edited on 2024-04-16.