
Entity Insertion in Wikipedia

Multilingual entity insertion in Wikipedia articles

This project proposes a framework for inserting entities into Wikipedia articles across multiple languages. It processes Wikipedia dumps to extract training data and trains models for entity insertion. The key components are:

  • A data processing pipeline that extracts the relevant data from Wikipedia dumps.
  • Modeling code for training entity insertion models with either a ranking loss or a pointwise loss (see the sketch after this list).
  • Benchmarking code that evaluates the trained models against baselines such as BM25, EntQA, and GPT language models.
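The two training objectives are named but not specified on this page. As a rough illustration only, the sketch below contrasts them on a hypothetical span-vs-entity scorer; the architecture, embedding dimension, and negative-sampling scheme are all assumptions, not the project's actual implementation:

```python
import torch
import torch.nn as nn

# Hypothetical scorer: takes embeddings of a candidate text span and of the
# entity to insert, and returns a relevance score. Purely illustrative.
class SpanEntityScorer(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, span_emb, entity_emb):
        return self.ff(torch.cat([span_emb, entity_emb], dim=-1)).squeeze(-1)

scorer = SpanEntityScorer()
dim, batch = 768, 4
entity   = torch.randn(batch, dim)  # embeddings of entities to insert
pos_span = torch.randn(batch, dim)  # spans where the entity was actually linked
neg_span = torch.randn(batch, dim)  # sampled spans with no link to the entity

pos_scores = scorer(pos_span, entity)
neg_scores = scorer(neg_span, entity)

# Pointwise loss: treat each (span, entity) pair as an independent binary
# classification problem (linked vs. not linked).
pointwise = nn.BCEWithLogitsLoss()
loss_pointwise = pointwise(
    torch.cat([pos_scores, neg_scores]),
    torch.cat([torch.ones(batch), torch.zeros(batch)]),
)

# Ranking loss: only require the true insertion span to outscore the
# negative span by a margin, without calibrating absolute scores.
ranking = nn.MarginRankingLoss(margin=1.0)
loss_ranking = ranking(pos_scores, neg_scores, torch.ones(batch))
```

A ranking objective matches the task more directly (the model must order candidate positions within one article), while a pointwise objective is simpler and yields calibrated per-pair scores; offering both is a common design choice in retrieval-style training code.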

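For the BM25 baseline, a minimal sketch using the open-source rank_bm25 package could look like the following; the candidate spans and the entity query are invented placeholders, and the actual benchmark may construct both quite differently:

```python
from rank_bm25 import BM25Okapi

# Candidate text spans from an article (illustrative placeholders).
spans = [
    "The city lies on the banks of the Rhone river.",
    "Its technical university was founded in 1969.",
    "The local football club plays in the Super League.",
]
tokenized = [s.lower().split() for s in spans]
bm25 = BM25Okapi(tokenized)

# Query built from the entity to insert, e.g. its title and lead text.
query = "EPFL technical university Lausanne".lower().split()
scores = bm25.get_scores(query)

# Rank spans by BM25 score; the top-scoring span is the suggested
# insertion point for the entity.
best = max(range(len(spans)), key=lambda i: scores[i])
print(spans[best])
```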
Machine Learning · Natural Language
Key facts
  • Maturity: Technical
  • Support: C4DT: Inactive; Lab: Active

Data Science Lab

Prof. Robert West

Our research aims to make sense of large amounts of data. Frequently, the data we analyze is collected on the Web, e.g., using server logs, social media, wikis, online news, online games, etc. We distill heaps of raw data into meaningful insights by developing and applying algorithms and techniques in areas including social and information network analysis, machine learning, computational social science, data mining, natural language processing, and human computation.

This page was last edited on 2024-04-16.