VQuAnDa 1.0
Verbalisation QUestion ANswering DAtaset


VQuAnDa 1.0 is an answer verbalization dataset based on LC-QuAD, a commonly used large-scale Question Answering dataset. It contains 5,000 questions, each with its corresponding SPARQL query and verbalized answer. The target knowledge base is DBpedia, specifically the April 2016 version.


Rank System BLEU Perplexity


  • The train and test splits are also available in our GitHub repo.
  • Use DBpedia v04-16 to benchmark your system on LC-QuAD. Here's a guide on setting up your own endpoint.
  • We're in the process of creating a one-click benchmarking process. For the time being, please contact us to report your results.
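As a rough illustration of querying a self-hosted endpoint, the sketch below prepares (but does not send) a SPARQL request using only the Python standard library. The endpoint URL `http://localhost:8890/sparql` and the example query are assumptions made for illustration; substitute the address of your own DBpedia 2016-04 endpoint and a query from the dataset.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local SPARQL endpoint; replace with your own endpoint address.
ENDPOINT = "http://localhost:8890/sparql"


def build_sparql_request(query, endpoint=ENDPOINT):
    """Prepare a GET request that carries the query in the `query` parameter
    and asks for JSON results via the Accept header."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Illustrative query only, not taken from the dataset.
example_query = (
    "SELECT ?o WHERE { "
    "<http://dbpedia.org/resource/Berlin> "
    "<http://dbpedia.org/ontology/country> ?o }"
)
request = build_sparql_request(example_query)
```

Passing `request` to `urllib.request.urlopen` would send it once the endpoint is running; the response body can then be parsed with `json.loads`.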
Every data item in the dataset consists of the following fields:

question: "Human-corrected version of the verbalized question.",
verbalized_answer: "Human-corrected version of the answer(s).",
query: "Valid corresponding SPARQL query on DBpedia.",
uid: "Unique ID of the datapoint."
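A minimal sketch of reading data items with these four fields follows. It assumes the splits are stored as a JSON array of objects, which this page does not state, and the sample record is a hypothetical placeholder rather than an actual datapoint.

```python
import json

# The four fields listed above.
REQUIRED_FIELDS = ("question", "verbalized_answer", "query", "uid")


def load_items(text):
    """Parse a JSON array of data items and check that each item
    carries all four documented fields."""
    items = json.loads(text)
    for index, item in enumerate(items):
        missing = [field for field in REQUIRED_FIELDS if field not in item]
        if missing:
            raise ValueError(f"item {index} is missing fields: {missing}")
    return items


# Hypothetical placeholder record, not taken from the dataset.
sample = json.dumps([{
    "uid": "example-0001",
    "question": "Placeholder question?",
    "verbalized_answer": "Placeholder verbalized answer.",
    "query": "SELECT ?x WHERE { ?x ?p ?o }",
}])

items = load_items(sample)
```

To read a split from disk, pass the file contents to `load_items`, for example `load_items(open(path, encoding="utf-8").read())`, where `path` points to the downloaded file.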
v0.1.0 - 11-12-2019
  • [RELEASE] First version of the dataset released with 5,000 datapoints.
  • VQuAnDa.sda.tech published
title={VQuAnDa: Verbalization QUestion ANswering DAtaset},
author={Kacupaj, Endri and Zafar, Hamid and Maleshkova, Maria and Lehmann, Jens},
booktitle={European Semantic Web Conference (ESWC)},