Schema2QA 2.0

A large-scale question answering dataset on real-world data

View the Project on GitHub stanford-oval/schema2qa

Stanford Schema2QA Dataset


Schema2QA is the first large question answering dataset over real-world data. It covers 6 common domains: restaurants, hotels, people, movies, books, and music, based on crawled metadata from 6 different websites (including Yelp, Hyatt, LinkedIn, IMDb, and Goodreads). In total, there are over 2,000,000 examples for training, consisting of both augmented human paraphrase data and high-quality synthetic data generated by Genie. All questions are annotated with ThingTalk, an executable virtual assistant programming language.

Schema2QA includes challenging evaluation questions collected from crowd workers. Workers are prompted with only what the domain is and what properties are supported. Thus, the sentences are natural and diverse. They also contain entities unseen during training. The collected sentences are manually annotated with ThingTalk by the authors. In total there are over 5,000 examples for dev and test.

An example of an evaluation question and its ThingTalk annotation is shown below:

“What are the highest ranked burger joints in the 40 mile area around Asheville NC?”

sort(aggregateRating.ratingValue desc of @org.schema.Restaurant.Restaurant() 
  filter distance(geo, new Location("asheville nc" )) <= 40 mi && 
         servesCuisine =~ "burger")[1] ;
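
To make the meaning of the annotation above more concrete, here is a rough Python analogue of the same query: filter by distance and cuisine, sort by rating in descending order, and take the first result. This is only an illustrative sketch, not ThingTalk and not part of Schema2QA. The Restaurant record, the coordinates, and the sample rows are all made up, and it assumes that `=~` behaves like a case-insensitive substring match and that `[1]` selects the first element of the sorted result.

```python
import math
from dataclasses import dataclass


@dataclass
class Restaurant:
    """Hypothetical record standing in for one restaurant entry."""
    name: str
    lat: float
    lon: float
    serves_cuisine: str
    rating_value: float


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles, using the haversine formula."""
    earth_radius_mi = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_mi * math.asin(math.sqrt(a))


def top_rated(restaurants, center, cuisine, radius_mi):
    """Filter by distance and cuisine, sort by rating (desc), return the first match."""
    lat0, lon0 = center
    matches = [
        r for r in restaurants
        if distance_miles(r.lat, r.lon, lat0, lon0) <= radius_mi
        and cuisine in r.serves_cuisine.lower()  # assumed reading of `=~`
    ]
    ranked = sorted(matches, key=lambda r: r.rating_value, reverse=True)
    return ranked[0] if ranked else None  # assumed reading of `[1]`


# Approximate coordinates and invented rows, for illustration only.
ASHEVILLE = (35.5951, -82.5515)
SAMPLE = [
    Restaurant("Downtown Burgers", 35.60, -82.55, "burgers, american", 4.2),
    Restaurant("Ridge Burger Bar", 35.45, -82.30, "burgers", 4.7),
    Restaurant("Faraway Burgers", 35.2271, -80.8431, "burgers", 4.9),  # outside the radius
    Restaurant("Nearby Pizza", 35.59, -82.56, "pizza", 5.0),           # wrong cuisine
]

best = top_rated(SAMPLE, ASHEVILLE, "burger", 40)
print(best.name if best else "no match")
```

In this toy data, the highest-rated burger place overall is excluded by the distance filter and the highest-rated nearby place is excluded by the cuisine filter, which is why both conditions have to be applied before sorting.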

What’s new in 2.0

The main difference is that all examples in the dataset have been reannotated with ThingTalk 2.0, a major redesign of the language that makes it more accessible, less verbose, and more compatible with pre-trained neural networks. More details about the changes can be found in the release history. The synthetic data has been regenerated with the latest Genie (v0.8.0), which improves both quality and efficiency. The evaluation set also received minor annotation fixes and had duplicate examples removed, so it is slightly smaller for some domains, but its diversity remains the same.

You can still find information about Schema2QA 1.0 here. However, we no longer recommend using Schema2QA 1.0, as it contains outdated ThingTalk annotations.


All numbers are evaluated on the Schema2QA test set, which is not included in this repository. Please contact us to evaluate your model(s) on the test data. Accuracy on the dev set can be found here. Note that the accuracy now differs from what we reported in our papers because the dataset has changed.


Trained with the full Schema2QA training data, including synthetic data using manual natural language annotations of the properties, and human paraphrase data. Both are augmented with crawled real property values.

Rank  Model  Restaurants  People  Movies  Books  Music  Hotels  Average
                   73.3%   80.0%   81.7%  72.5%  70.3%   69.5%    74.5%
                   64.3%   73.8%   66.8%  46.7%  58.0%   55.9%    60.9%


Trained with a dataset fully synthesized with AutoQA, using automatically generated natural language annotations and a neural paraphraser.

Rank  Model  Restaurants  People  Movies  Books  Music  Hotels  Average
                   77.3%   76.2%   83.4%  65.1%  62.9%   72.2%    72.9%
                   62.6%   58.4%   60.4%  44.0%  50.3%   60.4%    56.0%
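
As a quick sanity check on how to read the two tables above, the following sketch recomputes the Average column as an unweighted mean over the six domains. It assumes that the seven percentages in each row correspond, in order, to Restaurants, People, Movies, Books, Music, Hotels, and Average; the row labels are made up for this example. The source does not state how the average is computed, so the unweighted mean is an assumption that the numbers happen to be consistent with, up to rounding.

```python
# Per-domain test accuracies (%) copied from the two tables, in the order
# Restaurants, People, Movies, Books, Music, Hotels, paired with the reported Average.
# The dictionary keys are hypothetical labels, not model names.
ROWS = {
    "full training data, row 1": ([73.3, 80.0, 81.7, 72.5, 70.3, 69.5], 74.5),
    "full training data, row 2": ([64.3, 73.8, 66.8, 46.7, 58.0, 55.9], 60.9),
    "AutoQA only, row 1": ([77.3, 76.2, 83.4, 65.1, 62.9, 72.2], 72.9),
    "AutoQA only, row 2": ([62.6, 58.4, 60.4, 44.0, 50.3, 60.4], 56.0),
}


def macro_average(scores):
    """Unweighted mean across domains."""
    return sum(scores) / len(scores)


def max_gap(rows):
    """Largest absolute difference between the recomputed and reported averages."""
    return max(abs(macro_average(scores) - reported) for scores, reported in rows.values())


for label, (scores, reported) in ROWS.items():
    print(f"{label}: recomputed {macro_average(scores):.2f}, reported {reported}")

print(f"largest gap: {max_gap(ROWS):.2f} percentage points")
```

Because the per-domain values are themselves rounded to one decimal place, small differences between the recomputed and reported averages are expected.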

Validation data can be found in the directory of each domain in this git repository. The training sets can be downloaded from the following links:

Detailed statistics of the dataset can be found in the stats page.

Getting started

This repository also contains a Makefile to run the full data synthesis, training, and evaluation for the Schema2QA dataset. Detailed instructions can be found in the installation and run instructions.


The dataset is released under CC BY 4.0. Please cite the following papers if you use this dataset in your work:

Schema2QA & BERT-LSTM model:
Silei Xu, Giovanni Campagna, Jian Li, and Monica S. Lam. "Schema2QA: High-Quality and Low-Cost Q&A Agents for the Structured Web." In Proceedings of the 29th ACM International Conference on Information & Knowledge Management.

AutoQA:
Silei Xu, Sina Semnani, Giovanni Campagna, and Monica Lam. "AutoQA: From Databases to Q&A Semantic Parsers with Only Synthetic Training Data." In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).

BART parser:
Giovanni Campagna, Sina Semnani, Ryan Kearns, Lucas Jun Koba Sato, Silei Xu, and Monica Lam. "A Few-Shot Semantic Parser for Wizard-of-Oz Dialogues with the Precise ThingTalk Representation." In Findings of the Association for Computational Linguistics: ACL 2022. Dublin, Ireland: Association for Computational Linguistics.