Datasets
Hierarchical Moonstone Dataset
This dataset contains nine chapters of the novel The Moonstone annotated for hierarchical topical structure. Each chapter is annotated by 3-6 people. The dataset is described in my COLING 2014 paper and, in more detail, in my Ph.D. thesis. Download it here.
Flat Moonstone Dataset
This dataset contains 20 chapters of The Moonstone annotated for topical shifts. Each chapter is annotated by 4-6 people. The dataset is described in my NAACL 2012 paper and also in my Ph.D. thesis. Download it here.
Summaries of short fiction
Here is a dataset of 20 short stories annotated for summary-worthy sentences by 4 people. This is the dataset described in my Computational Linguistics paper. Here are the instructions that the annotators received.