# Citation

Heilbron, M. (Micha), Armeni, K. (Kristijan), Schoffelen, J.M. (Jan Mathijs), Hagoort, P. (Peter), Lange, F.P. de (Floris) (2022). A hierarchy of linguistic predictions during natural language comprehension [Data set]. https://doi.org/10.34973/dfkm-h813.

# Abstract

Understanding spoken language requires transforming ambiguous acoustic streams into a hierarchy of representations, from phonemes to meaning. It has been suggested that the brain uses prediction to guide the interpretation of incoming input. However, the role of prediction in language processing remains disputed, with disagreement about both the ubiquity and representational nature of predictions. Here, we address both issues by analysing brain recordings of participants listening to audiobooks, and using a deep neural network (GPT-2) to precisely quantify contextual predictions. First, we establish that brain responses to words are modulated by ubiquitous, probabilistic predictions. Next, we disentangle model-based predictions into distinct dimensions, revealing dissociable signatures of syntactic, phonemic and semantic predictions. Finally, we show that high-level (word) predictions inform low-level (phoneme) predictions, supporting hierarchical predictive processing. Together, these results underscore the ubiquity of prediction in language processing, showing that the brain spontaneously predicts upcoming language at multiple levels of abstraction.

# Background information

You can find more information, including relevant publications pertaining to this dataset on the collection overview page at https://doi.org/10.34973/dfkm-h813. A complete list of files that are part of this dataset can be found in the file MANIFEST.txt, including a SHA256 hash for each file to allow verification of correct data transfer.

# Restrictions on data access and reuse

The access to and use of this dataset is only allowed under the conditions listed in the data use agreement, as detailed in the file LICENSE.txt.
Neither the Donders Institute, Radboud University, nor the researchers who provide this dataset should be included as authors of publications or presentations if this authorship would be based solely on the use of these data. However, we ask you to acknowledge the use of the data and data derived from the data when publicly presenting any results or algorithms that benefitted from their use:

1) Papers, book chapters, books, posters, oral presentations, and all other presentations of results derived from the data should acknowledge the origin of the data as follows: "Data were provided (in part) by the Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen".

2) Authors of publications or presentations using the data should cite relevant publications describing the methods developed and used by the Donders Institute to acquire and process the data. The specific publications that are appropriate to cite in any given study will depend on what the data were used for and for what purposes. When applicable, a list of publications will be specified on the collection overview page.
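
The transfer check mentioned under Background information could be scripted roughly as follows. This is a minimal sketch, not a tool shipped with the dataset. It assumes that each non-empty line of MANIFEST.txt holds a SHA256 hex digest followed by whitespace and a path relative to the dataset root (the layout produced by `sha256sum`). The actual layout of MANIFEST.txt is not described here, so the parsing line may need adjusting. The function names `sha256_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(dataset_root, manifest_name="MANIFEST.txt"):
    """Compare every file listed in the manifest against its recorded hash.

    ASSUMPTION: each manifest line reads "<sha256 hex digest> <relative path>".
    Returns a list of (relative_path, problem) tuples; empty means all files match.
    """
    root = Path(dataset_root)
    problems = []
    for line in (root / manifest_name).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        target = root / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_of(target) != expected.lower():
            problems.append((rel_path, "hash mismatch"))
    return problems
```

Calling `verify("path/to/dataset")` on a downloaded copy returns an empty list when every listed file is present and matches its recorded hash. Any entries in the returned list indicate files worth downloading again.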