Storytelling Is Intrinsically Mentalistic: A Functional Magnetic Resonance Imaging Study of Narrative Production across Modalities


People utilize multiple expressive modalities for communicating narrative ideas about past events. The three major ones are speech, pantomime, and drawing. The current study used functional magnetic resonance imaging to identify common brain areas that mediate narrative communication across these three sensorimotor mechanisms. In the scanner, participants were presented with short narrative prompts akin to newspaper headlines (e.g., “Surgeon finds scissors inside of patient”). The task was to generate a representation of the event, either by describing it verbally through speech, by pantomiming it gesturally, or by drawing it on a tablet. In a control condition designed to remove sensorimotor activations, participants described the spatial properties of individual objects (e.g., “binoculars”). Each of the three modality-specific subtractions produced similar results, with activations in key components of the mentalizing network, including the TPJ [temporoparietal junction], posterior STS [posterior superior temporal sulcus], and posterior cingulate cortex. Conjunction analysis revealed that these areas constitute a cross-modal “narrative hub” that transcends the three modalities of communication. The involvement of these areas in narrative production suggests that people adopt an intrinsically mentalistic and character-oriented perspective when engaging in storytelling, whether using speech, pantomime, or drawing.


Theories of language origin can be divided into “vocal” and “gestural” models (McGinn, 2015; Arbib, 2012; Armstrong & Wilcox, 2007; MacNeilage & Davis, 2005; Corballis, 2002). Gestural models posit that manually produced symbols evolved earlier than those produced vocally and that speech was a replacement for a preestablished symbolic system that was mediated by gesture alone. Importantly, the kind of gesturing that gestural models allude to is “pantomime” or iconic gesturing. Iconic gesturing through pantomime is thought to have predated symbolic gesturing, passing through an intermediate stage that Arbib (2012) refers to as “proto-symbol.”

From a neuroscientific perspective, these theories of language origin establish a fundamental contrast between two different sensorimotor routes for the conveyance of language, namely, the audiovocal route for speech and the visuo-manual route for pantomime. Language is an inherently multimodal phenomenon, not least through the gesturing that accompanies speaking (Beattie, 2016; Kendon, 2015; McNeill, 2005). Humans have yet a third means of conveying semantic ideas, and that is through the generation of images, as occurs through drawing and writing (Elkins, 2001). We have argued elsewhere that the capacity for drawing is an evolutionary offshoot of the system for producing iconic gestures such as pantomimes (Yuan & Brown, 2014). Drawing is essentially a tool-use gesture that “leaves a trail behind” in the form of a resulting image. Overall, speech, pantomime, and image generation comprise a “narrative triad,” representing the three major modalities by which humans have evolved to referentially communicate their ideas to one another.

Perhaps the most important function of language is the communication of narrative, conveying the actions of agents, or “who did what to whom.” Agency is one of the primary elements that is encoded in syntactic structure (Tallerman, 2015). Although word order varies across languages, 96% of languages place the subject (the agent) before the thing that the subject acts upon (Tomlin, 1986). Hence, an “agent first” organization of sentences seems to be an ancestral feature of language grammar (Jackendoff, 1999), and gestural models of language origin highlight this type of sentence organization as well (Armstrong & Wilcox, 2007). Although language is well designed to communicate agency through syntax, it typically does so in a multimodal manner, combining speech and gesture. A basic question for the evolutionary neuroscience of human communication is whether the conveyance of narrative is linked to specific sensorimotor modalities (vocal vs. manual) or whether there are cross-modal narrative areas in the brain that transcend these modalities. This question led us to design an experiment in which we would explore for the first time whether cross-modal brain areas mediate the communication of narrative ideas using speech, pantomime, and drawing as the triad of production modalities.

Most previous neuroimaging studies of cross-modal communication are perceptual, and we are not aware of production studies that have compared any pair of functions among speech, pantomime, and drawing in healthy adults.

Evolutionary Implications

Both vocal and gestural models of language attempt to account for the origins of syntax. As mentioned in the Introduction, language grammar seems to have an intrinsically narrative structure to it, being efficient at describing who did what to whom (in other words, agency). Standard subject–verb–object models of syntactic structure (Tallerman, 2015) essentially encapsulate the kinds of transitive actions that we examined in our headlines. A large majority of languages operate on an agent-first basis, putting the actor before either the action or the target of the action. To the extent that agency is one of the most fundamental things that is conveyed in grammars (and which is lacking in so-called proto-languages; Bickerton, 1995), then our results have application to evolutionary models of language. In particular, the imaging results that were obtained in the most purely linguistic condition (speech) were replicated almost identically in the nonlinguistic conditions of pantomime and drawing. This cross-modal similarity suggests that the capacity of syntax to represent agency can be achieved through nonlinguistic means employing essentially the same brain network.

A number of biological theories of language propose that syntax emerged from basic processes of motor sequencing (Arbib, 2012; Fitch, 2011; Jackendoff, 2011). Although this might account for grammar’s connection with object-directed actions (in other words, the gestural level of representation), it may not do justice to the sense of agency that is well contained in syntactic structure. Hence, we suggest that another important evolutionary ingredient in the emergence of syntax, beyond the “plot” elements contained in motor sequencing, would be the incorporation of circuits that mediate the sense of agency, not least “other” agency. To be clear, we are not arguing that the TPJ and pSTS are syntax areas. We are simply suggesting that, whereas circuits in the IFG [inferior frontal gyrus] more typically associated with syntax (Zaccarella & Friederici, 2017) might mediate the gestural level of language, the TPJ might have a stronger connection with agents in the overall scheme of language, discourse, and narrative. Agency can be conveyed linguistically through speech and sign, but it can also be conveyed nonlinguistically through pantomime (iconic gesturing) and drawing.


In this first three-modality fMRI study of narrative production, we observed results that suggest that people generate stories in an intrinsically mentalistic fashion focused on the protagonist, rather than in a purely gestural manner related to the observable action sequence. The same set of mentalizing and social cognition areas came up with each of the three modalities of production that make up the narrative triad, pointing to a common set of cognitive operations across modalities. These operations are most likely rooted in character processing, as related to a character’s intentions, motivations, beliefs, emotions, and actions. Hence, narratives, whether spoken, pantomimed, or drawn, seem to be rooted in the communication of “other-agency.”