Abstracts and biogs

David L. Hoover (New York University) "Simulations and difficult problems"

The ideal conditions for an authorship attribution or computational stylistics investigation are well known, but many significant or even just tempting problems suffer from some kind of deficiency or limitation that reduces the effectiveness or validity of some kinds of analysis, and makes others impossible. One way to cope with such a situation is to create what I will loosely call a simulation that overcomes the limitations of the problem and to use the results of the simulation to suggest, at least, a solution to the more intractable problem. My talk will illustrate this process with three difficult problems. One is a style variation problem that involves a complex combination of chronology, genre, and mode of composition, one is a tricky case of co-authorship, and one involves Vickers's rare n-gram method of authorship attribution for early modern drama.

Biography David L. Hoover is Professor of English at New York University, where he teaches Digital Humanities, Authorship, science fiction, and Chaucer. Since 2007, he has also taught the Out-of-the-Box Text-Analysis class at the Digital Humanities Summer Institute at the University of Victoria. His most recent publications include “Text-analysis Tools in Excel,” (forthcoming) in O’Sullivan, J., ed. Digital Humanities for Literary Studies, "The Microanalysis of Textual Variation," DSH (2017), and "Argument, Evidence, and the Limits of Digital Literary Studies," in Debates in the Digital Humanities: 2016. Active in Digital Humanities for thirty-five years, his main interests are in computational stylistics, corpus stylistics, and authorship attribution. He is currently writing a book on how modes of composition (handwriting, dictation, typing, word-processing) affect (or do not affect) authorial style.

Ruth Ahnert (Queen Mary University of London) "The culture of networks"

This paper draws on a collaborative opinion-piece I am working on with Sebastian E. Ahnert, Scott Weingart and Nicole Coleman. It offers a history of the use of networks, and a proposal for how to use them as a metaphor, as a visual language, a powerful set of computational measures, and a conceptual discourse. The contention is that ‘networks’ cut across normal concepts of intellectual domains, whether that be the divisions and categorisations found in university departments and faculties, or in library catalogues. Andrew Piper has argued recently for a concept of ‘cultural analytics’, not as “computer science applied to culture”, but rather as an analytical practice that “requires a wholesale rethinking of both of these categories”. Networks are one framework that forces us to rethink those categories, and to propose a new analytical mode.

Biography I research broadly in the area of Tudor culture and book history. My first book The Rise of Prison Literature in the Sixteenth Century (2013) explored the kinds of writing undertaken by Tudor prisoners. More recently, my research has employed digital methods from the field of Complex Networks to study Tudor letters. Work in this area has been funded by Stanford Humanities Center, the Folger Shakespeare Library, the AHRC, and the QMUL Innovation Grant. I am also involved in a number of centres and projects that seek to bring together collaborators interested in network analysis. With Joad Raymond I am Director of the Centre for Early Modern Mapping News and Networks, which we established and launched in September 2013. I am on the steering committee of QMUL’s Digital Initiative Network. With Elaine Treharne I am also series editor of the Stanford University Press’s Text Technologies book series.

John Nance (Florida State University) "TBC"

Awaiting abstract and biog.

Elli Bleeker, Bram Buitendijk, Ronald Haentjens Dekker, Astrid Kulsdom (Royal Netherlands Academy of Arts and Sciences) "Alexandria: Reconsidering textual networks in the digital paradigm"

It's a truism that computers have come to play an essential role in the ways we store, process and study text. However, we sometimes overlook the fact that the affordances and limitations of a prevailing technology may blind us to aspects not supported by that technology. For instance, using TEI/XML may subtly encourage us to ignore textual phenomena not part of the TEI/XML encoding model (Sahle 2013, 381-2). What is more, current digital textual research rarely surpasses the traditional boundaries of the field. Despite the acclaimed interdisciplinary nature of digital humanities, textual research is often carried out within a closed environment and from a particular perspective on the textual object (be it historical, linguistic, or literary). The resulting textual model may serve a specific objective, but it neglects the potential of digital technology to unite and synthesise scholarly knowledge.

The present contribution introduces Text As Graph (TAG), a data model developed at the R&D department of the Humanities Cluster of the Dutch Royal Academy of Science. TAG makes use of a hypergraph model for text, which allows researchers to store, query, and analyse text that is encoded from different perspectives. By means of an interactive presentation of Alexandria, a text repository system that serves as the reference implementation of the TAG model, we will demonstrate how users can store and query a wide range of information (i.e., perspectives) about the same text. Researchers, then, can make effective use of existing knowledge and material in order to study text from multiple scholarly perspectives. Internally, the hypergraph can always be expanded with new layers of information that may be interconnected. Alexandria thus stimulates new ways of looking at textual objects, facilitates the exchange of information across disciplines, and secures textual knowledge for future endeavours. From a philosophical perspective, the TAG model and Alexandria raise compelling questions about our notions of textuality, and prompt us to reconsider how we can best model the variety of textual dimensions.


TAG. https://github.com/HuygensING/TAG/

Sahle, Patrick. 2013. Digitale Editionsformen-Teil 3: Textbegriffe Und Recodierung. Norderstedt: Books on Demand. http://kups.ub.uni-koeln.de/5353


Elli Bleeker is a postdoctoral researcher in the Research and Development Team at the Humanities Cluster, part of the Royal Netherlands Academy of Arts and Sciences. She specializes in digital scholarly editing and computational philology, with a focus on modern manuscripts and genetic criticism. Elli completed her PhD at the Centre for Manuscript Genetics (2017) on the role of the scholarly editor in the digital environment. As a Research Fellow in the Marie Sklodowska-Curie funded network "DiXiT" (2013–2017), she received advanced training in manuscript studies, text modeling, and XML technologies.

Bram Buitendijk is a software developer in the Research and Development team at the Humanities Cluster, part of the Royal Netherlands Academy of Arts and Sciences. He has worked on transcription and annotation software, collation software, and repository software.

Ronald Haentjens Dekker is a software architect and lead engineer of the Research and Development Team at the Humanities Cluster, part of the Royal Netherlands Academy of Arts and Sciences. As a software architect, he is responsible for translating research questions into technology or algorithms and explaining to researchers and management how specific technologies will influence their research. He has worked on transcription and annotation software, collation software, and repository software, and he is the lead developer of the CollateX collation tool. He also conducts workshops to teach researchers how to use scripting languages in combination with digital editions to enhance their research.

Astrid Kulsdom is a project manager and researcher in the Research and Development team at the Humanities Cluster, part of the Royal Netherlands Academy of Arts and Sciences. After completing her research Master in Literary Studies at Radboud University in 2012, she has worked as a project manager for several government institutions. As project manager of the Research and Development team, she combines her philological knowledge with her project management skills in order to effectively manage all strands of research within the team.

Marco Büchler (University of Göttingen & Leibniz Institute of European History) "Challenges and Implications of Historical Text Reuse Detection on Big Data"

Abstract Text reuse or intertextuality is one of the main subjects of interest in the Digital Humanities. This talk is an introduction to the field of historical text reuse detection, its objectives and challenges. Furthermore, this interactive presentation will illustrate the process of finding the right parameters and settings to strike the right precision/recall balance while running computational text reuse analyses. The use case presented will be the comparison between the gospels of Mark and Luke. The talk will conclude with an outline of recent research on the changes observed during the text reuse process.
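The precision/recall balance mentioned above can be illustrated with a minimal sketch. It assumes a simple word-bigram overlap (Jaccard) score and a handful of invented, hand-labelled sentence pairs; it is not the method used by TRACER or in the talk, and the function names and data are hypothetical.

```python
def shingles(text, n=2):
    """Return the set of word n-grams (shingles) in a lower-cased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b, n=2):
    """Jaccard overlap of the two texts' shingle sets (0.0 if both are empty)."""
    sa, sb = shingles(a, n), shingles(b, n)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def precision_recall(pairs, threshold, n=2):
    """Score labelled (text_a, text_b, is_reuse) pairs at a given threshold."""
    tp = fp = fn = 0
    for a, b, is_reuse in pairs:
        predicted = jaccard(a, b, n) >= threshold
        if predicted and is_reuse:
            tp += 1
        elif predicted and not is_reuse:
            fp += 1
        elif is_reuse:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Invented toy pairs with hand-assigned labels; not drawn from any corpus.
PAIRS = [
    ("the king rode out at dawn with his men",
     "the king rode out at dawn", True),
    ("she wrote a long letter to her sister",
     "yesterday she wrote a letter to her sister in haste", True),
    ("the ship lay at anchor in the bay",
     "the ship lay at anchor off the coast", False),
    ("they climbed the hill before noon",
     "a storm rose over the sea", False),
]

for threshold in (0.3, 0.45, 0.6):
    p, r = precision_recall(PAIRS, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data a low threshold admits a formulaic but unrelated pair (lower precision), while a high threshold misses the looser paraphrase (lower recall); real settings would need tuning against a much larger labelled sample.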

Biography Marco Büchler holds a Diploma in Computer Science. From 2006 to 2014 he worked as a Research Associate in the Natural Language Processing Group at Leipzig University. From April 2008 to March 2011 Marco served as the technical Project Manager for the eAQUA project and continued to work in that capacity for the following eTRACES project. In March 2013 he received his PhD in eHumanities. Since May 2014 he has led a Digital Humanities Research Group at the Göttingen Centre for Digital Humanities. His research includes Natural Language Processing on Big Humanities Data. Specifically, he works on Historical Text Reuse Detection and its application in the business world. In addition to his primary responsibilities, Marco manages the Medusa project (Big Scale co-occurrence and NGram framework) as well as the TRACER machine for detecting historical text reuse. As of July 2017 he also leads Digital Historical Research at the Leibniz Institute of European History in Mainz.

Paul Nulty (Cambridge University) "Methods and interactive tools for exploring the semantics of essentially contested political concepts"

Abstract Political concepts are often characterised as "essentially contested" in the sense that their essential meanings are necessarily in dispute when we deploy them in adversarial political discourse. When tasked with pinning down an elusive word meaning, modern lexicographers, computational linguists, and natural language engineers usually turn to a descriptive analysis of the term's use in context, often looking for statistical patterns of syntactic or document-based word co-occurrence in large collections of digital text. This talk presents an application of the tools of statistical corpus semantics to the problem of delineating the various complex, multi-faceted, and value-laden meanings of abstract political concepts. As an example, comparisons of the lexical-semantic structure of political terms from libertarian and socialist online communities are presented. Network analysis methods including community-detection and measures of centrality are used to compare and contrast the terms of particular importance to the structure of word association networks from different ideologies.
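As a rough illustration of the word association networks described above, the following sketch builds a document-level co-occurrence network and computes unweighted degree centrality for each term. The mini "corpora" are invented word lists, the function names are hypothetical, and degree centrality stands in for whichever centrality measures the talk actually uses.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(documents):
    """Map each unordered word pair to the number of documents containing both."""
    weights = defaultdict(int)
    for doc in documents:
        for a, b in combinations(sorted(set(doc.lower().split())), 2):
            weights[(a, b)] += 1
    return dict(weights)


def degree_centrality(weights):
    """Fraction of the other nodes each word is linked to (unweighted degree)."""
    neighbours = defaultdict(set)
    for a, b in weights:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {word: len(linked) / (n - 1) for word, linked in neighbours.items()}


# Invented toy "documents" for two hypothetical communities; illustration only.
COMMUNITY_A = [
    "liberty property markets",
    "liberty taxation coercion",
    "property rights liberty",
]
COMMUNITY_B = [
    "liberty equality solidarity",
    "equality labour solidarity",
    "labour rights equality",
]

for name, docs in (("A", COMMUNITY_A), ("B", COMMUNITY_B)):
    centrality = degree_centrality(cooccurrence_network(docs))
    print(name, sorted(centrality.items(), key=lambda kv: -kv[1])[:3])
```

Comparing the centrality of the same term across the two networks gives a crude sense of how differently it is embedded in each vocabulary; community detection would require an additional algorithm not shown here.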

Biography Dr Paul Nulty is a Research Associate with the Concept Lab, part of the Cambridge Centre for Digital Knowledge at CRASSH. Paul's research focuses on applications of natural language processing to questions in digital humanities and social science. Specific interests include computational models of semantic relations between entities, and methods for extracting political concepts and ideological positions from text. In 2013 he gained his PhD from the School of Computer Science in University College Dublin, on the topic of understanding and predicting lexical expressions of semantic relations between nouns. From 2012-2015 he was a Research Officer in the Department of Methodology at the London School of Economics and Political Science, where he worked on general applications of computational text analysis to political science, such as content analysis of Twitter data and ideological scaling of speeches and manifestos.

John Rager (Amherst College) "A digital textual analysis course at Amherst College"

This paper describes and discusses a course intended to appeal to both humanities and computer science majors. There were eight students: four computer science majors and four humanities majors. The course was immeasurably improved by the presence of English professor Peter Berek. The course arose partially out of collaborative conversations between me at Amherst and Michael Witmore at the Folger. The goal of the course was to explore the literature in (literary) digital textual analysis. I wanted to try to discuss not only papers in the humanities, but especially to discuss, in some depth, the computer science behind the analyses done in the papers.

The in-class discussions were driven by papers from the digital humanities literature. The usual pattern was to assign a paper as reading and then spend one or two classes discussing it. In discussion, we would attempt to clarify the process followed by the authors and the conclusions drawn in the work. I would then spend more class time discussing the science behind the paper. I chose the papers so that they would lead to discussions of a broad array of scientific topics. In the full discussion, I can hand out a list of the papers and the topics. (The topics included Clustering, Topic Modeling, Naïve Bayes and other classifiers, n-grams, context-free grammars and parsing.) I will describe the six laboratory exercises (Docuscope Tagging; Topic Modeling Using Mallet; Building Classifiers Using Mallet; Cluster Analysis and WEKA; Miscellaneous MALLET and Weka; Python NLTK, Tagging and BiGrams). A group project was always envisioned as a major part of the course. Each group contained both computer science and humanities majors. One group looked at learning to distinguish early modern English plays written for adult companies from those written for boys’ companies. A second group looked at learning to distinguish male and female authors in 19th century fictional prose. The last group tried to prove that Shakespeare and Kit Marlowe were the same person. The students made several interesting observations about textual analysis work, which I may discuss. (They include the observation that overfitting is sometimes ignored and model verification is sometimes given short shrift.)

Dhiaa Janaby (Newcastle University) "Media at war: The discursive construction of Saddam Hussein in the Iraq-Iran War and the US-led invasion"

It has been argued by Keeble (2004) that the image of Saddam in the US media changed following the Iraqi invasion of Kuwait in 1990 and continued to change until the US-led invasion of 2003. This shift could reflect the US stance on, and degree of involvement in, these wars: for instance, the US stance was tilted towards Iraq and against Iran in the Iraq-Iran war, but was totally the opposite after Iraq attacked Kuwait. Therefore, the principal objective of this study is to examine the discourse of the major US press during the Iraq-Iran war (1980-1988) and the US-led invasion (2003), to see how Saddam was constructed and talked about, and to see whether there was a shift in the US press stance in reporting Saddam in these wars. The methodology used to achieve this aim is a corpus linguistics approach, in combination with critical discourse analysis represented by the Discourse Historical Approach (DHA).

The results showed that there was a shift in the reporting: for instance, Saddam was given a voice in the Iraq-Iran war through the use of ‘saying’ verbs, but he was muted/silenced and had no access to the US press discourse during the US-led invasion. The frequency of Saddam in the press during the US-led invasion was significantly higher than during the Iraq-Iran war, despite the fact that the latter lasted for eight years whereas the former only lasted for a few weeks. Furthermore, he was demonized, Hitlerized, Stalinized and criminalized in the US press during the US-led invasion, but this was not the case in the Iraq-Iran war. Moreover, Saddam never collocated with the use of chemical weapons (CWs) in the corpus of data on the Iraq-Iran war. Instead, ‘Iraq’, ‘Iraq’s’ and ‘Iraqi’ were frequent collocates of CWs. By contrast, an examination of the collocates of CWs in the US-led invasion found that ‘Saddam’ appeared forty-eight times as a collocate of CWs, and ‘Saddam’s’ occurred seven times.


Keeble, R. (2004) 'Information warfare in an age of hyper-militarism', Journal of Communication Management, pp. 43-58.

Jane Demmen, Andrew Hardie, and Jonathan Culpeper (Lancaster University) "Part-of-speech tagging in Shakespeare: Trials, tribulations and preliminary results"

Analysing the language of Shakespeare's plays using computational methods, although reasonably well established now, has to date focused very little on grammatical patterns and parts of speech (POS). Yet grammatical profiles and patterns of grammatical usage can help us enrich the description of language in the plays, for example by examining variation:

  • in different plays and genres
  • across early to later plays
  • at character level (including representation of different accents and social ranks)
  • between character dialogue authored by Shakespeare and that by other playwrights of his era.

The aim of the Encyclopaedia of Shakespeare's Language project (AHRC project AH/N002415/1; http://wp.lancs.ac.uk/shakespearelang/) is to create a new, empirically-based dictionary of Shakespeare's language along with character and play profiles and themes. Amongst the corpus linguistic methods involved in the project, we use CLAWS (the Constituent Likelihood Automatic Word-tagging System; http://ucrel.lancs.ac.uk/claws/) to annotate the words in Shakespeare's plays with POS tags, which indicate the grammatical category (noun, verb, adverb, adjective, etc.) to which each word is deemed to belong. To facilitate this, CLAWS has undergone some further development to improve the accuracy with which it handles historical texts, including expansion of its lexicon to include a wider range of archaic forms (notably, verbs agreeing with thou).

In this paper we discuss some of the issues and problems with POS tagging texts from the Early Modern period, and we display some preliminary data quantifying major POS categories in the different plays by Shakespeare.

Willard McCarty (King's College London) "What happens when we intervene?"

In “Varying the cognitive span”, David Gooding argues that technologies not only expand the range of available evidence and change the way we work but also make new phenomena (2003: 255). In After Phrenology Michael Anderson agrees: invented technologies, he argues, offer new affordances; with them, “we are not merely doing better with tools what we were doing all along in perception. Rather, we are constructing new properties to perceive in the world… properties that actually require these tools to perceive them accurately” (2014: 181f). Theirs is an argument from the cognitive sciences which applies across the board to the use of instruments for enquiry. I want to raise two questions in its wake: first, how the digital machine differs from other such instruments in this regard; second, what happens, what we do with data, when we intervene with the machine in coming to grips with a text or textual corpus? For the first question I will set out a view of the machine as a paradoxically deterministic but combinatorially free and complex device. In other words, I will argue that binary encoding matters, and so does what Goldstine and von Neumann did with it (1947). For the second question, Gooding warns us against the temptation to “iron the reticularities and convolutions out of thought (and action) to make a flat sheet on which a methodologically acceptable pattern can be printed” (1990: 5). If successful in getting beyond it, I will sketch out my own version of the process he calls ‘construal’ (1990, passim), by which new knowledge is made in the struggle between the perceptual and the conceptual. I am keenly interested in whether this sketch makes sense to others and helps to illumine what we do with the machine.


Anderson, Michael L. 2014. After Phrenology: Neural Reuse and the Interactive Brain. Cambridge MA: MIT Press.

Goldstine, Herman and John von Neumann. 1947. Planning and Coding of Problems for an Electronic Computing Instrument. Report on the Mathematical and Logical aspects of an Electronic Computing Instrument, Part II, Volume 1-3. Princeton NJ: Institute for Advanced Study. https://library.ias.edu/files/pdfs/ecp/planningcodingof0103inst.pdf

Gooding, David. 2003. “Varying the Cognitive Span: Experimentation, Visualisation, and Computation”. In The Philosophy of Scientific Experimentation. Ed. Hans Radder. 255-301. Pittsburgh PA: University of Pittsburgh Press.

Gooding, David. 1990. Experiment and the Making of Meaning: Human Agency in Scientific Observation and Experiment. Dordrecht: Kluwer.

Rebecca Mason (Glasgow University) "Imposing structures on historical legal documents"

Eliciting what historians do when they are working in the archive--the commonest site of historical research--is quite difficult. We know that they are searching out and examining documents and other primary sources, but how they work on and find meaning from these sources is shrouded in mystery. Historians are often faced with many problems when attempting to transfer historical sources into a readable database, including when it comes to determining the types of structures to be imposed on archival sources. This often boils down to the inescapable realities of historical research. The historian imposes structure on historical documents by deciding the focus of the question under investigation and selecting the categories around which to describe and organise the data that is collected, which in turn affects the outcomes of data analysis. Discussing the application of digital textual analysis to complex bodies of early modern Scottish legal sources, the basis of my own research, this presentation will discuss how historians overcome certain design problems that affect their research when attempting to quantify complex information into a relational database. Focusing closely on my own relational database, this presentation will reveal the findings of my PhD research, demonstrating how databases can enhance historical research, as well as recognising difficulties faced when collating large amounts of data.

Biography Rebecca is a third-year AHRC-funded PhD student in the History department at the University of Glasgow. Her thesis explores the litigating activities of married women in courts in seventeenth-century Glasgow, focusing on their rights to real estate and moveable property in contrasting legal jurisdictions. Her research is funded as part of the UK-wide AHRC project entitled ‘Women Negotiating the Boundaries of Justice: Britain and Ireland, c.1100-c.1750’. She is a student council representative of the Economic History Society and a steering committee member of Women’s History Scotland. More broadly, her research interests include the relationship between gender and economic development, the gendered structures of premodern law, and the impact of social worth and marital status when entering law.

Jewell Thomas (University of North Carolina, Chapel Hill) "'A strong basis of suspended action': Lexonomic structure of the novels published in Charles Dickens's All the Year Round 1859-1862"

This paper analyzes the lexical structure of a set of novels published in sequence in Dickens’s periodical All the Year Round. I develop and demonstrate a new technique, ‘lexonomics,’ for measuring this structure, which allows computational-stylistic comparison of novels without the a priori assumptions that typical digital humanities text-modeling techniques need. This study reports analyses of Charles Dickens’s Great Expectations and compares the lexonomic structure of this novel with that of Charles Lever’s A Day’s Ride, showing how lexonomic differences reflect literary differences. This study also reports lexonomic analyses for the three other novels published in Dickens’s All the Year Round in the years 1859-62: A Tale of Two Cities, The Woman in White, and A Strange Story. These three novels, along with Great Expectations, were considered by Dickens to be exemplary serial novels. Synoptic analyses show that The Woman in White is the most lexically inter-connected novel of the five, followed by Great Expectations, A Strange Story, A Day’s Ride and A Tale of Two Cities.

Biography Jewell is a PhD student at the University of North Carolina in Chapel Hill.

Jacqueline Cordell (University of Nottingham) "To regularise, or not to regularise? Orthographic annotation and Middle English (literary) text"

This paper explores the potential of using orthographic annotation in corpus approaches to studying lexical patterns in late medieval literature. Orthographic annotation is a useful approach in addressing the high amounts of spelling variation evidenced in historical varieties of English, with its associated benefits dependent on the given research focus. For instance, regularised spelling is a key component of corpus linguistic analyses of historical literature; spelling variation interferes with the ability of corpus tools and software to correctly identify the number of times a word appears in a digital corpus of text, resulting in the possibility of a single word being divided into different lexical entries within the word lists generated for a given corpus. These categorisation errors resulting from systemic spelling irregularity produce an adverse knock-on effect affecting the quality of subsequent corpus analysis of language features like key words and collocations, approaches which are heavily dependent on the information derived from these initial word lists.

To this end the presentation offers a practical introduction to Variant Detector (VARD), a corpus tool which enables spelling regularisation through successive levels of manual and automated training through a user-friendly interface (read: no programming skills required). This introduction focuses on points of general interest including: the formatting requirements of digitised texts compatible with VARD, the processes by which the tool identifies variants, and the options available in both the manual and automated training settings. To illustrate these differences discussion will draw on the annotation data collected during my current doctoral research examining the language of Piers Plowman and an associated reference corpus of ME narrative poetry. Using this data, I identify quantitatively--and qualitatively--based benchmarks of annotation-based success resulting from VARDing these Middle English texts. This data is also used as a point of departure for suggesting approaches to future annotation projects that target individual sets of variants as a method of preventing the time-based costs of spelling regularisation from outweighing the potential benefits to the individual researcher.
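The way spelling variation splits one word across several word-list entries, as described in this abstract, can be sketched in a few lines. The variant mapping and token sequence below are invented for illustration, and the lookup-table approach is only a stand-in: it does not reproduce how VARD detects or regularises variants.

```python
from collections import Counter

# Hypothetical variant-to-regularised mapping, invented for illustration.
VARIANTS = {
    "knyght": "knight",
    "knyghte": "knight",
    "kniht": "knight",
    "shal": "shall",
    "schal": "shall",
}


def word_list(tokens, mapping=None):
    """Count tokens, optionally replacing known variants with a regular form."""
    mapping = mapping or {}
    return Counter(mapping.get(tok, tok) for tok in tokens)


# Toy token sequence, not quoted from any text.
tokens = "the knyght shal ryde and the knyghte schal fighte as a kniht shal".split()

raw = word_list(tokens)
regularised = word_list(tokens, VARIANTS)

print("raw:        ", raw.most_common(3))
print("regularised:", regularised.most_common(3))
```

In the raw list the three spellings of one word are counted as three separate low-frequency types; after regularisation they merge into a single entry, which is what keyword and collocation measures built on the word list would then see.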

Mark J. Hill and Simon Hengchen (University of Helsinki) "Quantifying the impact of messy data on historical text analysis: Eighteenth century periodicals as a case study"

Quantitative methods for historical text analysis offer exciting opportunities for researchers interested in gaining new insights into long studied texts. However, the methodological underpinnings of these methods remain underexplored. In light of this, this paper takes two datasets made up of identical early eighteenth century titles (periodicals) and compares them. The first corpus is a collection of clean versions of texts from various sources, while the second corpus is made up of messy (in terms of OCR) versions extracted from Eighteenth Century Collections Online (ECCO).[1] With these two corpora the aim is to achieve four things. First, offer some descriptive analyses. This includes differences and similarities in word, sentence, and paragraph counts; average sentence length; and variances in differences between correct and incorrect words. The second aim is to use this information to engage in statistical analyses. That is, use the differences recorded between OCR errors and clean data to quantify the significance of those errors (i.e., to what extent is the messy version representative of the clean version). Third, the paper will offer some--more qualitative--reflections on differences in outputs from specific text analysis methods. These include comparing: inter-corpus and cross-corpus similarities in vector space; LDA topic modelling; and outputs from the Stanford Lexical Parser.[2] Finally, the paper concludes by offering some thoughts on how those engaging with messy data can (or cannot) move forward – in particular, quantifying the problem and highlighting some methods which are less susceptible to errors caused by bad OCR.

[1] For more on OCR errors in ECCO see: Paddy Bullard, "Digital Humanities and Electronic Resources in the Long Eighteenth Century." Literature Compass 10, no. 10 (2013): 748–760 and Patrick Spedding, "'The New Machine': Discovering the Limits of ECCO." Eighteenth-Century Studies 44, no. 4 (2011): 437–53.

[2] There has been surprisingly little work done in this area. However, Rodriguez et al.’s 2010 "Comparison of Named Entity Recognition tools for raw OCR text" (online) stands out.
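The descriptive comparison outlined in the abstract above (word and sentence counts, average sentence length, and the share of incorrect words) might look something like the following sketch. The two short strings are invented, the function names are hypothetical, and treating "not found in the clean text" as "incorrect" is a simplifying assumption rather than the authors' procedure.

```python
import re


def describe(text):
    """Return word count, sentence count, and mean sentence length in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    return {
        "words": len(words),
        "sentences": len(sentences),
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }


def _normalise(token):
    """Lower-case a token and strip everything except the letters a-z."""
    return re.sub(r"[^a-z]", "", token.lower())


def out_of_vocabulary_rate(noisy, clean):
    """Share of noisy-text tokens whose form never occurs in the clean text."""
    vocab = {_normalise(t) for t in clean.split()}
    tokens = [t for t in (_normalise(t) for t in noisy.split()) if t]
    return sum(t not in vocab for t in tokens) / len(tokens) if tokens else 0.0


# Invented toy strings: the noisy one imitates character confusions and a
# dropped full stop of the kind OCR can introduce.
CLEAN = "The paper was read in every coffee house. It pleased the town."
NOISY = "The paper was rcad in every coffee houfe It pleafed tbe town."

print(describe(CLEAN))
print(describe(NOISY))
print(round(out_of_vocabulary_rate(NOISY, CLEAN), 2))
```

Even in this tiny case the word count is unchanged while the sentence count and mean sentence length shift, which hints at why some measures may be more robust to OCR noise than others.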

Iain Emsley (Sussex University), Pip Willcox (Oxford University), David De Roure (Oxford University), and Alan Chamberlain (University of Nottingham) "Galvanising David Garrick: Using Sonification for comparative reading and modelling"

The late Eighteenth Century saw an interest in improving communication through the elocutionary movement (Goring: 2014). In 1775, Sir Joshua Steele created and published a symbolic performance notation. He used this notation to preserve records of David Garrick’s performances, which he admired (Steele: 1779), and that he then played back. Other critics, most notably Thaddeus Fitzpatrick, were critical of Garrick’s Shakespeare performances (Fitzpatrick: 1760). Steele’s notation can be viewed as a forerunner of linguists’ suprasegmental approach to prosody (Kassler: 2005). In this paper, we present sonifications as experimental digital approaches to Steele’s notation work as reproduction of Garrick’s performances.

The use of sonification as a method to reproduce Steele’s notation is presented through two studies of the same lines spoken by Steele and Garrick as described in Steele’s Prosodia Rationalis (Steele: 1779). We reflect on the challenges in reproducing experiments from archival sources and using digital methods to present them through their modelling and constraints. We then discuss how the Garrick model can be used to recreate one aspect of Fitzpatrick’s critique of Garrick’s speech and how the digital provides methods of aural reconstruction.

The paper also considers the challenges in reading and modelling the notation in a digital form. This encourages us to view the experiments as a way of thinking about the original methods and algorithms discussed in the sources, including completing an implied piece of work. The approaches to digital and modelling provide methods through which we are able to explore and attempt to explicate the underlying aims of the presented modellers (Steele and Fitzpatrick), and to echo their underlying experiments. Our claim is that our experimental digital methods allow us to be metaphorical galvanists through simulation and experimentation, stimulating the recreated model with new questions.   


Fitzpatrick, T. 1760. An enquiry into the real merit of a certain popular performer. In a series of letters, first published in the Craftsman ... With an introduction to D----d G----k, Esq. M. Thrush, London.

Goring, P. 2014. “The Elocutionary Movement in Britain” in The Oxford Handbook of Rhetorical Studies, MacDonald, M. J. (ed.), Oxford University Press, Oxford. DOI: 10.1093/oxfordhb/9780199731596.013.043

Kassler, J.C. 2005. “Representing Speech Through Musical Notation”, Journal of Musicological Research, 24:3-4, 227-239, DOI: 10.1080/01411890500233965

Steele, J. 1779. Prosodia rationalis: or, An essay towards establishing the melody and measure of speech, to be expressed and perpetuated by peculiar symbols (2nd ed.), J. Nichols, London.


Mel Evans (Leicester University) "Messy signals: Communication and interpretation in a traditional/digital editing project"

This paper discusses the role of computational stylistics and digital humanities in the AHRC-funded project ‘Editing Aphra Behn in the Digital Age’, and the theoretical and practical concerns of combining digital approaches with traditional editing techniques and models. The project’s primary output is a new scholarly edition of the works of Aphra Behn (c. 1640–1689). The project also seeks to raise the profile of Behn as a writer, and that of Restoration culture and society, through a combination of online and offline activities and events. Behn’s works, which span plays, prose, letters and translations, are rife with attribution issues, like those of many of her early modern contemporaries. This project investigates Behn’s style and authorship across her writings using computational stylistic techniques, providing a multi-genre authorial style profile.

As we approach the half-way stage in the project, I report on three dimensions of the attribution work. Firstly, I survey the theoretical and methodological challenges, and advantages, of undertaking a large-scale literary attribution study on someone who “isn’t Shakespeare”. Secondly, I present our current findings and explore their associated complexities, or messy signals, particularly in relation to Behn’s plays. I note the success rates of different measures, such as Zeta (Burrows 2007) and Support Vector Machine classifiers, and linguistic features (MFW and character n-grams) (Stamatatos 2013). Finally, I discuss the role of communication and interpretation in making sense of the computational results for the purposes of a scholarly edition; in particular, the ways in which traditional scholarship can provide a road-map for quantitative analysis, the benefits and limitations of discipline-specific ignorance, and the importance of shared reference points and understanding for the development of robust and persuasive accounts of authorial style.


Burrows, J. 2007. ‘All the Way Through: Testing for Authorship in Different Frequency Strata’. Literary and Linguistic Computing 22 (1): 27–47.

Stamatatos, E. 2013. ‘On the Robustness of Authorship Attribution Based on Character N-Gram Features’. Journal of Law and Policy, 421–39.

John Jowett (Shakespeare Institute) "Digital Shakespeare and the Limits of Structure"

This paper reflects on the classic definition of the digital era, 'Text is an Ordered Hierarchy of Content Objects'. It takes the example of Shakespeare as a dramatist working in the material and semiotic conditions of the theatre, and as a writer whose earliest extant texts reflect the uneven conventions of layout and structure that prevailed in the early modern printing house. The guidelines of the Text Encoding Initiative and Internet Shakespeare Editions are considered for their treatment of mixed, optional, or uncertain action in stage directions. Further questions as to the relation between textual disorderliness and digital tagging are raised by original-spelling editions such as the New Oxford Shakespeare Critical Reference Edition, which legitimately seek to retain many of the irregularities of the early modern text.

Biography John Jowett is Professor of Shakespeare Studies at the Shakespeare Institute, University of Birmingham. He is author of Shakespeare and Text (2007), which he is currently revising for a second edition. His most recent editorial project is the digital and print New Oxford Shakespeare, of which he was a general editor.

Itay Marienberg-Milikowsky (University of Hamburg) "Beyond digitization? The case of Digital Humanities and Hebrew literature"

The flourishing of Digital Humanities did not pass over Hebrew Literature, a literature that is characterized by linguistic continuity and a cultural and generic affluence, due to complex historical circumstances. The thriving of science and technology in Israel, the present center of development of the Hebrew language and literature, has contributed exceptionally to the creation of some advanced digital enterprises. However, despite the impressive progress in the digital accessibility of worthy corpora of literary treasures, literary research in Hebrew Literature has been avoiding the next step: computational research. This restraint misses a real opportunity that this literature may offer: it is big enough for Distant Reading, yet, although it is divided into various historical periods, it is still smaller than other literatures. This trait allows the researcher to move relatively smoothly from distant to close reading, and vice versa. In my paper, I will seek to examine this situation, while pointing at the conceptual and practical challenges that must be met in order to change it. One of my claims shall be that bridging the gap between digital humanists and traditional humanists is a step whose importance cannot be overestimated.

Biography Itay Marienberg-Milikowsky is a guest researcher at the Interdisciplinary Center for Narratology, Hamburg University. Starting in June 2018, he will be a postdoctoral fellow of the Minerva Stiftung, opening a new chapter in his research: Reading the Great Un-read of Talmudic Narratives. While his PhD dissertation and postdoctoral studies focus on Rabbinic literature, Marienberg-Milikowsky has published on all historical strata of Hebrew literature: ancient, medieval and modern. Currently he is coordinating the project “Remapping Israeli Literature with a Digitized Lexicon”, supported by the Israel Scholarship Education Foundation and held at Ben-Gurion University of the Negev.

Alessandro Vatri (Wolfson College Oxford and the Alan Turing Institute) and Barbara McGillivray (University of Cambridge and the Alan Turing Institute) "A computational approach to lexical polysemy in Ancient Greek"

Abstract Lexical polysemy, the phenomenon whereby the same word can have different meanings, plays a significant role in language, and is closely connected with semantic change, whereby words acquire new meanings over time. We present a Bayesian learning approach to model polysemy and semantic change in a large corpus of Ancient Greek texts. This model allows us to conduct a large-scale analysis to measure the relationship between semantic change and polysemy, and to quantify the role played by linguistic and non-linguistic factors such as genre.

Biography Alessandro Vatri (DPhil Oxon) is Junior Research Fellow of Wolfson College, University of Oxford. He is mainly interested in communication in the ancient Greek world and, in particular, in the connection between the linguistic form of texts and the original socio-anthropological circumstances of their production and reception. His recent and forthcoming publications focus on ancient textual practices, ancient rhetoric, Greek oratory, and the reconstruction of native language comprehension. His areas of interest also include ancient Greek synchronic and historical linguistics, ancient literary criticism, corpus linguistics, and the digital humanities. His first monograph Orality and Performance in Classical Attic Prose. A Linguistic Approach (Oxford University Press) was published in 2017.

Biography Barbara McGillivray is a research fellow at The Alan Turing Institute and the University of Cambridge. She holds a PhD in computational linguistics from the University of Pisa. Her research interests include Language Technology for Cultural Heritage, Latin computational linguistics, and quantitative historical linguistics. She has written two monographs in this area, Methods in Latin Computational Linguistics (Brill, 2014) and Quantitative Historical Linguistics. A corpus framework (Oxford University Press, 2017).

Anupam Basu (Washington University in St Louis) "Spenser's spell: Archaism and historical stylometrics"

Awaiting abstract and biog.

Arianna Ciula and Chris Pak (King's College London) "A corpus linguistic study of 'models' and 'modelling': Intellectual and technical challenges"

McCarty (2003) reflects on Winograd and Flores’ (1986) notion that ‘in designing tools we are designing ways of being.’ He asks ‘what ways of being do we have in mind?’ and ‘what ways of knowing do we have in hand? What is the epistemology of our practice?’ (McCarty 2003). Duranti (2005) highlights a fundamental ambiguity residing at the heart of any research endeavour. Despite the risks that such ambiguity implies in ‘letting others decide what we stand for,’ he cites Galison’s (1999) claim that ‘ambiguity, in the form of trading zones, can be a positive force in allowing the exchange of ideas and the co-existence of different scientific paradigms’ (Duranti 2005, 410).

As part of the project “Modelling between Digital and Humanities: Thinking in Practice,” supported by the Volkswagen Foundation, this presentation examines the role of models in designing ways of knowing and being in selected disciplines and their capacity to develop trading zones that foster interdisciplinary exchange. In particular, this presentation outlines a corpus linguistic approach to understanding the role of models and modelling processes in the humanities and offers indicative findings based on an analysis of academic journal articles published from 1900–2017 in five disciplines: Archaeology, Anthropology, History, Philosophy and the Digital Humanities.

This paper will detail the process of corpus construction, which draws on n-gram data and analysis of OCR-scanned full-text documents provided by the JSTOR Data for Research service. This analysis will track the occurrence of “model,” related words and their inflections, focussing on collocation (the co-occurrence of two or more words) and colligation (the co-occurrence of a lexical word and a grammatical category). It asks how different disciplines in the humanities describe their uses of models and modelling processes in their research, and whether and how models and other related semantic categories are associated.


Duranti, A., 2005. On theories and models. Discourse Studies, 7(4/5), pp. 409–429.

Galison, P., 1999. Trading zone: coordinating action and belief. In: M. Biagioli, ed. 1999. The Science Studies Reader. New York: Routledge. pp. 137–160.

McCarty, W., 2003. “Knowing true things by what their mockeries be”: modelling in the humanities. Computing in the Humanities Working Papers, [online] Available at: <http://projects.chass.utoronto.ca/chwp/CHC2003/McCarty2.htm>

Modelling between digital and humanities: thinking in practice, [online] Available at: <http://modellingdh.eu/>

Winograd, T. and Flores, F., 1986. Understanding Computers and Cognition: A New Foundation for Design. Boston: Addison-Wesley.

Gary Taylor (Florida State University) "Invisible writers: Finding 'Anonymous' in the digital archives"

Abstract One disadvantage of big data is that it may overlook, or bury under huge numbers, the little guy. In particular, data-driven authorship studies of early modern plays tend to favor writers with large dramatic canons. But several known Elizabethan writers who worked for the commercial theatres--Thomas Nashe, Thomas Lodge, Henry Chettle--have only a single surviving play of undisputed authorship. Others--Anthony Munday, Thomas Watson, Thomas Achelley--have no undisputed surviving single-author play. Even Thomas Kyd, with three surviving plays, has a significantly smaller dramatic canon than Christopher Marlowe or William Shakespeare. This paper, focusing on scenes from Arden of Faversham and 1 Henry VI, demonstrates a new method for empirically identifying such writers.

Biography Gary Taylor is Distinguished Research Professor and Chair of the English department at Florida State University, where he founded the History of Text Technologies program. He was a general editor of the Oxford Middleton trilogy (2007-12) and of the New Oxford Shakespeare Complete Works and Authorship Companion (in print and online, 2016-17), and is now working on the New Oxford Shakespeare Complete Alternative Versions.

Hugh Craig (Newcastle University, Australia) "Digital dating: Early modern plays and the 'ever-rolling stream'"

Abstract Accurate chronology is one of the foundations of literary history and is an essential element in bibliography. Quantitative stylistic analysis offers a new source of evidence to supplement what we can glean about chronology from the documentary record. In early modern English drama considerable work has already been done dating plays and parts of plays using the incidence of incoming and outgoing forms like HAS and HATH, as well as orthographic changes, such as increasing abbreviation in printed sources, and shifting patterns in metre. In this talk I will present a new stylometric method for establishing chronology -- an adaptation of Principal Components Analysis to create a purpose-built and transferable function -- and discuss sample results from experiments on comedies and tragedies from the period 1585-1624.

Biography Hugh Craig is Professor of English at the University of Newcastle (NSW). At Newcastle he has been Head of School, Assistant Dean (Research), and Deputy Head, Faculty of Education and Arts. Currently he is Director of the Centre for 21st Century Humanities and of the Centre for Literary and Linguistic Computing. He has held visiting positions at Magdalen College, Oxford; the Istituto di Linguistica Computazionale, Pisa; the University of Canterbury, New Zealand; the University of Victoria, Canada; the University of Birmingham; and the University of Würzburg, Germany. His research interests are in Early Modern English literature and in stylometry. He has long-standing collaborations with colleagues in bioinformatics and speech and language therapy. His findings on Shakespeare authorship have been controversial, but have influenced the inclusions and exclusions in the New Oxford Shakespeare (2016-21). Recent publications include Shakespeare, Computers, and the Mystery of Authorship (with Arthur F. Kinney, Cambridge University Press, 2009) and Style, Computers, and Early Modern Drama: Beyond Authorship (with Brett Greatley-Hirsch, Cambridge University Press, 2017). He is President of the Australasian Association for Digital Humanities and is a Fellow of the Australian Academy of the Humanities.