Introduction to Stylometry

Instructors: Jeremi Ochab
Duration: one week

You may have heard that computational text analysis for digital humanities and cultural analytics is fun. We assure you it’s not: it’s a grim endeavour that can go wrong very quickly and that requires patience, expertise and responsible choices. In this one-week course, we offer a comprehensive survival guide to multivariate text analysis with R. We start with the basics of counting words and spend a lot of time on fundamentals: text representation, calculation of differences and similarities, vector manipulations, and unsupervised and supervised methods of text classification. We will guide you through the user-friendly interface of the stylo software to introduce important concepts and operations. Then, we will show you how to expand on that: understand the workflow, design your own research, discuss real-world studies and run simple replication experiments. By the end of the course, you will be able to pursue research questions like:

Which textual features can betray an author’s or translator’s identity?
What unconscious elements of language reflect the author’s education, gender, religious background, and social or historical conditions?
What elements of style are affected by literary period, genre, and topic?
What are the textual relationships between books or authors?

Week 1: Fundamentals
Day 1. Sorrows of software set-up and introduction to crimes of literary computation
Day 2. Torture of text representations and multidimensional misery
Day 3. Unsupervised and network analysis abyss
Day 4. Classification carnage
Day 5. Agony of application: authorship, genre, gender, poetic meters

ESU DH 2026
