Digital Archives: Reading and Manipulating Large-Scale Catalogues, Curating and Creating Small-Scale Archives

Instructor: Yael Netzer
Duration: both weeks

The purpose of this two-week workshop is to develop practical and critical skills in representing knowledge in digital archives, and to build a small-scale digital archive. The workshop blends theory and hands-on activities, enabling participants to engage with digital catalogues, metadata structures, and archival curation tools.

Additional Open Lab Session: An optional free exploration day will be available for participants to receive one-on-one guidance on their projects and tools.

No prior knowledge is required for this workshop. Participants are encouraged to bring their own datasets; starter collections will be provided if needed.

Week 1 – Reading and Working with Data / Collections in OpenRefine

Digital data in various formats is at the heart of humanities research. Often, datasets are large, messy, or structured in unfamiliar ways. This week, students will learn to inspect, clean, and enrich digital catalogues using OpenRefine, as well as how to enhance datasets with Linked Open Data (LOD) from sources such as the Library of Congress, VIAF, and Wikidata. By the end of this week, students will be proficient in:

Understanding different file formats (CSV, TSV, spreadsheets, JSON, TEI XML)
Using regular expressions for data manipulation (with some practice and help from ChatGPT)
Writing expressions with GREL (OpenRefine’s scripting language)
Fetching and reconciling data via REST API (e.g., GeoNames, Wikidata)
Scraping and structuring data from the web
Mapping textual data to geographic locations
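As an illustration of the regular-expression work listed above, here is a minimal Python sketch that normalizes catalogue dates to ISO 8601. It is not part of the workshop materials: the function name `normalize_date`, the sample values, and the assumption that dates are written day-first (day/month/year) are all hypothetical. In the workshop itself, a comparable transformation would be written in GREL inside OpenRefine.

```python
import re

# Assumption: dates are written day-first, separated by "/", "." or "-".
DATE_PATTERN = re.compile(r"^\s*(\d{1,2})[./-](\d{1,2})[./-](\d{4})\s*$")


def normalize_date(value):
    """Return YYYY-MM-DD for a day/month/year string, or None if it does not match."""
    match = DATE_PATTERN.match(value)
    if match is None:
        return None
    day, month, year = (int(part) for part in match.groups())
    # Coarse range check only; it does not validate days per month.
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None
    return f"{year:04d}-{month:02d}-{day:02d}"


if __name__ == "__main__":
    for raw in ["3/7/1921", " 12.11.1899 ", "circa 1900"]:
        print(repr(raw), "->", normalize_date(raw))
```

Values that do not match the pattern (such as "circa 1900") are returned as None rather than guessed, so they can be reviewed by hand.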

Schedule:

Class 1: Introduction, loading a file, faceting, and exploring data
Class 2: Regular expressions and working with dates
Class 3: Clustering techniques for data cleaning
Class 4: Fetching external data using REST APIs (GeoNames example)
Hands-On Session: Practicing administrative tasks (changing working directory, memory allocation)
Class 5: Reconciliation and enriching data with Wikidata
Class 6: Handling JSON and XML file formats
Class 7: Web scraping techniques and automation
Class 8: From text to map – Geospatial representations in OpenRefine
Class 9: Summary and discussion
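To give a sense of what the clustering class covers, the following Python sketch approximates key-collision clustering with a "fingerprint" key, loosely modelled on the fingerprint method available in OpenRefine. It is a simplified illustration under stated assumptions, not OpenRefine's actual implementation: the function names and sample names are hypothetical, and only ASCII punctuation is stripped.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key: lowercase, no accents or punctuation, sorted unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop combining marks so that "é" and "e" produce the same key.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Simplification: remove ASCII punctuation only.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; return only groups with more than one member."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    names = ["Woolf, Virginia", "Virginia Woolf", "Zola, Emile"]
    print(cluster(names))
```

Because tokens are sorted, "Woolf, Virginia" and "Virginia Woolf" collide on the same key and are proposed as one cluster; a human still decides whether to merge them.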

Week 2 – Building a Digital Archive: Archives of the Present

This week focuses on the creation and structuring of small-scale digital archives. It also introduces the concept of archives of the present: a critical reflection on how contemporary events, data, and digital traces shape our archival practices. Participants will work with their own or provided collections, conceptualizing metadata structures and curatorial strategies. The workshop covers best practices in digital archive development, including metadata schema selection, linked data integration, and user-friendly design. The discussion of archives of the present will explore:

How digital documentation of real-time events (social media, news articles, live-streamed content) can be archived
The ethical challenges of archiving contemporary materials
Methods for ensuring accessibility and preservation of ephemeral data
The evolving nature of authority files and metadata in fast-changing digital environments

By the end of this week, students will be proficient in:

Theoretical foundations of archival studies
Metadata structuring and best practices
Using Omeka-S for archive implementation
Using Tropy for organizing and annotating images
Linking archives to external sources and ontologies
Designing and publishing an accessible, structured digital archive
Engaging with contemporary data collection and preservation strategies
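As a small illustration of metadata structuring, the sketch below models one archive item in Python and maps it to Dublin Core terms (`dcterms:title`, `dcterms:creator`, `dcterms:date`, `dcterms:subject`). The class `ArchiveItem`, its fields, and the sample values are hypothetical, and the flat dictionary it produces is a simplification, not the exact structure that Omeka S stores or exchanges.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ArchiveItem:
    """A hypothetical item description with a handful of descriptive fields."""
    title: str
    creator: str = ""
    date: str = ""
    subjects: List[str] = field(default_factory=list)

    def to_dublin_core(self) -> Dict[str, object]:
        """Map the non-empty fields to Dublin Core term names."""
        record: Dict[str, object] = {"dcterms:title": self.title}
        if self.creator:
            record["dcterms:creator"] = self.creator
        if self.date:
            record["dcterms:date"] = self.date
        if self.subjects:
            record["dcterms:subject"] = list(self.subjects)
        return record


if __name__ == "__main__":
    item = ArchiveItem(title="Letter to the editor", date="1921-07-03",
                       subjects=["Correspondence"])
    print(item.to_dublin_core())
```

Empty fields are left out of the record instead of being filled with placeholders, which keeps missing information visibly missing.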

Schedule:

Class 1: Theory of archives – an introduction
Class 2: Digital archives – examples and reviewing participant collections
Class 3: Modeling the domain
Class 4: Metadata – methods of description, challenges, and dilemmas
Class 5: Introduction to Omeka-S – setting up and structuring an archive
Class 6: Using Tropy – basic features and integration with Omeka
Hands-On Session: Working on participant collections
Class 7: Archives of the present – Capturing and preserving digital traces
Class 8: Linking and integrating with external resources and authority files
Class 9: Publishing – designing Omeka pages for public access
Class 10: Summary and reflections

To enrich the learning experience, this workshop will aim to incorporate:

Case studies of successful digital archive projects
Collaborative group work, where teams handle different types of archival materials
Expanded toolset beyond OpenRefine and Omeka, including basic Python for data manipulation and SPARQL for querying LOD sources
Introduction to IIIF (International Image Interoperability Framework) for handling digital images in archives
Machine learning-assisted metadata extraction, including OCR (Transkribus), Google Vision API, and Named Entity Recognition (NER)
Sustainability and long-term digital archive maintenance strategies
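For the IIIF topic above, a minimal Python sketch of how an image request URL is assembled may help. It follows the IIIF Image API path pattern of identifier, region, size, rotation, and quality plus format. The function name `iiif_image_url`, the server address `https://example.org/iiif`, and the identifier are hypothetical, and the default size `max` assumes version 3 of the Image API (version 2 servers use `full`).

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL from its path components."""
    # The identifier is percent-encoded so that slashes inside it are not
    # mistaken for path separators.
    encoded_id = quote(identifier, safe="")
    base = base_url.rstrip("/")
    return f"{base}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    # Hypothetical server and identifier, for illustration only.
    print(iiif_image_url("https://example.org/iiif", "ms 12/f1"))
```

Changing only the `region` or `size` argument requests a crop or a smaller derivative of the same image, without storing extra copies in the archive.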

ESU DH 2026