Given big data’s volume and variety, coding proficiency and machine learning are becoming part and parcel of the new humanities. Every week, 28 billion photos are uploaded to Google Photos, and YouTube hosts more than an estimated 100,000 years of video content. Such vast repositories far exceed the scope of traditional scholarly methods. Machine-assisted “algorithmic reading” is therefore essential for processing and interpreting material at this scale, whether textual corpora or binary large objects.

Algorithmic reading moves beyond the dichotomy of close and distant reading: it is not simply about identifying patterns or generating statistical summaries, nor about delegating interpretive tasks to artificial intelligence (AI). Rather, it aims to create intuitive modes of exploring terabyte- and petabyte-scale digital collections, akin to the serendipitous experience of “walking through the stacks.” For textual sources, the multilingual capabilities of transformer-based large language models and vision-language models enable exploration across more than 200 languages. These AI systems can be calibrated to engage with a wide variety of sources, from digitized manuscripts and newspapers to audiovisual records, virtual worlds, and unstructured data dumps from a decommissioned online service. Discovery methods now extend beyond keyword searches to include natural language prompts, semantic and stylistic clustering, and image or audio matching.
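To make the idea of semantic clustering concrete, the following is a minimal sketch in pure Python. It stands in for the real pipeline: production systems would use transformer embeddings (e.g., from a sentence-encoder model) rather than the toy bag-of-words vectors here, and a proper clustering algorithm rather than this greedy single pass. The documents, function names, and similarity threshold are all illustrative assumptions, not part of any named project.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words vector; a real system would use
    # dense transformer embeddings instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(docs, threshold=0.3):
    # Greedy single-pass clustering: attach each document to the
    # first cluster whose representative is similar enough,
    # otherwise start a new cluster.
    clusters = []
    for doc in docs:
        vec = embed(doc)
        for c in clusters:
            if cosine(vec, c["rep"]) >= threshold:
                c["members"].append(doc)
                break
        else:
            clusters.append({"rep": vec, "members": [doc]})
    return [c["members"] for c in clusters]

# Hypothetical snippets from two different kinds of records.
docs = [
    "the siege of the city lasted three months",
    "the city fell after a long siege",
    "harvest records list wheat and barley yields",
    "barley and wheat yields varied by harvest",
]
print(cluster(docs))
```

Even this crude surface-level similarity separates the siege narratives from the agricultural records; transformer embeddings extend the same principle to paraphrase, style, and cross-lingual matches that share no vocabulary at all.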

DeepPast, one of BDSL’s flagship projects, demonstrates how algorithmic reading can open new interpretive possibilities for navigating and understanding complex digital archives, merging technological innovation with the interpretive depth that defines humanities scholarship.