CANCELLED: Using ABBYY FineReader to OCR Scanned Text for Mining

Date: 03/31/2020

Time: 1:00 pm - 2:00 pm

Center for Digital Humanities


**Please note: In response to concerns over the 2019 Novel Coronavirus (2019-nCoV), this event will not take place as planned. Stony Brook University continues to closely monitor the guidance provided by the CDC and New York State Department of Health (NYSDOH).**

Seating is limited to 10 participants. This course is open only to current SBU faculty, staff, and students. Completion of Part 1, Using the Bookeye Scanner, is recommended. Registration is required.

ABBYY FineReader® is an advanced OCR (optical character recognition) application that converts a scanned image of text into a searchable, machine-readable format (plain text file, PDF, etc.). Digital humanities researchers often use it to prepare scanned texts such as books or newspapers for text analysis, either individually or as part of a larger data corpus. In this workshop, participants will learn the basics of operating ABBYY FineReader® to produce suitable unstructured text data from a scanned PDF document. We will also touch on "cleaning" text data to prepare it for analysis.

Victoria Pilato

Digital Projects Librarian at Stony Brook University Libraries
Victoria is liaison to the Philosophy and Religious Studies departments. Feel free to contact her at victoria.pilato@stonybrook.edu