I just launched a new open source project devoted to the Edgar Church collection.
The goal is to create a generic store of data surrounding the collection. This will include:
- A full listing of every book in the collection
- Original catalog grade
- CGC Grade (if applicable)
Right now the data is in two parts, only one of which is currently usable:
- data.csv: a comma-separated values (CSV) file built from a list that Arty from the boards put together a decade ago. The data obviously needs to be updated, but the format is great, and the data, while outdated, is still valid as a snapshot of what the CGC census for the Church collection looked like a decade ago.
- The original catalog listing. I was sent this many years ago (greggy?) as a hard copy and have always wanted this data in a format I could search or use for visualizations. I've scanned it here for use in the project.
The immediate goal is to extract the text from the scanned listing. From there, the basic data needs to be pushed into the CSV. After that, the data needs to be cleaned up and enhanced with publisher information, non-CGC grades from later auctions and catalogs, and the like.
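As a rough illustration of the second step (turning extracted text into CSV rows), here is a minimal Python sketch using only the standard library. It assumes the OCR step has already produced plain text, one book per line. The line layout, the regex, and the column names (`title`, `issue`, `grade`) are hypothetical stand-ins, not the actual format of the scanned catalog or of data.csv. Lines that don't match are set aside for manual review rather than guessed at.

```python
import csv
import io
import re

# HYPOTHETICAL layout of one OCR'd catalog line: "<title> #<issue> <grade>",
# e.g. "Action Comics #12 VF". The real scanned listing may look different.
LINE_PATTERN = re.compile(r"^(?P<title>.+?)\s+#(?P<issue>\d+)\s+(?P<grade>\S+)$")


def parse_ocr_lines(text):
    """Split raw OCR text into (rows, rejects); rejects need a human to check them."""
    rows, rejects = [], []
    for raw in text.splitlines():
        line = " ".join(raw.split())  # collapse stray OCR whitespace
        if not line:
            continue  # skip blank lines
        match = LINE_PATTERN.match(line)
        if match:
            rows.append(match.groupdict())
        else:
            rejects.append(line)
    return rows, rejects


def rows_to_csv(rows):
    """Serialize parsed rows to CSV text using HYPOTHETICAL column names."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "issue", "grade"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `parse_ocr_lines("Action Comics #12 VF\nSmudged line ???")` would return one parsed row and one reject. Keeping the rejects as a separate list makes it easy to hand just the problem lines to a person instead of re-checking everything.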
If you have experience extracting text from scanned or printed resources, I would love your help. I have a budget for the work, including Mechanical Turk or a similar service, but I have no direct experience with this kind of work and would love to partner with someone who does.