About this project: Picking it up again a year on

14 Dec

This has been nagging away at me all through 2012, which has been an unfortunately busy year. But I cannot escape it any more. I keep reminding myself that once I’ve got the photographs up on the pages, “all” I need to do is post the letters and schedule them for publication.

At the moment I plan to take the scanned copies of the 1970s typescripts, run them through the OCR software and tidy up the resulting mess, and then copy and paste them here with their scheduled dates.

However, that turns out to be pretty daunting. Here is what the PDFs look like:

Sample PDF


And here is what the resultant .txt file looks like:

Sample txt file


There are 1004 pages of typescript altogether and, as you can see, the OCR software doesn’t cope with the strange 70s typeface. Having counted the pages, I wonder if it would be quicker to copy-type than to clean up the OCR output. I feel the need for a spreadsheet and a schedule coming on. I do have over eighteen months, after all.

(And my, but doesn’t it sound like something out of the Boy’s Own Paper?)


Posted on 14 December 2012 in About, Imperial War Museum

