Tag Archives: Abbyy FineReader

A few words from the typing pool

Desk

When Ben asked me to take on this project I was above all grateful for the opportunity, as it arrived just as I was setting up shop for myself as a proofreader and writing support assistant for hire – a venture which began in testing circumstances but has since proved one of my better decisions, not least for access to so many vivid and rare first-person insights into what a century ago would develop into one of the costliest conflicts in human history.

Initially I was asked to go over the output of the software Ben had been using to convert scans of the transcriptions of her ancestors’ letters, typed decades ago in something like Lucida Sans – a plain enough typeface to work with but still challenging for the software concerned, not least because most of the scans weren’t very well aligned vertically, which added an extra element of jauntiness to the outcome.

Added to that, though the software seemed to be improving with more input, the errors that kept recurring were possibly more to do with the print on the pages themselves – it may only take a lighter pixel or two from the program’s perspective for an m to be rendered as r n, or a slightly heavier amount of ink at the foot of an h for it to become a b instead. This, and the considerable confusion created by numbers, punctuation and some of the original authors’ idiosyncratic writing, made for an uphill struggle, with sometimes unintentionally comical consequences (“the men are hoping they’ll all be homo by Christmas”).
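For the technically curious, here is a minimal sketch of how the look-alike substitutions described above might be flagged automatically. It is only an illustration: the function name, the list of confusion pairs and the tiny word list are all assumptions made up for this example, and none of it comes from the OCR software itself.

```python
# Hypothetical helper for illustration only. The confusion pairs and the
# small word list below are assumptions, not part of any OCR product.
CONFUSIONS = [("rn", "m"), ("b", "h")]  # (what the OCR produced, what was likely typed)

KNOWN_WORDS = {"home", "mother", "the", "men", "by", "christmas"}


def suggest_corrections(word, known_words=KNOWN_WORDS, confusions=CONFUSIONS):
    """Return known words reachable by undoing one confusion at one position."""
    lower = word.lower()
    if lower in known_words:
        return []  # already a recognised word, so nothing is flagged
    suggestions = set()
    for wrong, right in confusions:
        start = lower.find(wrong)
        while start != -1:
            # Undo the confusion at this position only and test the result.
            candidate = lower[:start] + right + lower[start + len(wrong):]
            if candidate in known_words:
                suggestions.add(candidate)
            start = lower.find(wrong, start + 1)
    return sorted(suggestions)


print(suggest_corrections("horne"))   # the "rn" is read back as "m"
print(suggest_corrections("motber"))  # the "b" is read back as "h"
```

A check like this only catches slips that produce a non-word. When the misreading happens to be a real word in the checker's dictionary it passes unnoticed, which is part of why the errors were so hard to predict.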

So after doing my best to make sense of the initial output in text files and then Word documents, it became clear that checking these files – with the added challenge of not being able to predict where the errors would likely turn up, as you could in the work of a person – would take as much time as, if not longer than, simply re-transcribing them to a reasonable standard, which is what I went on to do. To begin with I displayed the scanned .pdfs on my PC monitor and typed the .doc files up on a laptop, which has sadly passed on; since then I’ve been using the right-hand side of my thankfully wide PC screen to copy the contents of the left.

Transcribing the transcriptions has proven the better option – not simply for being an easier job, and less of a strain on the eyes and psyche. It’s meant that as I’ve gone through the letters I’ve been able to absorb more of the character of the writers, the precarity and adventure of their circumstances and how they all, through struggles difficult to imagine a century on, loved and missed their family and worked hard, week in, week out, for years of that abysmal conflict, to assure them they were fit, chipper, stronger than ever, resolved to fight and survive and ready to win.

The question arose of how much ‘editorialising’, if any, should go on, no-one wishing to censor or misrepresent anything written. Ted, Richard and Paul apparently were all taught that the apostrophe in words like did’nt, has’nt and could’nt went, as written here, before the n; at first I thought this an error on the part of the original transcriber, but that wasn’t the case. I’ve elected to keep it that way, along with idiosyncrasies like Ted’s “at anyrate”, but for ease of comprehension I have added some punctuation where needed. Being originally handwritten letters composed in challenging environments, the text can be uneven in flow and hard to read at times, but no less rewarding. There is even a precursor to textspeak: they often sign off “yr loving son”.

Harrow

WWI veteran Richard Harrow, a character in the HBO drama Boardwalk Empire

Perhaps obviously, the most illuminating aspect for me – I turn 40 on the centenary of the Gallipoli campaign in April next year – is how different the world has become in our concept of warfare, our expectations and our notions of duty. Captain Berryman’s excitement at his first sighting of an aeroplane (once mentioned as a ‘plane, though not otherwise shortened), his childlike wonder at the beauty of it up against a clear blue sky, completely overlooked its purpose – to find out where exactly he and his comrades were, for later bombardment. I can’t imagine what he’d make of today’s warplanes, which don’t even need a pilot; of a world where we can watch the horrors he endured re-enacted for our entertainment and edification (or titillation) in our homes, on contraptions half the size of his fancy new gramophone records; and where a letter from home, far from taking weeks on end to arrive, if indeed it did, can reach his pocket in little more time than it takes to switch a light on. I have to stop complaining about the 3G around here.

This was a privileged family in Edwardian England, rooted in Empire, but the rigours of The War To End All Wars took their toll on officers and Tommies alike, and the horrors that ensued are recounted in sometimes grisly detail in the Berryman brothers’ letters home. As historical documents of such a dark period they are priceless, but they speak also to the innocence of the time; a grim dramatic irony overshadows, for instance, talk of how the treatment of prisoners and civilians in 1915 became so abominable that the Germans absolutely had to be stopped to ensure that no such cruelty would ever be repeated.

I have plenty of work ahead and the more I progress, the more invested I become. I relish opportunities to research some of the minutiae of life that emerge: requests for things to be sent over, or mentions of friends and comrades, which can lead down a rabbit-hole of Googling (of what?) that rarely yields only a minimum of information and can turn up all sorts of surprises along the way. But as well as being a challenge and a privilege, being involved is a valuable education. I look forward to discovering how life for the Berrymans progresses over the years of the war to come with trepidation, enthusiasm and, at some point I hope, a larger monitor.

Chris

 

Posted by on 20 January, '14 in About

 


About this Project: The proof of the pudding

It’s less than a year until the centenary of the start of WWI, and ten months until the first of the letters is scheduled for publication on the 6th July. The last ten months have been busy for me, but it’s time to pick this project up again.

A friend of mine has agreed to proof-read the letters for me. His name is Chris Miller and, appropriately for this project, he is the great-great-nephew of the Winslow Boy who was killed at Ypres on the 31st October 1914 aged 19.  If you want proof-reading done, just drop Chris a line.  His work is sorely needed on this project.

The OCR software I’m using is Abbyy FineReader and it’s fussy. As with all powerful software, you have to know how to configure it in order to get the best out of it.

The three screenshots below show just what a difference it made when I managed to re-tune the settings in the OCR software.  Here is one of the original letters, typed in a crazy 1970s typeface.

RCPBtoGFB19140804

Here is the Word version with the software sorting stuff out automatically. I’ve set MS Word to show a dot wherever there is a wordspace, to bring out just how impossible the results are.  As you can see it reads My dear Mother as M y d e a r M o t b c r.  With results this bad, retyping looked like the only option.

RCPBtoGFB19140804poor

Fiddling around with the settings has produced this: a change in quality so dramatic that it makes the project doable.

RCPBtoGFB19140804good

There is still plenty of proof-reading for Chris to do in order for the text to be postable here.  But the project is exciting again, rather than a heart-sink.

 

Posted by on 1 September, '13 in About

 
