Promoting Historically-Inspired Performances of Early Music and Baroque Opera

Tiny Type in CD Booklets -- No Solution in Sight

The most widely recognized problem with the record industry's forced replacement of the LP with the CD was the vastly inferior sound of CDs. (However, for the moronic majority, the sound from any cheap CD player was an improvement over what emerged from their dreadful record grinders.) In the 25 years since the first CDs were released, sound quality has improved significantly thanks to better digital recorders and playback equipment. The sound of the best CDs played on the best available equipment (e.g., network-attached storage feeding the digital file from a CD to a Linn Klimax DS) is said to be comparable to or better than the sound of a fine analogue LP played on a top quality turntable front end.

The industry changeover to CD had an additional impact on classical music listeners, especially opera enthusiasts, that remains unresolved. LPs came with LP-sized pamphlets that contained readable librettos printed in the same type sizes found in books and magazines. In contrast, CDs come with CD-sized mini-booklets. The type is so small that most people over 50 have trouble reading it. I have to take off my glasses with progressive lenses to read CD booklet librettos -- a real nuisance, since I need my glasses for normal reading and for everything else. Some booklets use type so small that it's difficult to read them without a magnifying glass. Collins Classics was the worst in that regard. They truly deserved to go out of business.

A few very high quality, independent record labels such as Glossa and Challenge Classics post their CD liner notes online. However, most labels, including all of the labels owned by the multinational media conglomerates, do not. They would much rather alienate their best customers than risk the possibility that someone might see their liner notes without paying. If record companies posted liner notes on the Internet before going to press, they would have the benefit of free, multi-lingual proofreading. Abominations such as the comically incompetent machine translation of the libretto in Vivaldi's Arsilda on cpo would have been avoided.

At the least, record companies ought to make liner notes available in digital form upon request to customers who have purchased their CDs. Doesn't anyone in the industry recognize the serious usability problems older people -- who must comprise a substantial percentage of the purchasers of classical recordings -- have with the tiny type in CD booklets?

While it does not appear that the industry will offer a remedy anytime soon, hardware and software products are slowly emerging that may someday allow users to blow up the tiny type themselves without destroying the fragile booklets. Copying or scanning most CD booklets on typical flatbed scanners damages the booklets and yields an uneven and unacceptable scan. A possible solution is to use a book scanner, a new class of product designed to scan the interior of pages without damaging book bindings.

Only a handful of companies produce book scanners. Kirtas and 4DigitalBooks offer library-oriented units with robotic page turning that cost more than $100,000. It would be interesting to know whether they work with CD booklets. If they do, perhaps some libraries will scan their CD booklets.

Indus sells three models of book scanners in which the book is opened flat on a tray below an overhead scanner (e.g., the Indus Book Scanner 5001). The software corrects for page curvature. Scanners such as these work only on books that can be opened flat, which excludes many CD booklets. Similar products are offered by DLSG, the Digital Library Systems Group (e.g., the Bookeye and Bookeye 3), and by i2S (the DigiBook). (Hopefully the i2S DigiBook scanner works better than the company's clunky and incompetent website.)

A much less expensive book scanner now available is the Plustek OpticBook 3600, which costs about $240. Because books need to be opened only to 90 degrees, it may enable determined and technically expert users to scan some CD booklets page by page without damaging them. Be sure to read the unfavorable Amazon reviews before buying -- the included software reportedly corrupts the Windows Registry from time to time!
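For a scan that will only be viewed or printed at a larger size, the scan resolution has to rise in step with the enlargement. The sketch below is a rough rule of thumb, not anything published by the scanner makers. The function name and the 6-point, 12-point, and 300 dpi figures are illustrative assumptions, not measurements of any particular booklet.

```python
def required_scan_dpi(source_pt: float, target_pt: float, output_dpi: float) -> float:
    """Scan resolution needed so the enlarged page still carries output_dpi of detail."""
    if source_pt <= 0 or target_pt <= 0 or output_dpi <= 0:
        raise ValueError("type sizes and resolution must be positive")
    # Enlarging the type by this factor spreads each scanned dot over a larger area.
    magnification = target_pt / source_pt
    return magnification * output_dpi


# Hypothetical booklet: 6-point type enlarged to 12-point, printed at 300 dpi.
example_dpi = required_scan_dpi(source_pt=6, target_pt=12, output_dpi=300)
print(f"Scan at about {example_dpi:.0f} dpi")
```

Doubling the type size doubles the resolution needed, so a scanner's maximum optical resolution sets a ceiling on how far the tiniest type can be enlarged before it turns fuzzy.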

Scanning produces an image file that can be viewed onscreen, but to capture the text for font manipulation and enlargement, it is necessary to process the image file with Optical Character Recognition (OCR) software. Unfortunately, the best available OCR programs fall far short of user expectations. They work quite well in digitizing standard, typewritten documents but stumble on imperfect print, imperfect scans, non-standard fonts, small font sizes, italics, and foreign characters. Beware of magazine reviews, as they typically test OCR programs using simple, standard documents. No available OCR software is likely to achieve satisfactory results with CD booklet scans.

OCR software has improved little by little over the past decade but still has a long way to go. Prime Recognition, which sells expensive OCR software (PrimeOCR, $1,500 and up), calculates:

[Using conventional OCR programs:]

A 2000 character page would generate:

  • 74 characters marked as suspicious
  • 40 true errors
  • 25 errors that were not marked as suspicious, and hence left in the data after manual error correction (40 * 62%)

[Using Prime Recognition OCR software:]

  • 74 characters marked as suspicious. (Defined to be equal to conventional OCR engine)
  • 14 true errors (65% fewer errors)
  • 6.3 errors left in the data after manual error correction (14 * 45%)
The results they calculate for conventional OCR are consistent with my experience with OmniPage 12. My results have been much worse with small type, italics, and foreign characters, and deteriorate to complete gibberish with very small type. Even assuming that Prime Recognition's software would work on CD booklets as effectively as in the example above (almost certainly an incorrect assumption), 20 errors per page is simply unacceptable.
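The figures quoted from Prime Recognition follow from two multiplications, which the short sketch below reproduces. The function and variable names are illustrative labels only; the 62% and 45% shares of errors that escape marking come from the parenthetical arithmetic in the quoted lists.

```python
PAGE_CHARS = 2000  # page size used in the quoted example


def errors_left(true_errors: float, unmarked_share: float) -> float:
    """Errors still in the text after a proofreader fixes only the flagged characters."""
    return true_errors * unmarked_share


conventional_left = errors_left(40, 0.62)  # quoted as 25 after rounding
prime_left = errors_left(14, 0.45)
fewer_errors = 1 - 14 / 40  # share of true errors eliminated by the better engine

print(f"Conventional OCR: {conventional_left:.1f} errors left per {PAGE_CHARS} characters")
print(f"PrimeOCR: {prime_left:.1f} errors left per {PAGE_CHARS} characters")
print(f"Reduction in true errors: {fewer_errors:.0%}")
```

These numbers describe a clean 2000-character page; the same arithmetic applied to the far higher error counts seen with small type and italics gives correspondingly more errors that slip past proofreading.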

Thus, while it may now be possible to scan CD booklets with a book scanner, using even the best available OCR program will produce text files requiring painstaking proofreading and error correction. Perhaps after another 10 years of incremental improvements in OCR, together with increases in CPU speed and complexity and declines in the cost of memory, OCR will become a viable option.

Copyright © 2002-2015 John Wall