Optical Page Reader
Abstract
I plan to make an application for the Nokia N900 that allows blind and visually impaired people to read books, newspapers, magazines, signs, and other printed text on their own. The user simply opens the app and takes a picture of the document. The app then processes the image and reads it aloud.
The software will use the rear camera to capture an image of the document, then perform some image processing to increase contrast and correct the angle. The processed image will be sent to an OCRFeeder-based backend, which will analyze the layout and recognize the text. The application will take this text and output it via a text-to-speech engine.

General Project Description
There are around 160 million visually impaired and blind people worldwide.* Those with less severe impairments can read with the help of magnifiers and CCTVs, while others rely on braille and text-to-speech technology. Unfortunately, equipment and software that can read normal printed text aloud are very expensive and rarely portable.
The N900 has the right combination of hardware and software for an accessibility device. It has a quality camera, a relatively powerful processor, and video out for even better magnification. Because Maemo is a GNU/Linux OS, standard desktop software can be used to build accessibility technologies for it.
I will make an application that will magnify and read printed text aloud. The software will use the rear camera to capture an image of the document, then perform some image processing to increase contrast and correct the angle. The processed image will be sent to an OCRFeeder-based backend, which will analyze the layout and recognize the text. The application will take this text and output it via a text-to-speech engine.

Implementation Details:
Camera Utility:
The camera utility will capture images from the N900's 5 MP rear camera. I plan to use GDigicam or v4l2 to access the camera. This should be one of the simpler parts of the project, since there are other apps that use the N900 camera which I can reference. The camera utility will feed the captured images to the OCR and magnification components.
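As a rough illustration of the capture step, a single frame could be grabbed through GStreamer's v4l2src. The sketch below is not the planned implementation; it assumes the GStreamer 0.10 command-line tools present on Maemo 5 and that /dev/video0 is the rear camera, neither of which I have verified here.

# Minimal capture sketch: grab one frame and encode it as JPEG.
# Assumes gst-launch-0.10 is installed and /dev/video0 is the rear camera.
import subprocess

def capture_image(path="/tmp/page.jpg"):
    subprocess.check_call([
        "gst-launch-0.10",
        "v4l2src", "device=/dev/video0", "num-buffers=1", "!",
        "ffmpegcolorspace", "!",
        "jpegenc", "!",
        "filesink", "location=%s" % path,
    ])
    return path

The real utility would more likely sit on top of GDigicam so that focus and exposure are handled for us, but the pipeline above shows the minimum needed to get a still image onto disk.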
Image Processing Utility:
Based on several tests, I found that images captured by the N900 camera need some processing to get good results from OCRFeeder and Tesseract. Each image needs to be thresholded in a way that accounts for the variations in lighting across the page. So far, dividing the image into smaller blocks, determining the average background shade of each block, and setting pixels that differ strongly from that value to black and the rest to white gives the best results. I will begin by writing a utility that prepares the image for OCR.
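A rough sketch of that block-based thresholding idea is below, using numpy and PIL; the block size and offset are illustrative guesses rather than tuned values.

# Block-based adaptive thresholding sketch.  Block size and offset are
# illustrative guesses, not tuned values.
import numpy as np
from PIL import Image

def threshold(path, block=64, offset=40):
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
    out = np.empty_like(gray)
    height, width = gray.shape
    for y in range(0, height, block):
        for x in range(0, width, block):
            tile = gray[y:y + block, x:x + block]
            background = tile.mean()           # average background shade
            # pixels much darker than the local background become black
            # (text); everything else becomes white
            out[y:y + block, x:x + block] = np.where(
                tile < background - offset, 0, 255)
    return Image.fromarray(out.astype(np.uint8))

Taking the block mean as the background shade is crude when a block is mostly text; using the median, or blending each block's estimate with its neighbours, may turn out to be more robust.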
OCR components:
OCRFeeder recognizes the layout of text and images in a document and uses OCR engines like Tesseract to read the text. A backend based on OCRFeeder will handle the unpaper and OCR processing of the image and return the text to be read. Before OCR, unpaper will be used to straighten the text and remove erroneous dark patches from the image. Tesseract was recently ported to Maemo and is also the most accurate of the open source OCR engines.
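As a sketch of how the backend might drive these tools from Python, the snippet below uses the stock unpaper and Tesseract command-line interfaces (Tesseract reading a TIFF and writing <basename>.txt); whether the Maemo ports behave identically is an assumption.

# Clean a page image with unpaper, then OCR it with Tesseract.
# Assumes both tools are on PATH; older Tesseract releases only accept
# TIFF input and write their result to <basename>.txt.
import subprocess
from PIL import Image

def ocr_page(pnm_path):
    cleaned = "/tmp/cleaned.pnm"
    subprocess.check_call(["unpaper", pnm_path, cleaned])
    tiff = "/tmp/cleaned.tif"
    Image.open(cleaned).save(tiff)             # older Tesseract wants TIFF
    subprocess.check_call(["tesseract", tiff, "/tmp/page"])
    return open("/tmp/page.txt").read()

In the real backend this step would go through OCRFeeder's layout analysis rather than feeding the whole page to Tesseract in one piece, so that columns and embedded images are handled sensibly.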
Accessibility Applications and Libraries:
There are two options for speech output: using a text-to-speech engine directly, or going through the GNOME accessibility stack. Calling a text-to-speech engine such as espeak or Festival directly would be much simpler to implement, while the GNOME accessibility route provides several benefits: first, it abstracts the application from the specific TTS software, and second, it lays the groundwork for accessibility in other applications. Time permitting, I would like to take the second approach, but either would serve the basic purpose of the project. The relevant applications and libraries are GAIL, HAIL, AT-SPI, Orca, and Festival.
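For the first, simpler option, speech output could be as small as the sketch below. Only espeak's standard command-line interface is assumed; taking the AT-SPI/Orca route would replace this call entirely.

# Speak a string aloud with espeak's command-line interface.
# Minimal sketch of the "call the TTS engine directly" option.
import subprocess

def speak(text, rate=140):
    # -s sets the speaking rate in words per minute
    subprocess.check_call(["espeak", "-s", str(rate), text])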
Page Reader Front End:
This component will control the camera utility, image processing utility, and OCR components to get useful image and text data. Once the images and text have been captured and processed, it will then display the magnified image on the screen and read the text aloud. The user interface will provide options to zoom and pan the image, activate or deactivate the text-to-speech output, and control the voice.
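Tying the components together, the front end's core routine might look roughly like the sketch below, where capture_image, threshold, ocr_page, and speak are the hypothetical helpers sketched in the sections above rather than existing APIs.

# Hypothetical glue code: the helpers are the sketches above, not real APIs.
def read_page():
    photo = capture_image()                    # grab a frame from the camera
    page = threshold(photo)                    # clean it up for OCR
    page.save("/tmp/page.pnm")
    text = ocr_page("/tmp/page.pnm")           # layout analysis + OCR
    speak(text)                                # read the result aloud

The actual front end will probably need to run the OCR and speech steps in the background so that the magnified image stays responsive to zooming and panning while the page is being read.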
Launcher:
Since making Hildon-Desktop accessible is beyond the scope of this project, I will write a simple UI that blind users can use to open a set of fully accessible applications and configure basic settings. With GNOME accessibility libraries ported to Maemo, some applications will become accessible either immediately or with some additional work. Hopefully, phone, email, and notes applications will be accessible. I have mocked up an accessible talking launcher.
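As a shape-of-the-thing sketch only, the launcher could enumerate its entries aloud and start the chosen one. The application names and commands below are made up, and speak is the espeak helper sketched earlier.

# Console-style launcher sketch.  The application list is illustrative;
# real entries would point at the accessible phone, email and notes apps.
import subprocess

APPS = [("Phone", "phone-ui"), ("Notes", "notes-app")]

def launch_menu():
    for number, (name, _command) in enumerate(APPS):
        speak("%d, %s" % (number + 1, name))   # announce each entry aloud
    choice = int(raw_input("app number: ")) - 1
    subprocess.Popen([APPS[choice][1]])        # start the chosen application

The real launcher would be a touch- and key-driven Hildon UI rather than console input; the sketch only shows the announce-and-launch flow.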

Interim Period
April - May 24
During this period, I will:
1. Discuss improvements and refine the project.
2. Read and understand relevant code and documentation (OCRFeeder, Festival, GNOME a11y, etc.).
Code Period
May 24 - August 2
Write image processing utility: May 24 - June 7
Write camera utility: June 7 - June 14
Port OCRFeeder, Tesseract, unpaper, and Festival: June 14 - June 28
Write front end: June 28 - July 12
Create launcher and integrate GNOME a11y software: July 12 - August 2
Testing Period
August 2 - August 16
During this period I will focus my efforts on accessibility testing with blind and visually impaired users.
Based on their observations and feedback, I will make adjustments and improvements.
Final Evaluation Period
August 16 - August 20