[GSoC Proposal] N900 as an Accessibility device
This is my idea for my Summer of Code project. I'd like to get some discussion going around this, so I can get feedback and refine my proposal.
There are around 160 million visually impaired and blind people.* Those with less severe impairments can read with the help of magnifiers and CCTVs, while others rely on braille and text-to-speech technology. Unfortunately, equipment and software that can read normal printed text aloud are very expensive and rarely portable.

The n900 has the right combination of hardware and software for an accessibility device. It has a quality camera, a relatively powerful processor, and video out for even better magnification. Because Maemo is a GNU/Linux OS, standard desktop software can be used to build accessibility technologies for Maemo.

I plan to make an application for the Nokia N900 to allow blind and visually impaired people to read books, newspapers, magazines, signs, and other printed text on their own. The user simply opens the app and takes a picture of the document. The app then processes the image and reads it aloud.

The software will use the rear camera to capture an image of the document, then perform some image processing to increase contrast and adjust the angle. The processed image will be sent to an OCRFeeder-based backend, which will analyze the layout and recognize the text. The application will take this text and output it via a text-to-speech engine.

Linux software I plan to use:
OCRFeeder
Tesseract (OCR engine)
Unpaper
espeak/festival/MBROLA
Pocket Sphinx? (for voice control)
AT-SPI
Gtk Accessibility Interface Library
Hildon Accessibility Interface Library
Orca
GDigicam/V4L2

*World Health Organization |
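As a rough sketch of how the OCR and speech steps from the proposal above might be chained, here is a small Python example. It assumes the `tesseract` and `espeak` command-line tools are installed and on the PATH. The helper names (`ocr_command`, `speak_command`, `read_aloud`) are hypothetical, and the camera capture, image cleanup, and OCRFeeder layout analysis are left out entirely.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def ocr_command(image: Path, out_base: Path) -> List[str]:
    # The tesseract CLI takes an input image and an output base name,
    # and writes the recognized text to "<out_base>.txt".
    return ["tesseract", str(image), str(out_base)]


def speak_command(text_file: Path) -> List[str]:
    # espeak's -f option reads the text to speak from a file.
    return ["espeak", "-f", str(text_file)]


def read_aloud(
    image: Path,
    workdir: Path,
    run: Callable[..., object] = subprocess.run,
) -> Path:
    """Run OCR on `image`, speak the result, and return the text file path.

    `run` is injectable so the flow can be exercised without the real tools.
    """
    out_base = workdir / image.stem
    run(ocr_command(image, out_base), check=True)
    # Append ".txt" by hand so stems containing dots are not truncated.
    text_file = out_base.parent / (out_base.name + ".txt")
    run(speak_command(text_file), check=True)
    return text_file
```

Calling `read_aloud(Path("page.png"), Path("/tmp"))` would run both tools in sequence. Keeping the command construction in separate functions makes it easy to swap espeak for festival later, which the proposal leaves open.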
Re: [GSoC Proposal] N900 as an Accessibility device
I think there are aspects there that would be useful to everyone, not just visually impaired people. I for one can see it being extremely useful to take a photo of something and have OCR automatically process it into a document, hassle free.
There is one snag I can see, though: can someone visually impaired use the N900 well enough in the first place to open said application? I sometimes have trouble myself getting the right application link to activate, though I suppose you could make it double sized. |
Re: [GSoC Proposal] N900 as an Accessibility device
I'm not visually impaired, and I can see it being useful to me as well. This project would also enable blind users to be a part of the maemo community. An n900 with open source software for accessibility is a lot cheaper than any similar devices, so there's a lot of value in that alone.
Part of this project would be to create a launcher for visually impaired users to open accessible applications. In particular, I would like to make the phone accessible if I have time, so blind users can carry an n900 in place of their current phone as well. Based on discussions on IRC, for the purposes of SoC this would probably be something simpler than making hildon-desktop accessible. |
Re: [GSoC Proposal] N900 as an Accessibility device
I'm visually impaired (Aniridia in both eyes), and I'd really love such an app :)
There is already an app, whose name I cannot recall right now, that can translate signs and similar things using OCR. You might want to have a look at that too. My biggest gripe about Maemo 5's UI is how inconsistent the font size is across its different parts. For example, for me the numbers in the phone app or the details in the app manager are unreadable :) |
Re: [GSoC Proposal] N900 as an Accessibility device
Everybody would benefit from some of these subjects, like voice control.
|
Re: [GSoC Proposal] N900 as an Accessibility device
I have uploaded a video demoing what this would do. Some parts were done by hand and on the desktop, but it gives a good idea of what the n900 could do.
http://www.youtube.com/watch?v=GGOZYGM5sOs |
Re: [GSoC Proposal] N900 as an Accessibility device
Optical Page Reader
Abstract
I plan to make an application for the Nokia N900 to allow blind and visually impaired people to read books, newspapers, magazines, signs, and other printed text on their own. The user simply opens the app and takes a picture of the document. The app then processes the image and reads it aloud. The software will use the rear camera to capture an image of the document, then perform some image processing to increase contrast and adjust the angle. The processed image will be sent to an OCRFeeder-based backend, which will analyze the layout and recognize the text. The application will take this text and output it via a text-to-speech engine.

General Project Description
There are around 160 million visually impaired and blind people.* Those with less severe impairments can read with the help of magnifiers and CCTVs, while others rely on braille and text-to-speech technology. Unfortunately, equipment and software that can read normal printed text aloud are very expensive and rarely portable.

The n900 has the right combination of hardware and software for an accessibility device. It has a quality camera, a relatively powerful processor, and video out for even better magnification. Because Maemo is a GNU/Linux OS, standard desktop software can be used to build accessibility technologies for Maemo.

I will make an application that will magnify and read printed text aloud. The software will use the rear camera to capture an image of the document, then perform some image processing to increase contrast and adjust the angle. The processed image will be sent to an OCRFeeder-based backend, which will analyze the layout and recognize the text. The application will take this text and output it via a text-to-speech engine.

Implementation Details:

Camera Utility: The camera utility will capture images from the n900's 5 MP rear camera. I plan to use GDigicam or V4L2 to access the camera.
This should be a simpler part of the project, since there are other apps that use the n900 camera to reference. The camera utility will feed the images to be processed for OCR or magnification.

Image Processing Utility: Based on several tests, I found that images recorded by the n900 camera need some processing to get good results from OCRFeeder and Tesseract. Each image needs to be thresholded, taking into account the variations in lighting on objects. So far, the best results come from dividing the image into smaller blocks, determining the average background shade of each block, and setting pixels that are very different from that value to black and the rest to white. I will begin by writing a utility that prepares the image for OCRing.

OCR components: OCRFeeder recognizes the layout of text and images in a document and uses OCR engines like Tesseract to read the text. A backend based on OCRFeeder will handle the Unpaper and OCR processing of the image and return the text to be read. Unpaper will be used to help orient the text better and remove erroneous dark patches before OCRing the image. Tesseract was recently ported to Maemo, and is also the most accurate of the open source OCR engines.

Accessibility Applications and Libraries: There are two options for speech output: using a text-to-speech engine directly, or using GNOME accessibility software. Using a text-to-speech engine directly would be much simpler to implement, while GNOME accessibility software would provide speech output capabilities for the application. The GNOME approach provides several benefits over interfacing directly with a text-to-speech engine like espeak or Festival. First, it abstracts the application from the specific TTS software, and second, it lays the groundwork for accessibility in other applications. Time permitting, I would like to take this second approach, but either would serve the basic purpose of the project. The relevant applications and libraries are GAIL, HAIL, AT-SPI, Orca, and Festival.
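A minimal sketch, in Python, of the block-by-block thresholding described under Image Processing Utility above. The function name `block_threshold`, the block size, and the `delta` cutoff are illustrative assumptions, not values from the actual utility. The median of each block stands in for the "average background shade", which is also an assumption; a plain mean would work the same way but is pulled toward the text pixels.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values, 0 (black) to 255 (white)


def block_threshold(image: Gray, block: int = 16, delta: int = 40) -> Gray:
    """Binarize `image` one block at a time.

    For each block, estimate the background shade, then set pixels that
    differ from it by more than `delta` to black and the rest to white.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [[255] * width for _ in range(height)]

    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))

            # Median of the block as the background estimate (assumption).
            values = sorted(image[y][x] for y in rows for x in cols)
            background = values[len(values) // 2]

            for y in rows:
                for x in cols:
                    if abs(image[y][x] - background) > delta:
                        out[y][x] = 0
    return out
```

Because each block gets its own background estimate, dark text on a brightly lit part of the page and dark text in a shadowed part both come out black, which a single global threshold would not guarantee.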
Page Reader Front End: This component will control the camera utility, image processing utility, and OCR components to get useful image and text data. Once the images and text have been captured and processed, it will then display the magnified image on the screen and read the text aloud. The user interface will provide options to zoom and pan the image, activate or deactivate the text-to-speech output, and control the voice.

Launcher: Since making Hildon-Desktop accessible is beyond the scope of this project, I will write a simple UI that blind users can use to open a set of fully accessible applications and configure basic settings. With GNOME accessibility libraries ported to Maemo, some applications will become accessible either immediately or with some additional work. Hopefully, phone, email, and notes applications will be accessible. I have mocked up an accessible talking launcher.

Interim Period: April - May 24
During this period, I will:
1. Discuss improvements and refine the project.
2. Read and understand relevant code and documentation (OCRFeeder, Festival, GNOME A11y, etc.).

Code Period: May 24 - August 2
Write image processing utility: May 24 - June 7
Write camera utility: June 7 - June 14
Port OCRFeeder, Tesseract, Unpaper, Festival: June 14 - July 28
Write front end: June 28 - July 12
Create launcher and integrate GNOME a11y software: July 12 - August 2

Testing Period: August 2 - August 16
During this period I will focus my efforts on accessibility testing with blind and visually impaired users. Based upon observation and feedback, I will make adjustments and improvements.

Final Evaluation Period: August 16 - August 20 |
Re: [GSoC Proposal] N900 as an Accessibility device
Did you get this? Has the deadline passed? I know people at the NFB, due to my work on the DIYBookScanner project. Get in touch, maybe I can get you a sponsor letter or something.
|
Re: [GSoC Proposal] N900 as an Accessibility device
By the way, I sent Cory Doctorow the video after I saw a related post on his blog. He thinks it's way cool: http://craphound.com/
|