Posts: 148 | Thanked: 92 times | Joined on Oct 2009
#1
This is my idea for my Summer of Code project. I'd like to get some discussion going around this, so I can get feedback and refine my proposal.


There are around 160 million visually impaired and blind people.* Those with less severe impairments can read with the help of magnifiers and CCTVs, while others rely on braille and text-to-speech technology. Unfortunately, equipment and software that can read normal printed text aloud are very expensive and rarely portable.
The n900 has the right combination of hardware and software for an accessibility device. It has a quality camera, a relatively powerful processor, and video out for even better magnification. Because Maemo is a GNU/Linux OS, standard desktop software can be used to build accessibility technologies for Maemo.
I plan to make an application for the Nokia N900 to allow blind and visually impaired people to read books, newspapers, magazines, signs, and other printed text on their own. The user simply opens the app and takes a picture of the document. The app then processes the image and reads it aloud.
The software will use the rear camera to capture an image of the document, then perform some image processing to increase contrast and adjust the angle. The processed image will be sent to an OCRFeeder-based backend, which will analyze the layout and recognize the text. The application will take this text and output it via a text-to-speech engine.

Linux software I plan to use:
OCRFeeder
Tesseract (OCR engine)
Unpaper
espeak/festival/MBROLA
Pocket Sphinx? (for voice control)
AT-SPI
Gtk Accessibility Interface Library
Hildon Accessibility Interface Library
Orca
GDigicam/V4L2


*World Health Organization
 

Posts: 716 | Thanked: 303 times | Joined on Sep 2009 @ Sheffield, UK
#2
I think there are aspects there that would be useful to everyone, not just visually impaired people. I for one can see it being extremely useful to take a photo of something and have OCR automatically process it into a document, hassle free.

There is one snag I can see, though: can someone who is visually impaired use the N900 well enough in the first place to open the application? I sometimes have trouble myself getting the right application link to activate, though I suppose you could make it double sized.
 
Posts: 148 | Thanked: 92 times | Joined on Oct 2009
#3
I'm not visually impaired, and I can see it being useful to me as well. This project would also enable blind users to be a part of the maemo community. An n900 with open source software for accessibility is a lot cheaper than any similar devices, so there's a lot of value in that alone.
Part of this project would be to create a launcher for visually impaired users to open accessible applications. In particular, I would like to make the phone functions accessible if I have time, so blind users can carry an n900 in place of their current phone as well. Based on discussions on IRC, for the purposes of SoC this would probably be something simpler than making hildon-desktop accessible.

Last edited by dmj726; 2010-03-28 at 18:15.
 
Posts: 296 | Thanked: 47 times | Joined on Oct 2009
#4
I'm visually impaired (Aniridia in both eyes), and I'd really love such an app.
There is already an app, whose name I can't recall right now, that can translate signs and similar things using OCR. You might want to have a look at that too.

My biggest gripe about Maemo 5's UI is how inconsistent the font size is across different parts of the interface.
For example, the numbers in the phone app and the details in the app manager are unreadable for me.
 
Posts: 1,309 | Thanked: 1,187 times | Joined on Nov 2008
#5
Everybody would benefit from some of these subjects, like voice control.
 
Posts: 148 | Thanked: 92 times | Joined on Oct 2009
#6
I have uploaded a video demoing what this would do. Some parts were done by hand and on the desktop, but it gives a good idea of what the n900 could do.
http://www.youtube.com/watch?v=GGOZYGM5sOs
 
Posts: 148 | Thanked: 92 times | Joined on Oct 2009
#7
Optical Page Reader
Abstract
I plan to make an application for the Nokia N900 to allow blind and visually impaired people to read books, newspapers, magazines, signs, and other printed text on their own. The user simply opens the app and takes a picture of the document. The app then processes the image and reads it aloud.
The software will use the rear camera to capture an image of the document, then perform some image processing to increase contrast and adjust the angle. The processed image will be sent to an OCRFeeder-based backend, which will analyze the layout and recognize the text. The application will take this text and output it via a text-to-speech engine.

General Project Description
There are around 160 million visually impaired and blind people.* Those with less severe impairments can read with the help of magnifiers and CCTVs, while others rely on braille and text-to-speech technology. Unfortunately, equipment and software that can read normal printed text aloud is very expensive and rarely portable.
The n900 has the right combination of hardware and software for an accessibility device. It has a quality camera, a relatively powerful processor, and video out for even better magnification. Because Maemo is a GNU/Linux OS, standard desktop software can be used to build accessibility technologies for Maemo.
I will make an application that will magnify printed text and read it aloud. The software will use the rear camera to capture an image of the document, then perform some image processing to increase contrast and adjust the angle. The processed image will be sent to an OCRFeeder-based backend, which will analyze the layout and recognize the text. The application will take this text and output it via a text-to-speech engine.

Implementation Details:
Camera Utility:
The camera utility will capture images from the n900's 5 MP rear camera. I plan to use GDigicam or V4L2 to access the camera. This should be one of the simpler parts of the project, since there are other apps that use the n900 camera that I can use as a reference. The camera utility will feed the images on to be processed for OCR or magnification.
Image Processing Utility:
Based on several tests, I found that images taken with the n900 camera need some processing to get good results from OCRFeeder and Tesseract. Each image needs to be thresholded in a way that accounts for variations in lighting on the object. So far, the best results come from dividing the image into smaller blocks, determining the average background shade of each block, and setting pixels that differ strongly from that value to black and the rest to white. I will begin by writing a utility that prepares the image for OCR.
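A minimal sketch of the block thresholding described above, not the actual utility. It assumes the image is already a grayscale list of rows with values 0-255, uses the plain mean of each block as the "average background shade", and the names `threshold_blocks`, `block`, and `delta` as well as their default values are made up for illustration.

```python
def threshold_blocks(gray, block=16, delta=40):
    """Binarize a grayscale image (list of rows of 0-255 ints) block by block.

    For every block, the mean shade is taken as the background estimate
    (an assumption of this sketch). Pixels that differ from it by more than
    `delta` become black (0); all other pixels become white (255).
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    out = [[255] * width for _ in range(height)]

    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))

            # Estimate the local background shade for this block only,
            # so a shadow on one part of the page does not affect the rest.
            pixels = [gray[y][x] for y in rows for x in cols]
            background = sum(pixels) / len(pixels)

            for y in rows:
                for x in cols:
                    if abs(gray[y][x] - background) > delta:
                        out[y][x] = 0
    return out
```

Because each block gets its own background estimate, dim paper in a shadowed block can still come out white, which a single global cutoff would not manage. A real version would probably need to tune the block size and `delta` against actual n900 photos.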
OCR components:
OCRFeeder recognizes the layout of text and images in a document and uses OCR engines like Tesseract to read the text. A backend based on OCRFeeder will handle the unpaper and OCR processing of the image and return the text to be read. Unpaper will be used to straighten the text and remove erroneous dark patches before the image is OCRed. Tesseract was recently ported to Maemo, and is also the most accurate of the open source OCR engines.
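The cleanup-then-recognize order can be sketched roughly as below. This calls the `unpaper` and `tesseract` command-line tools directly rather than going through OCRFeeder, which is a simplification. The function names, the intermediate file names, and the use of PGM as the image format are assumptions for illustration; the `runner` parameter exists only so the flow can be exercised without the tools installed.

```python
import subprocess
from pathlib import Path


def build_ocr_commands(image_path, work_dir):
    """Return the commands to run, in order, and the path of the text output."""
    work = Path(work_dir)
    cleaned = work / "cleaned.pgm"   # hypothetical intermediate file name
    out_base = work / "page"         # tesseract appends ".txt" to this base
    commands = [
        # unpaper <input> <output>: deskew and remove dark patches
        ["unpaper", str(image_path), str(cleaned)],
        # tesseract <image> <outputbase>: recognize text into <outputbase>.txt
        ["tesseract", str(cleaned), str(out_base)],
    ]
    return commands, out_base.with_suffix(".txt")


def run_ocr(image_path, work_dir, runner=subprocess.run):
    """Clean up the image, OCR it, and return the recognized text."""
    commands, text_path = build_ocr_commands(image_path, work_dir)
    for cmd in commands:
        runner(cmd, check=True)  # stop at the first step that fails
    return text_path.read_text(encoding="utf-8")
```

Something like `run_ocr("photo.pgm", "/tmp/ocr")` would return the page text as a string, ready to hand to the speech output.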
Accessibility Applications and Libraries:
There are two options for speech output: calling a text-to-speech engine such as espeak or Festival directly, or going through the GNOME accessibility software. Calling the engine directly would be much simpler to implement. Going through the GNOME accessibility software has two benefits: it decouples the application from any specific TTS engine, and it lays the groundwork for accessibility in other applications. Time permitting, I would like to take the second approach, but either would serve the basic purpose of the project. The relevant applications and libraries are GAIL, HAIL, AT-SPI, Orca, and Festival.
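For the simpler, direct option, even a thin wrapper keeps the rest of the application from depending on one particular engine. The sketch below is only an illustration of that idea: the `Speaker` class, the `BACKENDS` table, and the default rate are hypothetical, and it assumes the `espeak` and `festival` command-line tools, both of which can read the text to speak from standard input.

```python
import subprocess

# Map an engine name to a function that builds its command line.
# Both engines read the text to speak from stdin in this sketch.
BACKENDS = {
    # espeak: -s sets the speed in words per minute, --stdin reads from stdin
    "espeak": lambda rate_wpm: ["espeak", "-s", str(rate_wpm), "--stdin"],
    # festival: --tts speaks the text it is given; the rate is ignored here
    "festival": lambda rate_wpm: ["festival", "--tts"],
}


class Speaker:
    """Speak text through whichever engine was selected at construction."""

    def __init__(self, engine="espeak", rate_wpm=160, runner=subprocess.run):
        if engine not in BACKENDS:
            raise ValueError("unknown TTS engine: %s" % engine)
        self.engine = engine
        self.rate_wpm = rate_wpm
        self.runner = runner  # injectable so the class can be tried without audio

    def say(self, text):
        argv = BACKENDS[self.engine](self.rate_wpm)
        # Pipe the text to the engine's standard input.
        self.runner(argv, input=text, text=True, check=True)
        return argv
```

The front end would only ever call `Speaker(...).say(text)`, so swapping espeak for Festival, or later for a speech service reached through the accessibility stack, would not touch the calling code.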
Page Reader Front End:
This component will control the camera utility, image processing utility, and OCR components to get useful image and text data. Once the images and text have been captured and processed, it will then display the magnified image on the screen and read the text aloud. The user interface will provide options to zoom and pan the image, activate or deactivate the text-to-speech output, and control the voice.
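As a rough illustration of the zoom and pan options, the function below works out which part of the captured image should be shown for a given zoom level and pan position, keeping the view inside the image. The name `viewport` and its parameters are hypothetical, and `zoom` is assumed to mean screen pixels per image pixel.

```python
def viewport(image_w, image_h, screen_w, screen_h, zoom, center_x, center_y):
    """Return (left, top, width, height) of the image region to display.

    `zoom` is screen pixels per image pixel; (center_x, center_y) is the
    image point the user has panned to. The region is clamped so it never
    extends past the edges of the image.
    """
    if zoom <= 0:
        raise ValueError("zoom must be positive")

    # Size of the visible region in image pixels, never larger than the image.
    view_w = min(image_w, screen_w / zoom)
    view_h = min(image_h, screen_h / zoom)

    # Center the region on the pan position, then clamp it to the image.
    left = min(max(center_x - view_w / 2, 0), image_w - view_w)
    top = min(max(center_y - view_h / 2, 0), image_h - view_h)
    return (left, top, view_w, view_h)
```

For example, on an 800x480 screen at 2x zoom, the visible region is 400x240 image pixels, and panning toward a corner stops once the region reaches the edge of the image.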
Launcher:
Since making Hildon-Desktop accessible is beyond the scope of this project, I will write a simple UI that blind users can use to open a set of fully accessible applications and configure basic settings. With GNOME accessibility libraries ported to Maemo, some applications will become accessible either immediately or with some additional work. Hopefully, phone, email, and notes applications will be accessible. I have mocked up an accessible talking launcher.




Interim Period
April-May 24
During this period, I will:
1. Discuss improvements and refine the project.
2. Read and understand relevant code and documentation (OCRFeeder, Festival, GNOME A11y, etc.).
Code Period
May 24 - August 2
Write image processing utility
May 24 - June 7
Write camera utility
June 7 - June 14
Port OCRFeeder, Tesseract, Unpaper, Festival
June 14 - June 28
Write front end
June 28 - July 12
Create launcher and integrate GNOME a11y software
July 12 - August 2
Testing Period
August 2 - August 16
During this period, I will focus my efforts on accessibility testing with blind and visually impaired users.
Based on observation and feedback, I will make adjustments and improvements.
Final Evaluation Period
August 16 - August 20
 
Posts: 277 | Thanked: 348 times | Joined on Nov 2009 @ Fargo, North Dakota, USA
#8
Did you get this? Has the deadline passed? I know people at the NFB, due to my work on the DIYBookScanner project. Get in touch, maybe I can get you a sponsor letter or something.
 
Posts: 148 | Thanked: 92 times | Joined on Oct 2009
#9
Originally Posted by fake:
Did you get this? Has the deadline passed? I know people at the NFB, due to my work on the DIYBookScanner project. Get in touch, maybe I can get you a sponsor letter or something.
They'll announce who's accepted in a few weeks. If you're interested in mentoring or just like the project, you might want to send an email to vdv100 at gmail dot com.
 
Posts: 148 | Thanked: 92 times | Joined on Oct 2009
#10
By the way, I sent Cory Doctorow the video after I saw a related post on his blog. He thinks it's way cool: http://craphound.com/
 