CMU 15-418 (Spring 2012) Final Project:
A Parallel Algorithm For Noisy or Distorted Character Recognition
Samuel Russell, Rob Waaser

Project Proposal

Checkpoint Report

Final Report

Working Schedule

Week | What We Plan To Do | What We Actually Did
Apr 1-7 | Implement the CUDA framework. | Implemented CUDA framework.
Apr 8-14 | Implement distortion filters and version 1 of matching function. | Implemented v1, v2, and v3 of image matching function. Using ImageMagick to generate distortions.
Apr 15-22 | Create automated accuracy testing scripts. Experiment with different versions of matching function. | Automated accuracy (and timing) test scripts were created. Character libraries for and PHP Captchas were created. Post-processing code was written.
Apr 23-26 | Implement timing code and sequential version. Create char libraries for OCR Failures and Yahoo Mail Captchas. Refine post-processing algorithm. | Implemented timing code and sequential version.
Apr 27-29 | Work on parallel performance optimizations. Create char library for reCaptcha. | Performance optimizations.
Apr 30-May 3 | Work on parallel performance optimizations. | Performance optimizations.
May 4-6 | Develop method to distribute work between multiple cluster machines and collect results. | Added functions for direct bmp reading. Built reCaptcha library.
May 7-10 | Tweak parameters to maximize accuracy and performance. Create final presentation. Write demo code to automatically pull down captcha images and track accuracy. |

Working Results
Target | Example | Character Accuracy | Word Accuracy | Sequential Time (per captcha) | Parallel Time (per captcha) | Speedup
OCR Failures | Not Implemented | 89% | 50% | 7248ms | 225ms (151ms kernel) | 32x
PHP Captchas | | 75% | 38% | 18185ms | 504ms (387ms kernel) | 36x
reCaptcha | | 55% | 10% | | 450ms (300ms kernel) |
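
The speedup column is consistent with dividing the sequential time by the total parallel time (not the kernel-only time). A quick check of that arithmetic, assuming that definition; the function name `speedup` is ours for illustration and is not taken from the project code:

```python
def speedup(sequential_ms, parallel_ms):
    """Speedup as the ratio of sequential time to parallel time."""
    return sequential_ms / parallel_ms

# (sequential ms, total parallel ms) per captcha, from the results above.
timings = {
    "OCR Failures": (7248, 225),
    "PHP Captchas": (18185, 504),
}

for target, (seq_ms, par_ms) in timings.items():
    print(f"{target}: {speedup(seq_ms, par_ms):.1f}x")
# OCR Failures: 32.2x
# PHP Captchas: 36.1x
```

No sequential time is listed for reCaptcha, so no speedup can be computed for that row.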
Working Log

4/2/12: Created Project Website, Wrote Initial Proposal
4/8/12: Primary code structure is up. Working on reading in bmps. Target captchas / images are chosen.
4/15/12: Experimenting with different matching functions. The two most promising so far are a sum of pixel-by-pixel multiplication and an edge-closeness function. The multiplication will likely work better for captchas that won't need edge detection. For reCaptcha, which uses inversion, we will have to use edge-detection images, and the edge-closeness function may prove better.
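
A minimal sketch of the sum of pixel-by-pixel multiplication idea, written in plain Python rather than CUDA for readability. It assumes images and character templates are 2-D lists of pixel intensities and that the template is slid over every position of the image to build a score map; the names `match_score` and `match_map` are hypothetical, and the project's actual v1-v3 functions may normalize or weight pixels differently.

```python
def match_score(image, template, top, left):
    """Sum of pixel-by-pixel products between the template and the
    image window whose top-left corner is at (top, left)."""
    total = 0.0
    for r, template_row in enumerate(template):
        for c, t in enumerate(template_row):
            total += t * image[top + r][left + c]
    return total


def match_map(image, template):
    """Score every position where the template fits fully inside the image."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    return [
        [match_score(image, template, y, x) for x in range(iw - tw + 1)]
        for y in range(ih - th + 1)
    ]


# Toy example: a 2x2 bright blob in a 3x4 image, matched with a 2x2 template.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
template = [
    [1, 1],
    [1, 1],
]
scores = match_map(image, template)
print(scores)  # [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0]]
```

The highest score (4.0) lands at row 1, column 1, where the template overlaps the blob exactly. In a GPU version, each (y, x) position could plausibly be computed by its own thread, since the positions are independent.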
4/16/12: It looks like ImageMagick will be used as an out-of-the-box solution for creating letter distortions.
4/17/12: ImageMagick was used to create a small library for PHP Captcha. Mangal was the base font, and a set of distortions was applied, including horizontal resize, vertical resize, and BilinearForward distortions based on control points. Accuracy increased fairly significantly.
A page showing sample maps was created.
4/19/12: Post-processing function was implemented to filter maps and generate a guess. Sample output is available here for one of the successful PHP captchas.
4/26/12: Sequential version is implemented and timed. Speedup is good but can be improved. One of the big areas I want to focus on is reducing the memory transfer required; specifically, the size of the results buffer can be minimized.
4/27/12: Yahoo Mail has changed the format of their captchas. I only have 10 test images that I saved of the old format. The new format appears to use a range of fonts and fairly severe distortions, and applies edge filters to the images. It is theoretically still solvable with our algorithm, but the difficulty definitely went up compared to the old format, most notably because of the use of multiple fonts.
5/1/12: Made a sizeable speed improvement by reducing each column to its maximum in a CUDA kernel, and then copying only one row's worth of maximums back from the GPU. This operation was previously being done sequentially by post-processing.
Further optimizations by minimizing kernel operations.
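
The column reduction can be sketched as follows. This is a sequential Python stand-in for what the CUDA kernel computes, assuming the result buffer is a 2-D score map with one row per vertical offset and one column per horizontal offset; the name `column_maxima` is hypothetical and the real kernel's memory layout is not shown in this log.

```python
def column_maxima(score_map):
    """Reduce each column of a 2-D score map to its maximum value.

    On the GPU, one thread per column could run the inner comparison
    independently of all other columns.
    """
    width = len(score_map[0])
    maxima = list(score_map[0])  # start from the first row
    for row in score_map[1:]:
        for x in range(width):
            if row[x] > maxima[x]:
                maxima[x] = row[x]
    return maxima


# Toy score map: 2 rows by 3 columns.
score_map = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, 2.0],
]
print(column_maxima(score_map))  # [2.0, 4.0, 2.0]
```

Under this assumption, a map with H rows and W columns shrinks from H x W values to W values before the device-to-host copy, which is where the saving in transfer size would come from.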
5/4/12: Added functions to read bmp files directly instead of converting.
Built initial reCaptcha library.
5/8/12: Fixed bug in bmp reading functions.
Expanded reCaptcha library. Got to ~50% letter accuracy without post-processing implemented.