Feature/ocr preprocessing error checking by bradleyrule · Pull Request #37 · NovakLabOSU/FracFeedExtractor

bradleyrule · 2026-02-16T03:13:32Z

Refactored PDF text extraction to make better use of OCR by improving error checking and OCR pre-processing.

1. Improved error checking for text extraction

check_spelling()
- Uses pyspellchecker to compute the ratio of misspelled words to total words
- Current threshold: OCR is used when more than 5% of words are misspelled
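
A minimal sketch of how such a check could work, not the code in this PR. The names `misspelled_ratio`, `needs_ocr`, and `OCR_MISSPELLED_THRESHOLD` are hypothetical, as is the decision to treat text with no words as a failed extraction. The only pyspellchecker call used is `SpellChecker.unknown()`, which returns the set of words missing from its dictionary.

```python
import re

# From the PR description: OCR is used when more than 5% of words are misspelled.
OCR_MISSPELLED_THRESHOLD = 0.05


def misspelled_ratio(text, checker=None):
    """Return the fraction of words in `text` that the checker does not recognise."""
    if checker is None:
        # Imported lazily so a stub checker can be passed in for testing.
        from spellchecker import SpellChecker
        checker = SpellChecker()
    # Keep alphabetic tokens only; digits and punctuation are ignored.
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    if not words:
        # Assumption: no extractable words counts as a failed extraction.
        return 1.0
    unknown = checker.unknown(words)  # set of unrecognised words
    # Count occurrences, since unknown() collapses duplicates into a set.
    misspelled = sum(1 for w in words if w in unknown)
    return misspelled / len(words)


def needs_ocr(text, checker=None, threshold=OCR_MISSPELLED_THRESHOLD):
    """Return True when the misspelled ratio exceeds the threshold."""
    return misspelled_ratio(text, checker) > threshold
```

Passing the checker as a parameter lets one `SpellChecker` instance be reused across pages, which avoids reloading the dictionary for every call.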

2. Reduced code redundancy

3. Improved OCR pre-processing


bradleyrule added 5 commits January 18, 2026 17:59

Added image denoising to OCR

33cab3a

Merge remote-tracking branch 'origin/main' into feature/ocr-preprocessing-error-checking

3374e0e

reworked OCR implementation to improve text quality

1902005

Added docstrings to functions

0f6ee38

Fixed formatting errors

5bf631c
