A CNN-based captcha solver for the Taiwan Railway booking website, with a training set generator that mimics the captcha style and applies data augmentation.
simple-railway-captcha-solver is a deep learning project that uses convolutional neural networks to automatically recognize and solve captchas from the Taiwan Railway booking website. It solves the problem of limited labeled training data by generating synthetic captchas that mimic the real ones and by augmenting a small set of manually labeled images. The project demonstrates a complete pipeline from data generation to model training and real-world testing.
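The augmentation idea can be illustrated with a minimal sketch. It assumes grayscale images stored as lists of pixel rows (0 to 255) and uses small random shifts plus speckle noise; the project's actual transforms are not described here, so the function names `augment` and `expand_dataset` and all parameters are hypothetical.

```python
import random


def augment(image, max_shift=2, noise_prob=0.02, rng=None):
    """Return a randomly shifted, lightly speckled copy of a grayscale image.

    `image` is a list of rows of ints in 0..255 (an assumed representation).
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                pixel = image[src_y][src_x]
            else:
                pixel = 255  # pad uncovered areas with a white background
            if rng.random() < noise_prob:
                pixel = rng.randint(0, 255)  # random speckle noise
            row.append(pixel)
        out.append(row)
    return out


def expand_dataset(samples, copies=5, seed=0):
    """Keep each labeled (image, label) pair and add `copies` augmented variants."""
    rng = random.Random(seed)
    expanded = []
    for image, label in samples:
        expanded.append((image, label))
        for _ in range(copies):
            expanded.append((augment(image, rng=rng), label))
    return expanded
```

The label is reused unchanged for every variant, which is what lets a small hand-labeled set grow without extra labeling work.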
Aimed at students, researchers, and developers interested in computer vision, deep learning, and practical applications of CNNs to captcha solving. It is particularly relevant to those studying data generation techniques and model deployment when labeled data is scarce.
It provides a fully documented, open-source implementation that tackles the real-world challenge of captcha solving with limited labeled data, offering insights into synthetic data generation, multi-output CNN architectures, and practical web automation integration.
An implementation of a simple CNN-based captcha solver for Taiwan Railway ticket booking, together with a training set generator that imitates the captcha style and applies data augmentation.
Reaches up to 99.39% per-digit accuracy and a 91.57% full-captcha success rate in real-world tests against the target captcha.
Creates realistic training data by meticulously mimicking captcha styles, fonts, and noise, effectively overcoming labeled data scarcity for specialized computer vision tasks.
Uses separate CNNs for 5-digit, 6-digit, and length classification, handling variable-length captchas robustly as shown in the detailed model implementations.
Includes Selenium-based scripts for browser automation to evaluate performance directly on the live website, providing practical deployment insights.
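Two of the highlights above can be sketched together: routing a captcha through a length classifier to a fixed-length model, and the difference between per-digit accuracy and full-captcha success rate. This is a rough illustration, not the project's code. The models are assumed to be plain callables, `ALPHABET` is an assumed character set, and `solve` and `evaluate` are hypothetical names.

```python
import string

# Assumed character set; the real models may use a different one.
ALPHABET = string.digits + string.ascii_uppercase


def argmax(scores):
    """Index of the largest score in a list."""
    return max(range(len(scores)), key=scores.__getitem__)


def solve(image, length_model, model_5, model_6):
    """Pick the fixed-length model chosen by the length classifier, then decode.

    `length_model(image)` is assumed to return 5 or 6. Each fixed-length model
    is assumed to return one list of class scores per character position.
    """
    n_chars = length_model(image)
    model = model_5 if n_chars == 5 else model_6
    per_position_scores = model(image)
    return "".join(ALPHABET[argmax(scores)] for scores in per_position_scores)


def evaluate(predictions, truths):
    """Return (per-character accuracy, full-captcha success rate)."""
    char_hits = char_total = full_hits = 0
    for pred, truth in zip(predictions, truths):
        char_total += len(truth)
        # Missing characters count as misses because zip stops at the shorter string.
        char_hits += sum(p == t for p, t in zip(pred, truth))
        full_hits += pred == truth
    return char_hits / char_total, full_hits / len(truths)
```

A captcha counts as solved only when every character is right, so the full-captcha rate is always at or below the per-character accuracy; a single wrong digit in a six-character captcha still scores five character hits but zero full hits.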
The project has been archived since 2022 and targets a captcha system that was replaced in 2019. The author also admits to messy code and unresolved issues, so it is unreliable for current use.
Relies on TensorFlow 1.4.0 and Keras 2.1.2, which are obsolete and may have compatibility issues with modern systems, requiring significant effort to update.
The README acknowledges difficulty balancing English letters against digits: with limited data augmentation for the rarer classes, some models reach near-zero accuracy on letters.