Enhancing Text Image Super-Resolution by Intentionally Weakening OCR Loss to Impose Stricter Reconstruction Constraints on the SR Network
K. Mehrgan¹ (Dept. of Elec. Eng., Ferdowsi University of Mashhad, Mashhad, Iran)
A. Ebrahimi Moghadam² (Dept. of Elec. Eng., Ferdowsi University of Mashhad, Mashhad, Iran)
M. Khademi Doroh³ (Dept. of Elec. Eng., Ferdowsi University of Mashhad, Mashhad, Iran)
Keywords: Super-resolution, text image recognition, intentional loss weakening, intelligent feedback
Abstract:
Low-resolution text images often cause significant errors in Optical Character Recognition (OCR), degrading the performance of automated text recognition systems. Text image super-resolution (SR) is a critical step for improving OCR accuracy, particularly for inputs of very low resolution. While conventional SR methods succeed in enhancing general image quality, they often struggle to preserve the fine-grained details and structural integrity of characters. In this paper, we propose a text super-resolution method that leverages intelligent feedback: by intentionally weakening the OCR loss, our approach imposes stricter reconstruction constraints on the SR network. This guides the network to generate images that faithfully preserve character structures. The modified loss function compels the SR network to reconstruct fine details lost in the low-resolution input, leading to a significant improvement in downstream OCR accuracy. Experimental results demonstrate that our method not only enhances visual clarity but also boosts the accuracy of subsequent OCR systems by approximately 10% compared to the original low-resolution images. The approach is an effective step toward optimizing the pipeline for text recognition from low-resolution inputs.