Enhancing text-image super-resolution through deliberate weakening of the recognition loss to impose stricter constraints on the super-resolution network
Komeil Mehrgan (1) (Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran)
Abbas Ebrahimi-Moghaddam (2) (Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran)
Morteza Khademi (3) (Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran)
Keywords: intelligent feedback, deliberate weakening of the loss function, text-image recognition, super-resolution.
Abstract:
Low-resolution text images typically cause serious errors in text reading and retrieval, which can negatively affect the performance of text-recognition systems. Super-resolution of text images, especially when the input images have very low resolution, is a key factor in improving the accuracy of such systems. Although traditional super-resolution methods have succeeded in improving overall image quality, they still struggle to preserve the fine details of characters and the structure of the text. In this study, a text-image super-resolution method is presented that, by exploiting intelligent feedback through deliberate weakening of the recognition loss, imposes stricter constraints on the super-resolution network so that it specifically produces images in which character structure is well preserved. This loss forces the super-resolution network to reconstruct details lost in the input images and substantially improves the accuracy of text-recognition systems. Experimental results show that the method not only increases the visual clarity of the images but also improves the performance and accuracy of text-recognition systems by about 10% relative to the original images. This new approach is an effective step toward optimizing text-reading pipelines for low-resolution images.
English abstract:
Low-resolution text images often lead to significant errors in Optical Character Recognition (OCR), negatively impacting the performance of automated text recognition systems. Text image super-resolution (SR) is a critical step for improving OCR accuracy, particularly when dealing with inputs of very low resolution. While conventional SR methods succeed in enhancing general image quality, they often struggle to preserve the fine-grained details and structural integrity of characters. In this paper, we propose a novel text super-resolution method that leverages intelligent feedback; by intentionally weakening the OCR loss, our approach imposes stricter reconstruction constraints on the SR network. This unique approach specifically guides the network to generate images that faithfully preserve character structures. The modified loss function compels the SR network to reconstruct fine details lost in the low-resolution input, thereby leading to a significant improvement in downstream OCR accuracy. Experimental results demonstrate that our method not only enhances visual clarity but also boosts the accuracy of subsequent OCR systems by approximately 10% compared to the original low-resolution images. This novel approach represents an effective step toward optimizing the pipeline for text recognition from low-resolution inputs.
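The abstract describes a training objective in which the recognition (OCR) loss term is deliberately attenuated relative to the pixel-reconstruction term, so that the super-resolution network is pushed to recover character structure rather than rely on a forgiving recognizer. The paper's exact formulation is not given here; the sketch below only assumes a common additive form L = L_pixel + λ·L_ocr with an attenuation factor λ < 1. The function names and the toy per-character cross-entropy recognizer are hypothetical, for illustration only:

```python
import math

def mse_loss(sr_pixels, hr_pixels):
    # Mean-squared pixel loss between the super-resolved output and the
    # high-resolution ground truth (both given as flat lists of intensities).
    return sum((a - b) ** 2 for a, b in zip(sr_pixels, hr_pixels)) / len(hr_pixels)

def ocr_ce_loss(char_probs, target_indices):
    # Cross-entropy of the recognizer's per-character probability
    # distributions against the ground-truth character indices.
    return -sum(math.log(p[t]) for p, t in zip(char_probs, target_indices)) / len(target_indices)

def combined_loss(sr_pixels, hr_pixels, char_probs, target_indices, ocr_weight=0.1):
    # Total objective: the OCR term is deliberately down-weighted
    # (ocr_weight < 1), shifting the burden of producing legible character
    # structure onto the reconstruction term.
    return mse_loss(sr_pixels, hr_pixels) + ocr_weight * ocr_ce_loss(char_probs, target_indices)

# Toy example: a 2-pixel image and a single recognized character.
total = combined_loss([0.5, 0.5], [1.0, 0.0], [[0.9, 0.1]], [0], ocr_weight=0.1)
```

In practice the pixel and OCR terms would be computed on tensors by the SR network and a pretrained text recognizer; the choice of the attenuation factor is the tunable lever the abstract refers to as "deliberate weakening."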