Enhancing text-image super-resolution through deliberate weakening of the recognition loss to impose stricter constraints on the super-resolution network
Komeil Mehrgan (1) (Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran)
Abbas Ebrahimi-Moghaddam (2) (Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran)
Morteza Khademi (3) (Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran)
Keywords: intelligent feedback, deliberate weakening of the loss function, text-image recognition, super-resolution.
Abstract:
Low-resolution text images typically cause serious errors in text reading and retrieval, which can negatively affect the performance of text-recognition systems. Super-resolution of text images, especially when the input images have very low resolution, is a key factor in improving the accuracy of such systems. Although traditional super-resolution methods have succeeded in improving overall image quality, they still struggle to preserve the fine details of characters and the structure of the text. In this study, a text-image super-resolution method is presented that, by exploiting intelligent feedback through deliberate weakening of the recognition loss, imposes stricter constraints on the super-resolution network so that it specifically produces images in which character structure is well preserved. This loss forces the super-resolution network to reconstruct details lost in the input images and substantially improves the accuracy of text-recognition systems. Experimental results show that the method not only increases the visual clarity of the images but also improves the performance and accuracy of text-recognition systems by about 10% relative to the original images. This new approach is an effective step toward optimizing text-reading pipelines for low-resolution images.
English abstract:
Low-resolution text images often lead to significant errors in Optical Character Recognition (OCR), negatively impacting the performance of automated text recognition systems. Text image super-resolution (SR) is a critical step for improving OCR accuracy, particularly when dealing with inputs of very low resolution. While conventional SR methods succeed in enhancing general image quality, they often struggle to preserve the fine-grained details and structural integrity of characters. In this paper, we propose a novel text super-resolution method that leverages intelligent feedback; by intentionally weakening the OCR loss, our approach imposes stricter reconstruction constraints on the SR network. This unique approach specifically guides the network to generate images that faithfully preserve character structures. The modified loss function compels the SR network to reconstruct fine details lost in the low-resolution input, thereby leading to a significant improvement in downstream OCR accuracy. Experimental results demonstrate that our method not only enhances visual clarity but also boosts the accuracy of subsequent OCR systems by approximately 10% compared to the original low-resolution images. This novel approach represents an effective step toward optimizing the pipeline for text recognition from low-resolution inputs.
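The abstract describes a training objective in which the recognition (OCR) loss term is deliberately attenuated relative to the pixel-reconstruction term, so that the super-resolution network is pushed to recover character structure rather than rely on a forgiving recognizer. The paper's exact formulation is not given here; the sketch below only assumes a common additive form L = L_pixel + λ·L_ocr with an attenuation factor λ < 1. The function names and the toy per-character cross-entropy recognizer are hypothetical, for illustration only:

```python
import math

def mse_loss(sr_pixels, hr_pixels):
    # Mean-squared pixel loss between the super-resolved output and the
    # high-resolution ground truth (both given as flat lists of intensities).
    return sum((a - b) ** 2 for a, b in zip(sr_pixels, hr_pixels)) / len(hr_pixels)

def ocr_ce_loss(char_probs, target_indices):
    # Cross-entropy of the recognizer's per-character probability
    # distributions against the ground-truth character indices.
    return -sum(math.log(p[t]) for p, t in zip(char_probs, target_indices)) / len(target_indices)

def combined_loss(sr_pixels, hr_pixels, char_probs, target_indices, ocr_weight=0.1):
    # Total objective: the OCR term is deliberately down-weighted
    # (ocr_weight < 1), shifting the burden of producing legible character
    # structure onto the reconstruction term.
    return mse_loss(sr_pixels, hr_pixels) + ocr_weight * ocr_ce_loss(char_probs, target_indices)

# Toy example: a 2-pixel image and a single recognized character.
total = combined_loss([0.5, 0.5], [1.0, 0.0], [[0.9, 0.1]], [0], ocr_weight=0.1)
```

In practice the pixel and OCR terms would be computed on tensors by the SR network and a pretrained text recognizer; the choice of the attenuation factor is the tunable lever the abstract refers to as "deliberate weakening."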