Design of a Low-Power Approximate Accelerator on FPGA Chips for Artificial Intelligence Applications
Nadia Sohrabi¹ (Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran)
Amir Bavafaye Toosi² (Department of Computer Engineering, Sadjad University, Mashhad, Iran)
Mehdi Sedighi³ (Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran)
Keywords: approximate adder, convolutional neural network, handwritten digit recognition neural network design, approximate computing.
Abstract:
Neural networks are a machine learning method used in applications such as image processing. One of their challenges is the high computational load they impose, which is why many architectures offering solutions for these complex computations have been proposed for such applications. Reconfigurable hardware accelerators such as FPGA chips are commonly used to speed up neural network algorithms, but the main drawback of these chips is their relatively high power consumption. Approximate computing can be used to reduce the power consumption of FPGA chips; its main idea is to trade accuracy against energy consumption by making changes to the circuit or code. In this work, a convolutional neural network for handwritten digit recognition has been designed and implemented in both exact and approximate versions with the goal of reducing power consumption. The approximation is applied to the adder in the network's arithmetic unit: it reduces power consumption by preventing carry propagation in the low-order bits of the adder. A comparison of the exact and approximate networks shows that approximating the 6 low-order bits of the adder reduces power consumption by 43.75% while introducing no error.
English Abstract:
One of the challenges of neural networks is their high computational load. For this reason, many architectures have been proposed for such applications, providing solutions for their complex calculations. Reconfigurable hardware accelerators such as FPGAs are usually used to accelerate neural networks, but the main problem of these chips is their relatively high power consumption. To reduce the power consumption of FPGAs, approximate computing techniques can be used. The main idea of approximate computing is to trade off accuracy against energy consumption by making changes in the circuit or code. In this research, a convolutional neural network for recognizing handwritten digits has been designed and implemented in both accurate and approximate forms with the aim of improving power consumption. The method reduces power consumption by preventing carry propagation in the low-order bits of the adder. The comparison of the accurate and approximate networks shows that by approximating the 6 low-order bits of the adder, power consumption is reduced by 43.75% and no error occurs. Also, by approximating the 7 low-order bits, power consumption is reduced by 44.11% at the cost of a 20% error.
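The carry-blocking idea described in the abstracts can be illustrated with a short behavioral model. The following Python sketch is only an illustration of the general technique: the 16-bit word width, the function name approximate_add, and the use of a bitwise OR as the carry-free low-part function (as in lower-part-OR adders) are assumptions made for the example, not details taken from the paper's circuit.

def approximate_add(a: int, b: int, low_bits: int = 6, width: int = 16) -> int:
    # Behavioral model of an adder whose low `low_bits` bits do not
    # propagate a carry into the upper bits.
    # Assumptions (not from the paper): 16-bit operands and a bitwise OR
    # as the carry-free function for the low part.
    low_mask = (1 << low_bits) - 1
    # Upper parts are added exactly; the carry chain of the low bits is cut,
    # so no carry ever enters the upper part from below.
    upper = ((a >> low_bits) + (b >> low_bits)) << low_bits
    # Low part: carry-free approximation (bitwise OR of the low bits).
    lower = (a | b) & low_mask
    return (upper | lower) & ((1 << width) - 1)

# Example: compare the exact and approximate sums of two 16-bit operands.
a, b = 0b0000001100101101, 0b0000010100011011
exact = (a + b) & 0xFFFF
print(exact, approximate_add(a, b))

On the FPGA itself the same idea corresponds to removing the carry logic of the low-order adder bits, which, according to the abstracts, is where the reported power saving comes from.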