Design of a Low-Power Approximate Accelerator on FPGA Chips for Artificial Intelligence Applications
Nadia Sohrabi¹ (Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran)
Amir Bavafaye Toosi² (Department of Computer Engineering, Sadjad University, Mashhad, Iran)
Mehdi Sedighi³ (Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran)
Keywords: approximate adder, convolutional neural network, handwritten digit recognition neural network design, approximate computing.
Abstract:
Neural networks are a machine learning method used in applications such as image processing. One of their challenges is the high computational load they impose, which is why many architectures offering solutions for these complex computations have been proposed for such applications. Reconfigurable hardware accelerators such as FPGA chips are commonly used to speed up neural network algorithms, but the main drawback of these chips is their relatively high power consumption. Approximate computing can be used to reduce the power consumption of FPGA chips; its main idea is to trade accuracy against energy consumption by making changes to the circuit or code. In this work, a convolutional neural network for handwritten digit recognition has been designed and implemented in both exact and approximate versions with the goal of reducing power consumption. The approximation is applied to the adder in the network's arithmetic unit: it reduces power consumption by preventing carry propagation in the low-order bits of the adder. A comparison of the exact and approximate networks shows that approximating the 6 low-order bits of the adder reduces power consumption by 43.75% while introducing no error.
English Abstract:
One of the challenges of neural networks is their high computational load. For this reason, many architectures have been proposed for such applications, providing solutions for their complex calculations. Reconfigurable hardware accelerators such as FPGAs are usually used to accelerate neural networks, but the main problem of these chips is their relatively high power consumption. To reduce the power consumption of FPGAs, approximate computing techniques can be used. The main idea of approximate computing is to trade off accuracy against energy consumption by making changes in the circuit or code. In this research, a convolutional neural network for recognizing handwritten digits has been designed and implemented in both accurate and approximate forms with the aim of improving power consumption. The method reduces power consumption by preventing carry propagation in the low-order bits of the adder. The comparison of the accurate and approximate networks shows that by approximating the 6 low-order bits of the adder, power consumption is reduced by 43.75% and no error occurs. Also, by approximating the 7 low-order bits, power consumption is reduced by 44.11% at the cost of a 20% error.
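The carry-blocking idea described in the abstracts can be illustrated with a short behavioral model. The following Python sketch is only an illustration of the general technique: the 16-bit word width, the function name approximate_add, and the use of a bitwise OR as the carry-free low-part function (as in lower-part-OR adders) are assumptions made for the example, not details taken from the paper's circuit.

def approximate_add(a: int, b: int, low_bits: int = 6, width: int = 16) -> int:
    # Behavioral model of an adder whose low `low_bits` bits do not
    # propagate a carry into the upper bits.
    # Assumptions (not from the paper): 16-bit operands and a bitwise OR
    # as the carry-free function for the low part.
    low_mask = (1 << low_bits) - 1
    # Upper parts are added exactly; the carry chain of the low bits is cut,
    # so no carry ever enters the upper part from below.
    upper = ((a >> low_bits) + (b >> low_bits)) << low_bits
    # Low part: carry-free approximation (bitwise OR of the low bits).
    lower = (a | b) & low_mask
    return (upper | lower) & ((1 << width) - 1)

# Example: compare the exact and approximate sums of two 16-bit operands.
a, b = 0b0000001100101101, 0b0000010100011011
exact = (a + b) & 0xFFFF
print(exact, approximate_add(a, b))

On the FPGA itself the same idea corresponds to removing the carry logic of the low-order adder bits, which, according to the abstracts, is where the reported power saving comes from.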