Optimal tile size modeling to increase data reuse in convolutional neural networks
Sakineh Seydi
Mostafa Salehi (University of Tehran)
Keywords: Convolutional neural networks, energy consumption, external memory, data reuse, tiling
Abstract:
Artificial neural networks are computational models inspired by the biological neural networks of the human brain. Convolutional neural networks are one class of such models, used in applications such as image classification and object recognition.
As a network grows, the number of parameters and the volume of data movement increase, and with them the reliance on external memory, which raises energy consumption. One of the main ways to reduce energy consumption and DRAM accesses is data reuse, which can be exploited at three levels: (1) the datapath and processing-element level, (2) the loop and computation-scheduling level, and (3) the inter-layer and network level. Tiling is a technique for reusing data at the scheduling level. In this paper, we model the number of data reuses with an exact mathematical formula. We then formulate an optimization problem and obtain, for each network configuration, the tile parameters that maximize data reuse. We also examine the relationship between structural parameters of the network, such as kernel size and stride, and the tile size, and show that in 70% of the network layers the optimal tile size is smaller than four times the kernel size.
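As a rough illustration of the kind of per-layer search described above, the sketch below enumerates tile sizes for a single convolutional layer under a fixed on-chip buffer budget and keeps the tile with the fewest estimated DRAM accesses (which, for a fixed amount of computation, is equivalent to the most data reuse). It is a minimal sketch assuming a simplified cost model in which partial output sums stay on chip; the function names (dram_accesses, best_tile), the buffer budget, and the toy layer dimensions are illustrative assumptions, not the exact formulation used in the paper.

from itertools import product
from math import ceil

def dram_accesses(M, N, R, C, K, S, Tm, Tn, Tr, Tc):
    # Estimated external-memory accesses (in words) for one tiled layer:
    # M/N output/input channels, R x C output map, K x K kernel, stride S,
    # tile sizes (Tm, Tn, Tr, Tc). Partial sums are assumed to stay on chip,
    # so each output element is written back to DRAM only once.
    trips = ceil(M / Tm) * ceil(N / Tn) * ceil(R / Tr) * ceil(C / Tc)
    in_tile = Tn * (S * Tr + K - S) * (S * Tc + K - S)  # input patch per tile
    w_tile = Tm * Tn * K * K                            # weights per tile
    out_words = M * R * C                               # each output written once
    return trips * (in_tile + w_tile) + out_words

def best_tile(M, N, R, C, K, S, buffer_words):
    # Exhaustive search over tile sizes whose combined on-chip footprint
    # (input tile + weight tile + output tile) fits in the buffer.
    best, best_cost = None, float("inf")
    for Tm, Tn, Tr, Tc in product(range(1, M + 1), range(1, N + 1),
                                  range(1, R + 1), range(1, C + 1)):
        footprint = (Tn * (S * Tr + K - S) * (S * Tc + K - S)
                     + Tm * Tn * K * K + Tm * Tr * Tc)
        if footprint > buffer_words:
            continue
        cost = dram_accesses(M, N, R, C, K, S, Tm, Tn, Tr, Tc)
        if cost < best_cost:
            best, best_cost = (Tm, Tn, Tr, Tc), cost
    return best, best_cost

# Toy example: 16 input / 16 output channels, 16x16 output map, 3x3 kernel,
# stride 1, and a 2048-word on-chip buffer (all values are illustrative).
tile, cost = best_tile(M=16, N=16, R=16, C=16, K=3, S=1, buffer_words=2048)
print("best tile (Tm, Tn, Tr, Tc):", tile, "estimated DRAM words:", cost)

In the framing above, the same search would be repeated for every layer configuration of the network; comparing the chosen spatial tile sizes with the kernel size across layers is what yields observations of the kind reported in the abstract.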