Semiconductor System Lab

Through this homepage, we would like to share our sweat, pain,
excitement and experiences with you.

LNPU: Sparse DNN Learning Processor

Overview

Recently, deep neural network (DNN) hardware accelerators have been reported for energy-efficient deep learning (DL) acceleration [1-6]. Most prior DNN inference accelerators run networks that are trained in the cloud using public datasets; the parameters are then downloaded to the device to implement AI [1-5]. However, local DNN learning with domain-specific and private data is required to meet various user preferences on edge or mobile devices. Since edge and mobile devices have only limited computation capability and run on battery power, an energy-efficient DNN learning processor is necessary. Only [6] supported on-chip DNN learning, but it was not energy-efficient, as it did not exploit sparsity, which accounts for 37%-61% of the inputs for various CNNs, such as VGG16, AlexNet and ResNet-18, as shown in Fig. 7.7.1. Although [3-5] exploited sparsity, they considered only the inference phase with inter-channel accumulation in Fig. 7.7.1, and did not support intra-channel accumulation for the weight-gradient generation (WG) step of the learning phase. Also, [6] adopted FP16, which was not energy-optimal because FP8 is sufficient for many input operands and consumes 4× less energy than FP16.
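
To make the role of input sparsity in the WG step more concrete, the following is a minimal software sketch, not the LNPU dataflow itself. It assumes a single input-channel/output-channel pair, a "valid" convolution with stride 1, and a post-ReLU input in which half of the values are zero. Under our reading of the text, the weight gradient for each kernel position is accumulated over spatial positions within that channel pair (intra-channel accumulation), whereas the inference pass sums across input channels. The names `weight_gradient` and `input_sparsity`, and the toy data, are hypothetical and chosen only for illustration.

```python
def weight_gradient(inp, grad_out, k, skip_zeros=False):
    """Weight gradient of a k x k kernel for one (input, output) channel pair.

    dw[ky][kx] = sum over (y, x) of inp[y + ky][x + kx] * grad_out[y][x]
    Returns the gradient and the number of multiply-accumulates performed.
    """
    out_h, out_w = len(grad_out), len(grad_out[0])
    dw = [[0.0] * k for _ in range(k)]
    macs = 0
    for ky in range(k):
        for kx in range(k):
            # Accumulate over spatial positions inside the same channel pair.
            for y in range(out_h):
                for x in range(out_w):
                    a = inp[y + ky][x + kx]
                    if skip_zeros and a == 0.0:
                        continue  # a zero input contributes nothing to the sum
                    dw[ky][kx] += a * grad_out[y][x]
                    macs += 1
    return dw, macs


def input_sparsity(inp):
    """Fraction of input values that are exactly zero."""
    values = [v for row in inp for v in row]
    return sum(1 for v in values if v == 0.0) / len(values)


# Hypothetical 4x4 post-ReLU input; half of the entries are zero.
inp = [
    [0.0, 1.0, 0.0, 2.0],
    [3.0, 0.0, 1.0, 0.0],
    [0.0, 2.0, 4.0, 0.0],
    [1.0, 5.0, 0.0, 0.0],
]
# Hypothetical 2x2 output gradient (4 - 3 + 1 = 2 per dimension for k = 3).
grad_out = [
    [1.0, -1.0],
    [2.0, 0.5],
]

dense_dw, dense_macs = weight_gradient(inp, grad_out, k=3)
sparse_dw, sparse_macs = weight_gradient(inp, grad_out, k=3, skip_zeros=True)

print("input sparsity:", input_sparsity(inp))
print("dense MACs:", dense_macs, "zero-skipping MACs:", sparse_macs)
print("same gradient:", dense_dw == sparse_dw)
```

In this toy case the zero-skipping variant produces the same gradient with fewer multiply-accumulates. A hardware design would additionally have to handle the irregular memory access and load balancing that skipping introduces, which this sketch does not model.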

Implementation results
Figure 7 
Performance comparison
Figure 6 
Architecture
Figure 2 
Features

  - Fine-grained Mixed Precision

  - Fully Reconfigurable Sparse DL Accelerator

  - FP8/FP16 Mixed Operation


Related Papers

  - ISSCC 2019 [pdf] 
