CN110555516B - Method for implementing a low-latency FPGA-based YOLOv2-tiny neural network hardware accelerator - Google Patents
- Publication number
- CN110555516B (application number CN201910796486.7A)
- Authority
- CN
- China
- Prior art keywords
- layer
- input
- convolution
- bit
- calculation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Neurology (AREA)
- Complex Calculations (AREA)
Abstract
Description
Name | Main parameters | Input size | Output size |
---|---|---|---|
Conv1 | Convolution layer, kernel (3,3,16) | (1280,384,3) | (1280,384,16) |
BN1 | Batch normalization layer | (1280,384,16) | (1280,384,16) |
Maxpool1 | Pooling layer, pooling kernel (2,2) | (1280,384,16) | (640,192,16) |
Conv2 | Convolution layer, kernel (3,3,32) | (640,192,16) | (640,192,32) |
BN2 | Batch normalization layer | (640,192,32) | (640,192,32) |
Maxpool2 | Pooling layer, pooling kernel (2,2) | (640,192,32) | (320,96,32) |
Conv3 | Convolution layer, kernel (3,3,64) | (320,96,32) | (320,96,64) |
BN3 | Batch normalization layer | (320,96,64) | (320,96,64) |
Maxpool3 | Pooling layer, pooling kernel (2,2) | (320,96,64) | (160,48,64) |
Conv4 | Convolution layer, kernel (3,3,128) | (160,48,64) | (160,48,128) |
BN4 | Batch normalization layer | (160,48,128) | (160,48,128) |
Maxpool4 | Pooling layer, pooling kernel (2,2) | (160,48,128) | (80,24,128) |
Conv5 | Convolution layer, kernel (3,3,256) | (80,24,128) | (80,24,256) |
BN5 | Batch normalization layer | (80,24,256) | (80,24,256) |
Maxpool5 | Pooling layer, pooling kernel (2,2) | (80,24,256) | (40,12,256) |
Conv6 | Convolution layer, kernel (3,3,512) | (40,12,256) | (40,12,512) |
BN6 | Batch normalization layer | (40,12,512) | (40,12,512) |
Conv7 | Convolution layer, kernel (3,3,512) | (40,12,512) | (40,12,512) |
BN7 | Batch normalization layer | (40,12,512) | (40,12,512) |
Conv8 | Convolution layer, kernel (3,3,512) | (40,12,512) | (40,12,512) |
BN8 | Batch normalization layer | (40,12,512) | (40,12,512) |
Conv9 | Convolution layer, kernel (1,1,40) | (40,12,512) | (40,12,40) |
Region | Detection layer | (40,12,40) | Several detection results |
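The sizes in the layer table above can be reproduced with a few lines of shape arithmetic. The sketch below is illustrative only: it assumes that every convolution uses stride 1 with "same" padding and that every 2x2 pooling layer uses stride 2, which is what the listed sizes imply but which the table does not state. The names `LAYERS`, `next_shape`, and `trace` are hypothetical.

```python
# Sketch: propagate (width, height, channels) through the layers of the table.
# Assumptions (inferred from the listed sizes, not stated in the table):
#   - convolutions use stride 1 and "same" padding, so width/height are kept
#   - 2x2 max pooling uses stride 2, so width/height are halved

CONV_CHANNELS = [16, 32, 64, 128, 256, 512, 512, 512]  # Conv1..Conv8
POOLED = 5  # only Conv1..Conv5 are followed by a pooling layer

LAYERS = []
for i, ch in enumerate(CONV_CHANNELS, start=1):
    LAYERS.append((f"Conv{i}", "conv", ch))
    LAYERS.append((f"BN{i}", "bn", None))
    if i <= POOLED:
        LAYERS.append((f"Maxpool{i}", "pool", None))
LAYERS.append(("Conv9", "conv", 40))  # 1x1 convolution before the Region layer


def next_shape(shape, kind, out_channels):
    """Return the output shape of one layer given its input shape."""
    w, h, c = shape
    if kind == "conv":
        return (w, h, out_channels)
    if kind == "bn":
        return (w, h, c)
    if kind == "pool":
        return (w // 2, h // 2, c)
    raise ValueError(f"unknown layer kind: {kind}")


def trace(input_shape):
    """Return a list of (layer name, output shape) for the whole network."""
    shapes = []
    shape = input_shape
    for name, kind, out_channels in LAYERS:
        shape = next_shape(shape, kind, out_channels)
        shapes.append((name, shape))
    return shapes


if __name__ == "__main__":
    for name, shape in trace((1280, 384, 3)):
        print(f"{name:9s} -> {shape}")
```

Calling `trace((1280, 384, 3))` walks from the input image to the (40,12,40) tensor that the Region layer consumes.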
Network name | Full-precision accuracy | Accuracy after 8-bit quantization |
---|---|---|
YOLOv2-tiny | 77.63% | 77.04% |
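The table reports accuracy before and after 8-bit quantization, but the text captured here does not say how the values are quantized. Purely as an illustration, the sketch below shows one common scheme, symmetric per-tensor quantization to signed 8-bit integers. The scheme, the function names `quantize_int8` and `dequantize`, and the sample values are all assumptions and should not be read as the patent's method.

```python
# Sketch of symmetric per-tensor 8-bit quantization (an assumed scheme,
# chosen only for illustration; the patent's own scheme is not shown here).


def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 1.0
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.25, 1.0]  # hypothetical sample values
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    for w, qi, r in zip(weights, q, restored):
        print(f"{w:+.4f} -> {qi:+4d} -> {r:+.4f}")
```

With this scheme the rounding error of each value is at most half the scale. Small errors of that kind are compatible with the modest accuracy change in the table (77.63% to 77.04%), although the table itself does not attribute the change to any particular scheme.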
Name | Input | Kernel | DSP | C×K | Col | Bandwidth | Latency |
---|---|---|---|---|---|---|---|
Conv1 | (1280,384,3) | (3,3,16) | 32 | (4,16) | 2 | 266Mb/s | 16.58ms |
Conv2 | (640,192,16) | (3,3,32) | 64 | (4,32) | 2 | 1066Mb/s | 22.12ms |
Conv3 | (320,96,32) | (3,3,64) | 64 | (2,64) | 2 | 2133Mb/s | 22.12ms |
Conv4 | (160,48,64) | (3,3,128) | 64 | (4,32) | 2 | 4266Mb/s | 22.12ms |
Conv5 | (80,24,128) | (3,3,256) | 64 | (8,16) | 2 | 8533Mb/s | 22.12ms |
Conv6 | (40,12,256) | (3,3,512) | 64 | (16,8) | 2 | 17066Mb/s | 22.12ms |
Conv7 | (40,12,512) | (3,3,512) | 128 | (32,8) | 3 | 17066Mb/s | 22.12ms |
Conv8 | (40,12,512) | (3,3,512) | 128 | (32,8) | 2 | 34133Mb/s | 22.12ms |
Conv9 | (40,12,512) | (1,1,40) | 2 | (2,2) | 2 | 457Mb/s | 15.05ms |
Total | | | 610 | | | 84986Mb/s | 22.12ms |
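The totals row can be checked directly against the per-layer values. The sketch below also tests a pattern visible in the table: every DSP count equals C×K divided by 2. Reading that as "each DSP carries two multiplications" is an assumption; it is consistent with the cited "Double MAC on a DSP" paper, but the text captured here does not state it. The names `ROWS`, `dsps_for`, and `totals` are hypothetical.

```python
# Sketch: consistency checks on the per-layer resource table.
# Each row: (name, (C, K), DSP count, bandwidth in Mb/s, latency in ms),
# copied from the table above.
ROWS = [
    ("Conv1", (4, 16), 32, 266, 16.58),
    ("Conv2", (4, 32), 64, 1066, 22.12),
    ("Conv3", (2, 64), 64, 2133, 22.12),
    ("Conv4", (4, 32), 64, 4266, 22.12),
    ("Conv5", (8, 16), 64, 8533, 22.12),
    ("Conv6", (16, 8), 64, 17066, 22.12),
    ("Conv7", (32, 8), 128, 17066, 22.12),
    ("Conv8", (32, 8), 128, 34133, 22.12),
    ("Conv9", (2, 2), 2, 457, 15.05),
]


def dsps_for(c, k, mults_per_dsp=2):
    """DSP count if every DSP carries `mults_per_dsp` multiplications.

    mults_per_dsp=2 is an assumption that happens to match every row.
    """
    return (c * k) // mults_per_dsp


def totals(rows):
    """Return (sum of DSPs, sum of bandwidths, maximum latency)."""
    total_dsp = sum(row[2] for row in rows)
    total_bandwidth = sum(row[3] for row in rows)
    max_latency = max(row[4] for row in rows)
    return total_dsp, total_bandwidth, max_latency


if __name__ == "__main__":
    for name, (c, k), dsp, _, _ in ROWS:
        status = "matches" if dsps_for(c, k) == dsp else "differs"
        print(f"{name}: C*K/2 = {dsps_for(c, k)}, table DSP = {dsp} ({status})")
    print("totals:", totals(ROWS))
```

Note that the DSP and bandwidth entries of the totals row are sums, while the latency entry equals the largest per-layer latency rather than the sum of all of them.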
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910796486.7A CN110555516B (zh) | 2019-08-27 | 2019-08-27 | Method for implementing a low-latency FPGA-based YOLOv2-tiny neural network hardware accelerator |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910796486.7A CN110555516B (zh) | 2019-08-27 | 2019-08-27 | Method for implementing a low-latency FPGA-based YOLOv2-tiny neural network hardware accelerator |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110555516A (zh) | 2019-12-10 |
CN110555516B (zh) | 2023-10-27 |
Family
ID=68736833
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910796486.7A Active CN110555516B (zh) | Method for implementing a low-latency FPGA-based YOLOv2-tiny neural network hardware accelerator | 2019-08-27 | 2019-08-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110555516B (zh) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
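CN110956258B (zh) * | 2019-12-17 | 2023-05-16 | 深圳鲲云信息科技有限公司 | Neural network acceleration circuit and method |
WO2021184143A1 (zh) * | 2020-03-16 | 2021-09-23 | 华为技术有限公司 | Data processing apparatus and data processing method |
CN111459877B (zh) * | 2020-04-02 | 2023-03-24 | 北京工商大学 | FPGA-accelerated Winograd YOLOv2 object detection model method |
CN111738423A (zh) * | 2020-06-28 | 2020-10-02 | 湖南国科微电子股份有限公司 | Neural network model compilation method and apparatus, storage medium, and electronic device |
CN111931921B (zh) * | 2020-10-13 | 2021-01-26 | 南京风兴科技有限公司 | Ping-pong storage method and apparatus for sparse neural networks |
CN112801285B (zh) * | 2021-02-04 | 2024-01-26 | 南京微毫科技有限公司 | FPGA-based CNN accelerator with high resource utilization and acceleration method thereof |
CN113568597B (zh) * | 2021-07-15 | 2024-07-26 | 上海交通大学 | DSP packed-word multiplication method and system for convolutional neural networks |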
- 2019-08-27: CN application CN201910796486.7A, patent CN110555516B (zh), Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108806243A (zh) * | 2018-04-24 | 2018-11-13 | 东南大学 | Traffic flow information collection terminal based on Zynq-7000 |
CN109214504A (zh) * | 2018-08-24 | 2019-01-15 | 北京邮电大学深圳研究院 | FPGA-based design method for a YOLO network forward inference accelerator |
Non-Patent Citations (7)
Title |
---|
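A High-Throughput and Power-Efficient FPGA Implementation of YOLO CNN for Object Detection; Duy Thanh Nguyen; 《IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS》; 20190412; vol. 27; main text Sections 3 and 4, Fig. 3 * |
Double MAC on a DSP: Boosting the Performance of Convolutional Neural Networks on FPGAs; Sugil Lee et al.; 《IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems》; 20180406; vol. 38; main text Sections 1 and 2, Fig. 1 * |
FPGA implementation and optimization of convolutional neural networks; 王开宇 et al.; 《实验室科学》; 20180828 (No. 04); full text * |
Design and implementation of a YOLOv2 accelerator based on the Zynq7000 FPGA heterogeneous platform; 陈辰; 《计算机科学与探索》; 20190514; vol. 13 (No. 10); main text Sections 3 and 4 * |
Automated FPGA design method for throughput optimization of convolutional neural network accelerators; 陆维娜 et al.; 《计算机辅助设计与图形学学报》; 20181115 (No. 11); full text * |
FPGA hardware accelerator design for convolutional neural networks; 肖皓 et al.; 《工业控制计算机》; 20180625 (No. 06); full text * |
Method for constructing embedded FPGA convolutional neural networks for edge computing; 卢冶 et al.; 《计算机研究与发展》; 20180315 (No. 03); full text * |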
Also Published As
Publication number | Publication date |
---|---|
CN110555516A (zh) | 2019-12-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
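CN110555516B (zh) | Method for implementing a low-latency FPGA-based YOLOv2-tiny neural network hardware accelerator | |
CN109543830B (zh) | Split accumulator for convolutional neural network accelerators | |
CN111062472B (zh) | Sparse neural network accelerator based on structured pruning and acceleration method thereof | |
TWI684141B (zh) | Apparatus and method for accelerating multiplication with non-zero packets in artificial neurons | |
CN108229671B (zh) | System and method for reducing the external data storage bandwidth requirement of an accelerator | |
CN107633297B (zh) | Convolutional neural network hardware accelerator based on a parallel fast FIR filter algorithm | |
CN106846235B (zh) | Convolution optimization method and system accelerated with NVIDIA Kepler GPU assembly instructions | |
WO2019205617A1 (zh) | Calculation method and apparatus for matrix multiplication | |
CN110109646B (zh) | Data processing method and apparatus, multiply-adder, and storage medium | |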
EP3709225A1 (en) | System and method for efficient utilization of multipliers in neural-network computations | |
US11809836B2 (en) | Method and apparatus for data processing operation | |
Wong et al. | Low bitwidth CNN accelerator on FPGA using Winograd and block floating point arithmetic | |
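CN111582444A (zh) | Matrix data processing, apparatus, electronic device, and storage medium | |
CN116090518A (zh) | Feature map processing method and apparatus based on a systolic operation array, and storage medium | |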
Cao et al. | Efficient LUT-based FPGA accelerator design for universal quantized CNN inference | |
Li et al. | HAW: Hardware-aware point selection for efficient Winograd convolution | |
Solovyev et al. | Real-Time Recognition of Handwritten Digits in FPGA Based on Neural Network with Fixed Point Calculations | |
Sudrajat et al. | GEMM-Based Quantized Neural Network FPGA Accelerator Design | |
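CN110807479A (zh) | Neural network convolution calculation acceleration method based on the Kmeans algorithm | |
CN116151340B (zh) | Parallel stochastic computing neural network system and hardware compression method and system thereof | |
CN111797977B (zh) | Accelerator structure and loop unrolling method for binarized neural networks | |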
US20240069864A1 (en) | Hardware accelerator for floating-point operations | |
KR102726930B1 (ko) | Variable bit-precision multiply-accumulator structure for deep neural network operations | |
US20240134606A1 (en) | Device and method with in-memory computing | |
JP2019159670A (ja) | Arithmetic processing device implementing a multilayer convolutional neural network circuit that performs recognition processing using fixed point |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | TA01 | Transfer of patent application right | Effective date of registration: 20220929. Address after: Room 00036, 1st Floor, Building F5, Phase II, Innovation Industrial Park, No. 2800, Innovation Avenue, High tech Zone, Hefei, Anhui, 230088. Applicant after: Hefei Huixi Intelligent Technology Co.,Ltd. Address before: No. 803, Unit 2, Building 3, Nanlihan Lanting, Jingshu District, Beijing 100083. Applicant before: Xu Ningyi; He Guanghui. |
| | TA01 | Transfer of patent application right | Effective date of registration: 20220929. Address after: No. 803, Unit 2, Building 3, Nanlihan Lanting, Jingshu District, Beijing 100083. Applicant after: Xu Ningyi; He Guanghui. Address before: 200240 No. 800, Dongchuan Road, Shanghai, Minhang District. Applicant before: SHANGHAI JIAO TONG University. |
| | GR01 | Patent grant | |
| | TR01 | Transfer of patent right | Effective date of registration: 20231127. Address after: Room 202, No. 6, Lane 388, Urban Road, Minhang District, Shanghai, 201109. Patentee after: He Guanghui; Xu Ningyi. Address before: Room 00036, 1st Floor, Building F5, Phase II, Innovation Industrial Park, No. 2800, Innovation Avenue, High tech Zone, Hefei, Anhui, 230088. Patentee before: Hefei Huixi Intelligent Technology Co.,Ltd. |