As artificial intelligence technologies become increasingly integrated into daily life, the efficient implementation of neural networks on portable and affordable devices has emerged as a crucial research area. Owing to their high cost and energy consumption, CPUs and GPUs struggle to meet these demands. This study explored the implementation of a convolutional neural network on an FPGA using high-level synthesis (HLS). Although HLS automatically converts program code (e.g., in C) into a hardware description language (HDL) and significantly shortens the development cycle, the resulting implementation may still be far from optimal. The developer must therefore apply specific directives to exploit the various forms of parallelization appropriately, which proves non-trivial for more complex circuits. We proposed a metric for the efficiency of an optimization strategy, defined as the ratio between the achieved speedup and the resources consumed, which allows strategies to be evaluated for the individual layers of the convolutional neural network. Based on these evaluations, we used a linear programming model to select the combination of optimization strategies that improves the overall performance of the network. Our FPGA implementation achieved a runtime of 4 ms at a clock frequency of 50 MHz, outperforming a conventional processor operating at 3200 MHz, and demonstrated advantages in energy efficiency.
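The selection problem described above can be illustrated with a minimal sketch. All layer names, latency figures, and resource fractions below are hypothetical, and a brute-force search stands in for the paper's linear programming model; the sketch only shows the shape of the problem: compute a speedup-per-resource efficiency for each candidate strategy, then pick one strategy per layer to minimize total latency under a device resource budget.

```python
from itertools import product

# Hypothetical candidate optimization strategies per CNN layer.
# Each strategy is (latency_in_cycles, fraction_of_FPGA_resources).
# All numbers are illustrative, not measurements from the paper.
layers = {
    "conv1": [(1000, 0.05), (400, 0.15), (250, 0.40)],
    "conv2": [(2000, 0.08), (700, 0.25), (450, 0.55)],
    "fc":    [(500, 0.03), (300, 0.10)],
}

# The first (unoptimized) variant of each layer serves as the baseline.
baseline = {name: opts[0] for name, opts in layers.items()}

def efficiency(name, strat):
    """Speedup over the baseline divided by the extra resources used."""
    base_lat, base_res = baseline[name]
    lat, res = strat
    speedup = base_lat / lat
    extra = max(res - base_res, 1e-9)  # guard against division by zero
    return speedup / extra

# Example: efficiency of the second candidate for conv1.
eff_conv1 = efficiency("conv1", layers["conv1"][1])

# Exhaustive search standing in for the linear programming model:
# choose one strategy per layer, minimize total latency subject to
# the combined designs fitting within a resource budget.
BUDGET = 0.8
names = list(layers)
best = None
for combo in product(*(layers[n] for n in names)):
    if sum(res for _, res in combo) <= BUDGET:
        total = sum(lat for lat, _ in combo)
        if best is None or total < best[0]:
            best = (total, combo)

total_latency, choice = best
print(total_latency, choice)
```

In a real flow, each candidate strategy would correspond to a set of HLS directives (e.g., pipelining or unrolling) and the latency/resource pairs would come from synthesis reports; an off-the-shelf ILP solver replaces the exhaustive loop once the number of layers grows.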