FPGA: the future of deep learning?

The rapid growth in data volume and accessibility in recent years has transformed the design philosophy of artificial intelligence algorithms. Hand-crafted algorithms have given way to systems that computers learn automatically from large amounts of data, leading to major breakthroughs in key areas such as computer vision, speech recognition, and natural language processing. Deep learning is the most widely used technology in these fields and has attracted considerable attention from industry. However, deep learning models demand enormous amounts of data and computing power, and only better hardware acceleration can keep pace with the scale of today's data and models.

Existing solutions use clusters of graphics processing units (GPUs) as general-purpose computing graphics processing units (GPGPUs), but field-programmable gate arrays (FPGAs) offer an alternative worth exploring. Increasingly popular FPGA design tools are now compatible with the upper-layer software commonly used in deep learning, making FPGAs more accessible to those who build and deploy models. The flexible FPGA architecture lets researchers explore model optimizations beyond what a fixed architecture such as a GPU allows. At the same time, FPGAs deliver better performance per unit of power, which is critical both for large-scale server deployments and for research on resource-constrained embedded applications. This article examines deep learning and FPGAs from a hardware-acceleration perspective, identifies the trends and innovations that make these technologies a good match, and argues for how FPGAs can help advance deep learning.

1 Introduction

Machine learning has a profound impact on daily life. Whether clicking on personalized recommendations on a website, using voice assistants on a smartphone, or using face recognition when taking photos, some form of artificial intelligence is involved. This new wave of artificial intelligence is accompanied by a shift in algorithm-design philosophy. In the past, data-based machine learning mostly relied on domain expertise to hand-craft the "features" to be learned. The ability of computers to learn feature-extraction systems automatically from large amounts of sample data has enabled significant performance breakthroughs in key areas such as computer vision, speech recognition, and natural language processing. Research on these data-driven techniques, known as deep learning, now draws the attention of two important groups in the technology world: researchers who want to train and use these models to achieve very high performance across tasks, and application scientists who want to apply these models to new real-world problems. Both, however, face the same constraint: hardware acceleration capability must still be strengthened to meet the demands of scaling up today's data and algorithms.

For deep learning, current hardware acceleration relies primarily on using clusters of graphics processing units (GPUs) as general-purpose computing graphics processing units (GPGPUs). Compared to traditional general-purpose processors (GPPs), GPUs offer core computing power that is orders of magnitude higher and are far better suited to parallel computation. In particular, NVIDIA CUDA, the most mainstream GPGPU programming platform, is used by all major deep learning tools for GPU acceleration. Recently, the open parallel programming standard OpenCL has attracted attention as an alternative for heterogeneous hardware programming, and enthusiasm for these tools is rising. Although OpenCL enjoys somewhat less support than CUDA in the deep learning field, it has two unique features. First, OpenCL is an open, royalty-free standard, in contrast to CUDA's single-vendor approach. Second, OpenCL supports a range of hardware, including GPUs, GPPs, field-programmable gate arrays (FPGAs), and digital signal processors (DSPs).

1.1 FPGA

With GPUs so dominant in algorithm acceleration, OpenCL's out-of-the-box support for other hardware, FPGAs in particular, is especially important. FPGAs differ from GPUs in that their hardware configuration is flexible, and they can offer better performance per unit of power than GPUs when running the subroutines common in deep learning (such as sliding-window computations). However, configuring an FPGA requires hardware-specific knowledge that many researchers and application scientists lack, so FPGAs are often seen as an architecture reserved for experts. Recently, FPGA tools have begun to adopt software-level programming models, including OpenCL, making them increasingly accessible to users with mainstream software development backgrounds.
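The sliding-window computation mentioned above can be sketched in plain Python (illustrative only; on an FPGA, the inner multiply-accumulate loop is the part that would be mapped to a custom, deeply pipelined hardware datapath, which is where the per-watt advantage comes from):

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution via an explicit sliding window."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):              # slide the window vertically
        for x in range(ow):          # ...and horizontally
            acc = 0.0
            for ky in range(kh):     # multiply-accumulate inside the window
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out

if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
    k = [[1, 0],
         [0, 1]]  # sums the main diagonal of each 2x2 window
    print(conv2d(img, k))  # [[6.0, 8.0], [12.0, 14.0]]
```

The four nested loops are fully independent across output positions, which is exactly the kind of regular parallelism that both GPUs and FPGA fabrics exploit.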

For researchers evaluating a range of design tools, the selection criteria usually come down to whether the tools offer user-friendly software development environments, whether they support flexible and scalable model-design methods, and whether they can compute fast enough to shorten the training time of large models. As FPGAs become easier to program thanks to highly abstract design tools, their reconfigurability makes custom architectures possible, while their high degree of parallelism speeds up execution; together, these qualities will bring real advantages to deep learning researchers.
