FPGA devices have proven to be one of the most popular prototyping solutions, and nanoelectronics is one of the fields that prototypes its architectures on these devices. Many FPGA vendors have recently included embedded processors in their devices, such as Xilinx with ARM Cortex-A cores, together with programmable logic cells. These devices are known as Programmable Systems-on-Chip (PSoC). Their ARM cores (embedded in the processing system, or PS) communicate with the programmable logic cells (PL) using ARM-standard interface buses. ARM proposed the Advanced Microcontroller Bus Architecture (AMBA) as an open standard; its third generation included the Advanced eXtensible Interface (AXI) to reach higher performance. In this paper we analyse the performance of exhaustive data transfers between PS and PL for a Xilinx Zynq FPGA in a real HW/SW co-design scenario for a Convolutional Neural Network (CNN) accelerator. This CNN accelerator processes, in dedicated hardware, a stream of visual information from a neuromorphic visual sensor for classification. On the PS side, a Linux operating system collects visual events from the neuromorphic sensor into a normalized frame, transfers these frames to the multi-layered CNN accelerator, and reads back the results, using an AXI-DMA bus on a per-layer basis. As these kinds of accelerators try to process information as quickly as possible, data bandwidth becomes critical. Maintaining a well-balanced data throughput rate requires some considerations, such as data partitioning techniques to balance RX and TX transfers, and different transfer management techniques: polling versus a dedicated interrupt-based kernel-level driver. For sufficiently long packets, the kernel-level driver solution improves global computation timings in a CNN classification example. The kernel-level driver also provides a safer solution and enables OS task scheduling for better computation distribution.
Antonio Ríos-Navarro, Ricardo Tapiador-Morales, Ángel Jiménez-Fernández, C. Amaya, Manuel Jesus Dominguez Morales, Tobi Delbrück, Alejandro Linares-Barranco (2018). Performance evaluation over HW/SW co-design SoC memory transfers for a CNN accelerator. DOI: https://doi.org/10.1109/nano.2018.8626313.
Type: Article
Year: 2018
Authors: 7
Datasets: 0
Total Files: 0
Language: en
DOI: https://doi.org/10.1109/nano.2018.8626313