|Publication type:||Conference paper|
|Type of review:||Peer review (abstract)|
|Title:||Optimum use of resources in heterogeneous system architectures|
|Proceedings:||Embedded World Conference 2017 Proceedings|
|Conference details:||Embedded World Conference, Nuremberg, Germany, 14-16 March 2017|
|Subjects:||4k video processing; CUDA; System-on-Chip; GPU|
|Subject (DDC):||004: Computer science|
|Abstract:||Current high-performance embedded architectures offer architectural improvements over previous generations of embedded and desktop architectures. The most fundamental difference is the Heterogeneous System Architecture, which introduces many powerful hardware accelerator blocks such as a GPU or DSP. However, most application designers do not take the recent advantages of Heterogeneous System Architectures into account and therefore encounter performance problems. Heterogeneous System Architectures are suitable for processing large amounts of data, such as 4k video at 30 frames per second (fps). In many cases, software frameworks are used that are not optimized for current architectures (i.e. all processing is done on the CPU), and the GPU in such systems is merely used to drive the display output. This leads to designs that do not meet performance expectations. In video systems, this often shows as a severely reduced frame rate. These performance challenges can be solved by adapting the problem to the available system architecture, and the paper gives real-world examples. One solution to performance bottlenecks is to assess the different parts of the implementation according to their impact on performance. Where certain tasks can be reassigned to dedicated hardware blocks, valuable resources are freed. However, using dedicated hardware blocks (for video these can be scalers, encoders or overlays) requires additional communication, which may lead to an excessive number of memory transfers. Capacity bottlenecks manifest in video systems as a very slow video frame rate (<4 fps), whereas a frame rate of at least 30 fps is required. The paper describes use cases in which the GPU is used for tasks such as scaling and mixing video streams or general-purpose processing of large data sets. It also presents a short introduction to GPU architectures and advice on how to use the CUDA and OpenCL computing languages efficiently in high-performance embedded architectures.
Memory allocation methods such as Managed Memory, Zero Copy and Unified Virtual Addressing are compared and quantified for certain use cases. A real-life video processing example demonstrates how memory allocation techniques can improve performance by up to 40%. The main challenge in using current-generation high-performance embedded architectures lies in optimally distributing a complex problem across the available resources. In the cases at hand, it was observed that the GPU is capable of many more tasks than just driving the display, provided that memory allocation is carefully considered.|
|Fulltext version:||Published version|
|License (according to publishing contract):||Licence according to publishing contract|
|Department:||School of Engineering|
|Organisational Unit:||Institute of Embedded Systems (InES)|
|Appears in collections:||Publikationen School of Engineering|
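
To give a feel for the data volumes the abstract refers to, the following minimal Python sketch estimates the raw memory traffic of a 4k stream at 30 fps. It is an illustration only and is not taken from the paper: the 3840x2160 resolution, the two pixel formats (NV12 at 1.5 bytes per pixel, RGBA8 at 4 bytes per pixel) and the helper names `frame_bytes` and `stream_bandwidth` are all assumptions. The `transfers_per_frame` parameter models how each additional full-frame copy between buffers multiplies the traffic, which is the kind of cost that allocation schemes such as Zero Copy or Managed Memory aim to reduce.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return int(width * height * bytes_per_pixel)


def stream_bandwidth(width, height, bytes_per_pixel, fps, transfers_per_frame=1):
    """Bytes per second moved when every frame crosses memory
    `transfers_per_frame` times (hypothetical model: each explicit
    copy counts as one full-frame transfer)."""
    return frame_bytes(width, height, bytes_per_pixel) * fps * transfers_per_frame


# Assumed values, not stated in the abstract: 4k UHD resolution and two
# common pixel formats.
WIDTH, HEIGHT, FPS = 3840, 2160, 30
FORMATS = {"NV12": 1.5, "RGBA8": 4.0}

for name, bpp in FORMATS.items():
    for transfers in (1, 2, 3):
        mb_per_s = stream_bandwidth(WIDTH, HEIGHT, bpp, FPS, transfers) / 1e6
        print(f"{name:5s} x{transfers} transfer(s): {mb_per_s:8.1f} MB/s")
```

Under these assumptions a single RGBA8 stream already needs roughly 1 GB/s per pass over the data, so every avoidable copy between CPU and GPU buffers matters.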