Software-Based Real-Time Singles and Coincidence Processing in Positron Emission Tomography (PET) at the Institute for Experimental Molecular Imaging (ExMI)
Real-Time Singles and Coincidence Processing, Positron Emission Tomography
For the deployment of novel PET scanners developed in the context of the HYPMED project [1], the Institute for Experimental Molecular Imaging (ExMI) aims to use software-based approaches for its singles and coincidence processing chain [2,3]. In contrast to current hardware-based implementations, this allows for easier prototyping and the retrieval of raw scanner data, which would otherwise be lost. However, achieving real-time processing throughput becomes a challenge with the increasing data streams of the scanners and the growing complexity of state-of-the-art positioning algorithms [4].
The application framework developed at the ExMI uses the Message Passing Interface (MPI) to distribute the processing of incoming scanner data. While throughput rates above 500 MB/s can already be reached with the current implementation, scalability to larger detector systems is desirable. As the application data input is limited by the physical connection between the acquisition device and the control PC, a throughput rate of 1-2 GB/s is targeted.
For performance analysis, hardware performance counter information collected during program execution gave insight into the dynamic application behaviour. The collected metrics revealed high waiting times due to data translation lookaside buffer (TLB) misses: each data TLB entry maps a 4 KB page, and consecutive memory accesses frequently fell outside this range. We were therefore able to optimize the locality of memory accesses. By sorting a critical data structure before processing its entries, scalability and throughput rates could be improved significantly.
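The effect of sorting before processing can be illustrated with a minimal sketch. It is not the ExMI implementation: the `Hit` record, its fields, the lookup `table`, and both function names are assumptions made up for this example. The idea is only that a stable sort by the key used for table lookups turns scattered accesses into an ascending walk through memory, while entries with equal keys keep their original order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical event record; the field names are assumptions for illustration.
struct Hit {
    std::uint32_t channel;    // index into a large lookup table
    std::uint64_t timestamp;  // acquisition time in arbitrary ticks
};

// Reorder hits so that consecutive entries touch neighbouring table rows.
// std::stable_sort keeps the original order among hits of the same channel.
inline void sort_for_locality(std::vector<Hit>& hits) {
    std::stable_sort(hits.begin(), hits.end(),
                     [](const Hit& a, const Hit& b) { return a.channel < b.channel; });
}

// Sum one table value per hit. After sorting, the table is read in
// ascending index order instead of jumping between distant pages.
inline double accumulate_lookup(const std::vector<Hit>& hits,
                                const std::vector<double>& table) {
    double sum = 0.0;
    for (const Hit& h : hits) {
        assert(h.channel < table.size());  // guard against out-of-range keys
        sum += table[h.channel];
    }
    return sum;
}
```

Whether such a sort pays off depends on how large the table is relative to the TLB reach and on the cost of the sort itself, so it should be measured rather than assumed.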
The performance counter data additionally showed low vectorization ratios in the kernel function. Although the kernel loop used a structure-of-arrays memory layout, memory accesses proved to be irregular. Switching to an array of structures led to scalar code execution but had a positive effect on execution time.
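The difference between the two layouts can be sketched as follows. The types `PhotonsSoA` and `Photon`, their fields, and the index list `order` are hypothetical and only stand in for the actual kernel data. With irregular indices, the structure-of-arrays variant reads from three separate memory regions per entry, whereas the array-of-structures variant reads one contiguous record.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Structure of arrays: every field lives in its own array.
struct PhotonsSoA {
    std::vector<float> x, y, energy;
};

// Array of structures: all fields of one entry are adjacent in memory.
struct Photon {
    float x, y, energy;
};

// Irregular (indexed) access on the SoA layout: each iteration touches
// three distant locations, one per field array.
inline float weighted_sum_soa(const PhotonsSoA& p,
                              const std::vector<std::size_t>& order) {
    float sum = 0.0f;
    for (std::size_t i : order) {
        assert(i < p.x.size());
        sum += (p.x[i] + p.y[i]) * p.energy[i];
    }
    return sum;
}

// The same computation on the AoS layout: each iteration reads a single
// 12-byte record, so all fields usually share one cache line.
inline float weighted_sum_aos(const std::vector<Photon>& p,
                              const std::vector<std::size_t>& order) {
    float sum = 0.0f;
    for (std::size_t i : order) {
        assert(i < p.size());
        const Photon& e = p[i];
        sum += (e.x + e.y) * e.energy;
    }
    return sum;
}
```

A structure of arrays generally remains the better choice when the loop walks the data with unit stride and can be vectorized; the array of structures tends to help when the access pattern is irregular, as described above.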
The starting point of our performance evaluation was the original source code parallelized with MPI, compiled as a release build with the Intel 18.0 compiler and Intel MPI 2017.4. The stable-sort-timeline version contains optimizations aimed at improving data TLB locality, while vector-of-structs uses an array of structures to improve data locality within the kernel as described above.
The graphs below show the median throughput rate of 20 application runs on a single Intel Xeon Broadwell EP node with 24 cores, which is one node of the CLAIX2016 partition of the RWTH Compute Cluster. The stable-sort-timeline version speeds up execution by up to 97% compared to the initial release version. Additionally, parallel efficiency for 48 processes increases from 41% to 58%. Further runtime improvements by the vector-of-structs version add up to an overall doubling of the throughput rate over the initial release version, resulting in a peak throughput rate of 1,250 MB/s. Although this outcome satisfies the set goal of reaching a throughput rate above 1 GB/s, future collaboration efforts may focus on analyzing other critical processing steps to achieve further performance improvements.
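For readers less familiar with these metrics, the following sketch shows the usual textbook definitions of speedup and parallel efficiency expressed in terms of throughput rates. Whether the figures above were computed in exactly this way is an assumption; the function names are hypothetical, and the values in the usage note are invented for illustration and are not measurements from this project.

```cpp
#include <cassert>

// Speedup of a run with throughput `rate_new` over a run with throughput
// `rate_baseline` (both in the same unit, e.g. MB/s).
inline double speedup(double rate_baseline, double rate_new) {
    assert(rate_baseline > 0.0);
    return rate_new / rate_baseline;
}

// Parallel efficiency: speedup over the single-process run divided by the
// number of processes. A value of 1.0 corresponds to ideal linear scaling.
inline double parallel_efficiency(double rate_single, double rate_parallel,
                                  int processes) {
    assert(processes > 0);
    return speedup(rate_single, rate_parallel) / processes;
}
```

As a purely illustrative example, a single-process rate of 50 MB/s and a 48-process rate of 1200 MB/s would give a speedup of 24 and a parallel efficiency of 0.5.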
Lines of code: ~30,000
Programming languages & models: C++, MPI
Development & Cooperation
- Christian Wassermann, IT Center / Chair for High Performance Computing, RWTH Aachen University
- Thomas Dey, Physics of Molecular Imaging Systems, Institute for Experimental Molecular Imaging (ExMI), University Hospital Aachen
- Julian Miller, IT Center / Chair for High Performance Computing, RWTH Aachen University
[1] European Institute for Biomedical Imaging Research (Ed.). 2019. Digital Hybrid Breast PET/MRI for Enhanced Diagnosis of Breast Cancer (HYPMED). Retrieved May 20, 2019 from http://hypmed.eu
[2] Benjamin Goldschmidt, David Schug, Christoph W. Lerche, André Salomon, Pierre Gebhardt, Bjoern Weissler, Jakob Wehner, Peter M. Dueppenbecker, Fabian Kiessling, and Volkmar Schulz. 2016. Software-Based Real-Time Acquisition and Processing of PET Detector Raw Data. IEEE Transactions on Biomedical Engineering 63, 2 (2016), 316–327. DOI: https://doi.org/10.1109/TBME.2015.2456640
[3] B. Goldschmidt, C. W. Lerche, T. Solf, A. Salomon, F. Kiessling, and V. Schulz. 2013. Towards Software-Based Real-Time Singles and Coincidence Processing of Digital PET Detector Raw Data. IEEE Transactions on Nuclear Science 60, 3 (2013), 1550–1559. DOI: https://doi.org/10.1109/TNS.2013.2252193
[4] Christoph W. Lerche, André Salomon, Benjamin Goldschmidt, Sarah Lodomez, Björn Weissler, and Torsten Solf. 2016. Maximum Likelihood Positioning and Energy Correction for Scintillation Detectors. Physics in Medicine and Biology 61, 4 (2016), 1650–1676. DOI: https://doi.org/10.1088/0031-9155/61/4/1650