1.1 Goal of this paper
It is necessary to reduce cache misses and branch misses for increasing the effect of acceralating database operations using SIMD. The author aims to acceralate database operations which has irregular memory access by using SIMD. Hash join which search big hash table which data size is larger than CPU cache size is a main example in this paper.
Prefetching is known as effective technique for reducing cache misses and branch misses. There are software prefetching techniques such as AMAC(asynchronous memory access chaining) without SIMD. But if we combine these techniques and SIMD such a (1) and (2) we cannot reduce cache misses and branch misses.
(1) Directly vectorized AMAC probe
This approach simply execute AMAC in each lane using SIMD. There is a difference of necessary instructions between lanes matching tuple key and keys in bucket executing hash join. So in this approach there is a idling lane.
(2) Fully vectorized AMAC probe
This approach firstly execute AMAC in each lane using SIMD. If a stage end process for a tuple then new tuple in the lane immediately. In this approach there are no idling lanes. But in this approach there are redundance because of multiple execution of residual operations such as creating a hash value.
The authors propose Interleaved Multi-Vectorizing(IMV). IMV interleaves residual operations to reduce the redundancy which exists in Fully vectorized AMAC probe. The experimental shows IMV achieves 3.17X better performance compared with the pure SIMD.
The author of this paper presented the detail in VLDB2020 “Micro-architectures & Flash Storage” session.
 Z. Fang, B. Zheng, C. Weng, Interleaved Multi-Vectorizing, PVLDB2020.