Interleaving Bits with x86 SIMD instructions (SSE)


Author: Dominik Auras
Type: C Code Snippets
The first snippet requires about 1.24 cycles per output byte, and the second about 1.83 cycles per output byte (measured on my Core 2 Duo).


This work is licensed under the terms of the GPL.

Compile flags

-msse2 -O2 -mtune=native -march=native -flax-vector-conversions

Code Snippet 1

Code Snippet 2

Reference Code
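
For orientation, here is a minimal sketch of a scalar reference and an SSE2 counterpart. It assumes that "interleaving bits" means merging the bits of two input bytes alternately into one 16-bit word (bit i of the first byte goes to bit 2i, bit i of the second byte to bit 2i+1); the snippets on this page may define the operation differently. The names interleave_ref, spread_bits_epi16 and interleave_bytes are hypothetical, this is not the author's code, and the cycle counts quoted above do not apply to it.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Scalar reference (assumed definition of interleaving):
 * bit i of a -> bit 2i, bit i of b -> bit 2i+1. */
static uint16_t interleave_ref(uint8_t a, uint8_t b)
{
    uint16_t out = 0;
    for (int i = 0; i < 8; i++) {
        out |= (uint16_t)(((a >> i) & 1u) << (2 * i));
        out |= (uint16_t)(((b >> i) & 1u) << (2 * i + 1));
    }
    return out;
}

#if defined(__SSE2__)
/* Spread the low 8 bits of every 16-bit lane to the even bit positions,
 * using the usual shift-and-mask steps (4, 2, then 1 bit). */
static __m128i spread_bits_epi16(__m128i x)
{
    x = _mm_and_si128(_mm_or_si128(x, _mm_slli_epi16(x, 4)), _mm_set1_epi16(0x0F0F));
    x = _mm_and_si128(_mm_or_si128(x, _mm_slli_epi16(x, 2)), _mm_set1_epi16(0x3333));
    x = _mm_and_si128(_mm_or_si128(x, _mm_slli_epi16(x, 1)), _mm_set1_epi16(0x5555));
    return x;
}
#endif

/* Interleave n byte pairs into n 16-bit words.
 * With SSE2 available, 8 pairs are handled per iteration;
 * the remainder (or everything, without SSE2) uses the scalar reference. */
static void interleave_bytes(const uint8_t *a, const uint8_t *b,
                             uint16_t *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    for (; i + 8 <= n; i += 8) {
        /* Load 8 bytes from each input and widen them to 16-bit lanes. */
        __m128i va = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)(a + i)), zero);
        __m128i vb = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)(b + i)), zero);
        /* a occupies the even bits, b is shifted onto the odd bits. */
        __m128i r = _mm_or_si128(spread_bits_epi16(va),
                                 _mm_slli_epi16(spread_bits_epi16(vb), 1));
        _mm_storeu_si128((__m128i *)(out + i), r);
    }
#endif
    for (; i < n; i++)
        out[i] = interleave_ref(a[i], b[i]);
}
```

The sketch uses only SSE2 intrinsics, so of the compile flags listed above -msse2 -O2 should be enough to build it with GCC; -flax-vector-conversions is not needed here because no implicit vector casts occur.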