Module with Intel CPU Hardware intrinsics.
To use the functions provided by this module, import it first:
>>> import HardwareIntrinsics
Depending on the instruction sets your CPU supports (e.g. SSE3...), this module imports only the intrinsic functions available on your CPU.
Take the mm_or_ps intrinsic as an example. It performs a bitwise OR on two 128-bit vectors, for instance two vectors of 4 x int32:
>>> help mm_or_ps
# mm_or_ps
#
# Compute the bitwise OR of packed single-precision (32-bit) floating-point
# elements in "a" and "b", and store the results in "dst".
#
# __m128 _mm_or_ps (__m128 a, __m128 b)
# ORPS xmm, xmm/m128
The __m128 type in the Intel documentation shows that the function expects a 128-bit vector.
In kalk, a 128-bit vector is represented by types such as int4, float4 or byte16 (and more). You can pass these vectors directly to mm_or_ps:
>>> mm_or_ps(int4(1,2,3,4), int4(5,6,7,8))
# mm_or_ps(int4(1, 2, 3, 4), int4(5, 6, 7, 8))
out = float4(7E-45, 8E-45, 1E-44, 1.7E-44)
Note that the result is returned as a float4. This is because __m128 values are converted to float4 by default, so the integer bit patterns are displayed as very small floating-point numbers.
You can easily bitcast the result to int4:
>>> bitcast(int4, out)
# bitcast(int4, out)
out = int4(5, 6, 7, 12)
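To see why the float4 result looks so small while the bitcast recovers 5, 6, 7, 12, the bit-level behaviour can be emulated outside kalk. The following is a minimal Python sketch, not part of kalk: `or_ps_bits` is a hypothetical helper name, and the little-endian lane layout is an assumption matching x86.

```python
import struct

def or_ps_bits(a, b):
    """Emulate the bit-level effect of ORPS on two lists of four int32 lanes.

    Hypothetical helper for illustration only; it is not a kalk function.
    """
    # OR each 32-bit lane, keeping only the low 32 bits.
    ored = [(x | y) & 0xFFFFFFFF for x, y in zip(a, b)]
    # Pack the four lanes into 16 raw bytes (little-endian, as on x86).
    raw = struct.pack("<4I", *ored)
    # The same 16 bytes viewed as four floats and as four signed ints.
    as_float = struct.unpack("<4f", raw)
    as_int = struct.unpack("<4i", raw)
    return as_float, as_int

floats, ints = or_ps_bits([1, 2, 3, 4], [5, 6, 7, 8])
print(ints)    # the integer view of the result
print(floats)  # the float view: subnormal values close to zero
```

The integer view holds the OR of each lane, while the float view interprets those same small bit patterns as subnormal single-precision numbers, which is what the float4 output above displays.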
Here is a similar example with mm_blend_ps:
>>> bitcast(int4, mm_blend_ps(int4(1,2,3,4), int4(5,6,7,8), 0b1010))
# bitcast(int4, mm_blend_ps(int4(1, 2, 3, 4), int4(5, 6, 7, 8), 10))
out = int4(1, 6, 3, 8)
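The mask 0b1010 selects lanes 1 and 3 from the second vector and lanes 0 and 2 from the first. A minimal Python sketch of that selection rule follows; `blend_ps` is a hypothetical name used only for illustration, and the sketch works on plain lists rather than real 128-bit registers.

```python
def blend_ps(a, b, imm8):
    """Emulate the lane selection of BLENDPS on two 4-lane lists.

    Hypothetical helper for illustration only; it is not a kalk function.
    Lane j is taken from b when bit j of imm8 is set, otherwise from a.
    """
    return [b[j] if (imm8 >> j) & 1 else a[j] for j in range(4)]

result = blend_ps([1, 2, 3, 4], [5, 6, 7, 8], 0b1010)
print(result)
```

Bit 0 of the mask corresponds to the first lane, so reading the binary literal right to left gives the lane order.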
Or, for example, using mm_extract_epi8 to read a single byte of the vector:
>>> mm_extract_epi8(int4(1,2,3,4), 0)
# mm_extract_epi8(int4(1, 2, 3, 4), 0)
out = 1
>>> mm_extract_epi8(int4(1,2,3,4), 4)
# mm_extract_epi8(int4(1, 2, 3, 4), 4)
out = 2
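The index passed to mm_extract_epi8 counts bytes, not int32 lanes, which is why index 4 returns the low byte of the second integer. A minimal Python sketch of this byte addressing is shown below; `extract_epi8` is a hypothetical name, and the little-endian layout is an assumption matching x86.

```python
import struct

def extract_epi8(lanes, index):
    """Emulate PEXTRB on a vector given as four int32 lanes.

    Hypothetical helper for illustration only; it is not a kalk function.
    """
    # Lay the four int32 lanes out as 16 little-endian bytes.
    raw = struct.pack("<4i", *lanes)
    # Only the low 4 bits of the index are used to pick one of 16 bytes.
    return raw[index & 0xF]

first = extract_epi8([1, 2, 3, 4], 0)   # low byte of lane 0
second = extract_epi8([1, 2, 3, 4], 4)  # low byte of lane 1
upper = extract_epi8([1, 2, 3, 4], 1)   # second byte of lane 0
print(first, second, upper)
```

Indices 1 to 3 address the upper bytes of the first integer, which are zero for small values such as 1.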