Module with Intel CPU Hardware intrinsics.
In order to use the functions provided by this module, you need to import this module:
>>> import HardwareIntrinsics
Depending on the characteristics of your CPU (e.g AVX2
, SSE3
...), this module will import only the intrinsic functions supported by your CPU.
Let's take the example of the instruction mm_or_ps
which allows to perform a OR of 4 x int32:
>>> help mm_or_ps
# mm_or_ps
#
# Compute the bitwise OR of packed single-precision (32-bit) floating-point
# elements in "a" and "b", and store the results in "dst".
#
# __m128 _mm_or_ps (__m128 a, __m128 b)
# ORPS xmm, xmm/m128
The arguments __m128
in the Intel documentation show that the function is expecting a 128-bit vector.
In kalk
, a 128-bit vector is represented by e.g bool4
, or int4
or float4
or byte16
(and more). You can pass these vectors directly to mm_or_ps
:
>>> mm_or_ps(int4(1,2,3,4), int4(5,6,7,8))
# mm_or_ps(int4(1, 2, 3, 4), int4(5, 6, 7, 8))
out = float4(7E-45, 8E-45, 1E-44, 1.7E-44)
Note that the result is returned using float4
. This is because m128
types is by default converted to a float4
in kalk
.
But you can easily bitcast it to int4:
>>> bitcast(int4, out)
# bitcast(int4, out)
out = int4(5, 6, 7, 12)
Let's use a similar example with mm_blend_ps
:
>>> bitcast(int4, mm_blend_ps(int4(1,2,3,4), int4(5,6,7,8), 0b1010))
# bitcast(int4, mm_blend_ps(int4(1, 2, 3, 4), int4(5, 6, 7, 8), 10))
out = int4(1, 6, 3, 8)
Or for example using mm_extract_epi8
:
>>> mm_extract_epi8(int4(1,2,3,4), 0)
# mm_extract_epi8(int4(1, 2, 3, 4), 0)
out = 1
>>> mm_extract_epi8(int4(1,2,3,4), 4)
# mm_extract_epi8(int4(1, 2, 3, 4), 4)
out = 2