Intel AVX Intrinsics

To use the functions in this module, import it first:

>>> import HardwareIntrinsics

These intrinsic functions are available only if your CPU supports the AVX instruction set.

mm256_add_pd

mm256_add_pd

Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

__m256d _mm256_add_pd (__m256d a, __m256d b) VADDPD ymm, ymm, ymm/m256

mm256_add_ps

mm256_add_ps

Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

__m256 _mm256_add_ps (__m256 a, __m256 b) VADDPS ymm, ymm, ymm/m256

mm256_addsub_pd

mm256_addsub_pd

Alternately subtract and add packed double-precision (64-bit) floating-point elements in "b" from/to packed elements in "a" (subtract in even-indexed elements, add in odd-indexed elements), and store the results in "dst".

__m256d _mm256_addsub_pd (__m256d a, __m256d b) VADDSUBPD ymm, ymm, ymm/m256

mm256_addsub_ps

mm256_addsub_ps

Alternately subtract and add packed single-precision (32-bit) floating-point elements in "b" from/to packed elements in "a" (subtract in even-indexed elements, add in odd-indexed elements), and store the results in "dst".

__m256 _mm256_addsub_ps (__m256 a, __m256 b) VADDSUBPS ymm, ymm, ymm/m256
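The lane pattern of the addsub entries above can be sketched in pure Python. This is only a model of the arithmetic, not a call into HardwareIntrinsics, and the helper name `addsub` is hypothetical.

```python
# Hypothetical helper: models the addsub lane pattern only.
def addsub(a, b):
    """Subtract in even-indexed lanes, add in odd-indexed lanes."""
    return [x - y if i % 2 == 0 else x + y
            for i, (x, y) in enumerate(zip(a, b))]

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
result = addsub(a, b)  # [-9.0, 22.0, -27.0, 44.0]
```

The same function models the single-precision variant when given eight elements per operand.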

mm256_and_pd

mm256_and_pd

Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

__m256d _mm256_and_pd (__m256d a, __m256d b) VANDPD ymm, ymm, ymm/m256

mm256_and_ps

mm256_and_ps

Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

__m256 _mm256_and_ps (__m256 a, __m256 b) VANDPS ymm, ymm, ymm/m256

mm256_andnot_pd

mm256_andnot_pd

Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".

__m256d _mm256_andnot_pd (__m256d a, __m256d b) VANDNPD ymm, ymm, ymm/m256

mm256_andnot_ps

mm256_andnot_ps

Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".

__m256 _mm256_andnot_ps (__m256 a, __m256 b) VANDNPS ymm, ymm, ymm/m256
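Note that the NOT applies to "a", not to the result. A minimal pure-Python sketch of the double-precision case follows, using `struct` to reinterpret each double as its 64-bit pattern. The helper names are hypothetical and the code does not call HardwareIntrinsics. With "a" set to -0.0 (only the sign bit set), the operation clears the sign bit of "b", which is one common way to compute an absolute value.

```python
import struct

def to_bits(x):
    """Reinterpret a double as its 64-bit integer pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(u):
    """Reinterpret a 64-bit integer pattern as a double."""
    return struct.unpack("<d", struct.pack("<Q", u))[0]

# Hypothetical helper: models (NOT a) AND b per 64-bit element.
def andnot_pd(a, b):
    all_ones = (1 << 64) - 1
    return [from_bits((~to_bits(x) & all_ones) & to_bits(y))
            for x, y in zip(a, b)]

sign_mask = [-0.0] * 4                 # only the sign bit is set
values = [-1.5, 2.0, -0.0, -3.25]
absolute = andnot_pd(sign_mask, values)  # [1.5, 2.0, 0.0, 3.25]
```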

mm256_blend_pd

mm256_blend_pd

Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst".

__m256d _mm256_blend_pd (__m256d a, __m256d b, const int imm8) VBLENDPD ymm, ymm, ymm/m256, imm8

mm256_blend_ps

mm256_blend_ps

Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst".

__m256 _mm256_blend_ps (__m256 a, __m256 b, const int imm8) VBLENDPS ymm, ymm, ymm/m256, imm8
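For the immediate-controlled blends above, bit i of "imm8" picks the source of element i: 0 takes the element from "a", 1 takes it from "b". A small pure-Python model (the name `blend` is hypothetical; this does not call HardwareIntrinsics):

```python
# Hypothetical helper: models the imm8-controlled selection only.
def blend(a, b, imm8):
    return [y if (imm8 >> i) & 1 else x
            for i, (x, y) in enumerate(zip(a, b))]

a = [0.0, 1.0, 2.0, 3.0]
b = [10.0, 11.0, 12.0, 13.0]
blended = blend(a, b, 0b0101)  # [10.0, 1.0, 12.0, 3.0]
```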

mm256_blendv_pd

mm256_blendv_pd

Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst".

__m256d _mm256_blendv_pd (__m256d a, __m256d b, __m256d mask) VBLENDVPD ymm, ymm, ymm/m256, ymm

mm256_blendv_ps

mm256_blendv_ps

Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst".

__m256 _mm256_blendv_ps (__m256 a, __m256 b, __m256 mask) VBLENDVPS ymm, ymm, ymm/m256, ymm
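The variable blends above look only at the most significant (sign) bit of each "mask" element: when it is set, the element comes from "b", otherwise from "a". A pure-Python sketch under that reading (the name `blendv` is hypothetical; `math.copysign` is used so that -0.0 counts as having its sign bit set):

```python
import math

# Hypothetical helper: models selection by the sign bit of each mask element.
def blendv(a, b, mask):
    return [y if math.copysign(1.0, m) < 0 else x
            for x, y, m in zip(a, b, mask)]

a = [0.0, 1.0, 2.0, 3.0]
b = [10.0, 11.0, 12.0, 13.0]
mask = [-1.0, 0.0, -0.0, 5.0]
selected = blendv(a, b, mask)  # [10.0, 1.0, 12.0, 3.0]
```

In practice the mask usually comes from a comparison, whose results are all-ones or all-zeros per element.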

mm256_broadcast_pd

mm256_broadcast_pd

Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of "dst".

__m256d _mm256_broadcast_pd (__m128d const * mem_addr) VBROADCASTF128 ymm, m128

mm256_broadcast_ps

mm256_broadcast_ps

Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of "dst".

__m256 _mm256_broadcast_ps (__m128 const * mem_addr) VBROADCASTF128 ymm, m128

mm256_broadcast_sd

mm256_broadcast_sd

Broadcast a double-precision (64-bit) floating-point element from memory to all elements of "dst".

__m256d _mm256_broadcast_sd (double const * mem_addr) VBROADCASTSD ymm, m64

mm256_broadcast_ss

mm256_broadcast_ss

Broadcast a single-precision (32-bit) floating-point element from memory to all elements of "dst".

__m256 _mm256_broadcast_ss (float const * mem_addr) VBROADCASTSS ymm, m32
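The two kinds of broadcast above differ in what gets repeated: a whole 128-bit block, or a single element. A pure-Python model with hypothetical helper names (it does not call HardwareIntrinsics):

```python
# Hypothetical helpers: model the resulting element layout only.
def broadcast_128(block):
    """Repeat a 128-bit block (2 doubles or 4 floats) in both halves."""
    return list(block) + list(block)

def broadcast_element(value, lanes):
    """Repeat one element in every lane (4 doubles or 8 floats)."""
    return [value] * lanes

from_pd = broadcast_128([1.0, 2.0])    # [1.0, 2.0, 1.0, 2.0]
from_sd = broadcast_element(7.0, 4)    # [7.0, 7.0, 7.0, 7.0]
```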

mm256_ceil_pd

mm256_ceil_pd

Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst".

__m256d _mm256_ceil_pd (__m256d a) VROUNDPD ymm, ymm/m256, imm8(10)

mm256_ceil_ps

mm256_ceil_ps

Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst".

__m256 _mm256_ceil_ps (__m256 a) VROUNDPS ymm, ymm/m256, imm8(10)

mm256_cmp_pd

mm256_cmp_pd

Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in "dst".

__m256d _mm256_cmp_pd (__m256d a, __m256d b, const int imm8) VCMPPD ymm, ymm, ymm/m256, imm8

mm256_cmp_ps

mm256_cmp_ps

Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in "dst".

__m256 _mm256_cmp_ps (__m256 a, __m256 b, const int imm8) VCMPPS ymm, ymm, ymm/m256, imm8
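Each result element of a comparison is a mask: all bits set when the predicate holds, all bits clear otherwise. The sketch below is a pure-Python model of the double-precision case for a few "imm8" values. It does not call HardwareIntrinsics, the names `cmp_pd` and `PREDICATES` are hypothetical, and the predicate numbers follow Intel's _CMP_* constants.

```python
import math

ALL_ONES = (1 << 64) - 1  # mask value for a 64-bit element

# Subset of the predicates, keyed by imm8. Python comparisons involving
# NaN are False (and != is True), which matches these predicates.
PREDICATES = {
    0: lambda x, y: x == y,                         # _CMP_EQ_OQ
    1: lambda x, y: x < y,                          # _CMP_LT_OS
    2: lambda x, y: x <= y,                         # _CMP_LE_OS
    3: lambda x, y: math.isnan(x) or math.isnan(y), # _CMP_UNORD_Q
    4: lambda x, y: x != y,                         # _CMP_NEQ_UQ
}

# Hypothetical helper: returns the per-element masks as integers.
def cmp_pd(a, b, imm8):
    predicate = PREDICATES[imm8]
    return [ALL_ONES if predicate(x, y) else 0 for x, y in zip(a, b)]

nan = float("nan")
a = [1.0, 2.0, nan, 4.0]
b = [1.0, 3.0, 3.0, nan]
equal = cmp_pd(a, b, 0)      # [ALL_ONES, 0, 0, 0]
less = cmp_pd(a, b, 1)       # [0, ALL_ONES, 0, 0]
unordered = cmp_pd(a, b, 3)  # [0, 0, ALL_ONES, ALL_ONES]
```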

mm256_cmpeq_pd

mm256_cmpeq_pd

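Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst".

__m256d _mm256_cmpeq_pd (__m256d a, __m256d b) VCMPPD ymm, ymm, ymm/m256, imm8(0)

The above native signature does not exist. We provide this additional overload for completeness.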

mm256_cmpeq_ps

mm256_cmpeq_ps

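Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst".

__m256 _mm256_cmpeq_ps (__m256 a, __m256 b) VCMPPS ymm, ymm, ymm/m256, imm8(0)

The above native signature does not exist. We provide this additional overload for completeness.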

mm256_cmpge_pd

mm256_cmpge_pd

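Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst".

__m256d _mm256_cmpge_pd (__m256d a, __m256d b) VCMPPD ymm, ymm, ymm/m256, imm8(13)

The above native signature does not exist. We provide this additional overload for completeness.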

mm256_cmpge_ps

mm256_cmpge_ps

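Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst".

__m256 _mm256_cmpge_ps (__m256 a, __m256 b) VCMPPS ymm, ymm, ymm/m256, imm8(13)

The above native signature does not exist. We provide this additional overload for completeness.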

mm256_cmpgt_pd

mm256_cmpgt_pd

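Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst".

__m256d _mm256_cmpgt_pd (__m256d a, __m256d b) VCMPPD ymm, ymm, ymm/m256, imm8(14)

The above native signature does not exist. We provide this additional overload for completeness.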

mm256_cmpgt_ps

mm256_cmpgt_ps

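Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst".

__m256 _mm256_cmpgt_ps (__m256 a, __m256 b) VCMPPS ymm, ymm, ymm/m256, imm8(14)

The above native signature does not exist. We provide this additional overload for completeness.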

mm256_cmple_pd

mm256_cmple_pd

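Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst".

__m256d _mm256_cmple_pd (__m256d a, __m256d b) VCMPPD ymm, ymm, ymm/m256, imm8(2)

The above native signature does not exist. We provide this additional overload for completeness.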

mm256_cmple_ps

mm256_cmple_ps

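Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst".

__m256 _mm256_cmple_ps (__m256 a, __m256 b) VCMPPS ymm, ymm, ymm/m256, imm8(2)

The above native signature does not exist. We provide this additional overload for completeness.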

mm256_cmplt_pd

mm256_cmplt_pd

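Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst".

__m256d _mm256_cmplt_pd (__m256d a, __m256d b) VCMPPD ymm, ymm, ymm/m256, imm8(1)

The above native signature does not exist. We provide this additional overload for completeness.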

mm256_cmplt_ps

mm256_cmplt_ps

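Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst".

__m256 _mm256_cmplt_ps (__m256 a, __m256 b) VCMPPS ymm, ymm, ymm/m256, imm8(1)

The above native signature does not exist. We provide this additional overload for completeness.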

mm256_cmpneq_pd

mm256_cmpneq_pd

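Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst".

__m256d _mm256_cmpneq_pd (__m256d a, __m256d b) VCMPPD ymm, ymm, ymm/m256, imm8(4)

The above native signature does not exist. We provide this additional overload for completeness.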

mm256_cmpneq_ps

mm256_cmpneq_ps

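Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst".

__m256 _mm256_cmpneq_ps (__m256 a, __m256 b) VCMPPS ymm, ymm, ymm/m256, imm8(4)

The above native signature does not exist. We provide this additional overload for completeness.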

mm256_cmpnge_pd

mm256_cmpnge_pd

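Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst".

__m256d _mm256_cmpnge_pd (__m256d a, __m256d b) VCMPPD ymm, ymm, ymm/m256, imm8(9)

The above native signature does not exist. We provide this additional overload for completeness.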

mm256_cmpnge_ps

mm256_cmpnge_ps

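Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst".

__m256 _mm256_cmpnge_ps (__m256 a, __m256 b) VCMPPS ymm, ymm, ymm/m256, imm8(9)

The above native signature does not exist. We provide this additional overload for completeness.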

mm256_cmpngt_pd

mm256_cmpngt_pd

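Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst".

__m256d _mm256_cmpngt_pd (__m256d a, __m256d b) VCMPPD ymm, ymm, ymm/m256, imm8(10)

The above native signature does not exist. We provide this additional overload for completeness.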

mm256_cmpngt_ps

mm256_cmpngt_ps

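Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst".

__m256 _mm256_cmpngt_ps (__m256 a, __m256 b) VCMPPS ymm, ymm, ymm/m256, imm8(10)

The above native signature does not exist. We provide this additional overload for completeness.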

mm256_cmpnle_pd

mm256_cmpnle_pd

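Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst".

__m256d _mm256_cmpnle_pd (__m256d a, __m256d b) VCMPPD ymm, ymm, ymm/m256, imm8(6)

The above native signature does not exist. We provide this additional overload for completeness.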

mm256_cmpnle_ps

mm256_cmpnle_ps

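Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst".

__m256 _mm256_cmpnle_ps (__m256 a, __m256 b) VCMPPS ymm, ymm, ymm/m256, imm8(6)

The above native signature does not exist. We provide this additional overload for completeness.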

mm256_cmpnlt_pd

mm256_cmpnlt_pd

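Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst".

__m256d _mm256_cmpnlt_pd (__m256d a, __m256d b) VCMPPD ymm, ymm, ymm/m256, imm8(5)

The above native signature does not exist. We provide this additional overload for completeness.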

mm256_cmpnlt_ps

mm256_cmpnlt_ps

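Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst".

__m256 _mm256_cmpnlt_ps (__m256 a, __m256 b) VCMPPS ymm, ymm, ymm/m256, imm8(5)

The above native signature does not exist. We provide this additional overload for completeness.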

mm256_cmpord_pd

mm256_cmpord_pd

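Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst".

__m256d _mm256_cmpord_pd (__m256d a, __m256d b) VCMPPD ymm, ymm, ymm/m256, imm8(7)

The above native signature does not exist. We provide this additional overload for completeness.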

mm256_cmpord_ps

mm256_cmpord_ps

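Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst".

__m256 _mm256_cmpord_ps (__m256 a, __m256 b) VCMPPS ymm, ymm, ymm/m256, imm8(7)

The above native signature does not exist. We provide this additional overload for completeness.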

mm256_cmpunord_pd

mm256_cmpunord_pd

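Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst".

__m256d _mm256_cmpunord_pd (__m256d a, __m256d b) VCMPPD ymm, ymm, ymm/m256, imm8(3)

The above native signature does not exist. We provide this additional overload for completeness.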

mm256_cmpunord_ps

mm256_cmpunord_ps

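Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst".

__m256 _mm256_cmpunord_ps (__m256 a, __m256 b) VCMPPS ymm, ymm, ymm/m256, imm8(3)

The above native signature does not exist. We provide this additional overload for completeness.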
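The negated predicates are not redundant with their positive counterparts: they differ when an operand is NaN. For example, cmpge (imm8 13) is false for a NaN operand, while cmpnlt (imm8 5) is true. A pure-Python illustration of the two predicates on single values (the function names are hypothetical and nothing here calls HardwareIntrinsics):

```python
nan = float("nan")

def ge(x, y):
    """imm8(13): greater-than-or-equal, false if either operand is NaN."""
    return x >= y

def nlt(x, y):
    """imm8(5): not-less-than, true if either operand is NaN."""
    return not (x < y)

pairs = [(2.0, 1.0), (1.0, 2.0), (nan, 1.0)]
ge_results = [ge(x, y) for x, y in pairs]    # [True, False, False]
nlt_results = [nlt(x, y) for x, y in pairs]  # [True, False, True]
```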

mm256_cvtepi32_pd

mm256_cvtepi32_pd

Convert packed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

__m256d _mm256_cvtepi32_pd (__m128i a) VCVTDQ2PD ymm, xmm/m128

mm256_cvtepi32_ps

mm256_cvtepi32_ps

Convert packed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

__m256 _mm256_cvtepi32_ps (__m256i a) VCVTDQ2PS ymm, ymm/m256

mm256_cvtpd_epi32

mm256_cvtpd_epi32

Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

__m128i _mm256_cvtpd_epi32 (__m256d a) VCVTPD2DQ xmm, ymm/m256

mm256_cvtpd_ps

mm256_cvtpd_ps

Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

__m128 _mm256_cvtpd_ps (__m256d a) VCVTPD2PS xmm, ymm/m256

mm256_cvtps_epi32

mm256_cvtps_epi32

Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

__m256i _mm256_cvtps_epi32 (__m256 a) VCVTPS2DQ ymm, ymm/m256

mm256_cvtps_pd

mm256_cvtps_pd

Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

__m256d _mm256_cvtps_pd (__m128 a) VCVTPS2PD ymm, xmm/m128

mm256_cvttpd_epi32

mm256_cvttpd_epi32

Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

__m128i _mm256_cvttpd_epi32 (__m256d a) VCVTTPD2DQ xmm, ymm/m256

mm256_cvttps_epi32

mm256_cvttps_epi32

Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

__m256i _mm256_cvttps_epi32 (__m256 a) VCVTTPS2DQ ymm, ymm/m256
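The cvt and cvtt conversions above differ only on values with a fractional part. The cvt forms round according to the current rounding mode, while the cvtt forms always truncate toward zero. A minimal pure-Python model, assuming the default rounding mode (round to nearest, ties to even) and inputs that fit in a signed 32-bit integer. The helper names are hypothetical.

```python
import math

# Hypothetical helpers: model the two float-to-integer conversions.
def convert_rounding(values):
    # Python's round() also rounds ties to even.
    return [round(v) for v in values]

def convert_truncating(values):
    return [math.trunc(v) for v in values]

values = [2.5, 3.5, -2.5, 1.7]
rounded = convert_rounding(values)      # [2, 4, -2, 2]
truncated = convert_truncating(values)  # [2, 3, -2, 1]
```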

mm256_div_pd

mm256_div_pd

Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".

__m256d _mm256_div_pd (__m256d a, __m256d b) VDIVPD ymm, ymm, ymm/m256

mm256_div_ps

mm256_div_ps

Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".

__m256 _mm256_div_ps (__m256 a, __m256 b) VDIVPS ymm, ymm, ymm/m256

mm256_dp_ps

mm256_dp_ps

Conditionally multiply the packed single-precision (32-bit) floating-point elements in "a" and "b" using the high 4 bits in "imm8", sum the four products within each 128-bit lane, and conditionally store the sum in "dst" using the low 4 bits of "imm8".

__m256 _mm256_dp_ps (__m256 a, __m256 b, const int imm8) VDPPS ymm, ymm, ymm/m256, imm8
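The dot product is computed independently in each 128-bit lane (elements 0 to 3 and elements 4 to 7), with the same "imm8" applied to both. A pure-Python sketch of that behavior follows. The name `dp_ps` here is a hypothetical model, not the module function, and the summation order of the hardware is not reproduced.

```python
# Hypothetical model of the per-lane conditional dot product.
def dp_ps(a, b, imm8):
    dst = []
    for base in (0, 4):  # one pass per 128-bit lane
        # High 4 bits of imm8 choose which products enter the sum.
        total = sum((a[base + i] * b[base + i]
                     for i in range(4) if (imm8 >> (4 + i)) & 1), 0.0)
        # Low 4 bits choose which result elements receive the sum.
        dst += [total if (imm8 >> i) & 1 else 0.0 for i in range(4)]
    return dst

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [1.0] * 8
dp = dp_ps(a, b, 0x71)  # [6.0, 0.0, 0.0, 0.0, 18.0, 0.0, 0.0, 0.0]
```

Here 0x71 multiplies the first three elements of each lane and writes the sum to the lowest element of that lane.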

mm256_extractf128_pd

mm256_extractf128_pd

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".

__m128d _mm256_extractf128_pd (__m256d a, const int imm8) VEXTRACTF128 xmm/m128, ymm, imm8

mm256_extractf128_ps

mm256_extractf128_ps

Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".

__m128 _mm256_extractf128_ps (__m256 a, const int imm8) VEXTRACTF128 xmm/m128, ymm, imm8

mm256_extractf128_si256

mm256_extractf128_si256

Extract 128 bits (composed of integer data) from "a", selected with "imm8", and store the result in "dst".

__m128i _mm256_extractf128_si256 (__m256i a, const int imm8) VEXTRACTF128 xmm/m128, ymm, imm8

mm256_floor_pd

mm256_floor_pd

Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst".

__m256d _mm256_floor_pd (__m256d a) VROUNDPD ymm, ymm/m256, imm8(9)

mm256_floor_ps

mm256_floor_ps

Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst".

__m256 _mm256_floor_ps (__m256 a) VROUNDPS ymm, ymm/m256, imm8(9)

mm256_hadd_pd

mm256_hadd_pd

Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".

__m256d _mm256_hadd_pd (__m256d a, __m256d b) VHADDPD ymm, ymm, ymm/m256

mm256_hadd_ps

mm256_hadd_ps

Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".

__m256 _mm256_hadd_ps (__m256 a, __m256 b) VHADDPS ymm, ymm, ymm/m256

mm256_hsub_pd

mm256_hsub_pd

Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".

__m256d _mm256_hsub_pd (__m256d a, __m256d b) VHSUBPD ymm, ymm, ymm/m256

mm256_hsub_ps

mm256_hsub_ps

Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".

__m256 _mm256_hsub_ps (__m256 a, __m256 b) VHSUBPS ymm, ymm, ymm/m256
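The horizontal operations above interleave results from "a" and "b" within each 128-bit lane, which is easy to get wrong. A pure-Python model of the element order (hypothetical helper names, no call into HardwareIntrinsics). For subtraction, the higher-indexed element of each pair is subtracted from the lower-indexed one.

```python
import operator

# Hypothetical helpers: model the element order of hadd/hsub.
def horizontal_pd(a, b, op):
    dst = []
    for base in (0, 2):  # each 128-bit lane holds 2 doubles
        dst += [op(a[base], a[base + 1]), op(b[base], b[base + 1])]
    return dst

def horizontal_ps(a, b, op):
    dst = []
    for base in (0, 4):  # each 128-bit lane holds 4 floats
        dst += [op(a[base], a[base + 1]), op(a[base + 2], a[base + 3]),
                op(b[base], b[base + 1]), op(b[base + 2], b[base + 3])]
    return dst

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
hadd = horizontal_pd(a, b, operator.add)  # [3.0, 30.0, 7.0, 70.0]
hsub = horizontal_pd(a, b, operator.sub)  # [-1.0, -10.0, -1.0, -10.0]
```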

mm256_insertf128_pd

mm256_insertf128_pd

Copy "a" to "dst", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".

__m256d _mm256_insertf128_pd (__m256d a, __m128d b, int imm8) VINSERTF128 ymm, ymm, xmm/m128, imm8

mm256_insertf128_ps

mm256_insertf128_ps

Copy "a" to "dst", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".

__m256 _mm256_insertf128_ps (__m256 a, __m128 b, int imm8) VINSERTF128 ymm, ymm, xmm/m128, imm8

mm256_insertf128_si256

mm256_insertf128_si256

Copy "a" to "dst", then insert 128 bits from "b" into "dst" at the location specified by "imm8".

__m256i _mm256_insertf128_si256 (__m256i a, __m128i b, int imm8) VINSERTF128 ymm, ymm, xmm/m128, imm8
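For both the insertf128 entries above and the extractf128 entries earlier, the lowest bit of "imm8" selects the 128-bit half: 0 for the lower half, 1 for the upper half. A pure-Python model with hypothetical helper names (it does not call HardwareIntrinsics):

```python
# Hypothetical helpers: model 128-bit half selection by imm8 bit 0.
def extractf128(a, imm8):
    half = len(a) // 2
    start = half * (imm8 & 1)
    return a[start:start + half]

def insertf128(a, b, imm8):
    half = len(a) // 2
    start = half * (imm8 & 1)
    dst = list(a)                  # copy "a" to "dst"
    dst[start:start + half] = b    # overwrite the selected half with "b"
    return dst

a = [0.0, 1.0, 2.0, 3.0]
upper = extractf128(a, 1)               # [2.0, 3.0]
patched = insertf128(a, [8.0, 9.0], 0)  # [8.0, 9.0, 2.0, 3.0]
```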

mm256_lddqu_si256

mm256_lddqu_si256

Load 256-bits of integer data from unaligned memory into "dst". This intrinsic may perform better than "_mm256_loadu_si256" when the data crosses a cache line boundary.

__m256i _mm256_lddqu_si256 (__m256i const * mem_addr) VLDDQU ymm, m256

mm256_load_pd

mm256_load_pd

Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

__m256d _mm256_load_pd (double const * mem_addr) VMOVAPD ymm, ymm/m256

mm256_load_ps

mm256_load_ps

Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

__m256 _mm256_load_ps (float const * mem_addr) VMOVAPS ymm, ymm/m256

mm256_load_si256

mm256_load_si256

Load 256-bits of integer data from memory into "dst". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

__m256i _mm256_load_si256 (__m256i const * mem_addr) VMOVDQA ymm, m256

mm256_loadu_pd

mm256_loadu_pd

Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

__m256d _mm256_loadu_pd (double const * mem_addr) VMOVUPD ymm, ymm/m256

mm256_loadu_ps

mm256_loadu_ps

Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

__m256 _mm256_loadu_ps (float const * mem_addr) VMOVUPS ymm, ymm/m256

mm256_loadu_si256

mm256_loadu_si256

Load 256-bits of integer data from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

__m256i _mm256_loadu_si256 (__m256i const * mem_addr) VMOVDQU ymm, m256

mm256_maskload_pd

mm256_maskload_pd

Load packed double-precision (64-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set).

__m256d _mm256_maskload_pd (double const * mem_addr, __m256i mask) VMASKMOVPD ymm, ymm, m256

mm256_maskload_ps

mm256_maskload_ps

Load packed single-precision (32-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set).

__m256 _mm256_maskload_ps (float const * mem_addr, __m256i mask) VMASKMOVPS ymm, ymm, m256

mm256_maskstore_pd

mm256_maskstore_pd

Store packed double-precision (64-bit) floating-point elements from "a" into memory using "mask".

void _mm256_maskstore_pd (double * mem_addr, __m256i mask, __m256d a) VMASKMOVPD m256, ymm, ymm

mm256_maskstore_ps

mm256_maskstore_ps

Store packed single-precision (32-bit) floating-point elements from "a" into memory using "mask".

void _mm256_maskstore_ps (float * mem_addr, __m256i mask, __m256 a) VMASKMOVPS m256, ymm, ymm
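The masked loads and stores above act only on elements whose mask element has its high bit set. A pure-Python sketch, modeling each mask element as a signed integer so that a negative value means "high bit set". The helper names are hypothetical and memory is modeled as a plain list.

```python
# Hypothetical helpers: model masked load and masked store.
def maskload(memory, mask):
    # Elements with a clear mask high bit are zeroed in the result.
    return [value if m < 0 else 0.0 for value, m in zip(memory, mask)]

def maskstore(memory, mask, a):
    # Elements with a clear mask high bit leave memory untouched.
    for i, (m, value) in enumerate(zip(mask, a)):
        if m < 0:
            memory[i] = value

memory = [1.0, 2.0, 3.0, 4.0]
mask = [-1, 0, -1, 0]
loaded = maskload(memory, mask)    # [1.0, 0.0, 3.0, 0.0]
maskstore(memory, mask, [9.0] * 4) # memory is now [9.0, 2.0, 9.0, 4.0]
```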

mm256_max_pd

mm256_max_pd

Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst".

__m256d _mm256_max_pd (__m256d a, __m256d b) VMAXPD ymm, ymm, ymm/m256

mm256_max_ps

mm256_max_ps

Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst".

__m256 _mm256_max_ps (__m256 a, __m256 b) VMAXPS ymm, ymm, ymm/m256

mm256_min_pd

mm256_min_pd

Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst".

__m256d _mm256_min_pd (__m256d a, __m256d b) VMINPD ymm, ymm, ymm/m256

mm256_min_ps

mm256_min_ps

Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst".

__m256 _mm256_min_ps (__m256 a, __m256 b) VMINPS ymm, ymm, ymm/m256

mm256_movedup_pd

mm256_movedup_pd

Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst".

__m256d _mm256_movedup_pd (__m256d a) VMOVDDUP ymm, ymm/m256

mm256_movehdup_ps

mm256_movehdup_ps

Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".

__m256 _mm256_movehdup_ps (__m256 a) VMOVSHDUP ymm, ymm/m256

mm256_moveldup_ps

mm256_moveldup_ps

Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".

__m256 _mm256_moveldup_ps (__m256 a) VMOVSLDUP ymm, ymm/m256
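The three duplicate operations above copy one element of each adjacent pair over the other. A pure-Python model of the resulting layout (hypothetical helper names, no call into HardwareIntrinsics):

```python
# Hypothetical helpers: model which element of each pair is duplicated.
def duplicate_even(a):
    """movedup_pd / moveldup_ps pattern: [a0, a0, a2, a2, ...]."""
    return [a[i - i % 2] for i in range(len(a))]

def duplicate_odd(a):
    """movehdup_ps pattern: [a1, a1, a3, a3, ...]."""
    return [a[i - i % 2 + 1] for i in range(len(a))]

doubles = [0.0, 1.0, 2.0, 3.0]
floats = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
even_pd = duplicate_even(doubles)  # [0.0, 0.0, 2.0, 2.0]
odd_ps = duplicate_odd(floats)     # [1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 7.0, 7.0]
```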

mm256_movemask_pd

mm256_movemask_pd

Set each bit of mask "dst" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in "a".

int _mm256_movemask_pd (__m256d a) VMOVMSKPD reg, ymm

mm256_movemask_ps

mm256_movemask_ps

Set each bit of mask "dst" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in "a".

int _mm256_movemask_ps (__m256 a) VMOVMSKPS reg, ymm
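In the movemask result, bit i is the sign bit of element i, so the double-precision form yields a 4-bit value and the single-precision form an 8-bit value. A pure-Python model (the name `movemask` is hypothetical; `math.copysign` is used so that -0.0 is treated as negative):

```python
import math

# Hypothetical helper: gathers the sign bit of each element into an integer.
def movemask(a):
    return sum(1 << i for i, x in enumerate(a)
               if math.copysign(1.0, x) < 0)

bits = movemask([-1.0, 2.0, -0.0, 4.0])  # 0b0101 == 5
```

Applied to the output of a comparison, a result of 0 means no element matched, and all bits set means every element matched.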

mm256_mul_pd

mm256_mul_pd

Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

__m256d _mm256_mul_pd (__m256d a, __m256d b) VMULPD ymm, ymm, ymm/m256

mm256_mul_ps

mm256_mul_ps

Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

__m256 _mm256_mul_ps (__m256 a, __m256 b) VMULPS ymm, ymm, ymm/m256

mm256_or_pd

mm256_or_pd

Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

__m256d _mm256_or_pd (__m256d a, __m256d b) VORPD ymm, ymm, ymm/m256

mm256_or_ps

mm256_or_ps

Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

__m256 _mm256_or_ps (__m256 a, __m256 b) VORPS ymm, ymm, ymm/m256

mm256_permute2f128_pd

mm256_permute2f128_pd

Shuffle 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".

__m256d _mm256_permute2f128_pd (__m256d a, __m256d b, int imm8) VPERM2F128 ymm, ymm, ymm/m256, imm8

mm256_permute2f128_ps

mm256_permute2f128_ps

Shuffle 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".

__m256 _mm256_permute2f128_ps (__m256 a, __m256 b, int imm8) VPERM2F128 ymm, ymm, ymm/m256, imm8

mm256_permute2f128_si256

mm256_permute2f128_si256

Shuffle 128-bits (composed of integer data) selected by "imm8" from "a" and "b", and store the results in "dst".

__m256i _mm256_permute2f128_si256 (__m256i a, __m256i b, int imm8) VPERM2F128 ymm, ymm, ymm/m256, imm8
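The "imm8" control for permute2f128 is split into two 4-bit fields: the low field builds the lower half of "dst", the high field the upper half. Within a field, the low two bits choose a source (0: lower half of "a", 1: upper half of "a", 2: lower half of "b", 3: upper half of "b") and bit 3 zeroes that half instead. A pure-Python model with a hypothetical helper name:

```python
# Hypothetical helper: models 128-bit half selection from two sources.
def permute2f128(a, b, imm8):
    half = len(a) // 2
    sources = [a[:half], a[half:], b[:half], b[half:]]

    def select(control):
        if control & 0b1000:          # bit 3 set: zero this half
            return [0.0] * half
        return list(sources[control & 0b11])

    return select(imm8 & 0xF) + select((imm8 >> 4) & 0xF)

a = [0.0, 1.0, 2.0, 3.0]
b = [10.0, 11.0, 12.0, 13.0]
lows = permute2f128(a, b, 0x20)     # [0.0, 1.0, 10.0, 11.0]
highs = permute2f128(a, b, 0x31)    # [2.0, 3.0, 12.0, 13.0]
swapped = permute2f128(a, a, 0x01)  # [2.0, 3.0, 0.0, 1.0]
```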

mm256_permute_pd

mm256_permute_pd

Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".

__m256d _mm256_permute_pd (__m256d a, int imm8) VPERMILPD ymm, ymm, imm8

mm256_permute_ps

mm256_permute_ps

Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".

__m256 _mm256_permute_ps (__m256 a, int imm8) VPERMILPS ymm, ymm, imm8
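"Within 128-bit lanes" means an element can only be replaced by another element of the same lane. For the double-precision form, bit i of "imm8" selects the lower or upper element of the lane for result element i. For the single-precision form, "imm8" holds four 2-bit selectors that are applied to both lanes. A pure-Python model (hypothetical names, no call into HardwareIntrinsics):

```python
# Hypothetical models of the in-lane permutes with an immediate control.
def permute_pd(a, imm8):
    # (i & ~1) is the index of the first element of the lane holding i.
    return [a[(i & ~1) + ((imm8 >> i) & 1)] for i in range(4)]

def permute_ps(a, imm8):
    # (i & ~3) is the lane base; the 2-bit selector depends on i % 4.
    return [a[(i & ~3) + ((imm8 >> (2 * (i % 4))) & 3)] for i in range(8)]

doubles = [0.0, 1.0, 2.0, 3.0]
floats = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
swapped_pd = permute_pd(doubles, 0b0101)     # [1.0, 0.0, 3.0, 2.0]
reversed_ps = permute_ps(floats, 0b00011011) # [3.0, 2.0, 1.0, 0.0, 7.0, 6.0, 5.0, 4.0]
```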

mm256_permutevar_pd

mm256_permutevar_pd

Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst".

__m256d _mm256_permutevar_pd (__m256d a, __m256i b) VPERMILPD ymm, ymm, ymm/m256

mm256_permutevar_ps

mm256_permutevar_ps

Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst".

__m256 _mm256_permutevar_ps (__m256 a, __m256i b) VPERMILPS ymm, ymm, ymm/m256
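With a vector control, each element of "b" carries its own selector. One detail worth knowing: the double-precision form reads bit 1 (not bit 0) of each 64-bit control element, while the single-precision form reads the low two bits of each 32-bit control element. A pure-Python model with the control given as a list of integers (hypothetical names):

```python
# Hypothetical models of the in-lane permutes with a vector control.
def permutevar_pd(a, b):
    return [a[(i & ~1) + ((b[i] >> 1) & 1)] for i in range(4)]

def permutevar_ps(a, b):
    return [a[(i & ~3) + (b[i] & 3)] for i in range(8)]

doubles = [0.0, 1.0, 2.0, 3.0]
swapped = permutevar_pd(doubles, [2, 0, 2, 0])    # [1.0, 0.0, 3.0, 2.0]
unchanged = permutevar_pd(doubles, [1, 1, 1, 1])  # [0.0, 0.0, 2.0, 2.0]
```

The second call shows the pitfall: a control value of 1 has bit 1 clear, so it selects the lower element of the lane.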

mm256_rcp_ps

mm256_rcp_ps

Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

__m256 _mm256_rcp_ps (__m256 a) VRCPPS ymm, ymm/m256

mm256_round_pd1

mm256_round_pd1

Round the packed double-precision (64-bit) floating-point elements in "a" using the current rounding mode (_MM_FROUND_CUR_DIRECTION), and store the results as packed double-precision floating-point elements in "dst".

__m256d _mm256_round_pd (__m256d a, _MM_FROUND_CUR_DIRECTION) VROUNDPD ymm, ymm/m256, imm8(4)

mm256_round_pd1_to_nearest_integer

mm256_round_pd1_to_nearest_integer

Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer (ties to even) with exceptions suppressed, and store the results as packed double-precision floating-point elements in "dst".

__m256d _mm256_round_pd (__m256d a, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) VROUNDPD ymm, ymm/m256, imm8(8)

mm256_round_pd1_to_negative_infinity

mm256_round_pd1_to_negative_infinity

Round the packed double-precision (64-bit) floating-point elements in "a" toward negative infinity with exceptions suppressed, and store the results as packed double-precision floating-point elements in "dst".

__m256d _mm256_round_pd (__m256d a, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) VROUNDPD ymm, ymm/m256, imm8(9)

mm256_round_pd1_to_positive_infinity

mm256_round_pd1_to_positive_infinity

Round the packed double-precision (64-bit) floating-point elements in "a" toward positive infinity with exceptions suppressed, and store the results as packed double-precision floating-point elements in "dst".

__m256d _mm256_round_pd (__m256d a, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) VROUNDPD ymm, ymm/m256, imm8(10)

mm256_round_pd1_to_zero

mm256_round_pd1_to_zero

Round the packed double-precision (64-bit) floating-point elements in "a" toward zero with exceptions suppressed, and store the results as packed double-precision floating-point elements in "dst".

__m256d _mm256_round_pd (__m256d a, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) VROUNDPD ymm, ymm/m256, imm8(11)
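The four explicit rounding variants above give different results mainly on ties and on negative values. A pure-Python model keyed by the imm8 values listed in the signatures (the names `ROUNDING` and `round_pd` are hypothetical, and the sign of a zero result is not modeled):

```python
import math

# Hypothetical model: imm8 value -> rounding rule for one element.
ROUNDING = {
    8: lambda v: float(round(v)),        # to nearest, ties to even
    9: lambda v: float(math.floor(v)),   # toward negative infinity
    10: lambda v: float(math.ceil(v)),   # toward positive infinity
    11: lambda v: float(math.trunc(v)),  # toward zero
}

def round_pd(a, imm8):
    return [ROUNDING[imm8](v) for v in a]

values = [2.5, -2.5, 1.5, -1.7]
nearest = round_pd(values, 8)       # [2.0, -2.0, 2.0, -2.0]
down = round_pd(values, 9)          # [2.0, -3.0, 1.0, -2.0]
up = round_pd(values, 10)           # [3.0, -2.0, 2.0, -1.0]
toward_zero = round_pd(values, 11)  # [2.0, -2.0, 1.0, -1.0]
```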