Using Neon Instructions
ARM Neon is an advanced Single Instruction Multiple Data (SIMD) architecture extension for ARM processors. It supports parallel processing of multiple pieces of data by using one instruction. It is widely used in fields such as multimedia encoding/decoding and 2D/3D graphics to improve execution performance.
The Neon extension has been available since ARMv7. It is included by default in Cortex-A7, Cortex-A12, and Cortex-A15 processors, but is optional in other ARMv7 Cortex-A series processors. For details, see Introducing NEON Development Article.
The ARMv8-A CPU integrates the Neon extension by default, which is supported in both AArch64 and AArch32. For details, see Learn the architecture - Introducing Neon.
Architecture Support in OpenHarmony
In OpenHarmony, the Neon extension is enabled by default in arm64-v8a ABIs. It is disabled by default in armeabi-v7a ABIs, in order to support as many ARMv7-A devices as possible.
In the LLVM toolchain of the OpenHarmony SDK, precompiled runtime libraries are provided for the armeabi-v7a ABI in several configurations. The directory structure is as follows, where native-root is the root directory into which the native package of the C APIs is decompressed.
{native-root}/llvm/lib/clang/current/lib/arm-linux-ohos/
|-- a7_hard_neon-vfpv4
| |-- clang_rt.crtbegin.o
| |-- clang_rt.crtend.o
| |-- ...
|
|-- a7_soft
| |-- clang_rt.crtbegin.o
| |-- clang_rt.crtend.o
| |-- ...
|
|-- a7_softfp_neon-vfpv4
    |-- clang_rt.crtbegin.o
    |-- clang_rt.crtend.o
    |-- ...
hard, soft, and softfp are the possible values of the -mfloat-abi option. If the option is not specified, softfp is used by default. neon-vfpv4 is the FPU type specified with the -mfpu option. Based on these compilation options, the LLVM toolchain selects the precompiled libraries that match the target architecture configuration.
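As a rough illustration of how these options surface in code, the sketch below inspects predefined compiler macros. It assumes the usual Clang/GCC behavior on 32-bit ARM, where `__ARM_PCS_VFP` indicates the hard float ABI, `__SOFTFP__` indicates soft, and `__ARM_NEON` is defined when Neon is enabled. The function names are hypothetical and are not part of the OpenHarmony SDK.

```c
#include <stddef.h>

/* Hypothetical helper: returns a label for the float ABI this file was
 * compiled with, based on predefined compiler macros (assumed Clang/GCC). */
static const char *float_abi_name(void)
{
#if defined(__ARM_PCS_VFP)
    return "hard";            /* -mfloat-abi=hard */
#elif defined(__SOFTFP__)
    return "soft";            /* -mfloat-abi=soft */
#elif defined(__arm__)
    return "softfp";          /* 32-bit ARM, neither of the above */
#else
    return "not 32-bit ARM";  /* for example AArch64 or a non-ARM host */
#endif
}

/* Hypothetical helper: returns 1 if Neon was enabled at compile time
 * (for example through -mfpu=neon), and 0 otherwise. */
static int neon_enabled_at_compile_time(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}
```

Printing both values from a small test program built with different -mfpu and -mfloat-abi settings is one way to check which configuration a build actually uses.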
How to Use
The Neon extension can be used in the following ways:
1. Use the Auto-Vectorization feature of LLVM, which lets the compiler generate Neon instructions. This feature is enabled by default and can be disabled by passing the -fno-vectorize option to the compiler. For details, see Auto-Vectorization in LLVM.
2. Use the Neon intrinsics library, which gives you direct access to low-level Neon instructions.
3. Manually write Neon assembly instructions.
For details, see Arm Neon.
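For the auto-vectorization approach, no Neon-specific code is needed. The sketch below is a plain C loop of the kind the LLVM loop vectorizer can typically handle when optimization is enabled (for example at -O2). The function name `add_arrays` is hypothetical, and whether the loop is actually vectorized depends on the compiler version and options. Clang's `-Rpass=loop-vectorize` remark option can be used to confirm it.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds two int16_t arrays element by element.
 * The restrict qualifiers promise the compiler that the buffers do not
 * overlap, which removes one common obstacle to vectorization. */
void add_arrays(int16_t *restrict dst,
                const int16_t *restrict a,
                const int16_t *restrict b,
                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = (int16_t)(a[i] + b[i]);
    }
}
```

The same source builds for any ABI. On armeabi-v7a, the compiler can only emit Neon instructions for it if Neon is enabled through the compilation options.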
Examples
The following example describes how to use Neon intrinsics in an armeabi-v7a OpenHarmony C++ project.
1. Include the arm_neon.h header file in the source code. Because Neon intrinsics are closely tied to the CPU architecture, you are advised to guard this include with architecture macros, such as those defined in cpu_features_macros.h.
    #include "cpu_features_macros.h"

    #if defined(CPU_FEATURES_ARCH_ARM)
    #include <arm_neon.h> // Neon intrinsic types and functions
    #endif

    void call_neon_intrinsics(short *output, const short *input, const short *kernel, int width, int kernelSize)
    {
        int nn, offset = -kernelSize / 2;
        for (nn = 0; nn < width; nn++)
        {
            int mm, sum = 0;
            int32x4_t sum_vec = vdupq_n_s32(0); // Neon intrinsic: set all four 32-bit lanes to 0
            for (mm = 0; mm < kernelSize / 4; mm++)
            {
                // Load four 16-bit values each, then multiply-accumulate into 32-bit lanes.
                int16x4_t kernel_vec = vld1_s16(kernel + mm * 4);
                int16x4_t input_vec = vld1_s16(input + (nn + offset + mm * 4));
                sum_vec = vmlal_s16(sum_vec, kernel_vec, input_vec);
            }
            ...
        }
        ...
    }
2. Call the corresponding implementation functions based on the CPU feature.
``` c
void Compute(void)
{
#if defined(CPU_FEATURES_ARCH_ARM)
    static const ArmFeatures features = GetArmInfo().features;
    // Determine whether the CPU features are supported based on the features field.
    if (features.neon) {
        // Run optimized code.
    } else {
        // Call normal functions written in C.
    }
#endif
}
```
3. Add the corresponding options to the **CMakeLists.txt** file.
``` makefile
if (${OHOS_ARCH} STREQUAL "armeabi-v7a")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mfpu=neon -mfloat-abi=softfp")
endif ()
```
Now you can use Neon intrinsics in the project.
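To make the pattern of an optimized path with a plain C fallback more concrete, here is a minimal, self-contained sketch of a 16-bit dot product. It is not taken from the OpenHarmony samples: the function name `dot_product_s16` is hypothetical, and it chooses the path at compile time with the `__ARM_NEON` macro rather than at run time through the features field. The lane reduction shown is one possible approach among several.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical example: dot product of two int16_t arrays of length n.
 * Uses Neon intrinsics when available at compile time, and a scalar
 * loop otherwise (and for any leftover tail elements). */
int32_t dot_product_s16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t sum = 0;
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int32x4_t sum_vec = vdupq_n_s32(0);
    for (; i + 4 <= n; i += 4) {
        int16x4_t va = vld1_s16(a + i);       /* load 4 x int16 */
        int16x4_t vb = vld1_s16(b + i);
        sum_vec = vmlal_s16(sum_vec, va, vb); /* widening multiply-accumulate */
    }
    /* Reduce the four 32-bit lanes to a single value. */
    int32x2_t pair = vadd_s32(vget_low_s32(sum_vec), vget_high_s32(sum_vec));
    pair = vpadd_s32(pair, pair);
    sum = vget_lane_s32(pair, 0);
#endif

    /* Scalar path: handles everything without Neon, or the tail with Neon. */
    for (; i < n; i++) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}
```

Because the scalar loop picks up wherever the vector loop stops, the function returns the same result with or without Neon, which makes it easy to compare both builds on a device.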