Using Neon Instructions

2023-10-30

ARM Neon is an advanced Single Instruction Multiple Data (SIMD) architecture extension for ARM processors. It supports parallel processing of multiple pieces of data by using one instruction. It is widely used in fields such as multimedia encoding/decoding and 2D/3D graphics to improve execution performance.

The Neon extension has been available since ARMv7. It is included by default in Cortex-A7, Cortex-A12, and Cortex-A15 processors, but is optional in other ARMv7 Cortex-A series processors. For details, see Introducing NEON Development Article.

ARMv8-A CPUs include the Neon extension by default, and it is supported in both AArch64 and AArch32. For details, see Learn the architecture - Introducing Neon.

Architecture Support in OpenHarmony

In OpenHarmony, the Neon extension is enabled by default for the arm64-v8a ABI. It is disabled by default for the armeabi-v7a ABI, in order to support as many ARMv7-A devices as possible.

In the LLVM toolchain of the OpenHarmony SDK, precompiled runtime libraries for the armeabi-v7a ABI are provided in several configurations. The directory structure is as follows, where native-root is the root directory into which the native package of the C APIs is decompressed.

{native-root}/llvm/lib/clang/current/lib/arm-linux-ohos/
  |-- a7_hard_neon-vfpv4
  |    |-- clang_rt.crtbegin.o
  |    |-- clang_rt.crtend.o
  |    |-- ...
  |
  |-- a7_soft
  |    |-- clang_rt.crtbegin.o
  |    |-- clang_rt.crtend.o
  |    |-- ...
  |
  |-- a7_softfp_neon-vfpv4
       |-- clang_rt.crtbegin.o
       |-- clang_rt.crtend.o
       |-- ...

hard, soft, and softfp are the possible values of the float ABI (-mfloat-abi). If none is specified, softfp is used by default. neon-vfpv4 is the FPU type specified by -mfpu. Based on the compilation parameters, the LLVM toolchain selects the binary libraries that match the corresponding architecture configuration.
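To see which configuration a given set of compilation parameters selects, the compiler's predefined macros can be inspected. The following is a minimal sketch, assuming the ACLE macros `__ARM_NEON` and `__ARM_PCS_VFP` are defined by the toolchain as the Arm C Language Extensions describe. The function names are hypothetical and not part of the OpenHarmony SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with Neon enabled, else 0. */
static int built_with_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if floating-point arguments are passed in VFP registers
 * (the hard float ABI on AArch32), else 0. */
static int built_with_hard_float_abi(void)
{
#if defined(__ARM_PCS_VFP)
    return 1;
#else
    return 0;
#endif
}

/* Writes a one-line summary of the build configuration into buf.
 * Returns the value of snprintf (number of characters required). */
static int describe_build(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "neon=%d hard-float-abi=%d",
                    built_with_neon(), built_with_hard_float_abi());
}
```

Building the same file with different -mfpu and -mfloat-abi values and printing the summary is a quick way to check that the options reach the compiler.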

How to Use

The Neon extension can be used in the following ways:

1. Use the auto-vectorization feature of LLVM and let the compiler generate the Neon instructions. This feature is enabled by default and can be disabled with the -fno-vectorize compiler option. For details, see Auto-Vectorization in LLVM.
2. Use the Neon intrinsics library, which lets you operate low-level Neon instructions directly from C/C++ code.
3. Manually write Neon assembly instructions.

For details, see Arm Neon.
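The first two approaches can be compared on a small element-wise addition. This is a sketch only: the function names are hypothetical, and whether the plain loop is actually vectorized depends on the optimization level and target options. The intrinsics version falls back to scalar code when the file is compiled without Neon, so it builds on any target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Approach 1: plain C that the compiler may auto-vectorize.
 * restrict tells the compiler that the arrays do not overlap. */
static void add_i16_plain(int16_t *restrict dst, const int16_t *restrict a,
                          const int16_t *restrict b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++) {
        dst[i] = (int16_t)(a[i] + b[i]);
    }
}

/* Approach 2: Neon intrinsics, eight 16-bit lanes per iteration,
 * followed by a scalar loop for the remaining elements. */
static void add_i16_intrinsics(int16_t *dst, const int16_t *a,
                               const int16_t *b, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 8 <= n; i += 8) {
        int16x8_t va = vld1q_s16(a + i);       /* load 8 values from a */
        int16x8_t vb = vld1q_s16(b + i);       /* load 8 values from b */
        vst1q_s16(dst + i, vaddq_s16(va, vb)); /* add lanes and store */
    }
#endif
    for (; i < n; i++) {                       /* scalar tail or fallback */
        dst[i] = (int16_t)(a[i] + b[i]);
    }
}
```

The intrinsics version fixes the vector width in the source, while the plain version leaves that choice to the compiler.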

Examples

The following example describes how to use Neon intrinsics in an armeabi-v7a OpenHarmony C++ project.

1. Include the arm_neon.h header file in the source code. Neon intrinsics are closely tied to the CPU architecture. Therefore, you are advised to guard the inclusion of this header file with architecture macros, such as those defined in cpu_features_macros.h.

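#include "cpu_features_macros.h"

#if defined(CPU_FEATURES_ARCH_ARM)
#include <arm_neon.h> // Neon intrinsic types and functions
#endif

void call_neon_intrinsics(short *output, const short *input, const short *kernel, int width, int kernelSize)
{
    int nn, offset = -kernelSize / 2;

    for (nn = 0; nn < width; nn++)
    {
        int mm, sum = 0;
        int32x4_t sum_vec = vdupq_n_s32(0); // Neon intrinsic: set all four 32-bit lanes to 0
        for (mm = 0; mm < kernelSize / 4; mm++)
        {
            // Load four 16-bit values each from the kernel and from the input.
            int16x4_t kernel_vec = vld1_s16(kernel + mm * 4);
            int16x4_t input_vec = vld1_s16(input + (nn + offset + mm * 4));
            // Widening multiply-accumulate: sum_vec += kernel_vec * input_vec.
            sum_vec = vmlal_s16(sum_vec, kernel_vec, input_vec);
        }
        ...
    }
    ...
}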

2. Call the corresponding implementation functions based on the CPU feature.
```cpp
void Compute(void)
{
#if defined(CPU_FEATURES_ARCH_ARM)
    static const ArmFeatures features = GetArmInfo().features;

    // Determine whether the CPU features are supported based on the features field.
    if (features.neon) {
        // Run optimized code.
    } else {
        // Call normal functions written in C.
    }
#endif
}
```


3. Add the corresponding options to the **CMakeLists.txt** file.
```cmake
if (${OHOS_ARCH} STREQUAL "armeabi-v7a")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mfpu=neon -mfloat-abi=softfp")
endif ()
```

Now you can use Neon intrinsics in the project.
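The example in step 1 omits the final reduction and the handling of leftover kernel elements. As a separate, self-contained illustration of the same multiply-accumulate pattern, the sketch below computes a sliding dot product. It is not the original sample: the function names, the 32-bit output type, and the indexing (the input must hold width + ksize - 1 elements, with no negative offset) are assumptions made for this illustration. When the file is compiled without Neon, the scalar version is used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Scalar reference: out[n] = sum over m of kernel[m] * in[n + m]. */
static void fir_scalar(int32_t *out, const int16_t *in, const int16_t *kernel,
                       int width, int ksize)
{
    for (int n = 0; n < width; n++) {
        int32_t sum = 0;
        for (int m = 0; m < ksize; m++) {
            sum += (int32_t)kernel[m] * in[n + m];
        }
        out[n] = sum;
    }
}

/* Same computation with Neon intrinsics when available. */
static void fir_neon(int32_t *out, const int16_t *in, const int16_t *kernel,
                     int width, int ksize)
{
    assert(out != NULL && in != NULL && kernel != NULL);
    assert(width >= 0 && ksize > 0);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (int n = 0; n < width; n++) {
        int32x4_t acc = vdupq_n_s32(0);          /* four 32-bit partial sums */
        int m = 0;
        for (; m + 4 <= ksize; m += 4) {
            int16x4_t k = vld1_s16(kernel + m);  /* 4 kernel taps */
            int16x4_t x = vld1_s16(in + n + m);  /* 4 input samples */
            acc = vmlal_s16(acc, k, x);          /* acc += widen(k * x) */
        }
        /* Horizontal reduction: add the four lanes together. */
        int32_t sum = vgetq_lane_s32(acc, 0) + vgetq_lane_s32(acc, 1)
                    + vgetq_lane_s32(acc, 2) + vgetq_lane_s32(acc, 3);
        /* Scalar tail for kernel sizes that are not a multiple of 4. */
        for (; m < ksize; m++) {
            sum += (int32_t)kernel[m] * in[n + m];
        }
        out[n] = sum;
    }
#else
    fir_scalar(out, in, kernel, width, ksize);   /* build without Neon */
#endif
}
```

Keeping a scalar reference next to the intrinsics version makes it easy to compare results on a device before relying on the optimized path.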