
aws-neuron/nki-library

NKI Library

The NKI Library provides pre-built reference kernels that you can use directly in model development with the AWS Neuron SDK and NKI (Neuron Kernel Interface). These kernel APIs provide the default classes, functions, and parameters for integrating NKI Library kernels into your models. More details can be found in the NKI Library Documentation.

NOTE

The kernels in this repo require the Neuron 2.27 release.

Kernel Reference

Kernel API | Description
--- | ---
Attention CTE Kernel | The kernel implements attention with support for multiple variants and optimizations.
Attention TKG Kernel | The kernel implements attention specifically optimized for token generation use cases.
Cross Entropy Kernel | The kernel implements memory-efficient cross entropy for large vocabularies using online log-sum-exp.
MLP Kernel | The kernel implements a Multi-Layer Perceptron with optional normalization fusion and various optimizations.
MoE CTE Kernel | The kernel implements Mixture of Experts optimized for Context Encoding use cases.
MoE TKG Kernel | The kernel implements Mixture of Experts optimized for Token Generation use cases.
Output Projection CTE Kernel | The kernel computes the output projection operation optimized for Context Encoding use cases.
Output Projection TKG Kernel | The kernel computes the output projection operation optimized for Token Generation use cases.
QKV Kernel | The kernel performs Query-Key-Value projection with optional normalization fusion.
RMSNorm-Quant Kernel | The kernel performs optional RMS normalization followed by quantization to FP8.
RoPE Kernel | The kernel applies Rotary Position Embedding to input embeddings with optional LNC sharding.
Router Top-K Kernel | The kernel computes router logits and top-K selection for Mixture of Experts models.
Cumsum Kernel | The kernel computes the cumulative sum along the last dimension.

Experimental Kernels

Kernel API | Description
--- | ---
Attention Block TKG Kernel | The kernel implements a fused attention block for TKG with RMSNorm, QKV, RoPE, and output projection.

Integration with the Neuron Compiler

The Neuron compiler includes a bundled version of this package within neuronx-cc, accessible under the nkilib Python namespace (for example, import nkilib). This guide refers to it as "bundled nkilib". Bundled nkilib has been validated against the compiler version it ships with and can be used out of the box.

If you want to contribute a kernel change or use the latest kernels, you can integrate with this package directly.

Note: Unlike bundled nkilib, kernels from this package are not guaranteed to be compatible with the latest release of the Neuron compiler. To start from a known good commit compatible with your compiler version, find the branch corresponding to your compiler version in this repository.
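To find that branch, you need your installed compiler version and the list of branches in this repository. The following is a minimal sketch, assuming python3 and git are on your PATH and that python3 belongs to the virtual environment where neuronx-cc is installed. The repository URL is inferred from the repository name, the branch name in the usage comment is a hypothetical placeholder, and installing from a source checkout with pip install . is an assumption rather than a documented workflow.

```shell
#!/bin/sh
# Sketch: read the installed neuronx-cc version and list this repository's
# branches so you can pick the one matching your compiler.
# Assumption: python3 is the interpreter of the venv that has neuronx-cc.
REPO_URL="https://github.com/aws-neuron/nki-library.git"

# Extract the "Version:" field from `pip show` output read on stdin.
parse_pip_version() {
  sed -n 's/^Version: //p'
}

# Turn `git ls-remote --heads` lines ("<sha><TAB>refs/heads/<name>") into names.
branch_names() {
  sed 's#^.*refs/heads/##'
}

# Print the installed neuronx-cc version (empty if it is not installed).
show_compiler_version() {
  python3 -m pip show neuronx-cc 2>/dev/null | parse_pip_version
}

# Print the branch names published in the remote repository.
list_remote_branches() {
  git ls-remote --heads "$REPO_URL" | branch_names
}

# Usage (interactive; <branch-for-your-compiler-version> is a placeholder):
#   show_compiler_version
#   list_remote_branches
#   git clone --branch <branch-for-your-compiler-version> "$REPO_URL"
#   cd nki-library && python3 -m pip install .
```

The parsing is split into small stdin filters so each step can be checked on its own, independent of network access or an installed compiler.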

Installation

  1. Install neuronx-cc as usual (most likely already done). For more information, see the Neuron Quick Start Guide.
  2. Install this package into the same virtual environment as the rest of your project:
    pip install nki-library
  3. Import and use kernels as usual. Once installed, this package automatically replaces the bundled nkilib kernels, so no code changes are required.
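To confirm that the installation landed in the right environment, you can ask pip whether the nki-library distribution is present. This is a minimal sketch, assuming python3 on your PATH belongs to the same virtual environment as neuronx-cc; the function name nki_library_status is hypothetical, and the check does not account for the NKILIB_FORCE_BUNDLED_LIBRARY override described further down.

```shell
#!/bin/sh
# Sketch: report whether nki-library is installed in the active environment.
# Assumption: python3 is the interpreter of the venv that has neuronx-cc.
# Note: ignores NKILIB_FORCE_BUNDLED_LIBRARY; it only checks installation.
nki_library_status() {
  # `pip show` exits non-zero when the distribution is not installed.
  if python3 -m pip show nki-library >/dev/null 2>&1; then
    echo "installed: nki-library replaces bundled nkilib"
  else
    echo "not installed: neuronx-cc uses bundled nkilib"
  fi
}

nki_library_status
```

Running python3 -m pip rather than a bare pip makes it more likely that the check inspects the same interpreter's environment.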

Uninstalling

To uninstall, run the following command:

pip uninstall nki-library

After uninstalling, the compiler falls back to the bundled nkilib.

Controlling which package gets loaded

To temporarily revert to the bundled version of nkilib, set the NKILIB_FORCE_BUNDLED_LIBRARY environment variable to a truthy value:

export NKILIB_FORCE_BUNDLED_LIBRARY=true

The next time neuronx-cc runs, it uses the bundled version of nkilib. To switch back to the kernels from this package, unset NKILIB_FORCE_BUNDLED_LIBRARY:

unset NKILIB_FORCE_BUNDLED_LIBRARY
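If you only want bundled nkilib for a single run, you can set the variable for one command instead of exporting it for the whole shell session. This is a minimal sketch; the helper name with_bundled_nkilib is hypothetical, and "true" is simply the value used above.

```shell
#!/bin/sh
# Sketch: force bundled nkilib for one command only. `env` places the
# variable in the child process's environment; the current shell is unchanged,
# so no later `unset` is needed.
with_bundled_nkilib() {
  env NKILIB_FORCE_BUNDLED_LIBRARY=true "$@"
}

# Usage: with_bundled_nkilib neuronx-cc <your usual arguments>
```

Because the variable never enters your interactive shell, later compiler runs go back to the kernels from this package automatically.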