The NKI Library provides pre-built reference kernels you can use directly in your model development with the AWS Neuron SDK and NKI. These kernel APIs provide the default classes, functions, and parameters you can use to integrate the NKI kernels into your models. For more details, see the NKI Library Documentation.
The kernels in this repo require the Neuron 2.27 release.
| Kernel API | Description |
|---|---|
| Attention CTE Kernel | The kernel implements attention with support for multiple variants and optimizations. |
| Attention TKG Kernel | The kernel implements attention specifically optimized for token generation use cases. |
| Cross Entropy Kernel | The kernel implements memory-efficient cross entropy for large vocabularies using online log-sum-exp. |
| MLP Kernel | The kernel implements a Multi-Layer Perceptron with optional normalization fusion and various optimizations. |
| MoE CTE Kernel | The kernel implements Mixture of Experts optimized for Context Encoding use cases. |
| MoE TKG Kernel | The kernel implements Mixture of Experts optimized for Token Generation use cases. |
| Output Projection CTE Kernel | The kernel computes the output projection operation optimized for Context Encoding use cases. |
| Output Projection TKG Kernel | The kernel computes the output projection operation optimized for Token Generation use cases. |
| QKV Kernel | The kernel performs Query-Key-Value projection with optional normalization fusion. |
| RMSNorm-Quant Kernel | The kernel performs optional RMS normalization followed by quantization to fp8. |
| RoPE Kernel | The kernel applies Rotary Position Embedding to input embeddings with optional LNC sharding. |
| Router Top-K Kernel | The kernel computes router logits and top-K selection for Mixture of Experts models. |
| Cumsum Kernel | The kernel computes cumulative sum along the last dimension. |
| Attention Block TKG Kernel | The kernel implements fused attention block for TKG with RMSNorm, QKV, RoPE, and output projection. |
The Neuron compiler includes a bundled version of this package within `neuronx-cc`, accessible under the `nkilib` Python namespace (for example, `import nkilib`). This bundled version is referred to as "bundled nkilib" throughout this guide. Bundled nkilib has been validated to work with that particular compiler version and can be used out of the box.
If you want to contribute a kernel change or use the latest kernels, you can integrate with this package directly.
Note: Unlike bundled nkilib, kernels from this package are not guaranteed to be compatible with the latest release of the Neuron compiler. To start from a known good commit compatible with your compiler version, find the branch corresponding to your compiler version in this repository.
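One quick way to find your installed compiler version, so you can pick the matching branch, is to query pip. This assumes the compiler was installed with pip under the `neuronx-cc` package name:

```bash
# Print the installed Neuron compiler version, then check out the corresponding branch in this repository.
pip show neuronx-cc
```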
- Install `neuronx-cc` as usual (most likely already done). For more information, see the Neuron Quick Start Guide.
- Install this package into the same virtual environment as the rest of your project:

  ```bash
  pip install nki-library
  ```

- Import and use kernels as usual. This package automatically replaces the bundled nkilib kernels with the content of this package. No code changes are required. To confirm which copy of nkilib is in use, see the check after this list.
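After installation, you can confirm which copy of nkilib Python resolves by printing the module's location. This is a minimal sketch; it assumes `nkilib` is importable as a regular package that exposes `__file__`:

```bash
# A path inside your virtual environment's site-packages indicates the pip-installed
# nki-library is active; a path inside the neuronx-cc installation indicates the bundled copy.
python -c "import nkilib; print(nkilib.__file__)"
```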
To uninstall, run the following command:
```bash
pip uninstall nki-library
```

After uninstalling, the compiler falls back to the bundled nkilib.
To temporarily revert to the bundled version of nkilib, set the `NKILIB_FORCE_BUNDLED_LIBRARY` environment variable to a truthy value:

```bash
export NKILIB_FORCE_BUNDLED_LIBRARY=true
```

On the next execution of `neuronx-cc`, it will use the bundled version of nkilib. To go back to the kernels from this package, unset `NKILIB_FORCE_BUNDLED_LIBRARY`:

```bash
unset NKILIB_FORCE_BUNDLED_LIBRARY
```
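If you only want to force the bundled kernels for a single compiler invocation instead of the whole shell session, you can scope the variable to one command. A minimal sketch; `<your-usual-compile-arguments>` is a placeholder for whatever arguments you normally pass to `neuronx-cc`:

```bash
# The variable applies only to this single neuronx-cc run and is not exported to the shell.
# <your-usual-compile-arguments> is a placeholder, not a real flag.
NKILIB_FORCE_BUNDLED_LIBRARY=true neuronx-cc <your-usual-compile-arguments>
```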