Conversation
Pull request overview
This PR adds comprehensive support for ModernBERT, a recent encoder model that modernizes BERT with architectural improvements including RoPE position embeddings, alternating local/global attention, gated linear units (GeGLU), and pre-normalization. The implementation follows established patterns in the codebase and includes proper model-to-HuggingFace parameter mappings.
- Full implementation of the ModernBERT model with four architectures: :base, :for_masked_language_modeling, :for_sequence_classification, and :for_token_classification
- Special attention architecture with alternating local (window-based) and global attention layers, each with distinct RoPE theta values (a sketch of the local attention mask follows this list)
- Test coverage for the base and MLM architectures with validation against reference outputs
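The local/global alternation works roughly as follows: global layers attend over the whole sequence, while local layers mask attention to a fixed window around each position. Below is a minimal Nx sketch of such a sliding-window mask, following the half-window convention of the reference implementation; it is an illustration, not code from this PR.

```elixir
defmodule LocalAttentionSketch do
  # A position attends to another only when they are within half the
  # window size of each other; global layers skip this mask entirely.
  def sliding_window_mask(sequence_length, window_size) do
    positions = Nx.iota({sequence_length, 1})
    distance = Nx.abs(Nx.subtract(positions, Nx.transpose(positions)))
    Nx.less_equal(distance, div(window_size, 2))
  end
end

# With a window of 4, every token sees at most 2 neighbors on each side.
LocalAttentionSketch.sliding_window_mask(6, 4)
```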
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| lib/bumblebee/text/modernbert.ex | Core implementation including encoder with alternating attention patterns, gated FFN, RMS normalization, mean pooling for sequence classification, and tied embeddings for MLM head |
| lib/bumblebee/text/pre_trained_tokenizer.ex | Adds ModernBERT special token configuration (UNK, SEP, PAD, CLS, MASK) |
| lib/bumblebee.ex | Registers ModernBERT model architectures and tokenizer type mapping |
| test/bumblebee/text/modernbert_test.exs | Integration tests for :base and :for_masked_language_modeling architectures with output validation |
Force-pushed from 684e256 to af54694
Force-pushed from af54694 to 6904c32
Matched the output with transformers, looks good now. Added ModernBertDecoder as well, which is the decoder variant for causal language modeling. It's a separate model in transformers (different file, different config), so I created a separate module for it. Also tested loading
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
@moduletag model_test_tags()
test ":for_causal_language_modeling" do
The test file is missing a test for the :base architecture. ModernBertDecoder declares support for the :base architecture in its architectures/0 function, but there is no corresponding test case. Other decoder models in the codebase (e.g., GPT2, Llama) include tests for all their supported architectures, including :base.
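A sketch of what such a test could look like, following the shape of the other decoder tests in the codebase; the checkpoint id, inputs, expected hidden size, and the Bumblebee.Text.ModernBertDecoder module name are placeholders/assumptions, not values taken from this PR.

```elixir
# Hypothetical test sketch for the existing test module; repository id,
# inputs, and expected hidden size are placeholders.
test ":base" do
  assert {:ok, %{model: model, params: params, spec: spec}} =
           Bumblebee.load_model({:hf, "placeholder/tiny-random-ModernBertDecoder"})

  assert %Bumblebee.Text.ModernBertDecoder{architecture: :base} = spec

  inputs = %{
    "input_ids" => Nx.tensor([[10, 20, 30, 40, 50]]),
    "attention_mask" => Nx.tensor([[1, 1, 1, 1, 1]])
  }

  outputs = Axon.predict(model, params, inputs)

  # {batch_size, sequence_length, hidden_size}
  assert Nx.shape(outputs.hidden_state) == {1, 5, 32}
end
```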
Add support for ModernBERT, a modern encoder model with architectural improvements over BERT:
- Rotary position embeddings (RoPE) instead of absolute position embeddings
- Alternating local and global attention layers for efficiency
- Gated linear units (GeGLU) in feed-forward blocks
- Pre-normalization with LayerNorm (no bias)
- First layer reuses the embedding norm for attention

Supported architectures:
- :base
- :for_masked_language_modeling
- :for_sequence_classification
- :for_token_classification

Reference: https://arxiv.org/abs/2412.13663
Force-pushed from 6904c32 to d47539c
Replace manual layer norm implementation with Axon.Layers.layer_norm, passing 0 as beta to implement layer norm without bias.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
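For reference, a minimal sketch of that idea using the functional API; the shapes and epsilon are illustrative, not the exact values used in the model.

```elixir
input = Nx.iota({4, 8}, type: :f32)
gamma = Nx.broadcast(1.0, {8})

# A beta of 0 means no bias term is added back after normalization,
# so only the learned scale (gamma) is applied.
Axon.Layers.layer_norm(input, gamma, 0, epsilon: 1.0e-5)
```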
Rename local_rope_theta/global_rope_theta to rotary_embedding_base_local/rotary_embedding_base to match the naming convention in other models (Gemma 3).
Replace global_attention_every_n_layers with a layer_types list to match the transformers v5 format. For backward compatibility, layer_types is generated from global_attn_every_n_layers when it is not present in the checkpoint config.
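A minimal sketch of that fallback (the helper name and layer type atoms are illustrative, not necessarily the ones used in the module): every global_attn_every_n_layers-th layer, starting from layer 0, gets global attention, and the remaining layers use the sliding window.

```elixir
# Hypothetical helper inside the model module: derive layer_types for
# older checkpoint configs that only specify global_attn_every_n_layers.
defp default_layer_types(num_layers, global_attn_every_n_layers) do
  Enum.map(0..(num_layers - 1), fn index ->
    if rem(index, global_attn_every_n_layers) == 0 do
      :full_attention
    else
      :sliding_attention
    end
  end)
end
```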
jonatanklosko left a comment
I refactored it to use Layers.Transformer.blocks instead of doing the loop ourselves.
FTR I considered adding the index as a parameter to the :block_type function, so that we could have the conditional there. It would be a bit more general solution than that; however, it would require passing the index to Layers.Transformer.block, which implies there is a single list of blocks. That is almost always the case, but Albert actually has N groups of M blocks, so the index would be ambiguous there. So for now I went with another approach, which is to special-case the layer norm based on the name.
This adds support for ModernBERT, a recent encoder model that improves on BERT with a few architectural changes:
ModernBert (Encoder)
Supported architectures:
- :base
- :for_masked_language_modeling
- :for_sequence_classification
- :for_token_classification

The MLM head uses tied embeddings (shares weights with the input token embeddings).
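A hedged usage sketch for the MLM head; the answerdotai/ModernBERT-base checkpoint id is assumed here rather than taken from this PR, but any ModernBERT checkpoint on the Hub should work the same way.

```elixir
{:ok, model_info} =
  Bumblebee.load_model({:hf, "answerdotai/ModernBERT-base"},
    architecture: :for_masked_language_modeling
  )

{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "answerdotai/ModernBERT-base"})

serving = Bumblebee.Text.fill_mask(model_info, tokenizer)
Nx.Serving.run(serving, "The capital of France is [MASK].")
```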
ModernBertDecoder (Decoder)
A decoder-only variant trained for causal language modeling (text generation).
Supported architectures:
- :base
- :for_causal_language_modeling

Reference: https://arxiv.org/abs/2412.13663