How to combine model and mmproj? #2122
Hi, I've been playing with Qwen3VL-8B-Instruct-Q8_0.gguf on a Win11 + conda + CUDA setup. Could anyone show me how to combine the mmproj GGUF with the model? This is what I tried:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./Qwen3VL-8B-Instruct-Q8_0.gguf",
    mmproj_path="./mmproj-Qwen3VL-8B-Instruct-Q8_0.gguf",
    n_ctx=1000,
    n_gpu_layer=-1,
    verbose=True,
)
```

Environment: llama-cpp-python 0.3.23.

Thank you in advance.
I guess you're using JamePeng's fork, since the version is 0.3.23. The interfaces of abetlen's main branch and the fork are incompatible, so you'll need to modify a few things to make it work. Anyway, the usage of mmproj is described in `class Llava15ChatHandler` in `llama_cpp/llama_chat_format.py`:

```python
class Llava15ChatHandler:
    # The constructor takes the path to the `mmproj.gguf` file.
    def __init__(self, clip_model_path, verbose):
        ...

    # The core logic communicating with libmtmd on the C++ side is defined here.
    def __call__(self, *args, **kwargs):
        ...
```

Basically, you can use mmproj by copy-and-pasting the core part of it, or by defining a class that inherits from it. The `__call__` method does the actual communication with libmtmd.
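For the inheritance route, here is a minimal untested sketch, assuming abetlen-style `Llava15ChatHandler` where subclasses (such as `Llava16ChatHandler`) override the Jinja2 `CHAT_FORMAT` class attribute; the subclass name and the template below are placeholders, not Qwen3VL's real chat template:

```python
# Sketch only: assumes Llava15ChatHandler exposes a Jinja2 CHAT_FORMAT
# class attribute that subclasses may override (as Llava16ChatHandler does
# in abetlen's llama_cpp/llama_chat_format.py).
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler


class Qwen3VLChatHandler(Llava15ChatHandler):  # hypothetical subclass name
    # Placeholder template for illustration; substitute the model's actual
    # chat template (image placeholders included) before real use.
    CHAT_FORMAT = (
        "{% for message in messages %}"
        "<|im_start|>{{ message.role }}\n"
        "{% if message.content is string %}{{ message.content }}{% endif %}"
        "<|im_end|>\n"
        "{% endfor %}"
        "<|im_start|>assistant\n"
    )


llm = Llama(
    model_path="./Qwen3VL-8B-Instruct-Q8_0.gguf",
    chat_handler=Qwen3VLChatHandler(
        clip_model_path="./mmproj-Qwen3VL-8B-Instruct-Q8_0.gguf"
    ),
    n_ctx=4096,
    n_gpu_layers=-1,
)
```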
A short code snippet to use Llava15ChatHandler (I haven't tested it):

```python
import base64

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler


def image_to_base64(image_path):
    # Read the image file and encode it as a base64 string for a data URI.
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def main():
    # The chat handler loads the mmproj (vision projector) model.
    chat_handler = Llava15ChatHandler(
        clip_model_path="./llm/mmproj-model-f16.gguf"
    )
    llm = Llama(
        model_path="./llm/gemma-3-4b-it-Q4_K_M.gguf",
        chat_handler=chat_handler,
        n_ctx=4096,
        n_gpu_layers=-1,
        logits_all=True,  # some versions need this for llava-style handlers
    )

    image_path = "./sample_image.png"
    image_base64 = image_to_base64(image_path)

    # Pass the image as a base64 data URI inside an OpenAI-style message.
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in the image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_base64}"},
                },
            ],
        }
    ]

    print("Assistant: ", end="", flush=True)
    stream = llm.create_chat_completion(
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            print(delta["content"], end="", flush=True)
    print("\n")
    llm.close()


if __name__ == "__main__":
    main()
```
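If you don't need token-by-token output, the same setup works with a plain non-streaming call (reusing `llm` and `messages` from the snippet above):

```python
# Non-streaming variant: create_chat_completion returns the whole
# OpenAI-style response dict at once.
result = llm.create_chat_completion(messages=messages)
print(result["choices"][0]["message"]["content"])
```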