import openvino as ov
core = ov.Core()
ov_model = core.read_model("/PATH/TO/INPUT_MODEL")
ov.save_model(ov_model, "/PATH/TO/OV_MODEL.xml", compress_to_fp16=False)
compiled_model = ov.compile_model(ov_model)
An ONNX model, which is a single .onnx file, can be read directly by the OpenVINO read_model function.
A PaddlePaddle model saved for inference consists of two files, "inference.pdmodel" and "inference.pdiparams", in the same directory. Pass "PATH/TO/inference.pdmodel" to the OpenVINO read_model function.
TensorFlow models saved in the frozen graph format can also be passed to the OpenVINO read_model function.
TFLite models saved for inference with the .tflite extension can be read directly by the OpenVINO read_model function.
After read_model, use compile_model to generate a compiled_model for inference, as in the sketch below.
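A minimal sketch of reading and compiling these formats (the paths are placeholders):
import openvino as ov
core = ov.Core()
# read_model accepts all of the formats above; the paths below are placeholders.
onnx_model = core.read_model("PATH/TO/model.onnx")
paddle_model = core.read_model("PATH/TO/inference.pdmodel")
tf_model = core.read_model("PATH/TO/frozen_graph.pb")
tflite_model = core.read_model("PATH/TO/model.tflite")
# Compile any of them for inference on a chosen device.
compiled_model = core.compile_model(onnx_model, device_name="CPU")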
Using the OVC CLI tool provided by OpenVINO
ovc PATH/TO/INPUT/MODEL --input input_ids[1,128],attention_mask[-1,128] --output_model PATH/TO/OUTPUT/MODEL.xml
The input parameter is optional:
- By default, the input shapes will remain the same as the original model.
- Alternatively, you can set specific shapes to generate a fixed-shape model,
- or use -1 to indicate that the shape of this dimension is dynamic.
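For example, converting while keeping the original shapes, or making the batch dimension dynamic (paths are placeholders):
ovc PATH/TO/INPUT/MODEL --output_model PATH/TO/OUTPUT/MODEL.xml
ovc PATH/TO/INPUT/MODEL --input input_ids[-1,128],attention_mask[-1,128] --output_model PATH/TO/OUTPUT/MODEL.xml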
import openvino as ov
core = ov.Core()
ov_model_tf_SavedModel = ov.convert_model("PATH/TO/TF/SavedModel/DIR")
ov_model_tf_MetaGraph = ov.convert_model("PATH/TO/TF/meta_graph.meta")
ov_model_tf_Checkpoint = ov.convert_model(["PATH/TO/TF/inference_graph.pb", "PATH/TO/TF/checkpoint_file.ckpt"])
- Save the OpenVINO model files. Set compress_to_fp16=True to save an FP16-based model; otherwise an FP32-based model is saved.
# ov_model_tf can be any of the converted models above (SavedModel / MetaGraph / Checkpoint).
ov.save_model(ov_model_tf, "model/exported_tf_model.xml", compress_to_fp16=False)
- Directly compile the model for inference
compiled_model_tf = core.compile_model(ov_model_tf, device_name="CPU")
- Load the Torch model with PyTorch functions
pt_model = LOADED_TORCH_MODEL_WITH_TORCH_FUNCTIONS
pt_model.eval()
- Prepare example_input
example_input = torch.zeros((1, 3, 224, 224))
- Convert to an OpenVINO model.
- The input parameter is optional:
  - By default, the input shapes will remain the same as the original model.
  - Alternatively, you can set specific shapes to generate a fixed-shape model,
  - or use -1 to indicate that the shape of this dimension is dynamic.
import openvino as ov
core = ov.Core()
ov_model_pytorch = ov.convert_model(pt_model, example_input=example_input, input=[[1, 3, 224, 224]])
- Save the OpenVINO model files. Set compress_to_fp16=True to save an FP16-based model; otherwise an FP32-based model is saved.
ov.save_model(ov_model_pytorch, "model/exported_pytorch_model.xml", compress_to_fp16=False)
- Directly compile the model for inference
compiled_model_pytorch = core.compile_model(ov_model_pytorch, device_name="CPU")
The main process is the same as for a plain Torch model, but an extra step is needed to make the model stateful (store the KV-cache internally in OpenVINO) before save_model.
In ov_model_helper.py, we provide the function "patch_model_stateful"; a sketch of the flow follows the list below.
- First, name the INPUT/OUTPUT tensors that hold the KV-cache with patterns like key_values.* and present.*.
- Second, call patch_model_stateful before save_model.
- Refer to ov_model_helper.py#L300-L318 or to FireRedAsrAedWrapper::convert_ov_model.
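A minimal sketch of that flow, assuming patch_model_stateful modifies the converted model in place (the model/input names and the exact signature are assumptions; check ov_model_helper.py):
import openvino as ov
from ov_model_helper import patch_model_stateful  # helper provided in this repo; exact signature may differ

# pt_llm_model / example_input are placeholders for your KV-cache model and a sample input.
ov_model = ov.convert_model(pt_llm_model, example_input=example_input)
patch_model_stateful(ov_model)  # assumed call style: fuse KV-cache inputs/outputs into internal state
ov.save_model(ov_model, "model/exported_stateful_model.xml", compress_to_fp16=True)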
We have provided a base class OV_Operator in ov_operator_async.py.
Take UnimernetEncoderModel as an example: it inherits from the base class and implements the setup_model and call methods.
The setup_model method can integrate certain data preprocessing functions into the OpenVINO execution workflow.
The call method mainly defines the model's inputs. A rough sketch of such a subclass is shown below.
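A hypothetical structural sketch (the method bodies and the base-class contract here are assumptions for illustration; see ov_operator_async.py and UnimernetEncoderModel for the real implementation):
from ov_operator_async import OV_Operator  # base class provided in this repo

class MyEncoderModel(OV_Operator):  # hypothetical subclass for illustration only
    def setup_model(self, stream_num=1, bf16=False, f16=False):
        # Typically: compile the OpenVINO model with the requested stream count and
        # precision, and optionally fold data preprocessing into the execution graph.
        ...

    def __call__(self, inputs):
        # Typically: map the raw inputs to the model's input tensors and run inference.
        ...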
- First, initialize the class
ov_model = UnimernetEncoderModel(model_path)
- Second, set up the model.
# stream_num==1 activates sync mode (LATENCY mode); otherwise async mode (THROUGHPUT mode) is used.
# bf16==True uses the BF16 data type for inference.
# f16==True uses the F16 data type for inference.
# Priority level: BF16 > F16 > F32
# TODO: AMX-F16 > AMX-BF16 > AVX512-F16 > AVX512-BF16 > F32
ov_model.setup_model(stream_num=2, bf16=True, f16=True)
- Finally, call it like a Torch model
res = ov_model(inputs)