Hello,
I’m currently evaluating ExecuTorch (v1.0.0 tag, export and runner) against TFLite on real hardware (not an FVP).
The goal is purely technology validation for industrial use cases, before selecting a runtime for deployment on microcontroller-class devices.
Target: Cortex-M55 + Ethos-U55
Model: MobileNetV2 (int8, Ethos-U compatible)
- For ExecuTorch: `from torchvision.models import mobilenet_v2`
- For TFLite: `tf.keras.applications.MobileNetV2`
The same model architecture and equivalent quantization were used for both runtimes (both with INT8 input/output).
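For context, the ExecuTorch side follows the PT2E int8 quantization flow before lowering to a `.pte`. Below is a minimal sketch of that flow; the Arm-backend import paths (`EthosUQuantizer`, `EthosUPartitioner`) and the compile-spec construction are assumptions that have moved between ExecuTorch releases, so they should be verified against the v1.0.0 tag, and the calibration loop feeds random stand-in data:

```python
import torch
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from executorch.exir import to_edge_transform_and_lower

# NOTE: these Arm-backend import paths are assumptions -- they have moved
# between ExecuTorch releases; check them against the v1.0.0 tag.
from executorch.backends.arm.quantizer import (
    EthosUQuantizer,
    get_symmetric_quantization_config,
)
from executorch.backends.arm.ethosu_partitioner import EthosUPartitioner

model = mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# Capture the model for PT2E quantization.
captured = torch.export.export_for_training(model, example_inputs).module()

compile_spec = ...  # Ethos-U55 compile spec (built with the Arm backend's
                    # helper; construction elided here, API varies by release)
quantizer = EthosUQuantizer(compile_spec)
quantizer.set_global(get_symmetric_quantization_config())

prepared = prepare_pt2e(captured, quantizer)
for _ in range(10):                        # stand-in calibration; use real,
    prepared(torch.randn(1, 3, 224, 224))  # preprocessed images in practice
quantized = convert_pt2e(prepared)

# Re-export the int8 graph and delegate supported ops to the Ethos-U55.
program = torch.export.export(quantized, example_inputs)
edge = to_edge_transform_and_lower(
    program, partitioner=[EthosUPartitioner(compile_spec)]
)
with open("mv2_ethos_u55.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```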
ExecuTorch
PTE size: 3.5 MB (3,512,288 bytes)
Memory:
- Planned (with input/output): 256 KiB
- Runtime / temporary: 1.6 MiB (1600 KiB)
- Method: 30 KiB

Total: 256 + 1600 + 30 = 1886 KiB ≈ 1.84 MiB
After run:
- Runtime (used): 1.44 MiB (1474 KiB)
- Method (used): 0.24 KiB

Actual total: 1464 + 0.24 + 256 = 1720 KiB ≈ 1.68 MiB
Inference time:
- One image: 113 ms
- 100 images (average per image): 113.748 ms

Timing only on `g_method->execute();`.
TensorFlow Lite
Vela-compiled TFLite size: 3.7 MB (3,684,976 bytes)
Arena memory before running (with inputs/outputs): 1.7 MiB
Arena used (after run): 1.51 MiB
Inference time:
- One image: 91.221 ms
- 100 images (average per image): 91.204 ms

Timing only on `g_interpreter->Invoke();`.
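For completeness, here is a minimal sketch of the int8 conversion assumed on the TFLite side, before Vela compilation; the representative dataset below is a random stand-in (real calibration should feed preprocessed images):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")

def representative_dataset():
    # Stand-in calibration data; feed real preprocessed images in practice.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization with int8 input/output, as in the
# comparison above.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("mobilenet_v2_int8.tflite", "wb") as f:
    f.write(converter.convert())
```

The resulting `.tflite` is then compiled for the NPU with Vela, e.g. `vela mobilenet_v2_int8.tflite --accelerator-config ethos-u55-128` (the 128-MAC variant is an assumption about the target configuration).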
As I understand it, when comparing memory usage, the TFLite arena roughly corresponds to ExecuTorch's planned memory plus the runtime/temporary pool, i.e. the 1.7 MiB arena should be compared against 256 KiB + 1.6 MiB ≈ 1.84 MiB.
Do these values seem correct to you?
Based on these measurements, TFLite currently shows lower latency and lower memory usage than ExecuTorch, which was unexpected on our side; we would like to understand why.