This is a ready-to-run reference stack for robust, real-time face/head pose estimation under occlusion:
- MediaPipe Face Mesh → dense landmarks (468)
- Occluder mask via RVM portrait matting (alpha) + quick skin mask inside face hull
- Masked PnP (RANSAC) for full 6‑DoF pose (R + t), ignoring occluded points
- 6DRepNet fallback (rotation-only) for when landmarks are unreliable
- Kalman smoothing (per angle)
- Optional: SynergyNet ONNX wrapper for 3DMM personalization (rotation + shape/expr), to run offline/occasionally
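
The per-angle Kalman smoothing step listed above can be sketched as one scalar filter per Euler angle. This is a minimal illustration and not necessarily the code in this repo: the class names `ScalarKalman` and `PoseSmoother`, the constant-value motion model, and the noise variances `q` and `r` are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1-D Kalman filter with a constant-value motion model (assumed)."""
    q: float = 1e-3   # process noise variance (assumed value)
    r: float = 1e-1   # measurement noise variance (assumed value)
    x: float = 0.0    # current state estimate (angle in degrees)
    p: float = 1.0    # current estimate variance
    initialized: bool = False

    def update(self, z: float) -> float:
        # Seed the state with the first measurement.
        if not self.initialized:
            self.x = z
            self.initialized = True
            return self.x
        # Predict: the state stays put, uncertainty grows by q.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x


class PoseSmoother:
    """Holds one independent filter each for yaw, pitch, and roll."""
    NAMES = ("yaw", "pitch", "roll")

    def __init__(self, q: float = 1e-3, r: float = 1e-1):
        self.filters = {n: ScalarKalman(q=q, r=r) for n in self.NAMES}

    def update(self, yaw: float, pitch: float, roll: float):
        values = (yaw, pitch, roll)
        return tuple(self.filters[n].update(v) for n, v in zip(self.NAMES, values))


smoother = PoseSmoother()
first = smoother.update(10.0, 0.0, 0.0)    # first frame passes through unchanged
second = smoother.update(20.0, 0.0, 0.0)   # second frame is pulled toward 20
```

Raising `r` relative to `q` gives smoother but laggier angles. The sketch does not handle wrap-around at ±180°, which would need extra care for extreme head rotations.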
Quickstart:

    python -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
    python app/run.py --source app/demo_data/sample.mp4 --mask-source rvm --show-mask
    python app/run.py --source webcam --mask-source rvm

Flags:

    --source {webcam|/path/to/video}
    --mask-source {skin|rvm|none}
    --fov-deg 60.0
    --show-mask
    --use-6drepnet
    --use-3dmm
    --synergy-onnx /path/to/synergy.onnx
    --save-out out.mp4

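
A field-of-view flag such as `--fov-deg` is commonly used to approximate the camera intrinsics that PnP needs when no calibration is available. The sketch below assumes the value is the horizontal FOV, that pixels are square, and that the principal point sits at the image center. The function name `intrinsics_from_fov` is hypothetical, and the repo may compute this differently.

```python
import math


def intrinsics_from_fov(width: int, height: int, fov_deg: float = 60.0):
    """Return a 3x3 pinhole camera matrix as nested lists.

    Assumes fov_deg is the horizontal field of view (an assumption,
    not confirmed by the flag description).
    """
    # Focal length in pixels: half the image width over tan(half the FOV).
    f = (width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    cx, cy = width / 2.0, height / 2.0
    return [
        [f,   0.0, cx],
        [0.0, f,   cy],
        [0.0, 0.0, 1.0],
    ]


K = intrinsics_from_fov(640, 480, fov_deg=60.0)
```

A wrong FOV mainly distorts the recovered translation depth; rotation from PnP is usually less sensitive to it, so the default is a reasonable starting point for uncalibrated webcams.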
Datasets (optional):
- BIWI Kinect Head Pose: app/datasets/get_biwi_sample.py
- LaPa or CelebAMask-HQ: app/datasets/get_lapa_sample.py
Notes:
- RVM is loaded via torch.hub (mobilenetv3). Internet needed on first run; cached after.
- 6DRepNet returns (pitch, yaw, roll). We convert to (yaw, pitch, roll) for consistency.
- Metric translation (x,y,z) comes from PnP or depth; 6DRepNet is rotation-only.
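
The angle-order and rotation-only notes above can be illustrated with a small pose container. This is a sketch under stated assumptions: `HeadPose`, `pose_from_6drepnet`, and `pose_from_pnp` are hypothetical names, not identifiers from this repo. The only facts taken from the notes are that 6DRepNet output arrives as (pitch, yaw, roll) and carries no translation.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class HeadPose(NamedTuple):
    """Pose in the (yaw, pitch, roll) order used for consistency."""
    yaw: float
    pitch: float
    roll: float
    # None means rotation-only (no metric translation available).
    translation: Optional[Tuple[float, ...]] = None


def pose_from_6drepnet(pitch: float, yaw: float, roll: float) -> HeadPose:
    # Input order is (pitch, yaw, roll); reorder to (yaw, pitch, roll).
    return HeadPose(yaw=yaw, pitch=pitch, roll=roll)


def pose_from_pnp(yaw: float, pitch: float, roll: float,
                  tvec: Sequence[float]) -> HeadPose:
    # PnP supplies translation as well, so keep it alongside the angles.
    return HeadPose(yaw, pitch, roll, tuple(float(v) for v in tvec))


fallback = pose_from_6drepnet(5.0, -12.0, 1.5)
full = pose_from_pnp(-12.0, 5.0, 1.5, [0.0, 0.1, 0.6])
```

Using named fields rather than bare tuples makes it harder to mix up the two angle orders, and an explicit `None` translation lets downstream code detect when only rotation is available.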