Holistic landmarks
Per frame, MediaPipe Holistic yields 21 landmarks for each of two hands, 33 body landmarks, and 468 face landmarks, fused into a single feature record.
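As a rough illustration of what "a single feature record" could look like, here is a minimal sketch that flattens the four landmark groups into one fixed-length vector. The `FrameRecord` name, the (x, y, z) layout, and the choice to zero-fill a part that was not detected are assumptions made for this example, not BridgeTalk's actual data structure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, z); layout assumed for this sketch

HAND_POINTS = 21   # per hand
POSE_POINTS = 33
FACE_POINTS = 468


@dataclass
class FrameRecord:
    """Hypothetical per-frame container; None means the part was not detected."""
    left_hand: Optional[List[Point]]
    right_hand: Optional[List[Point]]
    pose: Optional[List[Point]]
    face: Optional[List[Point]]

    def to_vector(self) -> List[float]:
        """Flatten all parts into one fixed-length list of floats."""
        parts = [
            (self.left_hand, HAND_POINTS),
            (self.right_hand, HAND_POINTS),
            (self.pose, POSE_POINTS),
            (self.face, FACE_POINTS),
        ]
        vector: List[float] = []
        for points, expected in parts:
            if points is None:
                # Assumption: pad missing parts with zeros to keep the length fixed.
                vector.extend([0.0] * (expected * 3))
                continue
            if len(points) != expected:
                raise ValueError(f"expected {expected} points, got {len(points)}")
            for x, y, z in points:
                vector.extend((x, y, z))
        return vector
```

With all four parts present the vector holds (21 × 2 + 33 + 468) × 3 = 1,629 values, so downstream code can rely on a constant length even when a hand leaves the frame.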
BridgeTalk fuses hand tracking, body pose, and facial landmarks to recognize location-anchored signs — touching the chin for thank you, the chest for me, the ear for hear. Real signing is more than gestures in the air; we read where the hand meets the body.
Location zones cover the chin, forehead, mouth, nose, ears, cheeks, temples, chest, shoulders, neutral space, and lap. Each zone is sized relative to the signer's face width.
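One way to size zones by face width is to give each zone a radius that is a fixed multiple of the measured width. The sketch below assumes circular zones in normalized image coordinates; the `ZONE_SCALE` factors, the `Zone` type, and the function names are illustrative guesses, not values taken from BridgeTalk.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, Optional, Tuple

Point2D = Tuple[float, float]


@dataclass(frozen=True)
class Zone:
    name: str
    center: Point2D
    radius: float


# Hypothetical radius factors, expressed as multiples of the face width.
ZONE_SCALE = {"chin": 0.35, "forehead": 0.40, "chest": 0.80}


def face_width(left_edge: Point2D, right_edge: Point2D) -> float:
    """Distance between the two outer face-edge landmarks."""
    return hypot(right_edge[0] - left_edge[0], right_edge[1] - left_edge[1])


def build_zones(anchors: Dict[str, Point2D], width: float) -> Dict[str, Zone]:
    """Create one circular zone per anchor, scaled by the face width."""
    if width <= 0:
        raise ValueError("face width must be positive")
    return {
        name: Zone(name, center, ZONE_SCALE[name] * width)
        for name, center in anchors.items()
        if name in ZONE_SCALE
    }


def zone_at(point: Point2D, zones: Dict[str, Zone]) -> Optional[str]:
    """Return the zone containing the point, preferring the relatively closest one."""
    best_name: Optional[str] = None
    best_ratio = 1.0  # distance / radius; values above 1 are outside the zone
    for zone in zones.values():
        distance = hypot(point[0] - zone.center[0], point[1] - zone.center[1])
        ratio = distance / zone.radius
        if ratio <= best_ratio:
            best_name, best_ratio = zone.name, ratio
    return best_name
```

Because every radius scales with the face, the same zone definitions should hold whether the signer sits close to the camera or far from it.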
Touch detection combines proximity with dwell tracking, which distinguishes a sustained touch from an accidental pass-through.
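Dwell tracking can be sketched as a small state machine: a touch only counts once the hand has stayed in the same zone for a minimum hold time. The `DwellTracker` class and the 0.3-second default below are assumptions for illustration; the real hold time is whatever the settings panel is set to.

```python
from typing import Optional


class DwellTracker:
    """Reports a zone once the hand has stayed in it for hold_seconds."""

    def __init__(self, hold_seconds: float = 0.3) -> None:
        self.hold_seconds = hold_seconds
        self._zone: Optional[str] = None
        self._entered_at = 0.0
        self._fired = False

    def update(self, zone: Optional[str], now: float) -> Optional[str]:
        """Feed the zone under the hand for this frame (None if no zone)."""
        if zone != self._zone:
            # Zone changed: restart the dwell timer.
            self._zone = zone
            self._entered_at = now
            self._fired = False
            return None
        held_long_enough = now - self._entered_at >= self.hold_seconds
        if zone is not None and not self._fired and held_long_enough:
            self._fired = True  # report each sustained touch only once
            return zone
        return None
```

A hand that sweeps past the chin for a frame or two resets the timer when it leaves, so only a touch held past the threshold is reported, and it is reported once.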
Location, motion, and shape signs combined: thank you, eat, hear, think, sorry, me, you, plus numbers and fingerspelling fallback.
Auto-spaced output with capitalization, punctuation, undo and clear. Read aloud with the device's best voice. History saved locally.
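The output behavior described above (automatic spacing, capitalization, undo, and clear) could be modeled roughly as follows. The `Transcript` class and its rules are a simplified guess at the behavior, not the app's actual text engine.

```python
from typing import List

SENTENCE_END = {".", "!", "?"}
PUNCTUATION = SENTENCE_END | {","}


class Transcript:
    """Collects recognized tokens and renders them as readable text."""

    def __init__(self) -> None:
        self._tokens: List[str] = []

    def add(self, token: str) -> None:
        self._tokens.append(token)

    def undo(self) -> None:
        """Remove the most recent token, if any."""
        if self._tokens:
            self._tokens.pop()

    def clear(self) -> None:
        self._tokens.clear()

    def text(self) -> str:
        out = ""
        capitalize_next = True  # the first word starts a sentence
        for token in self._tokens:
            if token in PUNCTUATION:
                out += token  # punctuation attaches without a leading space
                capitalize_next = token in SENTENCE_END
                continue
            word = token[:1].upper() + token[1:] if capitalize_next else token
            out += (" " if out else "") + word
            capitalize_next = False
        return out
```

Rendering from the token list on every call keeps undo trivial: removing the last token and re-rendering gives the corrected text without any string surgery.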
No server uploads, no account. Holistic runs locally; the only component outside the browser is the optional alphabet model, which runs on a Python backend on your own machine.
Settings panel lets you adjust confidence, hold-time, cooldown, overlay layers, TTS speed, and audio feedback live.
The alphabet model is a small RandomForest. Train it on real keypoint data with one script and drop the pickle into models/.
This is a real recognition engine, not a research demo of full sign-language translation. There is no pretrained, browser-ready model in 2026 that recognizes general ASL or PSL conversation; the state of the art needs server-side GPUs and still tops out around 60–70% accuracy on a few thousand isolated signs.
What you get here: ~90 signs that work reliably because their multimodal signature (handshape × location × motion) is unambiguous. The recognizer is built so a trained model can be plugged in later via a one-line hook. See the README for the roadmap.
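To make the "one-line hook" idea concrete, here is a minimal sketch of a recognizer that consults an optional model before falling back to its built-in rules. The `Recognizer` class, the `model_hook` attribute, the confidence threshold, and the model-first ordering are all assumptions for illustration; the README describes the actual hook.

```python
from typing import Callable, List, Optional, Tuple

Prediction = Tuple[str, float]  # (sign label, confidence between 0 and 1)
ModelHook = Callable[[List[float]], Optional[Prediction]]
RuleEngine = Callable[[List[float]], Optional[str]]


class Recognizer:
    """Rule-based recognizer with an optional trained-model hook (hypothetical)."""

    def __init__(self, rules: RuleEngine, min_confidence: float = 0.6) -> None:
        self.rules = rules
        self.min_confidence = min_confidence
        self.model_hook: Optional[ModelHook] = None  # assign to plug in a model

    def recognize(self, features: List[float]) -> Optional[str]:
        if self.model_hook is not None:
            prediction = self.model_hook(features)
            if prediction is not None and prediction[1] >= self.min_confidence:
                return prediction[0]
        # No model, or the model was not confident enough: use the rules.
        return self.rules(features)
```

Under these assumptions, plugging in a model is a single assignment such as `recognizer.model_hook = my_model_predict`, and removing it restores the purely rule-based behavior.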