Speech recognition is unreliable in noisy places, compromises privacy and security when used around strangers, and is inaccessible to people with speech disorders. Lip reading can mitigate many of these challenges, but existing silent speech recognizers for lip reading are error-prone. Developing new recognizers and acquiring new datasets is impractical for many, since it requires an enormous amount of time, effort, and other resources. To address these challenges, we first develop LipType, an optimized version of LipNet with improved speed and accuracy. We then develop an independent repair model that preprocesses video input to compensate for poor lighting conditions, when applicable, and corrects potential errors in the output for increased accuracy. We tested this model with both LipType and other speech and silent speech recognizers to demonstrate its effectiveness.
https://doi.org/10.1145/3411764.3445565
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2021.acm.org/)