We consider the problem of gaze prediction using a simple webcam, for use with an application that screens for dyslexia. This specific use case lets us define additional constraints that simplify the problem. At a high level, the problem consists of two subproblems. The first requires us to detect the user's face, find its landmarks, and finally locate the eye center. We solve this as a computer vision problem. The second subproblem is mapping these features to a point on the screen. For this we use a deep neural network. To optimize results, we perform experiments on how to calibrate most efficiently. In the best test case we achieved an average prediction error of 3.37 cm. This makes our approach at best conditionally suitable for use with the target application.
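To make the reported figure concrete, the following is a minimal sketch of how an average prediction error in centimeters could be computed. It assumes the error is the Euclidean distance between the predicted and the true gaze point on the screen, which the text above does not state explicitly; the function name `mean_error_cm` and the sample coordinates are hypothetical and are not taken from our experiments.

```python
import math


def mean_error_cm(predicted, actual):
    """Average Euclidean distance between paired (x, y) points, in cm.

    Assumption: both lists hold screen coordinates already converted to cm.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty point lists of equal length")
    total = sum(math.dist(p, a) for p, a in zip(predicted, actual))
    return total / len(predicted)


# Hypothetical sample points, for illustration only.
predicted = [(3.0, 4.0), (10.0, 10.0)]
actual = [(0.0, 0.0), (10.0, 13.0)]
print(mean_error_cm(predicted, actual))
```

Averaging per-sample distances, rather than averaging the x and y errors separately, keeps the number directly interpretable as how far the predicted point lands from the true gaze target on the screen.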