Table of Contents
- Face Detection with Landmarks and Emotion Recognition
- APIs’ Capabilities
- Implementation
- Technical Challenges
- Try it
- References
Face Detection with Landmarks and Emotion Recognition -> Try it!
This project demonstrates a comprehensive web-based face detection and emotion recognition system using JavaScript’s face-api.js
library. The system runs entirely in the browser, providing real-time face analysis capabilities including detection, landmark identification, and emotion classification.
Key Features:
- Face detection: Gives a bounding box for every face detected.
- Face landmarks recognition: Gets the coordinates of the eyes, ears, cheeks, nose, and mouth of every face detected.
- Emotion recognition: Determines whether a person is happy, sad, angry, disgusted, fearful, or surprised.
- Track faces across video frames: Gets an identifier for each unique detected face. The identifier is consistent across invocations.
- Process video frames in real time: Face detection is performed on the device and is fast enough to be used in real-time applications.
APIs’ capabilities
Face detection
The most accurate face detector is an SSD (Single Shot Multibox Detector), which is basically a CNN based on MobileNet V1 with some additional box prediction layers stacked on top of the network. Furthermore, face-api.js implements an optimized Tiny Face Detector, basically an even tinier version of Tiny Yolo v2 that uses depthwise separable convolutions instead of regular convolutions, making it a much faster, but slightly less accurate, face detector than SSD MobileNet V1. The networks return the bounding boxes of each face with their corresponding scores, i.e. the probability of each bounding box showing a face.
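Whichever detector you choose is selected by passing the corresponding options object to the detection call. A minimal sketch (the option values shown here are illustrative, not tuned recommendations):
// SSD MobileNet V1: more accurate but slower; drop boxes below a confidence threshold
const ssdOptions = new faceapi.SsdMobilenetv1Options({ minConfidence: 0.5 })
// Tiny Face Detector: faster, slightly less accurate; a smaller inputSize trades accuracy for speed
const tinyOptions = new faceapi.TinyFaceDetectorOptions({ inputSize: 416, scoreThreshold: 0.5 })
Either options object can then be passed to faceapi.detectAllFaces, as shown in the Implementation section below.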
Face Landmarks
For that purpose face-api.js implements a simple CNN, which returns the 68 point face landmarks of a given face image:
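As a small illustrative sketch (assuming result is one element of the detections array produced in the Implementation section below), the individual landmark groups can be read off the result:
const landmarks = result.landmarks      // FaceLandmarks68 object
const allPoints = landmarks.positions   // all 68 {x, y} points
const leftEye = landmarks.getLeftEye()  // points outlining the left eye
const nose = landmarks.getNose()        // points outlining the nose
const mouth = landmarks.getMouth()      // points outlining the mouth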
Face expressions
The face expression recognition model is lightweight, fast and provides reasonable accuracy. The model has a size of roughly 310kb and it employs depthwise separable convolutions and densely connected blocks. It has been trained on a variety of images from publicly available datasets as well as images scraped from the web. Note, that wearing glasses might decrease the accuracy of the prediction results.
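As a rough sketch (again assuming result is one element of the detections array obtained with .withFaceExpressions() below), the per-emotion probabilities can be inspected and the most likely one picked out:
const expressions = result.expressions   // maps each expression (happy, sad, angry, ...) to a probability
const [topExpression, probability] = Object.entries(expressions)
  .sort((a, b) => b[1] - a[1])[0]        // sort by probability, take the highest
console.log(topExpression + ': ' + (probability * 100).toFixed(1) + '%')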
Implementation
Including the Script
First of all, get the latest build from dist/face-api.js or the minified version from dist/face-api.min.js and include the script:
<script src="face-api.js"></script>
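Alternatively, if your app uses a bundler, face-api.js is also published on npm and can be pulled in as a module (a minimal sketch):
// npm install face-api.js
import * as faceapi from 'face-api.js';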
Loading the Model Data
Depending on the requirements of your application you can load only the models you need, but to run the full end-to-end example we need the face detection, face landmark, and face expression models. The model files can simply be provided as static assets in your web app, or you can host them somewhere else and load them by specifying the route or URL to the files. Let's say you are providing them in a models directory along with your assets under public/models:
Promise.all([
    faceapi.nets.tinyFaceDetector.loadFromUri('/models'),
    faceapi.nets.faceLandmark68Net.loadFromUri('/models'),
    faceapi.nets.faceExpressionNet.loadFromUri('/models')
]).then(startVideo)
Making predictions
The neural nets accept HTML image, canvas, or video elements, as well as tensors, as inputs.
const detections = await faceapi.detectAllFaces(video, new faceapi.TinyFaceDetectorOptions()).withFaceLandmarks().withFaceExpressions()
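If only a single face is expected, the same chain works with the single-face variant of the call (a small sketch):
const result = await faceapi
  .detectSingleFace(video, new faceapi.TinyFaceDetectorOptions())
  .withFaceLandmarks()
  .withFaceExpressions()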
Displaying Detection Results
Preparing the overlay canvas:
const canvas = faceapi.createCanvas(video)
document.body.append(canvas)
const displaySize = {width: video.width, height: video.height}
faceapi.matchDimensions(canvas, displaySize)
face-api.js predefines some high-level drawing functions you can utilize. The raw detections first have to be resized to the display size of the canvas:
const resizedDetections = faceapi.resizeResults(detections, displaySize)
faceapi.draw.drawDetections(canvas, resizedDetections)
faceapi.draw.drawFaceLandmarks(canvas, resizedDetections)
faceapi.draw.drawFaceExpressions(canvas, resizedDetections)
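Putting the pieces together, a common pattern (sketched here rather than copied verbatim from this project) is to run detection on an interval once the video starts playing, clearing the overlay canvas before each redraw:
video.addEventListener('play', () => {
  setInterval(async () => {
    // Detect all faces with landmarks and expressions in the current frame
    const detections = await faceapi
      .detectAllFaces(video, new faceapi.TinyFaceDetectorOptions())
      .withFaceLandmarks()
      .withFaceExpressions()
    // Scale the results to the canvas and redraw the overlay
    const resizedDetections = faceapi.resizeResults(detections, displaySize)
    canvas.getContext('2d').clearRect(0, 0, canvas.width, canvas.height)
    faceapi.draw.drawDetections(canvas, resizedDetections)
    faceapi.draw.drawFaceLandmarks(canvas, resizedDetections)
    faceapi.draw.drawFaceExpressions(canvas, resizedDetections)
  }, 100) // roughly ten detections per second
})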
Technical Challenges
Accessing Your Webcam in HTML
We can communicate with our webcam and access its video stream from our browser with just some JavaScript code. We only need a browser that supports the getUserMedia function.
Two components are essential in getting data from our webcam displayed on our screen:
- The HTML video element
- The JavaScript getUserMedia function
The video element is pretty straightforward in what it does. It is responsible for taking the video stream from your webcam and actually displaying it on the screen.
<video id="video" width="720" height="560" autoplay="true"></video>
By setting the autoplay attribute to true, we ensure that our video starts to display automatically once we have our webcam video stream.
On its own, though, the video element won't show anything yet. That is because we haven't added the JavaScript that ties our video element to the webcam. We'll do that next! The getUserMedia function allows us to do three things:
- Specify whether we want to get video data from the webcam, audio data from a microphone, or both.
- If the user grants permission to access the webcam, specify a success function to call where you can process the webcam data further.
- If the user does not grant permission to access the webcam, or your webcam runs into some other kind of error, specify an error function to handle the error conditions.
// Grab the video element declared in the HTML above
const video = document.getElementById('video');

function startVideo() {
    navigator.getMedia = navigator.getUserMedia ||
                         navigator.webkitGetUserMedia ||
                         navigator.mozGetUserMedia ||
                         navigator.msGetUserMedia;
    // Capture video only (no audio)
    navigator.getMedia({
        video: true,
        audio: false
    }, function(stream) {
        // Success: feed the webcam stream into the video element
        video.srcObject = stream;
        video.play();
    }, function(error) {
        // An error occurred (e.g. the user denied webcam access)
    });
}
For what we are trying to do, we call the getUserMedia function and tell it to only retrieve the video from the webcam. Once we retrieve the video, we tell our success function to send the video data to our video element for display on our screen.
Some error handling:
This error occurs because, in newer versions of Google Chrome, the createObjectURL function no longer accepts a MediaStream object.
I changed this:
video.src=vendorUrl.createObjectURL(stream);
video.play();
to this:
video.srcObject=stream;
video.play();
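For reference, newer browsers expose the same capability through the promise-based navigator.mediaDevices.getUserMedia API, so the vendor-prefixed callback version above could also be written roughly like this:
function startVideo() {
    navigator.mediaDevices.getUserMedia({ video: true, audio: false })
        .then(function(stream) {
            // Feed the webcam stream into the video element
            video.srcObject = stream;
            video.play();
        })
        .catch(function(error) {
            console.error('Could not access the webcam:', error);
        });
}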
Try it
Live Demo
Experience the face detection system in action: Try it live!
Running the Browser Examples
To run the project locally:
- Download this repository
- Open the terminal and navigate to the downloaded repository
- Run the command
caddy file-server --browse --listen :8080
- Browse to http://localhost:8080/
System Requirements:
- Modern web browser with WebRTC support
- Webcam access permissions
- Stable internet connection for model loading
Performance Notes:
- Initial model loading may take 10-15 seconds
- Real-time processing at 15-25 FPS depending on hardware
- Works best with good lighting conditions