More freedom in front of the (web) camera with Speaker Tracking

Speaker Tracking enables a camera to focus on and follow people automatically. It has long been used in many different areas. A classic example is the surveillance of people in security technology. Speaker Tracking is also used in video conferences, presentations and lectures. In the education sector, teachers can use it when transmitting and recording hybrid teaching formats such as lectures and seminars to improve the quality of their digital teaching.

The basic principle of Speaker Tracking

The basic principle of the application is simple. The system uses sensors or software to detect speakers, localize them in real time in the room and focus on them in the camera’s frame.

This can work using purely optical signals or by combining audio and image sources.
Some systems use cameras that can mark objects or a certain area in the field of view and then monitor movements there. Distinctive patterns, shapes or colors can be useful for this purpose. When multiple cameras work together, the position of the object is determined accurately and the image is adjusted accordingly when the object moves. Other systems work with special infrared transmitters that the speakers wear on their bodies so that the cameras can identify them. With such optical methods, a clear line of sight to the object must be maintained at all times for tracking to work.

Systems that combine cameras with a microphone system are also available. The cameras use body and face recognition to detect, for example, all the people in a meeting room. With what is known as autoframing, the image is cropped so that everyone present is clearly visible, within the limits of the manufacturer’s specifications. When a person begins to speak, they are located by their distance to the audio sensors or the microphone, and their position is passed on to the camera. This person is then highlighted in the image.
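The autoframing step can be illustrated with a small sketch. It assumes that person detection has already produced one bounding box per person in pixel coordinates; the names Box and autoframe and the margin parameter are hypothetical and do not come from any particular product. The function returns a crop that keeps the aspect ratio of the full frame and contains everyone detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float


def autoframe(people, frame_w, frame_h, margin=0.1):
    """Return a crop with the frame's aspect ratio that contains all people.

    people: list of Box, assumed to lie inside the frame.
    margin: padding added on each side, as a fraction of the group's size.
    """
    if not people:
        # Nobody detected: fall back to the full frame.
        return Box(0, 0, frame_w, frame_h)

    # Smallest rectangle that encloses every detected person.
    left = min(b.left for b in people)
    top = min(b.top for b in people)
    right = max(b.right for b in people)
    bottom = max(b.bottom for b in people)

    # Add padding so nobody touches the edge of the crop.
    w = (right - left) * (1 + 2 * margin)
    h = (bottom - top) * (1 + 2 * margin)

    # Grow the shorter side until the crop matches the frame's aspect ratio.
    aspect = frame_w / frame_h
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect

    # Never crop larger than the frame itself.
    scale = min(1.0, frame_w / w, frame_h / h)
    w, h = w * scale, h * scale

    # Center on the group, then shift the crop back inside the frame.
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x0 = min(max(cx - w / 2, 0), frame_w - w)
    y0 = min(max(cy - h / 2, 0), frame_h - h)
    return Box(x0, y0, x0 + w, y0 + h)


if __name__ == "__main__":
    group = [Box(600, 300, 800, 700), Box(1000, 350, 1200, 750)]
    print(autoframe(group, 1920, 1080))
```

A real system would also smooth the crop over time so that the image does not jump whenever a detection changes slightly; that is left out here to keep the sketch short.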

Adjusting the framing requires PTZ cameras, which have automated pan, tilt and zoom functions. Depending on the speaker’s position, the camera pans, tilts or zooms to capture them. This happens either through physical movement of the camera or digitally, within the image delivered by the sensor. In a multi-camera setup, the image automatically switches to a better perspective when another person in the room begins to speak. In larger rooms, microphones must be distributed appropriately to ensure consistent sound quality.
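How a speaker's position translates into pan, tilt and zoom can be shown with a little trigonometry. The following is a minimal sketch under stated assumptions: positions are given in meters in a room coordinate system where the camera looks along the +y axis at pan 0, +x is to its right and +z points up. The function name aim_camera and the subject_width parameter are hypothetical; actual PTZ cameras are driven through vendor-specific control protocols that are not shown here.

```python
import math


def aim_camera(camera, target, subject_width=1.5):
    """Return (pan_deg, tilt_deg, fov_deg) needed to frame a target.

    camera, target: (x, y, z) positions in meters (assumed room coordinates).
    subject_width: width in meters that should fill the image horizontally.
    """
    dx = target[0] - camera[0]
    dy = target[1] - camera[1]
    dz = target[2] - camera[2]

    ground = math.hypot(dx, dy)        # horizontal distance to the target
    distance = math.hypot(ground, dz)  # straight-line distance to the target
    if distance == 0:
        raise ValueError("target coincides with the camera position")

    # Pan: rotation around the vertical axis, positive towards +x.
    pan = math.degrees(math.atan2(dx, dy))
    # Tilt: elevation angle, positive when the target is above the camera.
    tilt = math.degrees(math.atan2(dz, ground))
    # Zoom: horizontal field of view that just covers subject_width.
    fov = math.degrees(2 * math.atan2(subject_width / 2, distance))
    return pan, tilt, fov


if __name__ == "__main__":
    # Camera mounted at 2 m height, speaker's head at 1.7 m further into the room.
    pan, tilt, fov = aim_camera((0.0, 0.0, 2.0), (3.0, 3.0, 1.7))
    print(f"pan={pan:.1f} deg, tilt={tilt:.1f} deg, fov={fov:.1f} deg")
```

A narrower field of view corresponds to a higher zoom level. A camera that zooms digitally would apply the same angles as a crop within its wide-angle image instead of moving physically.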

Additional functions also allow several speakers or the entire room to be displayed in the image at the same time. Speaker tracking systems have interfaces to common video management and conference platforms and can be integrated into them.

Improved experience with expert design

With expert design, speaker tracking makes it easier to create video for live streaming and recording and minimizes user effort. Speakers can move freely around the room without having to operate technology on the side or worry about whether they can be seen well in the picture. No second person is needed to operate the camera manually. For recordings intended for later use, the editing effort can also be minimized with appropriate planning.

During a presentation in a meeting, a lecture or a class, the presenter or teacher can concentrate on the content. Being free to move allows them to make the presentation more lively and, for example, to better integrate analog elements such as blackboards. The greater naturalness of the presentation and the focus on the speaker can increase the attention of the remote participants. If several speakers are present, tracking gives viewers a better overview on the screen. This improves the quality of the user experience and of the video recording as a whole.

If you want to use speaker tracking for your meetings or events, please contact us. We will be happy to advise you on all applications and topics related to AV technology, plan the implementation in your premises and help you organize your operations.

Author: Felix Niedrich, Editor macom GmbH