Trees swaying in the wind, camouflage, and shadows can all make it difficult for computer vision to track moving objects. © NuminousDharma/iStock/Getty Images Plus

Teaching machines to see in motion 


An algorithm that uses space and time patterns can spot movement more accurately, even in videos with busy or changing backgrounds.

From self-driving cars to sports broadcasting, from surveillance drones to video conferencing, computer vision increasingly animates our world by detecting, tracking or interacting with moving objects.  

But things can go awry when computer vision is faced with real-world situations. A moving person in camouflage or among swaying trees, under flickering lights or in shadows, can throw computer vision off. And when speed matters, for example in real-time data processing, current approaches often fall short.

That’s where Sajid Javed and his team at Khalifa University’s Department of Computer Science come in. They’ve developed an algorithm that links pixels across space and time during video analysis.

Computer vision hinges on accurately isolating moving objects from a static background, a process called background subtraction. Existing approaches use robust principal component analysis (RPCA) or its more advanced tensor-based variant, TRPCA. Both mathematically separate video data into two parts: a static background and the moving objects.
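To make the idea concrete, here is a minimal sketch in Python of an RPCA-style split, assuming a grayscale clip stored as a NumPy array. The alternating thresholding, the threshold values and the helper name `rpca_background_subtraction` are illustrative stand-ins, not the method from the paper.

```python
import numpy as np

def rpca_background_subtraction(frames, n_iter=50):
    """Toy low-rank + sparse split of a video (illustrative only).

    frames: array of shape (n_frames, height, width), grayscale video.
    Returns (background, foreground) with the same shape as `frames`.
    """
    n_frames, h, w = frames.shape
    # Each frame becomes one column of the data matrix D.
    D = frames.reshape(n_frames, h * w).T.astype(float)

    # Heuristic thresholds, chosen only for this sketch.
    rank_tol = 0.1 * np.linalg.norm(D, 2)      # singular-value threshold
    sparse_tol = 0.05 * np.abs(D).max()        # entry-wise threshold

    L = np.zeros_like(D)   # low-rank part: the static background
    S = np.zeros_like(D)   # sparse part: the moving objects

    for _ in range(n_iter):
        # Low-rank update: soft-threshold the singular values of D - S.
        U, sig, Vt = np.linalg.svd(D - S, full_matrices=False)
        L = (U * np.maximum(sig - rank_tol, 0.0)) @ Vt
        # Sparse update: keep only large residual entries as foreground.
        residual = D - L
        S = np.where(np.abs(residual) > sparse_tol, residual, 0.0)

    return L.T.reshape(n_frames, h, w), S.T.reshape(n_frames, h, w)
```

Run on a short clip, the second returned array highlights the pixels that do not fit the low-rank background, which is the raw material for segmenting moving objects.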

In RPCA, each video frame is treated as a flat 2D image, losing some spatial and temporal detail in the process. TRPCA goes further by processing the video as a multidimensional block of data, called a tensor, that retains height, width, time and color. This preserves more structure, giving a more accurate separation of background and moving objects and better background subtraction.
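The difference in how the two models see the data can be shown with a short illustration; the array sizes below are arbitrary assumptions, not values from the study.

```python
import numpy as np

# A small synthetic color video: 30 frames of 64x48 pixels (illustrative sizes).
video = np.random.rand(30, 48, 64, 3)   # (time, height, width, color)

# RPCA view: each frame is flattened into one column of a 2D matrix, so
# neighboring pixels and color channels are no longer explicitly related.
rpca_matrix = video.reshape(30, -1).T
print(rpca_matrix.shape)                 # (9216, 30)

# TRPCA view: the video stays a 4D tensor, keeping height, width, time
# and color as separate axes that the decomposition can exploit.
trpca_tensor = video
print(trpca_tensor.shape)                # (30, 48, 64, 3)
```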

However, both algorithms have a blind spot: they treat moving objects as scattered pixels. When the scene is complicated—for example, with lighting changes or cluttered backgrounds—the algorithms break the moving objects into pieces instead of recognizing them as coherent shapes. 

The KU team’s algorithm addresses these shortcomings by building spatial and temporal information about the moving objects into the TRPCA model. “It targets the known gap—unstructured sparsity—by enforcing graph-consistent structure exactly where segmentation quality is decided,” Javed says.

The researchers used graph-based models to link pixels spatially within each frame and temporally across frames. Treating moving objects as whole shapes rather than as scattered dots makes their movement look smooth and consistent over time.
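The flavor of such a constraint can be sketched as follows, assuming a foreground estimate stored as a NumPy array; the simple neighbor-averaging step and the weights are assumptions for illustration, not the regularizer used in the paper.

```python
import numpy as np

def graph_smooth_foreground(S, w_spatial=0.4, w_temporal=0.3, n_iter=10):
    """Illustrative spatial-temporal smoothing of a foreground estimate.

    S: array of shape (n_frames, height, width) of raw foreground magnitudes.
    Each iteration nudges every pixel toward the average of its four spatial
    neighbors in the same frame and the same pixel in the adjacent frames,
    which encourages coherent shapes instead of isolated pixels.
    """
    S = S.astype(float).copy()
    for _ in range(n_iter):
        # Average of the four spatial neighbors (edges crudely wrap around;
        # a real graph regularizer would handle boundaries properly).
        spatial = 0.25 * (np.roll(S, 1, axis=1) + np.roll(S, -1, axis=1) +
                          np.roll(S, 1, axis=2) + np.roll(S, -1, axis=2))
        # Average of the previous and next frame for the same pixel.
        temporal = 0.5 * (np.roll(S, 1, axis=0) + np.roll(S, -1, axis=0))
        S = (1 - w_spatial - w_temporal) * S + w_spatial * spatial + w_temporal * temporal
    return S
```

Applied to the sparse component of a decomposition like the one sketched earlier, this kind of spatial and temporal coupling is what keeps a moving object segmented as one connected shape from frame to frame.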

The researchers tested their methods on six publicly available video datasets. They used two approaches: batch processing, which analyzes the whole video at once, and online processing, which handles one frame at a time. The online technique performed well in real-time processing, a must for applications such as self-driving cars. What’s more, their method outperformed 18 other leading techniques on all datasets tested.
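A rough sketch of the difference between the two modes, using toy background models and hypothetical helper names (`process_video`, `make_online_processor`) rather than the authors’ implementation:

```python
import numpy as np

def process_video(frames):
    """Batch mode (toy): model the background from all frames at once."""
    background = np.median(frames, axis=0)
    return frames - background             # residuals = candidate foreground

def make_online_processor(alpha=0.05):
    """Online mode (toy): running-average background updated per frame."""
    background = None

    def process_frame(frame):
        nonlocal background
        frame = frame.astype(float)
        if background is None:
            background = frame.copy()
        # Exponential moving average keeps the model current as the scene drifts.
        background = (1 - alpha) * background + alpha * frame
        return frame - background          # candidate foreground for this frame

    return process_frame

# Usage with a synthetic clip (illustrative sizes).
frames = np.random.rand(100, 48, 64)
batch_fg = process_video(frames)            # sees the whole video at once
online = make_online_processor()
online_fg = [online(f) for f in frames]     # one frame at a time, as in real time
```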

The researchers are working on enhancing the learning abilities of their algorithm to better track scene changes, developing systems that integrate supervised and unsupervised learning, and adapting the system to work with moving cameras.  

Reference

Alawode, B. & Javed, S. Learning spatial-temporal regularized tensor sparse RPCA for background subtraction. IEEE Transactions on Neural Networks and Learning Systems 36, 11034–11048 (2025).
