AI that sees and speaks
Naoufel Werghi, Professor of Computer Science at the Centre for Cyber-Physical Systems, Khalifa University.

Naoufel Werghi aims not only to replicate the human visual system, but to extend its capabilities so that machines can perceive patterns invisible to the eye and process information at scales beyond human capacity.
Tell us about your journey.
I started my PhD working on robots that can ‘see’—machines capable of sensing the environment, analyzing images and making decisions. As I delved deeper, I realized that I was grappling with the same fundamental problems that once preoccupied David Marr, a visionary neuroscientist and the founder of modern computer vision. He believed that for robots to see, we first needed computers to analyze images and understand the context.
When I joined Khalifa University in 2010, just three years after it was established, I saw in it the potential for a great academic career and professional growth, and I wasn’t wrong.
What themes are driving your research right now?
There are three main areas. First, we’re training AI to detect threats in luggage—teaching our algorithms to spot contraband in X-ray images. Second, we’re working on medical imaging, with a focus on detecting the small, subtle patterns of disease in its early stages. Finally, we have partnered with the Abu Dhabi National Oil Company to monitor flaring at oil and chemical plants, tracking incomplete combustion to detect harmful pollutants such as methane and other toxic gases.
How are you taking your research into the real world?
I recently launched a startup called IBSAR Technologies, born out of our work in luggage detection. In our lab, we have the same scanners found in airports, so we can run very realistic simulations. We’ve designed a system that can recognize up to five different threat items with 95% accuracy. To do this, we had to think like smugglers to see how they might hide illegal items. We’ve also built a data set comprising more than 45,000 X-ray images covering 22 items.
“Maybe now we’re only discovering a percent or two of the many things that the brain will inspire us to create in the future.”
Naoufel Werghi
The Khalifa Innovation Centre has offered tremendous support for the startup from the very beginning. They did a fantastic job.
What are the most exciting advances in your field?
The emergence of vision-language models stands out. These are trained on both images and text, and combine the different types of information in their analysis. For example, you could give them a medical image and ask them to describe what’s going on. Being able to associate image and text also means machines are better at explaining their thinking. That’s a groundbreaking shift that moves us closer to true semantic understanding, and a significant step toward explainable AI.
Interestingly, the way these models associate image and text was also inspired by the brain, just like Marr’s first model was. Maybe now we’re only discovering a percent or two of the many things that the brain will inspire us to create in the future.
