AI chatbots need to embrace linguistic diversity
ChatGPT may have taken the world by storm, but its application is being held back in countries such as the UAE, where Arabic is the leading language.
Large language models (LLMs), which are trained on vast text datasets to learn linguistic patterns, have been a source of both contention and fascination since late 2022. Notably, the freely available platform ChatGPT has sparked discussions worldwide about how nations should implement and regulate artificial intelligence (AI).
In April 2023, the United Arab Emirates’ Artificial Intelligence, Digital Economy, and Remote Work Applications Office launched a guide outlining practical application scenarios for generative AI. “The UAE’s leadership has been proactive compared with other countries in taking strategic decisions for LLM and generative AI, as well as in promoting awareness regarding their advantages and potential applications,” says Yahya Zweiri, professor of aerospace engineering and director of the Advanced Research and Innovation Center at Khalifa University.
ChatGPT in the UAE’s manufacturing industry
In the UAE, the scope of applications for this technology extends far beyond chatbots. In collaboration with Strata, a UAE-based company that manufactures critical aircraft parts, Zweiri’s centre at Khalifa University is developing AI agents that can interact with robots in the manufacturing sector.
“Using robotic and autonomous systems in manufacturing often requires experts to operate them. That has kept technicians and system operators from embracing these new technologies,” explains Zweiri. “We are trying to leverage ChatGPT to minimise the workload for skilled operators.”
In particular, ChatGPT is being used to translate verbal instructions from human technicians into code that robots can execute. These instructions could include commands to inspect an aircraft’s tail or to drill parts to specified dimensions. “This makes it much simpler to use robots in industrial contexts and broadens the user base for this technology,” says Zweiri.
Overcoming language barriers
A persistent challenge, however, is the limited ability of major LLMs to work in Arabic. “Large language models mostly learn from information on the internet. Most web content accessed through search engines is thought to be in English, so LLMs would also be best trained in English,” Zweiri explains. “Arabic itself is a diverse language with numerous dialects, and Arabic relies more on pronunciation than on spelling. This makes it hard to train large language models in Arabic,” he says.
The issue was highlighted by Mohamed Seghier, a biomedical engineering professor at Khalifa University’s Healthcare Engineering Innovation Center, in a letter to Nature. He asked ChatGPT about a well-known neuroscience phenomenon in both English and Arabic. “The English version was excellent, but nothing made sense in the Arabic one,” Seghier says.
Arabic ranks as the sixth most spoken language in the world. “When these tools give inaccurate answers in a particular language, the speakers of this language will not have access to the same amount or quality of knowledge as other people, perpetuating inequality,” Seghier says.
Greater fluency and intelligibility in Arabic could be particularly helpful in contexts such as healthcare, he explains. “Stroke patients take speech rehabilitation sessions with therapists when they have lost the ability to speak. But we have a lack of therapists who can do this in Arabic,” he says. “If we can develop chatbots powered by tools such as ChatGPT that can offer this type of treatment, that could be very meaningful for quality of life. We could also build standardised tests for aphasic patients who have severe communication difficulties.”
Zweiri highlights the emergence of new Arabic-focused LLMs such as Jais, which was trained on 116 billion Arabic tokens and 279 billion English tokens. “This could be the first potential Arabic language model with good reference,” he says. “Ongoing research and collaborations are expanding the horizons of what these technologies can achieve, and addressing linguistic diversity remains an intriguing challenge for the future.”