Inception, a G42 company based in Abu Dhabi, released Jais, an open-source Arabic large language model that aims to bring the advantages of generative AI to the Arabic-speaking world. The name is inspired by the UAE's highest peak.
It is a 13-billion-parameter model trained on a newly developed 395-billion-token Arabic and English dataset.
The model is a collaboration between Inception, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and Cerebras Systems. It was trained on Condor Galaxy, the recently announced multi-exaFLOP AI supercomputer built by G42 and Cerebras.
By open-sourcing Jais, Inception aims to engage the scientific, academic, and developer communities to accelerate the growth of a vibrant Arabic language AI ecosystem.
“With this release, we are setting a new standard for AI advancement in the Middle East and ensuring that the Arabic language, with its depth and heritage, finds its voice within the AI landscape,” says Andrew Jackson, CEO of Inception.
The dataset comprises 116 billion Arabic tokens, designed to capture the complexity and nuance of the language, and 279 billion English tokens, included to improve the model's performance through cross-language transfer.
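The two token counts add up to the 395-billion-token total. The short Python check below is an illustration only, not part of Inception's tooling. It uses just the reported figures (in billions of tokens) to show how the training mix splits between the two languages.

```python
# Reported token counts for the Jais training dataset, in billions.
arabic_tokens_b = 116
english_tokens_b = 279

# The two subsets should sum to the reported 395-billion-token total.
total_tokens_b = arabic_tokens_b + english_tokens_b

# Fraction of the dataset contributed by each language.
arabic_share = arabic_tokens_b / total_tokens_b
english_share = english_tokens_b / total_tokens_b

print(f"Total: {total_tokens_b}B tokens")
print(f"Arabic: {arabic_share:.1%}, English: {english_share:.1%}")
```

Running it prints a total of 395B tokens, split roughly 29.4% Arabic and 70.6% English. English makes up the majority of the data, which matches the stated aim of using cross-language transfer to strengthen the model.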
In tandem with the model’s release, Inception and MBZUAI also established an academic partnership to provide early access to current and future Arabic LLMs developed by the team for testing purposes.
MBZUAI President and University Professor Eric Xing said that developing such a high-caliber Arabic LLM demanded not only cutting-edge AI research but also an in-depth, nuanced understanding of the Arabic language, its diversity and heritage, and of the growing importance of LLMs across all echelons of society.