A team of engineers and scientists, working with a Silicon Valley semiconductor manufacturer, has released an advanced Arabic-language model that can power generative AI applications.
The new large language model, called Jais, has 13 billion parameters and was trained on a large dataset mixing Arabic and English, some of it drawn from computer code. According to the group of researchers and engineers behind the project, large bilingual language models remain rare.
The model was developed on supercomputers built by Silicon Valley-based Cerebras Systems, which makes dinner-plate-sized chips that compete with Nvidia's powerful AI hardware. With Nvidia's processors in short supply, businesses around the world are seeking alternatives.
Jais, named after the highest mountain in the United Arab Emirates, is the result of a partnership between Cerebras, the Mohamed bin Zayed University of Artificial Intelligence, and Inception, the AI-focused subsidiary of Abu Dhabi-based technology conglomerate G42.
According to Timothy Baldwin, a professor at the Mohamed bin Zayed University of Artificial Intelligence, there is not enough Arabic data available to train a model of Jais's size, so the computer code included in the English-language data helped train the model's reasoning capabilities.
Because it spells out logical steps, code "gives the model a big leg up in terms of reasoning abilities," Baldwin told Reuters. Jais will be made available under an open-source license.
The group trained Jais on a Condor Galaxy supercomputer owned by Cerebras. Cerebras recently announced it had sold three of the machines to G42; the first unit is expected to be delivered this year, and the remaining two in 2024.