Exclusive: AI models from Meta, OpenAI, and others fall short of compliance with EU regulations and face fines of up to $38 million

A new tool designed to audit compliance with the European Union's upcoming AI Act shows that some of the most popular artificial intelligence models fall well short of its requirements. The checker, developed by Swiss startup LatticeFlow AI with research partners ETH Zurich and Bulgaria's INSAIT, tested generative AI models from leading tech firms such as OpenAI, Meta and Alibaba.

The results indicate that while several models scored well overall, none is yet fully compliant, with cybersecurity resilience and the risk of discriminatory output emerging as the main problem areas. Debates about new AI regulation started long before OpenAI launched ChatGPT in late 2022, but within months of its release lawmakers moved to draft special rules for "general-purpose" AI (GPAI). The EU AI Act, due to be rolled out over the coming two years, has already drawn intense interest from the tech world.

The framework the partners developed, the Large Language Model (LLM) Checker, assigns each model scores ranging from 0 to 1 across a range of categories, including technical robustness and safety. A leaderboard published on October 14 showed that models from Alibaba, Anthropic, OpenAI, Meta and Mistral all received average scores of 0.75 or higher.
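The article does not say how the LLM Checker combines its per-category scores into the leaderboard figure; assuming, purely for illustration, that the average is a simple mean of the 0-to-1 category scores, the arithmetic looks like this (the category names and numbers below are made up):

```python
# Hypothetical illustration only: per-category scores between 0 and 1,
# aggregated with a simple mean (the real aggregation is not specified).
category_scores = {
    "technical robustness": 0.80,
    "safety": 0.78,
    "discriminatory output": 0.70,
    "cybersecurity resilience": 0.72,
}

average = sum(category_scores.values()) / len(category_scores)
print(round(average, 2))  # 0.75 -- exactly at the 0.75 mark cited in the piece
```

Under this reading, a model can clear the 0.75 bar on average while still scoring poorly in an individual category, which is exactly the pattern the test results below describe.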

But the test also uncovered critical blind spots. In a test for discriminatory output, OpenAI's "GPT-3.5 Turbo" scored only 0.46, and Alibaba's "Qwen1.5 72B Chat" did even worse with a score of 0.37. In a test for "prompt hijacking", a type of cyber attack, Meta's "Llama 2 13B Chat" scored 0.42, while Mistral's "8x7B Instruct" scored 0.38. Despite these weak spots across the field, Anthropic's "Claude 3 Opus" came out on top with an average score of 0.89.

Commenting on the results, LatticeFlow CEO and co-founder Petar Tsankov said, “The general results are good, but some areas demonstrate much-needed improvement.” He added, “The EU is still working out all the compliance benchmarks, but we can already see some gaps in the models.”

He further said, “With greater focus on optimising for compliance, we believe model providers can be well-prepared to meet regulatory requirements.”

LatticeFlow's LLM Checker is freely available for companies to test their models. Enforcement measures are still being refined, but the test already gives an early indication of where tech companies are likely to fall short. Non-compliance with the AI Act can lead to fines of up to 35 million euros or 7% of worldwide annual turnover. According to wire reports, the European Commission has welcomed the new tool as a "first step" toward putting the technical requirements of the AI Act into practice.
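As a rough illustration of how the penalty cap cited above works: under the AI Act, the maximum fine is the greater of 35 million euros or 7% of a company's worldwide annual turnover. A minimal sketch (the function name and the turnover figures are illustrative, not from the article):

```python
def max_ai_act_fine(global_turnover_eur: float) -> float:
    """Illustrative only: the AI Act caps fines at the greater of
    EUR 35 million or 7% of worldwide annual turnover."""
    FLAT_CAP_EUR = 35_000_000
    TURNOVER_SHARE = 0.07
    return max(FLAT_CAP_EUR, TURNOVER_SHARE * global_turnover_eur)

# For a firm with EUR 1 billion in turnover, 7% (EUR 70M) exceeds the flat cap.
print(max_ai_act_fine(1_000_000_000))  # 70000000.0
# For a smaller firm with EUR 100 million in turnover, the flat cap applies.
print(max_ai_act_fine(100_000_000))   # 35000000
```

This is why, for the largest model providers, the turnover-based figure rather than the flat 35-million-euro cap is the binding limit.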