The Open Source Initiative (OSI) has published its official definition of “open” artificial intelligence (AI), which could put it at odds with major technology firms such as Meta, whose models do not meet the new criteria.
According to the OSI, for an AI system to qualify as genuinely open source, it must disclose enough detail about its training data for others to recreate a substantially equivalent system, and it must include the complete source code used to build and run the AI. The model’s parameters, its weights and training settings, must also be made available so that results reported in the model’s release paper can be reproduced.
This definition challenges Meta’s Llama, which the company markets as the largest open source AI model. While Llama is publicly available, its licence restricts commercial use for applications exceeding 700 million users, and Meta does not disclose the training data; it therefore fails to meet OSI’s criteria for unrestricted use, modification, and sharing.
Open source AI definitions multiplying
For 25 years, OSI’s definition of open-source software has been widely accepted by developers looking to build on each other’s work without legal concerns. The Linux Foundation has also attempted to define “open-source AI” in recent months, reflecting ongoing discussion within the open-source community about how to adapt its values to the plethora of AI model types.
Stefano Maffulli, OSI’s executive director, told The Verge that refining this definition took two years of consultation with global experts, including academics in machine learning, philosophers, and content creators from the Creative Commons world.
While Meta cites safety concerns for restricting access to training data, critics argue that the company aims to minimise legal liability and protect its competitive advantage. Reports indicate that many AI models may be trained on copyrighted material.
Meta has reportedly acknowledged the presence of copyrighted content in its training data. Multiple lawsuits have been filed against Meta, OpenAI, Perplexity, Anthropic, and other firms for alleged copyright infringement. In most cases, plaintiffs must rely on circumstantial evidence to show that their material was improperly sourced.