New OSI rules redefine open source AI
The Open Source Initiative has updated its definition of open source AI, requiring transparency about training data. The new standard calls into question the openness of models such as Meta’s Llama, which do not currently meet its guidelines.
Key points:
- The OSI requires access to training data to consider an AI truly open source.
- The definition challenges models like Meta’s Llama, which restricts commercial use and does not disclose its training data.
- Meta argues that there is no single definition of open source AI and emphasizes the complexity of the issue.
- Concerns about copyright and data protection could influence the choices of large technology companies.
The Open Source Initiative (OSI) recently drafted a new definition of open source AI, setting standards that demand greater transparency about the data used to train models. Under the new definition, an AI system must provide access not only to the source code but also to details of the training data and the parameters used, a significant step in the current technology landscape. Until now, models such as Meta’s Llama were held up as examples of open source AI, but under the new OSI guidelines such claims may be called into question. Llama, although freely available for download and use, carries commercial restrictions and does not disclose information about its training data, falling short of the criteria OSI has defined for true openness.
Meta responded to the new definition, arguing that there is no single definition of open source AI and that pinning one down is complex given how quickly AI models evolve. According to Meta spokesperson Faith Eischen, the company shares some positions with the OSI, but clear differences remain, especially over how to make artificial intelligence more accessible without compromising security or its competitive advantage.
In recent years, the OSI has seen growing interest in how open source applies to AI. Meta is not alone in this situation: other companies, such as OpenAI and Anthropic, are embroiled in litigation over alleged copyright infringement involving the data used to train their models. With pressure mounting from experts and open source advocates, the technology community is beginning to demand greater accountability. Simon Willison, an independent researcher, pointed out that a clear definition of open source AI could help combat “open washing,” a practice in which companies call themselves open source without truly adhering to openness principles.
Clément Delangue, CEO of Hugging Face, praised the OSI definition, considering it an important contribution to the debate on openness in AI and the crucial role of training data. Stefano Maffulli, executive director of OSI, also pointed out that the definition was developed through a two-year collaborative process, involving experts from various fields, from philosophy to data science.
While Meta cites security concerns as its reason for not making training data public, critics say the real motivation may be to reduce legal liability and preserve a competitive advantage. Many of the AI models currently in use are believed to have been trained on copyrighted content, and litigation is already emerging on this front. Maffulli compared the current situation to Microsoft’s stance in the 1990s, when the company saw open source as a threat to its business model. History seems to be repeating itself, with tech giants citing economics and complexity to justify keeping their technology under lock and key.
Transparency and openness have become central issues in the AI debate, and the future of open technologies may depend on the choices the big companies make.