AI for Good blog

To share or not to share: the dilemma of open source vs. proprietary large language models 


by Haythem Abdelkefi

Discover the full AI Governance Day 2024 Report – From Principles to Implementation here.

 Panelists: 

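  • Jim Zemlin, Executive Director of the Linux Foundation
  • Melinda Claybaugh, Director of Privacy Policy at Meta
  • Isabella Hampton, Policy Researcher at the Future of Life Institute
  • Melike Yetken Krilla, Head of International Organizations at Google
  • Chris Albon, Director of Machine Learning at the Wikimedia Foundation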

Moderator: 

  • Bilel Jamoussi, Deputy to the Director and Chief of Telecommunication Standardization Policy Department at the International Telecommunication Union (ITU) 

The open source philosophy 

Mr. Jim Zemlin, representing the Linux Foundation, emphasized the foundational role of open source in modern technology.  

“Open source has been a fundamental building block for all modern technology systems.” (Jim Zemlin)

Jim Zemlin, Executive Director of the Linux Foundation

He highlighted that 80% to 90% of the code in any modern computing system is open source (Linux Foundation and Harvard’s Census II Study; Boston Consulting Group (BCG) Report; PhoenixNAP’s Software Composition Analysis). Mr. Zemlin pointed out that large language models would not exist without PyTorch and other open source tools and components.

However, Mr. Zemlin acknowledged the challenges, particularly market consolidation. He called for standards that define what constitutes an open large language model.

Meta’s approach to open source 

Meta, a company known for its contributions to both open source and proprietary AI, was represented by Ms. Melinda Claybaugh. She underscored Meta’s commitment to open source while recognizing the need for a nuanced approach.

“What we really want to convey is that this is not binary […] there’s actually a real spectrum.” (Melinda Claybaugh) 

  

Melinda Claybaugh, Director of Privacy Policy at Meta

Meta’s approach includes releasing model weights while keeping training data proprietary. Ms. Claybaugh emphasized Meta’s commitment to responsible open sourcing, including rigorous testing and the release of responsible use guides for developers.

“For us, a responsible open approach is all the kind of testing that are done from the data collection stage, filtering data, doing risk assessments and mitigations along the way.” (Melinda Claybaugh) 

Ethical considerations 

From an ethical standpoint, Ms. Isabella Hampton of the Future of Life Institute discussed the implications of keeping LLMs proprietary versus making them open source. She argued that open source should be viewed as a means to an end, not an end in itself.

“Open source is a tool that we can leverage to accomplish our goals.” (Isabella Hampton) 

Ms. Hampton highlighted the importance of maintaining a focus on transparency, competition, and safety in the development of these models. 

   

Isabella Hampton, Policy Researcher at the Future of Life Institute

Google’s view 

Ms. Melike Yetken Krilla of Google recognized both the benefits and the risks of open source models. She shared Google’s history of open source contributions, such as the Transformer architecture and the AlphaFold protein structure prediction system.

“There is a balance needed in the regulatory action between embracing and allowing some of this innovation while ensuring competition and doing so together” (Melike Yetken Krilla) 

Ms. Krilla advocated for a thoughtful and gradual approach to releasing models, with safety testing and commitments to avoid harm.  

“First, we’re looking at safety testing in advance […] and then identifying how and at what level to release.” (Melike Yetken Krilla) 

  

Melike Yetken Krilla, Head of International Organizations at Google 

Wikipedia’s open content model 

Mr. Chris Albon from the Wikimedia Foundation highlighted the role of open content in broadening access to knowledge. He underscored the importance of transparency and the community-driven model of Wikipedia.  

“Wikipedia is one of the best things the internet ever created, a huge pool of information created by humans” (Chris Albon) 

Mr. Albon noted that integrating open source models into platforms like Wikipedia adds value by providing tools for better content moderation and accuracy. However, he stressed the need to credit the original sources.

  

Chris Albon, Director of Machine Learning at the Wikimedia Foundation

Governance and regulation 

Turning to the potential positive and negative impacts of open source LLMs, the panelists discussed the governance frameworks and policies needed. Mr. Zemlin emphasized the importance of placing the regulatory burden on those best equipped to handle it.

“Put the regulatory burden on those who are most equipped to handle it: upstream.” (Jim Zemlin)

Ms. Claybaugh called for a nuanced approach that recognizes the reality of the open source ecosystem.

 “I think we really need to avoid a kind of blanket approach to regulation.” (Melinda Claybaugh) 

Ms. Hampton expressed optimism about initiatives like the National AI Research Resource (NAIRR), which aims to provide resources for safety research.

Ms. Krilla highlighted the importance of collaboration on standardization, involving governments, civil society, and businesses.

 “We’re thinking very thoughtfully about how we are looking at releasing these models and to whom.” (Isabella Hampton) 

The panelists agreed on the need for a balanced and nuanced approach to the open source versus proprietary debate. They emphasized that open source and proprietary models each have their place, depending on the specific context and goals. The discussion underscored the critical role of open source in fostering innovation, ensuring transparency, and preventing market consolidation, while also recognizing the need for responsible governance and collaboration to address potential risks.

 

