Google Compares Gemini to Claude: Differences in Safety Protocols Emerge | Turtles AI
Using a competing AI model for benchmarking raises ethical and legal questions. Google is evaluating Gemini against Anthropic’s Claude, noting significant differences in performance and safety protocols.
Key Points:
- Google compares Gemini to Claude to gauge its responses.
- Claude applies stricter safety standards than Gemini.
- Contractors report issues in testing Gemini on sensitive topics.
- It is unclear whether Google has obtained Anthropic’s approval.
As the race to develop advanced AI models continues, Google has launched a series of internal tests benchmarking its Gemini model against Claude, developed by Anthropic, a company in which Google has invested. The tests are conducted by contractors who analyze the responses provided by the two systems against criteria such as veracity, accuracy, and level of detail. Evaluators are given up to 30 minutes per prompt to assess the responses methodically. However, the use of a competing model as a benchmark is raising questions about compliance with contractual terms and the ethics of the process.
According to internal correspondence seen by TechCrunch, Google contractors noticed explicit references to Claude within the internal platform used to compare it with Gemini. Some output submitted for evaluation was clearly labeled “I am Claude, created by Anthropic.” These details suggest that Google is using Anthropic’s model to evaluate its own system, although the company has not clarified whether it has Anthropic’s permission to do so.
Gemini’s and Claude’s outputs show significant differences, particularly when it comes to safety protocols. Claude tends to refuse potentially risky requests, reflecting stricter safety settings than Gemini’s. In contrast, contractors deemed some of Gemini’s responses problematic, in certain cases flagging them as serious safety violations for including inappropriate content. For example, Claude declined to answer a prompt it deemed unsafe, while Gemini produced a response that included references to sensitive and inappropriate topics.
These assessments also highlight the operational challenges faced by contractors, who often find themselves having to analyze Gemini responses in areas outside their expertise, such as healthcare. This practice has raised internal concerns, as AI-generated inaccuracies on sensitive topics could have significant consequences.
Legally, Anthropic’s commercial terms of service explicitly prohibit using Claude to build or train competing models without prior approval. Google spokeswoman Shira McNamara said that DeepMind, the division responsible for developing Gemini, does not use Anthropic models to train Gemini, although she acknowledged that comparing outputs with competing models is common practice in the industry.
With AI development practices increasingly under scrutiny, leading companies like Google face growing questions about their methodologies. How tech giants conduct benchmarking and honor contractual agreements is a critical issue for ensuring transparency and fairness in the industry.
In an ever-changing landscape, methodological rigor and adherence to ethical and contractual standards are key to responsible AI development.