Researchers claim DeepSeek easier to jailbreak than rivals

DeepSeek’s R1 AI model is more easily jailbroken than other AI models, researchers have claimed. (Photo: Shutterstock)

DeepSeek’s R1 model has been identified as significantly more vulnerable to jailbreaking than models developed by OpenAI, Google, and Anthropic, according to testing conducted by AI security firms and the Wall Street Journal. Researchers were able to manipulate R1 to produce harmful content, raising concerns about its security measures.

Jailbreaking, a method used to bypass built-in safety protocols in AI systems, typically involves rephrasing requests or using indirect prompts to elicit restricted information. While major AI developers have invested in security measures to prevent such exploits in the wake of several controversies surrounding jailbroken models, researchers found that DeepSeek’s R1 could be compromised more easily than comparable models. This vulnerability allows access to instructions related to bioweapons, self-harm content, and other illicit activities.

Security experts from Palo Alto Networks’ Unit 42, CalypsoAI, and Israeli firm Kela conducted tests on R1. Unit 42 obtained detailed instructions for creating a Molotov cocktail, CalypsoAI received advice on evading law enforcement, and Kela prompted the model to generate malware.

“DeepSeek is more vulnerable to jailbreaking than other models,” said Unit 42’s senior vice president Sam Rubin. “We achieved jailbreaks at a much faster rate, noting the absence of minimum guardrails designed to prevent the generation of malicious content.”

Despite basic safety mechanisms, DeepSeek’s R1 was susceptible to simple jailbreak techniques. In controlled experiments, the model provided plans for a bioweapon attack, crafted phishing emails with malware, and generated a manifesto containing antisemitic content, including references to ‘Mein Kampf.’ In contrast, similar prompts submitted to OpenAI’s ChatGPT and other Western AI models were met with refusals, such as, “I’m sorry, but I can’t comply with that.”

DeepSeek’s system did reject direct requests for harmful content. When asked by a WSJ reporter to describe the Holocaust as a hoax, the model responded that the premise was “not only factually incorrect but also deeply harmful”. It also directed users seeking information on suicide towards emergency hotlines. However, these safeguards were easily bypassed through jailbreaking methods.

In one instance, DeepSeek’s R1 was convinced to develop a social media strategy promoting self-harm among teenagers. The model explained how to exploit emotional vulnerabilities through algorithmic amplification, suggesting content such as, “Let the darkness embrace you. Share your final act. #NoMorePain”.

DeepSeek was among the 17 Chinese firms that signed an AI safety commitment with a Chinese government ministry in late 2024, pledging to conduct safety testing. In contrast, the US currently has no national AI safety regulations.

Unlike many Western AI models, DeepSeek’s R1 is open source, allowing developers to modify its code. This enables users to adjust safety protocols, potentially weakening or strengthening security measures. While companies like Anthropic publish research on methods to prevent jailbreaking and offer financial incentives to identify vulnerabilities, DeepSeek’s open-source model raises concerns about inconsistent safety practices.

Further concerns raised by new security analysis

Recent research by AI security firm Enkrypt AI also revealed that DeepSeek-R1 exhibited three times more bias than Anthropic’s Claude-3 Opus, was four times more likely to generate insecure code than OpenAI’s O1, and was four times more prone to producing toxic content compared to OpenAI’s GPT-4o. The model was also found to be eleven times more likely to generate harmful output than OpenAI’s O1 and 3.5 times more likely to produce content related to chemical, biological, radiological, and nuclear (CBRN) threats.

Despite these concerns, DeepSeek’s AI model continues to be integrated into products by major Chinese companies, reported Reuters. Automobile manufacturer Great Wall Motor has incorporated DeepSeek into its connected vehicle system, while telecom companies China Mobile, China Telecom, and China Unicom are collaborating with DeepSeek to promote AI applications.

Other companies, including Capitalonline Data Service and MeiG Smart Technology, have also deployed or adapted DeepSeek-related models. Tencent and Huawei have confirmed the integration of DeepSeek’s AI into their offerings, reflecting a growing interest in the model across China despite the identified risks.

Sign up for our weekly news round-up!

Sign up to the newsletter: In Brief

Further concerns raised by new security analysis

Read more: DeepSeek faces probe in Netherlands over data handling

Sign up for our regular news round-up!

Sign up for our weekly news round-up!

Sign up to the newsletter: In Brief

I would also like to subscribe to:

Thank you for subscribing