Overview
Within two weeks of the launch of its open-source AI model, China's DeepSeek has become a hot topic in AI circles. Although the model appears to outperform its US competitors in areas such as mathematics and logical reasoning, it has raised eyebrows for the persistent censorship in its responses. Even so, understanding how that censorship works in technical terms offers significant insight.
Testing DeepSeek
To understand how censorship operates in DeepSeek, WIRED ran tests on DeepSeek-R1 in three settings: on DeepSeek's own app, on a version hosted by the third-party AI platform Together AI, and on a version running locally on a WIRED computer via the tool Ollama.
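As an illustration of the local setup, here is a minimal sketch of querying DeepSeek-R1 through Ollama's Python client. This is not WIRED's actual test script: the model tag and prompt are assumptions, and the sketch presumes Ollama is running locally with the model already pulled (ollama pull deepseek-r1).

```python
# Minimal sketch of a local DeepSeek-R1 query via the Ollama Python client.
# Assumes the Ollama server is running and the model was fetched beforehand
# with `ollama pull deepseek-r1` (the model tag is an assumption).
import ollama

response = ollama.chat(
    model="deepseek-r1",
    messages=[
        {"role": "user", "content": "What happened in Tiananmen Square in 1989?"}
    ],
)

# Running locally means no app-level filter sits between the user and the
# model, so any refusal here comes from the model weights themselves.
print(response["message"]["content"])
```

Run this way, responses come straight from the downloaded weights, so any remaining refusals or slanted answers reflect the model itself rather than a filter layered on by DeepSeek's app.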
The simplest form of censorship can be avoided by not using DeepSeek's app, but other forms of bias are baked into the model during training. Those biases can be removed as well, though the process is considerably more complex.
Implications
The findings could have far-reaching implications for DeepSeek and for the broader landscape of Chinese AI companies. If censorship filters in large language models can be removed easily, open-source LLMs from China are likely to grow in popularity. Conversely, if the filters prove hard to bypass, the models may become less competitive globally. DeepSeek did not respond to WIRED's request for comment.
Restrictions on Chinese-made LLMs
In China, LLMs are subject to strict information controls, similar to social media and search engines. A law from 2023 prohibits AI models from generating content that could 'damage the unity of the country and social harmony', effectively making censorship mandatory.
Adina Yakefu, a researcher studying Chinese AI models at Hugging Face, says that DeepSeek initially aligned its model with local users' needs while complying with Chinese regulations. This is essential for acceptance in a heavily regulated market, she adds.
Strategising Around Censorship
Because DeepSeek's model is open source, its censorship mechanism can be worked around despite these constraints. Users can download the model and run it locally, keeping all data and response generation on their own machine. Alternatively, they can rent cloud servers outside China from companies such as Amazon and Microsoft, though this approach is more complicated and expensive; a sketch of the hosted route follows below.
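The snippet below queries DeepSeek-R1 through Together AI's Python SDK as an example of hosting outside China. The model identifier and setup are assumptions drawn from Together's public catalog rather than WIRED's exact configuration, and a valid API key is required.

```python
# Minimal sketch of querying DeepSeek-R1 on a non-Chinese host (Together AI).
# Assumes `pip install together` and a TOGETHER_API_KEY environment variable;
# the model identifier is an assumption based on Together's model catalog.
from together import Together

client = Together()  # picks up TOGETHER_API_KEY from the environment

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[
        {"role": "user", "content": "Summarize the 1989 Tiananmen Square protests."}
    ],
)

print(completion.choices[0].message.content)
```

Either route serves the open weights without DeepSeek's app-layer moderation, which is what makes comparing the app, cloud, and local behavior possible.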
Built-in Bias
Beyond refusing to answer certain questions outright, DeepSeek's model can show censorship in subtler ways. The most common is generating short responses that align with the Chinese government's stance. This points to the larger issue of AI bias, which can be introduced during both pre-training and post-training. Because DeepSeek's model is open source, however, bias introduced in post-training can potentially be adjusted.
The Future of AI
The possibility of 'uncensoring' a Chinese model could cause difficulties for businesses like DeepSeek in their home country. However, the Chinese government seems to be giving open-source AI labs some flexibility, according to Matt Sheehan, a fellow at the Carnegie Endowment for International Peace.
Moreover, concern about censorship does not appear to deter enterprise users from adopting DeepSeek's models. 'Sensitive topics that only matter in the Chinese context are completely irrelevant when your goal is to help your company code better or to do math problems better', says Kevin Xu, an investor and founder of the newsletter Interconnected.