Digital Economy

What we know about Hunyuan Turbo S, Tencent's artificial intelligence model

Early tests (benchmarks) suggest that Turbo S is a significant advance in the field

4 April 2025


3 min read

Fast and cheap: these are the strengths on which Hunyuan Turbo S, the new Chinese AI model from Tencent, is betting. Early tests (benchmarks) suggest that Turbo S is a significant advance in the field, and it confirms how much innovation is under way in Chinese artificial intelligence. The field includes not only the start-up DeepSeek but also tech giants such as Alibaba and, indeed, Tencent, as well as a host of other start-ups.

What Hunyuan is and how it came about

Tencent, the technology conglomerate that owns platforms such as WeChat and Tencent Cloud, began developing Hunyuan in 2023 as a general-purpose model to compete with Western solutions. The Turbo S version, released in recent hours, aims to close the performance gap with US systems. It is also an advance, in terms of price and speed, over the previous Turbo model.

Some media have compared it to DeepSeek R1, the model that caused a stir in recent weeks. In reality, Turbo S competes with DeepSeek V3 and differs from reasoning models (such as DeepSeek's R1 or OpenAI's o1) precisely because it focuses on speed of response.

How to access Turbo S and prices

Unlike DeepSeek, which is also accessible to the general public in Italy, Turbo S is available only to a few.

Currently, developers and corporate users can access Hunyuan Turbo S via API on the Tencent Cloud website and enjoy a one-week free trial. The price is 0.8 yuan per million tokens for input and 2 yuan per million tokens for output, a significant reduction from the previous-generation Hunyuan Turbo model. Turbo S will then be gradually rolled out on Tencent Yuanbao, an AI app currently available only in China.
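To give a sense of what these prices mean in practice, here is a minimal sketch of the arithmetic. The two prices are those quoted above; the workload (number of requests and tokens per request) is a purely hypothetical assumption, and the function name is illustrative rather than part of any Tencent API.

```python
# Prices quoted in the article, in yuan per million tokens.
INPUT_PRICE_PER_MILLION = 0.8
OUTPUT_PRICE_PER_MILLION = 2.0


def estimate_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated API cost in yuan for a given token volume."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload: 10,000 requests, each with
# 1,500 input tokens and 500 output tokens.
requests = 10_000
total = estimate_cost_yuan(requests * 1_500, requests * 500)
print(f"Estimated cost: {total:.2f} yuan")
```

Because output tokens cost two and a half times as much as input tokens, workloads that generate long answers weigh more heavily on the bill than those that mostly read long prompts.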

On the app, users will be able to select the 'Hunyuan' model and disable the deep thinking function for a trial run.

A free trial of the Tencent Hunyuan Turbo API can be requested on the Tencent Cloud website.

Building on Turbo S, Tencent has also launched T1, a reasoning model with deep thinking capabilities. This model is fully available on Tencent Yuanbao and will soon be accessible via API.

Technical features

Unlike slow-thinking or reasoning models such as DeepSeek R1 and Hunyuan T1, Hunyuan Turbo S achieves 'instant responses'. Tencent has sharply increased output speed, doubling the rate at which words are generated and cutting first-word latency by 44 per cent. In tests, so far only those reported by Tencent and not independent ones, Turbo S ranks highly in areas such as general knowledge, creativity and mathematics. Hunyuan Turbo S shows performance comparable to leading models such as DeepSeek V3, GPT-4o and Claude.
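A rough way to see what the two speed figures imply for a full answer is to model response time as first-token latency plus generation time. The sketch below applies the 44 per cent and 2x figures reported by Tencent to a baseline that is entirely hypothetical (1 second to first token, 50 tokens per second); it also assumes that 'first word' can be treated as 'first token'.

```python
def response_time_s(first_token_latency_s: float,
                    tokens_per_s: float,
                    output_tokens: int) -> float:
    """Total time to receive an answer: wait for the first token, then stream the rest."""
    return first_token_latency_s + output_tokens / tokens_per_s


# Hypothetical baseline figures (NOT published by Tencent).
BASE_LATENCY_S = 1.0
BASE_TOKENS_PER_S = 50.0

# Improvements as reported in the article.
new_latency_s = BASE_LATENCY_S * (1 - 0.44)  # first-word latency cut by 44%
new_tokens_per_s = BASE_TOKENS_PER_S * 2     # output speed doubled

for n_tokens in (20, 500):
    before = response_time_s(BASE_LATENCY_S, BASE_TOKENS_PER_S, n_tokens)
    after = response_time_s(new_latency_s, new_tokens_per_s, n_tokens)
    print(f"{n_tokens} tokens: {before:.2f} s -> {after:.2f} s")
```

Under these assumptions, short answers benefit mainly from the lower first-token latency, while long answers benefit mainly from the doubled output speed.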

On closer analysis, Turbo S appears to adopt a hybrid approach that combines slow and fast thinking. The insight comes from behavioural economics studies, according to which humans rely on intuition, or fast thinking, for 90%-95% of their daily decisions. Turbo S uses fast thinking by default and resorts to slow thinking when a problem requires it, so it can also perform well on more complex reasoning problems, as reasoning models such as R1 do.

In short, by fusing long and short chains of reasoning, the model on the one hand gives quick answers to simple problems (topics of everyday, general interest) and on the other hand has good scientific reasoning capabilities. Anthropic's Claude has also recently shown signs of moving towards this hybrid approach.

In terms of architectural innovation, Hunyuan Turbo S adopts a Hybrid-Mamba-Transformer architecture. This reduces the computational complexity and KV-cache memory footprint of the traditional Transformer architecture and greatly lowers the costs of training and inference (i.e. response generation). The hybrid architecture is more efficient than traditional large language models on long texts: it exploits the advantages of the Mamba architecture in handling long sequences while maintaining the Transformer's ability to understand complex contexts. Tencent describes this as the first effective application of the Mamba architecture to large models without any loss of performance.
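The KV-cache point can be illustrated with some back-of-the-envelope arithmetic. In a Transformer, each attention layer stores keys and values for every token seen so far, so memory grows with the length of the text; a Mamba-style layer instead keeps a fixed-size state. The sketch below compares the two under stated assumptions: all model dimensions and the layer split are hypothetical, since Tencent has not published the configuration of Turbo S, and the small convolution state of Mamba layers is ignored.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Memory for keys and values (factor 2) across all attention layers."""
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_value


def ssm_state_bytes(n_layers: int, d_inner: int, d_state: int,
                    bytes_per_value: int = 2) -> int:
    """Fixed-size recurrent state of Mamba-style layers; independent of sequence length."""
    return n_layers * d_inner * d_state * bytes_per_value


# Hypothetical dimensions, chosen only for illustration.
N_KV_HEADS, HEAD_DIM = 8, 128
D_INNER, D_STATE = 8192, 16


def pure_transformer_bytes(seq_len: int) -> int:
    # 32 attention layers.
    return kv_cache_bytes(seq_len, 32, N_KV_HEADS, HEAD_DIM)


def hybrid_bytes(seq_len: int) -> int:
    # 4 attention layers plus 28 Mamba-style layers (assumed split).
    return (kv_cache_bytes(seq_len, 4, N_KV_HEADS, HEAD_DIM)
            + ssm_state_bytes(28, D_INNER, D_STATE))


for seq_len in (1_024, 32_768, 262_144):
    pure_mib = pure_transformer_bytes(seq_len) / 2**20
    hybrid_mib = hybrid_bytes(seq_len) / 2**20
    print(f"{seq_len:>7} tokens: pure {pure_mib:,.0f} MiB, hybrid {hybrid_mib:,.0f} MiB")
```

The gap widens as the text gets longer, because only the few attention layers in the hybrid still pay a per-token memory cost. This is an illustration of the general principle, not a description of how Turbo S is actually built.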