Qwen 2.5-Max 'Thinking (QwQ)' Released: Intensifying Competition in LLMs

Alex Johnson

Senior Web Scraping Engineer

27-Feb-2025

The "Continuous War" of AI Models

At 5:01 am on February 25, 2025, Alibaba announced on X the launch of the reasoning model QwQ-Max-Preview (named "Thinking (QwQ)" in Qwen Chat), which is based on Qwen2.5-Max. It also announced plans to open-source QwQ-Max and Qwen2.5-Max. In addition, a lightweight version, QwQ-32B, will be released soon to support local deployment, and mobile apps for iOS and Android are in the planning stage.

How does Qwen 2.5-Max perform?

In our tests, this model performed competitively with GPT-4o, DeepSeek-V3, Llama-3.1-405B, and Claude 3.5 Sonnet on tasks such as mathematics, programming, and multimodal generation.

Benchmark performance comparison

Arena-Hard (Preference Benchmark): Qwen2.5-Max scored 89.4, leading DeepSeek V3 (85.5) and Claude 3.5 Sonnet (85.2).
MMLU-Pro (Knowledge and Reasoning): Qwen2.5-Max scored 76.1, slightly higher than DeepSeek V3 (75.9), but slightly behind Claude 3.5 Sonnet (78.0) and GPT-4o (77.0).
GPQA-Diamond (Graduate-Level Science QA): Qwen2.5-Max scored 60.1, narrowly beating DeepSeek V3 (59.1), while Claude 3.5 Sonnet led with 65.0.
LiveCodeBench (Coding Ability): Qwen2.5-Max scored 38.7, slightly ahead of DeepSeek V3 (37.6) and just behind Claude 3.5 Sonnet (38.9).
LiveBench (Overall Ability): Qwen2.5-Max scored 62.2, taking the lead over DeepSeek V3 (60.5) and Claude 3.5 Sonnet (60.3).

Overall, Qwen2.5-Max is a well-rounded model: it leads on the preference-based and general-capability benchmarks while remaining competitive in knowledge and coding.
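As a quick cross-check, the scores quoted above can be loaded into a short Python script that reports the leader on each benchmark and the margin between Qwen2.5-Max and DeepSeek V3. This is a minimal sketch: the dictionary layout and the function names are our own illustration, and only the numbers quoted in this article are included.

```python
# Scores exactly as quoted in this article; a model with no quoted
# score for a benchmark is simply left out of that benchmark's entry.
SCORES = {
    "Arena-Hard": {"Qwen2.5-Max": 89.4, "DeepSeek V3": 85.5, "Claude 3.5 Sonnet": 85.2},
    "MMLU-Pro": {"Qwen2.5-Max": 76.1, "DeepSeek V3": 75.9,
                 "Claude 3.5 Sonnet": 78.0, "GPT-4o": 77.0},
    "GPQA-Diamond": {"Qwen2.5-Max": 60.1, "DeepSeek V3": 59.1, "Claude 3.5 Sonnet": 65.0},
    "LiveCodeBench": {"Qwen2.5-Max": 38.7, "DeepSeek V3": 37.6, "Claude 3.5 Sonnet": 38.9},
    "LiveBench": {"Qwen2.5-Max": 62.2, "DeepSeek V3": 60.5, "Claude 3.5 Sonnet": 60.3},
}


def leader(benchmark):
    """Return the model with the highest quoted score on a benchmark."""
    scores = SCORES[benchmark]
    return max(scores, key=scores.get)


def margin(benchmark, model="Qwen2.5-Max", baseline="DeepSeek V3"):
    """Return the score difference (model minus baseline), rounded to 0.1."""
    scores = SCORES[benchmark]
    return round(scores[model] - scores[baseline], 1)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name:14s} leader: {leader(name):18s} "
              f"Qwen2.5-Max vs DeepSeek V3: {margin(name):+.1f}")
```

Run as written, it shows Qwen2.5-Max leading on Arena-Hard and LiveBench, with Claude 3.5 Sonnet ahead on the other three benchmarks.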

In addition, Qwen2.5-Max supports code snippet generation, file parsing, and image understanding through the Artifacts function. A single call can handle more than 1 hour of video content.

Data Truth

The Scissors Gap between Model Iteration and Data Update: Traditional tools have a data update cycle of several days, while Qwen2.5-Max draws its knowledge from 20 trillion tokens of pre-training data that must be refreshed continuously to stay current.

The Risk of a Technological Generation Gap: Gartner predicts that by 2025, AI model performance will improve by 15% every three months, and data infrastructure that lags behind will open a gap in competitiveness.

Real-time data comparison of popular models

1. Comparison of Long Text Processing Speed

Model	Processing Speed (seconds/thousand words)
Qwen 2.5-Max	0.500
GPT-4	0.600
DeepSeek-V3	0.575
Llama-3.1-405B	0.600
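
To put these per-thousand-word figures in context, the arithmetic can be sketched in Python. The 50,000-word document length is an assumption chosen purely for illustration, the function names are our own, and the speeds are simply the values from the table above.

```python
# Processing speed in seconds per thousand words, copied from the table above.
SECONDS_PER_THOUSAND_WORDS = {
    "Qwen 2.5-Max": 0.5,
    "GPT-4": 0.6,
    "DeepSeek-V3": 0.575,
    "Llama-3.1-405B": 0.600,
}


def processing_seconds(model, word_count):
    """Estimate the time to process a document of word_count words."""
    return SECONDS_PER_THOUSAND_WORDS[model] * word_count / 1000


def extra_time_percent(model, baseline="Qwen 2.5-Max"):
    """Return how much longer a model takes than the baseline, in percent."""
    base = SECONDS_PER_THOUSAND_WORDS[baseline]
    return (SECONDS_PER_THOUSAND_WORDS[model] - base) / base * 100


if __name__ == "__main__":
    words = 50_000  # assumed document length, for illustration only
    for name in SECONDS_PER_THOUSAND_WORDS:
        print(f"{name:16s} {processing_seconds(name, words):5.1f} s "
              f"({extra_time_percent(name):+.0f}% vs Qwen 2.5-Max)")
```

Under these figures, a 50,000-word document takes about 25 seconds on Qwen 2.5-Max and about 30 seconds on GPT-4, a difference of roughly 20%.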

2. Comparison of Training Dataset Size

Model	Training Dataset Size (trillion tokens)
Qwen 2.5-Max	over 20
GPT-4	not disclosed
DeepSeek-V3	14.8
Llama-3.1-405B	over 15

3. Comparison of Average Response Time

Model	Average Response Time (seconds)
Qwen 2.5-Max	0.3
GPT-4	0.5
DeepSeek-V3	0.4
Llama-3.1-405B	0.45

4. Comparison of Update Frequency

Model	Update Frequency
Qwen 2.5-Max	Once a month
GPT-4	Once a quarter
DeepSeek-V3	Every two months
Llama-3.1-405B	Every three months

What aspects of data directly affect the development of AI models?

In the AI competition, the quality of data infrastructure directly determines the upper limit of the model. Real-time data extraction tools influence the development of AI tools through three core capabilities:

The breadth of data coverage

Although Qwen2.5-Max supports 29 languages, its open-source version still relies on public corpora, resulting in limited data coverage. Therefore, an information extraction tool that integrates numerous data interfaces and data sources is needed to ensure the comprehensiveness and accuracy of the model's data.

The speed of information update

AI model performance iterates every three months, but traditional crawlers are hampered by anti-crawling mechanisms (such as CAPTCHAs and dynamic loading), leaving them with a data update cycle of several days. Information extraction tools therefore need to keep improving their data acquisition capabilities so that data stays timely.

Multimodal support

Demand for multimodal data from AI models is surging, but traditional crawlers have a 40% error rate when parsing PDF tables and take more than 10 minutes to extract video subtitles. Powerful AI models should integrate structured data extraction technology that automatically and accurately parses PDF tables, video subtitles, and image metadata.

Scrapeless Deep SerpApi: A Favorable Tool for LLM Development

If Qwen 2.5-Max ushers in the continuous development of AI, then Scrapeless Deep SerpApi is the key weapon driving this change.

Deep SerpApi is a dedicated search engine designed for large language models (LLMs) and AI agents. It provides real-time, accurate, and unbiased information, enabling AI applications to effectively retrieve and process data:

✅ It has 20+ built-in Google Search API scenario interfaces and is connected to the data of mainstream search engines.

✅ It covers 20+ data types, such as search results, news, videos, and images.

✅ It supports historical data updates within the past 24 hours.
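
To illustrate what a 24-hour freshness window means for a downstream pipeline, here is a minimal Python sketch that keeps only records fetched within the past 24 hours. The record shape (`url`, `fetched_at`) and the function name are hypothetical; they are not part of the Deep SerpApi response format.

```python
from datetime import datetime, timedelta, timezone

# The 24-hour window mentioned above.
FRESHNESS_WINDOW = timedelta(hours=24)


def fresh_records(records, now=None, window=FRESHNESS_WINDOW):
    """Return the records whose 'fetched_at' timestamp falls inside the window.

    Each record is a dict with a timezone-aware 'fetched_at' datetime
    (a hypothetical shape used only for this illustration).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return [r for r in records if timedelta(0) <= now - r["fetched_at"] <= window]


if __name__ == "__main__":
    now = datetime(2025, 2, 27, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"url": "https://example.com/a", "fetched_at": now - timedelta(hours=1)},
        {"url": "https://example.com/b", "fetched_at": now - timedelta(hours=25)},
    ]
    # Only the record fetched one hour ago is inside the 24-hour window.
    print([r["url"] for r in fresh_records(sample, now=now)])
```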

In future product planning, we will fully consider the needs of AI developers. We will simplify the process of integrating dynamic web information into AI-driven solutions and ultimately deliver an all-in-one API that allows one-click search and extraction of web data. We will also maintain the lowest price in this field over the long term: $0.10-$0.30 per 1K queries.
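
For budgeting, the quoted price range translates into a simple calculation. The sketch below is illustrative only: the function name and the example volumes are assumptions, and actual billing may depend on plan details not covered here.

```python
# Price range quoted above, in US dollars per 1,000 queries.
PRICE_PER_1K_LOW = 0.10
PRICE_PER_1K_HIGH = 0.30


def cost_range(queries):
    """Return the (low, high) estimated cost in dollars for a query volume."""
    thousands = queries / 1000
    return (round(thousands * PRICE_PER_1K_LOW, 2),
            round(thousands * PRICE_PER_1K_HIGH, 2))


if __name__ == "__main__":
    for volume in (10_000, 100_000, 500_000):  # example volumes, assumed
        low, high = cost_range(volume)
        print(f"{volume:>7,} queries: ${low:.2f} to ${high:.2f}")
```

At the quoted rates, 500,000 queries would cost between $50 and $150.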

The Most Crucial Event! 🔔 The Developer Support Program has been launched:

You can integrate Scrapeless into your AI tools, applications, or any project you're working on. We support frameworks like Dify (LangChain, Langflow, FlowiseAI, and many others are coming soon!). You can also integrate Scrapeless in other ways that suit your project.

Once your integration is completed, share your work with us through GitHub or social media, and provide proof of integration. In return, we'll provide you with 500K free queries for 1 month to help you maximize the benefits of our products.

Join our community and get details from our Admin: Emily Fann!

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.