Tencent’s ‘training-free’ AI model improvement technique sparks debate
Paper argues that large language models can improve through experience on the job, without needing to change their parameters

Researchers at Tencent Holdings have proposed a new "lightweight" technique for AI models to improve by drawing on "experience" without retraining, sparking debate about whether this could be the key to more cost-effective continual learning.
The paper, titled "Training-Free Group Relative Policy Optimisation" and published last week on the open-access repository arXiv, argued that large language models (LLMs) can improve through on-the-job experience without needing to change their parameters.
Current methods for making LLMs more useful in real-world tasks rely on techniques such as reinforcement learning, in which the model's parameters - the variables encoding its "intelligence" - are adjusted through algorithms such as Group Relative Policy Optimisation (GRPO).
Under GRPO, the model makes multiple attempts at a task, then adjusts its parameters based on the scores of those attempts. However, this process can be slow and computationally costly.
Instead, the researchers from Tencent's AI research lab suggested that LLMs could simply log the rules and heuristics from this GRPO process in an "experience library" and deploy them when faced with new tasks.

The paper provided examples of the kinds of heuristics the model comes up with itself, such as: "When solving geometry problems with intersections, validate solutions lie within bounded regions or segments, not on extensions, to avoid extraneous answers."
When the model next encounters a geometry problem, it will read this as part of its context and adjust its response accordingly. The model therefore becomes more capable as the experience library gets updated, rather than the model parameters.
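The mechanism described above can be sketched as a loop that compares a group of attempts at a task and records a natural-language lesson, rather than performing a gradient update. The following minimal Python sketch illustrates the idea only; every function name and the scoring rule are illustrative stand-ins, not Tencent's actual implementation:

```python
def solve(task: str, experience: list[str]) -> str:
    """Stand-in for an LLM call: the experience library is prepended
    to the prompt as context before the model answers."""
    context = "\n".join(f"Heuristic: {h}" for h in experience)
    return f"{context}\nAnswer to: {task}"

def score(attempt: str) -> float:
    """Stand-in reward function (here, trivially, answer length)."""
    return float(len(attempt))

def update_library(task: str, experience: list[str], n_attempts: int = 4) -> None:
    """One training-free GRPO step: generate a group of attempts,
    compare their scores, and distil a reusable lesson into the
    library - the model's weights are never touched."""
    attempts = [solve(task, experience) for _ in range(n_attempts)]
    best = max(attempts, key=score)
    # In the paper, the model itself writes the heuristic by contrasting
    # high- and low-scoring attempts; here we log a placeholder lesson.
    experience.append(f"Lesson from best attempt ({score(best):.0f} pts) on: {task!r}")

library: list[str] = []
update_library("geometry: find the intersection point", library)
print(library)                              # the library grows over time
print(solve("next geometry task", library)) # lesson now appears in context
```

The key contrast with standard GRPO is in `update_library`: the group comparison survives, but its output is a line of text in the prompt context rather than a parameter update.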
The researchers applied this "training-free" GRPO algorithm to DeepSeek's V3.1-Terminus, a 671-billion-parameter model, and found that it beat Alibaba Cloud's Qwen2.5-32B-Instruct, a 32-billion-parameter model fine-tuned using more conventional methods, on mathematical reasoning and web search tasks.
The result was achieved more efficiently, they claimed, with only 100 additional training examples needed to improve DeepSeek-V3.1-Terminus at a cost of around US$18, versus US$10,000 and 17,000 training examples for Qwen2.5-32B-Instruct.
Still, AI researchers online raised doubts about the findings, pointing out that comparing models of such different sizes made it difficult to assess the relative benefits of the training-free GRPO technique itself.
The researchers noted this in the paper themselves, after finding that training-free GRPO as applied to Qwen2.5-32B-Instruct yielded worse performance than baseline scores.
"This may suggest that the effectiveness of our method is dependent on the underlying model's reasoning and tool-use capabilities ... indicating that model capability is a prerequisite for effective experience-based optimisation," they wrote.
While LLMs have driven much of the generative AI boom in recent years, their limitations in real-world domains have spurred researchers globally to focus on their self-improvement and continual-learning capabilities.
In China, limited access to advanced US semiconductors has made cost-efficiency an additional priority, as companies look to scale up capabilities without incurring heavy computational costs.
This article originally appeared on the South China Morning Post (www.scmp.com), the leading news media reporting on China and Asia.
Copyright (c) 2025. South China Morning Post Publishers Ltd. All rights reserved.