Tencent’s ‘training-free’ AI model improvement technique sparks debate
Paper argues that large language models can improve through experience on the job, without needing to change their parameters

Researchers at Tencent Holdings have proposed a new "lightweight" technique for AI models to improve by drawing on "experience" without retraining, sparking debate about whether this could be the key to more cost-effective continual learning.
The paper, titled "Training-Free Group Relative Policy Optimisation" and published last week on the open-access repository arXiv, argued that large language models (LLMs) can improve through on-the-job experience without needing to change their parameters.
Current methods for making LLMs more useful in real-world tasks rely on techniques such as reinforcement learning, in which the model's parameters - the variables encoding its "intelligence" - are adjusted through algorithms such as Group Relative Policy Optimisation (GRPO).
Under GRPO, the model makes multiple attempts at a task, then adjusts its parameters based on the scores of those attempts. However, this process can be slow and computationally costly.
Instead, the researchers from Tencent's AI research lab suggested that LLMs could simply log the rules and heuristics from this GRPO process in an "experience library" and deploy them when faced with new tasks.

The paper provided examples of the kinds of heuristics the model comes up with itself, such as: "When solving geometry problems with intersections, validate solutions lie within bounded regions or segments, not on extensions, to avoid extraneous answers."
When the model next encounters a geometry problem, it will read this as part of its context and adjust its response accordingly. The model therefore becomes more capable as the experience library gets updated, rather than the model parameters.
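The mechanism described above can be sketched as a loop that compares a group of attempts at a task and records a natural-language lesson, rather than performing a gradient update. The following minimal Python sketch illustrates the idea only; every function name and the scoring rule are illustrative stand-ins, not Tencent's actual implementation:

```python
def solve(task: str, experience: list[str]) -> str:
    """Stand-in for an LLM call: the experience library is prepended
    to the prompt as context before the model answers."""
    context = "\n".join(f"Heuristic: {h}" for h in experience)
    return f"{context}\nAnswer to: {task}"

def score(attempt: str) -> float:
    """Stand-in reward function (here, trivially, answer length)."""
    return float(len(attempt))

def update_library(task: str, experience: list[str], n_attempts: int = 4) -> None:
    """One training-free GRPO step: generate a group of attempts,
    compare their scores, and distil a reusable lesson into the
    library - the model's weights are never touched."""
    attempts = [solve(task, experience) for _ in range(n_attempts)]
    best = max(attempts, key=score)
    # In the paper, the model itself writes the heuristic by contrasting
    # high- and low-scoring attempts; here we log a placeholder lesson.
    experience.append(f"Lesson from best attempt ({score(best):.0f} pts) on: {task!r}")

library: list[str] = []
update_library("geometry: find the intersection point", library)
print(library)                              # the library grows over time
print(solve("next geometry task", library)) # lesson now appears in context
```

The key contrast with standard GRPO is in `update_library`: the group comparison survives, but its output is a line of text in the prompt context rather than a parameter update.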
The researchers applied this "training-free" GRPO algorithm to DeepSeek's V3.1-Terminus, a 671-billion-parameter model, and found that it beat Alibaba Cloud's Qwen2.5-32B-Instruct, a 32-billion-parameter model fine-tuned using more conventional methods, on mathematical reasoning and web search tasks.
The result was achieved more efficiently, they claimed, with only 100 additional training examples needed to improve DeepSeek-V3.1-Terminus at a cost of around US$18, versus US$10,000 and 17,000 training examples for Qwen2.5-32B-Instruct.
Still, AI researchers online raised doubts about the findings, pointing out that comparing models of such different sizes made it difficult to assess the relative benefits of the training-free GRPO technique itself.
The researchers noted this in the paper themselves, after finding that training-free GRPO as applied to Qwen2.5-32B-Instruct yielded worse performance than baseline scores.
"This may suggest that the effectiveness of our method is dependent on the underlying model's reasoning and tool-use capabilities ... indicating that model capability is a prerequisite for effective experience-based optimisation," they wrote.
While LLMs have driven much of the generative AI boom in recent years, their limitations in real-world domains have spurred researchers globally to focus on their self-improvement and continual-learning capabilities.
In China, limited access to advanced US semiconductors has made cost-efficiency an additional priority, as companies look to scale up capabilities without incurring heavy computational costs.
This article originally appeared on the South China Morning Post (www.scmp.com), the leading news media reporting on China and Asia.
Copyright (c) 2025. South China Morning Post Publishers Ltd. All rights reserved.