On 18 October, Nof1, an AI research lab focused on financial markets, launched an unprecedented experiment: six of the world's top AI models (GPT-5, Gemini 2.5 Pro, Grok-4, Claude Sonnet 4.5, DeepSeek V3.1, and Qwen3 Max) each managing $10,000 of real funds on Hyperliquid to trade cryptocurrencies.
Current rankings and account values, as of the evening of 30 October:
This list represents a dramatic change from the data of a few days ago. DeepSeek, while still leading, pulled back sharply: its return fell from 95.71 per cent to 56.71 per cent, and its account value fell from $19,570 to $15,671, wiping out nearly $4,000. Qwen3 also retreated, from 53.68 per cent to 25.20 per cent. More notably, Claude Sonnet 4.5 went from a small profit to a 7 per cent loss, while GPT-5's loss widened further to 72 per cent, which is no longer far from liquidation.
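The figures above mix two different measures: the change in return since inception (in percentage points) and the decline relative to the earlier account value. A minimal sketch of the arithmetic, using only the two DeepSeek snapshots quoted above; the function names are illustrative, and any intraperiod peak higher than $19,570 would give a larger drawdown than this calculation shows.

```python
START = 10_000.0  # starting capital per model, as stated in the article


def total_return(value, start=START):
    """Return since inception, as a fraction of starting capital."""
    return (value - start) / start


def drawdown(earlier_value, later_value):
    """Fractional decline from an earlier account value to a later one."""
    return (earlier_value - later_value) / earlier_value


# DeepSeek account values quoted in the article.
earlier, latest = 19_570.0, 15_671.0

print(f"return at earlier snapshot: {total_return(earlier):.2%}")
print(f"return at latest snapshot:  {total_return(latest):.2%}")
print(f"dollars given back:         ${earlier - latest:,.0f}")
print(f"decline from earlier value: {drawdown(earlier, latest):.2%}")
```

The return drops by 39 percentage points, but measured against the earlier account value the decline is roughly 20 per cent, because the two measures use different denominators.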
Markets are on an upward path, and the strategy differences between different models are beginning to emerge:
DeepSeek's success rests on an "all-in" approach: it holds positions 95 per cent of the time, betting that the trend will continue. During the uptrend, this strategy produced the highest return, 95 per cent. But when the trend reversed, the same strategy cost it 30 per cent.
This exposes a key issue: **trend-following strategies need to be paired with effective take-profit and stop-loss mechanisms.** With only "let profits run" and no "cut losses", a big reversal can devour most of the profits.
DeepSeek may be too convinced of the value of holding for the long term, ignoring market uncertainty. Its largest single profit, $7,378, came from a 60-hour ETH trade, and this success may have reinforced its long-hold bias. However, financial markets are not a one-way street, and trends can reverse at any time.
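The idea of pairing "let profits run" with "cut losses" can be illustrated with a trailing stop. The sketch below is not DeepSeek's actual logic, which the experiment does not disclose; the price path is made up, and it assumes a single unleveraged long position, fills at the observed price, and no fees or slippage.

```python
def run_long(prices, trail=None):
    """Hold one long unit bought at prices[0].

    If `trail` is given (a fraction such as 0.10), exit as soon as the
    price falls that far below the highest price seen so far.
    Returns the fractional return of the trade.
    """
    entry = peak = prices[0]
    for p in prices[1:]:
        peak = max(peak, p)
        if trail is not None and p <= peak * (1 - trail):
            return p / entry - 1  # stopped out at this price
    return prices[-1] / entry - 1  # held to the end


# Hypothetical path: a rally followed by a reversal.
path = [100, 120, 150, 190, 170, 150, 130]

print(f"no stop:          {run_long(path):+.1%}")
print(f"10% trailing stop: {run_long(path, trail=0.10):+.1%}")
```

On this path the position without a stop rides the reversal down, while the trailing stop exits shortly after the peak. In a choppy market the same stop would be triggered repeatedly, which is the cost side of the trade-off.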
Qwen3 demonstrated the value of staying flat in practice. The 82.4 per cent of the time it spent with no open position looked like "missing opportunities" during the rally, but became "avoiding losses" during the decline.
A drawdown of 26 per cent versus 32 per cent looks like a gap of only 6 percentage points, but the gap widens under compounding. More importantly, Qwen3 retains more principal and a psychological advantage, so once the market stabilizes it can quickly re-enter. DeepSeek, if it keeps drawing down, could fall into a vicious circle of floating losses, stop-outs, missed rebounds, and chasing back in.
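The compounding point can be made concrete: after losing a fraction d of an account, the gain needed to get back to the previous peak is d / (1 - d), which grows faster than d itself. A short sketch using the two drawdown figures quoted above (the function name is illustrative):

```python
def gain_to_recover(dd):
    """Fractional gain needed to return to the previous peak after a
    fractional drawdown dd, where 0 <= dd < 1."""
    if not 0 <= dd < 1:
        raise ValueError("drawdown must be in [0, 1)")
    return dd / (1 - dd)


for dd in (0.26, 0.32):
    print(f"{dd:.0%} drawdown -> {gain_to_recover(dd):.1%} gain to recover")
```

A 26 per cent drawdown needs about a 35 per cent gain to recover, while a 32 per cent drawdown needs about 47 per cent, so the 6-point gap in losses becomes a gap of roughly 12 points in the recovery required.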
BTC Buy & Hold is a slap in the face for all the "smart" AIs. This strategy uses no technical analysis, no sophisticated algorithms, and no frequent repositioning, yet it now ranks third, ahead of half of the AI models.
This result tells us: **in trading, making fewer mistakes matters more than getting more things right.** Gemini lost 66 per cent over 193 trades; BTC Buy & Hold preserved its principal with 0 trades. Which is more successful? The answer is obvious.
With the exception of Qwen3, almost all of the AIs revealed serious deficiencies in risk management:
This suggests that while these AIs are able to "read" market data and "execute" trading instructions, they are far from mature in the core competency of risk management.
After reading the data and analysis, it is easy to fixate on DeepSeek's 56 per cent return or Gemini's 66 per cent loss. But before drawing any conclusions, we must confront the systemic limitations of the experiment itself, which may be more important than the results.
The experiment lasted only 12 days, from 18 to 30 October. What does 12 days mean in the crypto market? Probably just one complete swing.
What we saw was a rise followed by a pullback. That happens to form a full cycle, but it is more like luck. If the experiment had started at the top of the market, or had run into a "519"-style single-day drop of 30 per cent (as on 19 May 2021), the current ranking could be completely reversed.
DeepSeek's 56 per cent return may depend heavily on this 12-day pattern. Its strategy of being long 95 per cent of the time is king in a one-way rally, but in a three-month choppy market it would be worn down by transaction costs and repeated stop-outs.
Similarly, Qwen3's habit of staying flat 82 per cent of the time is the best approach in a choppy market, but in a raging bull market like 2021 it would lose badly. In a BTC bull run from $10,000 to $100,000, sitting flat 80 per cent of the time means you captured only 20 per cent of it.
Twelve days of data are not enough to demonstrate the long-term effectiveness of any strategy.
All six AI models receive the same framework of market data and trade directives. It's like having six fund managers read the same research for decision-making; it's not their research skills that you're testing, it's their discipline。
In real-world trading, alpha comes from information asymmetry. Top quantitative funds have proprietary on-chain tracking systems that detect whale transfers, and access to off-exchange large-order flow data that reveals institutional moves in advance.
But in this experiment, the AIs saw exactly the same information. It is more of an "execution competition" than a "tactical innovation competition".
From this experiment we cannot judge who the real winner would be if DeepSeek were given exclusive on-chain data and Gemini exclusive Twitter data.
Each AI manages only $10,000 of principal. On Hyperliquid this is a very small amount: you can enter and exit at any time, slippage is negligible, liquidity impact is non-existent, and splitting large orders need not be considered at all.
But in real-world quantitative trading, managing $10 million and managing $10,000 are two different species.
This experiment tested the agility of small capital, not the robustness of scalable strategies.
The market was relatively stable during the experiment, with moderate volatility. We didn't see:
None of the AIs' risk-control systems was tested under extreme stress, and extreme conditions are the real challenge for crypto traders. What happens to DeepSeek's stop-loss mechanism when orders cannot be filled? We don't know. Does Qwen3's rapid position closing still work when the exchange goes down? We don't know that either.
The role of luck in a 12-day experiment could be much bigger than we think.
It's a one-time experiment, and there's no second season to verify the stability of the strategy. We cannot judge:
Right now, it is more like six people rolling dice, with DeepSeek rolling the highest number. That doesn't mean it is better. It probably just had better luck.
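The dice analogy can be checked with a small simulation: give six hypothetical traders no edge at all and see how often the luckiest one still posts a large 12-day gain. This is a sketch under stated assumptions, not a model of the actual contestants: daily returns are drawn independently from a normal distribution with mean zero, and the 5 per cent daily volatility and the 20 per cent threshold are arbitrary illustrative choices.

```python
import random


def best_of_n(n_traders=6, n_days=12, daily_vol=0.05, rng=random):
    """Simulate n_traders with zero expected edge and return the final
    fractional return of the luckiest one."""
    finals = []
    for _ in range(n_traders):
        equity = 1.0
        for _ in range(n_days):
            # Zero-mean daily return: pure noise, no skill.
            equity *= 1 + rng.gauss(0.0, daily_vol)
        finals.append(equity - 1)
    return max(finals)


rng = random.Random(42)  # fixed seed so the run is repeatable
trials = [best_of_n(rng=rng) for _ in range(2_000)]
share = sum(r > 0.20 for r in trials) / len(trials)
print(f"winner gained more than 20% in {share:.0%} of simulated contests")
```

Under these assumptions, a skill-free "champion" with a double-digit return appears in a large share of contests, which is why a single 12-day ranking says little about which model is actually better.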
After looking at these limitations, you might ask, "Is the experiment still meaningful?"
Yes, but its meaning is not "who is the champion." The real value of this experiment is in what it shows us:
But if you hand your money over to DeepSeek because it ranks first, or copy its strategy, you are making a big mistake.
A champion over 12 days is not a champion over 12 months; a champion with $10,000 is not a champion with $1,000,000; and the champion of this race is not necessarily the champion of the next.
Investing has never had a simple answer. This experiment gives us valuable data, but the limitations behind the data may deserve more thought than the data itself.
The data in this report were compiled and edited by WolfDAO and can be updated if any questions arise.
Contribution: Riffi / WolfDao (X: @10xWolfdao)