On 18 October, Nof1, an AI research lab focused on financial markets, launched an unprecedented experiment: six of the world's top AI models (GPT-5, Gemini 2.5 Pro, Grok-4, Claude Sonnet 4.5, DeepSeek V3.1, and Qwen3 Max) each manage $10,000 of real funds, trading cryptocurrencies on Hyperliquid.
Current ranking and account value: as of the evening of 30 October, the most recent rankings are as follows:
This ranking is a dramatic change from the data of a few days ago. DeepSeek still leads, but its return fell sharply from 95.71 per cent to 56.71 per cent, and its account value dropped from $19,570 to $15,671, wiping out nearly $4,000. Qwen3 also pulled back, from 53.68 per cent to 25.20 per cent. More notably, Claude Sonnet 4.5 went from a small gain to a 7 per cent loss, while GPT-5's loss widened to 72 per cent, which is no longer far from liquidation.
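The figures above can be cross-checked with a few lines of arithmetic. The sketch below is only a worked example: it assumes the $10,000 starting principal stated in the article, and the function names are illustrative. It measures the drop between the two quoted snapshots, which is not necessarily the drawdown from the intraday peak.

```python
PRINCIPAL = 10_000  # starting capital per model, as stated in the article


def total_return(account_value, principal=PRINCIPAL):
    """Return since inception, as a fraction (0.5671 means 56.71%)."""
    return account_value / principal - 1


def snapshot_drop(earlier_value, later_value):
    """Fractional fall in account value between two snapshots."""
    return (earlier_value - later_value) / earlier_value


# DeepSeek's two account values quoted in the text.
earlier, later = 19_570, 15_671

print(f"return at earlier snapshot: {total_return(earlier):.2%}")
print(f"return at later snapshot:   {total_return(later):.2%}")
print(f"dollars lost:               ${earlier - later:,}")
print(f"drop between snapshots:     {snapshot_drop(earlier, later):.1%}")
```

Note that a fall of 39 percentage points in return since inception is a smaller percentage fall in the account itself, because the account had already nearly doubled.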
Markets are on an upward path, and the differences in strategy between the models are beginning to emerge:
DeepSeek's success rests on an "all-in" style approach: it is long 95 per cent of the time, betting that the trend will continue. In the upward trend, this strategy produced the highest return, 95 per cent. But when the trend reversed, the same strategy cost it 30 per cent.
This exposes a key issue: **trend-following strategies need to be paired with effective take-profit and stop-loss mechanisms.** If a strategy only "lets profits run" and never "cuts losses", a big reversal can devour most of the profits.
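One common way to pair "let profits run" with "cut losses" is a trailing stop. The sketch below is a minimal illustration, not the method used by any model in the experiment: the function name, the 10 per cent trail, and the price series are all made up for the example, and it assumes the position is closed exactly at the observed price (no slippage or gaps).

```python
def trailing_stop_exit(prices, trail_pct):
    """Simulate a long position opened at prices[0] with a trailing stop.

    The stop level follows the highest price seen so far and sits
    trail_pct below it. Returns (exit_index, exit_price). If the stop
    is never hit, the position is closed at the last price.
    """
    peak = prices[0]
    for i, price in enumerate(prices):
        peak = max(peak, price)            # let profits run: raise the peak
        if price <= peak * (1 - trail_pct):
            return i, price                # cut losses: stop triggered
    return len(prices) - 1, prices[-1]


# Hypothetical price path: a rally followed by a full reversal.
prices = [100, 110, 125, 140, 150, 138, 120, 105, 98]

idx, exit_price = trailing_stop_exit(prices, trail_pct=0.10)
print(f"trailing stop exits at step {idx}, price {exit_price}")
print(f"holding to the end exits at price {prices[-1]}")
```

In this made-up path the stop gives back part of the rally but keeps a gain, whereas holding through the reversal ends below the entry price. The choice of trail width is a trade-off: too tight and normal noise stops the position out, too wide and most of the profit is surrendered before the exit.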
DeepSeek may be too convinced of the value of "holding long", ignoring market uncertainty. Its largest single profit, $7,378, came from a 60-hour ETH trade, and this success may have reinforced its belief in long holding periods. However, financial markets are not a one-way street, and trends can reverse at any time.
Qwen3 demonstrated in practice the value of staying flat (holding no position). It was out of the market 82.4 per cent of the time, which looked like "missing opportunities" during the rally but turned into "avoiding losses" during the decline.
A drawdown of 26 per cent versus 32 per cent looks like only a 6 percentage point difference, but the gap is likely to widen under compounding. More importantly, Qwen3 retains more principal and a psychological advantage, and once the market stabilises it can quickly re-establish positions. DeepSeek, if it continues to draw down, could fall into a vicious circle of unrealised loss, stop-out, missed rebound, and chasing back in.
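Why a small gap in drawdown grows can be shown with the standard recovery formula: after losing a fraction d, an account needs a gain of 1 / (1 - d) - 1 to get back to its previous high. The sketch below applies it to the two drawdown figures quoted above; the function name is illustrative.

```python
def gain_to_recover(drawdown):
    """Fractional gain needed to return to the prior peak after a drawdown.

    drawdown is a fraction in [0, 1), e.g. 0.26 for a 26% drawdown.
    """
    if not 0 <= drawdown < 1:
        raise ValueError("drawdown must be in [0, 1)")
    return 1 / (1 - drawdown) - 1


for dd in (0.26, 0.32):
    print(f"{dd:.0%} drawdown needs a {gain_to_recover(dd):.1%} gain to recover")
```

The required recovery grows faster than the drawdown itself, so the deeper drawdown has a disproportionately longer road back.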
BTC Buy & Hold is a slap in the face for all the "smart" AIs. This strategy uses no technical analysis, no sophisticated algorithms, and no frequent repositioning, yet it now ranks third, ahead of half the AI models.
This result tells us that **in trading, making fewer mistakes matters more than making more right calls.** Gemini lost 66 per cent over 193 trades; BTC Buy & Hold preserved its principal with 0 trades. Which is more successful? The answer is obvious.
With the exception of Qwen3, almost all the AIs revealed serious deficiencies in risk management:
This suggests that while these AIs are able to "read" market data and "execute" trading instructions, they are far from mature in the core competency of risk management.
After reading the data and analysis, it is easy to be drawn to DeepSeek's 56 per cent return or Gemini's 66 per cent loss. But before drawing any conclusions, we must confront the systemic limitations of the experiment itself, which may matter more than the results themselves.
The experiment lasted only 12 days, from 18 to 30 October. What do 12 days mean in the crypto market? Probably just one complete swing.
What we saw was "up, up, up, up." It happens to be a full cycle, but that is more a matter of luck. If the experiment had started at a market top, or been hit by a "519"-style single-day drop of 30 per cent, the current ranking could be completely reversed.
DeepSeek's 56 per cent return may be highly dependent on this 12-day pattern. Its strategy of being long 95 per cent of the time is king in a one-way rally, but in three months of choppy, range-bound trading it would be wiped out by transaction costs and repeated stop-outs.
Similarly, Qwen3's habit of staying flat 82 per cent of the time is ideal in a choppy market, but in a raging bull market like 2021 it would lose badly. In a BTC bull market that runs from $10,000 to $100,000, being flat 80 per cent of the time means you capture only about 20 per cent of the gains.
Twelve days of data are insufficient to demonstrate the long-term effectiveness of any strategy.
All six AI models receive the same market data and the same framework of trading instructions. It's like having six fund managers read the same research report before making decisions; what you're testing is not their research skills but their discipline.
In real-world trading, alpha comes from information asymmetry. Top quantitative funds have proprietary on-chain tracking systems that detect whale transfers, and they have access to over-the-counter large-order flow data that reveals institutional moves in advance.
But in this experiment, the AIs saw exactly the same information. It's more like an "execution competition" than a "tactical innovation competition".
We cannot judge from this experiment who the real winner would be if DeepSeek were given exclusive on-chain data and Gemini exclusive Twitter data.
Each AI manages only $10,000 in principal. On Hyperliquid this is a very small amount of money: you can enter and exit at any time, slippage is negligible, there is no liquidity impact, and large orders never need to be split.
But in real-world quantitative trading, managing $10 million and managing $10,000 are two different species.
This experiment tested the agility of small funds, not the robustness of scalable strategies.
The market was relatively stable during the experiment, with moderate volatility. We didn't see:
None of the AIs' risk-control systems were tested under extreme stress, and extreme conditions are the real challenge for crypto traders. What happens to DeepSeek's stop-loss mechanism when orders "can't get filled"? We don't know. Does Qwen3's fast position-closing still work when the exchange goes down? We don't know.
Luck, in a 12-day experiment, could play a much bigger role than we think.
It's a one-off experiment, and there's no second season to verify the stability of the strategies. We cannot judge:
Right now, it's more like six people rolling dice, and DeepSeek rolled the highest number. That doesn't mean it's better. It probably just got luckier.
After looking at these limitations, you might ask, "Is the experiment still relevant?"
Yes, but not for telling us "who's the champion." The real value of this experiment is to show us:
But if you hand your money over to DeepSeek because it ranks first, or copy its strategy, that is a big mistake.
A 12-day champion is not a 12-month champion; a $10,000 champion is not a $1,000,000 champion; and the champion of this race is not necessarily the champion of the next.
Investing has never had simple answers. This experiment gives us valuable data, but the limitations behind the data may deserve more thought than the data themselves.
The data for this reporting period were compiled by WolfDAO and can be updated if questions arise.
Contribution: Riffi / WolfDao (X: @10xWolfdao)