Alibaba's AI Model Outperforms GPT-5 in Global Crypto Trading Test

04 November, 2025
1 source compared
Business

Key Points from 1 News Source

  1. Alibaba’s Qwen3-Max AI achieved a 22.32% return on a $10,000 crypto investment in two weeks

  2. The AI outperformed five other leading US and Chinese crypto trading models, including OpenAI’s

  3. The competition involved real money and real market conditions in cryptocurrency trading
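
To make the headline figure concrete, the short sketch below converts a percentage return into an ending balance. It uses only the two numbers stated in the key points (a $10,000 stake and a 22.32% return); the function name `ending_balance` is hypothetical, and the calculation assumes a simple return with no fees or intermediate withdrawals, which the source does not confirm.

```python
def ending_balance(starting_capital: float, return_pct: float) -> float:
    """Balance after applying a simple percentage return.

    Assumption: no fees, funding costs, or withdrawals are modelled.
    """
    return round(starting_capital * (1 + return_pct / 100), 2)


# Figures from the key points: $10,000 stake, 22.32% return over two weeks.
qwen_balance = ending_balance(10_000, 22.32)
print(f"Ending balance: ${qwen_balance:,.2f}")
print(f"Profit: ${qwen_balance - 10_000:,.2f}")
```

Under these assumptions, a 22.32% return on $10,000 corresponds to a gain of roughly $2,232. The same function accepts a negative percentage to express a loss.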

Full Analysis Summary

AI Models in Crypto Trading

A recent global crypto trading competition reported by South China Morning Post says a China-based model, DeepSeek’s V3.1 Chat, led performance with a 4.89% gain.

OpenAI’s GPT-5 suffered the steepest loss at −62.66%.

Only two out of six AI models turned a profit overall.

The organizer Nof1 cautioned that outcomes might be due to luck and pledged more rigorous statistical methods in future rounds.

The test used only quantitative market data with no access to news, which some observers say limits how well the results mirror real-world investing.

Notably, the provided source does not mention Alibaba; the outperforming China-based model named is DeepSeek, not an Alibaba model.

Coverage Differences

Contradiction

South China Morning Post (Asian) reports that the outperforming China-based model is DeepSeek’s V3.1 Chat, not an Alibaba model, directly contradicting the premise that Alibaba’s AI beat GPT-5. It also specifies GPT-5 was the worst performer, which frames the competitive outcome clearly.

Narrative

South China Morning Post (Asian) frames the outcome in a China-vs-US model context (Chinese DeepSeek versus four US models) and emphasizes numerical performance results, rather than corporate branding like Alibaba, which it does not mention.

Ambiguity

South China Morning Post (Asian) highlights caveats that outcomes might be luck-driven and that methodology excluded news, cautioning against overgeneralizing to real-world investing; it does not provide further corroboration from other media.

Evaluation of Market Models

The competition limited models to using only quantitative market data without incorporating news input.

This approach raised questions about the relevance of the models to real-world scenarios.

Nof1 noted that the results might have been influenced by luck and promised more rigorous statistical methods in future rounds.

This indicates that the organizers recognize the need for stronger evaluation before making definitive conclusions.

The article mentions that two out of six models were profitable but only identifies DeepSeek as one of them.

The identity of the other profitable model is not specified in the provided information.

Coverage Differences

Tone

South China Morning Post (Asian) adopts a cautious tone, foregrounding limitations—luck, lack of news data, and the need for rigorous statistics—rather than celebratory headlines about an unambiguous victory.

Narrative

Rather than emphasizing brand-centric narratives (e.g., Alibaba), the South China Morning Post (Asian) report centers on test constraints and statistical caution, shaping a methodological narrative over corporate rivalry.

Performance of AI Models

The article compares the performance of four US entrants: OpenAI, Anthropic, Google DeepMind, and Elon Musk’s xAI.

All four US entrants experienced significant losses in the test.

OpenAI’s GPT-5 had the worst performance, with a loss of 62.66%.

The Chinese entrant DeepSeek V3.1 gained 4.89%, making it the top performer in this round under the stated constraints.

This highlights a competitive moment for a China-based model in this specific, constrained test.

However, this does not imply broader superiority in general investing contexts.

Coverage Differences

Narrative

South China Morning Post (Asian) frames the outcome in national terms—one Chinese model vs. four US models—and emphasizes the magnitude of GPT-5’s loss, which can shape perceptions of a China–US performance gap in this test.

Ambiguity

Because models were barred from using news and the organizer suggested luck may explain outcomes, the extent to which these results generalize to real markets remains unclear in the article.

Preliminary AI Trading Test Analysis

The test results call for cautious interpretation.

The organizer commented that the outcomes might be due to luck and emphasized the need for more rigorous statistics.

This suggests the test was a preliminary and limited benchmark rather than a definitive assessment of AI trading ability.

The absence of news inputs further separates the test from real-world investing conditions.

The article does not attribute the win to Alibaba; it names DeepSeek as the winner.

Therefore, any claims that Alibaba's AI model outperformed GPT-5 are not supported by the source provided.

Coverage Differences

Tone

The article’s tone is measured and skeptical, foregrounding luck and methodological limitations rather than triumphalism.

All 1 Sources Compared

South China Morning Post

Alibaba’s Qwen returns 22% in 2 weeks, beats DeepSeek, OpenAI in crypto trading showdown
