In a bid to strengthen its position in the global AI race, Alibaba’s Qwen team has released a new family of AI models, Qwen2.5-VL. The models handle a wide range of text, image, and video analysis tasks and can also operate software on PCs and mobile devices, which puts them in direct competition with OpenAI’s Operator.
The Qwen2.5-VL models, available for testing in Alibaba’s Qwen Chat app and for download on Hugging Face, can analyze charts and graphics, extract data from scanned documents, and understand hours-long videos. They can also recognize intellectual property from movies, TV series, and other products, although that ability to identify copyrighted works hints at a potentially controversial training dataset.
Benchmarking Qwen2.5-VL Against Global Leaders
Alibaba claims that the flagship Qwen2.5-VL-72B model outperforms major competitors such as OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 2.0 Flash in benchmarks for video understanding, document analysis, and question answering. The smaller models in the series, Qwen2.5-VL-3B and Qwen2.5-VL-7B, are licensed for unrestricted commercial use, while the flagship model is released under Alibaba’s custom license. Under that license, organizations with more than 100 million monthly active users must seek explicit permission from Alibaba to deploy it commercially.
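The licensing split described above can be sketched as a simple check. This is only an illustration of the rule as reported here, not a reading of the license text itself: the function name, the model groupings, and the interpretation of "over 100 million" as strictly greater than 100,000,000 are all assumptions.

```python
# Threshold as reported in the article; "over 100 million" is read
# here as strictly greater than this value (an assumption).
MAU_THRESHOLD = 100_000_000

# Groupings as the article describes them; illustrative only.
PERMISSIVE_MODELS = {"Qwen2.5-VL-3B", "Qwen2.5-VL-7B"}
CUSTOM_LICENSE_MODELS = {"Qwen2.5-VL-72B"}


def needs_alibaba_permission(model: str, monthly_active_users: int) -> bool:
    """Return True if, per the article's summary, commercial deployment
    would require explicit permission from Alibaba (hypothetical helper)."""
    if model in PERMISSIVE_MODELS:
        # Reported as licensed for unrestricted commercial use.
        return False
    if model in CUSTOM_LICENSE_MODELS:
        # Only very large deployments are reported to need permission.
        return monthly_active_users > MAU_THRESHOLD
    raise ValueError(f"Model not covered by this sketch: {model}")


if __name__ == "__main__":
    print(needs_alibaba_permission("Qwen2.5-VL-7B", 500_000_000))
    print(needs_alibaba_permission("Qwen2.5-VL-72B", 250_000_000))
```

Anyone planning a real deployment should consult the actual license terms rather than rely on a summary like this one.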
One of the standout features of Qwen2.5-VL is its ability to control software. In a video shared on social media by Philipp Schmid, a Hugging Face technical lead, the model was shown using the Booking.com app on an Android device to book a flight from Chongqing to Beijing. In another demonstration, the model controlled apps on a Linux desktop. Its relatively low scores on the OSWorld benchmark, however, suggest that its practical effectiveness in a real computer environment remains limited.
Limitations and Censorship
As with many AI systems developed in China, Qwen2.5-VL operates under certain restrictions dictated by the country’s regulatory framework. When prompted to discuss sensitive topics, such as “Xi Jinping’s mistakes,” the model declined to respond, reflecting China’s strict censorship standards. The country’s internet regulators require AI models to align with “core socialist values,” ensuring they avoid politically sensitive or controversial topics.
While these constraints limit the model’s applicability in certain contexts, they align with Alibaba’s need to comply with domestic regulations, particularly as China accelerates its push for AI leadership.
Implications for the Global AI Landscape
The release of Qwen2.5-VL underscores Alibaba’s growing ambition to challenge global AI giants like OpenAI, Google, and Anthropic. With its ability to perform multimodal tasks and interact with software, the model family positions Alibaba as a key player in the competitive AI field.
However, Qwen2.5-VL is not without its challenges. Its low OSWorld scores point to weak performance in real-world computer environments, and its ability to recognize copyrighted works raises ethical questions about its training data.
Despite these hurdles, Alibaba’s emphasis on versatility and efficiency could attract widespread interest, especially in markets looking for alternatives to U.S.-based AI solutions. With the smaller models available under a permissive license, Alibaba has opened the door for developers and businesses to explore new applications without significant licensing barriers.
The Future of Qwen2.5-VL
Qwen2.5-VL’s ability to combine multimodal analysis with direct control of software marks a significant step forward. As the models continue to evolve, their impact on industries like document processing, video analysis, and software automation could be transformative.
By pushing the boundaries of what AI can achieve while adhering to local regulatory frameworks, Alibaba is setting a precedent for innovation in a highly competitive and increasingly fragmented global AI market. As the Qwen2.5-VL family gains traction, it will be fascinating to see how it shapes the future of AI and whether it can truly rival established leaders in the field.