Apart from the “submergence” of AI toward localized applications, the biggest recent change in the AI sector is the technological breakthrough in multimodal video generation: it has evolved from pure text-to-video into fully integrated generation that combines text, images, and audio.
Here are a few recent breakthroughs that give a sense of where the technology stands:
1) ByteDance has open-sourced the EX-4D framework: a monocular video can be instantly turned into free-viewpoint 4D content, with a reported user acceptance rate of 70.7%. In other words, from an ordinary video, the AI can automatically generate views from any angle, something that previously required a professional 3D modeling team.
2) Baidu’s “Hui Xiang” platform: generates a 10-second video from a single image and claims “movie-level” quality. Whether that is marketing exaggeration won’t be clear until the Pro version update in August.
3) Google DeepMind Veo: generates 4K video with synchronized environmental audio. The key technical highlight is the “synchronization” itself: previously, video and audio came from two separate systems stitched together. Reaching true semantic-level matching is hard; in a complex scene, for example, the walking motion on screen has to line up with the corresponding footstep sounds.
4) Douyin ContentV: 8 billion parameters, 1080p video generated in 2.3 seconds, at a cost of 3.67 yuan per 5 seconds. Frankly, the cost control is quite good, but the generation quality still falls short in complex scenes.
Why do these cases matter? Because they mark breakthroughs on three fronts: video quality, production cost, and application scenarios.
1. In terms of technical value: the complexity of generating multimodal video compounds rapidly across dimensions. A single frame contains roughly 10^6 pixels, a video must maintain temporal coherence across at least 100 frames, audio must stay synchronized at roughly 10^4 samples per second, and 3D spatial consistency must hold throughout.
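As a rough, purely illustrative back-of-envelope calculation using the ballpark figures above (the 24 fps frame rate is an added assumption, not from the text):

```python
# Back-of-envelope scale of a short multimodal clip, using the ballpark
# figures from the paragraph above; the frame rate is an assumption.

frames = 100               # at least ~100 frames for temporal coherence
pixels_per_frame = 10**6   # ~1e6 pixels per frame
audio_rate = 10**4         # ~1e4 audio samples per second
fps = 24                   # assumed frame rate, not stated in the text

duration_s = frames / fps                      # ~4.2 seconds of footage
video_values = frames * pixels_per_frame * 3   # RGB values that must stay coherent
audio_samples = int(duration_s * audio_rate)   # samples that must stay in sync

print(f"video values to keep coherent: {video_values:,}")   # 300,000,000
print(f"audio samples to keep in sync: {audio_samples:,}")  # 41,666
```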
In short, the technical complexity is not low. The old approach was one super-large model brute-forcing every task; Sora is said to have burned through tens of thousands of H100s to reach its video-generation capability. Now the same result can be achieved through modular decomposition and collaboration between models. ByteDance’s EX-4D, for example, breaks the complex task down into a depth estimation module, a viewpoint transformation module, a temporal interpolation module, a rendering optimization module, and so on, with each module specializing in one job and a coordination mechanism tying them together.
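To make the modular idea concrete, here is a minimal hypothetical sketch of such a pipeline; the module functions and coordinator below are illustrative stand-ins, not ByteDance’s actual EX-4D code or API:

```python
# Hypothetical sketch of modular decomposition for a video task.
# Module names mirror the description above; logic is placeholder.

from typing import Callable

# Each stage is a plain function: it takes the shared working state
# (a dict of intermediate artifacts) and returns an updated state.
Stage = Callable[[dict], dict]

def depth_estimation(state: dict) -> dict:
    state["depth_maps"] = f"depth({state['input_video']})"
    return state

def viewpoint_transform(state: dict) -> dict:
    state["novel_views"] = f"reproject({state['depth_maps']}, angle={state['target_angle']})"
    return state

def temporal_interpolation(state: dict) -> dict:
    state["smooth_views"] = f"interpolate({state['novel_views']})"
    return state

def rendering_optimization(state: dict) -> dict:
    state["output_video"] = f"render({state['smooth_views']})"
    return state

def run_pipeline(stages: list[Stage], state: dict) -> dict:
    """Minimal coordinator: run specialized stages in order over shared state."""
    for stage in stages:
        state = stage(state)
    return state

result = run_pipeline(
    [depth_estimation, viewpoint_transform, temporal_interpolation, rendering_optimization],
    {"input_video": "clip.mp4", "target_angle": 35},
)
print(result["output_video"])
```

The point is the shape of the design: each stage owns one narrow problem, and the coordinator only passes shared state between them, so individual modules can be swapped, scaled, or optimized independently instead of one giant model carrying everything.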
2. In terms of cost reduction: the gains come from optimizing the inference architecture itself, including a layered generation strategy (generate a low-resolution skeleton first, then enhance it into high-resolution content), a cache-and-reuse mechanism (reuse results across similar scenes), and dynamic resource allocation (adjust model depth to match the complexity of the content); a rough code sketch of these three ideas follows below.
It is this set of optimizations that gets Douyin ContentV down to 3.67 yuan per 5 seconds.
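A minimal sketch of how the three optimizations above could fit together; the function names, cache key, and complexity threshold are assumptions for illustration, not any platform’s actual implementation:

```python
# Hypothetical sketch of the three inference optimizations described above.

scene_cache: dict[str, str] = {}  # cache-and-reuse: keyed by a scene signature

def generate(prompt: str, scene_key: str, complexity: float) -> str:
    # 1) Cache reuse: skip regeneration for scenes we have already rendered.
    if scene_key in scene_cache:
        return scene_cache[scene_key]

    # 2) Dynamic resource allocation: shallower model pass for simple content.
    depth = "shallow" if complexity < 0.5 else "full"

    # 3) Layered generation: low-resolution skeleton first, then refine/upscale.
    skeleton = f"lowres_skeleton({prompt}, depth={depth})"
    video = f"superres_refine({skeleton})"

    scene_cache[scene_key] = video
    return video

print(generate("a cat walking on a beach", scene_key="beach_cat", complexity=0.3))
print(generate("a cat walking on a beach", scene_key="beach_cat", complexity=0.3))  # cache hit
```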
3. In terms of application impact: traditional video production is a capital-intensive game of equipment, venues, actors, and post-production, where a 30-second advertisement routinely costs hundreds of thousands. AI compresses the whole process into a prompt plus a few minutes of waiting, and can deliver camera angles and effects that are hard to achieve with a physical shoot.

This shifts the barrier in video production from technology and capital to creativity and taste, and that may trigger a reshuffling of the entire creator economy.
The question, then, is: what do these demand-side shifts in web2 AI have to do with web3 AI?
1. First, the structure of computing-power demand changes. AI used to be a pure scale contest: whoever had the bigger homogeneous GPU cluster won. Multimodal video generation instead calls for a diverse mix of compute, which could create demand for distributed idle computing power as well as for various distributed fine-tuning, algorithm, and inference platforms.
2. Second, demand for data labeling will also strengthen. Producing a professional-grade video requires precise scene descriptions, reference images, audio styles, camera-movement trajectories, lighting conditions, and more, and these become new professional labeling requirements. Web3-style incentives can encourage photographers, sound engineers, 3D artists, and others to contribute these professional data elements, strengthening AI video generation with specialized vertical labeling.
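For illustration, a professional labeling record of the kind described might carry fields like the following; the schema and field names are hypothetical, not an existing standard:

```python
# Hypothetical schema for a professional video-annotation record of the kind
# described above; field names are illustrative, not an existing standard.

from dataclasses import dataclass, field

@dataclass
class VideoAnnotation:
    scene_description: str                       # precise textual description of the scene
    reference_images: list[str] = field(default_factory=list)   # file names or content hashes
    audio_style: str = ""                        # e.g. "light rain, distant traffic"
    camera_trajectory: list[tuple[float, float, float]] = field(default_factory=list)  # (x, y, z) keyframes
    lighting: str = ""                           # e.g. "golden hour, soft key from the left"
    annotator: str = ""                          # contributor ID used for incentive payout

sample = VideoAnnotation(
    scene_description="A cyclist crosses a rain-slicked intersection at dusk",
    reference_images=["ref_frame_001.png"],
    audio_style="light rain, tire hiss, distant thunder",
    camera_trajectory=[(0.0, 1.6, 0.0), (2.0, 1.6, -1.0)],
    lighting="overcast dusk, neon reflections",
    annotator="annotator_042",
)
print(sample.scene_description)
```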
3. Finally, it is worth noting that as AI shifts from centralized, large-scale resource allocation to modular collaboration, this in itself creates new demand for decentralized platforms. At that point, computing power, data, models, and incentives will together form a self-reinforcing flywheel, which will in turn drive the convergence of web3 AI and web2 AI scenarios.