OpenAI's Sora promised the pie. Is the rest of the AI industry now delivering it?
In AI text-to-video, Sora is almost the only product that is still a "futures contract": announced, but not yet delivered.
In the past two months, AI-generated video applications have kept multiplying at home and abroad: Kuaishou, ByteDance, Zhipu AI, Shengshu Technology, and Aishi Technology in China, and Google, Luma, and Runway overseas.
Although the platforms still differ in quality, overall usability has improved greatly and stylization features are more complete. The one remaining weakness is that AI video tools still integrate poorly into production workflows. The pie that Sora promised is being delivered by these latecomers.
Moving past "futures": text-to-video applications take off
Both industry and the public sector regard video as a key area for AI applications. On July 30, NVIDIA CEO Jensen Huang invited Meta CEO Mark Zuckerberg to a conversation at SIGGRAPH 2024, the world's top computer graphics conference, and both agreed that video capabilities are the direction in which AI models will evolve.
Jiaming Song, chief scientist of Luma AI, who previously worked in NVIDIA's research group, said in a conversation with a16z partner Anjney Midha that video is tied to the 3D world: from a learning perspective, video data lets models better understand and reason about 3D. Real-time, high-quality video generation will therefore eventually advance embodied AI.
Video is this kind of "bridge", and a large number of AI companies are now racing to cross it first. OpenAI in particular has turned Sora into a "futures" product that outsiders cannot use, which leaves other platforms room to develop.
(Compiled from public information)
(Source: Tianyancha)
Behind this long battle line is what attracts these companies: partly the business model, and partly the prospects for applying the technology.
Keling, Jimeng, Vidu, and others have all launched membership subscriptions to try to popularize their applications among consumers. Wang Changhu, founder of Aishi Technology, previously said in an interview with Caixin: "Aishi's current strategy is to focus on 2C (consumer-oriented) and collect feedback from users at home and abroad to better iterate the underlying model based on user experience." As for further applications, it is too early to say, mainly because consumer subscription revenue cannot cover the cost.
Luma AI has also adopted a consumer-facing product form, but it originally focused on 3D and entered video generation to explore more possibilities for 3D generation and reconstruction, using video to drive 3D development. This has stronger application prospects in industry, such as mass-producing the 3D assets that films require.
Most importantly, Luma AI's vision is not to sell technology or assets but to build a TikTok-like platform: a 3D-based ecosystem. Wang Changhu likewise said in a conversation with Zhang Peng, the founder of Geek Park, that Aishi Technology is aiming at "platform opportunities in the AIGC era", but that the form of such a platform cannot yet be predicted, because the AI industry will not grow by replicating existing platforms.
Beyond that, applications are now taking shape that bring AI-generated video into a complete workflow. Clapper, an open-source video editing tool, has recently gained popularity. It combines various AI technologies and uses prompts to direct AI agents to generate and iterate on stories, skipping manual file editing entirely.
(Source: Heart of the Machine)
AI-generated video is evolving much faster than expected. The industry's focus is currently on two things: generation speed and generation efficiency. Large models, however, do not point to any definite business model; that depends more on each team's choices. Along the way, beyond commercialization, AI companies also have to work out how to avoid compliance and cost dilemmas. Bringing text-to-video to maturity is therefore not easy, and the field is now only at a stage comparable to when ChatGPT had just come out.
The "hard flaws" and breakthroughs of AI-generated video
a16z has previously argued that large companies must pay more attention to legal safety, copyright, and similar issues when turning research results into commercial products, so they often move slowly. Setting aside whether this is why Sora remains unreleased, the same logic applies to the problems the whole industry has to face.
1. The commercialization "gap": current AI-generated video struggles to meet clients' needs
Bloomberg has reported that OpenAI has been trying to pitch Sora to Hollywood, without success. The first commercial ad created with Sora was a Toys"R"Us spot released in June. However, the video used some old footage, and the public press release did not say that it was entirely AI-generated.
Director Nik Kleverov also said in a now-deleted story that Native Foreign, the creative agency that produced the shots, put about a dozen staff on the project, and that Sora supported 80 to 85 percent of the process. This is not good news for AI-generated video, which is supposed to deliver high efficiency and low cost.
2. Training costs are hard to cover and high-quality datasets are hard to obtain
A video is essentially a series of images. Many public datasets exist for images, but not for video. OpenAI has been accused of using YouTube videos for training, and media reports recently revealed that NVIDIA collected large amounts of data from Netflix and YouTube to train its Cosmos project, which supports the development of its AI products for the real world. The project reportedly downloads the equivalent of 80 years of video content every day.
This reflects two key points. First, Huang and Luma hold similar views: the development of AI video really does matter for bringing AI into the 3D world, and NVIDIA is following the same path: text, image, video, 3D model, real world. Second, video datasets are a major problem. Beyond copyright issues, the video data also lacks labels. Stanford University professor Stefano Ermon has said that there is currently no good way to screen and filter videos, and that even after screening, their labels and descriptions still have to be dealt with.
3. The AI asset bubble: AI must solve important, complex problems for users to be valuable, but its results so far fall well short of what technologies such as the Internet showed when they first appeared
In a recent interview, Michael Eisenberg, a partner at Benchmark, quoted his friend Gavin Baker, founder of Atreides Management, on the development of large models: "Foundation models are the fastest depreciating asset in history."
The example he gave came from the founder of Seeking Alpha. In a high-frequency field such as finance, where business and data are updated every minute, a trained model can handle routine work such as writing reports but cannot keep up with the rapid refresh of data well enough to support financial forecasting.
Moreover, the development of other technologies was more predictable. The bubble in the early Internet was huge, but the application path was already visible; AI, by contrast, is full of uncertainty. The marginal cost of Internet growth was close to zero (or was largely shared with operators and users), whereas the marginal cost of AI growth involves large amounts of fixed assets, which entrepreneurs now bear themselves, and the more they invest, the weaker the marginal improvement becomes. Heavy early investment is most likely a trap.
A technological revolution must be followed by an industrial revolution, and an industrial revolution needs breakout products to lead it. What AI needs most is a successful use case. So far, AI-generated video does not appear to have produced one.
Aravind Srinivas, the founder of Perplexity, offers another point of view: the value of a foundation model essentially reflects the value of the team behind it, as Sora does for OpenAI and Wenxin does for Baidu. It is not that Sora itself is certain to win the video race outright; rather, the outside world believes that a Sora led by OpenAI could. If Sora does not deliver the breakthrough that was expected, who can take on the leading role in this area?
From this point of view, the key may be who can first truly integrate AI-generated video into the workflow of a commercial system, as Clapper is exploring for video production. That is a bigger problem, because it involves integration with other fields: meteorology, cities, film and television, automobiles, and manufacturing. Maybe Sora will deliver something more concrete sometime this year, or maybe other startups will be the ones to upend our perception of AI video.
This article is from Xinzhi self-media and does not represent the views and positions of Business Xinzhi. If there is any suspicion of infringement, please contact the administrator of the Business Xinzhi platform. Contact: system@shangyexinzhi.com