Video by dictation: Chinese companies join the battle

Zhang Shule 2024/06/24 09:28

Zhang Shule is a columnist for People's Daily Online and People's Post and Telegraph

Can video be dictated, too?

It is already on its way.

After OpenAI released its text-to-video model Sora, Chinese companies rushed into the field, and the development of domestic text-to-video models began to accelerate.

Over the past six months, AI-generated video has moved in and out of the spotlight.

Vidu, billed as China's first self-developed video model, and the video generation models later launched by domestic companies such as ByteDance and Tencent have drawn outside attention from time to time.

Recently another domestic video model joined the battle: the official website for Kuaishou's "Keling" video generation model went live.

On the 21st, Kuaishou released a major update to the Keling model. The image-to-video function was officially opened: it converts a static image into a 5-second video, and users can control the movement of objects in the image through the text prompt. A video extension function launched at the same time: it supports one-click extension and repeated consecutive extension of a generated video, up to a maximum length of about 3 minutes.

Earlier video models from other companies were mostly shown only through demo videos. The Keling model, by contrast, not only produces results positioned against Sora but is also open for trial use in Kuaishou's Kuaiying app.

According to Kuaishou, the Keling model was developed in-house by the Kuaishou AI team. It follows a technical route similar to Sora's, combined with several of the team's own innovations. It generates video at up to 1080p resolution, up to 2 minutes long at a frame rate of 30fps, and supports freely chosen aspect ratios.

Kuaishou also claims that the Keling model can generate large-scale, plausible motion that conforms to the objective laws of motion.

In one official demo video, an astronaut runs on the moon, and as the camera slowly rises, the astronaut's gait and shadow remain plausible.

At about the same time, Meitu announced that it would launch a new product, MOKI, at the end of July. MOKI helps users generate AI short videos using the video generation capabilities of Meitu's large model.

However, some argue that compared with large language models, which every company piled into, video models have been slower to heat up, and fewer giants are involved.

Why is this happening?

Are the big tech companies not interested?

Meanwhile, Kuaishou and Meitu kept a low profile in the last round of competition over large language models.

In the video model race, what are these two companies' biggest advantages?

On these questions, Beijing Business Daily reporter Wei Wei spoke with Shule, and this author's view is:

Big tech companies that are still sprinting toward the "college entrance examination" are not going to go straight for a "postdoc".

Making a video is not a matter of stringing a batch of pictures into a slideshow. The big companies are in no hurry to push hard in this area: its practical value is limited for now, and it is mostly a show of muscle.

After all, video generation is not about stitching a batch of AI-generated drawings together into a cartoon.

Beyond details such as visual consistency, fidelity to the description, lighting and shadow, and storyboarding, it also requires the ability to understand and recreate a plot.

All of this requires deep learning across multiple vertical domains, such as video structure, content analysis, shooting technique, and narrative technique.

This is far harder than chat, drawing, or chess, where data accumulation and user corrections can get the job done.

Even masters of film and television often fail, so one can imagine how hard filmmaking is for an artificial intelligence still at the "college entrance examination" stage.

But Kuaishou and Meitu need to show their muscles, even if it's just a show.

For both Kuaishou and Meitu, the biggest advantage in the video model race is a wealth of "learning materials" for AI deep learning.

With these "learning materials", they can sidestep certain copyright issues. Years of accumulated content, together with fine-grained categorization and labeling of video, also let a large model "retrieve" knowledge more effectively and give the companies some video-specific expertise in algorithm design.

But that is all. On the technical side, they still lack a foundational accumulation of experience in artificial intelligence algorithms.

Moreover, even once video models mature, they will struggle to make a major breakthrough in film and television.

Whether in short dramas, advertisements, long-form video, or movies, video models will set off a race for "blockbuster special effects", but in the end what attracts the audience is the content: the screenwriting, the camera work, and the actors' performances.

These are the keys to commercial monetization at scale.

In my humble opinion, video models may find business opportunities more easily in animation.

This article is from Xinzhi self-media and does not represent the views and positions of Business Xinzhi. If there is any suspicion of infringement, please contact the administrator of the Business News Platform. Contact: system@shangyexinzhi.com