
"Her" gets a face! Video-call an AI with almost no delay, backed by Sequoia and YC

QbitAI 2024/08/16 13:24
By Hengyu, reporting from Aofei Temple
QbitAI | Official account QbitAI

The fastest conversational video AI yet is here, with latency under one second!

End to end, it can listen, see, and speak, and it has a face.


This product doesn't come from companies like OpenAI or HeyGen that have already made a big splash, and it doesn't have a specific name.

Because it comes from the startup Tavus, it is known as Conversational Replicas by Tavus.

Its main purpose is to create an immersive, AI-generated video conversation experience.

Launched today, it has already climbed to first place on Product Hunt's list of today's new products, and its upvote count is still rising.



Netizens are enthusiastic:

Great, now I have a "person" to run my Zoom video calls for me, hahahaha!


Many netizens also see it as a better human-computer interface than reading documentation or chatting by text.

This conversational video interface is a game changer!
I can already imagine the endless possibilities of immersive experiences.


Try it on the web for 2 minutes

As soon as we saw the news, we at QbitAI rushed over to Tavus's official website.

On the official website, you can try the "fastest conversational video AI yet" online for 2 minutes.

In this preset demo, your conversation partner is Carter, a persona created by Tavus.

Carter is set up as an employee of Tavus, an AI video research company, who answers with humor while staying helpful.

This is him:


Although Carter is a virtual human, video chatting with him feels like video chatting with a friend.

After you grant camera and microphone access, the official recommendation is to chat with Carter in a quiet room.

Below is a screen recording of one netizen's online trial:


Carter mentioned in the conversation that, apart from questions about the AI technology Tavus uses, the most popular topics people bring up with him are sharing their day and telling jokes.

He told a joke on the spot:

Q: Why can't a bicycle stand up on its own?
A: Because it's two-tired (too tired).

After delivering the punchline, Carter laughed at his own joke with a couple of "haha"s.


We at QbitAI also tried it for 2 minutes. Our overall impressions:

First, Tavus's responses are indeed very fast, consistent with the official claim of "under a second".

Even if you suddenly cut in while he's talking, Carter stops immediately and listens to what you're saying.

Second, although the company claims support for more than 30 languages, he would not speak Chinese, whether we asked in Chinese or in English.

When we asked him "Can u speak Chinese", Carter replied, "I'd rather have a conversation in English!"


Third, Tavus's AI really can "see".

During our trial there was an awkward moment when I didn't know what to say, so I just grinned.

Carter immediately said:

Oh! You gave me a smile~


Fourth, in the demo, Carter's lip movements are almost perfectly in sync with his speech.

No wonder some netizens said after trying it:

Truly impressive: fast responses and excellent video and audio generation.


Now anyone who signs up can use Tavus's conversational video AI.

In the full version, Carter is not the only AI persona available: there are both male and female personas, in roles ranging from sales to life coaching.

The chat background can also be changed to suit the user; it isn't limited to an office setting.


You can also enter the conversation context manually.

All in all, the degree of personalization is very high.


There are currently free and paid tiers, each with a different generation allowance.


Built on an in-house model

Behind Tavus's conversational video AI is Phoenix-2, a model developed in-house by the Tavus team.

It combines an audio- and text-driven 3D model with a 2D GAN to produce photorealistic short videos 1 to 2 minutes long.

The generation process breaks down roughly into four steps:

TTS (text-to-speech) → 3D reconstruction of the head and shoulders → prompt/script-driven facial animation → high-fidelity rendering.
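To make the data flow between the four steps concrete, here is a minimal sketch, assuming each stage can be treated as a function whose output feeds the next. Every function here (`text_to_speech`, `reconstruct_head`, `animate_face`, `render`) is a hypothetical stub invented for illustration; none of them is Tavus's code or API, and the "audio" and "frames" are toy stand-ins rather than real waveforms or pixels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HeadModel:
    """Hypothetical stand-in for a reconstructed head-and-shoulders 3D model."""
    identity: str


def text_to_speech(script: str) -> List[float]:
    # Stage 1 (stub): one fake "audio energy" value per character.
    # A real TTS system would synthesize a waveform instead.
    return [0.0 if ch.isspace() else 1.0 for ch in script]


def reconstruct_head(footage_id: str) -> HeadModel:
    # Stage 2 (stub): a real system would fit 3D geometry to training footage.
    return HeadModel(identity=footage_id)


def animate_face(audio: List[float], samples_per_frame: int = 2) -> List[float]:
    # Stage 3 (stub): turn audio energy into one "mouth openness" value
    # per video frame by averaging each group of audio samples.
    mouth = []
    for i in range(0, len(audio), samples_per_frame):
        chunk = audio[i:i + samples_per_frame]
        mouth.append(sum(chunk) / len(chunk))
    return mouth


def render(head: HeadModel, mouth: List[float]) -> List[str]:
    # Stage 4 (stub): "render" each frame as a text label instead of pixels.
    return [f"{head.identity}:frame{i}:mouth={m:.1f}" for i, m in enumerate(mouth)]


def generate_video(script: str, footage_id: str) -> List[str]:
    # Chain the four stages in the order listed in the article.
    audio = text_to_speech(script)
    head = reconstruct_head(footage_id)
    mouth = animate_face(audio)
    return render(head, mouth)


if __name__ == "__main__":
    for frame in generate_video("hi there", "carter"):
        print(frame)
```

The point of the sketch is only the ordering: speech is produced first, and it is the audio (not the text) that drives the facial animation, which is then rendered on top of the reconstructed head.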


△ Fine-tuning facial geometry details with differentiable rendering

To make the AI persona that users talk to more realistic, the Tavus team combined GANs with 3D Gaussian Splatting when building Phoenix-2's video rendering pipeline.

The reason is that traditional GANs are usually limited in image resolution, while volumetric models often lack temporal consistency.

So Tavus decided to combine the two.

Training a GAN requires a large dataset and expensive compute, and because GANs are two-dimensional and prone to temporal-consistency problems, their inference time and video quality are usually limited.

Tavus uses the 3D model as an "intermediary", which lets it render at over 100 FPS and, thanks to physics-aware constraints on dynamic objects, gives it greater controllability and versatility.


△ Comparing 2D and 3D talking-head models

In addition, Phoenix-2 improves on its predecessor by replacing the NeRF used in the original Phoenix model.

Instead, it uses 3D Gaussian Splatting to learn how to drive dynamic facial deformation in 3D space, and uses that information to render views driven by previously unseen audio.

Team members said that, compared with NeRF, 3D Gaussian Splatting performs better in data requirements, memory, computational complexity, streaming, rendering efficiency, and more.

The Phoenix-2 pipeline, built on 3D Gaussian Splatting, trains up to 70% faster than the original model and renders at 60+ FPS.
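Part of why Gaussian Splatting renders faster than NeRF is that NeRF must query a neural network at many sample points along every camera ray, while 3D Gaussian Splatting stores the scene as explicit Gaussians that are projected to the screen, sorted by depth, and alpha-blended front to back: the pixel color is C = Σ cᵢ αᵢ ∏_{j<i} (1 − αⱼ), where αᵢ is each Gaussian's opacity weighted by its 2D falloff at that pixel. Below is a minimal single-pixel sketch of that compositing step, assuming the Gaussians have already been projected to screen space. It illustrates the general technique only; it is not Tavus's renderer, and the class and function names as well as the alpha clamp and cutoff thresholds are illustrative choices.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


@dataclass
class Splat:
    """A 3D Gaussian that has already been projected into screen space."""
    mean: Tuple[float, float]        # 2D center in pixel coordinates
    cov: Tuple[float, float, float]  # 2D covariance [[a, b], [b, c]] stored as (a, b, c)
    depth: float                     # view-space depth, used only for sorting
    color: RGB                       # RGB color in [0, 1]
    opacity: float                   # learned opacity in [0, 1]


def splat_alpha(s: Splat, px: float, py: float) -> float:
    """Opacity of one splat at pixel (px, py): opacity * exp(-0.5 * d^T Sigma^-1 d)."""
    a, b, c = s.cov
    det = a * c - b * b
    inv_a, inv_b, inv_c = c / det, -b / det, a / det  # inverse of the 2x2 covariance
    dx, dy = px - s.mean[0], py - s.mean[1]
    mahalanobis_sq = inv_a * dx * dx + 2.0 * inv_b * dx * dy + inv_c * dy * dy
    # Clamp below 1 so a single splat never makes the pixel fully opaque.
    return min(0.99, s.opacity * math.exp(-0.5 * mahalanobis_sq))


def shade_pixel(splats: Sequence[Splat], px: float, py: float,
                background: RGB = (0.0, 0.0, 0.0)) -> RGB:
    """Front-to-back alpha compositing of depth-sorted splats for one pixel."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still reaching farther splats
    for s in sorted(splats, key=lambda g: g.depth):  # nearest first
        alpha = splat_alpha(s, px, py)
        if alpha < 1.0 / 255.0:  # negligible contribution, skip
            continue
        for k in range(3):
            color[k] += s.color[k] * alpha * transmittance
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively opaque; stop early
            break
    return (color[0] + transmittance * background[0],
            color[1] + transmittance * background[1],
            color[2] + transmittance * background[2])


if __name__ == "__main__":
    red = Splat((0.0, 0.0), (1.0, 0.0, 1.0), depth=1.0, color=(1.0, 0.0, 0.0), opacity=0.5)
    blue = Splat((0.0, 0.0), (1.0, 0.0, 1.0), depth=2.0, color=(0.0, 0.0, 1.0), opacity=1.0)
    print(shade_pixel([blue, red], 0.0, 0.0))  # red in front, blue behind
```

Because each Gaussian's contribution is a closed-form expression rather than a network evaluation, this loop is cheap and maps well onto GPU rasterization, which is consistent with the speed and efficiency advantages the team describes.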


Tavus said its conversations include end-of-turn detection and interruptibility to make them feel more natural.
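The article doesn't say how Tavus implements end-of-turn detection or interruptibility. As a rough illustration of what the two features mean, here is a minimal sketch assuming a simple energy-based heuristic over fixed-length audio frames: the user's turn ends after a run of silent frames following speech, and sustained user speech while the avatar is talking counts as a barge-in that stops the avatar. `TurnConfig`, `run_turn_taking`, and all threshold values are hypothetical; production systems typically use trained voice-activity and turn-taking models instead.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class TurnConfig:
    speech_threshold: float = 0.1  # frame energy above this counts as speech (hypothetical)
    end_silence_frames: int = 3    # silent frames after speech that end the user's turn
    barge_in_frames: int = 2       # consecutive speech frames that interrupt the avatar


def run_turn_taking(energies: Iterable[float], avatar_speaking: bool = False,
                    cfg: Optional[TurnConfig] = None) -> List[Tuple[str, int]]:
    """Return (event, frame_index) pairs for interruptions and end-of-turn decisions."""
    cfg = cfg or TurnConfig()
    events: List[Tuple[str, int]] = []
    heard_speech = False  # has the user spoken since the last end of turn?
    silence = 0           # consecutive silent frames after user speech
    speech_run = 0        # consecutive speech frames
    for t, energy in enumerate(energies):
        if energy > cfg.speech_threshold:
            heard_speech = True
            silence = 0
            speech_run += 1
            if avatar_speaking and speech_run >= cfg.barge_in_frames:
                avatar_speaking = False  # stop talking and listen to the user
                events.append(("interrupt", t))
        else:
            speech_run = 0
            if heard_speech:
                silence += 1
                if silence >= cfg.end_silence_frames:
                    events.append(("end_of_turn", t))
                    heard_speech = False
                    silence = 0
                    avatar_speaking = True  # the avatar starts its reply
    return events


if __name__ == "__main__":
    # The avatar is talking; the user cuts in for two frames, then falls silent.
    print(run_turn_taking([0.5, 0.5, 0.0, 0.0, 0.0, 0.0], avatar_speaking=True))
```

Requiring several consecutive speech frames before interrupting is a common way to keep a cough or a short noise from cutting the avatar off, while the silence window trades responsiveness against cutting users off mid-pause.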

In addition, because facial data is highly sensitive, the team provides security checks, security protocols, automatic content moderation, and hallucination checks to keep information safe.


It is worth mentioning that the Phoenix models also power another Tavus product:

generating conversational videos of a user's digital-twin avatar.

You only need to provide 2 minutes of footage and call the API, with pricing starting at $1, to generate video content.



"If it's not < 1s, it's not human"

Tavus is a small AI video startup founded four years ago.

Most of its team members previously worked at Amazon, Descript, Google, Apple, and other companies.

According to public information, as of March this year the company had received Series A funding from Sequoia, Scale VC, and YC, totaling about $18 million.


Tavus's co-founder and CEO is Hassaan Raza, who previously worked at Google and Apple.


The company's co-founder and COO commented on Product Hunt that building the conversational video AI took a long time: thousands of hours of research, engineering, and development.

So why aim for latency of one second or less?

The company's answer is that it wants to mimic human-to-human video conversation as closely as possible:

Because if you don't respond in under 1 second, you're not human.
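A sub-second response is a budget shared by every stage between the user finishing a sentence and the avatar's first reply frame appearing. The sketch below adds up a set of per-stage latencies and checks the total against 1,000 ms, and also converts the frame rates quoted earlier (100 FPS and 60 FPS) into per-frame time budgets. The stage names and millisecond values are purely hypothetical placeholders chosen for illustration; the article gives no per-stage breakdown, and these are not Tavus's measurements.

```python
from typing import Dict, Tuple

BUDGET_MS = 1000.0  # the "under one second" target from the article


def frame_interval_ms(fps: float) -> float:
    """Time available to render one frame at a given frame rate."""
    return 1000.0 / fps


def check_budget(stage_ms: Dict[str, float], budget_ms: float = BUDGET_MS) -> Tuple[float, bool]:
    """Sum per-stage latencies and report whether the total fits the budget."""
    total = sum(stage_ms.values())
    return total, total < budget_ms


# Hypothetical latencies in milliseconds, for illustration only.
example_stages = {
    "end_of_turn_detection": 200.0,
    "speech_recognition": 150.0,
    "llm_first_token": 300.0,
    "tts_first_audio": 150.0,
    "render_first_frame": frame_interval_ms(60),  # one frame at 60 FPS
    "network": 100.0,
}

if __name__ == "__main__":
    total, ok = check_budget(example_stages)
    print(f"total = {total:.1f} ms, under budget: {ok}")
    print(f"frame budget at 100 FPS: {frame_interval_ms(100):.1f} ms")
```

Rendering a single frame at 60 or 100 FPS costs only about 10 to 17 ms, so under assumptions like these most of the one-second budget goes to deciding that the user has finished speaking and producing the first words of the reply.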

This article is from Xinzhi self-media and does not represent the views or positions of Business Xinzhi.