
Beyond the Uncanny Valley! 5 Million Netizens Worldwide Were Fooled: None of These Popular "TEDx Speakers" Are Real People

Deep Learning and NLP 2024/08/11 00:40

Source | New Zhiyuan (ID: AI-era)

Recently, these "TED Speakers" have gone viral on the Internet.

Take a closer look: can you spot any problems?

[Images: the five AI-generated "TEDx speaker" portraits]

The answer is now out: none of these five people are real!

Deep Learning & NLP, Beyond the Uncanny Valley! 5 million netizens around the world were deceived, and none of the popular TEDx speakers were real people?

The netizen who had been trying to track one of these "speakers" down online must be in tears.

The images are so realistic, so nearly flawless, that this level of raw, unedited AI output made netizens' jaws drop.

Even AI-detection software fails to flag these as AI-generated images.

"Doesn't it look real because it's a real photo?"

"None of them are real people? It's just creepy!"

Netizens commented: This has surpassed the uncanny valley and reached the "hyper-real valley".

Within a dozen or so hours, the Twitter post sharing these images had racked up more than 5 million views.

Soon afterwards, the creator was identified: Leo Kadieff, a former member of the Stable Diffusion team.

He revealed that these "TEDx speakers" were all generated with the latest realism LoRA for Flux.

In the past, the human eye could always pick up a subtle sense that something was off in AI images. This time the pictures look convincingly real, precisely because the LoRA improves the base model in a way that greatly boosts photorealism.

And, according to the author, another benefit of this workflow is that it greatly simplifies otherwise complex prompts.

This news delighted everyone who struggles with prompt writing.

This small 22 MB file saves us from having to cram a pile of realism-related tokens into every prompt.

The phrase "a RAW surrealist photo, UHD, 8k" is enough; lovers of realism will absolutely love this tool.

The author bluntly asks: do we even still need to fine-tune dedicated realism models?

- These images are the raw output of Flux + LoRA, with no upscaling or post-processing

- You'll need the corresponding RealismLora file and the ComfyUI workflow

LoRA: https://huggingface.co/XLabs-AI/flux-RealismLora/tree/main

ComfyUI workflow: https://we.tl/t-zrC5tPFG17
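
For anyone who prefers Python over ComfyUI, here is a minimal sketch of what the same combination might look like with Hugging Face diffusers. It assumes the FLUX.1 [dev] base weights and that the XLabs-AI/flux-RealismLora file loads through the standard FluxPipeline load_lora_weights path (older diffusers releases may need a format conversion); the prompt and sampling settings are illustrative, not the author's original ComfyUI workflow.

```python
# Illustrative sketch only: the author's original setup is a ComfyUI workflow.
# Assumes the XLabs-AI/flux-RealismLora weights load through diffusers'
# standard LoRA path (older diffusers releases may need a format conversion).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",   # FLUX.1 [dev] base weights
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("XLabs-AI/flux-RealismLora")  # the ~22 MB realism LoRA
pipe.enable_model_cpu_offload()  # helps on GPUs with limited VRAM

# With the LoRA attached, a short realism cue is reportedly enough;
# this particular prompt is just an example, not the author's.
prompt = "a RAW photo, UHD, 8k, a woman giving a talk on a TEDx stage"
image = pipe(
    prompt,
    num_inference_steps=28,
    guidance_scale=3.5,
    height=1024,
    width=1024,
).images[0]
image.save("tedx_speaker.png")
```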

The realism LoRA: outstanding results

The comparison images below make the difference between using and not using the LoRA immediately obvious.

[Comparison images: Flux output with vs. without the realism LoRA]

Netizens are having a field day

Meanwhile, Kyrannio, who shared the "TED speaker" images, also tried to replicate them with Midjourney.

The initial prompt is as follows:

A woman speaking on stage, from Google, white background, corporate logo blurred, tech conference --style raw --v 6.1

[Images: Midjourney results from the initial prompt]

As you can see, the results are not bad, but there is still a clear gap compared with Leo Kadieff's images.

Then, the blogger made some improvements:

A young woman smiling and speaking on stage, from Google, white background, corporate logo blurred, tech conference --style raw --v 6.1

[Images: Midjourney results from the improved prompt]

After many generations, this was the closest result the blogger managed to get.

Meanwhile, with Google Imagen 3 now publicly available, netizens rushed to try the same set of prompts on it.

For a while, the whole internet was swept up in an AI image-generation craze.

Imagen 3 is available to everyone

That's right: as just mentioned, Google's strongest text-to-image model, Imagen 3, is now officially open for use.

Prompt: Photo of a man holding a sign that says: "Imagen Is Now Almost As Good As Midjourney" in New York City.


Source: Risphere

Netizen chrypnotoad said he has never seen an AI render the Shield of Achilles so well!

Handling such a complex prompt with ease, Imagen 3 is clearly not to be underestimated.

The well-known blogger "Guizang" said after trying it out:

The generated content is accurate and the images are aesthetically pleasing. But whenever people are involved, you have to think carefully about how the prompt is worded, otherwise there is a high chance the image simply will not be generated.

Fortunately, the prompt interaction is well designed:

An LLM analyzes the type of prompt you write and suggests related terms that you can swap in directly.

Source: Collection

In addition to straight generation, Imagen 3 also supports inpainting (partial redraw), letting you edit an image with a brush and a prompt.

Source: Collection

Of course, a head-to-head comparison of several top text-to-image models is a must: Midjourney V6 vs. Imagen 3 vs. FLUX.1 [pro].

An Asian woman with heterochromatic eyes.

Native Americans.

A South Asian woman with a beauty mark.

Crazy artist.

Unfortunately, Google's filters were probably too sensitive to generate this prompt...

An elderly Caucasian man with a mustache.

Runway also jumped on the bandwagon, but...

Taking advantage of the buzz, Nicolas Neubert, creative director at Runway, generated a video with the company's own Gen-3 Alpha.

Sure enough, once the AI image is turned into a video, the result still looks quite good!

And this post also caused a sensation.

Netizens marveled: from the infamous Will Smith eating pasta a year and a half ago to where we are today, the progress can only be described as crazy.


At the same time, some sharp-eyed netizens found that there are still some subtle bugs in this video.

For example, the tongue does not move, the teeth look somewhat crooked and flattened, a strange spot appears on the left arm around the four-second mark, and the glitch near the Google logo is also very obvious.



Looking closely, all of the shadows are unnatural, such as the shadow of the microphone, and wherever objects touch, many of the lines turn messy.

The movement of the lips is also unnatural.


The eyes still look soulless.

Overall, AI video clearly still has far more bugs than AI images at the moment.


The reason is that the AI simply does not understand what a human tongue, hair, or eyes are. Going forward, AI models will still need to learn human anatomy and physics.


Moreover, in text-to-image generation, Runway lags far behind.

The original SD team starts a company, and its first move is a knockout

Speaking of FLUX.1, it actually caused a wave of buzz in early August.

Robin Rombach, a core contributor to Stable Diffusion and formerly of Stability AI, struck out on his own and officially announced the founding of Black Forest Labs.

Its first product, the FLUX.1 family of models, goes straight after Midjourney, DALL-E, and Stable Diffusion!

According to the official blog, FLUX.1 achieves SOTA in image detail, prompt adherence, style diversity, and scene complexity.

In particular, in testing, FLUX.1 [pro] ranked first among text-to-image models.

[Chart: ELO scores across visual quality, prompt adherence, size/aspect ratio variability, typography, and output diversity]

To strike a balance between accessibility and model capabilities, FLUX.1 is available in three variants: FLUX.1 [pro], FLUX.1 [dev], and FLUX.1 [schnell]:

- FLUX.1 [pro]: the top-of-the-line version of FLUX.1, offering state-of-the-art image generation with best-in-class prompt following, visual quality, image detail, and output diversity.

- FLUX.1 [dev]: an open-weight, guidance-distilled model for non-commercial applications. Distilled directly from FLUX.1 [pro], it achieves similar quality and prompt-following ability while being more efficient than a standard model of the same size.

- FLUX.1 [schnell]: the fastest model, designed for local development and personal use ("schnell" means "fast" in German); a minimal local-use sketch follows below.
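
As a concrete illustration of the "local development and personal use" case, here is a minimal sketch of running the open FLUX.1 [schnell] checkpoint through Hugging Face diffusers. The repository name and the few-step, guidance-free settings follow the commonly published defaults for schnell, but treat them as assumptions rather than Black Forest Labs' official recipe.

```python
# Minimal sketch: running the open FLUX.1 [schnell] checkpoint locally with
# Hugging Face diffusers. The few-step, guidance-free settings below are the
# commonly published defaults for schnell; adjust them for your hardware.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # trade some speed for lower VRAM usage

image = pipe(
    "a photo of a speaker on stage at a tech conference",  # example prompt
    num_inference_steps=4,   # schnell is a few-step, timestep-distilled model
    guidance_scale=0.0,      # schnell runs without classifier-free guidance
    max_sequence_length=256,
).images[0]
image.save("flux_schnell_demo.png")
```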

It is worth mentioning that all FLUX.1 models are based on a hybrid architecture of multimodal and parallel diffusion transformer blocks, scaled to 12 billion parameters.

The team improved on earlier diffusion models by building on flow matching, and boosted model performance and hardware efficiency by incorporating rotary positional embeddings and parallel attention layers.
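
For readers unfamiliar with flow matching, the toy sketch below shows the basic idea behind the training objective in its rectified-flow form: the network is trained to predict the constant velocity that carries a noise sample to a data sample along a straight interpolation path. The model(x_t, t) signature is a placeholder, and this is a generic illustration rather than Black Forest Labs' actual training code.

```python
# Toy sketch of a (rectified) flow-matching training loss, for intuition only.
# The network v_theta(x_t, t) is trained to predict the straight-line velocity
# x1 - x0 between a noise sample x0 and a data sample x1.
import torch

def flow_matching_loss(model, x1):
    """x1: a batch of data samples (e.g. image latents), shape [B, ...]."""
    x0 = torch.randn_like(x1)                        # noise endpoint
    # Sample one time t in [0, 1] per example, broadcastable over x1's shape.
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)), device=x1.device)
    x_t = (1.0 - t) * x0 + t * x1                    # point on the straight path
    target_velocity = x1 - x0                        # constant velocity along it
    pred_velocity = model(x_t, t)                    # placeholder model signature
    return torch.mean((pred_velocity - target_velocity) ** 2)
```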

Team members

If you open the Black Forest Labs homepage, you can see that there are 15 members of the team.

The founder is none other than an old acquaintance, Robin Rombach.

Stability AI had acquired Robin's Latent Diffusion model and hired him as chief scientist.

On the Google Scholar website, Robin Rombach's paper "High-Resolution Image Synthesis with Latent Diffusion Models" has received more than 9,000 citations.

During this period, he led the Stable Diffusion series, one of the world's most downloaded and widely used open source models.

Address: https://arxiv.org/pdf/2112.10752

Andreas Blattmann, Patrick Esser, and Dominik Lorenz are all SD paper authors and new members of the Black Forest Labs startup team.

Except for Bjorn Ommer, it can be said that Robin took all the SD core veterans with him.

"Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation" is the last paper published by Robin before his departure.

Address: https://arxiv.org/abs/2403.12015

It is also worth mentioning that Andreas Blattmann, Tim Dockhorn, Axel Sauer, Frederic Boesel, and Patrick Esser contributed to this paper.

In addition, the new team's previous innovations include the creation of VQGANs and Latent Diffusion, SD models for image and video generation (SD XL, SVD), and Adversarial Diffusion Distillation for ultra-fast real-time image synthesis.


It seems the pace of progress in AI image and video generation is only accelerating.

A year from now, the AI images and videos we get to see will likely be astonishing.
