Sober Reflections on the Humanoid Robot Boom: Why Are Investors "Disheartened"?
Over the past week at the World Robot Conference, humanoid robots took center stage (the "C position"), both in sheer numbers and in crowd appeal. Compared with application-specific industrial robots, a humanoid design is actually not efficient. Its biggest advantage is that it can be more "universal" in human society: humans do not need to deliberately modify the environment to accommodate the robot, and wherever human hands and legs can reach, a humanoid robot can reach too. The key to achieving this universality is a sufficiently strong general-purpose model. Some investors believe that the hardware body of a humanoid robot poses no real barrier, and that software is the problem.
Author | Zhou Xiaoyan, Hao Boyang
Editorial Review | Time
Edit | Peninsula
Source | Tencent Technology
Over the past week, the World Robot Conference seemed to gather robots from all over the world, with more than 600 exhibits covering almost the entire robotics industry chain.
The variety of robot types is also dizzying: robot dogs that "jump up and down", robotic arms that "sway" in unison, and "food delivery experts" that can carry alcohol without spilling it.
Whether measured by the number of robots or by crowd appeal, humanoid robots took center stage (the "C position"). According to the organizers, this is the conference with the most humanoid robots, and almost half of the on-site audience gathered at the booths of humanoid robot companies.
These humanoid robots come in all heights and builds, short and tall, stout and slim: from the "small" Booster T1 at only 110 cm to the "strong" one at 185 cm. They differ in arm design, battery placement, face shape, and even the way they move, but all of them were diligently performing "stunts".
On the martial side, they can practice Wing Chun, box, dance the "seaweed dance", and even run on the ground with steel poles; on the scholarly side, they can write calligraphy, cook, do laundry, and fold clothes. At many booths, the robots seemed to have mastered every skill and looked poised to go to work in a factory or to serve in a customer's home.
△ Stardust Intelligence's Astribot S1 performing the "seaweed dance"
△ LimX Dynamics CL-1 walking uphill
However, an investor who has long followed the humanoid robot sector told Tencent Technology after visiting the exhibition: "I don't want to invest in any of them."
For now, they are neither useful enough nor able to pull ahead of one another.
For example, the main tasks of humanoid robots in industrial scenarios are picking and small-scale handling and moving, but traditional automated robots already offer very mature solutions there, so building a humanoid for the job makes little sense. Humanoid robots aimed at the home mainly cook, stir-fry, and fold clothes. Each robot performs these tasks to a different degree, but as the investor put it, "What you can do, your competitors can do too. There is no insurmountable gap; it's just a matter of time."
According to incomplete statistics compiled by Tencent Technology, 28 humanoid robot companies took part in the conference. Apart from scientific research, most of their products target industrial or household scenarios.
△ Tencent Technology's incomplete statistics: humanoid robot companies at the World Robot Conference | Sorted by initials
Tencent Technology's statistics show that these robots differ considerably in mechanical performance: indicators such as degrees of freedom and peak torque can differ by up to 5 times. In movement speed alone, the fastest can exceed 7 km/h, while the slowest manage only 2.5 km/h. At the level of the software foundation, the large model, however, it is hard for any company to open up a big gap.
Yet software was supposed to be the biggest highlight of this year's humanoid robots.
This is because a humanoid design is not very efficient compared with industrial robots built for specific applications. Its biggest advantage is that it can be more "universal" in human society: humans do not need to deliberately modify the environment to accommodate the robot, and wherever human hands and legs can reach, a humanoid robot can reach too.
The key to achieving this universality is a sufficiently strong general-purpose model.
For this investor, the limited software breakthroughs on show have brought on a kind of aesthetic fatigue. "What would make my eyes light up now is probably a robot that can really generalize," he said: for example, a home service robot that mops the floor and then, without any command from its owner, takes the initiative to make the bed in the next room. Tencent Technology spoke with a number of investors who follow the robot sector. They generally believe that the hardware body of a humanoid robot poses no real barrier and that software is the problem, because software determines the robot's ability to generalize. Only with strong generalization can a humanoid robot work across a variety of task scenarios and come closer to "universality".
But AGI is a beautiful ideal. Besides racing toward the horizon, the industry also has to advance step by step within current technical conditions.
Although this year's humanoid robot sector was a little disappointing for VCs, we found that, compared with previous years, it does show some new changes worth watching.
1. Humanoid robots' obsession with "expressions"?
If a humanoid robot wants to achieve true emotional companionship in the future, its "face" and "expression" will become extremely important.
Hiroshi Ishiguro, a Japanese roboticist and head of ATR's Hiroshi Ishiguro Special Research Office, believes that "as we come into contact with more and more robots, we may gradually accept lifelike robots and rely on them for our care and other needs in the future."
Wang Yuquan, founder of Hywin Capital, holds a similar view. He once told Tencent Technology that robots do not need two legs like humans, but they can have a "face" capable of human-like expressions; with that ability, robots are better suited to jobs that require communicating with people, such as greeting guests and providing companionship.
There are two schools of thought on how humanoid robots should handle "expressions": "abstract" and "bionic". The former advocates conveying emotion through abstract symbols; the latter advocates making the face as close as possible to a real person's, in the hope that, as in humans, expressions can be driven by the pull of "muscles".
At WRC 2024, we observed that alongside the mainstream "abstract" school, more "bionic" robots have begun to enter the market, and these robots can pull all kinds of faces.
A typical representative is the Chinese bionic robot company EX Robot, which brought "Li Bai" and "Du Fu" to the World Robot Conference last year and made "Su Shi" this year.
△ EX Robot bionic robot "Su Shi"
In addition, compared with last year, another company making facial expressions appeared this year: "Digital Huaxia", whose humanoid robot "Xia Lan" interacted with the audience on site:
△ Digital Huaxia Robot "Xia Lan"
Many other products, by contrast, pay little attention to how detailed the "expression" is. Some have no "face" at all, and some companies that do make a "face" opt for a generic "helmet" style.
Look closely and you will find that they all share a similar "steel" face, whether it is Boston Dynamics' electric Atlas, Musk's Optimus Gen2, and Figure 01 abroad, or Zhiyuan's new "Expedition A2", Unitree's newly announced "G1", and UBTECH's factory-bound "Walker S" in China.
△ First row, from left to right: Boston Dynamics electric Atlas, Elon Musk's Optimus Gen2, Figure 01; second row, from left to right: Zhiyuan "Expedition A2", Unitree G1, UBTECH Walker S
Almost all of these humanoid robot faces use a black glass mask with LED trim. Wang Xingxing, the founder of Unitree, said at an exchange meeting before WRC 2024, "I am very satisfied with the head design of the G1, and it will not change in the short term."
Perhaps one of the reasons is that the face itself is a screen on which any abstract symbol can appear, which is convenient for composing expressions and conveying emotions to humans.
For example, when Figure 01 or Figure 02 speaks, its face shows OpenAI's iconic symbol. This is not a real expression, but it still makes the person talking to it feel that "you are listening to me carefully".
△ Figure 02
In fact, if you look back at the first version of Boston Dynamics' "hydraulic" Atlas, the "originator" of humanoid robots, you will find that it did not even have a basic "face", let alone expressions. Its face looked rather crude: just a few slightly thick steel pipes and a device with two holes.
This may stem from a belief held by Boston Dynamics founder Marc Raibert, who once said in an interview that "ability, dexterity, perception and intelligence are the key functions of robots, and nothing else matters."
△ Boston Dynamics Hydraulic Atlas
It was not until 2021, when the hydraulic Atlas dance video "Do You Love Me" went viral, that Marc Raibert began to recognize how much "bionics" matters for humanoid robots communicating emotionally with humans. Perhaps that is why the electric Atlas of 2024 has a "helmet-like" face.
These helmet-like faces serve both aesthetics and function. Their main color, a "premium black", conveys a strong sense of technology, and the helmet-like design protects sensors and cameras from the external environment, such as dust, collisions, or other physical damage. What is more, they avoid the discomfort of the "uncanny valley" effect.
The "helmet-like" abstract school is very popular, but the bionic school has kept researching how to make robot expressions more human-like. There are two main technical routes in this field: autonomous and remotely operated. Autonomous robots generate facial expressions through machine learning and algorithms, while remotely operated robots rely on an operator's instructions to mimic facial expressions.
For example, the Creative Machines Lab at Columbia University's School of Engineering has developed a robot called Emo. Using a self-supervised learning framework, the robot can predict human facial expressions: it can anticipate a smile about 840 milliseconds before the person smiles and then smile in sync with them.
△ Paper address: https://www.science.org/doi/10.1126/scirobotics.adi4724
Earlier, to help robots better practice imitating human expressions, some scholars developed the open-source robot Eva and published a paper explaining how its expressions are driven.
△ Paper address: https://www.sciencedirect.com/science/article/pii/S2468067220300262
Eva's head consists of four parts: the mask drive mechanism, the jaw, the eyes, and the neck. According to the paper, "the mask drive mechanism employs 12 MG90S servo motors, two 3D-printed servo sets to accommodate the servo motors, a custom silicone mask, a 3D-printed skull to support the mask, and a wire that passes through a Teflon Bowden tube."
△ Eva's servo set
The wires pass through the tubes and connect to the servo motors inside the skull: "in order to produce facial expressions, a specific subset of the 12 servo motors needs to be activated, which pulls the wires and deforms the mask to simulate how the facial muscles deform the skin when they make expressions." Put simply, following instructions issued by an operator, several servo motors act in combination, pulling the steel wires so that force is transmitted to the mask and forms the robot's "expression".
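The idea of activating "a specific subset of the 12 servo motors" per expression can be sketched as a simple lookup table. This is only an illustrative sketch: the servo indices, the angles, the neutral position, and the names `EXPRESSIONS` and `servo_targets` are hypothetical and are not taken from the Eva paper; only the count of 12 servos comes from the text above.

```python
from typing import Dict, List

NUM_SERVOS = 12        # the article cites 12 MG90S servos in the mask drive
NEUTRAL_ANGLE = 90.0   # assumed rest position in degrees (hypothetical)

# Hypothetical table: each expression moves only a subset of the servos.
# Keys are servo indices (0..11); values are target angles in degrees.
EXPRESSIONS: Dict[str, Dict[int, float]] = {
    "smile": {0: 120.0, 1: 120.0, 6: 100.0},
    "frown": {2: 60.0, 3: 60.0},
}


def servo_targets(expression: str) -> List[float]:
    """Return one target angle per servo for the requested expression.

    Servos outside the expression's subset stay at the neutral angle,
    so their wires exert no extra pull on the mask.
    """
    targets = [NEUTRAL_ANGLE] * NUM_SERVOS
    for index, angle in EXPRESSIONS[expression].items():
        targets[index] = angle
    return targets


if __name__ == "__main__":
    print(servo_targets("smile"))
```

In a remotely operated setup, the operator's command would select the key in the table; an autonomous system would instead have a model choose or generate the target angles.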
At WRC 2024, Li Boyang, CEO of EX Robot, told Tencent Technology, "The face of the EX robot integrates dozens of degrees of freedom, and its expressions are driven by a system developed in-house by EX. We also developed an emotion model that lets the robot perform sentiment analysis and map the result to expressions."
Making humanoid robots infinitely close to humans seems to be an obsession, and the active "bionic" school at WRC 2024 is a manifestation of it.
2. Mass production and factory deployment: are the robots just working for data?
At this year's robot conference, not only did the number of humanoid robots increase significantly, but even some first-generation products were announced as entering mass production and ready to work in factories. They are jogging all the way to keep up with their predecessor, Tesla's Optimus.
Zhiyuan Robot, founded by Zhihuijun, expects to ship 300 units in 2024; its bipedal humanoid robot will enter mass production in October, with about 200 units expected to ship this year. UniX AI's wheeled humanoid robot Wanda is also scheduled for small-scale mass production in September, with about 100 units expected within the year. On site, UBTECH even built a factory scene to show the whole workflow of its latest humanoid robot on the factory floor, including inspecting vehicle conditions, sorting and picking products, and handling work. According to UBTECH staff, the company has partnered with automobile makers and its robots have actually started operating in factories.
△UBTECH humanoid robot on the automobile production line
But when it comes to performance, the staff admit that the robot currently reaches only 20%-30% of human efficiency and that its battery lasts just two hours. Short battery life is a common problem for humanoid robots across the industry.
Are mass production and factory deployment at this level really about industrialization? Not really.
This brings us back to the "generalization" ability that investors mentioned above, which requires a large amount of data.
How much data is needed?
For example, at the UniX AI booth, the Wanda robot, equipped with a large model, demonstrated a variety of tasks in one go, including washing clothes, folding clothes, making hamburgers, and 3D cleaning. The most impressive part was the laundry scene: after receiving a human command, Wanda found its own way to the dirty laundry and put it in the washing machine. It appeared quite capable of completing the whole task independently.
△ UniX AI's Wanda robot performing a laundry task in the exhibition hall
But this generalization is limited.
Yang Fengyu, founder of UniX AI, told Tencent Technology that task-level generalization, as in the laundry task, is what current data and training can achieve. Having a single model actively identify and complete multiple different tasks will take more time.
Throughout the conference, we saw a variety of robots that could perform individual tasks, such as UBTECH's Walker S, which sorts items, and Stardust Intelligence's S1, which writes beautiful calligraphy. However, there were basically no robots that could truly demonstrate generalization across multiple tasks.
△ Stardust Intelligence's S1 robot is writing
They performed very different and very limited jobs at fixed booths, and even the process looked heavily scripted. For a moment it felt like a return to the era of programmed robots, before large models arrived.
In an interview during the conference, Professor Wolfram Burgard, a participant in the RT-X project, argued that the current way of training foundation models has a huge efficiency problem: it requires too much computing power and data to reach the threshold of generalization.
He gave the example of the RT-X dataset project: although it collected more than 1 million episodes covering more than 500 robot skills and 160,000 specific tasks, RT-2 may fail to perform a task correctly when the table height is only slightly different.
△Example of data in the RT-X dataset
This means we may still be at least half an Internet's worth of data away from the "ChatGPT moment" of truly generalized embodied intelligence.
In this race toward "generalization", then, the companies that are first to obtain data in bulk can take the lead. Acquiring effective data has therefore become the hottest battlefield for many robot companies behind the scenes.
At Zhiyuan Robot's press conference, Zhihuijun announced the company's data collection plan. By the end of September, they expect to build a data collection factory with about 100 robots, staffed by 150 workers, and then move into "mass production" of data. The goal is 1,000 pieces of data per worker per day; the current rate is 600 pieces per day. Those 100 robots already amount to one third of the units the company expects to "mass-produce". Of course, the investment brings a return: according to the figures they give, this data factory can collect data on the same order of magnitude as the RT-X dataset in 10 days.
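The "same magnitude as RT-X in 10 days" claim can be checked with back-of-the-envelope arithmetic. The sketch below uses only figures quoted in this article (150 workers, 600 or 1,000 pieces per worker per day, and RT-X at "more than 1 million" pieces, taken here as a lower bound of 1,000,000). It assumes every worker hits the stated rate every day, which is an idealization, and the function names are illustrative.

```python
WORKERS = 150            # workers staffing the planned data factory
RTX_PIECES = 1_000_000   # RT-X size, lower bound ("more than 1 million")


def pieces_collected(per_worker_per_day: int, days: int,
                     workers: int = WORKERS) -> int:
    """Total pieces collected if every worker hits the daily rate."""
    return per_worker_per_day * workers * days


def days_to_match(per_worker_per_day: int, target: int = RTX_PIECES,
                  workers: int = WORKERS) -> int:
    """Whole days needed to reach the target (ceiling division)."""
    daily_total = per_worker_per_day * workers
    return -(-target // daily_total)


if __name__ == "__main__":
    for rate in (600, 1000):  # current rate, then target rate
        print(rate, pieces_collected(rate, days=10), days_to_match(rate))
```

At the target rate, 10 days yields 1.5 million pieces; at the current rate, 900,000. Both are on the order of the RT-X figure, which is consistent with the claim as stated.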
△ The data factory situation displayed by Zhiyuan at the robot conference
UniX AI and Stardust Intelligence, both rising stars in embodied intelligence, also emphasize their investment in data collection. Yang Fengyu, the founder of UniX AI, mentioned that they already train their robots with data obtained through "new collection methods" such as simulation training in virtual environments and the collection and analysis of video.
However, according to Zhiyuan, data collected on real machines is currently very expensive. Even with large-scale production, it costs 0.4 yuan per piece. Even simulation data acquired in a simulated environment costs 60%-70% as much as real data.
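To put these unit costs in perspective, here is a rough calculation based only on the figures quoted above: 0.4 yuan per real piece, and simulation at 60%-70% of that. The 1,000,000-piece dataset size is borrowed from the RT-X lower bound mentioned earlier purely as an illustrative scale, and the function names are hypothetical.

```python
from typing import Tuple

REAL_COST_YUAN = 0.4               # per piece of real-machine data (quoted)
SIM_COST_RATIO = (0.6, 0.7)        # simulation cost as a share of real cost


def dataset_cost(pieces: int, unit_cost: float = REAL_COST_YUAN) -> float:
    """Total cost in yuan of collecting `pieces` at a given unit cost."""
    return pieces * unit_cost


def sim_unit_cost_range() -> Tuple[float, float]:
    """Low and high per-piece cost of simulated data, in yuan."""
    low, high = SIM_COST_RATIO
    return REAL_COST_YUAN * low, REAL_COST_YUAN * high


if __name__ == "__main__":
    pieces = 1_000_000  # illustrative scale only
    low, high = sim_unit_cost_range()
    print(f"real: {dataset_cost(pieces):,.0f} yuan")
    print(f"simulated: {dataset_cost(pieces, low):,.0f}"
          f" to {dataset_cost(pieces, high):,.0f} yuan")
```

On these assumptions, a million real pieces would cost about 400,000 yuan, and simulation would bring that down only to roughly 240,000 to 280,000 yuan, which is why cheaper collection channels matter.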
So how can data be collected better and more cheaply? Working in a factory may be an option that benefits both sides. The robot company gets a real-world setting in which to collect data on practical work, and may save the associated labor costs; the host enterprise gains experience in exploring intelligent manufacturing, along with an extra publicity angle.
The real-economy enterprises that have absorbed a considerable share of the "mass-produced" humanoid robots are, at this stage, just another data factory for those robots.
3. Humanoid robots also take the "people-friendly" route: whatever does not fit can be taken apart
"Mass production" has always been an industrial challenge for humanoid robots, mainly because the specifications of key components are not unified, parameter requirements vary widely, and standardization is hard to achieve. Wang Xingxing, the founder of Unitree, also told Tencent Technology before WRC 2024, "Each company has different ideas for doing embodied intelligence, such as how the sensing data of the robot should be collected, whether the tactile sensor should be used, and how many fingers the end effector should have, all of which are not uniform."
Although the industry is still exploring its route, many companies are in fact building humanoid robots with "modular" thinking: the humanoid robot is like a "big toy" whose arms, manipulators, and chassis can be detached and installed. At this year's WRC 2024, the modular design path was on direct display. "The degree of standardization of software and hardware in the humanoid robot sector is low, and modularizing parts and components is actually some companies' first attempt at standardization," an industry insider who attended WRC 2024 told Tencent Technology.
Robot companies are trying to modularize the main parts, focusing on the upper arms, dexterous hands, and feet. For example, the base of Xingdong Era's Star1 robot can be detached and swapped, fitted with either two feet or wheels. "If necessary, you can also use the chassis and only keep the upper body," said a Xingdong Era staff member.
△ Star1
Zhiyuan Robot's "Lingxi X1" robot focuses on open source, and its core components, such as motors and grippers, can be disassembled and reassembled.
△ Zhiyuan "Lingxi" series
However, replacing the end effector depends on the body's ability to control different types of grippers. For example, a body that easily operates two-finger and three-finger grippers may not be able to control a five-finger dexterous hand: "the control capabilities involved are not on the same level".
After WRC 2024, many people are skeptical of and disappointed with the applications of humanoid robots, believing they cannot even match traditional robotic arms in real scenarios, or feeling that "humanoids are not as good as dogs". But technological progress is gradual, the generalization and intelligence of robots cannot be achieved overnight, and many "intermediate" product types may emerge along the way.
As these "intermediate" products develop, some situations may deviate from a normal growth trajectory. Sending immature humanoid robots into factories to "work" is like asking a toddler to run the 100-meter sprint: it looks a bit like pulling up seedlings to make them grow faster, and the results may not be pretty.
But humanoid robots need to be "taken out for a walk" and put to the test; only by walking among humans and perceiving the world can they truly serve humans.