
Three Paths to Embodied Intelligent Robots

Robot industry caprice 2024/07/01 17:56

More and more robot start-ups now say in their PR that they are going "embodied", that is, building embodied intelligent robots (to put it roughly). In an article last year, "Analysis of the Historical Evolution and Industrial Chain of Humanoid Robots", I pointed out that "embodied intelligence" is not a new concept: it appeared many years ago, and it only became widely known and a hot spot for entrepreneurship because artificial intelligence became popular (again) last year.

How long will it take for embodied intelligence to actually reach commercial deployment? This is a controversial question. Some investors and academics have told me it will take at least 5 to 10 years for a solid company to emerge. Some industry observers believe its success or failure hinges on whether Tesla's Optimus humanoid robot succeeds.

In fact, however long it takes for embodied intelligent robots to reach the market, there are currently three development paths: a software-only path; general-purpose humanoid robots that combine software and hardware; and general solutions for specific (vertical) scenarios that combine software and hardware.


Software-only path: A common robot model across hardware

At the heart of this path is the development of a Cross-Embodiment Foundation Model (CEF) that runs seamlessly across hardware platforms. The CEF model is designed to overcome a common limitation of traditional robotics development: each hardware platform usually needs its own independent software development process, which costs a great deal of time and money and makes rapid iteration difficult.

In the traditional robot model, the three elements of perception, planning, and action interact around a common goal. Traditional robot models are usually modular: each component is relatively independent, and humans must integrate and coordinate them. CEF proposes a new approach that merges perception, planning, and action into a single end-to-end model. This approach borrows from large language models (LLMs) such as GPT-4 and applies them to embodied intelligence, specifically robotics. In this way, researchers aim to give robotic systems strong language understanding and generation abilities, enabling higher levels of cognition and behavior.

To achieve this, such models may provide APIs that let developers easily call the model and integrate its capabilities into their robots. The aim is to help robots better understand and respond to complex real-world situations, while lowering the barrier to entry for developers and encouraging technological innovation and applications.

Technological superiority and innovation

The CEF model lets developers write software once and deploy it on multiple hardware platforms, whether a sophisticated humanoid robot, an efficient wheeled robot, or an agile drone, all sharing the same software architecture. This cross-hardware commonality greatly simplifies software development and maintenance, reduces duplicated effort, and lets developers concentrate on innovative features instead of tedious hardware compatibility issues.
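
To make "write once, run on many platforms" concrete, here is a minimal Python sketch, assuming a hypothetical design in which each hardware platform implements a small adapter (`Embodiment` with `observe` and `apply`) and one shared policy serves all of them. These names are illustrative and are not the API of any real CEF; the toy `shared_policy` merely stands in for a learned model.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass
class Observation:
    """Embodiment-neutral observation: a task instruction plus a flat state vector."""
    instruction: str
    state: Sequence[float]


class Embodiment(Protocol):
    """Hypothetical adapter that each hardware platform implements once."""
    name: str

    def observe(self) -> Observation: ...

    def apply(self, action: Sequence[float]) -> None: ...


def shared_policy(obs: Observation) -> List[float]:
    """Stand-in for the single cross-embodiment model: a toy proportional
    controller that moves every state dimension halfway toward zero."""
    return [-0.5 * x for x in obs.state]


class WheeledRobot:
    name = "wheeled"

    def __init__(self) -> None:
        self.pose = [1.0, -2.0]  # x, y in meters

    def observe(self) -> Observation:
        return Observation("go to the origin", list(self.pose))

    def apply(self, action: Sequence[float]) -> None:
        self.pose = [p + a for p, a in zip(self.pose, action)]


class Drone:
    name = "drone"

    def __init__(self) -> None:
        self.position = [0.5, 0.5, 3.0]  # x, y, altitude in meters

    def observe(self) -> Observation:
        return Observation("go to the origin", list(self.position))

    def apply(self, action: Sequence[float]) -> None:
        self.position = [p + a for p, a in zip(self.position, action)]


def run(robot: Embodiment, steps: int = 10) -> None:
    """The same control loop and the same policy serve every platform."""
    for _ in range(steps):
        robot.apply(shared_policy(robot.observe()))


for robot in (WheeledRobot(), Drone()):
    run(robot)
    print(robot.name, robot.observe().state)
```

Only the adapters are hardware-specific; supporting a new platform means writing one more adapter rather than a new software stack, which is where the claimed savings would come from.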

Cost-effectiveness and resource optimization

Ideally, the CEF model can significantly reduce the overall cost of a robotics project. By eliminating the need to develop separate software for each hardware platform, developers can save a lot of time and money, especially during prototype testing and mass production. In addition, the portability of the CEF model means that when new technologies emerge, robots can easily integrate them without having to replace the entire hardware system, maximizing the use of resources and achieving a long-term return on investment.


Innovation and iterative acceleration

Most excitingly, the CEF model provides an accelerator for innovation and iteration in robotics. Developers can continuously optimize and upgrade their software models to respond quickly to market needs and technological advancements without waiting for hardware to evolve. This software-driven innovation model enables robots to learn new skills faster, adapt to new environments, and ultimately push the entire robotics industry to a higher level.

However, while the software-only path has great potential, its prospects are bright but the road is long.

First, to train an effective embodied intelligence model, a large amount of high-quality data is required. Obtaining this data is never easy, especially when privacy protection and data security are involved.

Second, CEF as a whole is still at an early stage, and its scaling law has not yet been fully verified. Researchers still need to confirm whether this approach is feasible and how far its capabilities can go.

Some explanation is in order here.

A scaling law is an empirical regularity in artificial intelligence describing how a model's performance tends to improve, often predictably, as the model grows, usually measured by the number of parameters (and, in many studies, the amount of training data and compute). Here, the question is whether a Cross-Embodiment model shows better performance or capability as it scales up.
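
Scaling laws are commonly summarized as a power law, loss ~ a * N^(-alpha), where N is model size. The Python sketch below assumes that functional form and estimates a and alpha with an ordinary least-squares line on log-log axes. The data points are synthetic, generated from a known curve for illustration; they are not measurements of any cross-embodiment model.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], losses: Sequence[float]) -> Tuple[float, float]:
    """Fit loss ~ a * size**(-alpha) by least squares on log-log axes.

    Taking logs turns the power law into a straight line,
    log(loss) = log(a) - alpha * log(size), so a simple linear
    regression recovers both constants. Returns (a, alpha).
    """
    if len(sizes) != len(losses) or len(sizes) < 2:
        raise ValueError("need at least two (size, loss) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("model sizes must not all be equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic points drawn from an exact curve (a = 5, alpha = 0.1), for illustration only.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.1 for n in sizes]
print(fit_power_law(sizes, losses))  # approximately (5.0, 0.1)
```

A positive alpha that stays stable as models grow is what "the scaling law holds" would look like in practice; for cross-embodiment models, gathering enough real (size, loss) pairs to run such a fit is precisely the unverified part.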

In practice, to achieve Cross-Embodiment, the following issues may need to be addressed:

Reproducing the scaling law: researchers need to show that Cross-Embodiment models improve consistently with scale across different hardware platforms, as has been observed in other AI fields such as computer vision and natural language processing. This requires extensive experiments and data analysis to confirm whether a similar scaling law exists and to determine where it applies.

Capability emergence: another question is how far capabilities emerge in Cross-Embodiment models. Emergence refers to a model exhibiting behaviors or capabilities beyond expectations after training, which may stem from complex interactions and nonlinearities within the model. Evaluating emergence in Cross-Embodiment models requires extensive experiments and tests to determine the level of intelligence they show across different tasks and environments.
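
The first issue above, checking that a scaling trend holds on every platform rather than only on average, can be turned into a simple test. This Python sketch assumes task success rate as the metric and "never gets worse as the model grows" as the criterion; both are illustrative choices, and the numbers in the example are made up.

```python
from typing import Dict, List, Tuple

# results[platform] = list of (model_size, task_success_rate) pairs
Results = Dict[str, List[Tuple[float, float]]]


def improves_with_scale(points: List[Tuple[float, float]], tolerance: float = 0.0) -> bool:
    """True if the success rate never drops (beyond tolerance) as model size grows."""
    ordered = sorted(points)  # sort by model size
    return all(
        later[1] >= earlier[1] - tolerance
        for earlier, later in zip(ordered, ordered[1:])
    )


def platforms_breaking_trend(results: Results, tolerance: float = 0.0) -> List[str]:
    """Return the platforms on which the scaling trend does not hold."""
    return sorted(p for p, pts in results.items() if not improves_with_scale(pts, tolerance))


# Hypothetical results: the drone dips at the middle model size.
results: Results = {
    "humanoid": [(1e8, 0.20), (1e9, 0.35), (1e10, 0.50)],
    "drone": [(1e8, 0.40), (1e9, 0.38), (1e10, 0.55)],
}
print(platforms_breaking_trend(results))
```

A monotonicity check is deliberately crude; a real study would also account for run-to-run variance before concluding that a dip is meaningful, which is what the `tolerance` parameter loosely stands in for.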

A general-purpose humanoid robot that combines software and hardware

The second path for embodied intelligent robots focuses on developing general-purpose humanoid robots that can adapt to many scenarios. These robots are highly autonomous, intelligent entities whose core feature is a powerful, flexible AI system capable of performing a range of complex tasks. Such all-round capability would make general-purpose humanoid robots a part of everyday life, just like cars and washing machines.

This is really the ultimate form humanity has dreamed of for robots, so I won't say much about it and will make only three points:

1. Apart from the technical and engineering difficulty, public skepticism about general-purpose humanoid robots centers on necessity: is it really necessary to build humanoid robots? If non-humanoid devices (e.g., hair dryers, floor scrubbers, toilets) can help people accomplish certain tasks, why insist on the human form?

This is essentially what I call the "tool route" versus "ideal route" debate. For the reasons for, and potential necessity of, developing humanoid robots, please see my earlier piece "10,000-word long text|The value logic of humanoid robots"; I won't repeat them here. It should be acknowledged that thinking about robot design and manufacturing from the perspective of tools and practicality always makes sense.

2. However, I have always believed that robots, being essentially anthropomorphic agents or semi-agents, will inevitably be deeply coupled with human society in the future. Their significance lies not only in helping people complete specific tasks but in reshaping human society by extending the radius of human capabilities. Viewing robots purely as tools is short-sighted.

3. At present, the best hope for realizing general-purpose humanoid robots lies in consumer (C-end) products, even though the only consumer robot that has succeeded so far is the robot vacuum. Whoever can pick a specific scenario out of complex, messy consumer needs and build a product that actually sells may gain a slight advantage.


General solutions for specific scenarios that combine software and hardware

The third commercialization path for embodied intelligence focuses on industry-specific general robotics solutions, i.e., general-purpose robotics in vertical domains. At the heart of this strategy is designing and manufacturing highly specialized, customized robots based on a deep understanding of a specific field or industry and its unique needs. Such targeted robots not only address industry-specific challenges precisely but can also deliver major efficiency gains and cost savings through superior performance, producing significant commercial results.

Note that, unlike the purely tool-type robots mentioned earlier, these are still embodied intelligent, general-purpose robots; they simply focus on specific scenarios. The two are clearly different.

Let's take two scenarios as examples.

Healthcare sector

At present, the application of robots in the medical field is still fragmented: delivery robots deliver medical supplies, surgical robots assist with surgeries, companion robots help with patient care, and guide robots tell patients how to get to the right department.

In the future, we may see a scenario like this: a robot with a human-like face reassures a patient, tells the patient's family what to do next, and then delivers the patient's medication and records to the room the doctor designated. On the way, it takes the elevator by itself and shows the way to a few people looking for their doctor. At night, it works alongside a surgical robot in the operating room to help doctors and nurses complete an operation.

However, if you take it to the supermarket without adjustment, it may only be able to use part of its capabilities.

Logistics industry

The logistics industry is another area that may benefit from general-purpose robotic solutions for market segments. Automated robots in warehousing and distribution have already significantly improved the efficiency and accuracy of the logistics chain. They can autonomously sort, move, and store goods, markedly reducing human error and speeding up cargo turnover. For example, Amazon's Kiva robot system uses intelligent scheduling to locate and retrieve goods in the warehouse quickly, greatly optimizing logistics operations, shortening order processing cycles, and giving consumers faster, more reliable delivery.

In the future, robots in warehouses may be able to wear many hats: patrolling, picking up parcels, collecting some data on factory operations, etc.

In general, the implementation path of embodied intelligent robots is a process from pure software to hardware integration, and then to industry-specific applications. In this process, the maturity of the technology, the market demand, and the ability to apply across fields are key factors.

Whichever of the three paths it serves, the underlying general-purpose foundation model is where the difficulty deserves the most attention in embodied intelligent robotics. Let me summarize the challenges based on the discussion above.

Put simply, the underlying general-purpose model is like a smart brain that can process and understand large amounts of information. However, when the model leaves the lab and enters the real world, its first challenge is adapting to a wide variety of environments. The real world is far more complex than the training data we give the model. This is the problem of generalization: the model must understand and handle situations it has never seen. At the same time, real-world tasks and environments keep changing, so the model must adapt quickly to these changes. This requires flexibility and the ability to adjust its behavior to new environments.
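
A common way to make generalization measurable is to hold out environments the model never trained on and compare success rates. The Python sketch below assumes a hypothetical `policy` callable that reports whether an episode in a given environment succeeded; the environment names echo the hospital and supermarket example above and are illustrative only.

```python
from typing import Callable, Hashable, Iterable

Policy = Callable[[Hashable], bool]  # hypothetical: returns True if the episode succeeded


def success_rate(policy: Policy, environments: Iterable[Hashable]) -> float:
    """Fraction of environments in which the policy completes its task."""
    envs = list(environments)
    if not envs:
        raise ValueError("need at least one environment")
    return sum(1 for env in envs if policy(env)) / len(envs)


def generalization_gap(policy: Policy, seen: Iterable[Hashable], unseen: Iterable[Hashable]) -> float:
    """Success on training-like environments minus success on held-out ones.

    A large positive gap means the model mostly works where it has been before.
    """
    return success_rate(policy, seen) - success_rate(policy, unseen)


# Toy stand-in for a trained model: it only succeeds in places it "knows".
known = {"ward_a", "ward_b", "lobby"}
gap = generalization_gap(lambda env: env in known,
                         seen=["ward_a", "ward_b"],
                         unseen=["lobby", "supermarket"])
print(gap)
```

Here the toy policy succeeds in both seen environments but only one of the two unseen ones, so the gap is 0.5; the same bookkeeping applies whatever the real model is.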

In addition, the underlying general-purpose model usually requires significant computing resources, such as high-performance GPUs. Not only does this increase costs, but it can also limit the use of the model in resource-constrained environments.
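
A back-of-the-envelope calculation shows why compute is a bottleneck: storing a model's weights alone takes roughly (number of parameters) x (bytes per parameter). The 7-billion-parameter size below is a hypothetical example, not a figure for any particular robot model; lower-precision formats such as int8 or int4 (quantization) are one common way to shrink the footprint for on-board hardware.

```python
# Bytes per parameter for common numeric formats (int4 packs two values per byte).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Memory needed just to hold the model weights, in gigabytes (1 GB = 10**9 bytes).

    Activations, caches, and runtime overhead come on top of this figure.
    """
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "fp16", "int8", "int4"):
    print(f"7B parameters in {dtype}: {weight_memory_gb(7e9, dtype):.1f} GB")
```

This prints 28.0, 14.0, 7.0, and 3.5 GB respectively, which makes the trade-off between precision and deployability on resource-constrained robots easy to see.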

Then there is interpretability. In some cases we need to know how the model makes decisions. In medical diagnosis or self-driving cars, for example, the decision-making process should be as transparent as possible so that we can trust it. Admittedly, this is more an ideal than a current reality.

Compatibility issues also need to be considered when integrating the model into a robotic system. The model needs to work seamlessly with the robot's hardware and software, which can involve complex interface design and communication protocols.
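
One small piece of this integration work is agreeing on a message format between the model and the robot's controller and enforcing the hardware's limits at that boundary. The Python sketch below assumes JSON over some transport and a hypothetical `JointCommand` message; real systems often use ROS messages, protobuf, or vendor SDKs instead.

```python
import json
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class JointCommand:
    """Hypothetical message sent from the model to the robot's low-level controller."""
    joint_positions: List[float]  # target angle per joint, in radians
    timestamp_ms: int             # when the model produced the command


def encode(cmd: JointCommand, lower: Sequence[float], upper: Sequence[float]) -> str:
    """Clamp targets to this robot's joint limits, then serialize to JSON."""
    if not (len(cmd.joint_positions) == len(lower) == len(upper)):
        raise ValueError("command does not match this robot's joint count")
    clamped = [min(max(q, lo), hi) for q, lo, hi in zip(cmd.joint_positions, lower, upper)]
    return json.dumps({"joint_positions": clamped, "timestamp_ms": cmd.timestamp_ms})


def decode(payload: str) -> JointCommand:
    """Parse a JSON payload back into a JointCommand on the controller side."""
    data = json.loads(payload)
    return JointCommand(
        joint_positions=[float(q) for q in data["joint_positions"]],
        timestamp_ms=int(data["timestamp_ms"]),
    )
```

Checking joint count and limits at the boundary means a model trained on one body cannot silently send out-of-range targets to another, which is one concrete form the compatibility problem takes.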

Real-time performance is also a key factor. In some applications, robots must react quickly; if model inference is not fast enough, the robot may fail to meet these real-time requirements.
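
The real-time requirement can be expressed as a latency budget per control cycle: at 50 Hz, for example, each cycle has 1/50 s = 20 ms. The Python sketch below assumes a hypothetical `infer` callable and a simplistic fallback of reusing the previous action when inference overruns; a real system might instead command a safe stop.

```python
import time
from typing import Any, Callable, List, Sequence, Tuple


def control_step(
    infer: Callable[[Any], Sequence[float]],
    observation: Any,
    last_action: Sequence[float],
    period_s: float,
) -> Tuple[List[float], bool]:
    """Run one control cycle with a latency budget of period_s seconds.

    Returns (action, on_time). If inference overran the budget, its result
    is treated as stale and the previous action is reused instead.
    Note: this only detects the overrun after the fact; a real controller
    would run inference asynchronously so the loop itself never blocks.
    """
    start = time.monotonic()
    action = infer(observation)
    elapsed = time.monotonic() - start
    if elapsed > period_s:
        return list(last_action), False
    return list(action), True


PERIOD_S = 1 / 50  # a 50 Hz control loop leaves 20 ms per cycle
```

`time.monotonic` is used rather than wall-clock time so that clock adjustments cannot distort the measured latency.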

We must also consider robustness. In the real world, sensor data is subject to all kinds of disturbances, and the model needs to handle such noise and anomalies while maintaining stable performance.
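
One classic, simple defense against spike-type sensor anomalies is a median filter applied before the data reaches the model. The Python sketch below is a generic illustration, not a claim about how any particular robot handles noise; the window size of 3 and the distance readings are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def median_filter(readings: Sequence[float], window: int = 3) -> List[float]:
    """Replace each reading with the median of its neighborhood.

    A single spike is outvoted by its neighbors, while steady values pass
    through almost unchanged. The window shrinks at the ends of the sequence.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    filtered = []
    for i in range(len(readings)):
        neighborhood = readings[max(0, i - half): i + half + 1]
        filtered.append(median(neighborhood))
    return filtered


# Hypothetical distance readings (meters) with one spurious spike at index 2.
print(median_filter([1.0, 1.1, 9.0, 1.2, 1.1]))
```

The 9.0 m spike is replaced by 1.2, a neighboring value, so a single glitch no longer looks like a sudden obstacle or free space to the downstream model.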

This article is from Xinzhi self-media and does not represent the views and positions of Business Xinzhi. If there is any suspicion of infringement, please contact the administrator of the Business News Platform. Contact: system@shangyexinzhi.com