Tencent's robot dog makes the cover of a Nature sub-journal: as agile as a real dog, and able to play orienteering
Tencent's robot dog has made the cover of a Nature sub-journal!
Under a new control framework, the robot dog's movements are becoming more and more like those of real dogs.
Here, the two robot dogs are playing "orienteering", the kind that involves a chase.
In the game, two robot dogs play the role of a chaser and an escapee, and the escapee needs to reach the designated location without being caught.
Once the escapee reaches the designated location, the two robot dogs swap roles, and play continues in this way until one of them is caught.
One difficulty of the game is that there is a maximum speed limit, so neither robot dog can win on speed alone; each has to plan a strategy.
There are even more difficult obstacle courses, with more intense battles and more exciting scenes.
Behind this robot cross-country contest is the new control framework.
The framework takes a hierarchical approach and uses generative models to learn how animals move, with training data captured from a Labrador.
With this approach, the robot dog no longer relies on physical models or hand-designed reward functions, and it can understand and adapt to more environments and tasks, as animals do.
The robot dog, named MAX, weighs 14 kg and has 3 actuators on each leg, which provide an average continuous torque of 22 N·m and a maximum of 30 N·m.
One of the highlights of MAX is that it mimics dogs in the real world.
In the indoor environment, MAX broke free from the researcher and began to run freely.
If you put MAX outside, it can also run and play happily on the grass.
This imitation is even more vivid when encountering complex terrain with obstacles.
Going up, MAX climbs stairs with agility and speed.
Going down, it can also crawl under obstacles without touching the bar in front of it at all.
Behind this series of actions are the strategies that MAX's control system learns from the movements of a Labrador.
By imitating a real dog, MAX can also plan more advanced strategies and accomplish more complex tasks; the chase game shown above is a good example.
It is worth mentioning that, in addition to having the two robot dogs compete against each other, the researchers also joined the game themselves, steering a robot dog with a controller.
It is not hard to see from the picture that the human-controlled robot dog (No. 1 in the picture below) is not as agile as the fully autonomous one (No. 2).
In the end, even with an unfair advantage (the human-controlled robot dog had a higher maximum speed limit), the humans still lost outright to the machine, 0:2.
Besides letting the robot dog move flexibly, the framework's biggest advantage is its versatility: it can be pre-trained, and its knowledge reused, across different task scenarios and robot forms.
In the future, the team also plans to transfer the system to humanoid robots and to multi-agent collaboration.
So, how did the researchers at Robotics X Lab come up with this solution?
The core idea behind the design of the control framework was to mimic the movements, perceptions, and strategies of real animals.
By building primitive-level, environment-level, and strategy-level knowledge that can be pre-trained, reused, and extended, the framework lets robots understand and adapt to their environments and tasks from a broader perspective, as animals do.
In terms of implementation, the framework adopts hierarchical control with three levels: the primitive motion controller (PMC), the environmental adaptation controller (EPMC), and the strategy controller (SEPMC), which correspond to primitive, environmental, and strategic knowledge, respectively.
First, a human issues a high-level command (for example, telling the robot the rules and objective of the chase game); this is the only point in the whole process where a human needs to be involved.
This high-level command is received by the SEPMC, which devises a strategy based on the current situation (e.g., the robot's role, the opponent's location) and then generates navigation commands that include information such as direction of movement and speed.
The navigation commands are passed to the EPMC, which combines them with environmental perception (e.g., terrain heightmap, depth information) to choose a suitable motion mode: it forms a categorical distribution and selects the appropriate discrete latent representation.
Finally, the PMC combines this latent representation with the robot's current state (such as joint positions and velocities) to produce the motor control signals, which are sent to the robot for execution.
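The data flow through the three levels can be sketched in a few lines of Python. This is only an illustration of how the levels hand information to one another: in the real system each level is a trained neural network, whereas the function bodies, the `NavCommand` type, the four-entry latent space, and every number below are hypothetical placeholders.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class NavCommand:
    heading: float  # desired direction of travel, in radians
    speed: float    # desired speed, in m/s


def sepmc(role: str, self_pos: Point, opponent_pos: Point,
          goal_pos: Point, max_speed: float) -> NavCommand:
    """Strategy level (placeholder rule): a chaser heads for the opponent,
    an escapee heads for the goal."""
    target = opponent_pos if role == "chaser" else goal_pos
    dx, dy = target[0] - self_pos[0], target[1] - self_pos[1]
    return NavCommand(heading=math.atan2(dy, dx), speed=max_speed)


def epmc(cmd: NavCommand, heightmap: List[float], num_codes: int = 4) -> List[float]:
    """Environment level (placeholder scoring): turn the command and terrain
    into a categorical distribution over discrete latent codes."""
    roughness = max(heightmap) - min(heightmap)
    logits = [-abs(k - (cmd.speed + roughness)) for k in range(num_codes)]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


def pmc(code: int, joint_pos: List[float], joint_vel: List[float]) -> List[float]:
    """Primitive level (placeholder decoder): map the chosen code and the
    robot state to one target per joint."""
    return [q + 0.1 * code - 0.01 * v for q, v in zip(joint_pos, joint_vel)]


def control_step(role, self_pos, opponent_pos, goal_pos, heightmap,
                 joint_pos, joint_vel, max_speed=1.0):
    cmd = sepmc(role, self_pos, opponent_pos, goal_pos, max_speed)
    probs = epmc(cmd, heightmap)
    code = max(range(len(probs)), key=lambda k: probs[k])  # greedy pick for the sketch
    return cmd, probs, code, pmc(code, joint_pos, joint_vel)


# 12 joints: 3 actuators on each of 4 legs.
cmd, probs, code, targets = control_step(
    "chaser", (0.0, 0.0), (1.0, 1.0), (3.0, 0.0),
    heightmap=[0.0, 0.0, 0.0], joint_pos=[0.0] * 12, joint_vel=[0.0] * 12)
```

The point of the layering is that each level only sees the output of the level above it plus its own inputs, which is what allows the lower levels to be pre-trained once and reused.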
The order of training is reversed – starting with the PMC and ending with the SEPMC.
The first stage, PMC training (also known as primitive-level training), builds up basic locomotion ability.
The training data for this stage comes from motion capture of a well-trained, medium-sized Labrador.
By instructing the dog to complete various movements, the authors collected about half an hour of motion sequences of different gaits (such as walking, running, jumping, sitting, etc.), sampled at a frequency of 120 frames per second.
The dog follows different path trajectories such as straight lines, squares, and circles during the capture process. In addition, the authors specifically collected data on the movement of climbing and descending stairs for about 9 minutes.
To bridge the differences in skeletal structure between animal and robot, the authors used inverse kinematics to retarget the dog's joint motion data onto the robot's joints.
Through further manual adjustment, the reference motion data compatible with the quadruped robot was finally obtained.
△ Illustrative image; it does not represent the source of the training data
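To give a feel for the retargeting step described above, here is a minimal sketch that treats one leg as a planar two-link chain. It is a simplification and not the authors' pipeline: the real robot has 3 actuators per leg, and the link lengths, the scaling rule, and all function names are assumptions made for illustration.

```python
import math
from typing import Tuple


def two_link_ik(x: float, y: float, l1: float, l2: float) -> Tuple[float, float]:
    """Analytic inverse kinematics for a planar two-link leg.
    Returns (hip, knee) angles in radians that place the foot at (x, y)."""
    cos_knee = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    cos_knee = max(-1.0, min(1.0, cos_knee))  # clamp targets that are out of reach
    knee = math.acos(cos_knee)
    hip = math.atan2(y, x) - math.atan2(l2 * math.sin(knee), l1 + l2 * math.cos(knee))
    return hip, knee


def two_link_fk(hip: float, knee: float, l1: float, l2: float) -> Tuple[float, float]:
    """Forward kinematics, used here to check the IK solution."""
    return (l1 * math.cos(hip) + l2 * math.cos(hip + knee),
            l1 * math.sin(hip) + l2 * math.sin(hip + knee))


def retarget_foot(dog_foot: Tuple[float, float], dog_leg_len: float,
                  l1: float, l2: float) -> Tuple[float, float]:
    """Scale a captured foot position by the ratio of leg lengths (an assumed
    rule), then solve IK on the robot leg."""
    scale = (l1 + l2) / dog_leg_len
    return two_link_ik(dog_foot[0] * scale, dog_foot[1] * scale, l1, l2)


# Hypothetical numbers: a 0.5 m dog leg mapped onto a robot leg with two 0.2 m links.
hip, knee = retarget_foot((0.1, -0.4), dog_leg_len=0.5, l1=0.2, l2=0.2)
foot = two_link_fk(hip, knee, 0.2, 0.2)
```

Running the forward kinematics on the result recovers the scaled foot position, which is a quick sanity check that the joint angles reproduce the captured pose.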
Based on these data, the authors used the encoder of a VQ-VAE, a generative model, to compress and represent the animal's motion patterns and so construct the PMC's discrete latent space.
Through vector quantization, the continuous latent representations are discretized into a predefined set of discrete embedding vectors, and the decoder generates specific motion control signals from the selected discrete embedding and the current robot state.
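The quantization step itself is simple: the encoder's continuous output is replaced by the nearest vector in the codebook. The sketch below shows only that lookup, with a made-up two-dimensional codebook; the training-time parts of a VQ-VAE (the codebook and commitment losses and the straight-through gradient) are left out.

```python
from typing import List, Sequence, Tuple


def quantize(z: Sequence[float],
             codebook: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return the index and value of the codebook entry closest to z
    (squared Euclidean distance)."""
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda k: sqdist(z, codebook[k]))
    return index, list(codebook[index])


# Hypothetical codebook with four 2-D embeddings.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
index, embedding = quantize([0.9, 0.2], codebook)  # nearest entry is [1.0, 0.0]
```

Because the downstream controller only ever has to pick an index, the higher levels can express their choice as a distribution over a finite set of motion primitives.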
On the basis of VQ-VAE, the training goal of PMC is to minimize the deviation between the generated motion trajectory and the reference trajectory.
At the same time, the authors introduced a prioritized sampling mechanism that dynamically adjusts the weights of different motion modes during training according to their difficulty, ensuring that the network fits all of the reference data well.
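One simple way to realize this kind of difficulty-based sampling is to draw each reference clip with probability proportional to its current tracking error, so that poorly fitted motions are practised more often. The article does not give the authors' exact rule; the proportional weighting, the `floor` parameter, and the clip names below are assumptions.

```python
import random
from typing import Dict


def sampling_probabilities(tracking_error: Dict[str, float],
                           floor: float = 0.05) -> Dict[str, float]:
    """Probability of picking each clip, proportional to its error.
    The floor keeps well-learned clips from disappearing entirely."""
    clipped = {name: max(err, floor) for name, err in tracking_error.items()}
    total = sum(clipped.values())
    return {name: err / total for name, err in clipped.items()}


def sample_clip(probabilities: Dict[str, float], rng: random.Random) -> str:
    names = list(probabilities)
    weights = [probabilities[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical per-clip tracking errors measured during training.
errors = {"walk": 0.1, "run": 0.3, "jump": 0.6}
probs = sampling_probabilities(errors)
next_clip = sample_clip(probs, random.Random(0))
```

Recomputing the probabilities as the errors change is what makes the weighting dynamic: a clip that was hard early on is sampled less once the network fits it.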
Through continuous iteration and optimization, PMC gradually learns a set of discrete representations that can effectively express complex animal movements until convergence.
The results of the PMC phase provide the basis for EPMC to generate higher-level motion control information.
The EPMC adds an environment perception module on top of the PMC. The module receives information from sensors such as vision and radar, so that the policy network can dynamically adjust the motion mode according to the current state of the environment.
The core of the EPMC is a probabilistic generative network that, given the current perception information and command signals, outputs a probability distribution over the discrete latent space provided by the PMC.
This distribution determines which primitive movement modes should be activated to best adapt to the current environment and task.
The EPMC is trained by minimizing loss functions for environmental adaptation and task completion, gradually learning to optimize the motion strategy and improving the robot's adaptability and robustness.
The final SEPMC training phase further improved the robot's cognitive and planning capabilities, enabling it to formulate and execute high-level strategies in a multi-agent interactive environment.
Building on the EPMC, the SEPMC generates high-level strategic decisions (such as chasing and dodging) from the current game state (such as its own and the opponent's positions) and the history of interactions.
The chase-style orienteering game that MAX plays is also how the SEPMC is trained.
At this stage, the authors adopted PFSP (prioritized fictitious self-play), a multi-agent reinforcement learning algorithm, to keep raising the robot's strategic level through self-play.
During training, the current policy is constantly pitted against strong opponents from its own history, forcing it to learn more robust and efficient strategies.
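The opponent-selection idea behind PFSP can be sketched as follows: past policy snapshots are sampled with a probability that grows as the learner's win rate against them falls. The weighting `(1 - p) ** power` is one commonly used form and is an assumption here, as are the snapshot names; the article does not say which weighting the authors used.

```python
from typing import Dict


def pfsp_probabilities(win_rates: Dict[str, float],
                       power: float = 2.0) -> Dict[str, float]:
    """win_rates maps each past snapshot to the learner's win rate against it.
    Opponents the learner rarely beats get the highest sampling probability."""
    weights = {name: (1.0 - p) ** power for name, p in win_rates.items()}
    total = sum(weights.values())
    if total == 0.0:
        # The learner beats everyone: fall back to uniform sampling.
        return {name: 1.0 / len(win_rates) for name in win_rates}
    return {name: w / total for name, w in weights.items()}


# Hypothetical win rates against three stored snapshots.
win_rates = {"snapshot_100": 0.9, "snapshot_200": 0.5, "snapshot_300": 0.2}
probs = pfsp_probabilities(win_rates)
```

With these numbers the snapshot the learner beats only 20% of the time is drawn far more often than the one it already beats 90% of the time, so training effort concentrates on the opponents that still expose weaknesses.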
Thanks to the solid foundation laid in the first two phases, the learning of this complex strategy is very efficient and can converge quickly even in the case of sparse rewards.
It is worth mentioning that such a multi-agent setup can also include agents that simulate humans, enabling cooperation between machines or between humans and machines.
The training described above is completed in simulation and then transferred zero-shot to the real world.
In simulation, the physical parameters can be controlled freely. The authors randomize a large number of them (including loads, terrain changes, etc.), so the policies obtained through reinforcement learning must cope with these variations and therefore acquire stable, general control capabilities.
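This kind of randomization usually amounts to drawing a fresh set of physical parameters at the start of every training episode. The sketch below shows the pattern only; the parameter names and ranges are hypothetical (friction in particular is not mentioned in the article), and no specific simulator API is implied.

```python
import random
from dataclasses import dataclass

# Hypothetical ranges for illustration; the article does not list the real ones.
PAYLOAD_KG = (0.0, 3.0)        # extra load carried by the robot
FRICTION = (0.4, 1.2)          # ground friction coefficient (assumed parameter)
STEP_HEIGHT_M = (0.0, 0.15)    # terrain variation


@dataclass
class EpisodeParams:
    payload_kg: float
    friction: float
    step_height_m: float


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draw one independent set of physical parameters for a training episode."""
    return EpisodeParams(
        payload_kg=rng.uniform(*PAYLOAD_KG),
        friction=rng.uniform(*FRICTION),
        step_height_m=rng.uniform(*STEP_HEIGHT_M),
    )


rng = random.Random(0)
episodes = [sample_episode_params(rng) for _ in range(5)]
```

Since the policy never sees the same physics twice, it cannot overfit to one simulator configuration, which is what makes the zero-shot transfer to a real robot plausible.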
In addition, the authors use an LSTM in each layer of the control framework, giving each layer some capacity for temporal memory and planning.
In terms of sensors, the authors have mainly verified that a series of complex tasks can be accomplished using a motion capture system, or using visual perception from a depth camera alone.
To handle more open and complex environments, the authors plan to integrate LiDAR, audio, and other perceptual inputs in the future for multimodal understanding of the environment.