Meta Next-Gen MTIA: An AI Processor Dedicated to Recommendation Inference

Zhineng Automobile 2024/09/02 08:04

Produced by Zhineng Zhixin

At Hot Chips 2024, Meta showcased its next-generation MTIA (Meta Training and Inference Accelerator), an AI processor designed for recommendation inference.

MTIA represents Meta's continued investment in custom hardware to meet the growing demands of its recommendation engines. This article examines the new accelerator's technical architecture, key features, use cases, and implications for recommendation inference.

Meta's recommendation system plays a central role in enhancing the user experience, driving content relevance and ad targeting.

As recommender systems grow in size and complexity, Meta faces multiple challenges with traditional GPUs, including cost, power, latency, and scalability.

To optimize the performance of recommendation engines and reduce total cost of ownership (TCO), Meta designed specialized MTIA chips to efficiently handle multiple services.

Part 1

MTIA's Technical Architecture and Key Innovations

Meta's next-generation MTIA incorporates several cutting-edge technologies designed to optimize the efficiency and performance of recommendation inference:

● Process & Power Consumption: Manufactured using TSMC's 5nm process, MTIA has a thermal design power (TDP) of 90W, significantly reducing power requirements and making it easier to manage in data centers.

The processor pairs this with 16 channels of LPDDR5 memory in a 128GB configuration, which provides the capacity and bandwidth needed for efficient data processing; a rough bandwidth estimate follows below.
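As a rough sanity check, assuming typical LPDDR5 parameters of 6400 MT/s and 16-bit channels (assumptions for illustration; the figures quoted above state only the channel count and capacity), the aggregate bandwidth works out as follows:

```python
# Back-of-the-envelope LPDDR5 bandwidth estimate for a 16-channel setup.
# Data rate and channel width are assumed values, not quoted specs.
CHANNELS = 16
DATA_RATE_MT_S = 6400        # assumed LPDDR5 transfer rate (MT/s)
CHANNEL_WIDTH_BITS = 16      # assumed per-channel width

bytes_per_transfer = CHANNEL_WIDTH_BITS / 8
bandwidth_gb_s = CHANNELS * DATA_RATE_MT_S * 1e6 * bytes_per_transfer / 1e9
print(f"Estimated aggregate bandwidth: {bandwidth_gb_s:.1f} GB/s")  # 204.8 GB/s
```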

● RISC-V Architecture and Network-on-Chip (NoC): Unlike the more common Arm-based designs, MTIA uses RISC-V as its control core, an open and flexible architecture that allows Meta to customize deeply for recommendation inference.

In addition, the new generation of network-on-chip (NoC) is faster than its predecessor, improving data transmission efficiency.

● Dynamic Quantization Engine and Hardware Decompression: MTIA includes a built-in high-precision integer dynamic quantization engine and a hardware decompression engine, which reduce storage and bandwidth consumption when processing large-scale data, thereby improving overall computing performance.

These features are critical for recommendation engines, which routinely process large and complex data tables and weights; a minimal sketch of dynamic quantization follows below.
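The presentation does not detail the engine's internals. In general, dynamic integer quantization means the scale is computed from each tensor's observed range at inference time rather than fixed in advance; the NumPy helpers below sketch that idea and are illustrative only, not Meta's implementation:

```python
import numpy as np

def dynamic_quantize_int8(x: np.ndarray):
    """Quantize a float tensor to int8 with a scale derived from the
    tensor's observed range at runtime -- the essence of *dynamic*
    quantization. Illustrative only, not Meta's implementation."""
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Round-trip example: error stays small because the scale tracks the data.
activations = np.random.randn(4, 8).astype(np.float32)
q, scale = dynamic_quantize_int8(activations)
err = np.abs(dequantize_int8(q, scale) - activations).max()
print(f"scale={scale:.5f}, max abs round-trip error={err:.5f}")
```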

● Accelerator Module and Cluster Architecture: Each accelerator module contains two MTIA chips, drawing a total of 220W per card, and transfers data over a PCIe Gen5 x8 interface.

Each rack can accommodate 72 MTIA accelerator modules at a total power consumption of under 16kW; the arithmetic is checked below. This modular design significantly improves the scalability and flexibility of recommendation inference.
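The two figures are consistent with each other, as a quick check using only the numbers quoted above confirms:

```python
# Sanity check of the rack-level figures quoted above: 220W per dual-chip
# module, 72 modules per rack, against the stated sub-16kW budget.
MODULE_POWER_W = 220
MODULES_PER_RACK = 72

rack_power_kw = MODULE_POWER_W * MODULES_PER_RACK / 1000
print(f"Rack power: {rack_power_kw:.2f} kW")  # 15.84 kW, under 16 kW
```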

Part 2

Meta MTIA: Application Scenarios and Performance Advantages

MTIA's primary use case is supporting large-scale recommendation inference within Meta, including social media content recommendation, ad delivery optimization, and personalized content presentation in metaverse interactions.

Compared with traditional GPU solutions, MTIA is deeply optimized for recommendation inference and handles the characteristic computational patterns of recommendation tasks more efficiently, as illustrated in the sketch below.
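For context, recommendation models in the openly published DLRM family combine many memory-bound embedding-table lookups with a comparatively small dense MLP, a pattern that is a poor fit for throughput-oriented GPUs. The NumPy sketch below illustrates that generic pattern with made-up sizes; it is not MTIA or Meta production code:

```python
import numpy as np

# Generic DLRM-style inference pattern: sparse embedding lookups dominate
# memory traffic, followed by a small dense MLP. All sizes are illustrative.
NUM_TABLES, ROWS_PER_TABLE, EMB_DIM = 8, 100_000, 64
tables = [np.random.randn(ROWS_PER_TABLE, EMB_DIM).astype(np.float32)
          for _ in range(NUM_TABLES)]
w1 = np.random.randn(NUM_TABLES * EMB_DIM, 128).astype(np.float32)
w2 = np.random.randn(128).astype(np.float32)

def recommend_score(sparse_ids):
    # Memory-bound phase: one random-access row read per embedding table.
    features = np.concatenate([t[i] for t, i in zip(tables, sparse_ids)])
    # Compute phase: a small dense MLP over the gathered features.
    hidden = np.maximum(features @ w1, 0.0)  # ReLU
    return float(hidden @ w2)

ids = np.random.randint(0, ROWS_PER_TABLE, size=NUM_TABLES)
print(f"score = {recommend_score(ids):.4f}")
```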

● Performance Improvement and Power Optimization: MTIA's architecture focuses on reducing latency, increasing throughput, and achieving higher computational efficiency on recommendation tasks.

The new MTIA delivers a multi-fold performance improvement over its predecessor while maintaining relatively low power and thermal requirements, making it more cost-effective for large-scale deployments.

● PCIe Shared Memory and System Integration: Meta uses a shared memory mechanism over PCIe, which simplifies data transfer and gives recommendation inference tasks a more flexible way to access data.

This architecture can significantly improve system responsiveness and processing efficiency in highly concurrent workloads; a loose host-side analogy is sketched below.
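The presentation summary gives no driver-level detail of this mechanism. As a loose host-side analogy only (using Python's standard multiprocessing.shared_memory, nothing PCIe-specific), the sketch below shows the general idea: multiple views read one buffer with no copy in between:

```python
import numpy as np
from multiprocessing import shared_memory

# Host-side analogy for shared-buffer access: two NumPy views over one
# allocation, no copy in between. PCIe shared memory extends this idea
# across the host/device boundary (details not covered in the talk summary).
shm = shared_memory.SharedMemory(create=True, size=1024 * 4)
producer = np.ndarray((1024,), dtype=np.float32, buffer=shm.buf)
producer[:] = np.arange(1024, dtype=np.float32)  # writer fills the buffer

consumer = np.ndarray((1024,), dtype=np.float32, buffer=shm.buf)
print(consumer[:4])  # reader sees the data in place: [0. 1. 2. 3.]

del producer, consumer  # release the views before closing the mapping
shm.close()
shm.unlink()
```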

The release of MTIA demonstrates Meta's leadership in custom hardware and points to an important direction for future AI accelerators: deep optimization for specific use cases.

With recommender systems widely deployed across major internet companies, the success of MTIA could spur other tech giants to further innovation in AI accelerators.

● Combination with RISC-V: MTIA's adoption of the RISC-V architecture not only enhances the processor's flexibility but also reduces dependence on proprietary architectures and promotes the open-source hardware ecosystem. This trend could further reshape the competitive landscape of the AI accelerator market.

● Scaling Challenges and Energy Management: While MTIA achieves significant performance gains through its innovative architecture, large-scale AI clusters still face scaling and energy-management challenges. As demand for recommender systems continues to grow, optimizing energy efficiency and reducing cluster operating costs will be key issues going forward.

Brief Summary

Meta's next-generation MTIA offers a new approach to improving recommender-system performance and reducing operating costs by deeply optimizing recommendation inference at the hardware level. Its innovative architecture and adoption of RISC-V demonstrate Meta's foresight in customized AI accelerators.
