The power-efficient and scalable Distributed Disaggregated Chassis (DDC) design provides the high-performance VoQ fabric needed for AI clusters
Accton Technology Contributes AI/ML-Optimized DDC 800G Designs to OCP
Media Contact
Lucille Lu,
Head of Marketing
Lucille_lu@edge-core.com
Accton Technology Corporation (Accton), a leading provider of networking and communications solutions, today announced innovative 800G-optimized products that deliver a fully Distributed Disaggregated Chassis (DDC) AI-optimized network solution. Accton’s new switch offerings form the basis for a DDC scheduled fabric architecture by incorporating the requisite Virtual Output Queue (VoQ) based fabric to deliver high-performance lossless Ethernet, with performance comparable to designs previously seen only with InfiniBand. Accton will contribute the power-efficient hardware design to the Open Compute Project later this year, with first samples on display in Accton’s booth #A4 at the OCP 2024 conference in San Jose.
“As a long-standing member of the OCP collaborative community, we are excited to be able to continuously share our product designs. The Broadcom StrataDNX™ Ramon3 and StrataDNX™ Jericho3-AI optimized switching systems will provide a high-performance VoQ-based architecture perfect for AI clusters, and can scale up to 32,000 GPUs at 800Gbps,” said Michael K.T. Lee, Sr. Vice President of Accton’s R&D Center. “The innovative Ramon3 networking solution offers not only cost savings, but also power efficiency, ease of DDC management, and lower overall operating costs. We are confident that the dual Ramon3 and Jericho3-AI DDC architecture currently offers the most effective solution for Ethernet-based AI clusters.”
Accton’s new OCP contribution is based on the latest dual Ramon3 and Jericho3-AI chipsets, which were designed specifically to handle the complex network flows that arise when GPU collective operations, particularly “all-to-all” and “all-reduce” operations, are underway. The Jericho3-AI based Network Cloud Packet Forwarder (NCP) supports a 14.4 Tbps full-duplex switching fabric and features 18x800G OSFP network interface ports and 20x800G OSFP fabric interface ports. The Ramon3 based Network Cloud Fabric (NCF) engine supports a 51.2 Tbps switching capacity. Together, the cell-based switching design eliminates the traditional Ethernet frame overhead and can effectively load balance all fabric links to build an efficient and high-availability DDC AI-optimized cluster.
With Accton’s 6RU, 128x800G (dual 51.2 Tbps) fabric port switch based on Broadcom’s dual Ramon3, and its 2RU, 18x800G (14.4 Tbps leaf) Ethernet network interface port switch based on Jericho3-AI, customers can build a 32K-GPU 800G/400G AI/ML cluster with a two-stage DDC network architecture. This enables 400G GPU clusters now, with the ability to migrate to 800G GPU clusters later through a simple software upgrade, without replacing the physical switches. The approach provides significant cost savings and investment protection, reducing capital spending while providing the flexibility for AI/ML builders to “pay as they grow” without large upfront costs.
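As a rough check on the port figures quoted above, the short Python sketch below multiplies port counts by the 800G port speed. The function and variable names are illustrative and do not come from Accton or Broadcom. The final estimate assumes one 800G network port per GPU and the 32,000-GPU figure cited earlier; it is a back-of-envelope number, not a published design point.

```python
import math

PORT_GBPS = 800  # per-port speed quoted in the release


def aggregate_tbps(ports: int, port_gbps: int = PORT_GBPS) -> float:
    """Aggregate capacity in Tbps for a group of identical ports."""
    return ports * port_gbps / 1000


# Jericho3-AI NCP (leaf): 18 network-facing and 20 fabric-facing 800G ports.
ncp_network_tbps = aggregate_tbps(18)
ncp_fabric_tbps = aggregate_tbps(20)

# Dual Ramon3 NCF (fabric): 128 fabric ports at 800G.
ncf_fabric_tbps = aggregate_tbps(128)

# Ratio of fabric-facing to network-facing capacity on one NCP.
ncp_fabric_to_network_ratio = ncp_fabric_tbps / ncp_network_tbps

# Assumption (not from the release): each GPU uses exactly one 800G NCP port.
ASSUMED_GPUS = 32_000
ncps_needed = math.ceil(ASSUMED_GPUS / 18)

print(f"NCP network side: {ncp_network_tbps} Tbps")
print(f"NCP fabric side:  {ncp_fabric_tbps} Tbps")
print(f"NCF fabric ports: {ncf_fabric_tbps} Tbps")
print(f"NCP fabric/network ratio: {ncp_fabric_to_network_ratio:.2f}")
print(f"NCPs for {ASSUMED_GPUS} GPUs (assumed 1 port per GPU): {ncps_needed}")
```

The 18x800G and 128x800G products match the 14.4 Tbps and dual 51.2 Tbps figures in the text, and the 20 fabric ports give each NCP slightly more fabric-facing than network-facing capacity.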
“The DDC architecture is becoming synonymous with high-performance AI fabric. It is a proven solution for Ethernet-based high density computing clusters,” said Yossi Kikozashvili, Head of Product, AI Infrastructure at DriveNets. “DriveNets’ VoQ-based scheduled fabric, together with Broadcom’s Ramon3 and Jericho3-AI-based switching systems, deliver a congestion-free, lossless network that enables industry-leading performance for AI clusters at scale.”
Oozie Parizer, senior director of marketing, Core Switching Group, Broadcom said, “The Scheduled AI Fabric, based on Ramon3 and Jericho3-AI, provides a significant performance boost for AI/ML workloads. It enables the deployment of lossless AI Infrastructure across campuses spanning hundreds of kilometers apart.”
Accton takes the hardware design further for the dual Ramon3 and Jericho3-AI system by optimizing the port mapping within the box, achieving traffic load balancing that accommodates extreme traffic conditions. The innovative design approach reduces the restrictions for network architects to balance traffic loads. Ramon3 and Jericho3-AI systems offer much lower system power consumption, reduced latency, minimized port-to-port timing skews, and high manufacturability with no flyover cables.
About Accton:
Accton Technology Corporation is a global premier provider of networking and communication solutions for top-tier networking, computer, and telecommunications vendors. Leveraging its advanced hardware engineering, software application, and system design capability, Accton collaborates with its strategic partners to architect, develop, and manufacture innovative, leading-edge network products. Accton’s evolving core technology and its highly qualified global workforce enable it to deliver superior distributed virtual network solutions that are affordable and robust to a variety of market segments.
For more information about Accton and its subsidiaries, please visit www.accton.com.
About DriveNets:
DriveNets is a leader in high-scale disaggregated networking solutions. Founded in 2015, DriveNets modernizes the way service providers, cloud providers and hyperscalers build networks, streamlining network operations, increasing network performance at scale, and improving their economic model. DriveNets’ solutions – Network Cloud and Network Cloud-AI – adapt the architectural model of hyperscale cloud to telco-grade networking and support any network use case – from core-to-edge to AI networking – over a shared physical infrastructure of standard white-boxes, radically simplifying the network’s operations and offering telco-scale performance and reliability with hyperscale elasticity. DriveNets’ solutions are currently deployed in the world’s largest networks. Learn more at: www.drivenets.com.
View source version on businesswire.com: https://www.businesswire.com/news/home/20241015892348/en/