Scale-Up vs Scale-Out Networking
Distinction between fast intra-rack accelerator communication and slower inter-rack or data-center communication in AI clusters.
Key points
- In Pope’s explanation, scale-up networking connects GPUs inside a rack with high-bandwidth all-to-all connectivity, while scale-out networking connects racks through slower data-center fabrics [src-042].
- Mixture-of-Experts (MoE) all-to-all traffic is well matched to scale-up networks because any GPU may need to send tokens to experts on any other GPU [src-042].
- Crossing rack boundaries can bottleneck MoE traffic because a large share of tokens may have to traverse the slower scale-out links (see the traffic sketch after this list) [src-042].
- Larger scale-up domains matter not only for capacity but also for effective memory bandwidth: more GPUs can read model weights in parallel during decode (see the bandwidth sketch below) [src-042].
- Physical rack constraints such as cabling density, bend radius, power, cooling, weight, and backplane design limit how large scale-up domains can become [src-042].
- [src-061] adds a training-scale reliability angle: once runs involve 10,000 to 100,000 GPUs, component failures are expected, and cluster software must treat redundancy and failure handling as normal operating conditions (see the failure-rate sketch below).
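To make the rack-boundary point concrete, here is a minimal back-of-the-envelope sketch of one MoE all-to-all dispatch under a two-tier network. The cluster shape, token count, hidden size, and bandwidth figures are illustrative assumptions, not numbers from [src-042].

```python
# Rough model of one MoE all-to-all: a token routed to a uniformly random
# expert lands in the sender's rack with probability gpus_per_rack /
# gpus_total; the remainder must cross the slower scale-out fabric.

def moe_all_to_all_time(
    tokens: int,          # tokens dispatched in one MoE layer
    hidden_dim: int,      # model hidden size
    bytes_per_elem: int,  # e.g. 2 for bf16 activations
    gpus_total: int,      # GPUs the experts are sharded across
    gpus_per_rack: int,   # size of the scale-up domain
    bw_scale_up: float,   # per-GPU scale-up bandwidth, bytes/s
    bw_scale_out: float,  # per-GPU scale-out bandwidth, bytes/s
) -> tuple[float, float]:
    """Return (intra-rack seconds, inter-rack seconds) for the dispatch."""
    per_gpu_bytes = tokens * hidden_dim * bytes_per_elem / gpus_total
    frac_intra = gpus_per_rack / gpus_total
    t_intra = per_gpu_bytes * frac_intra / bw_scale_up
    t_inter = per_gpu_bytes * (1.0 - frac_intra) / bw_scale_out
    return t_intra, t_inter

# Illustrative numbers: 8 racks of 72 GPUs, 10x bandwidth gap between tiers.
t_in, t_out = moe_all_to_all_time(
    tokens=65_536, hidden_dim=8_192, bytes_per_elem=2,
    gpus_total=576, gpus_per_rack=72,
    bw_scale_up=900e9, bw_scale_out=90e9,
)
print(f"intra-rack: {t_in * 1e6:.1f} us, inter-rack: {t_out * 1e6:.1f} us")
```

With these assumed numbers, 7/8 of the bytes leave the rack, and the inter-rack leg takes roughly 18 µs versus under 1 µs intra-rack: the bandwidth gap, not the traffic split, dominates the dispatch time.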
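The effective-memory-bandwidth point is simple arithmetic: during decode, every GPU in the scale-up domain streams its shard of the weights from HBM in parallel, so the per-token latency floor shrinks with domain size. A sketch with assumed figures (a 500B-parameter bf16 model and 3.35 TB/s of HBM bandwidth per GPU, neither taken from the source):

```python
# Lower bound on per-token decode latency: every decode step must read
# all model weights from HBM once, and that read is split across the
# GPUs in the scale-up domain.

def decode_step_floor(weight_bytes: float, gpus: int, hbm_bw: float) -> float:
    """Seconds to stream the full weights once at aggregate bandwidth."""
    return weight_bytes / (gpus * hbm_bw)

weight_bytes = 500e9 * 2  # 500B parameters at 2 bytes each (bf16)
for gpus in (8, 72, 144):
    ms = decode_step_floor(weight_bytes, gpus, hbm_bw=3.35e12) * 1e3
    print(f"{gpus:>3} GPUs: >= {ms:.2f} ms per decoded token")
```

Going from 8 to 72 GPUs cuts the floor from roughly 37 ms to about 4 ms per token, which is why scale-up domain size shows up directly in decode latency and not just in total capacity.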
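The reliability point from [src-061] also comes down to arithmetic: expected failures scale linearly with GPU count. A sketch assuming independent failures and an illustrative per-GPU mean time between failures of 50,000 hours (an assumption, not a figure from the source):

```python
import math

def p_any_failure(gpus: int, hours: float, mtbf_hours: float) -> float:
    """Probability that at least one GPU fails during the window,
    modeling failures as independent exponentials with the given MTBF."""
    return 1.0 - math.exp(-gpus * hours / mtbf_hours)

MTBF = 50_000.0  # assumed per-GPU mean time between failures, hours
for gpus in (10_000, 100_000):
    p = p_any_failure(gpus, hours=24.0, mtbf_hours=MTBF)
    expected = gpus * 24.0 / MTBF
    print(f"{gpus:>7} GPUs: expected failures/day = {expected:4.1f}, "
          f"P(at least one) = {p:.2f}")
```

At 100,000 GPUs this model predicts dozens of failures per day, which is why cluster software has to treat redundancy as routine rather than exceptional.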
Related concepts
- Mixture-of-Experts Serving
- LLM Parallelism Strategies
- LLM Inference Economics
- GPU Supply as AI Strategy
Source references
- [src-042] Dwarkesh Patel, “How GPT, Claude, and Gemini are actually trained and served – Reiner Pope” (2026-04-29)
- [src-061] Lex Fridman, “State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490” (2026-01-31)