Reversible Networks
Neural-network architecture pattern where layers are made invertible so activations can be rematerialized during backpropagation instead of stored throughout the forward pass.
Key points
- Reiner Pope connects reversible neural networks to Feistel constructions from cryptography, in which a non-invertible function is wrapped into an invertible transformation on two inputs [src-042].
- In training, stored activations can dominate the memory footprint: every layer's forward activations must be kept until the backward pass consumes them, in reverse order [src-042].
- Reversible layers let the backward pass reconstruct forward activations on demand, trading additional compute for lower memory use [src-042].
- The trade-off is the inverse of KV-cache serving: KV cache spends memory to save compute, while reversible training spends compute to save memory [src-042].
- In the source, the topic serves as a bridge between neural-network architecture and cryptographic ideas about mixing and differentiation [src-042].
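
The points above can be illustrated with a minimal sketch. It assumes an additive coupling layer, a common Feistel-style form for reversible layers; [src-042] does not specify the exact construction, so the layer shape, the toy functions `f` and `g`, and every name here (`make_layer`, `LAYERS`, `run_forward`, `rematerialize`) are hypothetical. The sketch shows that a stack built from non-invertible functions can still be inverted exactly, so layer inputs can be rebuilt from the final output instead of stored.

```python
import math

def make_layer(scale):
    # f and g are arbitrary sub-networks and need not be invertible.
    # ReLU is not: it maps every negative input to 0.
    f = lambda v: [scale * max(0.0, x) for x in v]
    g = lambda v: [math.tanh(scale * x) for x in v]
    return f, g

def forward(x1, x2, f, g):
    # Feistel-style additive coupling: each half is updated using the other.
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def inverse(y1, y2, f, g):
    # Undo the two updates in the opposite order, reusing f and g as-is.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

# Hypothetical three-layer stack for illustration.
LAYERS = [make_layer(s) for s in (0.5, 1.0, 1.5)]

def run_forward(x1, x2, keep_activations=False):
    # With keep_activations=False only the final output survives,
    # which is the memory-saving mode.
    stored = []
    for f, g in LAYERS:
        if keep_activations:
            stored.append((x1, x2))
        x1, x2 = forward(x1, x2, f, g)
    return (x1, x2), stored

def rematerialize(y1, y2):
    # Walk the stack backwards, rebuilding each layer's input on demand,
    # in the same reverse order a backward pass would need them.
    inputs = []
    for f, g in reversed(LAYERS):
        y1, y2 = inverse(y1, y2, f, g)
        inputs.append((y1, y2))
    return inputs[::-1]

out, stored = run_forward([1.0, -2.0], [0.5, 3.0], keep_activations=True)
rebuilt = rematerialize(*out)
```

Here `rebuilt` matches `stored` up to floating-point rounding. The cost is one extra evaluation of `f` and `g` per layer during reconstruction, which is the compute-for-memory trade described above.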
Related concepts
Source references
- [src-042] Dwarkesh Patel — “How GPT, Claude, and Gemini are actually trained and served – Reiner Pope” (2026-04-29)