Your AI Model Crashed Mid-Training? It Might Be Manifold Tearing
Mixture-of-Experts (MoE) models often crash mid-training without warning. DeepSeek V4 traces the cause to what differential geometry calls manifold tearing, and describes three stabilization techniques that make training predictable.