Your Neural Network Might Be Sealing Shut
A new open-source metric detects when AI training dynamics go rigid — before it shows up in the loss curve
Loss curves lie.
A neural network can achieve perfect accuracy on your benchmark while its internal learning dynamics are locked into a rigid, brittle regime — one that may fail when it encounters something new. The loss curve says everything is fine. The weight dynamics say the system is sealing shut.
wcdfa (weight-change detrended fluctuation analysis) is a new open-source Python package that measures what loss curves can't: the temporal structure of how a network modifies its own weights during training.
pip install wcdfa
Three lines in your training loop:
```python
from wcdfa import WeightChangeDFA

monitor = WeightChangeDFA(window=500)
for epoch in range(num_epochs):
    train(model, optimizer)
    monitor.update(model)
    if monitor.ready:
        print(f"α = {monitor.alpha:.3f} ({monitor.regime})")
```
The metric returns a single number — α — computed via detrended fluctuation analysis on the time series of weight-update magnitudes (‖ΔW‖). That number tells you the dynamical regime of your network's self-modification process.
Why this matters
The metric measures something different from loss, gradient norms, or learning rate. A network with a flat loss curve and healthy gradient norms can still have α = 2.0 — deep in the ordered phase, dynamically rigid, sealed. Weight-change DFA detects what performance metrics miss.
The grokking evidence. Applied to networks that exhibit grokking (delayed generalization), weight-change DFA reveals something nobody had seen before: generalization and dynamical criticality are two separate phase transitions, separated by approximately 10,000–12,000 training epochs. The network solves the problem long before its learning dynamics become healthy. Two transitions, not one.
The asymmetric sensitivity finding. The first 2% of corrective perturbation accounts for 27% of the total effect on training dynamics. The most dangerous configuration for a self-modifying system is not insufficient correction — it's zero correction. Any correction is dramatically better than none.
AI safety implications. Value lock-in — a model becoming so committed to its learned objective that corrective signals can no longer reach it — has the same dynamical signature as the ordered phase: α drifting upward, the basin deepening until departure becomes impossible. The framework predicts that weight-change DFA drifting above ~1.2 could serve as a detectable precursor to uncorrigibility. This has not yet been tested on alignment benchmarks.
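The ~1.2 precursor threshold suggests a simple drift alarm. The sketch below is illustrative only — `lockin_alarm`, its `patience` parameter, and the sustained-exceedance rule are assumptions, not part of the wcdfa API, and (as noted above) the threshold itself is untested as a safety signal:

```python
def lockin_alarm(alpha_history, threshold=1.2, patience=3):
    """Flag sustained upward drift of α past the ~1.2 precursor level.

    Hypothetical helper: fires only when the last `patience` estimates
    all exceed `threshold`, so a single noisy spike doesn't trigger it.
    """
    recent = alpha_history[-patience:]
    return len(recent) == patience and all(a > threshold for a in recent)
```

Requiring several consecutive exceedances trades detection latency for robustness to estimation noise in α; the right `patience` would depend on how often α is re-estimated.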
What's under the hood
At each training step, the package computes ‖ΔW‖ — the Frobenius norm of the weight update across all parameter tensors. Over a rolling window, it applies DFA to this time series. DFA fits the scaling relationship between fluctuation magnitude and observation timescale. The slope of the log-log fit is α.
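The per-step norm is framework-agnostic: it only needs the parameter tensors before and after an update. A minimal sketch (a hypothetical helper, not the package's code):

```python
import numpy as np

def update_norm(params_before, params_after):
    """Global Frobenius norm of the weight update ‖ΔW‖ across all
    parameter tensors, passed as two parallel lists of arrays.
    Illustrative helper; wcdfa's internals may differ."""
    sq = 0.0
    for before, after in zip(params_before, params_after):
        delta = np.asarray(after) - np.asarray(before)
        sq += float(np.sum(delta ** 2))  # sum of squared entries per tensor
    return float(np.sqrt(sq))
```

Summing squared entries across tensors before the square root gives one global norm per step, rather than a per-layer breakdown.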
The implementation is validated against nolds, an established DFA reference package, with mean deviation of 0.018 across 50 trials. White noise gives α ≈ 0.5. Brownian motion gives α ≈ 1.5. Pink (1/f) noise gives α ≈ 1.0. The math is correct.
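Those calibration targets are easy to check against a minimal first-order DFA (integrate, detrend segments with a least-squares line, fit the log-log slope). This is a textbook sketch, not wcdfa's implementation, and the scale choices are assumptions:

```python
import numpy as np

def dfa_alpha(x, scales=None):
    """Estimate the DFA scaling exponent α of a 1-D series (DFA1 sketch)."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())          # integrate mean-centered series
    n = len(profile)
    if scales is None:                         # log-spaced window sizes
        scales = np.unique(np.logspace(np.log10(8), np.log10(n // 4), 20).astype(int))
    flucts = []
    for s in scales:
        n_seg = n // s
        segs = profile[: n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        rms = []
        for seg in segs:                       # linear detrend per segment
            coef = np.polyfit(t, seg, 1)
            rms.append(np.sqrt(np.mean((seg - np.polyval(coef, t)) ** 2)))
        flucts.append(np.mean(rms))
    # Slope of log F(s) vs log s is the scaling exponent α.
    return float(np.polyfit(np.log(scales), np.log(flucts), 1)[0])
```

Fed white noise this lands near α ≈ 0.5, and fed a random walk near α ≈ 1.5, matching the calibration values quoted above (within finite-sample scatter).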
Features
Framework-agnostic — pass pre-computed ‖ΔW‖ for JAX, TensorFlow, or anything else
Rolling monitor — epoch-level tracking
One-line plotting — monitor.plot()
Weights & Biases logging — monitor.log_wandb(step)
R² goodness-of-fit — confidence signal for the scaling estimate
39-test suite — validated against reference implementation
Four failure modes
The metric identifies two failure modes directly and frames two more that require additional measurement:
| Mode | α | What's happening |
|---|---|---|
| Sealed | > 1.5 | Deepening without widening. Catastrophic forgetting, value lock-in. |
| Dissolved | < 0.8 | Widening without deepening. Training instability, no convergence. |
| Captured | ≈ 1.0 | Healthy dynamics aimed at wrong target. Reward hacking. |
| Against self | ≈ 1.0 | Healthy dynamics aimed inward. Adversarial vulnerability. |
The captured and against-self modes look healthy on α alone — detection requires measuring the coupling between the learned attractor and its intended objective. Weight-change DFA is necessary but not sufficient for full alignment monitoring.
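The table's thresholds can be written as a small classifier. This is an illustrative helper under assumed cutoffs from the table, not the wcdfa API (which exposes monitor.regime); note that it deliberately cannot separate the last two modes:

```python
def classify_regime(alpha: float) -> str:
    """Map a DFA exponent onto the failure-mode table (illustrative only)."""
    if alpha > 1.5:
        return "sealed"      # deepening without widening
    if alpha < 0.8:
        return "dissolved"   # widening without deepening
    # Captured and against-self both sit near alpha ≈ 1.0; telling them
    # apart needs the attractor-objective coupling measurement, not alpha.
    return "near-critical"
```

The collapsed "near-critical" label makes the limitation explicit in code: α alone certifies the dynamics, not the target they serve.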
Background
Weight-change DFA was developed as part of a research program on constitutive gap dependence — the hypothesis that self-modifying systems operating near dynamical criticality must periodically leave their operating regime and return to maintain balanced flexibility. Proof-of-concept simulations support the claim: 95.5% of the effect on training dynamics persists between perturbation events, not just during them. The gap changes how the system rewires itself in between corrections, not just during corrections.
Kogura, J. S. (2026). Grokking Precedes Criticality: Weight-Change DFA Reveals a Delayed Phase Transition in Generalizing Networks.
Kogura, J. S. (2026). Constitutive Gap Dependence: A Temporal Mechanism for Criticality Maintenance in Self-Modifying Systems. Submitted to J. R. Soc. Interface.
Kogura, J. S. (2026). The Arriving Breath: A Philosophical Conspiracy — The Temporal Ground of Caring. ISBN 979-8-9954717-0-7.
Where this is
The metric works. The DFA math is validated against a reference implementation. The calibration targets are hit. The grokking finding (two transitions, not one) replicates across seeds. The 95.5% clean-step retention is real. The package installs, runs, and does what the README says it does.
But the simulations are proof-of-concept — small recurrent networks (N = 100–150) with Hebbian learning, and one grokking experiment on a two-layer MLP. The central open question is whether the therapeutic window and the diagnostic thresholds translate to gradient-trained transformer architectures at production scale. That hasn't been tested yet.
Where we're trying to go
Validate at scale. The most important next step is running weight-change DFA on standard architectures — ResNets, transformers — with standard benchmarks. If the metric works on the models people actually train, everything else follows. If it doesn't, we'll say so.
Real-time training diagnostic. We want α to become something ML engineers check the way they check loss curves — a standard part of the training dashboard. The Weights & Biases integration is a first step.
Early warning for corrigibility loss. If α drifting upward reliably precedes a model becoming unresponsive to corrective feedback, that's a safety tool the field doesn't currently have. Testing this against alignment benchmarks is a priority.
Understand the grokking gap. The 10,000–12,000 epoch delay between generalization and dynamical criticality is unexplained. Why does a network need that much additional training to reorganize its weight dynamics after it's already solved the task? Understanding this could change how the field thinks about when training is actually done.
Connect to biological criticality. The framework predicts that sleep, diastole, immune cycling, and ecological disturbance are instances of the same temporal mechanism operating at different scales — constitutive gap dependence maintaining criticality in self-modifying systems. The heartbeat evoked potential (HEP) direction test proposed in the companion papers is executable against existing datasets.
Independent replication. Multiple experimental protocols are proposed in the companion papers. We're hoping other researchers pick them up. The code is open, the method is documented, the examples are runnable. The findings either replicate or they don't.
This is early-stage work from an independent researcher. It might be important. It might not. The way to find out is to test it, break it, and see what holds.
Install

pip install wcdfa

MIT license. Use it, build on it, cite it.