Your Neural Network Might Be Sealing Shut
A new open-source metric detects when AI training dynamics go rigid — before it shows up in the loss curve
Loss curves lie.
A neural network can achieve perfect accuracy on your benchmark while its internal learning dynamics are locked into a rigid, brittle regime — one that may fail when it encounters something new. The loss curve says everything is fine. The weight dynamics say the system is sealing shut.
wcdfa (weight-change detrended fluctuation analysis) is a new open-source Python package that measures what loss curves can't: the temporal structure of how a network modifies its own weights during training.
pip install wcdfa
Three lines in your training loop:
```python
from wcdfa import WeightChangeDFA

monitor = WeightChangeDFA(window=500)
for epoch in range(num_epochs):
    train(model, optimizer)
    monitor.update(model)
    if monitor.ready:
        print(f"α = {monitor.alpha:.3f} ({monitor.regime})")
```
The metric returns a single number — α — computed via detrended fluctuation analysis on the time series of weight-update magnitudes (‖ΔW‖). That number tells you the dynamical regime of your network's self-modification process.
Why this matters
The metric measures something different from loss, gradient norms, or learning rate. A network with a flat loss curve and healthy gradient norms can still have α = 2.0 — deep in the ordered phase, dynamically rigid, sealed. Weight-change DFA detects what performance metrics miss.
The grokking evidence. Applied to networks that exhibit grokking (delayed generalization), weight-change DFA reveals something nobody had seen before: generalization and dynamical criticality are two separate phase transitions, separated by approximately 10,000–12,000 training epochs. The network solves the problem long before its learning dynamics become healthy. Two transitions, not one.
The asymmetric sensitivity finding. The first 2% of corrective perturbation accounts for 27% of the total effect on training dynamics. The most dangerous configuration for a self-modifying system is not insufficient correction — it's zero correction. Any correction is dramatically better than none.
AI safety implications. Value lock-in — a model becoming so committed to its learned objective that corrective signals can no longer reach it — has the same dynamical signature as the ordered phase: α drifting upward, the basin deepening until departure becomes impossible. The framework predicts that weight-change DFA drifting above ~1.2 could serve as a detectable precursor to uncorrigibility. This has not yet been tested on alignment benchmarks.
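The ~1.2 precursor threshold suggests a simple drift alarm. The sketch below is illustrative only — `lockin_alarm`, its `patience` parameter, and the sustained-exceedance rule are assumptions, not part of the wcdfa API, and (as noted above) the threshold itself is untested as a safety signal:

```python
def lockin_alarm(alpha_history, threshold=1.2, patience=3):
    """Flag sustained upward drift of α past the ~1.2 precursor level.

    Hypothetical helper: fires only when the last `patience` estimates
    all exceed `threshold`, so a single noisy spike doesn't trigger it.
    """
    recent = alpha_history[-patience:]
    return len(recent) == patience and all(a > threshold for a in recent)
```

Requiring several consecutive exceedances trades detection latency for robustness to estimation noise in α; the right `patience` would depend on how often α is re-estimated.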
What's under the hood
At each training step, the package computes ‖ΔW‖ — the Frobenius norm of the weight update across all parameter tensors. Over a rolling window, it applies DFA to this time series. DFA fits the scaling relationship between fluctuation magnitude and observation timescale. The slope of the log-log fit is α.
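The per-step norm is framework-agnostic: it only needs the parameter tensors before and after an update. A minimal sketch (a hypothetical helper, not the package's code):

```python
import numpy as np

def update_norm(params_before, params_after):
    """Global Frobenius norm of the weight update ‖ΔW‖ across all
    parameter tensors, passed as two parallel lists of arrays.
    Illustrative helper; wcdfa's internals may differ."""
    sq = 0.0
    for before, after in zip(params_before, params_after):
        delta = np.asarray(after) - np.asarray(before)
        sq += float(np.sum(delta ** 2))  # sum of squared entries per tensor
    return float(np.sqrt(sq))
```

Summing squared entries across tensors before the square root gives one global norm per step, rather than a per-layer breakdown.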
The implementation is validated against nolds, an established DFA reference package, with mean deviation of 0.018 across 50 trials. White noise gives α ≈ 0.5. Brownian motion gives α ≈ 1.5. Pink (1/f) noise gives α ≈ 1.0. The math is correct.
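Those calibration targets are easy to check against a minimal first-order DFA (integrate, detrend segments with a least-squares line, fit the log-log slope). This is a textbook sketch, not wcdfa's implementation, and the scale choices are assumptions:

```python
import numpy as np

def dfa_alpha(x, scales=None):
    """Estimate the DFA scaling exponent α of a 1-D series (DFA1 sketch)."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())          # integrate mean-centered series
    n = len(profile)
    if scales is None:                         # log-spaced window sizes
        scales = np.unique(np.logspace(np.log10(8), np.log10(n // 4), 20).astype(int))
    flucts = []
    for s in scales:
        n_seg = n // s
        segs = profile[: n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        rms = []
        for seg in segs:                       # linear detrend per segment
            coef = np.polyfit(t, seg, 1)
            rms.append(np.sqrt(np.mean((seg - np.polyval(coef, t)) ** 2)))
        flucts.append(np.mean(rms))
    # Slope of log F(s) vs log s is the scaling exponent α.
    return float(np.polyfit(np.log(scales), np.log(flucts), 1)[0])
```

Fed white noise this lands near α ≈ 0.5, and fed a random walk near α ≈ 1.5, matching the calibration values quoted above (within finite-sample scatter).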
Features
Framework-agnostic — pass pre-computed ‖ΔW‖ for JAX, TensorFlow, or anything else
Rolling monitor — epoch-level tracking
One-line plotting — monitor.plot()
Weights & Biases logging — monitor.log_wandb(step)
R² goodness-of-fit — confidence signal for the scaling estimate
39-test suite — validated against reference implementation
Four failure modes
The metric identifies two failure modes directly and frames two more that require additional measurement:
| Mode | α | What's happening |
|---|---|---|
| Sealed | > 1.5 | Deepening without widening. Catastrophic forgetting, value lock-in. |
| Dissolved | < 0.8 | Widening without deepening. Training instability, no convergence. |
| Captured | ≈ 1.0 | Healthy dynamics aimed at wrong target. Reward hacking. |
| Against self | ≈ 1.0 | Healthy dynamics aimed inward. Adversarial vulnerability. |
The captured and against-self modes look healthy on α alone — detection requires measuring the coupling between the learned attractor and its intended objective. Weight-change DFA is necessary but not sufficient for full alignment monitoring.
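The table's thresholds can be written as a small classifier. This is an illustrative helper under assumed cutoffs from the table, not the wcdfa API (which exposes monitor.regime); note that it deliberately cannot separate the last two modes:

```python
def classify_regime(alpha: float) -> str:
    """Map a DFA exponent onto the failure-mode table (illustrative only)."""
    if alpha > 1.5:
        return "sealed"      # deepening without widening
    if alpha < 0.8:
        return "dissolved"   # widening without deepening
    # Captured and against-self both sit near alpha ≈ 1.0; telling them
    # apart needs the attractor-objective coupling measurement, not alpha.
    return "near-critical"
```

The collapsed "near-critical" label makes the limitation explicit in code: α alone certifies the dynamics, not the target they serve.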
Background
Weight-change DFA was developed as part of a research program on constitutive gap dependence — the hypothesis that self-modifying systems operating near dynamical criticality must periodically leave their operating regime and return to maintain balanced flexibility. Proof-of-concept simulations support the claim: 95.5% of the effect on training dynamics persists between perturbation events, not just during them. The gap changes how the system rewires itself in between corrections, not just during corrections.
Kogura, J. S. (2026). Grokking Precedes Criticality: Weight-Change DFA Reveals a Delayed Phase Transition in Generalizing Networks.
Kogura, J. S. (2026). Constitutive Gap Dependence: A Temporal Mechanism for Criticality Maintenance in Self-Modifying Systems. Submitted to J. R. Soc. Interface.
Kogura, J. S. (2026). The Arriving Breath: A Philosophical Conspiracy — The Temporal Ground of Caring. ISBN 979-8-9954717-0-7.
Where this is
The metric works. The DFA math is validated against a reference implementation. The calibration targets are hit. The grokking finding (two transitions, not one) replicates across seeds. The 95.5% clean-step retention is real. The package installs, runs, and does what the README says it does.
But the simulations are proof-of-concept — small recurrent networks (N = 100–150) with Hebbian learning, and one grokking experiment on a two-layer MLP. The central open question is whether the therapeutic window and the diagnostic thresholds translate to gradient-trained transformer architectures at production scale. That hasn't been tested yet.
Where we're trying to go
Validate at scale. The most important next step is running weight-change DFA on standard architectures — ResNets, transformers — with standard benchmarks. If the metric works on the models people actually train, everything else follows. If it doesn't, we'll say so.
Real-time training diagnostic. We want α to become something ML engineers check the way they check loss curves — a standard part of the training dashboard. The Weights & Biases integration is a first step.
Early warning for corrigibility loss. If α drifting upward reliably precedes a model becoming unresponsive to corrective feedback, that's a safety tool the field doesn't currently have. Testing this against alignment benchmarks is a priority.
Understand the grokking gap. The 10,000–12,000 epoch delay between generalization and dynamical criticality is unexplained. Why does a network need that much additional training to reorganize its weight dynamics after it's already solved the task? Understanding this could change how the field thinks about when training is actually done.
Connect to biological criticality. The framework predicts that sleep, diastole, immune cycling, and ecological disturbance are instances of the same temporal mechanism operating at different scales — constitutive gap dependence maintaining criticality in self-modifying systems. The heartbeat evoked potential (HEP) direction test proposed in the companion papers is executable against existing datasets.
Independent replication. Multiple experimental protocols are proposed in the companion papers. We're hoping other researchers pick them up. The code is open, the method is documented, the examples are runnable. The findings either replicate or they don't.
This is early-stage work from an independent researcher. It might be important. It might not. The way to find out is to test it, break it, and see what holds.
Install

pip install wcdfa

MIT license. Use it, build on it, cite it.