Cover: An engineer at a dual-monitor workstation reviewing model evaluation logs late at night.
FIELD NOTE · MAY 5, 2026 · 7 MIN READ

Heretic 1.3 Ships Reproducible Runs, Bleeds the Fork-Cloners

Same abliteration tool, but now every published model carries a byte-for-byte recipe, and the mystique-merchants lose their cover.

Aditya Sharma · May 5, 2026

The headline feature in Heretic 1.3 is reproducible runs. Gone are the days of 'I can't seem to get such low numbers on my own machine'; you now can.

— p-e-w on r/LocalLLaMA

What AutoKaam Thinks
  • Reproducible runs are the real release; benchmarking and VRAM are table stakes that should have shipped last year.
  • 20,000 stars and 13M downloads is the floor, not the ceiling; abliteration just became an auditable workflow.
  • The "plagiarized fork under the hood" line names a loser without naming it; expect the cloner to scramble for differentiation.
  • Pin to 1.3 before publishing your next abliterated model; anything older now reads as a hand-wave.
20,000 GitHub stars · Named stake: Heretic vs. fork-cloners

A 1.x point release usually does not deserve a writeup. You scan the changelog, bump the pin, run the suite, move on. Heretic 1.3 is the exception, because the headline feature is not a feature in the product sense. It is a discipline change for an entire corner of the local-LLM scene that has been coasting on opacity. p-e-w shipped reproducible runs, and in the same breath called out a competitor caught running a plagiarized fork under the hood. That is the story.

What Shipped

Heretic 1.3 is out as of 2026-05-05, announced on r/LocalLLaMA by maintainer p-e-w. Four landed changes matter:

Reproducible runs. When you publish an abliterated model to Hugging Face, Heretic can now generate a reproduce/ directory inside the model repository. The directory captures everything a stranger needs to regenerate a byte-for-byte identical model on their own machine: PyTorch version, GPU, driver, accelerator library, and the rest of the environment surface that silently changes tensor outputs. The example linked from the announcement lives at huggingface.co/p-e-w/Qwen3.5-4B-heretic/blob/main/reproduce/README.md. Long-time contributor Vinay-Umrethe wrote most of the code across a multi-week collaboration with 250-plus review comments. Publishing the directory is opt-in and Heretic prompts before uploading.
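The exact contents of the reproduce/ directory are defined by Heretic itself, but the underlying idea of capturing the environment surface is easy to sketch in stdlib Python. The field names and file layout below are illustrative, not Heretic's actual schema:

```python
import json
import platform
from importlib import metadata
from pathlib import Path

def capture_environment(out_dir: str = "reproduce") -> dict:
    """Snapshot the parts of the environment that can silently change
    tensor outputs between two otherwise identical runs."""
    env = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    # Record versions of accelerator-adjacent packages, if installed.
    # (Package list is illustrative; Heretic captures its own surface.)
    for pkg in ("torch", "transformers", "accelerate"):
        try:
            env[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            env[pkg] = None  # not present in this environment
    path = Path(out_dir)
    path.mkdir(exist_ok=True)
    (path / "environment.json").write_text(json.dumps(env, indent=2))
    return env

env = capture_environment()
```

The value of the real system is that this snapshot rides along inside the model repository itself, so a stranger never has to ask what your machine looked like.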

Integrated benchmarking. Heretic now runs MMLU, EQ-Bench, GSM8K, and HellaSwag directly, without exporting the model first or hand-configuring an external harness. The system sits on top of lm-evaluation-harness, which is the same scaffold academic LLM papers use. Numbers come out directly comparable to published figures.

Lower peak VRAM. Contributor magiccodingman traced where intermediate tensors were ballooning in GPU memory and trimmed the peak. Larger models now fit on the same card.

Broader model support. Contributors farolone and MoonRide303 generalized the layer and module handling so the latest architectures (Qwen3.5 and Gemma 4 are called out explicitly) flow through Heretic without bespoke patches.

The repository sits at 20,000 GitHub stars with 13 million cumulative model downloads, per the maintainer's own count, which excludes "a certain competitor" found to have been running a plagiarized fork. p-e-w does not name the competitor. Anyone who has been watching this corner of the scene knows who that is.

A character in a blue hat and fur-lined cloak identifies a likely heretic aboard a vessel, while a menacing demon-like creature with sharp teeth and red armor appears with a forged pet. Photo: preview.redd.it

Why It Matters

The interesting move here is not the benchmarking and not the VRAM win. Both should have landed in 1.0; they are good housekeeping that any serious tool acquires by version three. The interesting move is reproducibility, and it is interesting because of who it threatens.

Model decensoring has been having a moment. Forks and clones have multiplied. Some of them, in the maintainer's own framing, are wrapping the same underlying technique in mystique, technical jargon, and tens of thousands of lines of LLM-written junk code. Anyone who has tried to evaluate one of these forks knows the routine: the README promises a novel approach, the code does not match the promise, the published numbers cannot be replicated locally, and the gap between claim and reality is filled with vibes. The cloning is bad. The opacity around the cloning is worse, because it makes the field illegible to anyone outside the maintainers' immediate circle.

A reproduce/ directory shipped alongside every public model collapses that opacity. If your fork claims a 2-point MMLU advantage over upstream Heretic, somebody can now rebuild your model on their own GPU and check. If the rebuild does not produce your numbers, you have a problem you cannot hide behind jargon. The cost of mystique-as-strategy just went up.

The other read is that 20,000 stars and 13M downloads is no longer hobbyist scale; it is platform-shaped traffic. Tooling that crosses into platform territory tends to develop two failure modes: it gets forked aggressively, and it gets criticized for opacity. p-e-w has chosen to address the second by hardening the project's auditability rather than the first by tightening the license. That is the right call. Closing the license invites adversarial behavior; opening the audit trail makes adversarial behavior pointless.

The third read is about the next year of model architectures. That Qwen3.5 and Gemma 4 support shipped in the same release tells you the maintainers expect to keep up with frontier open-weight families as they land. If you build an internal pipeline on Heretic, you are betting on that cadence. The bet looks defensible.

What to Try

If you publish abliterated models, or you evaluate ones other people have published, here is the migration list.

Bump to 1.3 before your next publish. Older versions still work, but anything you publish without a reproduce/ directory now reads as either lazy or evasive. The community baseline just moved. Match it.

Turn on reproducibility on your next model upload. It is opt-in and Heretic prompts you, so nothing breaks if you skip it. Skipping it on a public model, though, signals that you do not want your numbers checked. That is not a signal you want to send when you are competing for stars and downloads against a maintainer who has just made auditability the default.

Re-run your existing benchmarks through the integrated harness. If you previously evaluated a published model with an external harness or a one-off script, the lm-evaluation-harness numbers from inside Heretic will not necessarily match. Standardize on the integrated path, then republish your scores with the methodology note. Numbers that disagree with your earlier scores are interesting; they tell you whether your old harness was leaky.
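That comparison is worth automating rather than eyeballing. A small sketch, assuming you keep both sets of scores as plain dicts; the task names, scores, and tolerance below are made up for illustration:

```python
def score_drift(old: dict, new: dict, tolerance: float = 0.5) -> dict:
    """Per-task deltas (new minus old), keeping only tasks whose
    scores moved by more than `tolerance` points."""
    drift = {}
    for task in old.keys() & new.keys():
        delta = new[task] - old[task]
        if abs(delta) > tolerance:
            drift[task] = round(delta, 2)
    return drift

# Hypothetical numbers: old external-harness scores vs. a re-run
# through the integrated lm-evaluation-harness path.
old_scores = {"mmlu": 61.2, "gsm8k": 54.0, "hellaswag": 79.1}
new_scores = {"mmlu": 61.4, "gsm8k": 52.1, "hellaswag": 79.0}
flagged = score_drift(old_scores, new_scores)
```

Any task that lands in the flagged dict is where your old harness and the standard one disagreed enough to be worth a methodology note.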

Stress-test the VRAM headroom. The peak-VRAM optimizations let larger models fit on the same card, but the gain depends on the architecture and the run shape. Profile a representative run before assuming you can move from a 70B-class model to something bigger.

Validate Qwen3.5 and Gemma 4 paths early. Generic layer handling is a strong claim. New architectures often surface edge cases that only show up in the third or fourth model the abstraction is tested against. Run a small abliteration on each, compare against a known-good upstream output, and file an issue fast if anything drifts.
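The known-good comparison can be as simple as checking a handful of sampled output values against an upstream run, with a small numeric tolerance for cross-hardware float noise. The values and tolerances here are invented for illustration:

```python
import math

def outputs_match(reference: list, candidate: list,
                  rel_tol: float = 1e-4) -> bool:
    """True if two equal-length sequences of floats agree within a
    relative tolerance; runs on different GPUs rarely match exactly."""
    if len(reference) != len(candidate):
        return False
    return all(math.isclose(r, c, rel_tol=rel_tol, abs_tol=1e-6)
               for r, c in zip(reference, candidate))

# Hypothetical sampled values from a known-good upstream abliteration
# vs. the same positions in your local run on a new architecture.
upstream = [0.1312, -2.4471, 0.0003]
local = [0.1312, -2.4470, 0.0003]
```

If the comparison fails on a new architecture but passes on an established one, that is exactly the kind of drift worth filing fast.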

Pin tight in production pipelines. If you run Heretic inside a CI pipeline that produces internal models, version-pin to 1.3 explicitly. The reproducibility metadata is sensitive to the surrounding environment by design, so a silent bump to a future minor release could change the published reproduce/ payload in ways downstream consumers will notice.
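In a requirements file that feeds CI, the pin is one line. The package name below is an assumption (use whatever distribution name you actually install from), and 1.3.0 stands in for whichever exact 1.3.x you validated:

```
# Pin the exact release: reproduce/ metadata is environment-sensitive
# by design, so a silent minor bump can change the published payload.
heretic==1.3.0
```

An exact pin, not a `~=` range, is the point here: a compatible-release specifier would happily pull a future 1.4 into the pipeline unannounced.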

The cost of mystique-as-strategy just went up. Reproducibility shipped as a default makes the gap between claim and reality somebody else's problem.

A Blood Angels character named Raposa fights heretics in a dark, gothic environment with blood splattered on the floor, while another character named Heinrich is nearby. Photo: i.redd.it

One more thing: the announcement teases a future feature that builds on the reproducibility system, unannounced as of this writing. Whatever it is, it depends on reproduce/ directories existing in the wild. The earlier you start publishing them, the better-positioned you are when the next feature lands. There is no downside to opting in now; the downside arrives if you wait and discover the next-version tooling assumes the metadata is already there.

Looking Ahead

Watch what happens to the named-but-unnamed cloner over the next two release cycles. Either they ship their own reproducibility story and get back to honest competition, or they go quiet. Either outcome is informative. And watch the unannounced follow-up feature p-e-w hinted at; anything built on top of reproduce/ will reshape what counts as a credible decensored-model release. The signal to track is how many top-100 abliterated models on Hugging Face carry a reproduce/ directory ninety days from now. If the number is high, the discipline change worked.
