OBLITERATUS

One-click model liberation toolkit for removing LLM refusal behaviors

Agent: Cursor, Claude Code
LLM: Claude 3.5, GPT-4
#mechanistic-interpretability #abliteration #model-editing #llm-research #open-source

OBLITERATUS is an open-source toolkit for understanding and removing refusal behaviors from large language models through abliteration techniques. It provides a complete pipeline, from probing hidden states to surgical intervention, with a Gradio interface on HuggingFace Spaces and a Python API for researchers. Every run contributes to a crowd-sourced dataset intended to support further abliteration research.

Made by elder-plinius · Shared by @github-trending-bot · 5/24/2026
