Open-source · 11 tools
Buildable infrastructure for CSAM defense.
Open-source building blocks for CSAM detection, blocking, and prevention. The companion portfolio to Chapter 06: Technology Solutions — designed so any platform can wire up the same protective infrastructure as a Discord or Roblox without weeks of per-provider integration. New to this? Start with the For Developers guide.
Status reflects current readiness. Repository: digitalharm/fight-csam (private during initial bring-up; flipping public after the safety guard CI is battle-tested on real PR traffic).
For developers
Build with these tools at FightCSAM.
FightCSAM is the developer home for this toolkit — quickstarts and API docs for every tool, a guided golden path to a compliant pipeline, and an analyzed directory of the wider open-source safety-tools ecosystem.
How this fits
The buildable layer.
These tools sit one layer below the existing detection infrastructure (PhotoDNA, NCMEC, IWF, Project Arachnid, Cloudflare CSAM Scanning, Thorn Safer, Hive AI). They do not replace it — they make it cheaper to consume. For engineers wiring it up, see For Developers. For platform leaders deciding what to build, see For Tech CEOs. For compliance and audit context, see For Compliance Teams.
Roadmap and status
Current state: 11 OSS projects in progress.
The full roadmap documents each tool's current state, next milestone, acceptance criteria for status promotion, and the cross-tool dependency map. Status badges below pull from that source of truth.
The foundation: byte-identical hashing across runtimes
PDQ and TMK+PDQF are the open-source perceptual hashing algorithms that the NCMEC Hash Sharing API accepts as supported fingerprint types. The drift problem — different language ports producing different hashes for the same image — is the silent-false-negative failure mode the conformance suite exists to kill.
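A conformance check of this kind can be sketched in a few lines. The Python below is an illustration only, not HashKit's API: `check_conformance`, `Drift`, the fixture ID, and the two stand-in ports are hypothetical names, and the hex strings are placeholders rather than real test vectors. It compares each port's output with the expected hash and reports any difference as a bit count, since a single flipped bit already counts as drift.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count differing bits between two equal-length hex-encoded hashes."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


@dataclass(frozen=True)
class Drift:
    fixture_id: str
    port: str
    bits_off: int


def check_conformance(
    expected: Dict[str, str],
    ports: Dict[str, Callable[[str], str]],
) -> List[Drift]:
    """Return every (fixture, port) pair whose hash is not bit-identical."""
    drifts = []
    for fixture_id, expected_hash in expected.items():
        for port_name, hash_fixture in ports.items():
            bits_off = hamming_distance(hash_fixture(fixture_id), expected_hash)
            if bits_off != 0:
                drifts.append(Drift(fixture_id, port_name, bits_off))
    return drifts


# Placeholder vector: a 256-bit hash written as 64 hex characters.
expected = {"fixture-001": "f" * 64}

# Stand-in ports. Real ones would call each runtime's binding on the fixture.
ports = {
    "port_a": lambda _fixture_id: "f" * 64,        # bit-identical
    "port_b": lambda _fixture_id: "f" * 63 + "e",  # lowest bit flipped
}

drifts = check_conformance(expected, ports)
```

In CI, a non-empty `drifts` list would fail the build, which turns a silent false negative into a visible test failure.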
HashKit
In progress · Wave 1: Foundation
One WebAssembly core for PDQ and TMK+PDQF, with NCMEC-verified test vectors so every language produces the same hash.
npm install @digitalharm/hashkit
Rust · WASM · Node · Deno · Bun · Python
hashkit-match
In progress · Ships alongside HashKit
In-memory multi-index Hamming matcher over caller-supplied hash sets. Ships no hash lists.
cargo add hashkit-match
Rust · WASM
DetectKit-Test
In progress · Wave 1: Foundation
Synthetic non-CSAM test fixtures with engineered hash properties — verify your detection pipeline in CI without touching real CSAM.
pip install detectkit-test
TypeScript · Python
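To make the hashkit-match description above concrete, here is a simplified multi-index sketch in Python. It is not the hashkit-match implementation or its API: `BandedHammingIndex` and its methods are hypothetical names, and the hashes are placeholders. It assumes 256-bit hashes in hex and uses the pigeonhole argument behind multi-index hashing: if two hashes differ in at most r bits and the hash is split into more than r bands, at least one band must be identical, so exact lookups per band find every candidate. The index starts empty and holds only what the caller adds.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

HASH_BITS = 256
HEX_CHARS = HASH_BITS // 4


def _parse(hex_hash: str) -> int:
    if len(hex_hash) != HEX_CHARS:
        raise ValueError(f"expected {HEX_CHARS} hex characters")
    return int(hex_hash, 16)


class BandedHammingIndex:
    """Exact-band multi-index over caller-supplied 256-bit hashes."""

    def __init__(self, bands: int = 32) -> None:
        if HASH_BITS % bands != 0:
            raise ValueError("bands must divide the hash width evenly")
        self.bands = bands
        self.band_bits = HASH_BITS // bands
        self._mask = (1 << self.band_bits) - 1
        # One table per band position: band value -> hashes containing it.
        self._tables: List[Dict[int, Set[int]]] = [
            defaultdict(set) for _ in range(bands)
        ]

    def _split(self, value: int) -> List[int]:
        return [
            (value >> (i * self.band_bits)) & self._mask
            for i in range(self.bands)
        ]

    def add(self, hex_hash: str) -> None:
        value = _parse(hex_hash)
        for table, band in zip(self._tables, self._split(value)):
            table[band].add(value)

    def query(self, hex_hash: str, max_distance: int) -> List[Tuple[str, int]]:
        """Return (hash, distance) pairs within max_distance, nearest first."""
        if max_distance >= self.bands:
            # The pigeonhole guarantee only holds when bands > max_distance.
            raise ValueError("max_distance must be smaller than the band count")
        value = _parse(hex_hash)
        candidates: Set[int] = set()
        for table, band in zip(self._tables, self._split(value)):
            candidates |= table.get(band, set())
        hits = []
        for candidate in candidates:
            distance = bin(value ^ candidate).count("1")
            if distance <= max_distance:
                hits.append((f"{candidate:0{HEX_CHARS}x}", distance))
        return sorted(hits, key=lambda hit: hit[1])


index = BandedHammingIndex(bands=32)
index.add("0" * 64)  # placeholder entries, not real list data
index.add("f" * 64)
hits = index.query("0" * 63 + "3", max_distance=31)
```

With 32 bands of 8 bits each, the sketch supports query radii up to 31. Full multi-index hashing also probes near-miss band values to allow larger radii with fewer bands; that refinement is omitted here.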
Drop-in middleware and prompt-side defense
The integration layer compresses week-long PhotoDNA onboarding into an afternoon and adds defense in depth at the AI-generation prompt before any compute is spent. Designed so any platform can wire up the same protection as a Discord or Roblox.
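The idea of one interface in front of several providers, checked before any expensive work runs, can be sketched as follows. This is a hypothetical shape in Python, not the CSAM-Shield or PromptShield API: `Scanner`, `CallableScanner`, `Verdict`, `guard`, and the provider names are invented for illustration, and the stand-in checks do no real detection.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass(frozen=True)
class Verdict:
    provider: str
    matched: bool


class Scanner(Protocol):
    """What every provider adapter has to offer, whatever sits behind it."""

    name: str

    def scan(self, payload: bytes) -> bool: ...


class CallableScanner:
    """Adapts any bytes -> bool check to the Scanner interface."""

    def __init__(self, name: str, check: Callable[[bytes], bool]) -> None:
        self.name = name
        self._check = check

    def scan(self, payload: bytes) -> bool:
        return self._check(payload)


def scan_all(payload: bytes, scanners: List[Scanner]) -> List[Verdict]:
    """Ask every configured provider and keep one verdict per provider."""
    return [Verdict(s.name, s.scan(payload)) for s in scanners]


def guard(
    payload: bytes,
    scanners: List[Scanner],
    handler: Callable[[bytes], str],
) -> str:
    """Run the scanners first; call the expensive handler only if none match."""
    verdicts = scan_all(payload, scanners)
    if any(v.matched for v in verdicts):
        return "held_for_review"
    return handler(payload)


# Stand-in providers keyed on a synthetic marker, not real detection logic.
scanners: List[Scanner] = [
    CallableScanner("provider_a", lambda p: p == b"synthetic-match"),
    CallableScanner("provider_b", lambda p: False),
]

clean = guard(b"holiday-photo", scanners, lambda p: "stored")
flagged = guard(b"synthetic-match", scanners, lambda p: "stored")
```

The same gate applies to a prompt check: the payload is the prompt text and the handler is the generation call, so a match stops the request before any compute is spent.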
CSAM-Shield
In progress · Wave 2: Drop-in adoption
One-line middleware for Express/Fastify/FastAPI/Hono that wires PhotoDNA, PDQ, NCMEC API, and Cloudflare CSAM Scanning behind a unified interface.
npm install @digitalharm/csam-shield
TypeScript · Python
PromptShield
In progress · Wave 2: Drop-in adoption
Lightweight classifier middleware for Stable Diffusion / FLUX / ComfyUI / vLLM that detects CSAM intent at the prompt, before compute is spent.
pip install promptshield
Python
C2PA-Lite
In progress · Wave 5: re-promoted from Deferred (manifest layer scaffolded; watermark layer awaits research stabilization)
Pragmatic C2PA content credentials for generators that don't yet have provenance signaling.
Rust
Hash-list sync, audit, and pre-training screening
These two tools sit on the credentialed layer — NCMEC, IWF, and Project Arachnid relationships are required to operate them in production. They are the natural home for the grant-funded credential-brokering work and ship after the foundation establishes the maintainer's standing.
HashStream
In progress · Wave 3: Credentialed infrastructure
Version control and an audit trail for the CSAM hash lists you're legally on the hook for.
go install github.com/digitalharm/fight-csam/packages/hashstream/cmd/hashstreamd@latest
Go · TypeScript
TrainGuard
In progress · Wave 3: Credentialed infrastructure
Pre-flight screen for AI image/video training datasets against national hash lists. Generates compliance reports with chain-of-custody.
pip install trainguard
Python
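Version control plus an audit trail for a hash list, as HashStream is described above, can be pictured as two pieces: a digest that identifies each list version, and an append-only log in which every entry commits to the one before it. The Python below is a minimal sketch under those assumptions, not HashStream's design. `snapshot_digest`, `AuditEntry`, `append_entry`, and `verify_log` are hypothetical names, the list entries are placeholders, and timestamps and signatures are left out to keep the example deterministic.

```python
import hashlib
import json
from dataclasses import dataclass, replace
from typing import Iterable, List

GENESIS = "0" * 64  # stands in for "no previous entry"


def snapshot_digest(hashes: Iterable[str]) -> str:
    """Order-independent SHA-256 digest identifying one list version."""
    canonical = "\n".join(sorted({h.lower() for h in hashes}))
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()


@dataclass(frozen=True)
class AuditEntry:
    sequence: int
    source: str
    list_digest: str
    previous_entry_digest: str
    entry_digest: str


def _entry_digest(sequence: int, source: str, list_digest: str, previous: str) -> str:
    body = json.dumps([sequence, source, list_digest, previous], separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log: List[AuditEntry], source: str, hashes: Iterable[str]) -> AuditEntry:
    """Record a new list version, chained to the previous log entry."""
    previous = log[-1].entry_digest if log else GENESIS
    sequence = len(log)
    list_digest = snapshot_digest(hashes)
    entry = AuditEntry(
        sequence, source, list_digest, previous,
        _entry_digest(sequence, source, list_digest, previous),
    )
    log.append(entry)
    return entry


def verify_log(log: List[AuditEntry]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    previous = GENESIS
    for position, entry in enumerate(log):
        if entry.sequence != position or entry.previous_entry_digest != previous:
            return False
        recomputed = _entry_digest(
            entry.sequence, entry.source, entry.list_digest, entry.previous_entry_digest
        )
        if entry.entry_digest != recomputed:
            return False
        previous = entry.entry_digest
    return True


log: List[AuditEntry] = []
append_entry(log, "list-a", ["aa" * 32, "bb" * 32])             # placeholder entries
append_entry(log, "list-a", ["aa" * 32, "bb" * 32, "cc" * 32])  # next version
intact = verify_log(log)

log[0] = replace(log[0], source="edited")  # simulate after-the-fact tampering
still_valid = verify_log(log)
```

Because the log stores only digests, it can show which list version was in force at a given step without holding the list itself.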
CyberTipline filing, evidence retention, moderator wellbeing
The legal endgame: filing statutory reports under 18 U.S.C. § 2258A, retaining evidence with proper chain of custody, and operationalizing moderator wellbeing as compliance. These ship last because they carry direct federal blast radius and require outside counsel on retainer.
CyberTip CLI
In progress · Wave 4: Legal endgame
NCMEC CyberTipline report submission with proper formatting, retry logic, evidence packaging, audit logging.
npm install -g @digitalharm/cybertip-cli
TypeScript · Python
EvidenceVault
In progress · Wave 4: Legal endgame
Defensible records-retention with chain-of-custody metadata, preservation timers matching LE requests, jurisdiction-aware schedules.
docker pull ghcr.io/digitalharm/evidencevault
Go
SafeMod
In progress · Wave 5: re-promoted from Deferred (privacy-by-construction: zero deps, no identifiers, k-anonymous aggregates)
Moderator-wellness layer: blur-by-default media, hard exposure caps + mandatory breaks, and aggregate-only wellbeing signals — stores no personal or special-category data.
Rust
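SafeMod's aggregate-only signals can be illustrated with small-cell suppression, one common way to keep published counts k-anonymous. The Python below is a sketch under that assumption, not SafeMod's code: `aggregate_signals`, the bucket labels, and the default threshold of 5 are hypothetical. Inputs carry no identifiers, only a coarse bucket label per signal, and any bucket with fewer than k signals is dropped from the output.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_signals(buckets: Iterable[str], k: int = 5) -> Dict[str, int]:
    """Count signals per bucket and publish only buckets with at least k."""
    if k < 2:
        raise ValueError("k must be at least 2 to hide individuals")
    counts = Counter(buckets)
    return {bucket: n for bucket, n in counts.items() if n >= k}


# Eight anonymous check-in signals: six in one bucket, two in another.
signals = ["steady"] * 6 + ["strained"] * 2
published = aggregate_signals(signals, k=5)
```

Here the two-signal bucket is suppressed, so nobody reading the aggregate can narrow a result down to one or two moderators.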
Scope & safety
What this portfolio is not.
- None of these tools ship a CSAM hash list. National hash data lives at NCMEC, IWF, and Project Arachnid under specific legal frameworks. It stays there. The portfolio implements the algorithms and clients that consume those lists, never the lists themselves.
- None of these tools handle real CSAM imagery. Tests use synthetic non-CSAM fixtures from detectkit-test with engineered hash properties.
- These tools do not provide legal compliance by default. Adoption does not satisfy 18 U.S.C. § 2258A, DSA Article 16, the UK Online Safety Act, or any other regulatory regime. Counsel remains required.
- Detection-assist, not a guarantee. Hash-matching catches known material; AI classification catches novel material with false positives. Human review remains essential.
See the full safety policy for the threat model and the CI guard that enforces these rules.
Roadmap: the build sequence
Tools ship in dependency order across five waves — the foundation first, because everything downstream hashes through it. Within each wave, a tool climbs a maturity ladder from Planned to Stable.
Maturity ladder
- 1. Planned
Designed; nothing beyond a README and a status line.
- 2. In progress
Scaffolded and compiling; core functionality landing.
- 3. Alpha
Real code, usable by early adopters who accept breaking changes.
- 4. Beta
API stable; hardening for production.
- 5. Stable
Production-ready. Semver applies.
A sixth status, Deferred, marks work intentionally postponed or spun out — not on the active roadmap.
Build waves
Wave 1
Foundation
Byte-identical perceptual hashing and synthetic conformance fixtures. Everything downstream matches against these hashes, so the foundation ships first.
HashKit: In progress · hashkit-match: In progress · DetectKit-Test: In progress
Wave 2
Drop-in adoption
One-line detection middleware and prompt-side prevention a small team can wire up in an afternoon — no enterprise contract required.
CSAM-Shield: In progress · PromptShield: In progress
Wave 3
Credentialed infrastructure
Hash-list version control and pre-training dataset screening. Operating these in production is gated on NCMEC / IWF / Project Arachnid relationships.
HashStream: In progress · TrainGuard: In progress
Wave 4
Legal endgame
Statutory CyberTipline reporting and defensible evidence retention. Production submission paths stay blocked until outside counsel signs off.
CyberTip CLI: In progress · EvidenceVault: In progress
Wave 5
Provenance & satellites
C2PA content credentials for AI generators, re-promoted as a prevention primitive. SafeMod's moderator-wellbeing layer, also re-promoted from Deferred — rebuilt privacy-by-construction so it stores no personal or special-category data.
C2PA-Lite: In progress · SafeMod: In progress
Funding and sponsorship
The portfolio is self-funded by anonymous individuals and organizations who want this infrastructure to exist — there is no per-seat licensing, and no platform pays for its own copy. The alignment matters: this is plumbing every platform needs and rebuilds poorly, so it is built as a public good and given away, with the goal of the widest possible adoption rather than revenue.
If you would like to help sustain or expand the work, sponsorship is welcome. See the sponsorship document for ways to contribute, or email sponsor@digitalharm.org.
Notes
- The portfolio is intentionally fixed at 11 tools. New packages need a design conversation before opening a PR. See CONTRIBUTING.
- The build order matters: foundation first (no credentials needed), drop-in adoption second, credentialed infrastructure third, legal endgame last. See the sequencing document for the rationale.
- Both originally deferred tools have since been re-promoted and built. SafeMod was rebuilt privacy-by-construction (zero dependencies, no identifiers stored, aggregate-only k-anonymous wellbeing signals), which removes the GDPR special-category-data concern that caused its deferral. C2PA-Lite ships its manifest layer, with real signing behind an upstream feature flag pending the c2pa-rs dependency decision. The portfolio now has no indefinitely deferred tools.