Open-source · 11 tools

Buildable infrastructure for CSAM defense.

Open-source building blocks for CSAM detection, blocking, and prevention. The companion portfolio to Chapter 06: Technology Solutions — designed so any platform can wire up the same protective infrastructure as a Discord or Roblox without weeks of per-provider integration. New to this? Start with the For Developers guide.

Status reflects current readiness. Repository: digitalharm/fight-csam (private during initial bring-up; flipping public after the safety guard CI is battle-tested on real PR traffic).

For developers

Build with these tools at FightCSAM.

FightCSAM is the developer home for this toolkit — quickstarts and API docs for every tool, a guided golden path to a compliant pipeline, and an analyzed directory of the wider open-source safety-tools ecosystem.

Visit FightCSAM ↗

How this fits

The buildable layer.

These tools sit one layer below the existing detection infrastructure (PhotoDNA, NCMEC, IWF, Project Arachnid, Cloudflare CSAM Scanning, Thorn Safer, Hive AI). They do not replace it — they make it cheaper to consume. For engineers wiring it up, see For Developers. For platform leaders deciding what to build, see For Tech CEOs. For compliance and audit context, see For Compliance Teams.

Roadmap and status

Current state: 11 OSS projects in progress.

The full roadmap documents each tool's current state, next milestone, acceptance criteria for status promotion, and the cross-tool dependency map. Status badges below pull from that source of truth.

See the roadmap ↗

Hashing & conformance

The foundation: byte-identical hashing across runtimes

PDQ and TMK+PDQF are the open-source perceptual hashing algorithms that the NCMEC Hash Sharing API accepts as supported fingerprint types. The drift problem — different language ports producing different hashes for the same image — is the silent-false-negative failure mode the conformance suite exists to kill.
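
The drift check itself is simple to express. The sketch below is illustrative only and is not HashKit's API: `check_conformance`, `reference_hash`, `drifting_port`, and the `(image_bytes, expected_hex)` vector shape are all assumptions, and SHA-256 is used purely as a stand-in for a real PDQ implementation so the example runs without image dependencies. What it shows is that conformance means a Hamming distance of exactly zero against the reference vector, not "close enough".

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count differing bits between two equal-length hex-encoded hashes."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def check_conformance(
    hash_fn: Callable[[bytes], str],
    vectors: List[Tuple[bytes, str]],
) -> List[Dict[str, object]]:
    """Return one failure record for every vector that is not bit-identical."""
    failures: List[Dict[str, object]] = []
    for index, (image_bytes, expected_hex) in enumerate(vectors):
        actual_hex = hash_fn(image_bytes)
        drift_bits = hamming_distance(actual_hex, expected_hex)
        if drift_bits != 0:  # any drift at all is a conformance failure
            failures.append(
                {"vector": index, "expected": expected_hex,
                 "actual": actual_hex, "drift_bits": drift_bits}
            )
    return failures


def reference_hash(data: bytes) -> str:
    """Hypothetical stand-in for a reference hasher. SHA-256 is NOT perceptual;
    it only supplies a deterministic 256-bit value so the sketch runs anywhere."""
    return hashlib.sha256(data).hexdigest()


def drifting_port(data: bytes) -> str:
    """Simulated buggy port: identical to the reference except for one bit."""
    return f"{int(reference_hash(data), 16) ^ 1:064x}"


# Hypothetical vectors; a real suite would load verified image/hash pairs.
vectors = [(sample, reference_hash(sample)) for sample in (b"fixture-a", b"fixture-b")]
print(check_conformance(reference_hash, vectors))      # conforming port
print(len(check_conformance(drifting_port, vectors)))  # drifting port
```

A single flipped bit is enough to fail the check. That strictness is deliberate: a port that is "almost right" on test images can still fall outside a match threshold on real inputs.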

HashKit
In progress
Wave 1: Foundation
One WebAssembly core for PDQ and TMK+PDQF, with NCMEC-verified test vectors so every language produces the same hash.
```
npm install @digitalharm/hashkit
```
Rust · WASM · Node · Deno · Bun · Python
Docs ↗ · GitHub ↗ · Context in report →
hashkit-match
In progress
Ships alongside HashKit
In-memory multi-index Hamming matcher over caller-supplied hash sets. Ships no hash lists.
```
cargo add hashkit-match
```
Rust · WASM
Docs ↗ · GitHub ↗ · Context in report →
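
For readers unfamiliar with multi-index Hamming matching, here is a minimal Python sketch of the general technique. It is not hashkit-match's implementation or API; `MultiIndexMatcher` and its methods are hypothetical names. The idea rests on the pigeonhole principle: split each 256-bit hash into `max_distance + 1` disjoint bands, and any two hashes within `max_distance` bits of each other must agree exactly on at least one band. Exact-match lookups per band produce candidates, which are then verified with a full Hamming distance.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

HASH_BITS = 256  # PDQ hashes are 256 bits wide


class MultiIndexMatcher:
    """Hypothetical in-memory matcher over caller-supplied integer hashes."""

    def __init__(self, hashes: Iterable[int], max_distance: int) -> None:
        if not 0 <= max_distance < HASH_BITS:
            raise ValueError("max_distance must be in [0, HASH_BITS)")
        self.max_distance = max_distance
        bands = max_distance + 1
        # Band i covers bit positions [edges[i], edges[i + 1]).
        self._edges = [i * HASH_BITS // bands for i in range(bands + 1)]
        self._hashes: List[int] = list(hashes)
        self._tables: List[Dict[int, List[int]]] = [
            defaultdict(list) for _ in range(bands)
        ]
        for position, value in enumerate(self._hashes):
            for band, key in enumerate(self._band_keys(value)):
                self._tables[band][key].append(position)

    def _band_keys(self, value: int) -> List[int]:
        """Slice a hash into one integer key per band."""
        return [
            (value >> low) & ((1 << (high - low)) - 1)
            for low, high in zip(self._edges, self._edges[1:])
        ]

    def query(self, value: int) -> List[Tuple[int, int]]:
        """Return (stored_hash, distance) pairs within max_distance, nearest first."""
        candidates: Set[int] = set()
        for band, key in enumerate(self._band_keys(value)):
            candidates.update(self._tables[band].get(key, ()))
        matches = []
        for position in candidates:
            distance = bin(value ^ self._hashes[position]).count("1")
            if distance <= self.max_distance:  # verify every candidate in full
                matches.append((self._hashes[position], distance))
        return sorted(matches, key=lambda match: match[1])


# Toy hash set; real callers would supply hashes from their own credentialed source.
stored = [0, (1 << HASH_BITS) - 1, 0b111]
matcher = MultiIndexMatcher(stored, max_distance=3)
print(matcher.query(0b1))
```

The band lookup never misses a true match within the threshold, but it can return extra candidates, which is why the full distance check stays in `query`.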
DetectKit-Test
In progress
Wave 1: Foundation
Synthetic non-CSAM test fixtures with engineered hash properties — verify your detection pipeline in CI without touching real CSAM.
```
pip install detectkit-test
```
TypeScript · Python
Docs ↗ · GitHub ↗ · Context in report →
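
One way to read "engineered hash properties" is a fixture whose hash sits at a known Hamming distance from a target, so a CI test can probe both sides of a match threshold. The sketch below works on hash values only and is an assumption about the idea, not DetectKit-Test's API or fixture format: `hash_at_distance`, `is_match`, and the threshold of 31 bits are illustrative choices.

```python
import random

HASH_BITS = 256  # PDQ hashes are 256 bits wide


def hash_at_distance(base: int, distance: int, seed: int = 0) -> int:
    """Return a hash exactly `distance` bits away from `base` (deterministic per seed)."""
    if not 0 <= distance <= HASH_BITS:
        raise ValueError("distance must be in [0, HASH_BITS]")
    rng = random.Random(seed)
    result = base
    # Flip `distance` distinct bit positions so the distance is exact.
    for bit in rng.sample(range(HASH_BITS), distance):
        result ^= 1 << bit
    return result


def is_match(a: int, b: int, threshold: int) -> bool:
    """Hypothetical pipeline decision: match when within `threshold` bits."""
    return bin(a ^ b).count("1") <= threshold


THRESHOLD = 31  # example value only; use whatever your pipeline is configured with
base = random.Random(42).getrandbits(HASH_BITS)

just_inside = hash_at_distance(base, THRESHOLD)
just_outside = hash_at_distance(base, THRESHOLD + 1)
print(is_match(base, just_inside, THRESHOLD), is_match(base, just_outside, THRESHOLD))
```

Boundary fixtures like these catch off-by-one threshold bugs (`<` versus `<=`) that random test data rarely exercises.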

Integration & prevention

Drop-in middleware and prompt-side defense

The integration layer compresses week-long PhotoDNA onboarding into an afternoon and adds defense in depth at the AI-generation prompt before any compute is spent. Designed so any platform can wire up the same protection as a Discord or Roblox.

CSAM-Shield
In progress
Wave 2: Drop-in adoption
One-line middleware for Express/Fastify/FastAPI/Hono that wires PhotoDNA, PDQ, NCMEC API, and Cloudflare CSAM Scanning behind a unified interface.
```
npm install @digitalharm/csam-shield
```
TypeScript · Python
Docs ↗ · GitHub ↗ · Context in report →
PromptShield
In progress
Wave 2: Drop-in adoption
Lightweight classifier middleware for Stable Diffusion / FLUX / ComfyUI / vLLM that detects CSAM intent at the prompt, before compute is spent.
```
pip install promptshield
```
Python
Docs ↗ · GitHub ↗ · Context in report →
C2PA-Lite
In progress
Wave 5: re-promoted from Deferred — manifest layer scaffolded; watermark layer awaits research stabilization
Pragmatic C2PA content credentials for generators that don't yet have provenance signaling.
Rust
Docs ↗ · GitHub ↗ · Context in report →

List infrastructure

Hash-list sync, audit, and pre-training screening

These two tools sit on the credentialed layer — NCMEC, IWF, and Project Arachnid relationships are required to operate them in production. They are the natural home for the grant-funded credential-brokering work and ship after the foundation establishes the maintainer's standing.

HashStream
In progress
Wave 3: Credentialed infrastructure
Version control and an audit trail for the CSAM hash lists you're legally on the hook for.
```
go install github.com/digitalharm/fight-csam/packages/hashstream/cmd/hashstreamd@latest
```
Go · TypeScript
Docs ↗ · GitHub ↗ · Context in report →
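
An audit trail for hash-list versions can be made tamper-evident by chaining entries, so that each record commits to the one before it. The following is a minimal sketch of that general pattern, assuming an append-only in-memory log; it is not HashStream's storage format, and `AuditEntry`, `append_entry`, and `verify_log` are hypothetical names. Only a digest of each list snapshot is recorded, never the list contents.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS = "0" * 64  # placeholder "previous digest" for the first entry


@dataclass(frozen=True)
class AuditEntry:
    sequence: int
    list_name: str
    list_version: str
    list_digest: str   # SHA-256 of the hash-list snapshot, not the list itself
    previous: str      # entry_digest of the preceding entry
    entry_digest: str  # digest over all of the fields above


def _entry_digest(sequence: int, list_name: str, list_version: str,
                  list_digest: str, previous: str) -> str:
    payload = json.dumps(
        [sequence, list_name, list_version, list_digest, previous],
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: List[AuditEntry], list_name: str,
                 list_version: str, snapshot: bytes) -> AuditEntry:
    """Record that `list_version` of `list_name` was synced, chained to the log tail."""
    previous = log[-1].entry_digest if log else GENESIS
    list_digest = hashlib.sha256(snapshot).hexdigest()
    sequence = len(log)
    entry = AuditEntry(
        sequence, list_name, list_version, list_digest, previous,
        _entry_digest(sequence, list_name, list_version, list_digest, previous),
    )
    log.append(entry)
    return entry


def verify_log(log: List[AuditEntry]) -> bool:
    """Recompute the chain; any edited, reordered, or dropped entry breaks it."""
    previous = GENESIS
    for index, entry in enumerate(log):
        expected = _entry_digest(index, entry.list_name, entry.list_version,
                                 entry.list_digest, previous)
        if (entry.sequence != index or entry.previous != previous
                or entry.entry_digest != expected):
            return False
        previous = entry.entry_digest
    return True


log: List[AuditEntry] = []
append_entry(log, "example-list", "2024-01-01", b"placeholder snapshot bytes")
append_entry(log, "example-list", "2024-01-02", b"placeholder snapshot bytes v2")
print(verify_log(log))
```

A chain like this only detects tampering after the fact. An auditor would still need the latest digest anchored somewhere the log's owner cannot rewrite.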
TrainGuard
In progress
Wave 3: Credentialed infrastructure
Pre-flight screen for AI image/video training datasets against national hash lists. Generates compliance reports with chain-of-custody.
```
pip install trainguard
```
Python
Docs ↗ · GitHub ↗ · Context in report →

Legal & operations

CyberTipline filing, evidence retention, moderator wellbeing

The legal endgame: filing statutory reports under 18 U.S.C. § 2258A, retaining evidence with proper chain of custody, and operationalizing moderator wellbeing as compliance. These ship last because they carry direct federal blast radius and require outside counsel on retainer.

CyberTip CLI
In progress
Wave 4: Legal endgame
NCMEC CyberTipline report submission with proper formatting, retry logic, evidence packaging, audit logging.
```
npm install -g @digitalharm/cybertip-cli
```
TypeScript · Python
Docs ↗ · GitHub ↗ · Context in report →
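
Retry logic for a submission path is usually some form of capped exponential backoff with jitter. The sketch below shows that generic pattern only. It does not call or describe the CyberTipline API, and `with_retries`, `TransientError`, and the default delays are assumptions chosen for illustration; the caller supplies the actual operation.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx responses)."""


def with_retries(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Run `operation`, retrying on TransientError with capped, jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * ceiling)  # "full jitter": wait a random slice of the ceiling
    raise AssertionError("unreachable")


# Demo with a fake operation that succeeds on the third call; no real sleeping.
attempts: List[int] = []
recorded_delays: List[float] = []


def fake_submit() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("simulated timeout")
    return "accepted"


print(with_retries(fake_submit, sleep=recorded_delays.append, rng=lambda: 1.0))
```

Any retried submission needs to be safe to repeat, so that a retry after an ambiguous timeout cannot file a duplicate report. How that is achieved depends on the reporting API and is outside this sketch. Non-transient failures, such as validation errors, are deliberately not retried.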
EvidenceVault
In progress
Wave 4: Legal endgame
Defensible records-retention with chain-of-custody metadata, preservation timers matching LE requests, jurisdiction-aware schedules.
```
docker pull ghcr.io/digitalharm/evidencevault
```
Go
Docs ↗ · GitHub ↗ · Context in report →
SafeMod
In progress
Wave 5: re-promoted from Deferred — privacy-by-construction (zero deps, no identifiers, k-anonymous aggregates)
Moderator-wellness layer: blur-by-default media, hard exposure caps + mandatory breaks, and aggregate-only wellbeing signals — stores no personal or special-category data.
Rust
Docs ↗ · GitHub ↗ · Context in report →
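
To make "aggregate-only, k-anonymous" concrete, here is a minimal sketch of small-group suppression, written in Python for consistency with the other examples on this page (SafeMod itself is listed as Rust). It assumes signals arrive as `(cohort, signal)` pairs with no user identifier attached; `k_anonymous_counts` and that input shape are hypothetical, not SafeMod's API.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Signal = Tuple[str, str]  # (cohort, signal), deliberately with no user identifier


def k_anonymous_counts(
    signals: Iterable[Signal], k: int
) -> Dict[Signal, Optional[int]]:
    """Count signals per group, suppressing any group smaller than k.

    Suppressed groups are reported as None rather than dropped, so a reader
    can tell "too few to report" apart from "no data".
    """
    if k < 2:
        raise ValueError("k must be at least 2 to hide individuals")
    counts = Counter(signals)
    return {
        group: (count if count >= k else None)
        for group, count in counts.items()
    }


# Illustrative data: six signals from one cohort, two from another.
signals = [("team-a", "break_skipped")] * 6 + [("team-b", "break_skipped")] * 2
report = k_anonymous_counts(signals, k=5)
print(report)
```

Suppression alone does not rule out every re-identification route. For example, comparing overlapping reports over time can leak small counts, so treat this as the first layer of the idea rather than a complete privacy design.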

Scope & safety

What this portfolio is not.

None of these tools ship a CSAM hash list. National hash data lives at NCMEC, IWF, and Project Arachnid under specific legal frameworks. It stays there. The portfolio implements the algorithms and clients that consume those lists, never the lists themselves.
None of these tools handle real CSAM imagery. Tests use synthetic non-CSAM fixtures from detectkit-test with engineered hash properties.
These tools do not provide legal compliance by default. Adoption does not satisfy 18 U.S.C. § 2258A, DSA Article 16, the UK Online Safety Act, or any other regulatory regime. Counsel remains required.
Detection-assist, not a guarantee. Hash matching catches only known material; AI classification can flag novel material but also produces false positives. Human review remains essential.

See the full safety policy for the threat model and the CI guard that enforces these rules.

Roadmap: the build sequence

Tools ship in dependency order across five waves — the foundation first, because everything downstream hashes through it. Within each wave, a tool climbs a maturity ladder from Planned to Stable.

Maturity ladder

1. Planned: Designed; nothing beyond a README and a status line.
2. In progress: Scaffolded and compiling; core functionality landing.
3. Alpha: Real code, usable by early adopters who accept breaking changes.
4. Beta: API stable; hardening for production.
5. Stable: Production-ready. Semver applies.

A sixth status, Deferred, marks work intentionally postponed or spun out — not on the active roadmap.

Build waves

Wave 1
Foundation
Byte-identical perceptual hashing and synthetic conformance fixtures. Everything downstream matches against these hashes, so the foundation ships first.
HashKit (In progress) · hashkit-match (In progress) · DetectKit-Test (In progress)
Wave 2
Drop-in adoption
One-line detection middleware and prompt-side prevention a small team can wire up in an afternoon — no enterprise contract required.
CSAM-Shield (In progress) · PromptShield (In progress)
Wave 3
Credentialed infrastructure
Hash-list version control and pre-training dataset screening. Operating these in production is gated on NCMEC / IWF / Project Arachnid relationships.
HashStream (In progress) · TrainGuard (In progress)
Wave 4
Legal endgame
Statutory CyberTipline reporting and defensible evidence retention. Production submission paths stay blocked until outside counsel signs off.
CyberTip CLI (In progress) · EvidenceVault (In progress)
Wave 5
Provenance & satellites
C2PA content credentials for AI generators, re-promoted as a prevention primitive. SafeMod's moderator-wellbeing layer, also re-promoted from Deferred — rebuilt privacy-by-construction so it stores no personal or special-category data.
C2PA-Lite (In progress) · SafeMod (In progress)

Funding and sponsorship

The portfolio is self-funded by anonymous individuals and organizations who want this infrastructure to exist — there is no per-seat licensing, and no platform pays for its own copy. The alignment matters: this is plumbing every platform needs and rebuilds poorly, so it is built as a public good and given away, with the goal of the widest possible adoption rather than revenue.

If you would like to help sustain or expand the work, sponsorship is welcome. See the sponsorship document for ways to contribute, or email sponsor@digitalharm.org.

Notes

The portfolio is intentionally fixed at 11 tools. New packages need a design conversation before opening a PR. See CONTRIBUTING.
The build order matters: foundation first (no credentials needed), drop-in adoption second, credentialed infrastructure third, legal endgame last. See the sequencing document for the rationale.
Both originally-deferred tools have since been re-promoted and built: SafeMod was rebuilt privacy-by-construction (zero dependencies, no identifiers stored, aggregate-only k-anonymous wellbeing signals), which removes the GDPR special-category-data concern that caused its deferral; C2PA-Lite ships its manifest layer, with real signing behind an upstream feature flag pending the c2pa-rs dependency decision. The portfolio now has no indefinitely-deferred tools.
