The Digital Harm Project

Open-source · 11 tools

Buildable infrastructure for CSAM defense.

Open-source building blocks for CSAM detection, blocking, and prevention. The companion portfolio to Chapter 06: Technology Solutions — designed so any platform can wire up the same protective infrastructure as a Discord or Roblox without weeks of per-provider integration. New to this? Start with the For Developers guide.

Status reflects current readiness. Repository: digitalharm/fight-csam (private during initial bring-up; flipping public after the safety guard CI is battle-tested on real PR traffic).

For developers

Build with these tools at FightCSAM.

FightCSAM is the developer home for this toolkit — quickstarts and API docs for every tool, a guided golden path to a compliant pipeline, and an analyzed directory of the wider open-source safety-tools ecosystem.

How this fits

The buildable layer.

These tools sit one layer below the existing detection infrastructure (PhotoDNA, NCMEC, IWF, Project Arachnid, Cloudflare CSAM Scanning, Thorn Safer, Hive AI). They do not replace it — they make it cheaper to consume. For engineers wiring it up, see For Developers. For platform leaders deciding what to build, see For Tech CEOs. For compliance and audit context, see For Compliance Teams.

Roadmap and status

Current state: 11 OSS projects in progress.

The full roadmap documents each tool's current state, next milestone, acceptance criteria for status promotion, and the cross-tool dependency map. Status badges below pull from that source of truth.

Hashing & conformance

The foundation: byte-identical hashing across runtimes

PDQ and TMK+PDQF are the open-source perceptual hashing algorithms that the NCMEC Hash Sharing API accepts as supported fingerprint types. The drift problem — different language ports producing different hashes for the same image — is the silent-false-negative failure mode the conformance suite exists to kill.
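A conformance check for the drift problem can be sketched in a few lines. This is a minimal illustration, not the conformance suite itself: `find_drift`, the `HashFn` type, and the vector layout are hypothetical names introduced here, and the stand-in ports in the demo use SHA-256 only so the example runs without a perceptual-hash dependency.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a "port" is any callable mapping image bytes to a hex hash.
HashFn = Callable[[bytes], str]


def find_drift(
    vectors: Dict[str, Tuple[bytes, str]],
    ports: Dict[str, HashFn],
) -> List[Tuple[str, str, str, str]]:
    """Return (vector, port, expected, actual) for every mismatch.

    Comparison is exact equality on lowercase hex, so a single flipped
    bit in any port's output is reported as drift.
    """
    drift = []
    for vector_name, (image_bytes, expected) in sorted(vectors.items()):
        for port_name, hash_fn in sorted(ports.items()):
            actual = hash_fn(image_bytes).lower()
            if actual != expected.lower():
                drift.append((vector_name, port_name, expected.lower(), actual))
    return drift


def reference_port(data: bytes) -> str:
    # Stand-in for a correct port. NOT a perceptual hash.
    return hashlib.sha256(data).hexdigest()


def drifted_port(data: bytes) -> str:
    # Stand-in for a port with a subtle implementation difference.
    return hashlib.sha256(data + b"\x00").hexdigest()


if __name__ == "__main__":
    fixture = b"synthetic fixture bytes"
    vectors = {"fixture-a": (fixture, reference_port(fixture))}
    ports = {"port-ok": reference_port, "port-drifted": drifted_port}
    for vector_name, port_name, _, _ in find_drift(vectors, ports):
        print(f"{port_name} drifted on {vector_name}")
```

Exact equality is the point of the design: a near-match between ports would still produce different lookups downstream, so the check treats any difference as a failure.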

  • HashKit

    In progress

    Wave 1: Foundation

    One WebAssembly core for PDQ and TMK+PDQF, with NCMEC-verified test vectors so every language produces the same hash.

    npm install @digitalharm/hashkit
    Rust · WASM · Node · Deno · Bun · Python
  • hashkit-match

    In progress

    Ships alongside HashKit

    In-memory multi-index Hamming matcher over caller-supplied hash sets. Ships no hash lists.

    cargo add hashkit-match
    Rust · WASM
  • DetectKit-Test

    In progress

    Wave 1: Foundation

    Synthetic non-CSAM test fixtures with engineered hash properties — verify your detection pipeline in CI without touching real CSAM.

    pip install detectkit-test
    TypeScript · Python
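To make "multi-index Hamming matcher" concrete, here is a minimal sketch of the general technique, assuming 256-bit hashes (the PDQ hash size) and a match threshold of 31 bits. The threshold, the class name `MultiIndexMatcher`, and its methods are assumptions for illustration and are not the hashkit-match API. The sketch relies on a pigeonhole argument: if two 32-byte hashes differ in at most 31 bits, at least one of the 32 byte positions must be identical, so indexing by (position, byte value) finds every possible match without scanning the whole set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

HASH_BYTES = 32    # 256-bit hashes
MAX_DISTANCE = 31  # assumed threshold; must stay below 32 for the pigeonhole step


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


class MultiIndexMatcher:
    """In-memory matcher over caller-supplied hashes. Ships no hash lists."""

    def __init__(self, hashes: Iterable[bytes]) -> None:
        self._hashes: List[bytes] = []
        # (byte position, byte value) -> slots of stored hashes with that byte
        self._index: Dict[Tuple[int, int], List[int]] = defaultdict(list)
        for h in hashes:
            if len(h) != HASH_BYTES:
                raise ValueError("expected a 32-byte hash")
            slot = len(self._hashes)
            self._hashes.append(h)
            for pos, byte in enumerate(h):
                self._index[(pos, byte)].append(slot)

    def query(self, h: bytes) -> List[Tuple[bytes, int]]:
        """Return (stored_hash, distance) pairs within MAX_DISTANCE, nearest first."""
        if len(h) != HASH_BYTES:
            raise ValueError("expected a 32-byte hash")
        # Candidate step: any stored hash sharing at least one exact byte.
        candidates: Set[int] = set()
        for pos, byte in enumerate(h):
            candidates.update(self._index.get((pos, byte), ()))
        # Verify step: full Hamming distance on candidates only.
        scored = [(self._hashes[i], hamming(h, self._hashes[i])) for i in candidates]
        return sorted((s for s in scored if s[1] <= MAX_DISTANCE), key=lambda s: s[1])
```

A caller would build the matcher once from its own hash set and call `query` per upload. The verify step is what guarantees correctness; the index only narrows which stored hashes need the full comparison.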
Integration & prevention

Drop-in middleware and prompt-side defense

The integration layer compresses week-long PhotoDNA onboarding into an afternoon and adds defense in depth at the AI-generation prompt before any compute is spent. Designed so any platform can wire up the same protection as a Discord or Roblox.
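One way to picture a unified interface over several detection backends is a shared protocol that every provider adapter implements. The sketch below is an assumption-laden illustration: `Scanner`, `Verdict`, `StubScanner`, `scan_upload`, and `should_block` are hypothetical names, and no real provider client or CSAM-Shield API is shown.

```python
from dataclasses import dataclass
from typing import List, Protocol


class Scanner(Protocol):
    """Hypothetical common shape for every detection backend adapter."""

    name: str

    def scan(self, upload: bytes) -> bool:
        """Return True if this backend flags the upload."""
        ...


@dataclass(frozen=True)
class Verdict:
    provider: str
    flagged: bool


@dataclass
class StubScanner:
    """Test double standing in for a real provider client."""

    name: str
    result: bool

    def scan(self, upload: bytes) -> bool:
        return self.result


def scan_upload(upload: bytes, scanners: List[Scanner]) -> List[Verdict]:
    """Run every configured backend and collect one verdict per provider."""
    return [Verdict(s.name, s.scan(upload)) for s in scanners]


def should_block(verdicts: List[Verdict]) -> bool:
    """Block when any backend flags the upload; human review happens downstream."""
    return any(v.flagged for v in verdicts)


if __name__ == "__main__":
    verdicts = scan_upload(b"upload bytes", [StubScanner("a", False), StubScanner("b", True)])
    print(should_block(verdicts))
```

Keeping per-provider verdicts rather than a single boolean preserves which backend flagged the upload, which matters for review and audit.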

  • CSAM-Shield

    In progress

    Wave 2: Drop-in adoption

    One-line middleware for Express/Fastify/FastAPI/Hono that wires PhotoDNA, PDQ, NCMEC API, and Cloudflare CSAM Scanning behind a unified interface.

    npm install @digitalharm/csam-shield
    TypeScript · Python
  • PromptShield

    In progress

    Wave 2: Drop-in adoption

    Lightweight classifier middleware for Stable Diffusion / FLUX / ComfyUI / vLLM that detects CSAM intent at the prompt, before compute is spent.

    pip install promptshield
    Python
  • C2PA-Lite

    In progress

    Wave 5: re-promoted from Deferred — manifest layer scaffolded; watermark layer awaits research stabilization

    Pragmatic C2PA content credentials for generators that don't yet have provenance signaling.

    Rust
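The "before compute is spent" idea reduces to a gate that runs a cheap check first and only then calls the expensive generator. This is a minimal sketch of that pattern, not the PromptShield API: `guarded_generate`, the `Classifier` and `Generator` types, and the 0.5 threshold are all assumptions, and the classifier is supplied by the caller.

```python
from typing import Callable, Optional

# Hypothetical types: the classifier returns a risk score in [0, 1];
# the generator is the expensive call the gate protects.
Classifier = Callable[[str], float]
Generator = Callable[[str], bytes]


def guarded_generate(
    prompt: str,
    classify: Classifier,
    generate: Generator,
    threshold: float = 0.5,
) -> Optional[bytes]:
    """Return generated output, or None if the prompt is refused.

    The classifier runs first; generate() is never called for a
    prompt whose score meets or exceeds the threshold.
    """
    if classify(prompt) >= threshold:
        return None
    return generate(prompt)
```

Returning `None` keeps the sketch short. A production gate would more likely return a structured refusal so the caller can log the decision and route it to review.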
List infrastructure

Hash-list sync, audit, and pre-training screening

These two tools sit on the credentialed layer — NCMEC, IWF, and Project Arachnid relationships are required to operate them in production. They are the natural home for the grant-funded credential-brokering work and ship after the foundation establishes the maintainer's standing.

  • HashStream

    In progress

    Wave 3: Credentialed infrastructure

    Version control and an audit trail for the CSAM hash lists you're legally on the hook for.

    go install github.com/digitalharm/fight-csam/packages/hashstream/cmd/hashstreamd@latest
    Go · TypeScript
  • TrainGuard

    In progress

    Wave 3: Credentialed infrastructure

    Pre-flight screen for AI image/video training datasets against national hash lists. Generates compliance reports with chain-of-custody.

    pip install trainguard
    Python
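An audit trail for hash-list versions can be illustrated with a hash-chained, append-only log, where each entry commits to the previous one so later tampering is detectable. The sketch below is an assumption, not HashStream's actual format: `Entry`, `append`, and `verify` are hypothetical names, and only a SHA-256 digest of each list snapshot is recorded, never the list itself.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS = "0" * 64  # placeholder "previous digest" for the first entry


@dataclass(frozen=True)
class Entry:
    source: str       # label for where the snapshot came from
    list_digest: str  # SHA-256 of the snapshot bytes
    recorded_at: str  # ISO-8601 timestamp supplied by the caller
    prev: str         # digest of the previous entry, or GENESIS
    digest: str       # digest over the four fields above


def _digest(source: str, list_digest: str, recorded_at: str, prev: str) -> str:
    payload = json.dumps([source, list_digest, recorded_at, prev], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: List[Entry], source: str, snapshot: bytes, recorded_at: str) -> Entry:
    """Record a new list version, chained to the current tail of the log."""
    prev = log[-1].digest if log else GENESIS
    list_digest = hashlib.sha256(snapshot).hexdigest()
    entry = Entry(source, list_digest, recorded_at, prev,
                  _digest(source, list_digest, recorded_at, prev))
    log.append(entry)
    return entry


def verify(log: List[Entry]) -> bool:
    """Recompute every link; False if any entry was altered or reordered."""
    prev = GENESIS
    for e in log:
        if e.prev != prev or e.digest != _digest(e.source, e.list_digest, e.recorded_at, e.prev):
            return False
        prev = e.digest
    return True
```

Because each digest covers the previous digest, changing or removing an earlier entry invalidates every entry after it, which is what makes the log usable as evidence of which list version was active at a given time.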

Scope & safety

What this portfolio is not.

  • None of these tools ship a CSAM hash list. National hash data lives at NCMEC, IWF, and Project Arachnid under specific legal frameworks. It stays there. The portfolio implements the algorithms and clients that consume those lists, never the lists themselves.
  • None of these tools handle real CSAM imagery. Tests use synthetic non-CSAM fixtures from detectkit-test with engineered hash properties.
  • These tools do not provide legal compliance by default. Adoption does not satisfy 18 U.S.C. § 2258A, DSA Article 16, the UK Online Safety Act, or any other regulatory regime. Counsel remains required.
  • Detection-assist, not a guarantee. Hash-matching catches known material; AI classification catches novel material with false positives. Human review remains essential.

See the full safety policy for the threat model and the CI guard that enforces these rules.

Roadmap: the build sequence

Tools ship in dependency order across five waves — the foundation first, because everything downstream hashes through it. Within each wave, a tool climbs a maturity ladder from Planned to Stable.

Maturity ladder

  1. Planned

    Designed; nothing beyond a README and a status line.

  2. In progress

    Scaffolded and compiling; core functionality landing.

  3. Alpha

    Real code, usable by early adopters who accept breaking changes.

  4. Beta

    API stable; hardening for production.

  5. Stable

    Production-ready. Semver applies.

A sixth status, Deferred, marks work intentionally postponed or spun out — not on the active roadmap.

Build waves

  1. Wave 1

    Foundation

    Byte-identical perceptual hashing and synthetic conformance fixtures. Everything downstream matches against these hashes, so the foundation ships first.

    HashKit (In progress) · hashkit-match (In progress) · DetectKit-Test (In progress)
  2. Wave 2

    Drop-in adoption

    One-line detection middleware and prompt-side prevention a small team can wire up in an afternoon — no enterprise contract required.

    CSAM-Shield (In progress) · PromptShield (In progress)
  3. Wave 3

    Credentialed infrastructure

    Hash-list version control and pre-training dataset screening. Operating these in production is gated on NCMEC / IWF / Project Arachnid relationships.

    HashStream (In progress) · TrainGuard (In progress)
  4. Wave 4

    Legal endgame

    Statutory CyberTipline reporting and defensible evidence retention. Production submission paths stay blocked until outside counsel signs off.

    CyberTip CLI (In progress) · EvidenceVault (In progress)
  5. Wave 5

    Provenance & satellites

    C2PA content credentials for AI generators, re-promoted as a prevention primitive. SafeMod's moderator-wellbeing layer, also re-promoted from Deferred — rebuilt privacy-by-construction so it stores no personal or special-category data.

    C2PA-Lite (In progress) · SafeMod (In progress)

Funding and sponsorship

The portfolio is self-funded by anonymous individuals and organizations who want this infrastructure to exist — there is no per-seat licensing, and no platform pays for its own copy. The alignment matters: this is plumbing every platform needs and rebuilds poorly, so it is built as a public good and given away, with the goal of the widest possible adoption rather than revenue.

If you would like to help sustain or expand the work, sponsorship is welcome. See the sponsorship document for ways to contribute, or email sponsor@digitalharm.org.

Notes

  • The portfolio is intentionally fixed at 11 tools. New packages need a design conversation before opening a PR. See CONTRIBUTING.
  • The build order matters: foundation first (no credentials needed), drop-in adoption second, credentialed infrastructure third, legal endgame fourth, and provenance and satellites last. See the sequencing document for the rationale.
  • Both originally deferred tools have since been re-promoted and built: SafeMod was rebuilt privacy-by-construction (zero dependencies, no identifiers stored, aggregate-only k-anonymous wellbeing signals), which removes the GDPR special-category-data concern that caused its deferral; C2PA-Lite ships its manifest layer, with real signing behind an upstream feature flag pending the c2pa-rs dependency decision. The portfolio now has no indefinitely deferred tools.