
NoCherry -- Anti Data Cherry-Picking Consortium

Just say no to cherry picking data

Commit-Reveal Protocol

Hash your data, commit the hashes to a server that timestamps them, and wait for a drand beacon round you couldn't have predicted; the beacon then deterministically picks which items you must reveal. You can't cherry-pick what gets audited.

Terminal demo: key generation, hashing, commitment, verification, and server submission

A standalone commit-reveal protocol that proves three things about data:

  1. Temporal ordering -- the commitment existed before the randomness that selected it
  2. Integrity -- revealed data matches committed hashes
  3. Unbiased selection -- which items are chosen for reveal is determined by external randomness, not the committer

The protocol is domain-agnostic. It deals in opaque SHA-256 hashes. Higher-level systems layer meaning on top.

Why this exists

When someone holds a private dataset and must reveal a subset to prove something about the whole, they have every incentive to cherry-pick the best-looking slice. This protocol removes that option: an external beacon that nobody controls picks what gets revealed, after the data is already committed.

This matters for AI benchmarks, standardized test fairness, training data provenance, compliance audits, and more.

How the protocol works

 User                          Server                      drand beacon
  │                              │                              │
  │  1. Hash data locally        │                              │
  │     (SHA-256 per item)       │                              │
  │                              │                              │
  │  2. Create commitment        │                              │
  │     (bundle hashes +         │                              │
  │      sign with Ed25519)      │                              │
  │                              │                              │
  │  3. Submit ─────────────────>│                              │
  │                              │  4. Record timestamp         │
  │                              │  5. Wait for next round ────>│
  │                              │<──── beacon randomness ──────│
  │                              │  6. Compute selection        │
  │                              │     (deterministic from      │
  │                              │      beacon + commitment)    │
  │<──── signed receipt ─────────│                              │
  │                              │                              │
  │  7. Reveal selected items ──>│                              │
  │                              │                              │
  │         Anyone can verify the entire chain offline          │

The key insight

The commitment is recorded before the beacon round that determines selection. The beacon randomness is unpredictable (BLS threshold signatures from a distributed network). This means the committer cannot know which items will be selected when they commit, preventing cherry-picking.

Verification is offline

A receipt is self-contained. Anyone can verify the entire chain — commitment signature, beacon timing, selection correctness, data integrity — without contacting the server.

Selection algorithm and verification details

Selection algorithm

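Given a beacon output and a commitment, selection is fully deterministic. There are two methods, chosen by the number of committed items:

  1. Batch (more than one item) -- a Fisher-Yates shuffle driven by an HMAC-SHA256 PRNG seeded with the beacon randomness selects exactly ceil(probability * item_count) items
  2. Single item (coin-flip mode) -- the beacon randomness and the commitment are fed into an HMAC, and the item is selected if the output falls below the probability threshold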

Both methods are deterministic -- anyone with the beacon output and commitment can recompute the exact same selection.
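A minimal sketch of the batch method follows, in TypeScript on Node's built-in crypto module. It assumes an HMAC-SHA256 counter-mode PRNG keyed with the beacon randomness over the commitment hash plus an 8-byte counter. That key/message layout, the HmacPrng class, and selectBatch are hypothetical illustrations, not pb-core's actual construction or API, so the output will not match real receipts.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical PRNG: HMAC-SHA256(key = beacon randomness,
// msg = commitment hash || 8-byte big-endian counter). The real key/message
// layout used by pb-core is an assumption here.
class HmacPrng {
  private readonly key: Buffer;
  private readonly message: Buffer;
  private counter = 0n;

  constructor(key: Buffer, message: Buffer) {
    this.key = key;
    this.message = message;
  }

  // Next 64-bit unsigned integer from the HMAC output stream.
  nextU64(): bigint {
    const ctr = Buffer.alloc(8);
    ctr.writeBigUInt64BE(this.counter);
    this.counter += 1n;
    const mac = createHmac("sha256", this.key).update(this.message).update(ctr).digest();
    return mac.readBigUInt64BE(0);
  }

  // Uniform integer in [0, bound), using rejection sampling to avoid modulo bias.
  nextBelow(bound: number): number {
    const b = BigInt(bound);
    const space = 1n << 64n;
    const limit = space - (space % b); // largest multiple of b that fits in 2^64
    for (;;) {
      const x = this.nextU64();
      if (x < limit) return Number(x % b);
    }
  }
}

// Batch selection: shuffle all item indices with Fisher-Yates, keep the first
// ceil(probability * itemCount). Returned indices are sorted for readability.
function selectBatch(
  beacon: Buffer,
  commitmentHash: Buffer,
  itemCount: number,
  probability: number,
): number[] {
  const k = Math.min(itemCount, Math.ceil(probability * itemCount));
  const indices = Array.from({ length: itemCount }, (_, i) => i);
  const prng = new HmacPrng(beacon, commitmentHash);
  for (let i = itemCount - 1; i > 0; i--) {
    const j = prng.nextBelow(i + 1); // 0 <= j <= i
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  return indices.slice(0, k).sort((a, b) => a - b);
}

// Stand-in inputs; real values come from the drand round and the commitment.
const beacon = Buffer.alloc(32, 7);
const commitmentHash = Buffer.alloc(32, 1);
console.log(selectBatch(beacon, commitmentHash, 25, 0.1)); // three distinct indices in [0, 25)
```

Rejection sampling in nextBelow avoids the bias a plain modulo would introduce, and taking the first k positions of a uniformly shuffled index list yields a uniformly random k-subset.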

Verification checks

  1. Commitment signature -- Ed25519 verify the commitment using the signer's public key
  2. Beacon temporal ordering -- the drand round timestamp is after registered_at
  3. Selection recomputation -- re-run the selection algorithm, confirm it matches
  4. Reveal signature -- Ed25519 verify the reveal object
  5. Data integrity -- SHA-256 hash revealed data, confirm it matches committed hashes
  6. Completeness -- all randomly-selected items are actually revealed
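Checks 5 and 6 need nothing beyond SHA-256. The sketch below assumes hex-encoded item hashes, a list of selected indices already recomputed in check 3, and revealed bytes keyed by item index. These shapes and the checkReveal function are hypothetical, not the receipt format pb-core defines.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape for the last two verification checks.
interface RevealCheckInput {
  committedHashes: string[]; // lowercase hex SHA-256 per committed item
  selectedIndices: number[]; // from recomputing the selection (check 3)
  revealed: Map<number, Uint8Array>; // item index -> revealed bytes
}

function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Check 6 (completeness): every selected item must be revealed.
// Check 5 (integrity): each revealed item must hash to its committed value.
// Returns a list of problems; an empty list means both checks pass.
function checkReveal({ committedHashes, selectedIndices, revealed }: RevealCheckInput): string[] {
  const errors: string[] = [];
  for (const i of selectedIndices) {
    const data = revealed.get(i);
    if (data === undefined) {
      errors.push(`item ${i} was selected but not revealed`);
      continue;
    }
    if (sha256Hex(data) !== committedHashes[i]) {
      errors.push(`item ${i} does not match its committed hash`);
    }
  }
  return errors;
}
```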

Cryptographic primitives

| Primitive | Usage |
|---|---|
| SHA-256 | File hashing, commitment hashing, HMAC-based PRNG |
| Ed25519 | Commitment and reveal signing, server key management |
| HMAC-SHA256 | Selection PRNG, seeded with beacon randomness |
| BLS12-381 | drand beacon signature verification |
| Canonical JSON | Deterministic serialization for signing payloads |

All cryptographic operations are implemented once in Rust (crates/pb-core) and compiled to WebAssembly for use in browsers and Node.js.
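Canonical JSON matters because a signature covers bytes, not objects: two semantically equal payloads must serialize identically or verification fails. The sketch below sorts object keys and drops whitespace, then signs with Ed25519 through Node's crypto module. It is an assumption-laden stand-in for pb-core (which is written in Rust); full schemes such as RFC 8785 also pin down number and string formatting, and the payload fields shown are made up.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified canonical JSON: object keys sorted by UTF-16 code unit (the
// default Array.prototype.sort order), no insignificant whitespace. Numbers
// and strings use JSON.stringify as-is, which is an assumption.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  const obj: { [key: string]: Json } = value;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`).join(",")}}`;
}

// Ed25519 over the canonical bytes. For Ed25519 keys, Node's sign/verify take
// null as the algorithm because hashing is part of the signature scheme.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const payload = { version: 1, items: ["ab12", "cd34"] }; // hypothetical fields
const message = Buffer.from(canonicalize(payload), "utf8");
const signature = sign(null, message, privateKey);
console.log(verify(null, message, publicKey, signature)); // true
```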

What you provide

You hash your files locally and only send the hashes -- the server never sees your raw data. The SDK needs two things:

  1. Your data -- files or bytes to hash (the SDK computes SHA-256 for you via hashBytes)
  2. A signing key -- identifies you and prevents forgery. Generate one with generateKeypair() or pb-js key generate
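Hashing locally needs only a standard SHA-256 implementation. As an illustration of the kind of work a command like pb-js go ./my-data has to do, the hypothetical hashDirectory helper below hashes each regular file in a directory (non-recursively) in sorted name order. The ordering and the non-recursive walk are assumptions, not documented CLI or SDK behavior.

```typescript
import { createHash } from "node:crypto";
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

interface HashedFile {
  name: string;
  sha256: string; // lowercase hex digest
}

// Hypothetical helper: hash every regular file directly inside `dir`
// (subdirectories are skipped), in sorted name order so the list is stable
// across runs. Only these digests would need to leave the machine.
function hashDirectory(dir: string): HashedFile[] {
  return readdirSync(dir)
    .sort()
    .filter((name) => statSync(join(dir, name)).isFile())
    .map((name) => ({
      name,
      sha256: createHash("sha256").update(readFileSync(join(dir, name))).digest("hex"),
    }));
}

// Example: console.log(hashDirectory("./my-data"));
```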

Everything else has sensible defaults. Two things you might want to change:

Single-item commitments: coin-flip mode

When you commit a batch of items, the protocol uses a Fisher-Yates shuffle to select exactly ceil(probability * item_count) items -- for any probability above zero, this guarantees at least 1 item is always selected.

But when you commit a single item, the protocol switches to coin-flip mode. Each commitment is an independent Bernoulli trial: the beacon randomness is fed into an HMAC along with your commitment, and if the output falls below the probability threshold, the item is selected. At the default 10% probability, there's a 10% chance it's selected and a 90% chance it's not.
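The coin flip can be sketched as below. It assumes the HMAC is keyed with the beacon randomness over the commitment hash, and that the first 8 bytes of the output are compared, as an unsigned integer, against probability * 2^64. The exact inputs and threshold encoding the protocol uses are not given here, so coinFlip is a hypothetical illustration.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical coin flip for single-item commitments:
// draw = first 8 bytes of HMAC-SHA256(key = beacon randomness, msg = commitment hash),
// read as an unsigned 64-bit integer; selected if draw < probability * 2^64.
function coinFlip(beacon: Buffer, commitmentHash: Buffer, probability: number): boolean {
  if (probability <= 0) return false;
  if (probability >= 1) return true;
  const mac = createHmac("sha256", beacon).update(commitmentHash).digest();
  const draw = mac.readBigUInt64BE(0);
  // 2 ** 64 is exactly representable as a double; floor yields an integer for BigInt().
  const threshold = BigInt(Math.floor(probability * 2 ** 64));
  return draw < threshold;
}

// Stand-in inputs; the same inputs always give the same answer.
const selected = coinFlip(Buffer.alloc(32, 9), Buffer.alloc(32, 2), 0.1);
console.log(selected ? "selected for reveal" : "not selected this round");
```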

If the coin doesn't land, nothing happens -- but you can commit the same data again. Each new commitment gets a different timestamp, which means a different beacon round, which means a fresh coin flip. Over repeated commitments, the probability compounds: after 10 independent commits at 10%, the chance of being selected at least once is ~65%. After 22 commits, it exceeds 90%.
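The compounding figures follow from independence: the chance of never being selected in n commits is (1 - p)^n, so the chance of at least one selection is 1 - (1 - p)^n, and the smallest n that reaches a target t is ceil(ln(1 - t) / ln(1 - p)). A small check of the numbers above (the function names are illustrative):

```typescript
// Probability that an item is selected at least once in n independent commits,
// each selected with probability p.
function selectedAtLeastOnce(p: number, n: number): number {
  return 1 - (1 - p) ** n;
}

// Smallest n with 1 - (1 - p)^n >= target, i.e. n >= ln(1 - target) / ln(1 - p).
function commitsNeeded(p: number, target: number): number {
  return Math.ceil(Math.log(1 - target) / Math.log(1 - p));
}

console.log(selectedAtLeastOnce(0.1, 10).toFixed(3)); // "0.651"
console.log(commitsNeeded(0.1, 0.9)); // 22
```

With p = 0.1, 21 commits give about 89.1% and 22 give about 90.2%, which is why 22 is the first count that exceeds 90%.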

This is useful when you have a single piece of data (like a model or a report) that you want to subject to ongoing random audits. Each commit is cheap, and eventually the beacon will select it.

Revealing data

Revealing is optional and can happen outside the protocol entirely. You can publish your data anywhere and anyone can verify it against the committed hashes offline -- the receipt is self-contained. If you use the server's reveal endpoint, you can optionally include file_urls mapping item hashes to public URLs, and the server will fetch and verify the data matches.

Advanced options

Randomness beacon

The protocol uses drand quicknet as its randomness source.

The beacon interface is pluggable -- other randomness sources can be added.
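Beacon timing is arithmetic over two public chain parameters: round r is emitted at genesis + (r - 1) * period. The sketch below computes the first round strictly after a registration time (step 5 of the diagram) and the temporal-ordering check from the verification list. The ChainInfo shape and function names are hypothetical, and the quicknet constants are the values believed to be published in drand's chain info for quicknet (3-second rounds); check them against the chain's own info before relying on them.

```typescript
// Minimal chain parameters needed for timing (drand publishes them as chain info).
interface ChainInfo {
  genesisTime: number; // unix seconds at which round 1 is emitted
  period: number; // seconds between consecutive rounds
}

// Unix time at which `round` is emitted.
function roundTime(info: ChainInfo, round: number): number {
  return info.genesisTime + (round - 1) * info.period;
}

// First round emitted strictly after unix time t: the earliest round the
// committer cannot have seen when the commitment was registered.
function firstRoundAfter(info: ChainInfo, t: number): number {
  if (t < info.genesisTime) return 1;
  return Math.floor((t - info.genesisTime) / info.period) + 2;
}

// Verification check 2: the selecting round must come after registered_at.
function roundIsAfter(info: ChainInfo, round: number, registeredAt: number): boolean {
  return roundTime(info, round) > registeredAt;
}

// Believed quicknet values (3 s rounds); confirm against the chain's own info.
const quicknet: ChainInfo = { genesisTime: 1692803367, period: 3 };

const registeredAt = Math.floor(Date.now() / 1000);
const round = firstRoundAfter(quicknet, registeredAt);
console.log(`round ${round} at ${roundTime(quicknet, round)}, after registration: ${roundIsAfter(quicknet, round, registeredAt)}`);
```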

Repository structure

├── crates/
│   ├── pb-core/              # Rust: all crypto primitives (zero network I/O)
│   └── pb-cli/               # Rust: native CLI (future)
├── src/
│   └── core/                 # Shared Hono routes, beacon logic, signing, DID resolution
├── packages/
│   ├── pb-wasm/              # WASM bindings for pb-core (@nocherry/core)
│   ├── pb-js/                # JavaScript SDK wrapping WASM (@nocherry/sdk)
│   ├── pb-node/              # HTTP server - Hono, portable (@nocherry/server)
│   ├── pb-storage/           # Storage adapters - SQLite, Postgres, PGlite via Kysely
│   └── pb-demo/              # Interactive 7-step demo web app (@nocherry/demo)
├── deploy/
│   ├── cloudflare/           # Cloudflare Workers + D1 deployment
│   ├── google-cloud/         # Cloud Run + Firestore deployment
│   ├── supabase/             # Supabase Edge Functions + Postgres
│   └── pglite/               # In-process WASM Postgres (dev/testing)
└── docs/                     # Protocol specifications

Quick start

CLI

```sh
# Generate a signing key
npx pb-js key generate

# Hash a directory, create a commitment, and submit to servers
npx pb-js go ./my-data
```

The go command is a guided wizard that walks you through the full flow: hash, commit, submit, wait for beacon, and get your receipt.

SDK

```js
import { generateKeypair, hashBytes, createCommitment } from '@nocherry/sdk';

// `files` is your data as bytes, one entry per file

// 1. Generate a key (or load an existing one)
const kp = generateKeypair();

// 2. Hash your data locally (only the hashes are sent to the server)
const hashes = files.map(f => hashBytes(f));

// 3. Create a commitment (only items + key required, everything else defaults)
const commitment = createCommitment(
  JSON.stringify({ items: hashes }),
  kp.private_key_hex
);

// 4. Submit to a server
const res = await fetch('https://server.example/v1/commitments', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(commitment),
});
if (!res.ok) throw new Error(`Submission failed: HTTP ${res.status}`);
const receipt = await res.json();
```

Interactive demo

```sh
npm install
npm start -w packages/pb-demo
# Open http://localhost:7421
```

Walks through all 7 steps with a live drand beacon ticker, real Ed25519 signing, and server-side receipt generation.

Run tests

```sh
npx tsx --test packages/pb-node/test/api.test.ts          # Server API tests
npx tsx --test packages/pb-demo/test/walkthrough.test.js  # Demo tests
cargo test                                                # Rust core tests
```

Deployment

The server runs anywhere Hono runs:

| Target | Storage | Guide |
|---|---|---|
| Node.js | SQLite (better-sqlite3) | npm start -w packages/pb-node |
| Cloudflare Workers | D1 (SQLite at edge) | deploy/cloudflare/ |
| Google Cloud Run | Firestore | deploy/google-cloud/ |
| Supabase Edge Functions | Postgres | deploy/supabase/ |
| PGlite (in-process) | WASM Postgres | deploy/pglite/ |

Documentation

License

See LICENSE for details.