Proof of Human Alignment (PoHA) Protocol

Proof of Human Alignment (PoHA) is a contribution and reward protocol that powers Reppo Network — a decentralized infrastructure for generating and curating preference and AI alignment data to train and evaluate human-aligned AI systems.

PoHA incentivizes two key behaviors:

  • Creation of AI-generated content that reflects human values, intentions, and quality standards.

  • Evaluation of that content via human preference signals, such as ranking, voting, or comparative feedback.

These contributions generate rich, scalable preference datasets — a crucial ingredient for aligning large models and autonomous systems with what people actually want.

The Alignment Problem

AI systems are only as good as the data they’re trained on. Yet most models today are optimized for objective proxies — not subjective human preferences.

Questions like:

  • Which response feels more helpful?

  • Which image better matches a prompt?

  • Which answer sounds more human?

  • Which outcome would you choose?

These are questions only real humans can answer — but doing so at scale has traditionally been expensive, slow, and inaccessible.

Preference Data is the Missing Link

Preference data — human-generated rankings, comparisons, or judgments between multiple AI outputs — is the foundation of techniques like Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI.
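
As a concrete illustration, a single pairwise preference record could look like the sketch below; the field names are hypothetical and not part of any Reppo specification.

```python
from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    """One human judgment comparing two AI outputs for the same prompt.
    Field names are illustrative, not a Reppo schema."""
    prompt: str        # the task or instruction given to the model(s)
    output_a: str      # first candidate output
    output_b: str      # second candidate output
    preferred: str     # "a" or "b" -- the evaluator's choice
    evaluator_id: str  # pseudonymous ID of the human evaluator

# Example: the kind of pairwise label RLHF reward models are trained on
record = PreferenceRecord(
    prompt="Summarize this article in one sentence.",
    output_a="The article covers three topics in a long, winding way...",
    output_b="A short, accurate one-sentence summary.",
    preferred="b",
    evaluator_id="evaluator_42",
)
```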

But so far, only large tech companies have had the infrastructure to collect it at scale.

Reppo changes that. By turning evaluation into a collaborative, rewarded process — and using PoHA to measure contribution — anyone can help align AI systems through their preferences.

How PoHA Works

Creator submits AI-generated content -> Evaluators vote on the quality of that content -> Preference and alignment data is generated via creator<>evaluator interactions -> Protocol aggregates, scores, and rewards contributions via smart contract–governed emissions

At the end of each epoch (e.g., weekly), a fixed pool of tokens is distributed proportionally to contributors based on their Proof of Human Alignment score.
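
A minimal sketch of that proportional payout, assuming PoHA scores have already been computed for the epoch (the function name and numbers are illustrative, not protocol values):

```python
def distribute_epoch_pool(pool: float, poha_scores: dict[str, float]) -> dict[str, float]:
    """Split a fixed epoch emission pool proportionally to PoHA scores.
    Simplified sketch; the real distribution is smart-contract governed."""
    total = sum(poha_scores.values())
    if total == 0:
        return {addr: 0.0 for addr in poha_scores}
    return {addr: pool * score / total for addr, score in poha_scores.items()}

# Example: 10,000 tokens emitted this epoch, three contributors
rewards = distribute_epoch_pool(10_000, {"alice": 42.0, "bob": 18.0, "carol": 0.0})
# -> {"alice": 7000.0, "bob": 3000.0, "carol": 0.0}
```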

1. Creation

Creators submit AI-generated content: images, text, behaviors, and more. Their goal is to produce content that:

  • Resonates with human preferences

  • Ranks well in community evaluation

  • Feeds into AI models as valuable training data

Each submission becomes part of a growing preference dataset — labeled, ranked, and scored by the community.
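
To make "labeled, ranked, and scored" concrete, a submission entry might accumulate community signals roughly like this; the fields are illustrative, not the protocol's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Submission:
    """An AI-generated creation plus the community signals attached to it.
    Illustrative only; not the protocol's actual data model."""
    creator_id: str
    content_uri: str          # e.g. a content hash or URL for the image/text
    votes_for: int = 0        # upvotes from evaluators
    votes_against: int = 0
    pairwise_wins: int = 0    # times preferred in head-to-head comparisons
    pairwise_losses: int = 0

    def win_rate(self) -> float:
        """Fraction of pairwise comparisons this submission won."""
        total = self.pairwise_wins + self.pairwise_losses
        return self.pairwise_wins / total if total else 0.0
```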

2. Evaluation (Preference and Alignment Data)

Evaluators generate preference data by:

  • Voting on top submissions

  • Ranking outputs by alignment or quality

  • Comparing pairs (“Which of these two is better?”)

This crowdsourced evaluation creates rich, fine-grained signals that are:

  • Resistant to noise

  • Diverse across demographics

  • Informative for RLHF and reward modeling

The more consistent and useful an evaluator’s preferences, the higher their PoHA score.
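
One way to operationalize "consistent and useful" is to measure how often an evaluator's choices agree with the community consensus. The sketch below assumes a simple majority-vote consensus; it is a toy proxy, not the protocol's actual scoring rule.

```python
from collections import Counter

def consensus_choices(votes_by_pair: dict[str, list[str]]) -> dict[str, str]:
    """Majority choice ("a" or "b") for each comparison pair ID."""
    return {pair: Counter(choices).most_common(1)[0][0]
            for pair, choices in votes_by_pair.items()}

def evaluator_agreement(evaluator_votes: dict[str, str],
                        consensus: dict[str, str]) -> float:
    """Fraction of an evaluator's judgments that match consensus.
    A toy proxy for an evaluator's PoHA score, not the real formula."""
    shared = [p for p in evaluator_votes if p in consensus]
    if not shared:
        return 0.0
    agree = sum(evaluator_votes[p] == consensus[p] for p in shared)
    return agree / len(shared)

# Example: two comparison pairs, three community votes each
votes = {"pair1": ["a", "a", "b"], "pair2": ["b", "b", "b"]}
consensus = consensus_choices(votes)                                   # {"pair1": "a", "pair2": "b"}
score = evaluator_agreement({"pair1": "a", "pair2": "a"}, consensus)   # 0.5
```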

3. Reward Distribution

At each epoch, token emissions are split:

  • To creators, based on how highly their work ranks in preference evaluations

  • To evaluators, based on the accuracy and impact of their judgments (compared to consensus or model feedback)

  • To the treasury, to support grants, tooling, and research

Rewards scale with measured contribution to alignment — not just activity or popularity.
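
Putting the pieces together, a simplified epoch settlement could look like the following; the 70/20/10 split and the score inputs are placeholders, not protocol parameters.

```python
def settle_epoch(emission: float,
                 creator_scores: dict[str, float],
                 evaluator_scores: dict[str, float],
                 creator_share: float = 0.70,
                 evaluator_share: float = 0.20,
                 treasury_share: float = 0.10) -> dict[str, float]:
    """Split one epoch's emission among creators, evaluators, and the treasury.
    Share percentages are placeholders, not protocol parameters."""
    assert abs(creator_share + evaluator_share + treasury_share - 1.0) < 1e-9

    def proportional(pool: float, scores: dict[str, float]) -> dict[str, float]:
        total = sum(scores.values())
        return {k: pool * v / total if total else 0.0 for k, v in scores.items()}

    payouts = {"treasury": emission * treasury_share}
    payouts.update(proportional(emission * creator_share, creator_scores))
    payouts.update(proportional(emission * evaluator_share, evaluator_scores))
    return payouts

# Example: 10,000 tokens, two creators scored by preference results, one evaluator
settle_epoch(10_000,
             creator_scores={"creatorA": 3.0, "creatorB": 1.0},
             evaluator_scores={"eval1": 1.0})
# -> {"treasury": 1000.0, "creatorA": 5250.0, "creatorB": 1750.0, "eval1": 2000.0}
```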