Platform Architecture Paper v1.0

The Three Gates Model and Witness Protocol

A Reputation-Based Framework for Eliminating Toxic Content in Social Platforms

Victor Adeniji · March 2026 · 25 min read

Abstract

Toxic content, hate speech, and misinformation have become endemic to social media platforms, degrading public discourse and causing measurable psychological and social harm. Existing approaches—algorithmic content moderation, human review, and community-based systems like X's Community Notes—have proven inadequate, intervening reactively after harm has occurred and failing to create genuine accountability for bad-faith participation. This paper presents Plugg's integrated solution: the Three Gates Model, which embeds ethical reflection into engagement mechanics, combined with the Witness Protocol, a distributed verification system where reputation serves as both stake and reward. By mapping the ancient philosophical framework of the Three Gates of Speech—Is it true? Is it necessary? Is it kind?—onto social media's universal engagement actions (comment, repost, like), and by requiring users to stake reputation when witnessing the truth or falsity of claims, Plugg creates a platform where toxic content carries immediate, tangible consequences and constructive participation builds lasting credibility.

1. Introduction

Social media stands as perhaps the most consequential social invention of the twenty-first century. These platforms have connected billions of people across geographic, cultural, and linguistic boundaries, democratising content creation and enabling forms of community, commerce, and coordination previously impossible. The ability to share images, links, short-form and long-form text, audio, and video has fundamentally reshaped how humanity communicates, organises, and understands itself.

Yet this same infrastructure has become a vector for hate speech, harassment, misinformation, and toxic discourse that degrades individual well-being and social cohesion. Research consistently demonstrates that false and inflammatory content spreads faster and further than accurate, constructive content (Vosoughi, Roy, & Aral, 2018). Community spaces—from professional associations to alumni networks to neighbourhood forums—frequently devolve into hostile environments that silence constructive participants while amplifying destructive voices.

Platform responses have largely failed. Algorithmic moderation produces inconsistent results. Human review cannot scale. Community-based systems like X's Community Notes, while innovative in concept, lack the accountability mechanisms needed to create genuine consequences for bad-faith participation. All these approaches share a fundamental limitation: they are reactive, intervening only after harmful content has been created, distributed, and consumed.

This paper presents Plugg's alternative: a proactive architecture that embeds ethical reflection and distributed accountability directly into the platform's engagement mechanics. The solution comprises two integrated components:

The Three Gates Model maps the ancient philosophical framework attributed to Rumi—Is it true? Is it necessary? Is it kind?—onto the three universal social media engagement actions: commenting, reposting, and liking. This creates a platform where every engagement is an ethical choice rather than a thoughtless reaction.

The Witness Protocol establishes a distributed verification system where users stake reputation when affirming or challenging claims. Drawing on the biblical principle that ‘from the mouth of two or three witnesses shall truth be established,’ this system creates accountability without centralising epistemic authority in any single arbiter—human or algorithmic.

2. The Failure of Current Approaches

2.1 Algorithmic Content Moderation

Major platforms rely heavily on algorithmic systems to detect and remove violating content. These systems face insurmountable challenges. Context-dependence defeats pattern matching: the same words may be a hateful slur or a reclaimed identity marker depending on speaker and context. Adversarial adaptation means bad actors continuously evolve their language to evade detection. Scale limitations produce both false positives (removing legitimate speech) and false negatives (missing violating content). Most fundamentally, these systems are reactive—they intervene only after content has been posted, and often after it has already spread.

2.2 Human Review Systems

Human moderators offer contextual understanding that algorithms lack, but face their own limitations. Scale makes comprehensive review impossible; Facebook alone processes billions of posts daily. Moderator trauma from constant exposure to harmful content produces high turnover and psychological harm (Roberts, 2019). Inconsistency across reviewers and regions creates arbitrary enforcement. And like algorithmic systems, human review is inherently reactive—the harm has already occurred before the review takes place.

2.3 Community Notes and Its Limitations

X's Community Notes (formerly Birdwatch) represents the most sophisticated attempt at distributed content verification. The system allows users to add contextual notes to posts, which become visible when users across the political spectrum rate them as helpful. While innovative, Community Notes has structural weaknesses that limit its effectiveness:

Table 1: Structural Limitations of Community Notes

Limitation | Problem | Consequence
No stakes | Note writers risk nothing; posters lose nothing | No accountability for bad-faith participation
Reactive timing | Notes appear after virality | Harm occurs before correction
Binary output | Note appears or doesn't | No nuance; no degrees of confidence
Consensus dependency | Requires 'diverse' agreement | Easily stalemated on contested topics
Identity detachment | Anonymous rating | No accumulated credibility or consequence

The fundamental problem across all these approaches is the absence of skin in the game. Users can post toxic content, amplify misinformation, and participate in bad faith with minimal personal consequence. Until consequences are embedded in the act of engagement itself, toxic content will continue to thrive.

3. The Plugg Platform: Reputation as Currency

Plugg is a social platform built on a fundamentally different premise: reputation is currency. Unlike platforms where engagement metrics are disconnected from personal stake, Plugg users accumulate reputation through constructive participation and lose reputation through destructive behaviour. This reputation is visible, persistent, and consequential—affecting how users are perceived, what opportunities they access, and how their contributions are weighted.

This architecture creates the preconditions for the Three Gates Model and Witness Protocol to function. When reputation has real value, users have a genuine stake in their behaviour. The consequences of falsehood are immediate and tangible, discouraging bad-faith participation. Unlike platforms where a banned account can be replaced with a new one at no cost, Plugg users who destroy their reputation lose something accumulated over time—credibility that cannot be instantly recreated.

Key architectural features of Plugg's reputation system include:

Visibility:
Reputation scores are publicly displayed, creating social accountability.
Persistence:
Reputation accumulates over time; it cannot be reset by creating a new account.
Dimensionality:
Reputation has multiple components (accuracy, helpfulness, civility), providing nuanced signal.
Consequence:
Reputation affects platform privileges, content visibility, and network trust.

4. The Three Gates Model

The Three Gates of Speech, often attributed to the thirteenth-century Persian poet Jalāl ad-Dīn Muhammad Rūmī, articulates a principle found across philosophical traditions: before speaking, one should consider whether the words pass through three gates—Is it true? Is it necessary? Is it kind? The Three Gates Model maps these philosophical filters onto the three universal engagement mechanics that define virtually all social media platforms.

4.1 First Gate: Truth → Comment

The first gate asks: Is it true? This maps to the commenting function. When a user comments on content, they add their perspective—whether agreement, disagreement, additional information, or alternative interpretation. Crucially, the model recognises that truth is perspectival rather than universal. What is ‘true’ for one person may not be true for another, depending on their experiences, knowledge, and standpoint.

Therefore, Plugg does not simply ask commenters whether their comment is true. It requires them to explain how it is true—to provide reasoning, evidence, or acknowledgment of perspective. The comment interface prompts: ‘Share your perspective and explain your reasoning.’ This transforms comments from reactive assertions into reasoned contributions.

4.2 Second Gate: Necessity → Repost

The second gate asks: Is it necessary? This maps to the repost/share function. The logic is intuitive: if content is truly necessary—if it serves a purpose, adds value, addresses a genuine need—it should reach further.

Before amplifying content, Plugg prompts users: ‘This will reach your network. Is sharing it valuable for them?’ This intervention targets the thoughtless amplification that drives virality of sensationalist content, outrage bait, and misinformation—content that spreads not because it is necessary but because it is provocative.

4.3 Third Gate: Kindness → Like

The third gate asks: Is it kind? This maps to the like/heart function. The like button, in its various platform-specific manifestations, fundamentally expresses positive regard. It is, at its best, an act of kindness—an acknowledgment, an encouragement, a gift of attention and approval.

Unlike commenting (which requires explanation) or reposting (which requires assessment of value), liking requires nothing more than genuine positive sentiment. The like button becomes, explicitly, the kindness button—a way to contribute positive energy to the feed without requiring elaboration or justification. Kindness should be easy.

Table 2: The Three Gates Mapping

Gate | Question | Action | Implementation
First Gate | Is it true? | Comment | Explain your perspective
Second Gate | Is it necessary? | Repost | Assess value for network
Third Gate | Is it kind? | Like | Express positive regard

5. The Witness Protocol

The Three Gates Model creates a framework for thoughtful engagement. But how do we verify claims? How do we distinguish truth from falsehood without making AI—or any single authority—the arbiter of truth? The Witness Protocol addresses this challenge through distributed verification with reputation stakes, drawing on the biblical principle: ‘From the mouth of two or three witnesses shall the truth be established.’

5.1 The Claim Taxonomy

Not all claims are alike. The Witness Protocol distinguishes three types of claims, each requiring different verification approaches:

Table 3: Claim Taxonomy and Verification Approaches

Claim Type | Example | Verification Approach
Factual | "Lagos population is 15 million" | AI benchmarks against verified external sources
Interpretive | "This policy will harm small businesses" | Surface diverse reasoning; assess argument quality
Value-based | "This is unjust" | No verification; enforce 'explain how' requirement

This taxonomy enables AI to play different roles depending on claim type: fact-checker for factual claims, organiser of perspectives for interpretive claims, and enforcer of reasoning requirements for value claims.

5.2 Witness Actions and Reputation Stakes

Users can act as witnesses to claims made by others. Each witness action stakes reputation, creating personal accountability:

Affirm
‘I corroborate this claim.’ The witness stakes reputation that the claim is accurate. If the claim is later proven false, the affirming witness loses their stake. If proven true, they gain reputation for accurate witnessing.
Challenge
‘I dispute this claim.’ The witness stakes reputation that the claim is inaccurate. If the claim is later proven true, the challenging witness loses their stake. If proven false, they gain reputation for accurate challenge.
Contextualise
‘This is true if...’ The witness adds nuance or conditions. This action carries lower stakes but builds reputation for constructive contribution.
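The stake mechanics above can be sketched as follows. This is an illustrative model, not Plugg's implementation: the stake sizes, the symmetric gain/loss, and the `Witness` structure are all assumptions.

```python
from dataclasses import dataclass

# Illustrative stake sizes; contextualising deliberately carries a lower stake.
STAKE = {"affirm": 10, "challenge": 10, "contextualise": 3}

@dataclass
class Witness:
    user: str
    action: str       # "affirm", "challenge", or "contextualise"
    reputation: int

def resolve(witness: Witness, claim_proved_true: bool) -> int:
    """Return the witness's reputation once the claim is resolved.

    Affirmers win their stake when the claim proves true; challengers win
    when it proves false. Contextualisers risk little and earn a small
    constructive-contribution credit either way.
    """
    if witness.action == "contextualise":
        return witness.reputation + 1
    won = (witness.action == "affirm") == claim_proved_true
    stake = STAKE[witness.action]
    return witness.reputation + stake if won else witness.reputation - stake
```

The key design property is symmetry of accountability: whichever side a witness takes, being wrong costs exactly what being right would have earned.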

5.3 Verification Pathways by Claim Type

Factual Claims: AI checks the claim against verified external sources. If sources clearly confirm or refute the claim, it receives a verified/false flag. If sources conflict, the system surfaces the conflict without arbitrating. Witnesses on the wrong side of a clear determination lose their stake.

Interpretive Claims: AI assesses reasoning quality (not the conclusion itself). Witnesses cluster into perspectives. There is no single ‘winner’—but stronger reasoning surfaces through the quality assessment. Reputation flows to well-reasoned positions regardless of which ‘side’ they support.

Value Claims: No verification is possible or attempted. The system enforces the ‘explain how’ requirement but does not judge whether the value is ‘right.’ Reputation is unaffected by value positions.

5.4 The Consequence Layer

On Plugg, reputation consequences compound over time, creating powerful incentives for honest participation:

Consistent false witness → Credibility collapse
Users who repeatedly affirm false claims or challenge true claims see their reputation decline significantly, affecting their standing across the platform.
Consistent accurate witness → ‘Trusted Voice’ status
Users who demonstrate reliable judgment accumulate credibility, with their witness actions carrying greater weight over time.
Pattern of manipulation → Visible warning
Users who engage in coordinated false witnessing or other manipulation tactics receive visible warnings on their profiles.
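The compounding consequences can be illustrated with a simple status function over a user's track record. All thresholds below are hypothetical placeholders, not Plugg's calibrated values.

```python
def witness_status(correct: int, incorrect: int, coordinated: bool) -> str:
    """Map a witnessing track record to a profile status.

    Thresholds are illustrative assumptions for the sketch.
    """
    if coordinated:
        return "manipulation warning"   # visible warning on profile
    total = correct + incorrect
    if total < 10:
        return "unrated"                # too little history to judge
    accuracy = correct / total
    if accuracy >= 0.9:
        return "trusted voice"          # witness actions gain weight
    if accuracy <= 0.4:
        return "credibility collapse"   # standing declines platform-wide
    return "standard"
```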

6. Technical Architecture

6.1 Claim Classification Algorithm

When a user posts a comment (having passed through the First Gate), the system must classify the claim to determine the appropriate verification pathway:

  1. Natural language processing identifies assertion statements within the comment.
  2. Each assertion is classified by type based on linguistic markers. Factual claims contain verifiable quantities, dates, or named entities. Interpretive claims contain causal language. Value claims contain evaluative language.
  3. The system assigns a confidence score to the classification. Low-confidence classifications default to interpretive (the safest category).
  4. Users may contest the classification, triggering human review for edge cases.
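The marker-based classification and confidence steps can be sketched with simple pattern lists. A production system would use a trained NLP model; the markers below are hypothetical examples chosen to handle the claims in Table 3.

```python
import re

# Hypothetical linguistic markers for each claim type.
FACTUAL = re.compile(r"\d|\bpopulation\b|\bkm\b|%", re.I)
INTERPRETIVE = re.compile(r"\b(will|because|leads to|causes?|harms?)\b", re.I)
VALUE = re.compile(r"\b(unjust|unfair|wrong|good|bad|should)\b", re.I)

def classify(assertion: str) -> tuple[str, float]:
    """Classify one assertion and attach a rough confidence score.

    Low-confidence classifications default to interpretive, the safest
    category, matching the behaviour described above.
    """
    scores = {
        "factual": len(FACTUAL.findall(assertion)),
        "interpretive": len(INTERPRETIVE.findall(assertion)),
        "value": len(VALUE.findall(assertion)),
    }
    best = max(scores, key=scores.get)
    total = sum(scores.values())
    confidence = scores[best] / total if total else 0.0
    return ("interpretive", confidence) if confidence < 0.5 else (best, confidence)
```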

6.2 Truth Signal Computation

The system computes a truth signal for each claim based on multiple factors. This signal is not a binary true/false determination but a multidimensional assessment:

Truth Signal Function

Ts = f(IC, SD, RQ, EV, CA)

IC — Independent Corroboration:
How many unconnected users have affirmed the claim? Higher independence scores more heavily.
SD — Source Diversity:
Do affirming witnesses come from diverse backgrounds, geographies, or perspectives?
RQ — Reasoning Quality:
Has the claim been explained well? Does the 'explain how' component provide substantive reasoning?
EV — External Verification:
For factual claims only — does the claim match verified external sources?
CA — Counter-argument Strength:
Has the claim survived strong challenges? A claim that withstands scrutiny scores higher than an uncontested claim.

6.3 Reputation Dynamics

User reputation updates dynamically based on their actions and outcomes:

Witnessing outcomes:
Correct witness actions increase reputation; incorrect actions decrease it. The magnitude depends on the difficulty of the judgment.
Original claim outcomes:
Users who consistently post accurate, well-reasoned claims build reputation; those who post false or poorly-reasoned claims lose it.
Time decay:
Very old actions gradually decrease in weight, allowing rehabilitation while preserving patterns.
Witness weight:
Over time, users with strong track records have their witness actions weighted more heavily — not to create an elite, but to recognise demonstrated reliability.
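The decay and weighting rules above can be sketched as follows, assuming an exponential half-life decay and a capped reliability multiplier; both the half-life and the cap are illustrative parameters, not Plugg's.

```python
def reputation(events: list[tuple[float, float]], half_life_days: float = 180.0) -> float:
    """Sum (age_in_days, reputation_delta) events with exponential time decay,
    so old actions fade gradually while behavioural patterns remain visible.
    """
    return sum(delta * 0.5 ** (age / half_life_days) for age, delta in events)

def witness_weight(accuracy: float, cap: float = 3.0) -> float:
    """Weight a witness action by the user's demonstrated accuracy, capped
    so that no single user's witness can be determinative.
    """
    return min(cap, 1.0 + 4.0 * max(0.0, accuracy - 0.5))
```

Under these assumptions, a reputation gain is worth half its original value after one half-life, and a perfectly accurate witness carries at most three times the weight of a newcomer.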

7. Safeguards and Edge Cases

Coordinated false witnessing
Bad actors might coordinate to mass-affirm false claims. The system defends against this through: (a) independence scoring that down-weights clustered accounts; (b) source diversity requirements; (c) pattern detection that flags suspicious coordination; and (d) the fundamental disincentive that all coordinating accounts lose reputation when the claim is proven false.
Expert domains
The system allows optional expertise verification, where users can credential themselves in specific domains. However, this remains optional and domain-specific — general reputation does not require credentialing.
Contested domains
Some factual claims remain genuinely contested among experts. The system handles this by surfacing the contest rather than arbitrating — showing users that authorities disagree and presenting the strongest arguments from each position.
Appeals mechanism
Users may appeal reputation consequences through a structured process that escalates to human review for significant disputes.
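Defence (a) against coordinated false witnessing, independence scoring, can be sketched as a diminishing-returns count. The `cluster_id` input is a hypothetical output of upstream graph analysis over follow and interaction ties, which this sketch assumes exists.

```python
from collections import defaultdict

def independent_corroboration(affirmers: list[tuple[str, str]]) -> float:
    """Score affirmations with diminishing returns per account cluster.

    `affirmers` is a list of (user_id, cluster_id) pairs; the n-th affirmer
    from one cluster counts 1/n, so a coordinated group adds little beyond
    its first member while genuinely independent affirmers each count fully.
    """
    seen: dict[str, int] = defaultdict(int)
    score = 0.0
    for _user, cluster in affirmers:
        seen[cluster] += 1
        score += 1.0 / seen[cluster]
    return score
```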

8. Implementation Roadmap

Phase 1 — Pilot Community (Q2 2026)

Deploy to a controlled pilot community (initially, an alumni association network) to test mechanics, gather feedback, and calibrate parameters before broader launch.

Phase 2 — Three Gates Launch (Q3 2026)

Roll out the Three Gates engagement framework to all Plugg users, with educational onboarding explaining the philosophy and mechanics.

Phase 3 — Witness Protocol Beta (Q4 2026)

Introduce witness functions with conservative stake sizes, monitoring for edge cases and manipulation attempts.

Phase 4 — Full Integration (2027)

Complete integration of Three Gates, Witness Protocol, and reputation consequences, with ongoing refinement based on data and community feedback.

9. Discussion and Ethical Considerations

The role of AI
The system explicitly avoids making AI the sole arbiter of truth. AI plays supporting roles — classifying claims, checking factual claims against sources, assessing reasoning quality — but the Witness Protocol distributes epistemic authority across the community. AI is a tool, not a judge.
Power concentration
Could 'Trusted Voice' status create a new elite? The design guards against this through: (a) transparent criteria for trust status; (b) continuous accountability (trusted users can lose status); (c) weight limits (no single user's witness can be determinative); and (d) emphasis on reasoning quality alongside reputation.
Chilling effects
Will reputation stakes discourage participation? The design mitigates this through: (a) low stakes for initial participation; (b) contextualisation as a lower-stakes option; (c) emphasis that value claims carry no reputation risk; and (d) the positive incentive of building reputation through accurate participation.
Cultural context
The Three Gates framework has broad cross-cultural resonance, with parallels in Buddhist Right Speech, Aristotelian rhetoric, and African communal discourse traditions. Implementation will remain attentive to cultural variation in speech norms.

10. Conclusion

Toxic content and misinformation have proven resistant to the reactive approaches that dominate current platform design. Plugg offers a different path: proactive architecture that embeds ethical reflection and distributed accountability into the fundamental mechanics of social engagement.

The Three Gates Model transforms engagement from thoughtless reaction to thoughtful action, mapping ancient wisdom onto modern mechanics: Truth (explain your comment), Necessity (assess your repost), Kindness (offer your like). The Witness Protocol creates distributed verification where reputation stakes ensure accountability—‘from the mouth of two or three witnesses shall the truth be established.’

Together, these systems leverage Plugg's distinctive architecture—where reputation functions as currency—to create consequences for toxic behaviour and rewards for constructive participation. The result is a platform designed not merely to connect people, but to foster a social feed characterised by truth thoughtfully shared, value wisely assessed, and kindness freely given.

Social media has shaped much of how we live in the twenty-first century. With the Three Gates Model and Witness Protocol, Plugg aims to demonstrate that we can live better—that the same technology that has amplified toxicity can, by design, amplify wisdom instead.

References

  • Aristotle. (1991). On rhetoric: A theory of civic discourse (G. A. Kennedy, Trans.). Oxford University Press.
  • Arnett, R. C., Fritz, J. M. H., & Bell, L. M. (2009). Communication ethics literacy: Dialogue and difference. SAGE Publications.
  • Duggan, M. (2017). Online harassment 2017. Pew Research Center.
  • Harvey, P. (2000). An introduction to Buddhist ethics: Foundations, values and issues. Cambridge University Press.
  • Lenhart, A., Ybarra, M., Zickuhr, K., & Price-Feeney, M. (2016). Online harassment, digital abuse, and cyberstalking in America. Data & Society Research Institute.
  • Roberts, S. T. (2019). Behind the screen: Content moderation in the shadows of social media. Yale University Press.
  • Sunstein, C. R. (2017). #Republic: Divided democracy in the age of social media. Princeton University Press.
  • Tufekci, Z. (2018). YouTube, the great radicalizer. The New York Times.
  • Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146–1151.

plugg.africa

Incubated by Cocoon Letters — The African-Centric Think-and-Do-Tank