JudgmentMeter

1. Executive Summary

The Core Problem: For decades, brands measured success via "Ranking" (SEO) and "Sentiment" (Social Listening). These metrics are obsolete in the age of AI. AI intermediaries (like ChatGPT, Amazon Rufus, Perplexity) do not "rank" links; they judge information. They do not "feel" sentiment; they evaluate evidence. Currently, brands are "flying blind," unaware of how they are being represented, cited, or excluded by the machines that now control discovery.

The Solution: JudgmentMeter is an AI Observability Dashboard. It replaces sentiment analysis with "Evidence Telemetry." It measures whether a brand is present in the AI's consideration set, whether it is cited as an authority, and whether its data shapes the final outcome.

The Goal: To move from optimizing for Visibility (Eyeballs) to optimizing for Judgment (Machine Trust). "You can’t optimize what you can’t see".

2. User Personas

The AI Visibility Strategist (Primary): Needs to prove to executives that "good content" (marketing fluff) is failing and that "structured data" is driving the brand's inclusion in AI answers.

The Risk/Compliance Officer: Needs to know if the AI is citing consumer opinions (e.g., Reddit threads) as legal facts, creating liability.

The Data Architect: Uses the tool to validate whether schema changes (e.g., implementing Tri-Layer) improved the brand's "legibility" to external agents.

3. Core Value Proposition

Beyond Sentiment: Replaces "Do users like us?" with "Is our data usable?".

The New Funnel: Tracks the AI decision funnel: Retrieval (Presence) → Evaluation (Citation) → Generation (Influence).

Traceability: Provides the "Evidence Chain" to explain why an AI recommended a competitor over the client (e.g., "Competitor data was structured; yours was ambiguous").

4. Functional Requirements

4.1. Metric 1: Retrieval Inclusion (Presence)

Definition: "Are you included at all in the AI’s consideration set?".

Requirement: The system must simulate user intents (prompts) across major AI intermediaries (ChatGPT, Claude, Gemini, Rufus) and analyze the initial retrieval context.

Failure State Detection: The tool must identify "Silent Exclusion"—cases where the brand is relevant but never retrieved due to poor ontology or lack of specific keywords in the vector space.
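A minimal sketch of how such a presence probe could work, assuming each intermediary is wrapped in a retrieval callable; the `assistants` hooks and `brand_aliases` set are illustrative assumptions, not a defined API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PresenceResult:
    prompt: str
    assistant: str
    brand_retrieved: bool

def probe_presence(
    prompts: list[str],
    assistants: dict[str, Callable[[str], list[str]]],  # name -> retrieval hook
    brand_aliases: set[str],
) -> list[PresenceResult]:
    """Run each simulated intent through each assistant's retrieval layer and
    record whether any brand alias appears in the retrieved context at all."""
    results = []
    for prompt in prompts:
        for name, retrieve in assistants.items():
            passages = retrieve(prompt)  # the AI's initial consideration set
            hit = any(
                alias.lower() in passage.lower()
                for passage in passages
                for alias in brand_aliases
            )
            results.append(PresenceResult(prompt, name, hit))
    return results

def retrieval_inclusion(results: list[PresenceResult]) -> float:
    """Fraction of (prompt, assistant) probes where the brand was retrieved;
    a relevant prompt with no hit anywhere is a "Silent Exclusion" candidate."""
    return sum(r.brand_retrieved for r in results) / len(results) if results else 0.0
```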

4.2. Metric 2: Citation Authority (The "Context vs. Evidence" Classifier)

Definition: "When you are included, are you treated as an authority or as an example?".

Requirement: A classification engine that analyzes how the brand is mentioned in the AI output.

Classification States:

    ◦ Primary Evidence: The brand is the source of the fact (e.g., "According to [Brand] policy...").

    ◦ Context/Example: The brand is listed as an option but not the source of truth.

    ◦ Anecdote: The brand is mentioned only via third-party reviews (High Risk).

Rationale: "Being cited as evidence is fundamentally different from being summarized as context".
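As a sketch of the three classification states, a rule-based pass over the answer text might look like the following; the cue lists are illustrative placeholders, and a production classifier would more plausibly be an LLM judge or a fine-tuned model:

```python
import re

# Illustrative cue lists only; not an exhaustive taxonomy.
EVIDENCE_CUES = [r"according to {b}", r"{b} (states|specifies|confirms)", r"per {b}"]
ANECDOTE_CUES = [r"reviewers?", r"users? (say|report|mention)", r"reddit", r"one customer"]

def classify_citation(answer: str, brand: str) -> str:
    """Label how one AI answer treats the brand:
    'primary_evidence' | 'anecdote' | 'context' | 'absent'."""
    b = re.escape(brand.lower())
    sentences = [s.lower() for s in re.split(r"(?<=[.!?])\s+", answer)]
    brand_sents = [s for s in sentences if re.search(rf"\b{b}\b", s)]
    if not brand_sents:
        return "absent"
    for s in brand_sents:
        if any(re.search(cue.format(b=b), s) for cue in EVIDENCE_CUES):
            return "primary_evidence"  # the brand is the source of the fact
    for s in brand_sents:
        if any(re.search(cue, s) for cue in ANECDOTE_CUES):
            return "anecdote"          # mentioned only via third-party opinion
    return "context"                   # listed as an option, not the source of truth

print(classify_citation("According to Acme policy, returns take 30 days.", "Acme"))
# -> primary_evidence
```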

4.3. Metric 3: Outcome Influence (The Counterfactual Engine)

Definition: "Does your information affect the outcome?".

Requirement: The system must measure "Share of Judgment."

Logic: If the brand's data node is removed from the context window, does the AI's advice change? If yes, the brand has High Influence. If no, the brand is "decorative".
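A sketch of that ablation logic, assuming two hooks that are not specified here: `generate` calls the target model with a prompt plus context, and `same_recommendation` compares two answers for an equivalent final recommendation (e.g., via an LLM judge):

```python
from typing import Callable, Sequence

def outcome_influence(
    prompt: str,
    context: Sequence[str],
    brand_node_ids: set[int],                       # indices of the brand's data nodes
    generate: Callable[[str, Sequence[str]], str],  # model call: (prompt, docs) -> answer
    same_recommendation: Callable[[str, str], bool],
) -> str:
    """Counterfactual test: regenerate the answer without the brand's data
    nodes and check whether the final recommendation changes."""
    baseline = generate(prompt, context)
    ablated_context = [d for i, d in enumerate(context) if i not in brand_node_ids]
    counterfactual = generate(prompt, ablated_context)
    if same_recommendation(baseline, counterfactual):
        return "decorative"       # answer unchanged: the brand's data did not matter
    return "high_influence"       # answer changed: the brand's data shaped the outcome
```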

4.4. The Lab: Evidence Chain Mapping

Requirement: A visualization tool that reverse-engineers a specific AI response.

Feature: Traceability Graph. It maps the final assertion (e.g., "This product is safe") back to the specific input source (e.g., "Clinical Trial PDF v2").

Value: "Systems that cannot explain how they arrived at a decision are not intelligent—they are indefensible". This helps identify when an AI is hallucinating or relying on outdated data.
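A minimal data structure for the Traceability Graph, using the spec's own example; this is a sketch of the concept, not the Lab's actual implementation:

```python
from collections import defaultdict

class EvidenceChain:
    """Directed edges from final assertions back to the input sources
    that support them."""
    def __init__(self) -> None:
        self.sources: dict[str, list[str]] = defaultdict(list)

    def link(self, assertion: str, source: str) -> None:
        self.sources[assertion].append(source)

    def trace(self, assertion: str) -> list[str]:
        """All sources behind an assertion; an empty result flags a claim
        with no traceable input, i.e., a hallucination candidate."""
        return self.sources.get(assertion, [])

chain = EvidenceChain()
chain.link("This product is safe", "Clinical Trial PDF v2")
print(chain.trace("This product is safe"))   # ['Clinical Trial PDF v2']
print(chain.trace("Safe for electronics"))   # [] -> unsupported, indefensible claim
```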

5. Technical Architecture & Dashboard

Input Sources:

• Simulated Agents (emulating Amazon Rufus, ChatGPT, etc.).

• Internal Search Logs (if available).

• OntoGraph and Tri-Layer data structures (to validate whether structured data is being picked up).

Dashboard Modules:

1. The Judgment Score: A composite score (0-100) combining Presence, Citation, and Influence (see the sketch after this list).

2. The Hallucination Risk Monitor: Flags instances where the AI attributes a claim to the brand that does not exist in the "Authoritative Knowledge" layer.

3. The "Why You Lost" Inspector: Analyzes competitor wins. (e.g., "Competitor X won because they provided a 'Confidence Score' and you provided a 'Marketing Slogan'").

6. Use Cases

Use Case A: The "Invisible" Best-Seller

Scenario: A brand has the best-selling product on Amazon but is never recommended by Amazon Rufus for "best for sensitive skin."

JudgmentMeter Analysis: The "Retrieval Inclusion" metric is 0%. The Lab reveals that the product description uses the term "Gentle" (ambiguous) while the AI looks for "Hypoallergenic" (specific).

Action: The brand updates the OntoGraph to map "Gentle" to "Hypoallergenic." JudgmentMeter tracks the subsequent rise in retrieval.
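The OntoGraph API itself is out of scope here, so the following sketch shows the intent of the fix as plain data; only the "Gentle" to "Hypoallergenic" pairing comes from the use case, the other alias is illustrative:

```python
# Hypothetical alias map standing in for the OntoGraph update.
ONTOGRAPH_ALIASES = {
    "Gentle": ["Hypoallergenic", "For sensitive skin"],
}

def expand_terms(description_terms: list[str]) -> set[str]:
    """Expand ambiguous marketing terms into the specific attributes
    the AI's retrieval layer actually matches on."""
    expanded = set(description_terms)
    for term in description_terms:
        expanded.update(ONTOGRAPH_ALIASES.get(term, []))
    return expanded

print(expand_terms(["Gentle"]))
# {'Gentle', 'Hypoallergenic', 'For sensitive skin'} (set order may vary)
```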

Use Case B: The Liability Trap

Scenario: An AI assistant recommends a cleaning product for "cleaning electronics." The brand does not support this use case.

JudgmentMeter Analysis: The "Citation Authority" metric flags a "High Risk" citation. The AI is citing a user review (Opinion Layer) as usage instruction (Fact Layer).

Action: The brand uses Tri-Layer to explicitly tag that review as "Anecdotal" and publishes a "Do Not Infer" rule via the OntoGraph API.
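Neither the Tri-Layer schema nor the OntoGraph API is specified in this document, so a hypothetical "Do Not Infer" payload might look like this:

```python
# Hypothetical payload; all field names are illustrative, not a published schema.
do_not_infer_rule = {
    "claim": "suitable for cleaning electronics",
    "source_layer": "opinion",            # the review lives in the Opinion layer
    "evidence_class": "anecdotal",        # explicitly tagged as anecdote
    "inference_policy": "do_not_infer",   # agents must not promote it to a fact
}
```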

7. Success Metrics (KPIs)

1. Share of Judgment: The percentage of relevant queries where the brand is the "Recommended Action" (not just a link in the list).

2. Evidence Utilization Rate: How often the brand's structured data (e.g., ingredients, return policy) is quoted verbatim versus summarized or hallucinated (see the sketch after this list).

3. Correction Speed: Time taken to detect and fix a "Drift" event where the AI starts citing outdated information.
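A sketch of how the first two KPIs could be rolled up from per-query audits; the audit fields are assumptions, and Correction Speed is event-based, so it is tracked separately:

```python
from dataclasses import dataclass

@dataclass
class QueryAudit:
    brand_present: bool     # brand appeared in the answer at all
    recommended: bool       # brand was the "Recommended Action"
    verbatim_quote: bool    # structured data was quoted verbatim

def rollup(audits: list[QueryAudit]) -> dict[str, float]:
    """Aggregate per-query audits into Share of Judgment and
    Evidence Utilization Rate."""
    if not audits:
        return {"share_of_judgment": 0.0, "evidence_utilization_rate": 0.0}
    cited = [a for a in audits if a.brand_present]
    return {
        "share_of_judgment": sum(a.recommended for a in audits) / len(audits),
        "evidence_utilization_rate": (
            sum(a.verbatim_quote for a in cited) / len(cited) if cited else 0.0
        ),
    }
```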

8. Roadmap

Phase 1 (The Mirror): Ability to input a prompt and see the "Evidence Chain" for a single AI response.

Phase 2 (The Monitor): Automated daily tracking of "Presence" and "Citation" for top 50 brand keywords.

Phase 3 (The Interactor): Integration with TruthCalibrate to automatically suggest confidence adjustments when "Outcome Influence" drops.