Oracle risk

The public polymarket-oracle-risk package scores Polymarket markets for manipulation risk, from 0 (resolution tied to an objective, named source) to ~1 (resolution by pure moderator judgment). It supplies the input to the refusal gate in K-Fish trading, and it also stands alone on PyPI as a PhD-adjacent research contribution.

Feature vector

Thirteen features (blueprint §2.3), in a stable column order:

| # | Feature | Signal |
|---|---------|--------|
| 1 | `liveness_hours` | Shorter = less review time |
| 2 | `log1p_bond_usdc` | Smaller bond = thinner deterrent |
| 3 | `subjectivity_score` | In [0, 1], LLM-rated |
| 4 | `resolution_source_concreteness` | 1.0 = named authoritative source |
| 5 | `hhi_uma_top10` | Herfindahl index of the UMA top-10 voters |
| 6 | `log1p_market_volume_usd` | Attack incentive |
| 7 | `log1p_max_single_wallet_position_usd` | Whale flag |
| 8 | `price_distance_from_extremes` | `min(p, 1-p)`; higher = more contested |
| 9 | `hours_to_resolution` | Last-minute surprise window |
| 10 | `price_moved_pct_24h` | Late divergence from consensus |
| 11 | `proposer_whitelisted` | MOOv2 post-UMIP-189 flag |
| 12 | `similar_market_dispute_rate` | Per-category dispute rate, beta-binomial shrunk |
| 13 | `llm_grok_disagrees_with_market` | Inter-model disagreement |
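A minimal sketch of how the table's transforms could be assembled into one vector in the stable column order. Only the column names and the `log1p`, `min(p, 1-p)`, Herfindahl, and beta-binomial ideas come from the table; the raw input field names (`price`, `bond_usdc`, `top10_voter_weights`, `category_disputes`, ...), the helper names, and the Beta(1, 49) prior are hypothetical and are not the API of `features.py`.

```python
import math
from typing import Any, Mapping, Sequence

# Column order from the feature table; fitted coefficients assume this order.
FEATURE_COLUMNS = (
    "liveness_hours",
    "log1p_bond_usdc",
    "subjectivity_score",
    "resolution_source_concreteness",
    "hhi_uma_top10",
    "log1p_market_volume_usd",
    "log1p_max_single_wallet_position_usd",
    "price_distance_from_extremes",
    "hours_to_resolution",
    "price_moved_pct_24h",
    "proposer_whitelisted",
    "similar_market_dispute_rate",
    "llm_grok_disagrees_with_market",
)


def herfindahl(weights: Sequence[float]) -> float:
    """Sum of squared shares: 1/n for n equal voters, 1.0 for a single voter."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum((w / total) ** 2 for w in weights)


def beta_binomial_shrink(disputes: int, markets: int, a: float = 1.0, b: float = 49.0) -> float:
    """Posterior-mean dispute rate under a Beta(a, b) prior.

    The default Beta(1, 49) (prior mean 0.02) is a hypothetical choice: sparse
    categories are pulled toward the prior mean, busy ones toward their own rate.
    """
    return (disputes + a) / (markets + a + b)


def build_feature_vector(raw: Mapping[str, Any]) -> list[float]:
    """Map hypothetical raw fields to the 13 model features, in column order."""
    p = raw["price"]  # current YES price in [0, 1]
    values = {
        "liveness_hours": raw["liveness_hours"],
        "log1p_bond_usdc": math.log1p(raw["bond_usdc"]),
        "subjectivity_score": raw["subjectivity_score"],
        "resolution_source_concreteness": raw["resolution_source_concreteness"],
        "hhi_uma_top10": herfindahl(raw["top10_voter_weights"]),
        "log1p_market_volume_usd": math.log1p(raw["market_volume_usd"]),
        "log1p_max_single_wallet_position_usd": math.log1p(raw["max_wallet_position_usd"]),
        "price_distance_from_extremes": min(p, 1.0 - p),
        "hours_to_resolution": raw["hours_to_resolution"],
        "price_moved_pct_24h": raw["price_moved_pct_24h"],
        "proposer_whitelisted": 1.0 if raw["proposer_whitelisted"] else 0.0,
        "similar_market_dispute_rate": beta_binomial_shrink(
            raw["category_disputes"], raw["category_markets"]
        ),
        "llm_grok_disagrees_with_market": raw["llm_grok_disagrees_with_market"],
    }
    # Emit values by iterating FEATURE_COLUMNS so the order is explicit.
    return [float(values[name]) for name in FEATURE_COLUMNS]
```

Iterating `FEATURE_COLUMNS` rather than the dict makes the column order a single, explicit contract between feature construction and the fitted model.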

Posterior fit

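Bayesian logistic regression, sampled with NumPyro's NUTS. The priors are set for a low dispute base rate of roughly 2%: the intercept prior centers the baseline dispute probability (at x = 0) on σ(-3) ≈ 4.7%, and its spread comfortably covers 2% (logit(0.02) ≈ -3.9):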

\[ \beta \sim \mathcal{N}(0, I),\ \alpha \sim \mathcal{N}(-3, 2),\ \Pr(\text{disputed}\mid x) = \sigma(\alpha + \beta^\top x) \]
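To see what the intercept prior implies before any data arrive, one can push samples of α through the sigmoid, a prior-predictive check at x = 0. This stdlib-only sketch assumes the second argument of N(-3, 2) is a standard deviation (the notation could also be read as a variance); `prior_baseline_rates` is a hypothetical helper, not part of `train.py`.

```python
import math
import random
import statistics


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def prior_baseline_rates(
    n: int = 20_000, loc: float = -3.0, scale: float = 2.0, seed: int = 0
) -> list[float]:
    """Draw alpha ~ Normal(loc, scale); return sigmoid(alpha), the x = 0 dispute probability."""
    rng = random.Random(seed)
    return [sigmoid(rng.gauss(loc, scale)) for _ in range(n)]


rates = prior_baseline_rates()
print(f"prior median baseline rate: {statistics.median(rates):.3f}")
print(f"prior mean baseline rate:   {statistics.fmean(rates):.3f}")
print(f"P(rate <= 0.02):            {sum(r <= 0.02 for r in rates) / len(rates):.2f}")
```

Because the sigmoid is monotone, the median lands at σ(-3) ≈ 0.047; the mean sits well above it because the right tail of α reaches large probabilities, so the prior reads as "low and uncertain" rather than pinned at 2%.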

With 13 features and fewer than 50 high-profile dispute observations, the posterior is wide by design. `predict_one()` returns the posterior mean plus a 95% credible interval; callers should report the interval, not the point estimate.

Zelenskyy regression test

The canonical failure mode of a subjectivity scorer is letting vague "consensus of credible reporting" language through as if it named an objective source. The Zelenskyy-suit market (May–July 2025) is the archetypal case, and every release of polymarket-oracle-risk must score its resolution text ≥ 0.75:

from polymarket_oracle_risk.subjectivity import (
    ZELENSKYY_RESOLUTION_TEXT, rule_based_score
)
assert rule_based_score(ZELENSKYY_RESOLUTION_TEXT).score >= 0.75

The test lives at `tests/test_subjectivity_zelenskyy.py` and gates every PR.
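For intuition about what this regression test guards against, here is a toy phrase-matching scorer. The phrase lists, weights, and the `toy_subjectivity_score` name are hypothetical and are not the package's `rule_based_score`; they only illustrate why "consensus of credible reporting" should push a score up while a named, official source should pull it down.

```python
import re

# Hypothetical phrase lists; a real scorer would need far broader coverage.
VAGUE_PATTERNS = (
    r"consensus of credible reporting",
    r"credible report(?:s|ing)",
    r"widely reported",
    r"at the (?:sole )?discretion of",
)
CONCRETE_PATTERNS = (
    r"official (?:website|data|results)",
    r"according to the official",
    r"as published by",
)


def toy_subjectivity_score(text: str) -> float:
    """Return a score in [0, 1]; higher means more room for judgment calls."""
    lowered = text.lower()
    vague = sum(bool(re.search(p, lowered)) for p in VAGUE_PATTERNS)
    concrete = sum(bool(re.search(p, lowered)) for p in CONCRETE_PATTERNS)
    # Start neutral, add for each vague cue, subtract for each concrete cue.
    score = 0.5 + 0.2 * vague - 0.25 * concrete
    return max(0.0, min(1.0, score))
```

Phrase matching like this misses paraphrases, which is the kind of gap that an LLM path (as in `subjectivity.py`) and a pinned regression case help cover.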

Refusal gate

\[ \text{action}(p, \text{risk}) = \begin{cases} \textbf{refuse} & \text{risk.mean} > 0.35 \\ \textbf{refuse} & \text{risk.width} > 0.50 \\ \text{size} \times \text{cap}(\text{risk.mean}) & \text{otherwise} \end{cases} \]

| `risk.mean` | Position cap (fraction of unconstrained Kelly size) |
|---|---|
| ≤ 0.15 | 100% |
| > 0.15 and ≤ 0.25 | 30% |
| > 0.25 and ≤ 0.35 | 10% |
| > 0.35 | refuse |

The width guard is a CVaR-style second gate: a wide posterior means the point estimate can't be trusted, so the gate refuses even when the mean is below the hard threshold. In backtesting, this guard caught the Zelenskyy-class case.
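A minimal sketch of the gate as specified by the equation and cap table above. It assumes `risk.width` is the width of the 95% credible interval and that `size` is the unconstrained Kelly position; `Decision`, `refusal_gate`, and the argument list are hypothetical, not the API of `refusal_gate.py`.

```python
from typing import NamedTuple

MAX_MEAN = 0.35   # refuse above this posterior-mean risk
MAX_WIDTH = 0.50  # refuse above this credible-interval width
# (upper bound on risk.mean, fraction of unconstrained Kelly size), checked in order.
CAP_SCHEDULE = ((0.15, 1.00), (0.25, 0.30), (0.35, 0.10))


class Decision(NamedTuple):
    refuse: bool
    size: float
    reason: str


def refusal_gate(kelly_size: float, risk_mean: float, risk_width: float) -> Decision:
    # Gate 1: hard threshold on the posterior mean.
    if risk_mean > MAX_MEAN:
        return Decision(True, 0.0, f"risk.mean {risk_mean:.2f} > {MAX_MEAN}")
    # Gate 2: uncertainty guard; a wide interval refuses even at low mean risk.
    if risk_width > MAX_WIDTH:
        return Decision(True, 0.0, f"risk.width {risk_width:.2f} > {MAX_WIDTH}")
    # Otherwise scale the Kelly size by the first band that contains the mean.
    for bound, cap in CAP_SCHEDULE:
        if risk_mean <= bound:
            return Decision(False, kelly_size * cap, f"cap {cap:.0%}")
    raise AssertionError("unreachable: risk_mean <= MAX_MEAN is covered by the schedule")
```

For example, `refusal_gate(100.0, 0.05, 0.60)` refuses through the width guard even though a mean of 0.05 sits in the 100% band; the order of the two refusal checks only changes the reason string.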

Honest limitations

Fewer than 50 high-profile disputes → wide posteriors.
Managed Proposer regime change (UMIP-189, Aug 2025) → training data must be segmented at the regime boundary; pre-change dispute rates don't apply.
HIP-4 has no dispute history yet → scores there are low-confidence until mainnet accumulates samples.
UMA top-10 concentration is community-derived, not officially published → re-verify it against the voting-v2 subgraph before relying on it as a load-bearing feature.
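The regime-change item implies splitting historical dispute records at the UMIP-189 cutover and estimating rates only within the current regime. A minimal sketch, assuming each record carries a resolution date and a disputed flag; `DisputeRecord`, `split_by_regime`, and `dispute_rate` are hypothetical, and the cutover is a required argument because this page gives only the month.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DisputeRecord:  # hypothetical record shape
    market_id: str
    resolved_on: date
    disputed: bool


def split_by_regime(
    records: Iterable[DisputeRecord], cutover: date
) -> Tuple[List[DisputeRecord], List[DisputeRecord]]:
    """Split into (pre, post); records resolved on or after `cutover` are post-regime."""
    pre: List[DisputeRecord] = []
    post: List[DisputeRecord] = []
    for record in records:
        (post if record.resolved_on >= cutover else pre).append(record)
    return pre, post


def dispute_rate(records: List[DisputeRecord]) -> Optional[float]:
    """Raw dispute rate within one regime; None if the regime has no data yet."""
    if not records:
        return None
    return sum(r.disputed for r in records) / len(records)
```

Only `post` would then feed dispute-rate features such as `similar_market_dispute_rate`, with the actual UMIP-189 activation date passed as `cutover`. A `None` rate (the HIP-4 situation, with no history yet) should read as "no evidence", not as zero risk.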

Where this lives

`ksk5429/polymarket-oracle-risk`: the public repo (MIT)
`src/polymarket_oracle_risk/features.py`: feature construction
`src/polymarket_oracle_risk/train.py`: NumPyro model + FitSummary pickle
`src/polymarket_oracle_risk/scorer.py`: PosteriorModel + RiskScore
`src/polymarket_oracle_risk/subjectivity.py`: rule-based + LLM path
`src/polymarket_oracle_risk/refusal_gate.py`: the gate