PRISM Consensus Engine
QoreChain embeds PRISM (Policy-driven Reinforcement-learning for Intelligent State Machines), a reinforcement-learning optimization layer, directly into the consensus layer via the x/rlconsensus module. PRISM observes chain metrics every N blocks, runs inference through a fixed-point neural network, and proposes consensus parameter adjustments — all deterministically, with no floating-point arithmetic in consensus-critical paths.
The PRISM optimization loop: observe chain state, run policy inference, clamp and apply parameter changes, then feed the result back.
Architecture Overview
PRISM consists of four components:
- Observation Collector — Gathers 25-dimensional chain state vectors at configurable intervals.
- Policy Network (MLP) — A Go-native multi-layer perceptron that maps observations to actions.
- Reward Computer — Evaluates the quality of parameter changes using a weighted multi-objective function.
- Circuit Breaker — Monitors chain health and reverts all PRISM-tuned parameters if instability is detected.
All components operate within the ABCI lifecycle and produce deterministic, verifiable outputs across all validator nodes.
Policy Network
The policy network is a feedforward multi-layer perceptron (MLP) implemented entirely in Go with int64 fixed-point arithmetic (scaled by 10^8).
Network Architecture
| Property | Value |
|---|---|
| Input dimensions | 25 |
| Hidden layers | 2 |
| Hidden layer sizes | 256, 256 |
| Output dimensions | 5 |
| Activation (hidden) | ReLU |
| Activation (output) | tanh |
| Total parameters | 73,733 |
| Precision | int64 fixed-point (scaled by 10^8) |
Parameter Count Breakdown
Layer 1: 25 * 256 + 256 = 6,656 (input -> hidden_1)
Layer 2: 256 * 256 + 256 = 65,792 (hidden_1 -> hidden_2)
Layer 3: 256 * 5 + 5 = 1,285 (hidden_2 -> output)
Total: 73,733
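The breakdown above can be re-derived mechanically. The following is a minimal sketch, not module code: `paramCount` is a hypothetical helper that sums weights and biases for a fully connected network given its layer widths. A check like this could also be used to validate the length of a flattened weight vector before loading it.

```go
package main

import "fmt"

// paramCount returns the number of weights plus biases in a fully
// connected network whose layer widths are listed in order.
// (Hypothetical helper, used here only to check the table's total.)
func paramCount(sizes []int) int {
	total := 0
	for i := 0; i+1 < len(sizes); i++ {
		// weights (in * out) plus one bias per output unit
		total += sizes[i]*sizes[i+1] + sizes[i+1]
	}
	return total
}

func main() {
	fmt.Println(paramCount([]int{25, 256, 256, 5})) // prints 73733
}
```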
Fixed-Point Arithmetic
All MLP computations use int64 values scaled by FixedPointScale = 10^8. This eliminates non-determinism from IEEE 754 floating-point rounding differences across hardware platforms.
- Multiplication: fixMul(a, b) = (a / SCALE) * b + (a % SCALE) * b / SCALE (split to prevent overflow)
- ReLU: relu(x) = max(0, x)
- tanh: Padé approximant tanh(x) ~ x * (3*S - x^2) / (3*S + x^2) for |x| <= 2.5*SCALE, clamped to +/- SCALE otherwise
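The split multiplication and ReLU described above can be sketched in a few lines of Go. This is an illustration built from the formulas in the text, not the module's source. The function names follow the text, and the constant name `FixedPointScale` is taken from it.

```go
package main

import "fmt"

// FixedPointScale is the 10^8 scale factor described in the text.
const FixedPointScale int64 = 100_000_000

// fixMul multiplies two fixed-point values using the split form:
// the whole part of a and its remainder are multiplied by b
// separately, so the full product a*b is never formed.
func fixMul(a, b int64) int64 {
	return (a/FixedPointScale)*b + (a%FixedPointScale)*b/FixedPointScale
}

// relu returns max(0, x).
func relu(x int64) int64 {
	if x < 0 {
		return 0
	}
	return x
}

func main() {
	a := 3 * FixedPointScale / 2 // 1.5
	b := -2 * FixedPointScale    // -2.0
	fmt.Println(fixMul(a, b))       // prints -300000000 (i.e. -3.0)
	fmt.Println(relu(fixMul(a, b))) // prints 0

	// A naive a*b/SCALE would overflow int64 here (the product is 10^22);
	// the split form does not.
	big := 1000 * FixedPointScale
	fmt.Println(fixMul(big, big)) // prints 100000000000000 (i.e. 1,000,000.0)
}
```

Note that the remainder term `(a % SCALE) * b` can itself still overflow once the raw magnitude of `b` exceeds roughly 9.2 * 10^10 (a value of about 922), so the split widens the safe range rather than removing the limit.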
Policy weights are stored on-chain as a flattened []int64 vector and can be updated via governance proposal.
Observation Vector
PRISM collects a 25-dimensional observation vector at each observation interval (default: every 10 blocks).
| Index | Dimension | Description |
|---|---|---|
| 0 | block_utilization | Block gas used / block gas limit |
| 1 | tx_count | Number of transactions in the block |
| 2 | avg_tx_size | Mean transaction size in bytes |
| 3 | block_time | Time since previous block (ms) |
| 4 | block_time_delta | Block time minus target block time (ms) |
| 5 | gas_price_50th | Median gas price |
| 6 | gas_price_95th | 95th-percentile gas price |
| 7 | mempool_size | Number of pending transactions |
| 8 | mempool_bytes | Total bytes of pending transactions |
| 9 | validator_count | Active validator count |
| 10 | validator_gini | Gini coefficient of validator power distribution |
| 11 | missed_block_ratio | Fraction of validators that missed signing |
| 12 | avg_commit_latency | Average commit round latency (ms) |
| 13 | max_commit_latency | Maximum commit round latency (ms) |
| 14 | precommit_ratio | Fraction of precommits received |
| 15 | failed_tx_ratio | Fraction of failed transactions |
| 16 | avg_gas_per_tx | Mean gas consumed per transaction |
| 17 | reward_per_validator | Mean reward per validator (uqor) |
| 18 | slash_count | Number of slashing events in observation window |
| 19 | jail_count | Number of jail events in observation window |
| 20 | inflation_rate | Current emission rate |
| 21 | bonded_ratio | Bonded tokens / total supply |
| 22 | reputation_mean | Mean reputation score across active validators |
| 23 | reputation_stddev | Standard deviation of reputation scores |
| 24 | mev_estimate | Estimated MEV extracted (heuristic) |
All values are stored as LegacyDec string representations and converted to int64 fixed-point before inference.
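The conversion step could look roughly like the sketch below. It uses only the standard library and parses the decimal string directly; the module itself presumably works with LegacyDec values, and its actual conversion routine is not shown in this document. The function name `decStringToFixed` and the choice to truncate (rather than round) digits past the eighth decimal place are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// fracDigits is the number of decimal places kept at scale 10^8.
const fracDigits = 8

// decStringToFixed converts a decimal string such as "0.75" or
// "-12.500000000000000000" to int64 fixed-point at scale 10^8.
// Digits past the eighth decimal place are dropped without rounding
// or inspection (an assumption made for this sketch).
func decStringToFixed(s string) (int64, error) {
	neg := strings.HasPrefix(s, "-")
	body := strings.TrimPrefix(s, "-")
	intPart, fracPart, _ := strings.Cut(body, ".")
	if intPart == "" && fracPart == "" {
		return 0, errors.New("empty decimal string")
	}
	if len(fracPart) > fracDigits {
		fracPart = fracPart[:fracDigits]
	}
	// Right-pad the fraction so it always has exactly eight digits.
	fracPart += strings.Repeat("0", fracDigits-len(fracPart))
	// ParseUint rejects signs and non-digits; bitSize 63 keeps the
	// result within the positive int64 range.
	u, err := strconv.ParseUint(intPart+fracPart, 10, 63)
	if err != nil {
		return 0, fmt.Errorf("parse %q: %w", s, err)
	}
	v := int64(u)
	if neg {
		v = -v
	}
	return v, nil
}

func main() {
	for _, s := range []string{"0.750000000000000000", "-12.5", "100"} {
		v, err := decStringToFixed(s)
		fmt.Println(s, v, err)
	}
}
```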
Action Space
The MLP output is a 5-dimensional action vector, where each dimension represents a proposed change to a consensus parameter. The tanh activation constrains raw outputs to [-1, 1], which are then scaled by mode-specific bounds.
| Index | Action Dimension | Description |
|---|---|---|
| 0 | block_time_delta | Proposed change to target block time (ms) |
| 1 | gas_price_delta | Proposed change to base gas price |
| 2 | validator_set_size_delta | Proposed change to target validator set size (logged only, not applied) |
| 3 | pool_weight_rpos_delta | Proposed change to RPoS pool priority weight |
| 4 | pool_weight_dpos_delta | Proposed change to DPoS pool priority weight |
Actions are clamped to the maximum change bounds defined by the current PRISM mode before application.
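As a rough illustration of the scale-and-clamp step, the sketch below turns a raw policy output into an absolute parameter delta. It assumes the mode bound is a fraction of the parameter's current value (so a 10% bound on a 5000 ms block time allows at most a 500 ms change); the document does not spell out this interpretation, and `scaleAction` and `clamp` are hypothetical names.

```go
package main

import "fmt"

const FixedPointScale int64 = 100_000_000

// fixMul is the split fixed-point multiplication from the text.
func fixMul(a, b int64) int64 {
	return (a/FixedPointScale)*b + (a%FixedPointScale)*b/FixedPointScale
}

// clamp limits v to the closed range [lo, hi].
func clamp(v, lo, hi int64) int64 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// scaleAction converts a raw policy output into a parameter delta.
//   raw       fixed-point policy output, expected in [-SCALE, SCALE]
//   maxChange fixed-point fraction for the current mode (0.10 = 10_000_000)
//   current   current parameter value in its own integer units (e.g. ms)
// Assumption: the bound is relative to the current parameter value.
func scaleAction(raw, maxChange, current int64) int64 {
	raw = clamp(raw, -FixedPointScale, FixedPointScale)
	fraction := fixMul(raw, maxChange) // signed fraction of current value
	return fixMul(fraction, current)   // back to the parameter's units
}

func main() {
	conservative := FixedPointScale / 10 // 0.10

	fmt.Println(scaleAction(FixedPointScale, conservative, 5000))    // prints 500
	fmt.Println(scaleAction(-FixedPointScale/2, conservative, 5000)) // prints -250
	fmt.Println(scaleAction(FixedPointScale, 0, 5000))               // prints 0 (Shadow or Paused)
}
```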
Reward Function
The reward signal evaluates how well recent parameter changes improved chain performance. It is computed as a weighted sum of five objectives:
R = 0.30 * delta_throughput
+ 0.25 * delta_finality
+ 0.20 * delta_decentralization
- 0.15 * mev_estimate
- 0.10 * failed_tx_ratio
| Component | Weight | Direction | Source Metric |
|---|---|---|---|
| Throughput | +0.30 | Maximize | Change in block utilization |
| Finality | +0.25 | Maximize | Change in precommit ratio |
| Decentralization | +0.20 | Maximize | Negative change in validator Gini coefficient |
| MEV | -0.15 | Minimize | Current MEV estimate |
| Failed Transactions | -0.10 | Minimize | Current failed transaction ratio |
The reward weights are governance-configurable and must sum to exactly 1.0.
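A minimal sketch of the weighted sum and the sum-to-one check, in the same int64 fixed-point convention, is shown below. The type `RewardWeights`, its field names, and the input values in `main` are made up for illustration; only the weights and the formula come from the text.

```go
package main

import (
	"errors"
	"fmt"
)

const FixedPointScale int64 = 100_000_000

// fixMul is the split fixed-point multiplication from the text.
func fixMul(a, b int64) int64 {
	return (a/FixedPointScale)*b + (a%FixedPointScale)*b/FixedPointScale
}

// RewardWeights holds the five weights as fixed-point values.
// (Hypothetical type; field names are not taken from the module.)
type RewardWeights struct {
	Throughput, Finality, Decentralization, MEV, FailedTxs int64
}

// Validate enforces the rule that the weights sum to exactly 1.0.
func (w RewardWeights) Validate() error {
	sum := w.Throughput + w.Finality + w.Decentralization + w.MEV + w.FailedTxs
	if sum != FixedPointScale {
		return errors.New("reward weights must sum to exactly 1.0")
	}
	return nil
}

// reward computes R: three weighted improvements minus two weighted penalties.
func reward(w RewardWeights, dThroughput, dFinality, dDecentralization, mev, failedRatio int64) int64 {
	return fixMul(w.Throughput, dThroughput) +
		fixMul(w.Finality, dFinality) +
		fixMul(w.Decentralization, dDecentralization) -
		fixMul(w.MEV, mev) -
		fixMul(w.FailedTxs, failedRatio)
}

func main() {
	w := RewardWeights{30_000_000, 25_000_000, 20_000_000, 15_000_000, 10_000_000}
	fmt.Println(w.Validate()) // prints <nil>

	// Illustrative inputs: +0.10 throughput, +0.04 finality, no change in
	// decentralization, MEV estimate 0.02, failed-tx ratio 0.05.
	r := reward(w, 10_000_000, 4_000_000, 0, 2_000_000, 5_000_000)
	fmt.Println(r) // prints 3200000 (i.e. 0.032)
}
```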
PRISM Modes
PRISM operates in one of four modes, controllable via governance:
| Mode | ID | Max Change | Behavior |
|---|---|---|---|
| Shadow | 0 | 0% | Observe and log recommendations only. No parameters are changed. This is the default mode. |
| Conservative | 1 | +/- 10% | Apply parameter changes within tight bounds. Suitable for initial live deployment. |
| Autonomous | 2 | +/- 25% | Apply parameter changes within wider bounds. For mature networks with validated policies. |
| Paused | 3 | 0% | PRISM is completely idle. No observations are collected and no inference runs. |
Mode transitions require a governance proposal. The recommended deployment path is: Shadow → Conservative → Autonomous.
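The mode table maps naturally onto a small enum. The sketch below is illustrative only: the type name `PrismMode` appears in the parameter table, but the constant names and helper functions here are hypothetical. It shows how the per-mode bound might be looked up from the `max_change_conservative` and `max_change_autonomous` parameters.

```go
package main

import "fmt"

// PrismMode mirrors the mode IDs in the table above.
type PrismMode int32

// Constant names are hypothetical; the numeric IDs come from the table.
const (
	ModeShadow       PrismMode = 0
	ModeConservative PrismMode = 1
	ModeAutonomous   PrismMode = 2
	ModePaused       PrismMode = 3
)

// maxChange returns the fixed-point change bound for a mode, given the
// two governance-configured bounds. Shadow, Paused, and any unknown mode
// get a bound of zero, so no parameter can move.
func maxChange(mode PrismMode, conservative, autonomous int64) int64 {
	switch mode {
	case ModeConservative:
		return conservative
	case ModeAutonomous:
		return autonomous
	default:
		return 0
	}
}

// collectsObservations reports whether the mode still gathers
// observations and runs inference; only the three active modes do.
func collectsObservations(mode PrismMode) bool {
	return mode == ModeShadow || mode == ModeConservative || mode == ModeAutonomous
}

func main() {
	const conservative, autonomous = 10_000_000, 25_000_000 // 0.10 and 0.25 at scale 10^8
	for _, m := range []PrismMode{ModeShadow, ModeConservative, ModeAutonomous, ModePaused} {
		fmt.Println(m, maxChange(m, conservative, autonomous), collectsObservations(m))
	}
}
```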
Circuit Breaker
The circuit breaker is a safety mechanism that monitors chain health and automatically reverts all PRISM-tuned parameters if instability is detected.
Detection Logic
The circuit breaker evaluates the last 50 blocks (configurable via circuit_breaker_window):
- Compute block time deltas — For each consecutive pair of block timestamps, compute the block time delta.
- Classify healthy blocks — A block is considered healthy if its delta is positive and within 2x the target block time.
- Compute healthy fraction — Divide the number of healthy blocks by the total number of deltas.
Trigger Condition
If the healthy fraction falls below the threshold (default: 50%), the circuit breaker triggers.
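The detection steps and trigger condition above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the function names are hypothetical, timestamps are taken as milliseconds, and "within 2x the target block time" is read as inclusive.

```go
package main

import "fmt"

const FixedPointScale int64 = 100_000_000

// healthyFraction returns healthy/total as a fixed-point value, where a
// delta between consecutive timestamps is healthy if it is positive and
// at most 2x the target block time (inclusive bound is an assumption).
// ok is false when there are fewer than two timestamps to compare.
func healthyFraction(timestampsMs []int64, targetMs int64) (frac int64, ok bool) {
	if len(timestampsMs) < 2 {
		return 0, false
	}
	var healthy int64
	for i := 1; i < len(timestampsMs); i++ {
		delta := timestampsMs[i] - timestampsMs[i-1]
		if delta > 0 && delta <= 2*targetMs {
			healthy++
		}
	}
	total := int64(len(timestampsMs) - 1)
	return healthy * FixedPointScale / total, true
}

// shouldTrigger applies the trigger condition: fraction below threshold.
func shouldTrigger(frac, threshold int64) bool {
	return frac < threshold
}

func main() {
	threshold := FixedPointScale / 2 // 0.50

	// Deltas: 5000, 5000, 15000 (too slow), 5000, 0 (not positive) -> 3 of 5 healthy.
	a, _ := healthyFraction([]int64{0, 5000, 10000, 25000, 30000, 30000}, 5000)
	fmt.Println(a, shouldTrigger(a, threshold)) // prints 60000000 false

	// Deltas: 12000, 12000, 5000 -> 1 of 3 healthy.
	b, _ := healthyFraction([]int64{0, 12000, 24000, 29000}, 5000)
	fmt.Println(b, shouldTrigger(b, threshold)) // prints 33333333 true
}
```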
Response
When triggered, the circuit breaker:
- Reverts all PRISM-applied parameters (block time, gas price, pool weights) to their default values.
- Pauses PRISM (sets CircuitBreakerActive = true).
- Clears the in-memory policy to force a fresh reload.
- Emits a circuit_breaker_triggered event.
The circuit breaker automatically clears when the healthy fraction recovers above the threshold on subsequent evaluations.
Rollup Advisory Functions
PRISM provides advisory functions for rollup parameter optimization:
- SuggestRollupProfile — Analyzes current chain conditions and suggests optimal rollup configuration parameters (block time, gas limit, settlement frequency).
- OptimizeRollupGas — Recommends gas pricing adjustments for rollup settlement transactions based on main chain congestion patterns.
These functions are informational only and do not modify chain state.
Deterministic Math Library
All PRISM calculations use the mathutil package, which provides deterministic alternatives to standard floating-point math:
| Function | Description | Method |
|---|---|---|
| IntegerSqrt(x) | Square root | Newton's method on LegacyDec, 100-iteration convergence |
| TaylorLn1PlusX(x) | Natural logarithm ln(1+x) | Argument reduction + 15-term Taylor series |
| ExpApprox(x) | Exponential e^x | 12-term Taylor series |
| SigmoidApprox(x) | Sigmoid 1/(1+e^-x) | ExpApprox with symmetry for negative inputs |
| ReputationMultiplier(r) | Maps [0,1] to [0.5,2.0] | Sigmoid with scale and offset |
All functions operate on cosmossdk.io/math.LegacyDec values, ensuring identical results across all hardware platforms and Go compiler versions.
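To give a flavor of how a deterministic square root via Newton's method can work, here is a standard-library sketch on int64 fixed-point values at scale 10^8. It is not the mathutil implementation, which operates on LegacyDec; the function name `sqrtFixed` is hypothetical, and the 100-iteration cap simply mirrors the figure quoted in the table.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

const FixedPointScale int64 = 100_000_000

// sqrtFixed returns floor(sqrt(x)) for a fixed-point x at scale 10^8,
// using integer Newton iteration only (no floating point).
// sqrt(x/S) * S equals sqrt(x*S), so the work is an integer square
// root of x*S; big.Int holds that product, which may exceed int64.
func sqrtFixed(x int64) (int64, error) {
	if x < 0 {
		return 0, errors.New("square root of negative value")
	}
	if x == 0 {
		return 0, nil
	}
	n := new(big.Int).Mul(big.NewInt(x), big.NewInt(FixedPointScale))

	// Start from a power of two that is at least sqrt(n).
	guess := new(big.Int).Lsh(big.NewInt(1), uint(n.BitLen()+1)/2)

	for i := 0; i < 100; i++ {
		// next = (guess + n/guess) / 2
		next := new(big.Int).Quo(n, guess)
		next.Add(next, guess)
		next.Rsh(next, 1)
		if next.Cmp(guess) >= 0 {
			break // no further decrease: guess is floor(sqrt(n))
		}
		guess = next
	}
	return guess.Int64(), nil
}

func main() {
	for _, x := range []int64{4 * FixedPointScale, 2 * FixedPointScale, FixedPointScale / 4} {
		r, err := sqrtFixed(x)
		fmt.Println(x, r, err)
	}
}
```

Because every step is integer division, addition, and a shift, the iteration produces the same sequence of values on any platform, which is the property the deterministic math library is after.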
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable PRISM |
| observation_interval | uint64 | 10 | Blocks between observation collections |
| agent_mode | PrismMode | 0 (Shadow) | Current operating mode |
| max_change_conservative | LegacyDec | 0.10 | Maximum parameter change in Conservative mode |
| max_change_autonomous | LegacyDec | 0.25 | Maximum parameter change in Autonomous mode |
| circuit_breaker_window | uint64 | 50 | Number of recent blocks monitored by circuit breaker |
| circuit_breaker_threshold | LegacyDec | 0.50 | Minimum healthy block fraction before trigger |
| default_block_time_ms | int64 | 5000 | Default target block time (ms) |
| default_base_gas_price | LegacyDec | 100 | Default base gas price |
| default_validator_set_size | uint64 | 100 | Default target validator set size |
| reward_weight_throughput | LegacyDec | 0.30 | Reward weight for throughput improvement |
| reward_weight_finality | LegacyDec | 0.25 | Reward weight for finality improvement |
| reward_weight_decentralization | LegacyDec | 0.20 | Reward weight for decentralization improvement |
| reward_weight_mev | LegacyDec | 0.15 | Penalty weight for MEV extraction |
| reward_weight_failed_txs | LegacyDec | 0.10 | Penalty weight for failed transactions |
Related
- Consensus Mechanism — the consensus layer PRISM optimizes.
- AI Engine — the broader on-chain AI services and endpoints.
- Tokenomics — how RL signals feed reward and parameter adjustments.