PRISM Consensus Engine

QoreChain embeds PRISM (Policy-driven Reinforcement-learning for Intelligent State Machines), a reinforcement-learning optimization layer, directly into the consensus layer via the x/rlconsensus module. PRISM observes chain metrics every N blocks, runs inference through a fixed-point neural network, and proposes consensus parameter adjustments — all deterministically, with no floating-point arithmetic in consensus-critical paths.

The PRISM optimization loop: observe chain state, run policy inference, clamp and apply parameter changes, then feed the result back.


Architecture Overview

PRISM consists of four components:

  1. Observation Collector — Gathers 25-dimensional chain state vectors at configurable intervals.
  2. Policy Network (MLP) — A Go-native multi-layer perceptron that maps observations to actions.
  3. Reward Computer — Evaluates the quality of parameter changes using a weighted multi-objective function.
  4. Circuit Breaker — Monitors chain health and reverts all PRISM-tuned parameters if instability is detected.

All components operate within the ABCI lifecycle and produce deterministic, verifiable outputs across all validator nodes.


Policy Network

The policy network is a feedforward multi-layer perceptron (MLP) implemented entirely in Go with int64 fixed-point arithmetic (scaled by 10^8).

Network Architecture

| Property | Value |
| --- | --- |
| Input dimensions | 25 |
| Hidden layers | 2 |
| Hidden layer sizes | 256, 256 |
| Output dimensions | 5 |
| Activation (hidden) | ReLU |
| Activation (output) | tanh |
| Total parameters | 73,733 |
| Precision | int64 fixed-point (scaled by 10^8) |

Parameter Count Breakdown

Layer 1: 25 * 256 + 256 = 6,656 (input -> hidden_1)
Layer 2: 256 * 256 + 256 = 65,792 (hidden_1 -> hidden_2)
Layer 3: 256 * 5 + 5 = 1,285 (hidden_2 -> output)
Total: 73,733
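
The breakdown above can be checked mechanically. The following is a minimal sketch; `paramCount` is a hypothetical helper name, not a function from the module.

```go
package main

import "fmt"

// paramCount returns the number of weights plus biases in a fully
// connected network whose layer widths are listed in order.
func paramCount(sizes []int) int {
	total := 0
	for i := 0; i+1 < len(sizes); i++ {
		// in*out weights plus one bias per output neuron
		total += sizes[i]*sizes[i+1] + sizes[i+1]
	}
	return total
}

func main() {
	fmt.Println(paramCount([]int{25, 256, 256, 5})) // prints 73733
}
```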

Fixed-Point Arithmetic

All MLP computations use int64 values scaled by FixedPointScale = 10^8. This eliminates non-determinism from IEEE 754 floating-point rounding differences across hardware platforms.

  • Multiplication: fixMul(a, b) = (a / SCALE) * b + (a % SCALE) * b / SCALE (split to prevent overflow)
  • ReLU: relu(x) = max(0, x)
  • tanh: Pade approximant tanh(x) ~ x * (3*S - x^2) / (3*S + x^2) for |x| <= 2.5*SCALE, clamped to +/- SCALE otherwise
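
As an illustration of the first two rules, here is a minimal Go sketch of the split multiplication and ReLU. The constant name `Scale` stands in for `FixedPointScale`, and the function bodies are an assumption based only on the formulas above, not the module's source. The tanh approximation is left out because the text does not spell out its fixed-point scaling.

```go
package main

import "fmt"

// Scale mirrors FixedPointScale = 10^8 from the text.
const Scale int64 = 100_000_000

// fixMul multiplies two fixed-point values using the split form
// (a / SCALE) * b + (a % SCALE) * b / SCALE, so the full product a*b
// is never formed. Go integer division truncates toward zero, so the
// quotient and remainder of a negative a are both negative.
func fixMul(a, b int64) int64 {
	return (a/Scale)*b + (a%Scale)*b/Scale
}

// relu returns max(0, x).
func relu(x int64) int64 {
	if x < 0 {
		return 0
	}
	return x
}

func main() {
	a := 3 * Scale / 2 // 1.5
	b := 2 * Scale     // 2.0
	fmt.Println(fixMul(a, b))      // prints 300000000 (3.0)
	fmt.Println(relu(-a), relu(a)) // prints 0 150000000
}
```

Note that the remainder term `(a % SCALE) * b` can still overflow int64 once `|b|` exceeds roughly 9.2 * 10^10 (about 922 in real units), so a sketch like this relies on inputs being normalized to a modest range.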

Policy weights are stored on-chain as a flattened []int64 vector and can be updated via governance proposal.
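
To make the flattened-vector idea concrete, the sketch below runs a dense forward pass over a single `[]int64`. The memory layout (per layer: row-major weights, then biases) and the names `forward` and `Scale` are assumptions for illustration; the text does not specify the on-chain layout. The output layer is returned before tanh is applied.

```go
package main

import "fmt"

const Scale int64 = 100_000_000 // 10^8, as in the text

func fixMul(a, b int64) int64 { return (a/Scale)*b + (a%Scale)*b/Scale }

// forward evaluates a dense network stored in one flat slice.
// Assumed layout (hypothetical): for each layer, out*in weights in
// row-major order followed by out biases. Hidden layers use ReLU;
// the final layer's pre-activation values are returned.
func forward(flat []int64, sizes []int, x []int64) ([]int64, error) {
	off := 0
	for l := 0; l+1 < len(sizes); l++ {
		in, out := sizes[l], sizes[l+1]
		need := in*out + out
		if len(x) != in || off+need > len(flat) {
			return nil, fmt.Errorf("layer %d: shape mismatch", l)
		}
		w, b := flat[off:off+in*out], flat[off+in*out:off+need]
		y := make([]int64, out)
		for j := 0; j < out; j++ {
			acc := b[j]
			for i := 0; i < in; i++ {
				acc += fixMul(w[j*in+i], x[i])
			}
			if l+2 < len(sizes) && acc < 0 { // ReLU on hidden layers only
				acc = 0
			}
			y[j] = acc
		}
		x, off = y, off+need
	}
	return x, nil
}

func main() {
	// Tiny 2-2-1 network: hidden = relu([x0, -x1]), out = h0 + h1 + 0.5
	flat := []int64{Scale, 0, 0, -Scale, 0, 0, Scale, Scale, Scale / 2}
	out, err := forward(flat, []int{2, 2, 1}, []int64{2 * Scale, 3 * Scale})
	fmt.Println(out, err) // prints [250000000] <nil>
}
```

For the real network the same call would take `sizes = []int{25, 256, 256, 5}` and a 73,733-element vector.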


Observation Vector

PRISM collects a 25-dimensional observation vector at each observation interval (default: every 10 blocks).

| Index | Dimension | Description |
| --- | --- | --- |
| 0 | block_utilization | Block gas used / block gas limit |
| 1 | tx_count | Number of transactions in the block |
| 2 | avg_tx_size | Mean transaction size in bytes |
| 3 | block_time | Time since previous block (ms) |
| 4 | block_time_delta | Block time minus target block time (ms) |
| 5 | gas_price_50th | Median gas price |
| 6 | gas_price_95th | 95th-percentile gas price |
| 7 | mempool_size | Number of pending transactions |
| 8 | mempool_bytes | Total bytes of pending transactions |
| 9 | validator_count | Active validator count |
| 10 | validator_gini | Gini coefficient of validator power distribution |
| 11 | missed_block_ratio | Fraction of validators that missed signing |
| 12 | avg_commit_latency | Average commit round latency (ms) |
| 13 | max_commit_latency | Maximum commit round latency (ms) |
| 14 | precommit_ratio | Fraction of precommits received |
| 15 | failed_tx_ratio | Fraction of failed transactions |
| 16 | avg_gas_per_tx | Mean gas consumed per transaction |
| 17 | reward_per_validator | Mean reward per validator (uqor) |
| 18 | slash_count | Number of slashing events in observation window |
| 19 | jail_count | Number of jail events in observation window |
| 20 | inflation_rate | Current emission rate |
| 21 | bonded_ratio | Bonded tokens / total supply |
| 22 | reputation_mean | Mean reputation score across active validators |
| 23 | reputation_stddev | Standard deviation of reputation scores |
| 24 | mev_estimate | Estimated MEV extracted (heuristic) |

All values are stored as LegacyDec string representations and converted to int64 fixed-point before inference.
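
The conversion step might look roughly like the sketch below. It uses only the Go standard library (`math/big`) rather than `LegacyDec` itself so that it stays self-contained; `toFixed` is a hypothetical name, and truncating beyond eight decimal places is an assumption about the rounding rule.

```go
package main

import (
	"fmt"
	"math/big"
)

const Scale int64 = 100_000_000 // 10^8

// toFixed parses a decimal string (for example an 18-decimal LegacyDec
// rendering) and returns the value scaled by 10^8, truncating any
// digits beyond eight decimal places.
func toFixed(s string) (int64, error) {
	r, ok := new(big.Rat).SetString(s)
	if !ok {
		return 0, fmt.Errorf("invalid decimal %q", s)
	}
	r.Mul(r, new(big.Rat).SetInt64(Scale))
	q := new(big.Int).Quo(r.Num(), r.Denom()) // truncates toward zero
	if !q.IsInt64() {
		return 0, fmt.Errorf("%q overflows int64 at scale 10^8", s)
	}
	return q.Int64(), nil
}

func main() {
	v, err := toFixed("0.250000000000000000")
	fmt.Println(v, err) // prints 25000000 <nil>
}
```

Exact rational arithmetic keeps the conversion independent of floating-point hardware, which matches the determinism goal stated earlier.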


Action Space

The MLP output is a 5-dimensional action vector, where each dimension represents a proposed change to a consensus parameter. The tanh activation constrains raw outputs to [-1, 1], which are then scaled by mode-specific bounds.

| Index | Action Dimension | Description |
| --- | --- | --- |
| 0 | block_time_delta | Proposed change to target block time (ms) |
| 1 | gas_price_delta | Proposed change to base gas price |
| 2 | validator_set_size_delta | Proposed change to target validator set size (logged only, not applied) |
| 3 | pool_weight_rpos_delta | Proposed change to RPoS pool priority weight |
| 4 | pool_weight_dpos_delta | Proposed change to DPoS pool priority weight |

Actions are clamped to the maximum change bounds defined by the current PRISM mode before application.
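
One plausible reading of the scaling and clamping step is sketched below. It assumes the bound is a fraction of the parameter's current value, so that an action of 1.0 moves the parameter by the full `maxChange`; the text does not state whether the percentage is relative to the current or the default value, and `applyAction` is a hypothetical helper.

```go
package main

import "fmt"

const Scale int64 = 100_000_000 // 10^8

func fixMul(a, b int64) int64 { return (a/Scale)*b + (a%Scale)*b/Scale }

// clamp limits v to the closed interval [lo, hi].
func clamp(v, lo, hi int64) int64 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// applyAction returns the adjusted parameter value. All arguments are
// fixed-point. Assumption: delta = current * maxChange * action, with
// action first clamped to [-1, 1].
func applyAction(current, action, maxChange int64) int64 {
	action = clamp(action, -Scale, Scale)
	delta := fixMul(fixMul(current, maxChange), action)
	return current + delta
}

func main() {
	blockTime := 5000 * Scale    // 5000 ms
	conservative := Scale / 10   // 0.10
	halfStep := Scale / 2        // tanh output of 0.5
	fmt.Println(applyAction(blockTime, halfStep, conservative)) // prints 525000000000 (5250 ms)
	fmt.Println(applyAction(blockTime, halfStep, 0))            // prints 500000000000 (no change)
}
```

With a bound of zero, as in Shadow mode, the parameter is left unchanged regardless of the action.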


Reward Function

The reward signal evaluates how well recent parameter changes improved chain performance. It is computed as a weighted sum of five objectives:

R =   0.30 * delta_throughput
    + 0.25 * delta_finality
    + 0.20 * delta_decentralization
    - 0.15 * mev_estimate
    - 0.10 * failed_tx_ratio

| Component | Weight | Direction | Source Metric |
| --- | --- | --- | --- |
| Throughput | +0.30 | Maximize | Change in block utilization |
| Finality | +0.25 | Maximize | Change in precommit ratio |
| Decentralization | +0.20 | Maximize | Negative change in validator Gini coefficient |
| MEV | -0.15 | Minimize | Current MEV estimate |
| Failed Transactions | -0.10 | Minimize | Current failed transaction ratio |

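The five reward weight parameters are governance-configurable and must sum to exactly 1.0. The MEV and failed-transaction weights are stored as positive values (0.15 and 0.10 by default) and subtracted in the formula.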
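
The weighted sum can be sketched in fixed-point Go as follows. The `Weights` struct, its field names, and the `reward` signature are hypothetical; only the formula and default values come from the text.

```go
package main

import "fmt"

const Scale int64 = 100_000_000 // 10^8

func fixMul(a, b int64) int64 { return (a/Scale)*b + (a%Scale)*b/Scale }

// Weights holds the five reward weights as positive fixed-point values.
type Weights struct {
	Throughput, Finality, Decentralization, MEV, FailedTx int64
}

// Valid reports whether the weights sum to exactly 1.0.
func (w Weights) Valid() bool {
	return w.Throughput+w.Finality+w.Decentralization+w.MEV+w.FailedTx == Scale
}

// reward applies the formula: the first three terms are added, while
// the MEV and failed-transaction terms are subtracted as penalties.
func reward(w Weights, dThroughput, dFinality, dDecentral, mev, failed int64) int64 {
	return fixMul(w.Throughput, dThroughput) +
		fixMul(w.Finality, dFinality) +
		fixMul(w.Decentralization, dDecentral) -
		fixMul(w.MEV, mev) -
		fixMul(w.FailedTx, failed)
}

func main() {
	w := Weights{30_000_000, 25_000_000, 20_000_000, 15_000_000, 10_000_000}
	// Inputs: +0.10 throughput, +0.04 finality, +0.05 decentralization,
	// MEV estimate 0.02, failed transaction ratio 0.10.
	r := reward(w, 10_000_000, 4_000_000, 5_000_000, 2_000_000, 10_000_000)
	fmt.Println(w.Valid(), r) // prints true 3700000 (0.037)
}
```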


PRISM Modes

PRISM operates in one of four modes, controllable via governance:

| Mode | ID | Max Change | Behavior |
| --- | --- | --- | --- |
| Shadow | 0 | 0% | Observe and log recommendations only. No parameters are changed. This is the default mode. |
| Conservative | 1 | +/- 10% | Apply parameter changes within tight bounds. Suitable for initial live deployment. |
| Autonomous | 2 | +/- 25% | Apply parameter changes within wider bounds. For mature networks with validated policies. |
| Paused | 3 | 0% | PRISM is completely idle. No observations are collected and no inference runs. |

Mode transitions require a governance proposal. The recommended deployment path is: Shadow → Conservative → Autonomous.
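
The mode table maps naturally onto a small enum. In the sketch below, the type name `PrismMode` and the numeric IDs come from this page; the constant names and both helper functions are assumptions for illustration.

```go
package main

import "fmt"

const Scale int64 = 100_000_000 // 10^8

// PrismMode uses the IDs listed in the mode table.
type PrismMode int

const (
	ModeShadow       PrismMode = iota // 0: observe and log only
	ModeConservative                  // 1: changes within +/- 10% by default
	ModeAutonomous                    // 2: changes within +/- 25% by default
	ModePaused                        // 3: fully idle
)

// maxChange returns the fixed-point bound for a mode. The two bounds
// correspond to max_change_conservative and max_change_autonomous.
func maxChange(m PrismMode, conservative, autonomous int64) int64 {
	switch m {
	case ModeConservative:
		return conservative
	case ModeAutonomous:
		return autonomous
	default: // Shadow and Paused never change parameters
		return 0
	}
}

// collectsObservations reports whether the mode still gathers chain metrics.
func collectsObservations(m PrismMode) bool {
	return m != ModePaused
}

func main() {
	for m := ModeShadow; m <= ModePaused; m++ {
		fmt.Println(int(m), maxChange(m, Scale/10, Scale/4), collectsObservations(m))
	}
}
```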


Circuit Breaker

The circuit breaker is a safety mechanism that monitors chain health and automatically reverts all PRISM-tuned parameters if instability is detected.

Detection Logic

The circuit breaker evaluates the last 50 blocks (configurable via circuit_breaker_window):

  1. Compute block time deltas — For each consecutive pair of block timestamps, compute the block time delta.
  2. Classify healthy blocks — A block is considered healthy if its delta is positive and within 2x the target block time.
  3. Compute healthy fraction — Compute the healthy fraction = healthy blocks / total deltas.

Trigger Condition

If the healthy fraction falls below the threshold (default: 50%), the circuit breaker triggers.
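
The detection steps and trigger condition can be sketched as below. Function names are hypothetical, timestamps are taken as milliseconds, and treating an empty window as healthy is an assumption. The comparison is cross-multiplied so no division is needed.

```go
package main

import "fmt"

const Scale int64 = 100_000_000 // 10^8

// countHealthy walks consecutive block timestamps (milliseconds) and
// counts deltas that are positive and at most 2x the target block time.
func countHealthy(timestampsMs []int64, targetMs int64) (healthy, total int64) {
	for i := 1; i < len(timestampsMs); i++ {
		delta := timestampsMs[i] - timestampsMs[i-1]
		if delta > 0 && delta <= 2*targetMs {
			healthy++
		}
		total++
	}
	return healthy, total
}

// shouldTrigger reports whether healthy/total is below the fixed-point
// threshold (0.50 is Scale/2).
func shouldTrigger(healthy, total, threshold int64) bool {
	if total == 0 {
		return false // no deltas yet; treated as healthy (assumption)
	}
	return healthy*Scale < threshold*total
}

func main() {
	// Deltas: 5000 ok, 5000 ok, 15000 too slow, 0 not positive, 5000 ok.
	ts := []int64{0, 5000, 10000, 25000, 25000, 30000}
	h, t := countHealthy(ts, 5000)
	fmt.Println(h, t, shouldTrigger(h, t, Scale/2)) // prints 3 5 false
}
```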

Response

When triggered, the circuit breaker:

  1. Reverts all PRISM-applied parameters (block time, gas price, pool weights) to their default values.
  2. Pauses PRISM (sets CircuitBreakerActive = true).
  3. Clears the in-memory policy to force a fresh reload.
  4. Emits a circuit_breaker_triggered event.

The circuit breaker automatically clears when the healthy fraction recovers above the threshold on subsequent evaluations.
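
The trigger and recovery behavior amounts to a two-state machine. The sketch below is illustrative only: apart from `CircuitBreakerActive` and the `circuit_breaker_triggered` event name, every type, field, and default value is hypothetical, and the real keeper presumably persists this state rather than holding it in a struct.

```go
package main

import "fmt"

// TunedParams groups the parameters the text says are reverted.
type TunedParams struct {
	BlockTimeMs, BaseGasPrice, PoolWeightRPoS, PoolWeightDPoS int64
}

// State is a stand-in for module state.
type State struct {
	Params               TunedParams
	CircuitBreakerActive bool
	Policy               []int64 // nil means "reload before next inference"
	Events               []string
}

// Evaluate applies one circuit breaker evaluation. belowThreshold is
// true when the healthy fraction is under the configured threshold.
func (s *State) Evaluate(belowThreshold bool, defaults TunedParams) {
	switch {
	case belowThreshold && !s.CircuitBreakerActive:
		s.Params = defaults             // 1. revert tuned parameters
		s.CircuitBreakerActive = true   // 2. pause PRISM
		s.Policy = nil                  // 3. drop the in-memory policy
		s.Events = append(s.Events, "circuit_breaker_triggered") // 4. emit event
	case !belowThreshold && s.CircuitBreakerActive:
		s.CircuitBreakerActive = false // health recovered; clear the breaker
	}
}

func main() {
	defaults := TunedParams{BlockTimeMs: 5000, BaseGasPrice: 100}
	s := &State{Params: TunedParams{BlockTimeMs: 5400, BaseGasPrice: 110}, Policy: []int64{1}}
	s.Evaluate(true, defaults)
	fmt.Println(s.Params.BlockTimeMs, s.CircuitBreakerActive, len(s.Events)) // prints 5000 true 1
	s.Evaluate(false, defaults)
	fmt.Println(s.CircuitBreakerActive) // prints false
}
```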


Rollup Advisory Functions

PRISM provides advisory functions for rollup parameter optimization:

  • SuggestRollupProfile — Analyzes current chain conditions and suggests optimal rollup configuration parameters (block time, gas limit, settlement frequency).
  • OptimizeRollupGas — Recommends gas pricing adjustments for rollup settlement transactions based on main chain congestion patterns.

These functions are informational only and do not modify chain state.


Deterministic Math Library

All PRISM calculations use the mathutil package, which provides deterministic alternatives to standard floating-point math:

| Function | Description | Method |
| --- | --- | --- |
| IntegerSqrt(x) | Square root | Newton's method on LegacyDec, 100-iteration convergence |
| TaylorLn1PlusX(x) | Natural logarithm ln(1+x) | Argument reduction + 15-term Taylor series |
| ExpApprox(x) | Exponential e^x | 12-term Taylor series |
| SigmoidApprox(x) | Sigmoid 1/(1+e^-x) | ExpApprox with symmetry for negative inputs |
| ReputationMultiplier(r) | Maps [0,1] to [0.5,2.0] | Sigmoid with scale and offset |

All functions operate on cosmossdk.io/math.LegacyDec values, ensuring identical results across all hardware platforms and Go compiler versions.
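
To show the flavor of these routines without depending on the SDK, here is a 12-term Taylor series for e^x written against int64 fixed-point values. This is a stand-in sketch, not the `mathutil` implementation: the real functions operate on `LegacyDec`, and the lowercase name `expApprox` and the 10^8 scale are assumptions made to keep the example standard-library only.

```go
package main

import "fmt"

const Scale int64 = 100_000_000 // 10^8

// expApprox sums the first 12 Taylor terms of e^x (n = 0..11) using
// integer arithmetic only. Each term is derived from the previous one:
// term_n = term_(n-1) * x / n. Intended for small |x| (a few units at
// most); larger inputs lose accuracy and risk int64 overflow.
func expApprox(x int64) int64 {
	sum, term := Scale, Scale // n = 0 term is 1.0
	for n := int64(1); n < 12; n++ {
		term = term * x / (n * Scale)
		sum += term
	}
	return sum
}

func main() {
	fmt.Println(expApprox(0))     // prints 100000000 (exactly 1.0)
	fmt.Println(expApprox(Scale)) // close to 271828182 (e), short by a few units from truncation
}
```

Because every step is integer multiplication and truncating division, the result is bit-identical on every platform, which is the property the library is designed around.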


Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable PRISM |
| observation_interval | uint64 | 10 | Blocks between observation collections |
| agent_mode | PrismMode | 0 (Shadow) | Current operating mode |
| max_change_conservative | LegacyDec | 0.10 | Maximum parameter change in Conservative mode |
| max_change_autonomous | LegacyDec | 0.25 | Maximum parameter change in Autonomous mode |
| circuit_breaker_window | uint64 | 50 | Number of recent blocks monitored by circuit breaker |
| circuit_breaker_threshold | LegacyDec | 0.50 | Minimum healthy block fraction before trigger |
| default_block_time_ms | int64 | 5000 | Default target block time (ms) |
| default_base_gas_price | LegacyDec | 100 | Default base gas price |
| default_validator_set_size | uint64 | 100 | Default target validator set size |
| reward_weight_throughput | LegacyDec | 0.30 | Reward weight for throughput improvement |
| reward_weight_finality | LegacyDec | 0.25 | Reward weight for finality improvement |
| reward_weight_decentralization | LegacyDec | 0.20 | Reward weight for decentralization improvement |
| reward_weight_mev | LegacyDec | 0.15 | Penalty weight for MEV extraction |
| reward_weight_failed_txs | LegacyDec | 0.10 | Penalty weight for failed transactions |
  • Consensus Mechanism — the consensus layer PRISM optimizes.
  • AI Engine — the broader on-chain AI services and endpoints.
  • Tokenomics — how RL signals feed reward and parameter adjustments.