log :: 2025‐04
Tip
Simulation Testing, Deterministic Pipeline, Radicle, VRF / KES, Voting Stake
These logs cover advances in deterministic simulation: mapping types across the consensus pipeline, experimenting with Kmett-style machine compositions, biased multi-queue reads, and proving equivalence between monolithic and staged deployments.
They also report operational work—publishing Amaru on Radicle for decentralized collaboration, planning licensing / audit steps for the VRF & KES libraries, and refining a transport-agnostic, deterministic node architecture.
Finally, they detail governance-ledger updates that compute voting-stake distributions (including deposits) and maintain epoch-aligned snapshots for SPOs and DReps.
Spent time trying to better understand the Amaru consensus pipeline; basically, I tried to put types on the arrows/queues between the stages/boxes in AB's diagram:
https://github.com/pragma-org/amaru/wiki/images/architecture.jpg
as well as the type of the state of each box and what effects each box performs. This exercise resulted in my own sketch of the pipeline:
I was also trying to understand how RK's downstream peer PR fits into the picture, and why his "sketch possible stage API and wiring" PR uses selective/biased reads from two sources.
I like to think that, partly because of all the questions I was asking, AB ended up updating the architecture diagram, adding types to the arrows:
https://github.com/pragma-org/amaru/pull/195 https://github.com/pragma-org/amaru/blob/bd563570d32ef0ed783f87101b0fd95c2c271a41/crates/amaru-consensus/README.md
I'm still confused as to how it all works, but I think I managed to make some progress. What I currently understand is that the DSL needs to be able to do RPC and IO (possibly async) for storage, as well as potentially read from multiple sources in a biased way.
So in parallel I've continued the experiments from last week. In particular, I've got a version which can await from multiple sources (not biased yet, though), do async IO, and compose:
The neat thing is that I managed to compose the source and sink with the pipeline as well, so with encoding/decoding this already gives us a 5-stage pipeline (if one stage is the "business logic"). The construction ended up very similar to Kmett's machines package.
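For reference, here's a minimal sketch of the shape the construction converged on (hypothetical names, not the actual prototype): a stage either stops, yields an output, awaits an input, or performs an effect, and two stages compose by feeding the first's yields into the second's awaits.

```haskell
-- A pipeline stage: stop, yield an output, await an input, or run an effect.
data Machine i o
  = Stop
  | Yield o (Machine i o)
  | Await (i -> Machine i o)
  | Eff (IO (Machine i o))

-- Compose two stages: drive the downstream machine, and whenever it
-- awaits, step the upstream machine until it yields something to feed it.
compose :: Machine a b -> Machine b c -> Machine a c
compose _ Stop                 = Stop
compose m (Yield c k)          = Yield c (compose m k)
compose m (Eff io)             = Eff (compose m <$> io)
compose Stop        (Await _)  = Stop
compose (Yield b m) (Await g)  = compose m (g b)
compose (Await f)   d          = Await (\a -> compose (f a) d)
compose (Eff io)    d          = Eff ((`compose` d) <$> io)
```

In this encoding a source can be seen as a `Machine () o` and a sink as a `Machine i ()`, which is what makes it possible to compose them with the rest of the pipeline as well.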
Possible next steps:
- Deploy the 5 stages in parallel and check that they give the same result as if we compose the 5 stages into one big state machine deployed sequentially, i.e. check that the property mentioned last time holds: `deploy (composeStateMachine stateMachine1 stateMachine2) =~ composePipeline (deploy stateMachine1) (deploy stateMachine2)`
- Figure out how to connect such pipelines, where the source is "baked-in", with a simulator;
- Extend the DSL with RPC and async IO, and ensure that points 1 and 2 still hold;
- Try to see if we can reuse `machines` and only add RPC and async IO to it, rather than rolling our own.
My hope is that once we have a model implementation in Haskell with all the features we need, we can start porting it to Rust.
Following a session at Buidler Fest #2, I thought it might be a good idea to provide Radicle support for Amaru. I therefore initialised a Radicle repository for Amaru, whose id is rad:zkw8cuTp2YRsk1U68HJ9sigHYsTu.
% rad init
Initializing radicle 👾 repository in /Users/arnaud/projects/amaru/amaru..
✓ Name amaru
✓ Description
✓ Default branch main
✓ Visibility public
✓ Repository amaru created.
Your Repository ID (RID) is rad:zkw8cuTp2YRsk1U68HJ9sigHYsTu.
You can show it any time by running `rad .` from this directory.
✓ Repository successfully announced to the network.
Your repository has been announced to the network and is now discoverable by peers.
You can check for any nodes that have replicated your repository by running `rad sync status`.
To push changes, run `git push rad main`.
It's already published to quite a few seeders:
amaru % rad sync status
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ● Node Address Status Tip Timestamp │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ ● seed.radicle.xeppaka.cz z6Mkeqf…E9uHMWp seed.radicle.xeppaka.cz:8776 synced d33ac3a 36 minutes ago │
│ ● rad.decentralizedscience.org z6MksHa…ewzJQHY rad.decentralizedscience.org:8776 synced d33ac3a 43 minutes ago │
│ ● nihili.st z6Mkn5j…AHtErQE nihili.st:8776 synced d33ac3a 45 minutes ago │
│ ● radicle.linuxw.info z6Mksqu…7327TEt radicle.linuxw.info:8776 synced d33ac3a 48 minutes ago │
│ ● seed.cardano.hydra.bzh z6MkfiR…9zBVQkK cardano.hydra.bzh:8776 synced d33ac3a 48 minutes ago │
│ ● nullradix z6MkwR5…QUZXk8N synced d33ac3a 49 minutes ago │
│ ● rad.spacetime.technology z6Mkuva…QFbrzJR rad.spacetime.technology:443 synced d33ac3a 49 minutes ago │
│ ● radicle.git.gg z6Mkf3h…53bJAqe radicle.git.gg:8776 synced d33ac3a 49 minutes ago │
│ ● osisaadmin z6Mkf8y…JjHenhA osisaftp.no-ip.com:8775 synced d33ac3a 49 minutes ago │
│ ● rte z6MkogD…AngpVPq root.seednode.garden:8776 synced d33ac3a 50 minutes ago │
│ ● Mistera-alpha z6MkuPD…X3pczth synced d33ac3a 50 minutes ago │
│ ● seed.cielago.xyz z6MkmvN…3GFjkf2 seed.cielago.xyz:8776 synced d33ac3a 50 minutes ago │
│ ● radicle.spacetime.technology z6Mkq9e…vqYL4Dw radicle.spacetime.technology:8776 synced d33ac3a 50 minutes ago │
│ ● seed.voidfarers.net z6MkrEy…cAEYX1D seed.voidfarers.net:8776 synced d33ac3a 50 minutes ago │
│ ● radicle.at z6MkjDY…HXsXzm5 seed.radicle.at:8776 synced d33ac3a 50 minutes ago │
│ ● highsunz@tom z6MkqLB…Y3Uz1FH synced d33ac3a 50 minutes ago │
│ ● seed.radicle.garden z6MkrLM…ocNYPm7 seed.radicle.garden:8776 synced d33ac3a 50 minutes ago │
│ ● ash.radicle.garden z6Mkmqo…4ebScxo ash.radicle.garden:8776 synced d33ac3a 50 minutes ago │
│ ● anondev1 z6MkhMG…9zSj42f 103.227.96.201:8776 synced d33ac3a 50 minutes ago │
│ ● rad.levitte.org z6Mkh6T…ubNX2P3 rad.levitte.org:8776 synced d33ac3a 50 minutes ago │
│ ● seed.le-pri.me z6MkrvG…oaxakjw seed.le-pri.me:8776 synced d33ac3a 50 minutes ago │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
I've been reflecting on the discussions with AB and RK from the Paris workshop and RK's subsequent PR:
https://github.com/pragma-org/amaru/pull/187
One feature that he has is the ability to read from multiple queues with priority. I tried to reproduce it here, drawing inspiration from Ed Kmett's machines library:
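Roughly, the idea looks like this (a sketch with hypothetical names, not the actual code): a stage awaits on two sources at once, and the runner commits to draining the first source before ever looking at the second.

```haskell
-- Input tagged by its source; by convention the left source has priority.
data Source a b = L a | R b

-- A two-input stage: stop, yield, or await input from either source.
data Machine2 i1 i2 o
  = Stop2
  | Yield2 o (Machine2 i1 i2 o)
  | Await2 (Source i1 i2 -> Machine2 i1 i2 o)

-- One scheduler step over two queues: the left (priority) queue is always
-- checked first, which is what makes the read biased rather than fair.
step :: ([i1], [i2]) -> Machine2 i1 i2 o -> (([i1], [i2]), Machine2 i1 i2 o)
step (h:hs, ls) (Await2 k) = ((hs, ls), k (L h))
step (hs, l:ls) (Await2 k) = ((hs, ls), k (R l))
step qs         m          = (qs, m)
```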
I think I can see how this could be simulated and also how such state machines could be composed, which ties back to my last update and what I also stressed in Paris: I think we want to prove the following:
deploy (composeStateMachine stateMachine1 stateMachine2) =~
composePipeline (deploy stateMachine1) (deploy stateMachine2)
and then use the left-hand side when simulation testing (a one-stage pipeline with a big state machine running sequentially), while deploying the right-hand side (several stages of small state machines running in parallel). Pipeline deployment and composition look something like:
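Here's a sketch of the sort of thing I mean (hypothetical, channel-based; the real prototype differs in details): deploying a state machine spawns a worker thread between an input and an output queue, and composing two pipelines forwards one's output queue into the other's input queue.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forever)

type SM s i o = (i, s) -> (o, s)

-- A deployed stage: an input queue and an output queue, with a worker
-- thread in between stepping the state machine one message at a time.
data Pipeline i o = Pipeline (TQueue i) (TQueue o)

deploy :: s -> SM s i o -> IO (Pipeline i o)
deploy s0 f = do
  inq  <- newTQueueIO
  outq <- newTQueueIO
  let loop s = do
        i <- atomically (readTQueue inq)
        let (o, s') = f (i, s)
        atomically (writeTQueue outq o)
        loop s'
  _ <- forkIO (loop s0)
  pure (Pipeline inq outq)

-- Composition plugs the first stage's output queue into the second
-- stage's input queue with a forwarding thread.
composePipeline :: Pipeline i m -> Pipeline m o -> IO (Pipeline i o)
composePipeline (Pipeline inq out1) (Pipeline in2 outq) = do
  _ <- forkIO (forever (atomically (readTQueue out1 >>= writeTQueue in2)))
  pure (Pipeline inq outq)
```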
Obviously these are initial prototypes, which need to be refined, but at least they seem to suggest that it's possible.
One possible direction could be to try to use the machines library as a basis. (We know that Kmett has put a lot of effort into the structure of the machines datatypes.) I had looked at machines previously, but I couldn't see how to simulate them because of the selective reads; that part is now clearer. One thing that still isn't clear to me is whether we could use their notion of a Source machine, because this would somehow need to be able to talk to the simulator. If that isn't possible, we could potentially use machines merely for the pipeline and feed it inputs from the simulator.
We had a meeting with the original creator of the Rust libraries to align on a plan to clarify the situation.
Brief overview of the current state of VRF and KES libraries:
- we (Amaru) are using TxPipe forks of IOHK projects written 3 years ago
- there were some improvements to VRF for speeding up verification (batch compatibility) which were never used in the node
- there's now a VRF standard, so we should move to it, but this requires coordination with cardano-node and is low on the priority list
We then discussed licensing/publication, and how to move forward, as we are committed to maintaining those libraries for use by the community:
- The plan is to move the current code into Pallas, but this requires clearing up licensing issues and the status of the dalek fork.
- For licensing, it seems we need to get in touch with IOE/G? The license is permissive, but we want to make sure the move is not perceived as adversarial
- Regarding the dalek fork, there does not seem to be any hurdle to merging:
  - see https://github.com/dalek-cryptography/curve25519-dalek/pull/377
  - it provides patches/tests to better adhere to the standards
  - it seems to be just inertia
We had a discussion about the audit:
- it seems desirable to pay for a proper audit by companies
- Inigo mentioned some well-known candidates: Trail of Bits, zkSecurity, JP Aumasson (who reviewed the C implementation)
- We know that a 2-week x 2-person project costs about $100k
- as there's not much code, it could probably cost less for VRF and KES
- From Inigo's point of view, the code is so straightforward that the risks are limited
Next steps:
- Reach out to IO to clear licensing issues
- Update dalek PR and push for official merge to remove the need for a fork
- Remove batch compat code from VRF library to reduce the footprint
- Merge code in Pallas coordinating with Santi
- Request a quote for a proper audit
Started the conversation about deterministic nodes with AB, JE and RK.
One challenge that came up is deterministically simulating pipelines of state machines, rather than merely one state machine, as I've documented so far. I was hoping to postpone this problem until after we got the non-pipelined approach working (which I started documenting here), but AB pointed out that it could be difficult to change later.
So I started thinking about this problem this week. I got a first prototype here:
It has stages and can do async IO so far. It's not deterministic yet; the next step would be to make it single-threaded, and then add things like RPC calls to the DSL.
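To illustrate what "make it single-threaded" could mean (a hypothetical sketch, not the prototype): instead of running one thread per stage, a scheduler steps the stages in a fixed round-robin order, so the interleaving of messages, and hence the test outcome, is reproducible (assuming each step itself only performs deterministic effects).

```haskell
-- A stage exposes a single step: process one pending message, and report
-- whether any work was done.
data Stage = Stage
  { stageName :: String
  , stageStep :: IO Bool
  }

-- Step all stages in a fixed order until none of them makes progress.
-- The fixed order is what makes the execution deterministic.
runRoundRobin :: [Stage] -> IO ()
runRoundRobin stages = do
  progressed <- mapM stageStep stages
  if or progressed
    then runRoundRobin stages
    else pure ()
```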
A property that seems good to have is that, assuming each stage in the pipeline is given by deploying a state machine of type `SM state input output = (input, state) -> (output, state)`, the following holds:
deploy (composeStateMachine stateMachine1 stateMachine2) =~
composePipeline (deploy stateMachine1) (deploy stateMachine2)
Where `deploy : SM s i o -> Pipeline i o`. If we can show this, then we could test with the composed, single-stage, sequential state machine, but deploy the parallel pipeline. This would mean we cut down the state space when testing, while maintaining parallelism when deploying to "production".
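For concreteness, the left-hand side's sequential composition could look like the following sketch (using the `SM` type above; the product state is what keeps everything in a single machine during testing):

```haskell
type SM s i o = (i, s) -> (o, s)

-- Run the first machine, feed its output into the second, and pair up
-- the two states: one big state machine stepping both stages in sequence.
composeStateMachine :: SM s i m -> SM t m o -> SM (s, t) i o
composeStateMachine f g (i, (s, t)) =
  let (m, s') = f (i, s)
      (o, t') = g (m, t)
  in (o, (s', t'))
```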
- Main question: how to make the pipeline deterministic?
- Main blocker seems that pallas-network is not deterministic as it ties in low-level network IO directly
- But we also want to abstract the TCP part within Amaru, in order to support other protocols, e.g. QUIC, HTTP, WebSockets... so there's a general need to abstract away the transport details
- How about making multithreading deterministic: it's possible to have parallel execution w/ determinism
- The disruptor is an example of a deterministic structure
- Note that this only works if units of work across producers/consumers of disruptor structure are independent
- Q: does gasket provide pre-configured logging?
- Gasket does not currently manage multiple inputs/outputs
- The separation b/w schedule and execute seems a bit artificial; need to understand better what need it addresses
- RK goes through the downstream servers PR and design, based on homegrown Actors
- Q: Is there a deterministic runtime in the Rust ecosystem?
- There are a few, but perhaps we want something simpler, esp. if we want our tests to be portable across language barriers
- Discussion about effects:
- Block/header storage is simple because it's content-addressable and append-only; the only problem is that if, after a failure, we no longer have a block we're supposed to have, downstream servers would consider this adversarial. At the very least we need to test these failures, and perhaps mitigate the risk by re-downloading missing blocks?
- what about the ledger store? The current state is updated after delta/diffs have been computed when applying a block, so it's mutable
- however, we have one checkpoint per epoch, so it's always possible to restart the sync process from the previous epoch, which is reasonably fast (e.g. under a minute?)
- Side-effects we should express: timer, network I/O, storage, routing messages to different locations
- Need to get more details about TxPipe's P2P work
- check out https://doc.akka.io/japi/akka-core/2.9/akka/stream/stage/GraphStage.html (Akka Streams)
Cardano's governance spreads responsibilities over three groups: stake pool operators (SPOs), delegated representatives (DReps) and a constitutional committee. Both SPOs and DReps have an associated stake as voting power. It corresponds (roughly; we'll get into the subtleties in a moment) to the sum of the stake delegated to them by delegators at an epoch boundary.
Vote tallying happens at the beginning of an epoch, after rewards have been calculated, snapshots taken and retired stake pools pruned. Because constructing the stake distribution is a potentially expensive operation, the stake distribution of the epoch that just ended cannot be readily available at the epoch boundary. For this reason, the stake distribution is computed incrementally during the next epoch, and tally/ratification always happen with one epoch of delay (e.g. the transition between epochs e+1 and e+2 uses the voting stake distribution from e for the tally).
Summarised in a little diagram:
Pruning retired Computing rewards[^1] using:
pools │ - snapshot(e + 2) for
│ │ - pool performances
Stake is │ │ - treasury & reserves
delegated │ │ - snapshot(e) for:
│ │ │ - stake distribution
│ │ │ - pool parameters
│ │Using snapshot(e) │
│ │for leader schedule │ Distributing rewards
│ ││ │ earned from (e)
│ ││ │ │
snapshot(e) │ snapshot(e+1) ││ snapshot(e + 2) │ │snapshot(e + 3)
╽ ╽ ╽ ╽╽ ╽ ╽ ╽╽
━━━━━━━━━━━━╸╸╸╋━██━██━██━██━██━╸╸╸╋╸╸╸━██━██━██━██━██━╸╸╸╋╸╸╸━██━██━██━██━██━╸╸╋╸╸╸━██━██━██━>
e e + 1 ╿ e + 2 ╿ ╿ e + 3 e + 4
│ │ │
│ Cast vote │
│ │
│ Ratifying proposals
│ using voting power
│ of (e + 1)
│
Computing voting power for
(e + 1) using state from
beginning of (e + 2)
As a reminder, each step in the diagram actually happens at every epoch boundary, but we only represent a single flow on the diagram. Technically, pool pruning happens at the beginning of every epoch, rewards are calculated every epoch, etc. It's an endless cycle of calculations referring to past states.
- In addition to tracking the stake distribution of stake pools, we must also track the stake distribution of DReps. This corresponds to extra work to perform while computing the stake distribution, although much of this work can be mutualized with the existing stake distribution construction for SPOs. Plus, since the tally is deferred by an epoch, this means we can also construct the stake distribution asynchronously in a thread during the epoch while processing ongoing blocks.
- The stake distribution for voting happens after retired pools are pruned, although this step happens after the database snapshot occurs. This means that in order to maintain consistency, we must replay the pool-ticking step during the asynchronous stake distribution calculation, so that we can identify pools that retired at the epoch transition. Their stake is therefore always excluded from the voting round of the epoch prior to their retirement.
To prevent spam, governance proposals require a (substantial) deposit fixed by the protocol. The deposit is returned once the proposal expires, is ratified or is rejected. Interestingly, the value of those deposits also counts towards the active stake of both SPOs and DReps. This is unlike the rewards/consensus pool stake distribution, which doesn't include any sort of deposits (neither from proposals, nor from credentials). More precisely, the deposit is virtually credited to the return address defined in the proposal. If that stake address is itself delegated to a pool or a DRep, it is counted as part of their respective voting stake.
- In our pool stake distribution, we must track a secondary value in addition to the rewards/consensus stake distribution, which we call `voting_stake`. It is essentially equivalent to the `stake`, except that it may include the extra deposits when relevant.
- The deposits themselves aren't trivial to compute, as we must first list out the still-valid proposals. We now perform this computation as part of the governance summary, also computing the DRep state mapping DReps to their expiry epoch.
- Proposals that naturally expire must return their deposit to their corresponding reward account. In case the reward account doesn't exist, or no longer exists, at the moment of the refund, the money is sent to the treasury instead.
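To make the deposit rule above concrete, here's a sketch (with simplified, hypothetical types; the real ledger code is more involved) of how deposits of still-valid proposals fold into the voting stake:

```haskell
import           Data.Map (Map)
import qualified Data.Map as Map

type Coin = Integer
type Addr = String
type Pool = String

-- voting_stake = consensus stake, plus the deposits of live proposals
-- whose return address delegates to the pool (similarly for DReps).
-- Deposits on undelegated return addresses don't count.
votingStake
  :: Map Pool Coin     -- rewards/consensus stake distribution
  -> Map Addr Pool     -- delegation of stake addresses to pools
  -> [(Addr, Coin)]    -- (return address, deposit) of still-valid proposals
  -> Map Pool Coin
votingStake stake deleg deposits = foldr credit stake deposits
 where
  credit (addr, dep) acc =
    case Map.lookup addr deleg of
      Just pool -> Map.insertWith (+) pool dep acc
      Nothing   -> acc
```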
Some notes and a diagram from today's discussion about architecting Amaru for deterministic simulation.
- SA introduces a language for expressing interactions b/w the node and its environment
- Q: how rich does the language need to be?
- What's important is that one can parameterize node behaviour by a `Runtime` which handles send/receive of messages
- How to make at least consensus more deterministic:
  - split the validation process into "stateless" functions with `(i, s) -> (o, s)` signatures, where `i` and `o` are "side-effects" (messages to/from the outside world)
  - implement side-effects locally (e.g. in the consensus "driver" or node)
  - investigate how to embed that into gasket
  - investigate how to make gasket deterministic => swap the tokio runtime?
  - what about the networking part, e.g. pallas-network? => see with Roland
  - what about storage? A diff structure would make this easier but we are not there yet.
- what about making the ledger block validation process "atomic", e.g. read everything at the start, apply pure rules, write everything down at the end?
- make a small example based on echo (a sketch follows this list):
  - receive headers
  - validate them (yes/no)
  - RPC call to fetch the block body -> could fail (e.g. inject failures from the tester)
  - could also load/store something to persistent storage
  - echo header/body if valid
- do not use gasket at first, but this could be a good way to steer gasket towards deterministic behaviour
- another thing to do: port simulator/tester to Rust
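Here's a sketch of what the echo example's core could look like as an `(i, s) -> (o, s)` state machine (all names hypothetical), with the block fetch expressed as a side-effect message so the tester can inject failures:

```haskell
type Hash = String
type Body = String

data Header = Header { headerHash :: Hash, headerValid :: Bool }

data In  = HeaderReceived Header           -- header from upstream peer
         | BlockFetched Hash (Maybe Body)  -- RPC reply; Nothing = injected failure
data Out = FetchBlock Hash                 -- RPC request to fetch the body
         | Echo Header Body                -- echo valid header + body downstream
         | Rejected Header                 -- validation said no

type State = [(Hash, Header)]              -- headers awaiting their block body

echo :: (In, State) -> ([Out], State)
echo (HeaderReceived h, s)
  | headerValid h = ([FetchBlock (headerHash h)], (headerHash h, h) : s)
  | otherwise     = ([Rejected h], s)
echo (BlockFetched hash mBody, s) =
  case (lookup hash s, mBody) of
    (Just h, Just b) -> ([Echo h b], filter ((/= hash) . fst) s)
    _                -> ([], s)  -- fetch failed or unknown hash: nothing to echo
```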
