
log :: 2025‐04

Matthias Benkort edited this page May 22, 2025 · 14 revisions

Tip

KEYWORDS

Simulation Testing, Deterministic Pipeline, Radicle, VRF / KES, Voting Stake

SUMMARY

These logs cover advances in deterministic simulation: mapping types across the consensus pipeline, experimenting with Kmett-style machine compositions, biased multi-queue reads, and proving equivalence between monolithic and staged deployments.

They also report operational work—publishing Amaru on Radicle for decentralized collaboration, planning licensing / audit steps for the VRF & KES libraries, and refining a transport-agnostic, deterministic node architecture.

Finally, they detail governance-ledger updates that compute voting-stake distributions (including deposits) and maintain epoch-aligned snapshots for SPOs and DReps.

2025-04-25

Weekly simulation testing update (by SA)

Spent time trying to better understand the Amaru consensus pipeline. Basically, I tried to put types on the arrows/queues between the stages/boxes in AB's diagram:

https://github.com/pragma-org/amaru/wiki/images/architecture.jpg

as well as the type of the state of each box and what effects each box performs. This exercise resulted in my own sketch of the pipeline:

https://github.com/pragma-org/simulation-testing/blob/main/moskstraumen/src/Moskstraumen/Experiment/AmaruPipeline.hs

I was also trying to understand how RK's downstream peer PR fits into the picture, and why his "sketch possible stage API and wiring" PR uses selective/biased reads from two sources.

I like to think that, partly because of all the questions I was asking, AB ended up updating the architecture diagram, adding types to the arrows:

https://github.com/pragma-org/amaru/pull/195 https://github.com/pragma-org/amaru/blob/bd563570d32ef0ed783f87101b0fd95c2c271a41/crates/amaru-consensus/README.md

I'm still confused as to how it all works, but I think I managed to make some progress. From what I currently understand, the DSL needs to be able to do RPC and IO (possibly async) for storage, as well as potentially read from multiple sources in a biased way.

So in parallel I've continued the experiments from last week. In particular, I've got a version which can await from multiple sources (not biased yet, though), do async IO, and composes:

https://github.com/pragma-org/simulation-testing/blob/main/moskstraumen/src/Moskstraumen/Experiment/SourceSinkNode2.hs

The neat thing is that I managed to compose the source and sink with the pipeline as well, so with encoding/decoding this already gives us a 5-stage pipeline (if one stage is the "business logic"). The construction ended up very similar to Kmett's machines package.

Possible next steps:

  1. Deploy the 5 stages in parallel and check that they give the same result as composing the 5 stages into one big state machine deployed sequentially, i.e. check that the property mentioned last time holds:
      deploy (composeStateMachine stateMachine1 stateMachine2) =~
      composePipeline (deploy stateMachine1) (deploy stateMachine2)
    
  2. Figure out how to connect such pipelines where the source is "baked-in" with a simulator;
  3. Extend the DSL with RPC and async IO, ensure that point 1 and 2 still hold;
  4. Try to see if we can reuse machines and only add RPC and async IO to it, rather than rolling our own.

My hope is that once we have a model implementation in Haskell with all the features we need, we can start porting it to Rust.

Amaru on radicle

Following a session at Buidler Fest #2, I thought it might be a good idea to provide Radicle support for Amaru. I therefore initialised a Radicle repository for Amaru, whose id is rad:zkw8cuTp2YRsk1U68HJ9sigHYsTu.

% rad init

Initializing radicle 👾 repository in /Users/arnaud/projects/amaru/amaru..

✓ Name amaru
✓ Description
✓ Default branch main
✓ Visibility public
✓ Repository amaru created.

Your Repository ID (RID) is rad:zkw8cuTp2YRsk1U68HJ9sigHYsTu.
You can show it any time by running `rad .` from this directory.

✓ Repository successfully announced to the network.

Your repository has been announced to the network and is now discoverable by peers.
You can check for any nodes that have replicated your repository by running `rad sync status`.

To push changes, run `git push rad main`.

It has already been replicated by quite a few seed nodes:

amaru % rad sync status
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ●   Node                                             Address                             Status   Tip       Timestamp      │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ ●   seed.radicle.xeppaka.cz        z6Mkeqf…E9uHMWp   seed.radicle.xeppaka.cz:8776        synced   d33ac3a   36 minutes ago │
│ ●   rad.decentralizedscience.org   z6MksHa…ewzJQHY   rad.decentralizedscience.org:8776   synced   d33ac3a   43 minutes ago │
│ ●   nihili.st                      z6Mkn5j…AHtErQE   nihili.st:8776                      synced   d33ac3a   45 minutes ago │
│ ●   radicle.linuxw.info            z6Mksqu…7327TEt   radicle.linuxw.info:8776            synced   d33ac3a   48 minutes ago │
│ ●   seed.cardano.hydra.bzh         z6MkfiR…9zBVQkK   cardano.hydra.bzh:8776              synced   d33ac3a   48 minutes ago │
│ ●   nullradix                      z6MkwR5…QUZXk8N                                       synced   d33ac3a   49 minutes ago │
│ ●   rad.spacetime.technology       z6Mkuva…QFbrzJR   rad.spacetime.technology:443        synced   d33ac3a   49 minutes ago │
│ ●   radicle.git.gg                 z6Mkf3h…53bJAqe   radicle.git.gg:8776                 synced   d33ac3a   49 minutes ago │
│ ●   osisaadmin                     z6Mkf8y…JjHenhA   osisaftp.no-ip.com:8775             synced   d33ac3a   49 minutes ago │
│ ●   rte                            z6MkogD…AngpVPq   root.seednode.garden:8776           synced   d33ac3a   50 minutes ago │
│ ●   Mistera-alpha                  z6MkuPD…X3pczth                                       synced   d33ac3a   50 minutes ago │
│ ●   seed.cielago.xyz               z6MkmvN…3GFjkf2   seed.cielago.xyz:8776               synced   d33ac3a   50 minutes ago │
│ ●   radicle.spacetime.technology   z6Mkq9e…vqYL4Dw   radicle.spacetime.technology:8776   synced   d33ac3a   50 minutes ago │
│ ●   seed.voidfarers.net            z6MkrEy…cAEYX1D   seed.voidfarers.net:8776            synced   d33ac3a   50 minutes ago │
│ ●   radicle.at                     z6MkjDY…HXsXzm5   seed.radicle.at:8776                synced   d33ac3a   50 minutes ago │
│ ●   highsunz@tom                   z6MkqLB…Y3Uz1FH                                       synced   d33ac3a   50 minutes ago │
│ ●   seed.radicle.garden            z6MkrLM…ocNYPm7   seed.radicle.garden:8776            synced   d33ac3a   50 minutes ago │
│ ●   ash.radicle.garden             z6Mkmqo…4ebScxo   ash.radicle.garden:8776             synced   d33ac3a   50 minutes ago │
│ ●   anondev1                       z6MkhMG…9zSj42f   103.227.96.201:8776                 synced   d33ac3a   50 minutes ago │
│ ●   rad.levitte.org                z6Mkh6T…ubNX2P3   rad.levitte.org:8776                synced   d33ac3a   50 minutes ago │
│ ●   seed.le-pri.me                 z6MkrvG…oaxakjw   seed.le-pri.me:8776                 synced   d33ac3a   50 minutes ago │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯


2025-04-17

Weekly simulation testing update (by SA)

I've been reflecting on the discussions with AB and RK from the Paris workshop and RK's subsequent PR:

https://github.com/pragma-org/amaru/pull/187

One feature his PR has is the ability to read from multiple queues with priority. I tried to reproduce it here, drawing inspiration from Ed Kmett's machines library:

https://github.com/pragma-org/simulation-testing/blob/main/moskstraumen/src/Moskstraumen/Experiment/AwaitNode3.hs

I think I can see how this could be simulated and also how such state machines could be composed, which ties back to my last update and what I also stressed in Paris: I think we want to prove the following:

     deploy (composeStateMachine stateMachine1 stateMachine2) =~
     composePipeline (deploy stateMachine1) (deploy stateMachine2)

and then use the left-hand side when simulation testing (a one-stage pipeline with one big state machine running sequentially), while deploying the right-hand side (several stages of small state machines running in parallel), where pipeline deployment and composition look something like:

https://github.com/pragma-org/simulation-testing/blob/main/moskstraumen/src/Moskstraumen/Experiment/PipelineNode.hs

Obviously these are initial prototypes, which need to be refined, but at least they seem to suggest that it's possible.

One possible direction could be to try to use the machines library as a basis. (We know that Kmett has put a lot of effort into the structure of the machines datatypes.) I had looked at machines previously, but I couldn't see how to simulate them because of the selective reads; that part is now clearer. One thing that still isn't clear to me is whether we could use its notion of a Source machine, because this would somehow need to be able to talk to the simulator. If that isn't possible, we could potentially use machines merely for the pipeline and feed it inputs from the simulator.

2025-04-10

VRF and KES libraries situation

We had a meeting with the original creator of the Rust libraries to align on a plan to clarify the situation.

Brief overview of the current state of VRF and KES libraries:

  • we (Amaru) are using txpipe's forks of IOHK projects written 3 years ago
  • there were some improvements to VRF for speeding up verification (batch compatibility) which were never used in the node
  • there's now a VRF standard, so we should move to it, but this requires coordination with cardano-node and is low on the priority list

We then discussed licensing/publication, and how to move forward, as we are committed to maintaining those libraries for use by the community:

  • The plan is to move the current code into Pallas, but this requires clearing up licensing issues and the status of the dalek fork.
  • For licensing, it seems we need to get in touch with IOE/G? The license is permissive, but we want to make sure it's not perceived as adversarial.
  • Regarding the dalek fork, there does not seem to be any hurdle to merging.

We had a discussion about the audit:

  • it seems desirable to pay for a proper audit by specialised companies
  • Inigo mentioned some well-known candidates: Trail of Bits, zkSecurity, JP Aumasson (who reviewed the C implementation)
  • We know that a 2-week, 2-person project costs around $100k
  • as there's not much code, it could probably cost less for VRF and KES
  • From Inigo's point of view, the code is straightforward enough that the risks are limited

Next steps:

  • Reach out to IO to clear licensing issues
  • Update the dalek PR and push for an official merge, to remove the need for a fork
  • Remove the batch-compatibility code from the VRF library to reduce the footprint
  • Merge the code into Pallas, coordinating with Santi
  • Request a quote for a proper audit

2025-04-04

Weekly simulation testing update (by SA)

Started the conversation about deterministic nodes with AB, JE and RK.

One challenge that came up is deterministically simulating pipelines of state machines, rather than merely one state machine as I've documented so far. I was hoping to postpone this problem until after we got the non-pipelined approach working (which I started documenting here), but AB pointed out that it could be difficult to change later.

So I started thinking about this problem this week. I got a first prototype here:

https://github.com/pragma-org/simulation-testing/blob/main/moskstraumen/src/Moskstraumen/Experiment/PipelineNode.hs

It has stages and can do async IO so far. It's not deterministic yet; the next step would be to make it single-threaded, and then add things like RPC calls to the DSL.

A property that seems good to have is that, assuming each stage in the pipeline is given by deploying a state machine of type SM state input output = (input, state) -> (output, state), the following holds:

     deploy (composeStateMachine stateMachine1 stateMachine2) =~
     composePipeline (deploy stateMachine1) (deploy stateMachine2)

Where deploy : SM s i o -> Pipeline i o. If we can show this, then we could test with the composed single-stage sequential state machine, but deploy the parallel pipeline. This would mean we cut down the state space when testing, while maintaining parallelism when deploying to "production".

Deterministic Amaru Architecture

  • Main question: how to make the pipeline deterministic?
  • The main blocker seems to be that pallas-network is not deterministic, as it ties in low-level network IO directly
  • But we also want to abstract the TCP part within Amaru, in order to support other protocols, e.g. QUIC, HTTP, WebSockets, ... so there's a general need to abstract away the transport details
  • How about making multithreading deterministic: it's possible to have parallel execution w/ determinism
    • The disruptor is an example of a deterministic structure
    • Note that this only works if units of work across producers/consumers of disruptor structure are independent
  • Q: does gasket provide pre-configured logging?
    • Gasket does not currently manage multiple inputs/outputs
    • The separation b/w schedule and execute seems a bit artificial; need to understand better what need it addresses
  • RK goes through the downstream servers PR and design, based on homegrown Actors
  • Q: Is there a deterministic runtime in the Rust ecosystem?
    • There are a few, but perhaps we want something simpler, esp. if we want our tests to be portable across language barriers
  • Discussion about effects:
    • Block/header storage is simple because it's content-addressable and append-only; the only problem is that if, after a failure, we no longer have a block we're supposed to have, this would be considered adversarial by downstream servers. At the very least we need to test these failures, and perhaps mitigate the risk by re-downloading missing blocks?
    • what about the ledger store? The current state is updated after deltas/diffs have been computed when applying a block, so it's mutable
    • however, we have one checkpoint per epoch, so it's always possible to restart the sync process from the previous epoch, which is reasonably fast (e.g. under a minute?)
    • Side-effects we should express: timers, network I/O, storage, routing messages to different locations
  • Need to get more details about TxPipe's P2P work
  • check out https://doc.akka.io/japi/akka-core/2.9/akka/stream/stage/GraphStage.html (Akka Streams)

2025-04-01

About voting stake distribution (@KtorZ)

Entities & lifecycle

Cardano's governance spreads responsibilities over three groups: stake pool operators (SPOs), delegate representatives (DReps) and a constitutional committee. Both SPOs and DReps have an associated stake acting as voting power. It corresponds (roughly; we'll get into the subtleties in a moment) to the sum of the stake delegated to them by delegators at an epoch boundary.

Vote tallying happens at the beginning of an epoch, after rewards are calculated, snapshots taken and retired stake pools pruned. Because constructing the stake distribution is a potentially expensive operation, the stake distribution of the epoch that just ended cannot be readily available at the epoch boundary. For this reason, the stake distribution is computed incrementally during the next epoch, and tallying/ratification always happens with one epoch of delay (e.g. the transition between epoch e+1 and e+2 uses the voting stake distribution from e for the tally).

Summarised in a little diagram:

                                    Pruning retired        Computing rewards[^1] using:
                                    pools                  │ - snapshot(e + 2) for
                                    │                      │     - pool performances
                    Stake is        │                      │     - treasury & reserves
                    delegated       │                      │ - snapshot(e) for:
                    │               │                      │     - stake distribution
                    │               │                      │     - pool parameters
                    │               │Using snapshot(e)     │
                    │               │for leader schedule   │                  Distributing rewards
                    │               ││                     │                  earned from (e)
                    │               ││                     │                  │
    snapshot(e)     │ snapshot(e+1) ││     snapshot(e + 2) │                  │snapshot(e + 3)
              ╽     ╽             ╽ ╽╽                   ╽ ╽                  ╽╽
━━━━━━━━━━━━╸╸╸╋━██━██━██━██━██━╸╸╸╋╸╸╸━██━██━██━██━██━╸╸╸╋╸╸╸━██━██━██━██━██━╸╸╋╸╸╸━██━██━██━>
     e                e + 1          ╿      e + 2 ╿         ╿       e + 3             e + 4
                                     │            │         │
                                     │            Cast vote │
                                     │                      │
                                     │                      Ratifying proposals
                                     │                      using voting power
                                     │                      of (e + 1)
                                     │
                                     Computing voting power for
                                     (e + 1) using state from
                                     beginning of (e + 2)

As a reminder, each step in the diagram actually happens at every epoch boundary, but we only represent a single flow in the diagram. Technically, pool pruning happens at the beginning of every epoch, rewards are calculated every epoch, etc. It's an endless cycle of calculations referring to past states.

Consequences:
  1. In addition to tracking the stake distribution of stake pools, we must also track the stake distribution of DReps. This corresponds to extra work to perform while computing the stake distribution, although much of this work can be mutualized with the existing stake distribution construction for SPOs. Plus, since the tally is deferred by an epoch, this means we can also construct the stake distribution asynchronously in a thread during the epoch while processing ongoing blocks.

  2. The stake distribution for voting happens after retired pools are pruned, although this step happens after the database snapshot occurs. This means that, in order to maintain consistency, we must replay the pool-ticking steps during the asynchronous stake distribution calculation, so that we can identify pools that retired at the epoch transition. Their stake is therefore always excluded from the voting round of the epoch prior to their retirement.

Deposits

To prevent spam, governance proposals require a (substantial) deposit fixed by the protocol. The deposit is returned once the proposal expires, is ratified or is rejected. Interestingly, the value of those deposits also counts towards the active stake of both SPOs and DReps. This is unlike the rewards/consensus pool stake distribution, which doesn't include any sort of deposits (neither from proposals, nor from credentials). More precisely, the deposit is virtually credited to the return address defined in the proposal. If that stake address is itself delegated to a pool or a DRep, it is counted as part of their respective voting stake.

Consequences:
  1. In our pool stake distribution, we must track a secondary value in addition to the rewards/consensus stake distribution, which we call voting_stake. It is essentially equivalent to the stake, except that it may contain the extra deposits when relevant.

  2. The deposits themselves aren't trivial to compute, as we must first list out the still-valid proposals. We now perform this computation as part of the governance summary, also computing the DRep state mapping DReps to their expiry epoch.

  3. Proposals that naturally expire must return their deposit to their corresponding reward account. In case the reward account doesn't exist, or no longer exists, at the moment of the refund, the money is sent to the treasury instead.

Making Amaru deterministic

Some notes and a diagram from today's discussion about architecting Amaru for deterministic simulation.

  • SA introduces a language for expressing interactions b/w the node and its environment
  • Q: how rich does the language need to be?
  • What's important is that one can parameterize node behaviour by a Runtime which handles send/receive of messages
  • How to make at least consensus more deterministic:
    1. split validation process into "stateless" functions with (i, s) -> (o, s) signatures, where i and o are "side-effects" (messages to/from outside world)
    2. implement side-effects locally (eg. in consensus "driver" or node)
    3. investigate how to embed that into gasket
    4. investigate how to make gasket deterministic => swap the tokio runtime?
    5. what about networking part, eg. pallas-network? => see with Roland
    6. what about storage? diff structure would make this easier but we are not there yet.
  • what about making the ledger block validation process "atomic", e.g. read everything at the start, apply pure rules, write everything down at the end?
  • make a small example based on echo:
    1. receive headers
    2. validate them (yes/no)
    3. RPC call to fetch the block body -> could fail (e.g. inject failures from the tester)
      • could also be load/store something to persistent storage
    4. echo header/body if valid
    5. do not use gasket at first, but this could be a good way to steer gasket towards deterministic behaviour
  • another thing to do: port the simulator/tester to Rust

High-level architecture
