From dabce6404b3778f10bbc21f9567f3f61e5f8271b Mon Sep 17 00:00:00 2001
From: Sky O
Date: Wed, 7 Jun 2023 18:43:09 +0000
Subject: [PATCH] Initial commit

---
 hydra-now-changes.org |  94 +++++++++++++++++
 hydra-pay.org         |  77 ++++++++++++++
 hydra-robustness.org  | 230 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 401 insertions(+)
 create mode 100644 hydra-now-changes.org
 create mode 100644 hydra-pay.org
 create mode 100644 hydra-robustness.org

diff --git a/hydra-now-changes.org b/hydra-now-changes.org
new file mode 100644
index 0000000..7a23070
--- /dev/null
+++ b/hydra-now-changes.org
@@ -0,0 +1,94 @@
+#+title: Hydra Now Changes
+
+* Technical changes
+
+Here is a summary of the new technical changes made to HydraPay for the latest demo, which shows a full payment channel being created and used to send funds between two HydraNow users.
+
+
+** Head Manager
+
+We have unified the surface area and implementation of head/hydra-node management into a single component called the Head Manager, which holds both the database state and the runtime state of heads. A head is considered running if it is present and the ProcessHandles for its nodes are active.
+
+The Head Manager runs as part of HydraPay startup. It checks its database for heads, scans the logs and persistence directories of those heads to determine (while they are still offline) whether they should be started, and then requests ports and runs all the required hydra nodes.
+
+The Head Manager lets us track heads in a less ad hoc and less scattered way. It provides a simple interface: you give it a configuration, which is just the L1 addresses of the parties involved, and it gives you back a handle you can use to refer to the head whether it is online or offline.
+
+The Head Manager persists each head in a database under a generated unique id, together with the user addresses of each participant in the head.
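The interface described above can be sketched as follows. This is a minimal, in-memory sketch that assumes hypothetical names (Address, HeadConfig, HeadHandle, HeadManager, trackHead, headParticipants). The real Head Manager persists to a database and also tracks runtime state, and this sketch leaves both out.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for a participant's L1 address.
newtype Address = Address String deriving (Eq, Ord, Show)

-- A head configuration: just the L1 addresses of the parties involved.
newtype HeadConfig = HeadConfig [Address] deriving (Eq, Show)

-- Opaque handle for a head, valid whether the head is online or offline.
newtype HeadHandle = HeadHandle Int deriving (Eq, Ord, Show)

-- In-memory stand-in for the Head Manager's database of heads.
data HeadManager = HeadManager
  { hmNextId :: IORef Int
  , hmHeads  :: IORef (Map.Map HeadHandle HeadConfig)
  }

newHeadManager :: IO HeadManager
newHeadManager = HeadManager <$> newIORef 0 <*> newIORef Map.empty

-- Persist a head under a freshly generated unique id and return its handle.
trackHead :: HeadManager -> HeadConfig -> IO HeadHandle
trackHead hm cfg = do
  n <- atomicModifyIORef' (hmNextId hm) (\i -> (i + 1, i))
  let handle = HeadHandle n
  atomicModifyIORef' (hmHeads hm) (\heads -> (Map.insert handle cfg heads, ()))
  pure handle

-- Look up the participant addresses recorded for a head.
headParticipants :: HeadManager -> HeadHandle -> IO (Maybe [Address])
headParticipants hm handle = do
  heads <- readIORef (hmHeads hm)
  pure (fmap (\(HeadConfig addrs) -> addrs) (Map.lookup handle heads))
```

In this sketch a handle is only an id into persisted state, so it stays meaningful across restarts. Whether the head is actually running is a separate question that the runtime side answers.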
+
+A Head configuration is simply a set of individual Hydra Node configurations, each corresponding to a Hydra Node that will run for this head.
+
+Each node configuration includes the unique node id, the persistence directory, the log file locations (for error messages and general logs), and the address and keys of the participant the Hydra Node acts on behalf of.
+
+There are several convenience types for Head configuration that you can use to generate the full node configurations. For example, for a head with some number of participants you can simply provide the participants' addresses and the naming patterns for the persistence directory, log, error log, and thread log files; from these we generate the individual Hydra Node configurations.
+
+When you give this configuration to the Head Manager, it persists it so the head can be tracked across restarts and upgrades, and gives back a handle for referring to the head.
+
+Once you have a handle to the Head, you can ask the Head Manager to run it. The running-head state tracks the current status and the various running Hydra Nodes.
+
+To transition a Head to the running state, the Head Manager requests new ports for the hydra nodes (it needs 3 ports per node). It then runs the nodes and waits for them to respond over their websockets, while also monitoring the nodes' process handles to see whether any have stopped running because of a configuration error, a crash, resource exhaustion, etc.
+
+Under the Head Manager, each Hydra Node in each Hydra Head has a thread dedicated to processing the messages the node produces on its websocket API. This thread updates the node's status, which starts at Unavailable and transitions to Replaying (while the node plays back past messages), then Replayed (when the replay is complete), and finally PeersConnected (when the node sees all the other nodes it should see in the Head).
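The per-node status progression can be sketched as a small state machine driven by websocket events. The event type is a hypothetical simplification, because the real thread parses Hydra API JSON. The rule that PeersConnected requires every expected peer follows the description above. Events that don't fit the current status are ignored here.

```haskell
import qualified Data.Set as Set

data NodeStatus = Unavailable | Replaying | Replayed | PeersConnected
  deriving (Eq, Show)

-- Hypothetical, simplified view of what a node's websocket produces.
data NodeEvent
  = SocketOpened         -- we connected to the node's websocket
  | HistoryMessage       -- a message played back from the node's history
  | Greetings            -- the replay is over
  | PeerConnected String -- another node of this head became reachable
  deriving (Eq, Show)

data NodeState = NodeState
  { nsStatus   :: NodeStatus
  , nsExpected :: Set.Set String -- peers this node should see
  , nsSeen     :: Set.Set String -- peers it has reported so far
  } deriving (Eq, Show)

initialNode :: [String] -> NodeState
initialNode peers = NodeState Unavailable (Set.fromList peers) Set.empty

-- What the per-node thread would do with each websocket event.
step :: NodeState -> NodeEvent -> NodeState
step ns ev = case (nsStatus ns, ev) of
  (Unavailable, SocketOpened) -> ns { nsStatus = Replaying }
  (Replaying, HistoryMessage) -> ns
  (Replaying, Greetings)      -> promote (ns { nsStatus = Replayed })
  (Replayed, PeerConnected p) -> promote (ns { nsSeen = Set.insert p (nsSeen ns) })
  _                           -> ns -- unexpected here; a real thread should log it
  where
    -- Replayed becomes PeersConnected once every expected peer has been seen.
    promote s
      | nsExpected s `Set.isSubsetOf` nsSeen s = s { nsStatus = PeersConnected }
      | otherwise = s
```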
+
+It also sets up message queues for each node (for moving the Head to new states through commands) and associates each node with its user's address, so that when a user wants to, say, commit, we can make sure we communicate with the correct node.
+
+Knowing the nodes and their associations also lets the Head Manager round-robin commands that cause L1 transactions but aren't specific to a single participant, like Init, Close, or Fanout, which keeps fuel usage even across all the participants.
+
+If you want to send a command to a Head, you simply give the manager the head handle and the command you would like to perform. Commands are represented as a simple GADT for easy, type-safe request/response.
+
+The Head Manager takes the type of command into account and selects the right node within the head to submit it to. Another thread handles the "request/response" portion: it translates your command into the appropriate websocket JSON as outlined in the Hydra API https://hydra.family/head-protocol/api-reference/, and knows which messages to look out for to definitively know that the request has completed and what the response was.
+
+Bounded queues are used at all interaction points to prevent space leaks.
+
+The Head Manager also has access to proxy information, which makes these head interactions convenient: instead of having to know the proxy address and information associated with a user address, the API/interface to the Head Manager only requires the user address. It looks up and uses the appropriate proxy information (generating it if necessary) and routes all requests through the proxy automatically.
+
+So if you have two parties A and B and would like to start a Head between them, commit from both parties, and then send transactions within the head, the whole API only asks you for A's address and B's address; it will fuel the proxies, submit transactions to them, etc.
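Here is one way the command GADT and node selection could fit together. The constructor set, Node, selectNode, and mockRespond are all hypothetical. The sketch shows two things: the type index fixes each command's response type at compile time, and owner-specific commands such as Commit bypass the round-robin rotation used for Init, Close, and Fanout.

```haskell
{-# LANGUAGE GADTs #-}

import Data.IORef (IORef, atomicModifyIORef', newIORef)

newtype Address = Address String deriving (Eq, Show)
newtype Lovelace = Lovelace Integer deriving (Eq, Show)
newtype HeadId = HeadId String deriving (Eq, Show)

-- Hypothetical command set; the type index is the response the caller gets back.
data Command resp where
  Init   :: Command HeadId                    -- answered by HeadIsInitializing
  Commit :: Address -> Lovelace -> Command () -- must go to this participant's node
  Close  :: Command ()
  Fanout :: Command ()

-- Commands tied to one participant name that participant; the rest can go anywhere.
commandOwner :: Command resp -> Maybe Address
commandOwner (Commit owner _) = Just owner
commandOwner Init             = Nothing
commandOwner Close            = Nothing
commandOwner Fanout           = Nothing

data Node = Node { nodeOwner :: Address, nodeName :: String } deriving (Eq, Show)

-- Pick the node to submit to: the owner's node, or round-robin so L1 fees are shared.
selectNode :: IORef Int -> [Node] -> Command resp -> IO (Maybe Node)
selectNode _ [] _ = pure Nothing
selectNode counter nodes cmd = case commandOwner cmd of
  Just owner -> pure (lookupOwner owner)
  Nothing -> do
    i <- atomicModifyIORef' counter (\n -> (n + 1, n))
    pure (Just (nodes !! (i `mod` length nodes)))
  where
    lookupOwner owner = case filter ((== owner) . nodeOwner) nodes of
      (n : _) -> Just n
      []      -> Nothing

-- Stand-in for the request/response thread: the caller gets a typed answer.
mockRespond :: Command resp -> resp
mockRespond Init         = HeadId "mock-head"
mockRespond (Commit _ _) = ()
mockRespond Close        = ()
mockRespond Fanout       = ()
```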
+
+** Payment Channel API
+
+The payment channel API now builds on top of the Head Manager. A payment channel simply refers to a unique head, along with the human-readable name of the channel, and lets the Head Manager manage all state. Payment Channels are also persisted, and include a database reference to the Head they sit on top of.
+
+The payment channel API provides simple convenience functions on top of those offered by the Head Manager, so that callers only have to think at the level of payment channels:
+
+- Create
+- Prepare
+- Accept
+- Reject
+- Send Ada in Channel
+- Close
+
+These are simply compositions of Head Manager functionality. As with the Head Manager, when interacting with the Payment Channel API you only need the addresses of the users/participants, and no knowledge of the underlying proxy infrastructure.
+
+** Implementation details of the End to End
+
+Now that we have looked at the technical details of Head management and how the Payment Channel API sits on top of it, we can dive into the details of an end-to-end run.
+
+To start a payment channel, a user must first ask HydraPay for a balanced transaction that pays to their proxy address. This serves two practical purposes. The first is to ensure that a UTxO of the exact size they want to lock in the payment channel is available, as Hydra requires exactly one or zero UTxOs to be locked into the channel.
+
+The second is to serve as a "proof" of intent to create the channel and lock these funds.
+
+Then they must specify the address of the party on the other end of the channel.
+
+Once they submit this information, HydraPay first submits the transaction that pays to the proxy and waits for it. On success, it checks whether the user's proxy requires fuel; if so, HydraPay signs and submits a transaction of its own to top up the fuel, and waits for that one too.
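The UTxO requirement behind the balanced transaction, and the fuel check that follows it, can be sketched as two pure functions. The Utxo shape, the fuel flag, and the idea of a fixed fuel minimum are illustrative assumptions, not HydraPay's actual types or policy.

```haskell
newtype Lovelace = Lovelace Integer deriving (Eq, Ord, Show)

-- Hypothetical view of a UTxO sitting at a user's proxy address.
data Utxo = Utxo
  { utxoRef    :: String
  , utxoValue  :: Lovelace
  , utxoIsFuel :: Bool -- fuel UTxOs pay for the node's L1 transactions
  } deriving (Eq, Show)

-- Hydra commits at most one UTxO, so we need a non-fuel UTxO holding exactly the amount.
commitCandidate :: Lovelace -> [Utxo] -> Either String Utxo
commitCandidate amount utxos =
  case [u | u <- utxos, not (utxoIsFuel u), utxoValue u == amount] of
    (u : _) -> Right u
    []      -> Left "no UTxO of the exact commit amount at the proxy address"

-- How much fuel HydraPay would need to send, given a (hypothetical) minimum.
fuelTopUp :: Lovelace -> [Utxo] -> Maybe Lovelace
fuelTopUp (Lovelace target) utxos
  | have >= target = Nothing
  | otherwise      = Just (Lovelace (target - have))
  where
    -- Sum only the fuel UTxOs; the pattern skips non-fuel entries.
    have = sum [v | Utxo _ (Lovelace v) True <- utxos]
```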
+
+HydraPay then creates the new Payment Channel, which persists a new Hydra Head in the database along with a new Payment Channel that refers to this head.
+
+HydraPay then directs the Head Manager to run the new payment channel's head, which spins up the nodes and kicks off the process described in the Head Manager section.
+
+When a Payment Channel is created through the payment channel API, the underlying head is automatically sent an Init command; once it reports HeadIsInitializing, the Commit from the payment channel initiator's proxy address is sent.
+
+This way the payment channel is ready sooner, and integrity checks can be made early on.
+
+At this point the Payment Channel is considered to be in a "Pending" state: the other party will see they have a new pending request, and its expiry is set to 24 hours after creation.
+
+When the other user goes to Accept a PaymentChannel request, they select how much money they would like to commit, which also causes HydraPay to create a balanced transaction that pays to _this_ user's proxy address.
+
+Once again, the signed transaction given to HydraPay serves two roles: it ensures an appropriately sized UTxO is ready to be committed, and it acts as a proof of intent.
+
+From here, the other participant makes the final Commit, and once HydraPay has confirmation that the Head is Open, the PaymentChannel is also Open.
+
+At this point users can see in the app that the Payment Channel is Open and can send money within it. Under the hood we use the Head Manager's command architecture to issue a NewTx and wait for the snapshot confirming that the transaction is complete.
+
+We also persist the details of each transaction so they can be reliably displayed on the user's device. For each transaction we store the sender, amount, and time.
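The Pending state with its 24-hour expiry, and the per-transaction record, might be modeled like this. Field and function names are hypothetical. Only the 24-hour window and the stored fields (sender, amount, time) come from the text above. The sketch uses the time package that ships with GHC.

```haskell
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (UTCTime (..), addUTCTime, nominalDay)

data ChannelStatus = Pending | Open | Closed deriving (Eq, Show)

-- Hypothetical shape of a persisted payment channel.
data PaymentChannel = PaymentChannel
  { pcName      :: String
  , pcStatus    :: ChannelStatus
  , pcCreatedAt :: UTCTime
  } deriving (Eq)

-- What we persist for each payment inside the channel.
data ChannelTx = ChannelTx
  { txSender :: String
  , txAmount :: Integer -- lovelace
  , txTime   :: UTCTime
  } deriving (Eq)

-- A pending request expires 24 hours after the channel was created.
expiresAt :: PaymentChannel -> UTCTime
expiresAt pc = addUTCTime nominalDay (pcCreatedAt pc)

isExpired :: UTCTime -> PaymentChannel -> Bool
isExpired now pc = pcStatus pc == Pending && now >= expiresAt pc

-- Accepting only makes sense while the request is pending and unexpired.
canAccept :: UTCTime -> PaymentChannel -> Bool
canAccept now pc = pcStatus pc == Pending && not (isExpired now pc)

-- Example: a request created on 2023-06-07 at midnight UTC.
exampleChannel :: PaymentChannel
exampleChannel = PaymentChannel "alice-bob" Pending (UTCTime (fromGregorian 2023 6 7) 0)
```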
diff --git a/hydra-pay.org b/hydra-pay.org
new file mode 100644
index 0000000..d6bd43b
--- /dev/null
+++ b/hydra-pay.org
@@ -0,0 +1,77 @@
+#+title: Hydra Pay
+
+What is the point of Hydra Pay?
+
+Manage
+Monitor
+Interact
+
+with Hydra Heads.
+
+We want to hide
+Node setup and configuration.
+We want to intercept the Node messages to know about status.
+
+Conceptually a head is a running blockchain.
+We care about
+Status
+- Participants
+- Balances
+- State (Open | Closed | Contestation)
+
+Proxy Addresses
+
+We don't want users to have to give us their private keys to use Hydra Heads;
+instead we match each unique user up with a different address.
+
+HydraPay.Server.hs:156
+runHydraPay :: HydraPayConfig -> (State -> IO a) -> IO a
+
+The main entry point into Hydra Pay.
+Currently this takes a configuration with a mode (Managed or Unmanaged), which boils down to
+whether hydra-pay should run a devnet or connect to an existing cardano node.
+
+data State HydraPay.Server.hs:98
+data HydraPayConfig HydraPay.Config.hs:6
+
+When Hydra Pay is running it provides a State, which gives us:
+- The handle to the cardano node
+- The mapping of L1 addresses to their proxy addresses and hydra keys
+- The current heads
+- The subscribers to those heads
+- The Networks for each Head (meaning the hydra-nodes that all belong to a head)
+- The keypath where all the private keys of the proxy addresses are stored
+- getPorts and freePorts, to allocate and deallocate ports for new hydra-nodes
+- apiKey: the key in config/backend/api-key, used for authentication
+- config: the bind address, port, and mode (Managed, or connect to an existing node)
+
+runHydraPay is a bracketed function that gives you the State needed to run
+Hydra Pay actions, similar to runDb in gargoyle-postgres.
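getPorts and freePorts pair naturally with a bracket, the same pattern runHydraPay uses for State. This sketch assumes a hypothetical MVar-backed PortPool. The real State's port functions may have different signatures.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, modifyMVar_, newMVar)
import Control.Exception (bracket)

-- Hypothetical pool of free ports shared by everything that starts hydra-nodes.
newtype PortPool = PortPool (MVar [Int])

newPortPool :: [Int] -> IO PortPool
newPortPool ports = PortPool <$> newMVar ports

-- Take n ports atomically, or none at all if not enough are free.
getPorts :: PortPool -> Int -> IO (Maybe [Int])
getPorts (PortPool var) n = modifyMVar var $ \free ->
  pure $ if length free >= n
    then (drop n free, Just (take n free))
    else (free, Nothing)

-- Return ports to the pool.
freePorts :: PortPool -> [Int] -> IO ()
freePorts (PortPool var) ports = modifyMVar_ var (pure . (ports ++))

-- Bracketed allocation: ports are handed back even if the action throws.
withPorts :: PortPool -> Int -> (Maybe [Int] -> IO a) -> IO a
withPorts pool n = bracket (getPorts pool n) (maybe (pure ()) (freePorts pool))
```

Since each hydra-node needs 3 ports, starting a two-node head under this scheme would request 3 * 2 ports in one call. The all-or-nothing getPorts avoids holding half an allocation.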
+
+The key interactions are going to be related to the creation and management of heads.
+The Hydra Pay websocket API can be found in handleClientMessage, HydraPay.Server.hs:1072
+
+For each message there is a function that takes the state, modifies it, and returns the result.
+
+For example, to create a Head you would send, over the websocket API, a CreateHead
+message containing the participants and a unique name. Once it reaches hydra-pay, it
+calls the createHead function:
+
+HydraPay.Server.hs:606
+createHead :: (MonadIO m, MonadLog (WithSeverity (Doc ann)) m) => State -> HeadCreate -> m (Either HydraPayError Head)
+
+which takes the State and the HeadCreate information (embedded in CreateHead), performs the requested changes (spinning up the network), and gives back either a Head or an error.
+
+Everything hydra pay does can be traced this way. Find the action in the client websocket ClientMsg type:
+
+data ClientMsg HydraPay.Api.hs:127
+
+then find where that message is handled in handleClientMessage:
+
+handleClientMessage HydraPay.Server.hs:1072
+
+From there you can find the actual call made when that message is processed.
+
+Currently we shell out to cardano-cli and our transaction building is highly simplified; we would likely want to switch to cardano-transaction-builder, which dango/nunet is using and which seems to work pretty well.
+
+We are going to have to do a bunch of nix/dependency bumping anyway to get hydra-pay onto 8.10.7 at least, so this is necessary work regardless.
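The ClientMsg / handleClientMessage pattern amounts to one case per message constructor, each delegating to a function that takes the State. The simplified types and in-memory State below are hypothetical stand-ins, not the definitions in HydraPay.Api.hs or HydraPay.Server.hs.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

newtype Address = Address String deriving (Eq, Show)

-- Hypothetical, heavily simplified mirrors of the HydraPay types named above.
data HeadCreate = HeadCreate { hcName :: String, hcParticipants :: [Address] }

data ClientMsg
  = CreateHead HeadCreate
  | GetHeadStatus String

data ServerMsg
  = HeadCreated String
  | HeadStatus String
  | ServerError String
  deriving (Eq, Show)

-- Stand-in State: head name to participants.
newtype State = State (IORef (Map.Map String [Address]))

newState :: IO State
newState = State <$> newIORef Map.empty

-- Stand-in for createHead: records the head under its unique name.
createHead :: State -> HeadCreate -> IO (Either String String)
createHead (State ref) hc = atomicModifyIORef' ref $ \heads ->
  if Map.member (hcName hc) heads
    then (heads, Left ("head already exists: " ++ hcName hc))
    else (Map.insert (hcName hc) (hcParticipants hc) heads, Right (hcName hc))

-- One case per ClientMsg constructor, each delegating to a State-taking function.
handleClientMessage :: State -> ClientMsg -> IO ServerMsg
handleClientMessage st msg = case msg of
  CreateHead hc -> either ServerError HeadCreated <$> createHead st hc
  GetHeadStatus name -> do
    let State ref = st
    heads <- readIORef ref
    pure $ case Map.lookup name heads of
      Nothing -> ServerError ("unknown head: " ++ name)
      Just _  -> HeadStatus "created"
```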
diff --git a/hydra-robustness.org b/hydra-robustness.org
new file mode 100644
index 0000000..fd3d24f
--- /dev/null
+++ b/hydra-robustness.org
@@ -0,0 +1,230 @@
+#+title: Hydra Robustness/Stability/Simplicity
+* The situation
+
+Essentially we need to know what is going on with some key components:
+
+- Cardano Node
+- Hydra Heads
+- Hydra Nodes
+
+The Hydra Head state comes from the Nodes, and for the nodes we care about their status and whether or not they are running.
+
+Ideally we need an interface that can accurately internalize and understand all of the information from
+the various parts, and prevent a large array of footguns.
+
+Let's start with the lifecycle and gotchas.
+
+High level API (request/response/monitoring):
+
+Let's look at it through a HydraNow-specific lens:
+
+At the Payment Channel Level
+I want to:
+- Start a payment channel
+- Send transactions in this payment channel
+- Close the payment channel (which implies funds are given to the respective owners on L1)
+
+The interface should be as simple as this: the Payment Channel API should just be concerned
+with these key actions, and failure should then be represented at the Payment Channel Level?
+
+At the Hydra Head level... well, the Hydra Head level doesn't really exist; in Hydra
+the Head is essentially the Chain Data + State of the Hydra Head as seen by the nodes.
+
+But from an interface perspective it is nice for our code to speak in Heads and not
+necessarily nodes.
+
+So at the Head Level we essentially want the following RPC/Commands:
+
+Another important aspect of the
+
+** Create
+This would be a combination of generating the configurations, running the nodes, and then Initing. I recommend this be one concrete action in the API, as it is something we always want to do. The first Commit could probably be incorporated into this command as well, but because you may not be able, ready, or want to commit right away, we can leave it out.
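Create as a single action, where only a head that got through Init is persisted (as the next paragraph argues), could be composed from a few steps. The CreateSteps record and fakeSteps are hypothetical. Parameterizing over the steps keeps the sequencing testable without starting real nodes.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

newtype HeadId = HeadId String deriving (Eq, Show)

data CreateError
  = ConfigError String
  | NodeStartError String
  | InitFailed String -- e.g. CommandFailed or PostTxOnChainFailed
  deriving (Eq, Show)

-- Hypothetical effectful steps; real ones would spawn hydra-node and use its websocket.
data CreateSteps cfg nodes = CreateSteps
  { mkConfig         :: [String] -> Either String cfg
  , startNodes       :: cfg -> IO (Either String nodes)
  , sendInitAndAwait :: nodes -> IO (Either String HeadId) -- waits for HeadIsInitializing
  , persistHead      :: HeadId -> cfg -> IO ()
  }

-- Create = configure, run the nodes, Init; only a head that got through Init is persisted.
createHead :: CreateSteps cfg nodes -> [String] -> IO (Either CreateError HeadId)
createHead steps addrs =
  case mkConfig steps addrs of
    Left err -> pure (Left (ConfigError err))
    Right cfg -> do
      started <- startNodes steps cfg
      case started of
        Left err -> pure (Left (NodeStartError err))
        Right nodes -> do
          result <- sendInitAndAwait steps nodes
          case result of
            Left err -> pure (Left (InitFailed err))
            Right hid -> do
              persistHead steps hid cfg
              pure (Right hid)

-- Fake steps for trying the flow without real nodes; persisted ids land in `saved`.
fakeSteps :: IORef [HeadId] -> Bool -> CreateSteps [String] [String]
fakeSteps saved initOk = CreateSteps
  { mkConfig = \addrs ->
      if length addrs >= 2 then Right addrs else Left "need at least two participants"
  , startNodes = pure . Right
  , sendInitAndAwait = \_ ->
      pure (if initOk then Right (HeadId "head-1") else Left "PostTxOnChainFailed")
  , persistHead = \hid _ -> modifyIORef saved (hid :)
  }
```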
+
+Creation would then only persist a Head that was able to get through Init. The tradeoff is that gas is spent the moment this action is performed, but only persisting valid, working heads lets us make a lot of good assumptions about the state of the system.
+
+*** What are the failure states?
+Configuration Error: The nodes will fail to start if the configuration is wrong, so we should design our types to "parse, don't validate", so that running the process to start a Head always begins with a valid configuration. In our API/implementation the types should not allow invalid configurations, so configuration errors can likely be handled by functions that take the various parameters and either produce a configuration or an error.
+
+Node Error: The nodes fail to start. This should be an exceptional circumstance, but while we are figuring this all out we need a good way to track down why a node failed to start and fix it. So while this shouldn't be exposed in the user-level API, we should have something that points us more directly to node errors and crashes; likely this takes the form of the system self-reporting, monitoring, and restarting threads. It could even track RAM and resource usage as a form of logging. Logs from other parts of the system may also tell us why a node failed. For example, there is a crash when a NewTx is sent to a Node that is part of a Head that isn't considered Open; if the node doesn't produce logs because of the crash, we don't know what happened, but if we know the time of the crash, we can likely see in the logs around that time that a NewTx was sent.
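For the Configuration Error case, "parse, don't validate" amounts to smart constructors that return either a configuration or an error, so nothing downstream ever receives an invalid one. The fields, the 1024 to 65535 port policy, and the error constructors are illustrative assumptions.

```haskell
import Data.List (nub)

newtype Address = Address String deriving (Eq, Show)

-- In a real module the constructor would not be exported, so the only way to get
-- a NodeConfig is through mkNodeConfig.
data NodeConfig = NodeConfig
  { ncNodeId     :: Int
  , ncOwner      :: Address
  , ncPersistDir :: FilePath
  , ncPorts      :: [Int] -- the 3 ports a hydra-node needs
  } deriving (Eq, Show)

newtype HeadConfig = HeadConfig [NodeConfig] deriving (Eq, Show)

data ConfigError
  = EmptyPersistDir
  | WrongPortCount Int
  | PortOutOfRange Int
  | DuplicatePorts
  deriving (Eq, Show)

mkNodeConfig :: Int -> Address -> FilePath -> [Int] -> Either ConfigError NodeConfig
mkNodeConfig nid owner dir ports
  | null dir = Left EmptyPersistDir
  | length ports /= 3 = Left (WrongPortCount (length ports))
  | (p : _) <- filter outOfRange ports = Left (PortOutOfRange p)
  | otherwise = Right (NodeConfig nid owner dir ports)
  where
    outOfRange q = q < 1024 || q > 65535

-- Head-wide check: no two nodes may share a port.
mkHeadConfig :: [NodeConfig] -> Either ConfigError HeadConfig
mkHeadConfig nodes
  | length allPorts /= length (nub allPorts) = Left DuplicatePorts
  | otherwise = Right (HeadConfig nodes)
  where
    allPorts = concatMap ncPorts nodes
```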
+
+Transaction fails to get posted on chain: This is (almost) eloquently expressed by the Hydra Websocket API error PostTxOnChainFailed. The reason is buried in there, and we could likely parse it out to make it more obvious to us during development. Usually this indicates that fuel is missing; we can respond to this error by topping up fuel using HydraNow's faucet, but HydraPay can't handle this automatically, as the response should be user-defined... Maybe we can provide hooks or callbacks for when this happens...
+
+Technically we should be able to see the transactions the Node makes, as the node uses the signing key of the proxy address... So in the worst case we can look directly at the chain to try and glean information. The issue with that is that we would then be re-implementing the smart contract interaction layer of Hydra, which sounds like awful work to do.
+
+*** What State needs to be tracked?
+We need to track the process handles for the nodes and which node is associated with which person, and we likely need some unique way to refer to a Head and to each node process so we can easily parse and find information in the logs, as lots of unforeseen random things may happen.
+
+We need a database of Heads, likely using HeadId as the primary key. We also need to track whether that "Head" is running, meaning all the nodes are running and see each other.
+
+Each Node's state is important, as the node state directly influences the Head state. For example, each node replays its state and then reacts to connected peers; if not all nodes have done this, the Head is in an unusable state. Apparently rollbacks can happen, though I don't know
+how to interpret rollbacks to reconcile our representation of State... Perhaps we just consider the Head unusable until a new message is produced that changes state. For example, you may Commit and then a rollback happens, but likely another message like HeadIsInitializing will arrive after the rollback.
+(Maybe we should ask the Hydra team about this?)
+The API currently doesn't expose anything about rollbacks, and it looks like we would just want to scan the logs. Potentially we only need to scan the logs, and then we don't care about the websocket information?
+
+*** What Events Change the State?
+So essentially, to get a Head you give a configuration and get back a unique Head. The HeadID could be used to track this head, now that Heads would only be "created" when they actually have state on L1 in a smart contract.
+So within the API a Create happens, and the result of that Create is either a HeadId and some processes, or an error from one of the above failure states.
+
+The HeadID is given by HeadIsInitializing!
+
+
+*** Mechanically what needs to happen
+- We start all the Nodes and ensure they are alive.
+- We send an Init via one of the Nodes' websockets.
+- That node creates state on L1 representing the Head; this costs gas.
+- We receive a HeadIsInitializing, which contains the HeadID, indicating success...
+
+What can happen on failure?
+For now I think failure is going to be CommandFailed or PostTxOnChainFailed.
+
+So we likely need to check the state of the Node/Head as we see it before allowing such a thing. If Create simply takes the configuration and produces Either Error HeadId, we can avoid the CommandFailed case, meaning anything but PostTxOnChainFailed is exceptional. The issue is that we still need to know all the messages we got, so we can expand our failure logic in case we have missed something...
+
+** Commit (This needs to talk to the right node: it must be the node that acts on behalf of the participant committing)
+
+Committing is quite the process because of the current limitation of Hydra's commit scheme. Essentially you must commit 0 or 1 UTxO, and that UTxO must hold the exact amount of ADA you would like to commit.
+Currently we have proxy addresses, and we Pay (transfer ADA to) these addresses.
+This has a side optimization/simplification benefit: the UTxO produced can be used for the Commit.
+
+This means that, for HydraPay's API, Commit can take a HeadID, an address (of a participant, not the participant's proxy), and a transaction signed by the participant that pays to the proxy. We can submit the transaction and then use the resulting UTxO directly as the commit input. We can also provide a form of commit that commits 0 UTxOs, for a single-direction payment channel if necessary.
+
+commitToHead :: MonadHydraPay m => HeadId -> CommitInfo -> m (Either Error ())
+
+When a Head isn't running or a node is dead, we should restart it or something, but do we want commitToHead to wait, or do we want to design our system differently so that you must explicitly wait?
+Usually the failure would trigger some other logic, so it makes sense to have commitToHead wait for the Head to either succeed or fail. The issue is that we may wait indefinitely
+if we have failed to consider a message that is actually the response to the Commit we sent.
+
+So in general, there are some messages we should always consider, and likely log...
+
+*** What State needs to be tracked
+*** What are the failure states
+
+Failure would be an invalid commit, but we can likely detect most of these issues without sending anything to the node:
+- Nodes crashed or aren't running? We can tell from the process handles before we send anything. The real question is whether we restart them right away, automatically. I would say probably, as them not running is likely an exceptional circumstance.
+- The Head isn't in the right state? In that case we can also detect it right away and respond with an error.
+- The Head is not able to be Committed to? This is also something we can directly control ourselves; it just sucks that we have to.
+
+What could happen at the Websocket API level?
+
+We should always get either Committed /or/ CommandFailed Commit.
+
+** SendTx or something
+It is kind of wild that the Websocket API's NewTx requires an actual CBOR transaction, as the Node has your public/private keys and should likely just take a description of the money you want to send and where.
+Our SendTx API should do exactly that and simply take something like a Map Address Amount and go from there. There is validation we can do before even sending the websocket message.
+
+NewTx can fail if the transaction is invalid, but there is also some implicit time coupling here: a transaction would have to finish before you can send another one.
+
+By decoupling the SendTx call from actually building a valid CBOR transaction, we can simply keep a queue of transactions and submit each one as the previous ones succeed. Usually transactions should complete fast enough that we never construct two transactions that try to spend the same UTxO, but this way we avoid the possibility of that happening altogether.
+
+On the Hydra API side I would like to see NewTx eliminated completely, and probably GetUTxO and the UTxO event too, as that is too low a level for anybody using Hydra to actually care about.
+
+** Balance, and more importantly, the balances of the individual participants
+I don't know how this should be shaped, but I do know that dealing with UTxOs when looking at a Head or Payment channel seems like the wrong level of granularity (see SendTx or something).
+
+** Contest (We won't use this, but as part of providing an API and extending HydraPay we 100% care about making it available in the API for people to use)
+So we want this in the API eventually, but we would want to be able to test it; for now, just providing it and saying "use if you need to" would suffice.
+The big invariant here is that each participant can only Contest once, so we should check that the Node doesn't crash (or something similar) when you Contest twice...
+
+** Destroy
+Destroying a Head would be contextual, as we usually don't care about the specifics. At different points in the lifecycle of a Head we have different ways to shut it down...
+So we end up with a Destroy that simply looks at the Head and does the right thing:
+- Before all parties have committed? ABORT
+- After the parties have committed? CLOSE + wait for the contestation period + FANOUT
+Contestation periods are also pretty annoying, and something we probably don't want to push into HydraPay until we have a use case and something to validate that we can even make it work...
+
+
+** Implementation/Mechanics
+So now we have state to track, and that state needs to be accurate. We need to know:
+- The HeadID
+- The status of the Head on-chain
+- The status of the Head when it comes to readiness to receive commands/whether it is running
+- The status of the Nodes
+- The participants and which Nodes they control
+- Potentially how many times each participant has contested
+- The fuel of each participant's proxy address
+
+How do we get the HeadID? When you receive a HeadIsInitializing, it comes with the HeadID.
+How do we get updates to the status of the Head? Through the Nodes' websockets we will receive:
+- HeadIsInitializing
+- HeadIsOpen
+- HeadIsAborted
+- ReadyToFanout
+- HeadIsFinalized
+- HeadIsContested
+- HeadIsClosed
+
+*** Failure states implicit in the implementation:
+- The websocket can close; it should be restarted, and we need to ignore the replay
+- Nodes can crash, and we should likely restart them
+
+*** Footguns in the implementation:
+- When connecting to a node via websocket, the node replays its history, then sends a Greetings (which implies no nodes are connected), and then sends a PeerConnected for each node
+
+*** What am I worried about?
+Dropping things on the floor, hanging forever, etc.
+
+When can that happen? Well, let's say we want to create a Head. We would need some setup work: running the nodes, connecting to them via websocket, and then
+processing all the messages from those websockets; we also need to be able to send messages through the websockets.
+We need to be able to "know" when the action has succeeded or failed.
+How do we signal to an action that success has happened? We probably create a TMVar, send the request, and wait for that TMVar to be filled. We likely also
+want a timeout; the issue with timeouts is that some things can legitimately take a long time during congestion... So maybe the timeout should be configurable.
+The issue is, if we issue an action that takes 66 seconds and we had the timeout at 60 seconds, that action will still happen eventually, so we _need_ to keep waiting so as not to drop those results on the floor.
+
+For example, let's say a Head is created but we stopped waiting: we would end up making another head if we ran the Create action again. So if we did have timeouts, we would likely want logic that says "Find (or create for me) a Head that has these participants". Maybe that is easy enough to do, as that Head would be in the DB with its participant list and a known on-chain status (HeadIsInitializing), meaning we could likely pick those up.
+
+So it seems we really just want a Head-specific message queue, and some threads set up to handle messages.
+We don't necessarily need request-response; instead we just need the ability to send and read "messages".
+
+Then things are built in terms of messages and waiting for messages, which maps more nicely onto the Hydra API.
+We avoid the other issues by simply holding onto the state propagated by these messages.
+
+We can add a layer where we are able to inject our own messages, which could allow us to handle a lot of things pretty nicely.
+
+Monadically, then, we are just waiting for events to happen in the system.
+This probably makes business logic pretty nice to write.
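The "subscribe, send, wait for the right message, with a timeout" idea can be sketched with an STM broadcast channel. HeadMsg, Bus, awaitMsg, and initOutcome are hypothetical, and NeedsFuel stands in for a message we inject ourselves. newBroadcastTChanIO is used so the write end does not retain unread messages. The channel is still unbounded, unlike the bounded queues described for the Head Manager.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
  (TChan, atomically, dupTChan, newBroadcastTChanIO, readTChan, writeTChan)
import System.Timeout (timeout)

-- Hypothetical, simplified head messages: node output plus messages we inject ourselves.
data HeadMsg
  = HeadIsInitializing String -- carries the HeadId
  | CommandFailed String
  | PostTxOnChainFailed String
  | NeedsFuel String          -- injected by us, not produced by hydra-node
  deriving (Eq, Show)

-- Every message is broadcast: each subscriber reads its own copy of the stream.
newtype Bus = Bus (TChan HeadMsg)

newBus :: IO Bus
newBus = Bus <$> newBroadcastTChanIO

publish :: Bus -> HeadMsg -> IO ()
publish (Bus chan) msg = atomically (writeTChan chan msg)

-- Subscribe before sending a command, or its answer may be missed.
subscribe :: Bus -> IO (TChan HeadMsg)
subscribe (Bus chan) = atomically (dupTChan chan)

-- Read until `pick` recognises a message; give up after `micros` microseconds.
awaitMsg :: Int -> (HeadMsg -> Maybe a) -> TChan HeadMsg -> IO (Maybe a)
awaitMsg micros pick sub = timeout micros loop
  where
    loop = atomically (readTChan sub) >>= maybe loop pure . pick

-- The messages that settle an Init, and how; anything else is skipped.
initOutcome :: HeadMsg -> Maybe (Either String String)
initOutcome (HeadIsInitializing hid)  = Just (Right hid)
initOutcome (CommandFailed why)       = Just (Left why)
initOutcome (PostTxOnChainFailed why) = Just (Left why)
initOutcome (NeedsFuel _)             = Nothing
```

In this design a waiter that times out only stops listening. A separate thread holding its own subscription and updating head state would still record a late HeadIsInitializing, which is what keeps the 66-second result from being dropped on the floor.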
+
+We just make the API for reading and writing messages asynchronous, which isn't hard; the difficulty is then just making sure we wait for the /right/ messages
+and fail out loudly, with a bright exceptional failure, when we get something we don't expect.
+
+So the websocket API becomes reading and writing messages; we just make sure the messages are /broadcast/ to everyone who might be looking at this system.
+We use these messages to update the state, and we also use the state to prevent firing off dumb messages.
+
+What else needs to be there for simple and stable interfaces?
+
+** General stuff
+We could even check fuel before seeing PostTxOnChainFailed. More generally, maybe we have a way to take a PostTxOnChainFailed, turn it into a ThisPersonNeedsFuel Address type message, and then just have a handler that handles that.
+It might be worth logging/knowing when a node makes a transaction; a node transaction would be a transaction from a Proxy Address that uses the Fuel UTxO as input?
+The Hydra API does actually have a list of error states for the various errors; we could likely find the parallels and give our errors more human-readable, less bloated error messages.
+
+
+* Proposed changes
+
+At the Head level we should track heads via HeadID. This implies that "creating" a Head means on-chain activity in the form of an Init, and the resulting head is then persisted and managed by HydraPay (the Head Manager). It also means that creating a Head implies running it, though that doesn't have to remain true once we have the HeadId.
+
+Interactions below the Head level, at the Node level, should follow a message-queue-based API where messages can be sent and read. These represent the activity on the websocket, but will come from a single connection and be placed in concurrency-friendly data structures so they can be shared and passed around.
+
+We will then simplify Head interactions like Commit, Init, etc.
+These interactions will simply wait for the relevant message, and the state will also indicate whether messages are flowing (i.e. whether the websocket is open and active).
+
+Head level actions will first check the status of the Node processes, and log when they crash so we can try to ascertain why.
+
+Head level actions should check any invariants they can before they actually commit to sending a message to the message queue.
+
+
+The API at the Head level simplifies to:
+
+- Create
+- Commit
+- SendAda
+- GetBalance
+- Destroy
+
+All of these will use the above guidelines and the send/read message interface.
+Timestamps will be added to logs, and logs will be unified; we will not do anything to persist logs ourselves unless we can set a hard limit on the amount of logs.
+
+The PaymentChannel API will then sit on top of all of this. When creating a payment channel, we DO need to persist some information, so that users have an indication that there
+is a payment channel and so we can give immediate feedback.
+
+So the payment channel table should be updated to have a Maybe HeadID, pointing to the Head that actually holds all the channel information. Creation can then return immediately, and task workers can use the above API to do the actual work. In the UI, payment channels can then be represented based on whether the Head is Just, and if it is, on the status of that Head for success and failure.
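With a Maybe HeadID on the payment channel table, the UI state can be derived rather than stored. This sketch assumes hypothetical HeadStatus and ChannelView types and a status lookup function provided by the Head Manager.

```haskell
newtype HeadId = HeadId String deriving (Eq, Show)

-- Head statuses as the Head Manager might record them from HeadIsInitializing, HeadIsOpen, etc.
data HeadStatus = Initializing | HeadOpen | HeadClosed | HeadFinalized | HeadAborted
  deriving (Eq, Show)

-- Hypothetical payment channel row: the head is Nothing until Init has produced a HeadId.
data PaymentChannelRow = PaymentChannelRow
  { pcName :: String
  , pcHead :: Maybe HeadId
  } deriving (Eq, Show)

data ChannelView = Creating | Pending | Open | Closing | Closed | Aborted | Unknown
  deriving (Eq, Show)

-- What the UI shows, derived from the row plus the Head Manager's status lookup.
channelView :: (HeadId -> Maybe HeadStatus) -> PaymentChannelRow -> ChannelView
channelView lookupStatus row = case pcHead row of
  Nothing  -> Creating -- the task worker hasn't finished Init yet
  Just hid -> case lookupStatus hid of
    Nothing            -> Unknown -- the Head Manager doesn't know this head
    Just Initializing  -> Pending
    Just HeadOpen      -> Open
    Just HeadClosed    -> Closing
    Just HeadFinalized -> Closed
    Just HeadAborted   -> Aborted
```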