A number of fixes
parent f46a7a8e89
commit 22ac1403b6
README.md (45 lines changed)
@@ -14,7 +14,7 @@ Wrapping up the core of this long-experimental feature is the first step.
 (One of the reasons for that is that using dynamic derivations requires content-addressing derivations, because derivations themselves are always content-addressed.)

 Completely moving the whole ecosystem over to content-addressing derivations is the ultimate goal, but this doesn't need to coincide with wrapping up the core of the experiment.
-For example, as others have written out, "sedding" binaries to rewrite self-references is unlikely to work in general.
+For example, as others have written out, "`sed`-ing" binaries to rewrite self-references is unlikely to work in general.
 That's fine for me
 --- we'll simply keep input-addressing in the cases where it doesn't work.
 (Not only is this expedient, this also incentivizes trying to modify packages to stop needing self-references, which I think is a good thing to do regardless.)
@@ -22,19 +22,19 @@ That's fine for me
 So what does "wrapping up the core of the experiment" entail?
 For the big test is "don't put junk in the cache".
 I am OK with the "client side" missing various conveniences, like tooling to understand trust map conflicts, or fancier garbage collection.
-So long as there are still input-addressed Nixpkgs, no one will be "forced" to us them (by network effects) and so client UX issues can just be dodged by "just opting out".
-On the "server side", however, I don't anything sketchy to be going on, because I don't want people to accidentally opt in to issues, especially highly nuanced "cache semantic" issues, that they didn't sign up for.
+So long as there is an still input-addressed Nixpkgs, no one will be "forced" to use them (by network effects) and so client UX issues can just be dodged by "just opting out".
+On the "server side", however, I don't want anything sketchy to be going on, because I don't want people to accidentally opt into issues, especially highly nuanced "cache semantics" issues, that they didn't sign up for.
 Cached build artifacts, even local ones but especially shared internet-accessible ones, are potentially very long-lived.
-If we get this wrong, we open ourselves up to "cache poisoning" issues, which because of the distributed nature of Nix stores and copying, may be hard to completely eradicate.
-I wouldn't want to be responsible for any of those.
+If we get the roll-out wrong, we open ourselves up to "cache poisoning" issues, which because of the distributed nature of Nix stores and copying, may be hard to completely eradicate.
+I don't want content-addressing derivations to be responsible for any of those.

 #### Medium level

-Drilling deeper, so what does "ensuring the binary cache is sound" entail?
+Drilling deeper, what does "ensuring the binary cache is sound" entail?
 I think the essential issue is [Nix#11896].
 "deep realisations" --- build trace key-value pairs where the key includes derivations that depend on other derivations' outputs --- are fundamentally ambiguous.
 This ambiguous makes them hard to verify/challenge, and hard to know when they conflict --- two deep realisations may implicitly make incompatible assumptions about the outputs of those dependency derivations.
-We currently have a notion of "dependent realisations" that seeks to address this issue, but I do not think it is sound, and it is certainly not consistently implemented.
+We currently have a notion of "dependent realisations" that seeks to address this issue, but I do not think this mechanism is sound, and it is certainly not consistently implemented.

 The simplest thing to do is....just rip out deep realisations.
 Build trace keys should always be derivations that just depend on "opaque" store objects.
@@ -51,38 +51,39 @@ There are two downsides to "just do shallow addressing only" which are
 2. [Nix#11928] We regress with the current scheduling logic, causing build build-time inputs to be built/downloaded unnecessarily when the downstream thing we actually need should just be substitute exists but was built slightly differently.

 Re (1): once again, I am quite willing to defer polishing something that is client-side, and thus has problems that the user is free to side-step entirely by opting out.
-We can always delete *all* realisations
+We can always delete *all* realisations locally
 (there are no hard references between shallow realisations -- no "closure property"),
-so that sledgehammer can always be exposed as a fail-safe way to unbreak anyone's machine running out of disk space.
+so that sledgehammer can always be presented as a fail-safe last resort to unbreak anyone's machine that ran out of disk space.
 Again, the current way we GC realisations (leveraging those "dependent realisations") is not necessarily a good or the only way to do things
 --- in fact, because the relationships between realisations are "soft" and not "hard", I very this as a situation where there are many possible "policies", and choosing between them is a matter of opinion.
 Multiple policy/opinion territory is a clear place to cut scope for the first version.

-Two however I consider more series
---- it would be really annoying to always download GCC whenever you just want some cached binary built with Clang/some cached binary built with Clang.
-Yes, you can GC that Clang right away, but that just makes the problem seem sillier.
+Downside two however I consider more series
+--- it would be really annoying to always download GCC whenever you just want some cached binary built with GCC.
+Yes, you can GC that GCC right away, so there is no wasted disk space, but there is still the wasted time waiting for the download, and wasted network usage.
+Downloading to then delete is not a solution, but just exposes how artificial and silly the status quo is.

-[Nix#11928] is this something I consider required to fix if we're going to get rid of deep realisations (as I propose).
+[Nix#11928] is thus something I consider required to fix if we're going to get rid of deep realisations (as I propose).
 The good thing is that we can simply change the scheduling logic so it's no longer a problem.
 The fix is conceptually simple enough: we can resolve derivations (normalize their inputs) without actually downloading those inputs.
 We just look up build trace key-value pairs and substitute within the derivation accordingly.
-The less good news is that it is a bit harder than it sounds to implement, because the scheduling code is currently such a confusing mess.
+The less good news is that it is a bit harder than it sounds to implement, because the scheduling code was such a confusing mess.

 #### Low level

 This in turn leans me to [Nix#12663].
 To make progress on the schedule code (and actually a bunch of other issues, which I'll hopefully get to), we need to untangle scheduling and building.
-Only then we'll we have a "clean workbench" upon which we can address reworking the scheduling logic for [Nix#11928] (and hte other issues too).
+Only then we'll we have a "clean workbench" upon which we can address reworking the scheduling logic for [Nix#11928] (and the other issues too).
 This might sound hard, but it actually isn't so bad --- it's just long overdue.
 (*Not* doing this and attempting to fix the issues anyways is much harder.)

 After Planet Nix, @L-as and I started on a "bottom up" approach to this, which is the one outlined in [Nix#12663].
 \[You should now just read that issue, it attempts to lay out a roadmap also --- if I said more here I would be just inlining the ticket.\]
-So far, we got [Nix#12630] and [Nix#12662] done, and have [Nix#12658] and [Nix#12658] "on deck".
+So far, we got [Nix#12630], [Nix#12662], and [Nix#12658] done, and [Nix#12668] "on deck".
 This will get local building pretty well "off to the side".
 Then we do something similar for remote building (maybe just moving the hook code, or maybe indulging a little scope creep and getting rid of it altogether per [Nix#5025]).
 At that point, the building logic (local and remote cases) will be completely "out of the way", and we should be able to solve [Nix#11928].
-And at *that* point, we can (with some stop-gap for local GC) fix #11896, just ripping out shallow derivations.
+And at *that* point, we can (with some stop-gap for local GC) fix [Nix#11896], just ripping out shallow derivations.

 Along with / right after doing [Nix#11896], we can also do [Nix#11897].
 This is a good simple cleanup --- the scheduling changes and lack of deep realisations mean that there is absolutely use hash derivations "modulo fixed-output derivations", because resolved derivations never depend on fixed-output derivations (because they never depend on any derivation's output at all).
@@ -91,7 +92,7 @@ We can go back to just using derivation paths.
 #### Hydra

 With the Nix changes done, the next task is getting Hydra to work with the revamped system.
-This is especially important given my "server first" approach --- I want to see us building at scale to find and eradicate problems before I worry about anyone actually building this stuff.
+This is especially important given my "server first" approach --- I want to see us building at scale to find and eradicate problems before I worry about regular users actually using this stuff.
 This should be a very simple fix --- Hydra already computes deep and shallow realisations and uploads both. It just needs to stop doing the former.

 One interesting thing to note is we should also upload the resolved derivations that the shallow realisation refers to
@@ -113,8 +114,10 @@ The linked issue contains a discussion of alternatives, I lean towards something

 #### Rollout, Nixpkgs, RFC

-This is probably the most contentious part, and the least "technical stuff I can just do myself", so I don't want to speculate too much.
-But basically I see a path like this:
+Whereas the above is mostly "technical stuff I can just *do* without having to ask anyone for for permission", this part is squarely on community by-in.
+I think what follows is a good process to follow, but, of course, no one knows for sure how the community will react until they do.
+
+This is the roadmap I have in mind; the "...." indicates perhaps more intermediate steps to gain confidence in the new way things work before a major "flip the switch" milestone.

 1. Implement and document, per the above
 2. Do a lot of builds of Nixpkgs, publicly, with a public cache.
@@ -180,7 +183,7 @@ Dynamic derivations is a relatively "cheap" extension to content-addressing deri

 [Nix#12630]: https://github.com/NixOS/nix/pull/12630
 [Nix#12658]: https://github.com/NixOS/nix/pull/12658
-[Nix#12658]: https://github.com/NixOS/nix/pull/12668
+[Nix#12668]: https://github.com/NixOS/nix/pull/12668
 [Nix#12662]: https://github.com/NixOS/nix/pull/12662
 [Nix#12662]: https://github.com/NixOS/nix/pull/12662
 [Nix#12591]: https://github.com/NixOS/nix/pull/12591