Re: [devuan-dev] Luminously Unparalleled Repository Coalescer design doc

Author: Ivan J.
Date:
To: devuan developers internal list
Subject: Re: [devuan-dev] Luminously Unparalleled Repository Coalescer design doc

Hi!

On Wed, Nov 18, 2020 at 12:45:42PM -0500, Mason Loring Bliss wrote:
> lurc - the Luminously Unparalleled Repository Coalescer
> Initial design document
> ---------------------------------------------------------------------------
> Overview:
>
> Single tool with a collection of single-shot functions, each invoked
> separately and as separate processes, but with batched versions that invoke
> copies of the tool with the correct arguments, in series. Goal: Be able to
> (re-)run each individual piece independently.
>
> Individual operations will assert kernel advisory locks (via fcntl) to
> guarantee coherency during processing. (Id est, no new pulls during a
> merge, no new merges during a pull, with a configurable timeout.)

In the current amprolla implementations, this locking is done wrong.
Also "with a configurable timeout" sounds wrong. Instead, I would
implement proper error handling and cleanup upon error.
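
To illustrate what I mean by error handling and cleanup, here is a minimal
sketch in Python. The names exclusive_lock and run_step and the idea of a
single "partial output" file are made up for this example. Failing at once
when the lock is held, rather than waiting with a timeout, is my own
assumption about the desired behaviour.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def exclusive_lock(path):
    """Hold an exclusive fcntl advisory lock on `path` while the block runs."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB: raise OSError immediately if another process holds the lock.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield fd
    finally:
        # Closing the descriptor drops the lock, also when the body raised.
        os.close(fd)


def run_step(lock_path, partial_output, step):
    """Run one operation under the lock; remove half-written output on error."""
    with exclusive_lock(lock_path):
        try:
            step(partial_output)
        except Exception:
            # Cleanup upon error: never leave a partial result behind.
            with contextlib.suppress(FileNotFoundError):
                os.unlink(partial_output)
            raise
```

A pull or a merge would be passed in as `step`. A second process calling
run_step on the same lock file gets an OSError instead of waiting.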

> Will have a --force flag or similar to force re-merging of data sets in the
> absence of new data, which will be flagged and recorded during download
> attempts. Will also use --force to insist on redownloading evidently-
> unchanged datasets. (Freshness trusts HTTP headers.)

I recommend parsing the Release files rather than trusting HTTP headers.
A Release file states its own update time correctly, whereas the httpd
serving it _may_ be misconfigured in some corner cases. Each Release
file contains a "Date" field which you may use instead.
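
A rough sketch of that check in Python. The function names are
hypothetical. Treating a missing Date as "assume it changed" is my own
assumption.

```python
from email.utils import parsedate_to_datetime


def release_date(release_text):
    """Return the Date field of a Release file as a datetime, or None."""
    for line in release_text.splitlines():
        if line.startswith("Date:"):
            # The field uses the RFC 2822 style, e.g.
            # "Sat, 14 Nov 2020 10:15:20 UTC".
            return parsedate_to_datetime(line[len("Date:"):].strip())
    return None


def is_newer(remote_release, local_release):
    """True if the remote Release is newer than the copy we already hold."""
    remote = release_date(remote_release)
    local = release_date(local_release)
    if remote is None or local is None:
        # Cannot tell: err on the side of pulling again.
        return True
    return remote > local
```

This only needs the (small) Release file to decide whether the larger
Packages files are worth downloading.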

> Configuration syntax will be simple flat text. Blacklisted packages will be
> per-dist.
>
> TBD: Syntax/semantics for specifying precedence amongst repositories.
> (Example, department > organization > devuan > debian-security > debian,
> with each step potentially asserting blacklists.) Current favourite: linked
> list specified in config? Guard against loops or any dist being masked more
> than once, or directly masking more than one subordinate dist.
> ---------------------------------------------------------------------------
> Procedure:
>
> 1. Pull down repo data from all specified repositories. (Invocation of tool
> can specify a single dist to pull or a batch mode that calls the tool to
> collect all repositories. For single-repository mode only, a --force option
> will allow re-pull even for files that look unchanged.)
>
> a. config will specify dist locations
>
> b. config will have a suite mapping
>
> c. snag each relevant Packages, Release, Contents file
>
> TBD: What's the minimal set of files I need to regenerate, beyond
> the Packages files?

The files that need to be generated are "Packages" and "Release". In the
21st century, you'll also want "InRelease" to sign these repositories,
and definitely compress the "Packages" files with gzip or xz.
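
As a sketch of the compression and checksum part (signing with gpg is
left out here), something like the following Python would do. The
function names and the "main/binary-amd64" path prefix are only example
values I picked.

```python
import gzip
import hashlib
import lzma


def index_variants(packages_bytes):
    """Return the plain, gzip and xz forms of a Packages index."""
    return {
        "Packages": packages_bytes,
        # mtime=0 keeps the gzip output reproducible between runs.
        "Packages.gz": gzip.compress(packages_bytes, mtime=0),
        "Packages.xz": lzma.compress(packages_bytes),
    }


def sha256_section(variants, prefix="main/binary-amd64"):
    """Build the SHA256 block of a Release file: ' <hash> <size> <path>'."""
    lines = ["SHA256:"]
    for name in sorted(variants):
        data = variants[name]
        digest = hashlib.sha256(data).hexdigest()
        lines.append(" %s %d %s/%s" % (digest, len(data), prefix, name))
    return "\n".join(lines) + "\n"
```

The Release file then lists every variant, and InRelease is the same
content with an inline signature.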

> 2. Write out merged data where
>
> a. higher-precedence packages mask lower-precedence packages and
> blacklists. (Examples, local apt built without libsystemd0, local
> Plymouth built without systemd deps, local dist blacklists libsystemd0
> and pulseaudio.)
>
> b. Packages are blacklisted per-dist, with each level offering a
> blacklist of packages from that level or in subordinate dists.

Keep in mind that blacklisting Source/Package names alone is not
enough: other packages that name them in their dependencies should be
blacklisted as well.
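
A minimal sketch of such a check in Python, assuming each stanza has
already been parsed into a dict of field name to value. The function
names are made up. Treating any mention in Depends or Pre-Depends,
including one inside an "a | b" alternative, as a hit is a
simplification on my part.

```python
import re


def dependency_names(stanza):
    """Collect the package names mentioned in Depends and Pre-Depends."""
    names = set()
    for field in ("Depends", "Pre-Depends"):
        for clause in stanza.get(field, "").split(","):
            for alternative in clause.split("|"):
                # Keep only the name; drop "(>= 1.0)", ":any", "[amd64]".
                match = re.match(r"\s*([a-z0-9][a-z0-9+.-]*)", alternative)
                if match:
                    names.add(match.group(1))
    return names


def blacklisted(stanza, blacklist):
    """True if the package, its source, or a dependency is blacklisted."""
    # Source may carry a version, e.g. "systemd (246-2)".
    own = {stanza.get("Package"), stanza.get("Source", "").split(" ")[0]}
    return bool((own | dependency_names(stanza)) & set(blacklist))
```

Whether a package with a blacklisted name in only one alternative should
be dropped is a policy question this sketch does not settle.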

> c. Per-dist blacklisting is applied with each successive application of
> a dist, from lowest-precedence to highest. As such, if a higher-rank
> repository supplied a package blacklisted below, it will appear in the
> final results, unless a still-higher-priority dist again blacklists it.
>
> 3. Sign.
>
> 4. Publish data. We want to be *really* atomic, and not have network
> latency impact this, so:
>
> a. rsync the produced merge to holding directory on destination
> (pkgmaster)
>
> b. Once on pkgmaster, rsync into place - more likely to be atomic
>
> TBD: but consider better guarantees? Either way, this is outside
> of the scope of lurc and merely a suggestion.

Unfortunately, transferring files like this will never truly be atomic.
I gave some thought to archiving the whole generated repositories with
cpio and doing a copy-pass to another server. This might actually be
the most efficient method, but I never found the time to implement it
in the current amprolla codebase.
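
The transfer itself cannot be made atomic, but the final switch on the
destination can at least be narrowed to a single rename. Note that this
is a different technique from the cpio idea above: a symlink swap,
sketched in Python. The function name and the ".new" suffix are
hypothetical, and it assumes the web server follows symlinks.

```python
import contextlib
import os


def publish(release_dir, live_link):
    """Point `live_link` at `release_dir` with one rename(2) call."""
    tmp_link = live_link + ".new"
    # Remove a stale temporary link left over from an interrupted run.
    with contextlib.suppress(FileNotFoundError):
        os.unlink(tmp_link)
    os.symlink(release_dir, tmp_link)
    # Renaming over the existing link replaces it in one step, so a
    # reader sees either the old tree or the new one.
    os.replace(tmp_link, live_link)
```

Each merge would be copied into its own fresh directory first, and only
then switched in.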

> ---------------------------------------------------------------------------
> Open questions:
>
> 1. What logging detail do we want? Question listed in weekly meeting doc.
> ---------------------------------------------------------------------------
> Config data:
>
> set of repositories with dist keys (fields: repo <key> <url>)
> map of overlays (fields: map <dist> <subordinate>)
>
> Blacklist data per-dist in /etc/lurc/blacklist.d.

Please use the amprolla configuration as a reference.
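
For comparison, the "repo <key> <url>" and "map <dist> <subordinate>"
fields quoted above could be parsed and checked roughly like this in
Python. This is only a sketch of the proposed syntax, not amprolla's
format. The function names and the "#" comment lines are my assumptions.

```python
def parse_config(text):
    """Parse 'repo' and 'map' lines; reject double masking."""
    repos, below, masked = {}, {}, set()
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] == "repo" and len(fields) == 3:
            repos[fields[1]] = fields[2]
        elif fields[0] == "map" and len(fields) == 3:
            dist, sub = fields[1], fields[2]
            if dist in below:
                raise ValueError("line %d: %s masks more than one dist" % (lineno, dist))
            if sub in masked:
                raise ValueError("line %d: %s is masked more than once" % (lineno, sub))
            below[dist] = sub
            masked.add(sub)
        else:
            raise ValueError("line %d: cannot parse %r" % (lineno, raw))
    return repos, below


def precedence_chain(below, top):
    """Walk the linked list from `top` downwards, refusing loops."""
    chain, seen, node = [], set(), top
    while node is not None:
        if node in seen:
            raise ValueError("loop in overlay map at %s" % node)
        seen.add(node)
        chain.append(node)
        node = below.get(node)
    return chain
```

Walking from the highest-precedence dist yields the order in which the
overlays have to be applied, reversed.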

> ---------------------------------------------------------------------------
> Method to merge data:
>
> 1. In-memory map of most-subordinate remaining set.
>
> 2. Apply blacklist.
>
> 3. Overlay next-most-subordinate set atop initial data, apply blacklist.
> Loop.
>
> 4. Write out remaining dataset to file. Preserve deb822(5). (Consider
> formal use of deb822 for configs?)
> ---------------------------------------------------------------------------
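
The four steps above could be sketched in Python as follows, assuming
each dist is already parsed into a dict of package name to stanza. The
function names are hypothetical, and sorting the output by name is my
choice, not part of the design.

```python
def merge_layers(layers):
    """Merge (packages, blacklist) pairs, lowest precedence first."""
    merged = {}
    for packages, blacklist in layers:
        # Overlay: a higher-precedence package masks the one below it.
        merged.update(packages)
        # Then apply this level's blacklist to everything seen so far.
        for name in blacklist:
            merged.pop(name, None)
    return merged


def write_packages(merged):
    """Serialise the result as deb822 stanzas separated by blank lines."""
    stanzas = []
    for name in sorted(merged):
        fields = merged[name]
        stanzas.append("".join("%s: %s\n" % (k, v) for k, v in fields.items()))
    return "\n".join(stanzas)
```

Because a level's blacklist is applied right after its own overlay, a
package blacklisted at a lower level reappears if a higher level
supplies it again.
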
> Details/notes:
>
> Packaged dependencies so far: libhttp-tinyish-perl
>
> Modification status: Return code is 200 or 2xx if new, 304 if unmodified

Could you explain this?

> Config in /etc/lurc
> Work in /var/spool/lurc
> role user: turkey (why? because sudo turkey lurc)

I don't understand this. Why not name the user/group after the actual
program?

> todo: provide bash-completion
>
> todo: perldoc as base for docco
> ---------------------------------------------------------------------------

Best regards,
Ivan
