Re: redo-stamp - Sergey Matveev

From: Sergey Matveev <stargrave@stargrave•org>
To: goredo-devel@lists.cypherpunks.ru
Subject: Re: redo-stamp
Date: Wed, 10 Nov 2021 15:22:29 +0300
Message-ID: <YYu5htlMcYpjsIU4@stargrave.org>
In-Reply-To: <c1957b79-90dc-4b23-8831-2d942c1bc51f@spacefrogg.net>

*** goredo [2021-11-09 13:43]:
>I also was wondering what redo-stamp currently does, exactly. apenwarr/redo uses it to achieve the behaviour that the output of a target can be independent of its hash. They use it like the following:

Initially goredo tried to fully mirror the behaviour of apenwarr/redo,
and redo-stamp had (or should have had) exactly the same behaviour. But
I soon became convinced that redo-stamp is a useless and completely
unnecessary complication.

The main difference between apenwarr's view of redo and mine is that I
am confident that it is OK to always (cryptographically) checksum the
target.
https://redo.readthedocs.io/en/latest/FAQImpl/#why-not-always-use-checksum-based-dependencies-instead-of-timestamps
http://www.goredo.cypherpunks.ru/FAQ.html
In my practice, a huge number of .do files ended with something
like "command -v redo-stamp > /dev/null || exit 0 ; redo-stamp <$3". I
realized (and I assume this applies to most redo users who use it for
building software) that redo-stamping is what you nearly always want.
apenwarr/redo's documentation states somewhere that always-checksumming
is mainly useful for making fewer false-positive OOD decisions.
That is true. But I am confident that hashing can be considered a pretty
cheap operation. Even if it sometimes slows something down, it greatly
simplifies .do files and the overall redo implementation.

apenwarr/redo basically has two ways of determining whether a target has changed:

* either it has different mtime+size+whatever metainformation
* or it used redo-stamp and has different hash

goredo, like redo-c, has a single way:

* it has different hash
* and just as an optimization, that check can be skipped, if ctime is
  the same (goredo's REDO_INODE_NO_TRUST=1 can forcefully distrust
  everything related to inode's metainformation and hash checking will
  be done anyway -- most trustworthy OOD)
* and as another optimization, target is OOD if its size differs

1. Can we trust that mtime and other metainformation are guaranteed to
   change whenever the underlying file has changed? According to
   https://apenwarr.ca/log/20181113 it is good enough in practice, but
   it can break on some FUSEd filesystems. So if we want strong
   confidence in the OOD determination, then we should check the
   hash -- it will definitely be different if something has changed
   (let's forget about possible collisions of a long enough strong
   cryptographic hash -- their probability is negligible)
2. Or we can use the more "reliable" ctime check (again, that can also
   fail on strange/broken FUSE filesystems/drivers, for example).
   apenwarr/redo does not use ctime, because it could create too many
   false positives (like changing the number of hard links). But ctime
   can also be broken/untrusted, so cryptographic hashing will again
   save us here
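
A small demonstration of the first point, as a sketch: here the mtime is
restored by hand with touch -r rather than by a misbehaving filesystem,
and sha256sum is only a stand-in for whatever hash an implementation
uses. The file names are made up.

```shell
#!/bin/sh
# Sketch: a content change that an mtime-only check would miss.
demo=$(mktemp -d)
printf 'aaaa\n' >"$demo/f"
: >"$demo/ref"
touch -r "$demo/f" "$demo/ref"    # remember f's mtime on a reference file
old_hash=$(sha256sum <"$demo/f")

printf 'bbbb\n' >"$demo/f"        # same size, different content
touch -r "$demo/ref" "$demo/f"    # put the old mtime back
new_hash=$(sha256sum <"$demo/f")

# If neither file is newer than the other, mtime alone says "unchanged".
if [ "$demo/f" -nt "$demo/ref" ] || [ "$demo/ref" -nt "$demo/f" ]; then
    mtime_changed=yes
else
    mtime_changed=no
fi
if [ "$old_hash" = "$new_hash" ]; then hash_changed=no; else hash_changed=yes; fi
echo "mtime changed: $mtime_changed, hash changed: $hash_changed"
```

On an ordinary local filesystem ctime still changes in this scenario,
which is why it is the more reliable of the two inode checks.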

From what I have seen and understood, redo-stamp is used mainly with
redo-always targets. Because redo-always will change the inode enough to
satisfy the OOD decision anyway, people use redo-stamp to skip
false-positive OOD decisions and resource-wasting rebuilds.
redo-c/goredo's OOD determination based on inodes/hashes is very simple
from the implementation point of view. redo-always+redo-stamp hugely
complicates the overall logic and code. I look at redo-stamp as a kind
of hack to keep redo-always targets from making everything they touch
OOD (which is what redo-always is intended to do by definition).

And I came to the conclusion that redo-always is just an ugly idea.
Not redo-always itself, but the huge complications aimed at skipping the
rebuilding of everything all the time, because the OOD check definitely
should say
"it is OOD, because it depends on an always-target, which is always OOD by
definition". redo-always just should be used. At least as a way many
people (I saw and I assume) uses: to create some kind of target:
    redo-always
    env | sort
    command -v redo-stamp > /dev/null || exit 0 ; redo-stamp <$3
    # command check is for compatibility with implementations without redo-stamp
I used to do that all the time. But I got tired of those stamps (needed
to prevent rebuilding of literally everything, because everything
depends on environment variables, for example) and of all the
complications introduced by redo-always. To me, redo-always is just a
harmful idea. I tried to note all of that in
http://www.goredo.cypherpunks.ru/FAQ.html

Another issue with hashes/stamps is that you do not always want to
checksum the target's value itself. If someone decides that the hash of
a nonexistent target is equal to an empty string, and if the redo
implementation creates the resulting file even when nothing was sent to
stdout, then of course there is no way to make that target always OOD
(possibly that was the reason people invented redo-always?). But with
goredo (and redo-c, as I remember) there is no problem: if nothing was
sent to stdout, then no output file is created, and a nonexistent file
is always OOD. But if you wish to explicitly create an empty file, then
you can just always touch "$3". Constant hashing won't harm you here at
all.
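
The two cases can be sketched as a pair of tiny .do files. The target
names (check, marker) are made up for the example, and the files are
written into a scratch directory only so that the snippet does not
depend on an existing project.

```shell
#!/bin/sh
# Sketch only: hypothetical target names, written to a scratch directory.
dir=$(mktemp -d)

# Case 1: nothing goes to stdout and "$3" is never touched, so (per the
# behaviour described above) no target file is created and the target
# stays OOD on every run.
cat >"$dir/check.do" <<'EOF'
echo "running checks" >&2
EOF

# Case 2: an intentionally empty target, created explicitly through the
# temporary output file that redo passes as $3.
cat >"$dir/marker.do" <<'EOF'
touch "$3"
EOF
```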

If you really, really wish to check only some metainformation (only
mtime, for example), then nothing prevents you from creating an
intermediate target that contains the output of (stat -f %m $1) and
depending not on the (probably) huge file, but on that intermediate
metainformation file, which holds only the data you wish to check.

>redo-ifchange $input_files
>cmd $input_files >$3
>for f in $input_files; do
>  redo-stamp <$f
>done

I do not understand where the catch is :-). redo-ifchange $input_files
clearly and explicitly states: rebuild that target (do cmd $input_files)
and everything that depends on it, if any of $input_files has changed.
If $input_files have not changed, then that target won't be OOD, won't
be rebuilt, and nothing that depends on it will be rebuilt either (if it
is the only dependency, of course). In your example the redo-stamp calls
literally say: this target is OOD if the hash of all $input_files data
has changed. redo-ifchange $input_files (with implicit hashing) says
exactly the same thing. Doesn't it?
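
As a sketch of that point: with implicit hashing the quoted .do file
needs no redo-stamp loop at all. The target name (out), the input names
and the use of cat in place of "cmd" are assumptions made for the
example; the file is written to a scratch directory.

```shell
#!/bin/sh
# Sketch only: hypothetical target "out" with made-up inputs.
dir=$(mktemp -d)
cat >"$dir/out.do" <<'EOF'
input_files="a.txt b.txt"      # hypothetical inputs
redo-ifchange $input_files     # records the dependencies to be checked
cat $input_files >"$3"         # stand-in for: cmd $input_files >$3
EOF
```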

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

