Multiple calls to redo-* for same target results in multiple .rec entries

* Multiple calls to redo-* for same target results in multiple .rec entries
@ 2021-10-27 17:18 goredo
  2021-10-31  8:21 ` Sergey Matveev
  0 siblings, 1 reply; 7+ messages in thread
From: goredo @ 2021-10-27 17:18 UTC
  To: goredo-devel

Hi,

Thanks for the quick response. :)

I just discovered that calling redo-ifchange / redo-ifcreate multiple times on the same target creates multiple entries in the .rec file. Order doesn't matter; all four combinations have a similar effect.

At least multiple calls to the same type shouldn't be recorded twice, I guess. Not sure what recording an ifchange and ifcreate dependency simultaneously would actually mean...

Kind regards,
–Michael

* Re: Multiple calls to redo-* for same target results in multiple .rec entries
  2021-10-27 17:18 Multiple calls to redo-* for same target results in multiple .rec entries goredo
@ 2021-10-31  8:21 ` Sergey Matveev
  2021-11-04 15:35   ` goredo
  0 siblings, 1 reply; 7+ messages in thread
From: Sergey Matveev @ 2021-10-31  8:21 UTC
  To: goredo-devel

Greetings!

*** goredo [2021-10-27 17:18]:
>At least multiple calls to the same type shouldn't be recorded twice, I guess. Not sure what recording an ifchange and ifcreate dependency simultaneously would actually mean...

Dependency recording is done in a very simple way: when we run some
redo-* command, we open a temporary .rec file and pass its open
file descriptor to that command. The redo-* command just writes
to it, appending records to the already opened file. So there is
literally no "aggregator", just an append-only file.
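
The mechanism described above can be imitated in plain shell. This is only a minimal sketch, not goredo's code: the record_dep helper, the one-line "type target" record layout and the choice of descriptor 9 are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: an append-only dependency log written through an already
# open file descriptor. record_dep and the record layout are hypothetical,
# not goredo's real .rec format.
rec=$(mktemp)

# Stand-in for a redo-* command: it only appends to the inherited descriptor.
record_dep() {
    printf '%s %s\n' "$1" "$2" >&9
}

# The caller opens the temporary file once, in append mode ...
exec 9>>"$rec"
# ... and every later call writes through that same descriptor.
record_dep ifchange foo.c
record_dep ifchange foo.c
record_dep ifcreate foo.c
exec 9>&-

cat "$rec"
```

Nothing in this scheme looks at earlier records, so the repeated ifchange line and the additional ifcreate line for the same target all end up in the file.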

I thought that I could replace it with a pipe that is read by the
upper-level redo-* process, which could then process the records in any
way: removing duplicates or doing any other kind of check. But I am not
sure what behaviour we want. Is it so wrong to have multiple entries? It
is rather silly, of course, but, as I assume, it won't break anything,
and it clearly shows the whole timeline of redo-ifchange/redo-ifcreate
calls. I thought that we could warn the user that duplicate entries were
recorded, but at the same time there are pretty ordinary use cases where
redo-* is called multiple times just for convenience and simplicity. So
I think it is ok to leave everything as-is.
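
For comparison, the pipe-based alternative mentioned above could look roughly like this. It is a sketch under assumptions: dedup_records is a hypothetical helper, records are assumed to be one "type target" pair per line, and only exact duplicates are dropped.

```shell
#!/bin/sh
# Sketch only: an "aggregator" that reads records from a pipe and drops
# exact duplicates while keeping the order in which they were first seen.
dedup_records() {
    awk '!seen[$0]++'
}

deduped=$(printf '%s\n' \
    'ifchange foo.c' \
    'ifchange foo.c' \
    'ifcreate foo.c' \
    'ifchange bar.h' | dedup_records)

printf '%s\n' "$deduped"
```

The ifchange and ifcreate records for foo.c both survive, because they differ; deciding what such a pair should mean is a separate question from removing repeats.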

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

* Re: Multiple calls to redo-* for same target results in multiple .rec entries
  2021-10-31  8:21 ` Sergey Matveev
@ 2021-11-04 15:35   ` goredo
  2021-11-09  9:13     ` Sergey Matveev
  0 siblings, 1 reply; 7+ messages in thread
From: goredo @ 2021-11-04 15:35 UTC
  To: goredo-devel

Hi,

For several days, you had me convinced that dependency recording could be kept that simple, but now I've triggered a bug.

Write a default.do.do and create a foo/default.do from it. Try again and redo fails with
```
main.go:484: foo/default.do: Size missing
```

redo implicitly records an ifcreate dependency on default.do (as it was missing in foo/ the first time).

This can, of course, be remedied by not recording an ifcreate dependency on a .do file that is itself the current target. Removing the ifcreate dependency from the rec file fixes the issue.

Still, this had me thinking about what it means to have both an ifcreate and an ifchange dependency on a file, whether the order matters or should matter, and how other redo implementations behave in that case. Especially because redo provides no way to check or remove prior decisions. I'm leaning towards an approach that either records exactly one dependency per target, or where only the last dependency on a specific target is considered (i.e. last-writer-wins semantics).
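
A last-writer-wins reading of the records could be sketched as follows. The last_writer_wins helper and the "type target" line format are assumptions for illustration, and target names are assumed to contain no whitespace.

```shell
#!/bin/sh
# Sketch only: keep one record per target, taking the type from the last
# record seen for that target. Output follows first-seen target order.
last_writer_wins() {
    awk '{
        if (!($2 in kind)) order[++n] = $2   # remember first-seen order of targets
        kind[$2] = $1                        # later records overwrite earlier ones
    }
    END { for (i = 1; i <= n; i++) print kind[order[i]], order[i] }'
}

resolved=$(printf '%s\n' \
    'ifcreate foo/default.do' \
    'ifchange foo/default.do' \
    'ifchange bar.h' | last_writer_wins)

printf '%s\n' "$resolved"
```

Here the earlier ifcreate record for foo/default.do is superseded by the later ifchange record, so each target ends up with exactly one dependency type.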

What do you think? goredo should treat dependencies the same as the other implementations, shouldn't it?

Regards,
–Michael

* Re: Multiple calls to redo-* for same target results in multiple .rec entries
  2021-11-04 15:35   ` goredo
@ 2021-11-09  9:13     ` Sergey Matveev
  2021-11-09 13:43       ` goredo
  0 siblings, 1 reply; 7+ messages in thread
From: Sergey Matveev @ 2021-11-09  9:13 UTC
  To: goredo-devel

Greetings!

*** goredo [2021-11-04 15:35]:
>main.go:484: foo/default.do: Size missing

Fixed in the 1.19.0 release. The code did not check that the record it
was looking at was an "ifchange" dependency. Thanks for the report!

Another funny bug also appeared: when you redo foo/default.do the first
time, it passes fine. But when you redo it again, the foo/default.do
target itself is used as the .do file to rebuild itself. Also fixed in
1.19.0.

>What do you think? goredo should treat dependencies the same as the other implementations, shouldn't it?

I am sure that in practice everyone does it as they wish and that there
is no common denominator at all among implementations. goredo's initial
design fully resembled github.com/leahneukirchen/redo-c and it has the
same dependency tracking behaviour. So at least there are two of us :-)
and redo-c seems to be quite popular.

And anyway, I am still not sure whether the current state is a problem,
and if it is, what behaviour/tracking we should expect. The current
"Size missing" error is a bug in the code, which did not check the
record type.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

* Re: Multiple calls to redo-* for same target results in multiple .rec entries
  2021-11-09  9:13     ` Sergey Matveev
@ 2021-11-09 13:43       ` goredo
  2021-11-10 10:47         ` Sergey Matveev
  2021-11-10 12:22         ` redo-stamp Sergey Matveev
  0 siblings, 2 replies; 7+ messages in thread
From: goredo @ 2021-11-09 13:43 UTC
  To: goredo-devel

Thanks!

I was just wondering about the exact semantics.

For instance, I can force a non-empty target always OOD by creating an -ifcreate dependency on itself. Because the output is now clearly there, which signals OOD. If this is intended, could you document that in the FAQ? Because it's hard for newcomers to grasp the idioms of redo.

I was also wondering what redo-stamp currently does, exactly. apenwarr/redo uses it to achieve the behaviour that the output of a target can be independent of its hash. They use it like the following:
```
redo-ifchange $input_files
cmd $input_files >$3
for f in $input_files; do
  redo-stamp <$f
done
```

So, you may use the output for further computation, but the notion of the target being OOD depends entirely on the input files. This circumvents problems when cmd produces non-reproducible output, i.e. output that includes time stamps or PIDs.

Supporting redo-stamp would mean that whenever a .rec file contains a stamp entry, change times, size and hash are ignored in favor of the stamp's hash.
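
The proposed precedence rule can be sketched like this. Everything about the record format here is hypothetical ("stamp" and "hash" lines, one record per line); the thread does not show goredo's real .rec layout.

```shell
#!/bin/sh
# Sketch only. The "stamp"/"hash" record names and the one-record-per-line
# layout are hypothetical, chosen just to illustrate the precedence rule.

# compare_value REC_FILE: print the value an OOD check would compare.
# A stamp record, if present, takes precedence over the output hash.
compare_value() {
    awk '
        $1 == "stamp" { stamp = $2 }
        $1 == "hash"  { hash = $2 }
        END { print (stamp != "" ? stamp : hash) }
    ' "$1"
}

rec=$(mktemp)
printf 'hash %s\n' aaaa > "$rec"
plain=$(compare_value "$rec")       # no stamp record: the hash is used

printf 'stamp %s\n' bbbb >> "$rec"
stamped=$(compare_value "$rec")     # stamp record present: it overrides the hash

echo "$plain $stamped"
```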

Would you consider this? This is a feature that implicit output hashing cannot recreate.

Kind regards,
–Michael

* Re: Multiple calls to redo-* for same target results in multiple .rec entries
  2021-11-09 13:43       ` goredo
@ 2021-11-10 10:47         ` Sergey Matveev
  2021-11-10 12:22         ` redo-stamp Sergey Matveev
  1 sibling, 0 replies; 7+ messages in thread
From: Sergey Matveev @ 2021-11-10 10:47 UTC
  To: goredo-devel

*** goredo [2021-11-09 13:43]:
>For instance, I can force a non-empty target always OOD by creating an -ifcreate dependency on itself. Because the output is now clearly there, which signals OOD. If this is intended, could you document that in the FAQ? Because it's hard for newcomers to grasp the idioms of redo.

First of all -- I could well be wrong in my assumptions and idioms :-).
Of course, the ability of redo-ifcreate to make a target OOD by itself
is not something that was planned. It is just a side effect: nothing
explicitly prevents you from recording such an ifcreate-dependency. Of
course, a redo implementation could forbid it explicitly, but I do not
know why it should, or what harm the dependency would do.

I think that if the target's author ran redo-ifcreate on something that
already exists, it is the .do author's problem (if it is a problem at
all). Well, ok, the redo tool can help to catch as many mistakes or
strange things as it can, like warning about simultaneous stdout+$3
output, or about touching $1 directly. redo-ifcreate on an already
(currently) existing file does seem strange anyway, so goredo can print
a warning that it is recording an ifcreate-dependency on an already
existing file. But just a warning, because that file can appear a
microsecond after the check anyway, while the .do target is still not
completed. I will add that, because it seems to be harmless, but
possibly helpful to somebody.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

* Re: redo-stamp
  2021-11-09 13:43       ` goredo
  2021-11-10 10:47         ` Sergey Matveev
@ 2021-11-10 12:22         ` Sergey Matveev
  1 sibling, 0 replies; 7+ messages in thread
From: Sergey Matveev @ 2021-11-10 12:22 UTC
  To: goredo-devel

*** goredo [2021-11-09 13:43]:
>I was also wondering what redo-stamp currently does, exactly. apenwarr/redo uses it to achieve the behaviour that the output of a target can be independent of its hash. They use it like the following:

Initially goredo tried to fully resemble the behaviour of apenwarr/redo,
and redo-stamp had (or should have had) exactly the same behaviour. But
soon I became convinced that redo-stamp is a useless and completely
unnecessary complication.

The main difference between apenwarr's view of redo and mine is that I
am confident that it is ok to always (cryptographically) checksum the
target.
https://redo.readthedocs.io/en/latest/FAQImpl/#why-not-always-use-checksum-based-dependencies-instead-of-timestamps
http://www.goredo.cypherpunks.ru/FAQ.html
In my practice, a huge number of .do files ended with something
like "command -v redo-stamp > /dev/null || exit 0 ; redo-stamp <$3". I
realized (and I assume this applies to most redo users who use it for
building software) that redo-stamping is what is nearly always wanted.
apenwarr/redo's documentation states somewhere that always-checksumming
is mainly useful for making fewer false-positive OOD decisions. That is
true. But I am confident that hashing can be considered a pretty cheap
operation. Even if it sometimes slows something down, it greatly
simplifies the .do files and the overall redo implementation.

apenwarr/redo basically has two ways of determining whether the target has changed:

* either it has different mtime+size+whatever metainformation
* or it used redo-stamp and has different hash

goredo, like redo-c, has a single way:

* it has a different hash
* and just as an optimization, that check can be skipped if ctime is
  the same (goredo's REDO_INODE_NO_TRUST=1 forcefully distrusts
  everything related to the inode's metainformation, so hash checking
  is done anyway -- the most trustworthy OOD check)
* and as another optimization, the target is OOD if its size differs
1. Can we trust that mtime and other metainformation are guaranteed to
   change if the underlying file has definitely changed? According to
   https://apenwarr.ca/log/20181113 it is good enough in practice, but
   can break on some FUSE filesystems. So if we want strong confidence
   in OOD determination, then we should check the hash -- it will
   definitely be different if something has changed (let's forget about
   possible collisions of a long enough strong cryptographic hash --
   their probability is negligible)
2. Or we can use the more "reliable" ctime check (again, that can also
   fail on strange/broken FUSE filesystems/drivers, for example).
   apenwarr/redo does not use ctime, because it could create too many
   false positives (like changing the number of hard links). But ctime
   can also be broken/untrusted, so cryptographic hashing will again
   save us here

As far as I have seen and understand, redo-stamp is used mainly with
redo-always targets. Because redo-always will change the inode enough
anyway to trigger an OOD decision, people use redo-stamp to avoid
false-positive OOD decisions and resource-wasting rebuilds.
redo-c/goredo's OOD determination based on inodes/hashes is very simple
from the implementation point of view. redo-always+redo-stamp hugely
complicates the overall logic and code. I look at redo-stamp as a kind
of hack to prevent redo-always targets from making everything they
touch OOD (which is what redo-always is intended to do by definition).

And I came to the conclusion that redo-always itself is just an ugly
idea. Not redo-always as such, but the huge complications aimed at
avoiding rebuilding everything all the time, because the OOD check
should definitely say "it is OOD, because it depends on an
always-target, which is always OOD by definition". redo-always should
just be used. At least in the way many people (as I have seen and
assume) use it: to create a target like this:
    redo-always
    env | sort
    command -v redo-stamp > /dev/null || exit 0 ; redo-stamp <$3
    # command check is for compatibility with implementations without redo-stamp
I used to do that all the time. But I got tired of those stamps (for
preventing the rebuild of literally everything, because everything
depends on environment variables, for example) and of all the
complications introduced by redo-always. For me, redo-always is just a
harmful idea. I tried to note all of that in
http://www.goredo.cypherpunks.ru/FAQ.html

Another issue with hashes/stamps is that you do not always want to
checksum the target's value itself. If someone decides that the hash of
a nonexistent target equals the empty string, and if the redo
implementation creates the resulting file even if nothing was sent to
stdout, then of course there is no way to make that target always OOD
(possibly that was the reason people invented redo-always?). But with
goredo (and redo-c, as I remember) there is no problem: if nothing was
sent to stdout, then no output file is created -- a nonexistent file is
always OOD. But if you wish to explicitly create an empty file, then you
can always just touch "$3". Constant hashing won't harm you here in any
way.

If you really, really wish to check only some metainformation (only
mtime, say), then nothing prevents you from creating an intermediate
target that contains the output of (stat -f %m $1) and depending not on
the (probably) huge file, but on that intermediate metainformation file,
which holds only the data you wish to check.
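
The idea of such a metainformation file can be sketched outside of redo. The write_mtime helper is hypothetical, and GNU stat (-c %Y) is assumed here, whereas the paragraph above uses the BSD form (stat -f %m). How the intermediate target gets re-run by redo is deliberately left out of the sketch.

```shell
#!/bin/sh
# Sketch only: write just the metadata you care about into a small file,
# so that dependents hash that file instead of the (possibly huge) original.
# Assumption: GNU stat (-c %Y prints mtime in seconds since the epoch).
write_mtime() {   # write_mtime SOURCE OUTPUT
    stat -c %Y "$1" > "$2"
}

big=$(mktemp); meta=$(mktemp)
touch -t 202101010000 "$big"
write_mtime "$big" "$meta"; before=$(cat "$meta")

printf 'new content\n' >> "$big"      # the content changes ...
touch -t 202101010000 "$big"          # ... but mtime is reset to the same value
write_mtime "$big" "$meta"; after=$(cat "$meta")

if [ "$before" = "$after" ]; then
    echo "metadata file unchanged"
fi
```

Because the small file's content is the same in both runs, a hash of it is the same too, so anything depending only on it would not be considered changed even though the big file's content differs.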

>redo-ifchange $input_files
>cmd $input_files >$3
>for f in $input_files; do
>  redo-stamp <$f
>done

I do not understand where the catch is :-). redo-ifchange "$input_files"
clearly and explicitly states: rebuild that target (do cmd $input_files)
and everyone who depends on it, if any of $input_files has changed. If
$input_files have not changed, then that target won't be OOD, won't be
rebuilt, and no one who depends on it will be rebuilt either (if it is
their only dependency, of course). In your example the redo-stamp calls
literally say: this target is OOD if the hash of all the $input_files
data has changed. redo-ifchange $input_files (with implicit hashing)
says exactly that too. Doesn't it?

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF
