Re: redoing unnecessary targets when a do file is modified but the output remains unchanged

public inbox for goredo-devel@lists.cypherpunks.ru

* Re: redoing unnecessary targets when a do file is modified but the output remains unchanged
@ 2021-05-05 18:35 Andrey Dobrovolsky
  0 siblings, 0 replies; 8+ messages in thread
From: Andrey Dobrovolsky @ 2021-05-05 18:35 UTC (permalink / raw)
  To: goredo-devel

Thank You for the very comprehensive answer.

Best regards!
Andrey Dobrovolsky

* Re: redoing unnecessary targets when a do file is modified but the output remains unchanged
  2021-05-05 13:52 Andrey Dobrovolsky
@ 2021-05-05 15:11 ` Sergey Matveev
  0 siblings, 0 replies; 8+ messages in thread
From: Sergey Matveev @ 2021-05-05 15:11 UTC (permalink / raw)
  To: goredo-devel

*** Andrey Dobrovolsky [2021-05-05 16:52]:
>>redo just:
>>    determines that A is OOD, because B is OOD, and then
>>    executes A.do,
>
>In my opinion this is the origin of the problem, named in the message subject.
>In my implementation if A depends on B and B is outdated, then B.do is
>executed, then B is checked with the hash, and if it is found to be
>changed, only after this A.do is executed.

I understand that. And that is exactly what I do not like: A.do
explicitly says "redo (the current target A) if B has changed", and the
out-of-date algorithm has decided that B is OOD -- so A, by definition
(in my opinion) of "redo-ifchange B", has to be rebuilt (and possibly
the "B" dependency will be "thrown out" altogether, if "redo-ifchange B"
is not called in A.do for some reason).

>I really consider the issue named in the message subject very harmful
>and suppose that it can have a negative influence on the wide use of
>the redo system.

I do not see how that makes redo less appealing than any kind of
Makefiles. Even native redo-c is fully worthy of replacing any kind of
Make.

As I understand it, the only problems with its algorithm arise with
redo-always targets. I have met them only with that kind of target. And
goredo contains an ugly hack, very similar to your algorithm: find
whether any of the dependencies are always-ed ones, execute them, then
do as redo-c does (excluding always-ed targets from OOD detection if
their hashes are the same). So it is actually your suggestion: run the
dependency's .do file first. Like an automated:
    redo some/always/target/that/is/a/dependency/of/some/real/target
    redo -do-not-treat-always-as-ood-unless-checksum-differs some/real/target
And I made it only because of redo-always targets. Without redo-always
targets everything is fine and clear (the native redo-c algorithm).

redo-always seems to be a pain in the ass. Like redo-stamp, found in
apenwarr/redo, it in my opinion just adds huge unnecessary complexity
(hashing is cheap). DJB does not suggest redo-always at all: only
redo-ifchange, redo-ifcreate (a necessary one) and redo. If you want to
run something every time, invoke "redo" on that target. And if you want
to redo only if dependencies have changed (probably changed by the
previous "redo" invocation), then "redo-ifchange" them. So you can
actually "emulate" redo-always targets in native redo-c by manually
creating .do files with:
    redo some/always ; redo-ifchange other/one
some/always will be built anyway, but redo-ifchange won't rebuild
other/one if no checksum differs after the some/always invocation. Much
simpler than adding a redo-always command that... just uglifies the very
simple and trivial redo-c algorithm. Inventing hard and complicated
things is easy: redo-stamp, redo-always. But creating simple .do files
that make them completely unnecessary -- is hard.

When I looked at the source code of various implementations (not deeply,
just a quick look), I thought that nearly everyone uses the same simple,
rock-solid, deterministic algorithm as redo-c.

As for goredo, again, it still has redo-always and redo-stamp, because
my pretty big projects were originally written "for" apenwarr/redo. And
I liked that if any environment variable changes (a hypothetical
$VERSION_PREFIX), then redo-always targets, dumping "env" and calling
redo-stamp, will OOD all the targets that include the software version
with that $VERSION_PREFIX. Then I made goredo and removed redo-stamp
from my .do files, because it is complex and burdensome -- hashing is
very cheap. And redo-always behaviour can be replaced by "wrapper" .do
files containing a plain "redo" call to some kind of dump-env.do,
followed by redo-ifchange calls for targets that depend on the dump-env
file.
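
A minimal sketch of the "wrapper" layout described above. The file names
(dump-env.do, version.do, all.do) and the choice of $VERSION_PREFIX as the
only tracked variable are assumptions made for illustration, and the sketch
relies on the implementation capturing a .do file's stdout as the target. The
script only writes the three .do files into a scratch directory; it does not
invoke redo.

```shell
#!/bin/sh
# Writes three hypothetical .do files that imitate a redo-always target
# using only "redo" and "redo-ifchange". All file names are illustrative.
set -eu
work=$(mktemp -d)
cd "$work"

# dump-env.do: record the variables the build depends on
# (assumes stdout of a .do file becomes the target's content).
cat > dump-env.do <<'EOF'
printf 'VERSION_PREFIX=%s\n' "${VERSION_PREFIX:-}"
EOF

# version.do: an ordinary target that depends on the dump-env file.
cat > version.do <<'EOF'
redo-ifchange dump-env
sed -n 's/^VERSION_PREFIX=//p' dump-env
EOF

# all.do: the wrapper. The plain "redo" reruns dump-env.do every time;
# the redo-ifchange that follows rebuilds "version" only if needed.
cat > all.do <<'EOF'
redo dump-env
redo-ifchange version
EOF

ls
```

With this layout, running "redo all" plays the role that a redo-always
dump-env target would otherwise play.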

The problem with the whole redo ecosystem is that all the implementations
have a different vision and opinion even on minor subjects. Some just
literally execute executable .do files. Others parse the .do files to
determine whether they are shell scripts, adding "-x" if necessary. Some
require shebangs to be present in .do files, some do not. Some create
.blablabla.target.name temporary files, some create .complete-random
ones, without any "target.name" extension, which can break commands
expecting a known file extension (I met that kind of annoyance when
running redo-c on .do files tested with apenwarr/redo). But all of those
are pretty trivial, small things and changes, unlike redo-always (making
it "optimal", not literally rebuilding the whole project because some
envvar changed) and redo-stamp, which influence the whole build
algorithm of redo itself. Now, after a year of redo usage, I am
convinced that redo/-ifchange/-ifcreate are completely enough for nearly
all tasks. No redo-always -- no problems with the simple, clear, solid
redo-c algorithm :-)

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

* Re: redoing unnecessary targets when a do file is modified but the output remains unchanged
@ 2021-05-05 13:52 Andrey Dobrovolsky
  2021-05-05 15:11 ` Sergey Matveev
  0 siblings, 1 reply; 8+ messages in thread
From: Andrey Dobrovolsky @ 2021-05-05 13:52 UTC (permalink / raw)
  To: goredo-devel

Hello!

Thanks for paying attention to my message. I'm sorry that it was
probably formulated in the wrong way, and its main point was not
expressed clearly enough. I was not pointing to my implementation as the
reference one; I meant that the dependency processing logic I used
avoids unnecessary .do executions, exactly in line with the message
subject. Please let me make one more attempt.

So, according to
http://lists.cypherpunks.ru/archive/goredo-devel/2102/0015.html:
>redo just:
>    determines that A is OOD, because B is OOD, and then
>    executes A.do,

In my opinion this is the origin of the problem, named in the message subject.
In my implementation if A depends on B and B is outdated, then B.do is
executed, then B is checked with the hash, and if it is found to be
changed, only after this A.do is executed.
Technically program flow may be described schematically as:

redo_target(SomeTarget)
{
    if(SomeTarget has no .do file)
        return ok;
    while(TheDep = read(DepOfSomeTarget)) {
        if(redo_target(TheDep) is not ok)
            break;
        if(CompareHash(TheDep, HashFromDepOfSomeTarget) != ok)
            break;
    }
    if(not all deps of SomeTarget are ok)
        execute(SomeTarget.do);
}

In other words, if A depends on B, one can decide whether A is OOD only
after B has been updated.
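
The order described above can be illustrated with a toy model in plain shell.
This is a sketch under stated assumptions, not redo or the fork's code:
run_a_do and run_b_do are hypothetical stand-ins for A.do and B.do, A.dep is
a made-up file holding the recorded hash of B, and cksum stands in for
whatever hash a real implementation uses.

```shell
#!/bin/sh
# Toy model of "rebuild the dependency, compare its hash, then decide".
# Function and file names are hypothetical; this is not redo itself.
set -eu
work=$(mktemp -d)
cd "$work"

runs_a=0
run_b_do() { echo "some text here" > B; }     # stand-in for B.do
run_a_do() {                                  # stand-in for A.do
    tr '[:lower:]' '[:upper:]' < B > A
    runs_a=$((runs_a + 1))
}

# Initial build: A records the hash of the B it was built from.
run_b_do
run_a_do
cksum < B > A.dep

# B.do receives a cosmetic edit, so B is out of date: rebuild B first...
run_b_do

# ...then run A.do only if B's content no longer matches the recorded hash.
if [ "$(cksum < B)" != "$(cat A.dep)" ]; then
    run_a_do
    cksum < B > A.dep
fi

echo "A.do ran $runs_a time(s)"
```

Because the rebuilt B is byte-for-byte identical, the comparison succeeds and
A.do is not executed a second time.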

This is the main difference between my version and Leah Neukirchen's
original; the other differences are consequences, and some are
implementation details that are not important and are in no way
promoted. As far as I can understand, Your implementation follows the
same logic.

You can easily compile my redo version and check it on the dependency
trees that seem problematic. "redo" is not equivalent to "redo-always";
that was a wrong impression. The only confusing thing may be that
"redo" alone will not build the "all" target; targets must be named
explicitly.

I really consider the issue named in the message subject very harmful
and suppose that it can have a negative influence on the wide use of
the redo system.

Best regards!
Andrey Dobrovolsky

* Re: redoing unnecessary targets when a do file is modified but the output remains unchanged
  2021-05-04 22:52 Andrey Dobrovolsky
@ 2021-05-05  7:35 ` Sergey Matveev
  0 siblings, 0 replies; 8+ messages in thread
From: Sergey Matveev @ 2021-05-05  7:35 UTC (permalink / raw)
  To: goredo-devel

Greetings!

*** Andrey Dobrovolsky [2021-05-05 01:52]:
>I was fascinated by redo idea after opennet.ru announced Your goredo
>release, Your habr article helped a lot too, thanks for both!

Glad they were useful! redo is indeed life-changing for me.
Unfortunately it made me allergic to any kind of Makefiles :-)

>Now I use my fork of Leah Neukirchen's redo-c. The problem You are
>talking about bothered me too, and I've solved it in my  dev2 branch

>github.com/AndreyDobrovolskyOdessa/redo-c

Glad that your version satisfies you! Anyway, I am still completely
unsure how it can be done right, given the issues described before:
http://lists.cypherpunks.ru/archive/goredo-devel/2102/0015.html

I quickly looked at your changes and one thing seems very strange to me.
Possibly I am wrong, because I did not read the whole code to see the
full picture, but:

    In fact "redo" without targets is full equivalent of "redo-always"

(taken from one of your commit messages) seems to be just plain wrong.
"redo" literally tells to "re-do the specified targets", but "redo-always"
marks the *currently* executing target as an "always" target. They have
completely different purposes: one initiates the building of targets,
the other records a dependency for the already running target.

>But hashed dependencies allow to do only what is
>really needed to be done, and that's really great!

Well, all of apenwarr/redo's redo-stamp, goredo and redo-c use hashed
dependencies. They just do literally what they were told: redo if those
targets/dependencies have changed. Redoing dependencies that possibly
are not actually dependencies anymore, because of the "dynamic" nature
of redo-ifchange, is very confusing to me (and, it seems, to the authors
of apenwarr/redo and redo-c).

>(yet?) in multi-threaded redoing, that's why current version is
>single-threaded.

redo-c already uses the jobserver protocol and, as I remember,
parallelizes jobs well. And each target is another shell/redo invocation.
I do not see where multithreading can help. Reading the metainformation
of all those files and directories (ctime, inodes, whatever) won't be
faster than the syscalls and IO it requires. Hashing multiple
dependencies in parallel will hurt, because of non-sequential IO, unless
the hash algorithm is the bottleneck. Actually, the very first thing I
did in redo-c (when there was no goredo) was to use BLAKE2b instead of
SHA256. SHA256 is the slowest of the well-known and widely used
algorithms: SHA512 is considerably faster on 64-bit machines. And it
really can easily be the bottleneck on my computer. goredo uses BLAKE3,
which, while still being cryptographically secure, is 12 times faster
than SHA256 in a single thread.

Thanks for sharing your experience and the fork, that could be useful to
others!

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

* Re: redoing unnecessary targets when a do file is modified but the output remains unchanged
@ 2021-05-04 22:52 Andrey Dobrovolsky
  2021-05-05  7:35 ` Sergey Matveev
  0 siblings, 1 reply; 8+ messages in thread
From: Andrey Dobrovolsky @ 2021-05-04 22:52 UTC (permalink / raw)
  To: goredo-devel

Hi!

I was fascinated by redo idea after opennet.ru announced Your goredo
release, Your habr article helped a lot too, thanks for both!
Now I use my fork of Leah Neukirchen's redo-c. The problem You are
talking about bothered me too, and I've solved it in my  dev2 branch
at github.com/AndreyDobrovolskyOdessa/redo-c. This branch is for my
personal use, I am "noname" programmer and am not much interested
(yet?) in multi-threaded redoing, that's why current version is
single-threaded. But hashed dependencies allow to do only what is
really needed to be done, and that's really great!
The idea is that, as long as the dependency description allows, we can
recurse down the dependency tree until we find something that really
has to be done, and then we continue execution in a bottom-up manner,
checking hashes at every node. This took a full rewrite, but I am
satisfied with the result, and now consider the current version to
utilize the full power of hashed dependencies.
I will be glad if my solution helps You with Your excellent project.

Regards!
Andrey Dobrovolsky

* Re: redoing unnecessary targets when a do file is modified but the output remains unchanged
  2021-02-17 20:40 Karolis K
  2021-02-21  8:38 ` Sergey Matveev
@ 2021-02-27  9:09 ` Sergey Matveev
  1 sibling, 0 replies; 8+ messages in thread
From: Sergey Matveev @ 2021-02-27  9:09 UTC (permalink / raw)
  To: goredo-devel

Greetings!

I returned to that problem again and actually I do not see how it can be
solved fairly. For example, Make has static, non-dynamic dependency
information in its Makefiles, and if A depends on B, then Make will:
    execute B, then execute A
But redo keeps dependency information for OOD determination *only*. So
redo just:
    determines that A is OOD, because B is OOD, and then
    executes A.do,
        that *possibly* contains somewhere redo-ifchange B
    then it executes B.do, when redo-ifchange is called in A.do
"A" may be OOD because of "B", but that does not mean that "B" is still
a dependency that will be built anyway. Moreover, each .do explicitly
tells when to redo(-ifchange) targets -- and that can be at the very end
of some computations and preparations. redo-ifchange can be xargs-ed at
the very end of a .do script/program. So, when we build "B" and see
that it did not change, we are *already* executing A.do, so we cannot
"revert" that step, because without executing A.do we do not know
whether B.do needs to be executed too.
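
The sequence above can be replayed with a toy model in plain shell. It is
only a sketch: run_a_do and run_b_do are hypothetical stand-ins for A.do and
B.do, the call to run_b_do inside run_a_do plays the role of a late
"redo-ifchange B", and cksum stands in for a real content hash. The log
records the order of events.

```shell
#!/bin/sh
# Toy model of the top-down order: A.do starts before B is known to be
# unchanged. Function names are hypothetical; this is not redo itself.
set -eu
work=$(mktemp -d)
cd "$work"

log=""
run_b_do() { log="$log B.do"; echo "some text here" > B; }
run_a_do() {
    log="$log A.do:start"
    # ...arbitrary preparation; the dependency list may be computed here...
    old=$(cksum < B)
    run_b_do                          # stands in for "redo-ifchange B"
    if [ "$old" = "$(cksum < B)" ]; then
        log="$log B:unchanged"        # discovered only inside A.do
    fi
    tr '[:lower:]' '[:upper:]' < B > A
    log="$log A.do:end"
}

run_b_do        # initial state: B exists from an earlier build
log=""          # start recording from the rebuild onwards

# B.do was edited cosmetically, so A is considered OOD and A.do is started.
run_a_do
echo "order:$log"
```

In this model the "B:unchanged" event can only appear between "A.do:start"
and "A.do:end", which is the point made above: the information arrives after
A.do has already begun.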

Actually goredo currently already behaves unfairly, explicitly executing
redo-always-ed targets first, exactly to prevent a complete rebuild of
everything because of an honest redo-always-ed target. But I just assume
that in practice redo-always targets are mostly used for some kind of
lightweight checks of environment variables and configuration files.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

* Re: redoing unnecessary targets when a do file is modified but the output remains unchanged
  2021-02-17 20:40 Karolis K
@ 2021-02-21  8:38 ` Sergey Matveev
  2021-02-27  9:09 ` Sergey Matveev
  1 sibling, 0 replies; 8+ messages in thread
From: Sergey Matveev @ 2021-02-21  8:38 UTC (permalink / raw)
  To: goredo-devel

Greetings!

*** Karolis K [2021-02-17 22:40]:
>In an ideal case I think only the file with a modified do script should be rebuilt, and then, since the output is the same, all other dependencies should simply pass.
>But maybe I am missing something?

You are right -- currently it is sub-optimal. The reason is simple: when
you determine whether a target is out-of-date (OOD), you recursively go
through the dependencies from top to bottom, and if you see any OOD one,
then the whole path is considered OOD and the rebuild starts. A simple
algorithm.

I have encountered exactly that kind of problem with redo-always-ed
targets, which in my case are usually targets checking for changes in
environment variables and configuration files. And nearly everything
depends on them, because all build commands/options depend on envvars
and configuration. And that led to rebuilding nearly everything. But I
made two-stage building and dependency tracking: track down the "always"
targets, rebuild them, and then start ordinary OOD detection and target
rebuilding. And exactly here, if the rebuilt "always" targets have not
changed, OOD works as we expect and skips the build.

I must think about some kind of feedback channel that tells that a target
has not changed after the rebuild. I will think about that and implement
it. It will be a useful and safe optimization (because of hashes there
should be no situations where something is skipped mistakenly).

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

* redoing unnecessary targets when a do file is modified but the output remains unchanged
@ 2021-02-17 20:40 Karolis K
  2021-02-21  8:38 ` Sergey Matveev
  2021-02-27  9:09 ` Sergey Matveev
  0 siblings, 2 replies; 8+ messages in thread
From: Karolis K @ 2021-02-17 20:40 UTC (permalink / raw)
  To: goredo-devel

Hello,

I recently encountered a behaviour that seems to be sub-optimal.

Consider a simple example with 2 do files:

A.txt.do:

echo "some text here"

B.txt.do:

redo-ifchange A.txt
cat A.txt | tr '[:lower:]' '[:upper:]'

After calling redo B.txt the targets are produced.

Now consider a cosmetic change to A.txt.do, for example adding an empty line at the end of the file.
First, redo-ood will show that both A.txt and B.txt are out of date.
Then, if I redo-ifchange A.txt and call redo-ood after that, B.txt is no longer reported as out of date.
However, if instead of the above I call redo-ifchange B.txt, the file B.txt will also be rebuilt, even though none of its direct dependencies changed.
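
The steps above can be replayed with a small script. This is a sketch: it
assumes a redo implementation providing redo, redo-ifchange and redo-ood is
on PATH, and only writes the two .do files otherwise. The scratch directory
is arbitrary, and the comments on the redo commands restate what is reported
above rather than guaranteed output.

```shell
#!/bin/sh
# Reproduction sketch for the scenario above; runs in a scratch directory.
set -u
work=$(mktemp -d)
cd "$work" || exit 1

cat > A.txt.do <<'EOF'
echo "some text here"
EOF

cat > B.txt.do <<'EOF'
redo-ifchange A.txt
tr '[:lower:]' '[:upper:]' < A.txt
EOF

if command -v redo >/dev/null 2>&1 && command -v redo-ood >/dev/null 2>&1; then
    redo B.txt              # initial build of both targets
    echo >> A.txt.do        # cosmetic change: append an empty line
    redo-ood                # reported above: lists A.txt and B.txt
    redo-ifchange B.txt     # reported above: B.txt is rebuilt as well
else
    echo "redo not found; .do files written to $work"
fi
```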

This is of course a simple toy example, in one real project I encountered a situation where multiple targets were being rebuilt multiple times, all after adding an empty line to one of the do scripts.

In an ideal case I think only the file with a modified do script should be rebuilt, and then, since the output is the same, all other dependencies should simply pass.

But maybe I am missing something?

Thanks a lot for this redo implementation,
Kind regards,
Karolis K.
