public inbox for goredo-devel@lists.cypherpunks.ru
Atom feed
* Suggestion to revert touching files when the hash matches (problem with hardlinks)
@ 2022-10-31 21:37 Jan Niklas Böhm
  2022-11-01  6:42 ` goredo
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Niklas Böhm @ 2022-10-31 21:37 UTC (permalink / raw)
  To: goredo-devel

Hello again,

I would like to suggest reverting the change introduces in version 
1.23.0: Performance optimization: do not use target’s temporary output 
file at all, if its hash equals to already existing target’s one. Just 
touch existing target file instead.

Unfortunately, this breaks down as soon as hardlinks are used.  Since 
hardlinks and ctime do not play well together, this is all done with the 
environment variable set to REDO_INODE_TRUST=mtime.  Consider the 
following example: the do file "a.do" only has one line "echo aaa" and 
the file "b.do" is

	b.do
	---
	redo-ifchange a
	ln a $3

Calling "redo b" works as expected.  But when a is changed such that it 
is ood, but does not change its output (e.g. insert a comment), we get

	redo b
	redo . a (0.002s)
	err  b (0.006s): $1 was explicitly touched

This is because "a" is touched, which also touches the file "b" and thus 
goredo rightfully complains about it.  Even if it did not (which happens 
in my project because there is more indirection), the mtime is changed 
in the file, but the .rec file is not updated correspondingly.

Since I think it will be much more work to resolve all hardlinks and 
update the .rec file, I think it is more sensible to instead revert the 
behavior of touching the file and instead move $3 to the target location 
instead, which would avoid the problem outlined above.

Since both $3 and the final file are in the same directory, the moving 
should not be an expensive operation anyways I believe.

Let me know what you think!

Cheers
Nik

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Suggestion to revert touching files when the hash matches (problem with hardlinks)
  2022-10-31 21:37 Suggestion to revert touching files when the hash matches (problem with hardlinks) Jan Niklas Böhm
@ 2022-11-01  6:42 ` goredo
  2022-11-01  7:50   ` Jan Niklas Böhm
  0 siblings, 1 reply; 10+ messages in thread
From: goredo @ 2022-11-01  6:42 UTC (permalink / raw)
  To: goredo-devel

Hardlinks are a bad idea due to their "automatic updates". You no longer get the guarantee that your output is only changed by redo.

Are you sure this did not also happen before 1.23? Because I know this error. After the first run of b.do, you've already established the hardlink between a and b. Linking again to $3 doesn't change the fact that you also changed b directly (via the common link to a).

So, your problem has nothing to do with 1.23, even if the error behaviour has changed.

As an alternative, you could look into using 'cp --reflink' on modern file systems.

Kind regards,
–Michael

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suggestion to revert touching files when the hash matches (problem with hardlinks)
  2022-11-01  6:42 ` goredo
@ 2022-11-01  7:50   ` Jan Niklas Böhm
  2022-11-01  8:21     ` goredo
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Niklas Böhm @ 2022-11-01  7:50 UTC (permalink / raw)
  To: goredo-devel

> Hardlinks are a bad idea due to their "automatic updates". You no longer get the guarantee that your output is only changed by redo.

Unfortunately I am kind of stuck with hardlinks at this point.  I 
actually have not looked in symlinks in detail yet, but that feels a bit 
hacky (since then there is only the indirect link between the file and 
the data contents).

> Are you sure this did not also happen before 1.23? Because I know this error. After the first run of b.do, you've already established the hardlink between a and b. Linking again to $3 doesn't change the fact that you also changed b directly (via the common link to a).
I am fairly sure that this is due to the symlinking, also because this 
error does not occur when the output of "a" is changed and thus the file 
gets renamed.

The reason is that when both "a" and "b" point to the same inode and we 
have to redo both it roughly goes like this:

	echo aaa > a.tmp
	# goredo does the following, but only if a.tmp != a
	mv a.tmp a

So now "a" and "b" point to different inodes and "b" remains unchanged. 
Then when we "redo b" it will establish the hardlink again.  This is 
what in my opinion should also happen if the output of "redo a" did not 
change the contents of "a".

While the contents do not change throughout, by touching "a" the mtime 
for "b" does change and that's what messes up the state in the redo 
process / recfiles, unless I am misunderstanding something.

The file attributes after the first call to "redo b" (when it exits with 
0) are:

	a, inode = 123, mtime = 1
	b, inode = 123, mtime = 2

Now with version 1.27.1 (or any after 1.23.0) when we change "a.do" so 
that it is rerun, but its output does not change, and then "redo a" the 
files look like:

	a, inode = 123, mtime = 3
	b, inode = 123, mtime = 3   # not 2 anymore, error for redo

Whereas if we would move $3 to a it would look like:

	a, inode = 321, mtime = 3
	b, inode = 123, mtime = 2

And "b" could be redone once more, since it is not seemingly modified 
externally.

> As an alternative, you could look into using 'cp --reflink' on modern file systems.

Thanks for that suggestion, this actually reflects the intention a bit 
better of what I would want to happen.  Unfortunately it is not 
supported on the machines I am using.

What I am not sure about is what will trigger the copy mechanism and 
whether that is well suited.  On the one hand, if touching the file 
triggers the copy already, then the updating mechanism from goredo will 
become fairly expensive as this now triggers a full copying instead of 
only a renaming operation.  On the other hand, if touching does not 
cause a copy, then the issue outlined above will also persist.  Of 
course this is more hypothetical, since I cannot use it anyways.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suggestion to revert touching files when the hash matches (problem with hardlinks)
  2022-11-01  7:50   ` Jan Niklas Böhm
@ 2022-11-01  8:21     ` goredo
  2022-11-01  9:02       ` Jan Niklas Böhm
  0 siblings, 1 reply; 10+ messages in thread
From: goredo @ 2022-11-01  8:21 UTC (permalink / raw)
  To: Jan Niklas Böhm; +Cc: goredo-devel

> The reason is that when both "a" and "b" point to the same inode and we have to redo both it roughly goes like this:
>
>     echo aaa > a.tmp
>     # goredo does the following, but only if a.tmp != a
>     mv a.tmp a
>

This is not what happens, because your proposition was that b is constructed from a via ln. So, a and b point to the same inode. In the second run, redo links a to $3. Now, a, b and $3 point to the same inode, regardless of whether redo mv's $3 to a afterwards or not. This is where the toolchain breaks, because it looks like a changed underneath, because all three point to the same inode.

When you are stuck with hardlinks, that is unfortunate. Hardlinks don't work, by design. Reverting won't change that. It will only revert to a different error.

On the same note, softlinks to directories also don't work because of the distributed redo database. Redo relies on the property that there is one canonical path to its dependencies and it gets confused by softlinks opening up a second path.

All this said, the 2 basic rules to a working redo toolchain is: Things that mean the same entity must be referenced by the same path. And things that mean different entities must be referenced by different paths. (Links can violate either rule).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suggestion to revert touching files when the hash matches (problem with hardlinks)
  2022-11-01  8:21     ` goredo
@ 2022-11-01  9:02       ` Jan Niklas Böhm
  2022-11-01 11:49         ` Spacefrogg
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Niklas Böhm @ 2022-11-01  9:02 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 2060 bytes --]

First of all, thanks for the response.


> This is not what happens, because your proposition was that b is constructed from a via ln. So, a and b point to the same inode. In the second run, redo links a to $3. Now, a, b and $3 point to the same inode, regardless of whether redo mv's $3 to a afterwards or not. This is where the toolchain breaks, because it looks like a changed underneath, because all three point to the same inode.

I disagree with this, maybe there is a misunderstanding.  My setting is 
the following.  I have two .do files: a.do with the line

	echo aaa # or whatever

and b.do with the two lines

	redo-ifchange a
	ln a $3

Now I call "redo b".  This gives me the following files with "stat 
--format="%n: inode=%i, mtime=%Y" a b":

	a: inode=646, mtime=1667292427
	b: inode=646, mtime=1667292427

Now if I change a.do to

	echo bbb # or whatever

and I "REDO_INODE_TRUST=mtime redo a" (ctime already complains because 
`ln` changes the ctime for "a"), I get the files

	a: inode=655, mtime=1667292495
	b: inode=646, mtime=1667292427

So we can see that "b" is not changed, and "REDO_INODE_TRUST=mtime redo 
b" also works and the output is as expected

	a: inode=655, mtime=1667292495
	b: inode=655, mtime=1667292495

However, if a.do is now changed to

	echo bbb # same output as before

and we call "REDO_INODE_TRUST=mtime redo a", then the mtime for b is 
messed up:

	a: inode=655, mtime=1667292646
	b: inode=655, mtime=1667292646  # wasn't redone, but changed

If we now call "REDO_INODE_TRUST=mtime redo b", we get the warning

	warn b externally modified: not redoing

which is due to "a" being touched (as the file hash is equal to $3). 
This does not happen if $3 was renamed to "a", as then "b" would point 
to the old inode.

If the attached patch is applied to run.go (basically remove the block 
that touches the file when the hash matches), then this issue does not 
occur, as the file "a" will have a new inode number in the last example.

I hope the example makes the point a bit more clear.  Sorry that it got 
so long.

[-- Attachment #2: run.go.patch --]
[-- Type: text/x-patch, Size: 581 bytes --]

733,754d732
< 				if hsh == hshPrev {
< 					tracef(CDebug, "%s has same hash, not renaming", tgtOrig)
< 					err = os.Remove(fd.Name())
< 					if err != nil {
< 						goto Finish
< 					}
< 					err = os.Chtimes(path.Join(cwdOrig, tgt), finished, finished)
< 					if err != nil {
< 						goto Finish
< 					}
< 					if !NoSync {
< 						err = syncDir(cwdOrig)
< 						if err != nil {
< 							goto Finish
< 						}
< 					}
< 					err = depWrite(fdDep, cwdOrig, tgt, hshPrev)
< 					if err != nil {
< 						goto Finish
< 					}
< 					goto RecCommit
< 				}
778d755
< 	RecCommit:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suggestion to revert touching files when the hash matches (problem with hardlinks)
  2022-11-01  9:02       ` Jan Niklas Böhm
@ 2022-11-01 11:49         ` Spacefrogg
  2022-11-01 13:14           ` Jan Niklas Böhm
  0 siblings, 1 reply; 10+ messages in thread
From: Spacefrogg @ 2022-11-01 11:49 UTC (permalink / raw)
  To: Jan Niklas Böhm; +Cc: goredo-devel

>     a: inode=646, mtime=1667292427
>     b: inode=646, mtime=1667292427

Your bug is right here. This is a misuse of redo and breaks its logic. A and B are expected to mean different things, so they MUST NOT point to the same inode. Period.

>     a: inode=655, mtime=1667292646
>     b: inode=655, mtime=1667292646  # wasn't redone, but changed

This is, where you trigger the bug. Yes, reverting goredo resolves this exact TRIGGER of the bug. Your actual bug is still present, see above.

I reiterate: Separate paths MUST represent separate inodes or you break redo logic one way or another.

Your bug has nothing to do with the change in 1.23. I don't see why your case of misusing redo should be given any preference over a completely useful optimisation that is in line with the logic of redo.

One way out of your mess is that you create a c.do with an indirect dependency:
redo-ifchange a
ln a b # indirect dependency
touch $3

Users have to depend on c and know that they should use b. Now, you are honest to your consumers, that b is some dirty optimisation that may/will be changed whenever a is changed.

Final words: I sympathize with what you are trying to achieve. But you should rather fix the bug instead of removing one of its triggers.

Kind regards,
–Michael

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suggestion to revert touching files when the hash matches (problem with hardlinks)
  2022-11-01 11:49         ` Spacefrogg
@ 2022-11-01 13:14           ` Jan Niklas Böhm
  2022-11-02 13:57             ` Sergey Matveev
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Niklas Böhm @ 2022-11-01 13:14 UTC (permalink / raw)
  To: goredo-devel

>>      a: inode=646, mtime=1667292427
>>      b: inode=646, mtime=1667292427
> 
> Your bug is right here. This is a misuse of redo and breaks its logic. A and B are expected to mean different things, so they MUST NOT point to the same inode. Period.
> 
>>      a: inode=655, mtime=1667292646
>>      b: inode=655, mtime=1667292646  # wasn't redone, but changed
> 
> This is, where you trigger the bug. Yes, reverting goredo resolves this exact TRIGGER of the bug. Your actual bug is still present, see above.
> 
> I reiterate: Separate paths MUST represent separate inodes or you break redo logic one way or another.

Thank you for elaborating on this.  I see what you mean now.  To be 
honest, I am not aware of anyone making this strict assumption.

> Your bug has nothing to do with the change in 1.23. I don't see why your case of misusing redo should be given any preference over a completely useful optimisation that is in line with the logic of redo.

There are two reasons for changing the behavior:

1. The optimization is not improving the entire execution by a lot. 
Instead of calling os.Rename, goredo calls os.Remove and os.Chtimes. 
Since both the temporary file and the target are in the same directory, 
they're also on the same drive, making os.Rename a cheap operation.
2. The behavior I suggested agrees with the implementation by apenwarr.

> One way out of your mess is that you create a c.do with an indirect dependency:
> redo-ifchange a
> ln a b # indirect dependency
> touch $3
> 
> Users have to depend on c and know that they should use b. Now, you are honest to your consumers, that b is some dirty optimisation that may/will be changed whenever a is changed.

My current setup is structured somewhat like this.  One of my issues is 
that the analogue to the target c produces many files, which I do not 
want to be visible unless they are specifically requested via redoing 
them (in which case they'll be hardlinked from another directory where 
they are saved).

> Final words: I sympathize with what you are trying to achieve. But you should rather fix the bug instead of removing one of its triggers.

Thanks, and thanks for the correspondence (despite the disagreements).

An alternative suggestion is to change the function `isModified`. 
Currently it only checks the ctime/mtime based on the value 
REDO_INODE_TRUST, but will not check the file contents in case the time 
differs (this is the behavior of `Inode.Equals`).  Adding a hash check 
before returning the computed modified value would decrease the number 
of false positives here, because this is the reason why the warning 
about an externally modified file is emitted in the first place.  I'm 
not sure whether the hash check should be done in Inode.Equals or 
outside of that function.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suggestion to revert touching files when the hash matches (problem with hardlinks)
  2022-11-01 13:14           ` Jan Niklas Böhm
@ 2022-11-02 13:57             ` Sergey Matveev
  2022-11-02 22:42               ` Jan Niklas Böhm
  0 siblings, 1 reply; 10+ messages in thread
From: Sergey Matveev @ 2022-11-02 13:57 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 3267 bytes --]

Greetings!

goredo (as apenwarr/redo, redo-c, baredo, redo-sh) expects that
filesystem paths reference only a single "object", single entity it is
aware of. To make hardlinks working, you have to change the whole
architecture, where you explicitly expect single entity (under redo's
control) to have possible multiple identifiers on filesystem.

I agree with Spacefrogg's answers on that thread, emphasizing that any
expectations on any certain behaviour of hardlinked files are just wrong
(with current widely used implementations). You should redesign your
targets and workflow to fit redo's expectation. Maybe some
proxy/intermediate targets will help, maybe just do not track all of
generated files and honestly expect defined (by your .do-files)
behaviour, where the fact of successful "foo" target completion also
means "foo.bar" file existence (although untracked).

>There are two reasons for changing the behavior:
>
>1. The optimization is not improving the entire execution by a lot. Instead
>of calling os.Rename, goredo calls os.Remove and os.Chtimes. Since both the
>temporary file and the target are in the same directory, they're also on the
>same drive, making os.Rename a cheap operation.

Actually that optimisation *may* improve execution a lot when goredo is
used on filesystems with active write-cache usage (like UFS with
soft-updates or ZFS). With that optimisation: temporary file is created,
then it is filled with the output, and then it is deleted -- everything
related to it will be just dismissed from the write-cache and no real
I/O is issued to the disk. Except for inode update, that is much more
lightweight operation. Without that optimisation your disk is literally
forced to create a new copy of the file, removing the old one, that is
considerable amount of really issued I/O. You may notice that "if hsh ==
hshPrev" check is done before fsync() is called -- so that optimisation
works even with REDO_NO_SYNC=0. I did that optimisation exactly because
of high I/O rate and no files contents really changed.

>An alternative suggestion is to change the function `isModified`. Currently
>it only checks the ctime/mtime based on the value REDO_INODE_TRUST, but will
>not check the file contents in case the time differs (this is the behavior
>of `Inode.Equals`).  Adding a hash check before returning the computed
>modified value would decrease the number of false positives here, because
>this is the reason why the warning about an externally modified file is
>emitted in the first place.  I'm not sure whether the hash check should be
>done in Inode.Equals or outside of that function.

Modification check is only intended to warn user about some unexpected
events happened with the target under redo's observation and control.
Target must be produced only by redo itself, under its tight control. If
someone "external" modifies it, then in general that can be treated as
undefined behaviour and wrong usage of the redo ecosystem itself. So
even if file's content stays the same, but its inode is touched
"outside" redo, then something wrong is already occurring.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suggestion to revert touching files when the hash matches (problem with hardlinks)
  2022-11-02 13:57             ` Sergey Matveev
@ 2022-11-02 22:42               ` Jan Niklas Böhm
  2022-11-03  8:55                 ` Sergey Matveev
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Niklas Böhm @ 2022-11-02 22:42 UTC (permalink / raw)
  To: goredo-devel

> goredo (as apenwarr/redo, redo-c, baredo, redo-sh) expects that
> filesystem paths reference only a single "object", single entity it is
> aware of. To make hardlinks working, you have to change the whole
> architecture, where you explicitly expect single entity (under redo's
> control) to have possible multiple identifiers on filesystem.

Maybe I am missing something, but I am not really sure why this would 
require more than the change that I was suggesting.  The hardlink will 
be created by goredo and thus both files will be tracked by redo.  Thus 
the two files being equivalent should be perfectly valid in this 
setting, the only reason that it falls apart is the direct mutation of 
the old target file via touch/os.Chtimes.

In general neither file pointing to the same inode should be written to 
or mutated (or any files that were created by goredo) so it doesn't 
matter which link to the inode is used to modify it, both are wrong.  I 
think that touching the old target file is the wrong level of 
abstraction.  On the flipside, using mv means committing the transaction 
with the file that goredo controlled all along.

When hardlinking any file to $3 it also does not matter whether the 
source of the hardlink was redone prior to the linking or not since no 
writing will be done to that file.  The only difference will be that the 
ctime and the link counter will change, but those are not vital 
attributes to the integrity of the file data.

> I agree with Spacefrogg's answers on that thread, emphasizing that any
> expectations on any certain behaviour of hardlinked files are just wrong
> (with current widely used implementations). You should redesign your
> targets and workflow to fit redo's expectation. Maybe some
> proxy/intermediate targets will help, maybe just do not track all of
> generated files and honestly expect defined (by your .do-files)
> behaviour, where the fact of successful "foo" target completion also
> means "foo.bar" file existence (although untracked).

I am reluctant because that will then encode the dependency between foo 
and foo.bar in the code instead of using the dependency resolution of 
redo itself.  This is not a problem for the simple case, but will become 
increasingly more complex the more targets and linked files interact. 
This is precisely what a build system excels at, so the proposal seems a 
bit unsatisfactory.

The same way that goredo expects that files are not touched by the user 
the user should be allowed to expect that redo does not touch the files 
it created itself.

> Actually that optimisation *may* improve execution a lot when goredo is
> used on filesystems with active write-cache usage (like UFS with
> soft-updates or ZFS). With that optimisation: temporary file is created,
> then it is filled with the output, and then it is deleted -- everything
> related to it will be just dismissed from the write-cache and no real
> I/O is issued to the disk. Except for inode update, that is much more
> lightweight operation. Without that optimisation your disk is literally
> forced to create a new copy of the file, removing the old one, that is
> considerable amount of really issued I/O. You may notice that "if hsh ==
> hshPrev" check is done before fsync() is called -- so that optimisation
> works even with REDO_NO_SYNC=0. I did that optimisation exactly because
> of high I/O rate and no files contents really changed.

I was not aware of those optimizations.  But I would argue that the time 
spent on building $3 will most likely dwarf the performance enhancement. 
  Outside of testing goredo this will probably never bottleneck the 
application.  But an interesting thing that I didn't know before, 
nontheless.  (This I am not sure about, but if you "cp --reflink" to a 
file that is then touched, will the full copy materialize?  If yes, this 
would be a more expensive write since the data of the target file is not 
cached.  If no, then "cp --reflink" would also not solve the issue in 
this particular case and touching the file would break it.)

> Modification check is only intended to warn user about some unexpected
> events happened with the target under redo's observation and control.
> Target must be produced only by redo itself, under its tight control. If
> someone "external" modifies it, then in general that can be treated as
> undefined behaviour and wrong usage of the redo ecosystem itself. So
> even if file's content stays the same, but its inode is touched
> "outside" redo, then something wrong is already occurring.

That I can agree with.

I would like to put forward yet another possible solution.  The number 
of links to the file could be checked prior to calling os.Chtimes and 
only do the optimized procedure if the number of links to the target 
file is 1 since that will not have any ripple effects.  Conceptually 
moving $3 to the target feels cleaner as this does not mutate file 
attributes but I have to admit that I am not aware of all the advantages 
that using touch might bring.

I do not think that using hardlinks should invalidate most assumptions 
of the build system, as they often play nice, unless they are seriously 
abused (such as doing "ln foo $3; echo bla > $3).  But even that would 
be picked up by redo if the file "foo" was under its control and a 
warning would be emitted.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suggestion to revert touching files when the hash matches (problem with hardlinks)
  2022-11-02 22:42               ` Jan Niklas Böhm
@ 2022-11-03  8:55                 ` Sergey Matveev
  0 siblings, 0 replies; 10+ messages in thread
From: Sergey Matveev @ 2022-11-03  8:55 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 4933 bytes --]

*** Jan Niklas Böhm [2022-11-02 23:42]:
>Maybe I am missing something, but I am not really sure why this would
>require more than the change that I was suggesting.

For example there are "foo" and "bar" linked together. If redo deals
with "foo" target now, then *any* modification to it must not affect
anyhow "bar" target, because it is completely another "object". The fact
that modifying (does not matter if it is contents, inode, or complete
new file renamed atop of existing one) of "foo" affects "bar" somehow --
is unexpected behaviour from redo's point of view. *That* is the problem.

>The hardlink will be
>created by goredo and thus both files will be tracked by redo.  Thus the two
>files being equivalent should be perfectly valid in this setting, the only
>reason that it falls apart is the direct mutation of the old target file via
>touch/os.Chtimes.

Your suggestion leads to desirable side-effect that is friendly to your
setup based on hardlinks. It is just for your particular use-case. In
general, (any known to me) redo is not expected to work in defined and
predictable way when filesystem changes of one file affects another. In
another possible use-cases with hard/soft-links it won't help.

>I think that touching the old target file is the wrong level of abstraction.

If redo is used in expected way (single path is a single trackable
object), then touching of the target is neither valid, nor invalid
thing. It just plays no role to the end user.

Fact of using hard/soft-links *is* the problem (when changing of
one "object" (file) transparently affects another one). They heavily
complicates many things. That is why ex-Unix-creators completely
abandoned the idea of links in Plan 9 operating system. They are just
not worth of it.

>I am reluctant because that will then encode the dependency between foo and
>foo.bar in the code instead of using the dependency resolution of redo
>itself.  This is not a problem for the simple case, but will become
>increasingly more complex the more targets and linked files interact. This
>is precisely what a build system excels at, so the proposal seems a bit
>unsatisfactory.

Agreed that it is not perfect solution. But hard/soft-links are the root
of all that complications. I am against dealing with them at all.

>The same way that goredo expects that files are not touched by the user the
>user should be allowed to expect that redo does not touch the files it
>created itself.

Disagree with that. What particular metainformation is touched -- is
solely the internal question of that dependency system. User should deal
with path names and file's contents. Ideally redo should not look at any
metainformation at all -- it should just look at file's contents. But we
do not do that by default because of performance reasons and in most
cases in practice ctime can be more or less trusted.

>But I would argue that the time
>spent on building $3 will most likely dwarf the performance enhancement.
>Outside of testing goredo this will probably never bottleneck the
>application.

That depends on the target itself. And number of targets. If there are
thousands of them (I have got that kind of projects), and each will add
additional (for example) 2 I/O operations overhead, then it will lead to
many additional seconds of waiting for the disk drive, that at best in
many cases can provide only ~250 IOPS. I do not have enough numbers
left, but that optimisation was clearly visible with my project. It is
not a bottleneck, but *can* be significant and considerable part of
overall build time in my practice.

>(This I am not sure about, but if you "cp --reflink" to a file
>that is then touched, will the full copy materialize?

As I know, reflinks deals with content blocks only. That is the main
difference between them and hardlinks, who also shares the inode itself.
reflink is copy-on-write feature, independent from inode metainformation.

>I would like to put forward yet another possible solution.  The number of
>links to the file could be checked prior to calling os.Chtimes and only do
>the optimized procedure if the number of links to the target file is 1 since
>that will not have any ripple effects.

It won't have ripple effect only in your hardlink-based exact use-case.
Somewhere it will possibly break more expectations accidentally. redo's
user just should not make any expectations and rely on any kind of
behaviour dependant on hard/soft-links at all. Those filesystem features
are harmful (as Plan9 creators also decided).

>I do not think that using hardlinks should invalidate most assumptions of
>the build system

But in fact they invalidate them in surprising ways. Symbolic links are
another complication-bringing beasts.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-11-03  8:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-31 21:37 Suggestion to revert touching files when the hash matches (problem with hardlinks) Jan Niklas Böhm
2022-11-01  6:42 ` goredo
2022-11-01  7:50   ` Jan Niklas Böhm
2022-11-01  8:21     ` goredo
2022-11-01  9:02       ` Jan Niklas Böhm
2022-11-01 11:49         ` Spacefrogg
2022-11-01 13:14           ` Jan Niklas Böhm
2022-11-02 13:57             ` Sergey Matveev
2022-11-02 22:42               ` Jan Niklas Böhm
2022-11-03  8:55                 ` Sergey Matveev