public inbox for goredo-devel@lists.cypherpunks.ru
From: "Jan Niklas Böhm" <mail@jnboehm•com>
To: goredo-devel@lists.cypherpunks.ru
Subject: Re: Suggestion to revert touching files when the hash matches (problem with hardlinks)
Date: Tue, 1 Nov 2022 08:50:20 +0100
Message-ID: <64d10f4c-b1db-c04e-e238-ee7c26fd5595@jnboehm.com>
In-Reply-To: <eee2108f-a0b1-45be-8dfe-1cffc5eba5e0@spacefrogg.net>

> Hardlinks are a bad idea due to their "automatic updates". You no longer get the guarantee that your output is only changed by redo.

Unfortunately, I am kind of stuck with hardlinks at this point.  I 
actually have not looked into symlinks in detail yet, but they feel a bit 
hacky (since then there is only an indirect link between the file and 
its data contents).

> Are you sure this did not also happen before 1.23? Because I know this error. After the first run of b.do, you've already established the hardlink between a and b. Linking again to $3 doesn't change the fact that you also changed b directly (via the common link to a).

I am fairly sure that this is due to the symlinking, not least because 
this error does not occur when the output of "a" changes and the file 
therefore gets renamed.

The reason is that when both "a" and "b" point to the same inode and we 
have to redo both, it roughly goes like this:

	echo aaa > a.tmp
	# goredo does the following, but only if the contents of a.tmp differ from a
	mv a.tmp a

So now "a" and "b" point to different inodes, and "b" remains unchanged. 
Then, when we "redo b", it will establish the hardlink again.  In my 
opinion, this is also what should happen if "redo a" did not change the 
contents of "a".

Although the contents never change, touching "a" also changes the mtime 
of "b" (both names share one inode), and that is what messes up the 
state in the redo process / recfiles, unless I am misunderstanding something.
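The shared-inode effect can be reproduced in a few lines of Python. This is a sketch with a hypothetical function name; fixed timestamps 2 and 3 stand in for "now" so the result is deterministic. Updating the mtime through the name "a" is visible through "b", even though no contents change:

```python
import os
import tempfile


def touch_through_hardlink() -> tuple[float, float]:
    """Return b's mtime before and after 'touching' a, where b is a hardlink to a."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        with open(a, "w") as f:
            f.write("aaa\n")
        os.link(a, b)  # a and b are two names for one inode

        # Pin an arbitrary old timestamp (2) so the comparison is deterministic.
        os.utime(a, (2, 2))
        before = os.stat(b).st_mtime

        # 'Touch' a only (timestamp 3 stands in for "now"); contents stay the same.
        os.utime(a, (3, 3))
        after = os.stat(b).st_mtime
        return before, after


print(touch_through_hardlink())  # (2.0, 3.0)
```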

The file attributes after the first call to "redo b" (when it exits with 
0) are:

	a, inode = 123, mtime = 1
	b, inode = 123, mtime = 2

Now, with version 1.27.1 (or any version after 1.23.0), if we change 
"a.do" so that it is rerun but its output stays the same, and then run 
"redo a", the files look like this:

	a, inode = 123, mtime = 3
	b, inode = 123, mtime = 3   # not 2 anymore, error for redo

Whereas if goredo moved $3 to "a", they would look like this:

	a, inode = 321, mtime = 3
	b, inode = 123, mtime = 2

And "b" could be redone once more, since it does not appear to have been 
modified externally.
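To contrast the two behaviours, here is a simplified, hypothetical model in Python. `commit_output` installs a target's new output ($3) either by touching the existing file when the hashes match or by always renaming, and "b" counts as externally modified if its mtime differs from the value recorded after "redo b". Goredo's real bookkeeping (recfiles, inode attributes) is more involved; this sketch only assumes an mtime comparison, and all names are illustrative:

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def commit_output(tmp: str, target: str, touch_if_unchanged: bool, now: float) -> None:
    """Hypothetical sketch of installing $3 (tmp) as the target."""
    if touch_if_unchanged and os.path.exists(target) and sha256_of(tmp) == sha256_of(target):
        os.remove(tmp)
        os.utime(target, (now, now))  # touch in place: behaviour described after 1.23.0
    else:
        os.replace(tmp, target)  # rename $3 over the target: a new inode
        os.utime(target, (now, now))


def b_looks_modified(touch_if_unchanged: bool) -> bool:
    with tempfile.TemporaryDirectory() as d:
        a, b, tmp = (os.path.join(d, n) for n in ("a", "b", "a.tmp"))
        with open(a, "w") as f:
            f.write("aaa\n")
        os.link(a, b)  # state after the first 'redo b'
        os.utime(b, (2, 2))
        recorded_b = os.stat(b).st_mtime  # what redo remembers for b

        with open(tmp, "w") as f:  # a.do rerun, identical output
            f.write("aaa\n")
        commit_output(tmp, a, touch_if_unchanged, now=3)
        return os.stat(b).st_mtime != recorded_b


print(b_looks_modified(touch_if_unchanged=True))   # True: b seems externally modified
print(b_looks_modified(touch_if_unchanged=False))  # False: b keeps its old inode and mtime
```

With the rename policy, "b" keeps the old inode and its recorded mtime, so the next "redo b" can simply re-establish the hardlink.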

> As an alternative, you could look into using 'cp --reflink' on modern file systems.

Thanks for that suggestion; this actually reflects what I want to happen 
a bit better.  Unfortunately, it is not supported on the machines I am 
using.

What I am not sure about is what would trigger the copy mechanism and 
whether it is well suited here.  On the one hand, if touching the file 
already triggers the copy, then goredo's update mechanism becomes fairly 
expensive, since it now triggers a full copy instead of just a rename.  
On the other hand, if touching does not cause a copy, then the issue 
outlined above persists.  Of course, this is rather hypothetical, since 
I cannot use it anyway.


Thread overview: 10+ messages
2022-10-31 21:37 Suggestion to revert touching files when the hash matches (problem with hardlinks) Jan Niklas Böhm
2022-11-01  6:42 ` goredo
2022-11-01  7:50   ` Jan Niklas Böhm [this message]
2022-11-01  8:21     ` goredo
2022-11-01  9:02       ` Jan Niklas Böhm
2022-11-01 11:49         ` Spacefrogg
2022-11-01 13:14           ` Jan Niklas Böhm
2022-11-02 13:57             ` Sergey Matveev
2022-11-02 22:42               ` Jan Niklas Böhm
2022-11-03  8:55                 ` Sergey Matveev