public inbox for goredo-devel@lists.cypherpunks.ru
From: Sergey Matveev <stargrave@stargrave.org>
To: goredo-devel@lists.cypherpunks.ru
Subject: Re: Suggestion to revert touching files when the hash matches (problem with hardlinks)
Date: Thu, 3 Nov 2022 11:55:26 +0300
Message-ID: <Y2OCCj9r8LsqENK7@stargrave.org>
In-Reply-To: <b8add293-85d0-2dc3-978b-4900788fe071@jnboehm.com>

*** Jan Niklas Böhm [2022-11-02 23:42]:
>Maybe I am missing something, but I am not really sure why this would
>require more than the change that I was suggesting.

For example, suppose "foo" and "bar" are linked together. If redo is
currently dealing with the "foo" target, then *any* modification to it
must not affect the "bar" target in any way, because "bar" is a
completely different "object". The fact that modifying "foo" (whether
its contents, its inode, or a completely new file renamed atop the
existing one) somehow affects "bar" is unexpected behaviour from redo's
point of view. *That* is the problem.

>The hardlink will be
>created by goredo and thus both files will be tracked by redo.  Thus the two
>files being equivalent should be perfectly valid in this setting, the only
>reason that it falls apart is the direct mutation of the old target file via
>touch/os.Chtimes.

Your suggestion leads to a side-effect that is desirable for your
hardlink-based setup, but only for your particular use-case. In
general, no redo implementation known to me is expected to behave in a
defined and predictable way when filesystem changes to one file affect
another. In other possible use-cases with hard/soft-links it won't help.

>I think that touching the old target file is the wrong level of abstraction.

If redo is used in the expected way (a single path is a single
trackable object), then touching the target is neither valid nor
invalid. It simply plays no role for the end user.

The use of hard/soft-links *is* the problem (changing one "object"
(file) transparently affects another one). They heavily complicate
many things. That is why the former Unix creators completely abandoned
the idea of links in the Plan 9 operating system. They are just not
worth it.

>I am reluctant because that will then encode the dependency between foo and
>foo.bar in the code instead of using the dependency resolution of redo
>itself.  This is not a problem for the simple case, but will become
>increasingly more complex the more targets and linked files interact. This
>is precisely what a build system excels at, so the proposal seems a bit
>unsatisfactory.

Agreed, it is not a perfect solution. But hard/soft-links are the root
of all those complications, and I am against dealing with them at all.

>The same way that goredo expects that files are not touched by the user the
>user should be allowed to expect that redo does not touch the files it
>created itself.

I disagree with that. Which particular metainformation is touched is
solely an internal matter of the dependency system. The user should
deal with path names and file contents. Ideally redo should not look at
any metainformation at all; it should look only at file contents. But
we do not do that by default, for performance reasons, and because in
practice ctime can be more or less trusted in most cases.

>But I would argue that the time
>spent on building $3 will most likely dwarf the performance enhancement.
>Outside of testing goredo this will probably never bottleneck the
>application.

That depends on the target itself, and on the number of targets. If
there are thousands of them (I have projects like that), and each adds
an overhead of, for example, 2 additional I/O operations, then it leads
to many additional seconds of waiting for the disk drive, which in many
cases can provide only ~250 IOPS at best. I no longer have the exact
numbers, but that optimisation was clearly visible in my project. It is
not a bottleneck, but in my practice it *can* be a significant part of
the overall build time.
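As a rough illustration, let N be the number of targets, k the extra I/O operations per target, and R the IOPS the drive sustains. The value N = 3000 is a hypothetical figure for "thousands of targets", not a measurement from the project mentioned. This is an upper bound that assumes every operation actually reaches the disk. Page-cache hits would reduce it.

```latex
t_{\mathrm{extra}} = \frac{N \cdot k}{R}
  = \frac{3000 \times 2\ \text{operations}}{250\ \text{operations/s}}
  = 24\ \text{s}
```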

>(This I am not sure about, but if you "cp --reflink" to a file
>that is then touched, will the full copy materialize?

As far as I know, reflinks deal with content blocks only. That is the
main difference between them and hardlinks, which share the inode
itself. A reflink is a copy-on-write feature, independent of inode
metainformation.

>I would like to put forward yet another possible solution.  The number of
>links to the file could be checked prior to calling os.Chtimes and only do
>the optimized procedure if the number of links to the target file is 1 since
>that will not have any ripple effects.

It avoids ripple effects only in your exact hardlink-based use-case.
Elsewhere it may accidentally break other expectations. A redo user
simply should not have any expectations about, or rely on, behaviour
that depends on hard/soft-links at all. Those filesystem features are
harmful (as the Plan 9 creators also decided).

>I do not think that using hardlinks should invalidate most assumptions of
>the build system

But in fact they do invalidate them, in surprising ways. Symbolic links
are another complication-bringing beast.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

Thread overview: 10+ messages
2022-10-31 21:37 Suggestion to revert touching files when the hash matches (problem with hardlinks) Jan Niklas Böhm
2022-11-01  6:42 ` goredo
2022-11-01  7:50   ` Jan Niklas Böhm
2022-11-01  8:21     ` goredo
2022-11-01  9:02       ` Jan Niklas Böhm
2022-11-01 11:49         ` Spacefrogg
2022-11-01 13:14           ` Jan Niklas Böhm
2022-11-02 13:57             ` Sergey Matveev
2022-11-02 22:42               ` Jan Niklas Böhm
2022-11-03  8:55                 ` Sergey Matveev [this message]