public inbox for goredo-devel@lists.cypherpunks.ru
* redo-ood taking much longer to return in a copy of a project, compared with the original
@ 2021-11-19 20:23 Karolis K
  2021-11-19 20:41 ` Sergey Matveev
  0 siblings, 1 reply; 5+ messages in thread
From: Karolis K @ 2021-11-19 20:23 UTC (permalink / raw)
  To: goredo-devel

Hello,

I have a relatively large project handled by redo (not many dependencies, but the file sizes can be in gigabytes).
Recently I wanted to test some things out, so I made a full copy of the project (cp -r projectdir projectdir2).

After doing this I noticed that redo-ood, when started in projectdir2, took around 10 times (maybe more) longer to return than in the original projectdir.
I also confirmed the same behaviour on a different project, on a different machine (both running CentOS).

Is this known and expected?
My understanding was that since .redo dependencies are stored under each dir individually the computations shouldn’t depend on where the root is located. But somehow it does.

Any insights would be much appreciated.
Kind regards,
Karolis Koncevičius

* Re: redo-ood taking much longer to return in a copy of a project, compared with the original
  2021-11-19 20:23 redo-ood taking much longer to return in a copy of a project, compared with the original Karolis K
@ 2021-11-19 20:41 ` Sergey Matveev
  2021-11-19 20:45   ` Sergey Matveev
  0 siblings, 1 reply; 5+ messages in thread
From: Sergey Matveev @ 2021-11-19 20:41 UTC (permalink / raw)
  To: goredo-devel

Greetings!

*** Karolis K [2021-11-19 22:23]:
>My understanding was that since .redo dependencies are stored under each dir individually the computations shouldn’t depend on where the root is located. But somehow it does.

Each recorded dependency is stored with the following metainformation
(a real example):

    [...]
    Type: ifchange
    Target: all.do
    Hash: 48a30bcbca86c8e2f66daa4111f86d59c79d59619a2445c7004f23b5db45de22
    Size: 875
    CtimeSec: 1628512691
    CtimeNsec: 304504000
    [...]

By default, if a file's ctime is unchanged, the file is assumed to be
unmodified and it is not read to compare its hash. When you copy your
project, all files get a different ctime value (even with "cp -a"
instead of "cp -r", which keeps mtime), so redo is forced to check the
file contents. If you "export REDO_INODE_NO_TRUST=1", then that
behaviour (always checking the hash) applies everywhere. The ctime
metainformation is just an optimization based on the assumption that
the filesystem can be trusted in that way. After copying, the recorded
ctimes are useless and you simply lose that optimization.
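
The effect of copying on timestamps can be checked directly. The
following is a minimal Python sketch, not part of goredo; the helper
name copy_and_compare is made up for illustration. It assumes a
Unix-like system, where st_ctime is the inode change time, and it uses
shutil.copy2, which preserves mtime much like "cp -p" does.

```python
import os
import shutil


def copy_and_compare(src_dir: str, dst_dir: str, name: str) -> dict:
    """Copy one file with metadata preserved and report which timestamps survived.

    Hypothetical helper for illustration only; goredo does nothing like this.
    """
    src = os.path.join(src_dir, name)
    dst = os.path.join(dst_dir, name)
    # copy2 copies contents plus mtime/atime/mode; it has no way to set ctime,
    # because ctime is maintained by the kernel.
    shutil.copy2(src, dst)
    s, d = os.stat(src), os.stat(dst)
    return {
        "mtime_kept": s.st_mtime_ns == d.st_mtime_ns,
        "ctime_kept": s.st_ctime_ns == d.st_ctime_ns,
    }
```

On a typical Linux filesystem this should report that mtime was kept
and ctime was not, which is why a ctime recorded before the copy no
longer matches afterwards.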

And basically nothing can be done, as far as I can see. The only
information we can fully trust is the file's contents, which can in
turn be trusted through a long enough collision-resistant hash
function. Another thing that can be used to skip hash checking is the
file's size: if it differs, then the file's contents differ. And as an
optimization, to avoid reading the file every time, we can use some
metainformation from the filesystem. Basically only mtime and ctime
are useful here. mtime can hardly be trusted: https://apenwarr.ca/log/20181113
ctime is better, but it can give "false positives", for example when
just adding a hardlink. But it is still very helpful in practice.
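
The order of checks described above (size, then ctime, then hash) could
be sketched roughly as follows. This is an illustrative Python sketch
under stated assumptions, not goredo's actual code: DepRecord, record
and is_changed are hypothetical names, SHA-256 is only a stand-in for
whatever hash goredo really uses, and CtimeSec/CtimeNsec are folded
into a single nanosecond value.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class DepRecord:
    """Hypothetical in-memory form of one recorded dependency."""
    target: str
    hash_hex: str
    size: int
    ctime_ns: int  # assumption: CtimeSec and CtimeNsec combined into one value


def file_hash(path: str) -> str:
    """Hash the file contents in 1 MiB chunks (SHA-256 as a stand-in)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def record(path: str) -> DepRecord:
    """Capture the metainformation for a dependency as it is right now."""
    st = os.stat(path)
    return DepRecord(path, file_hash(path), st.st_size, st.st_ctime_ns)


def is_changed(rec: DepRecord, trust_ctime: bool = True) -> bool:
    """Decide whether the dependency differs from what was recorded."""
    st = os.stat(rec.target)
    if st.st_size != rec.size:
        return True  # a different size implies different contents, no read needed
    if trust_ctime and st.st_ctime_ns == rec.ctime_ns:
        return False  # same ctime: assume unmodified and skip reading the file
    # Slow path: the whole file is read. This is what happens for every
    # dependency after a copy, because no recorded ctime matches any more.
    return file_hash(rec.target) != rec.hash_hex
```

Calling is_changed(rec, trust_ctime=False) corresponds in spirit to
always checking the hash, which is the expensive case with
gigabyte-sized files.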

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF


* Re: redo-ood taking much longer to return in a copy of a project, compared with the original
  2021-11-19 20:41 ` Sergey Matveev
@ 2021-11-19 20:45   ` Sergey Matveev
  0 siblings, 0 replies; 5+ messages in thread
From: Sergey Matveev @ 2021-11-19 20:45 UTC (permalink / raw)
  To: goredo-devel

I think that I can add an option to store and use mtime instead of
ctime. It won't be on by default, but if someone trusts their system's
behaviour (which is reasonable in most cases on modern file systems
with high time resolution), it could be useful. At least mtime can be
kept during copying (cp -a, tar c | tar x), which is impossible with
ctime.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF


* Re: redo-ood taking much longer to return in a copy of a project, compared with the original
  2021-11-20 19:18 Karolis K
@ 2021-11-21 19:00 ` Sergey Matveev
  0 siblings, 0 replies; 5+ messages in thread
From: Sergey Matveev @ 2021-11-21 19:00 UTC (permalink / raw)
  To: goredo-devel

*** Karolis K [2021-11-20 21:18]:
>But I wonder - maybe it’s possible to somehow have this slow check be performed only once, and then updated?

I am not at all sure that it should be done transparently and
automatically, but a separate tool for that task definitely won't be a
problem. I will look at it when I get some free time.

>It’s just hard for me to accept that after copying the project (or even renaming it with mv) it will be forever doomed to have a slow redo-ood return.

Not forever, but until the first rebuild/update. For me it is expected
and acceptable behaviour, because... well, how can it be the other way?
I mean that reliable metainformation depends in practice on the inode
number or similar things (apenwarr/redo does not depend on ctime, but
on mtime plus inode number and so on).

In goredo 1.21.0 you can use $REDO_INODE_TRUST=mtime if that is
acceptable (I believe in most cases in practice it is), so that even
after copying (keeping mtime, of course) it won't fall back to hash
checking.

>Seems like this would negate a part of advantage that comes with having .redo/ info being stored within each directory separately.

I do not see any relation between our subject and the placement of
.redo metainformation. The main reason why .redo is stored in each
directory is simplicity: there is no problem with deciding "where is
the (project's) root?". Each directory can be isolated from the
others, unlike with some "central" database. For example, apenwarr/redo
admits that it is a huge pain to have a single .redo (with an sqlite3
database, in its case), because it is hard to determine where it
should be placed.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF


* Re: redo-ood taking much longer to return in a copy of a project, compared with the original
@ 2021-11-20 19:18 Karolis K
  2021-11-21 19:00 ` Sergey Matveev
  0 siblings, 1 reply; 5+ messages in thread
From: Karolis K @ 2021-11-20 19:18 UTC (permalink / raw)
  To: goredo-devel

Hello and thank you for such a prompt reply!

> And basically nothing can be done, as I can see. The only guaranteed
> information we can trust is file's contents, that also can be trusted
> through long-enough collision resistant hash function.

I see what you mean here. But I wonder - maybe it’s possible to somehow have this slow check be performed only once, and then updated?
For example (and I am assuming here), once the hash has been checked and found correct, maybe the stored ctime could be updated to match that of the file?
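
To make the suggestion concrete, here is a rough Python sketch of what such a "check once, then update" step might look like. It only illustrates the idea and does not describe how goredo works or stores its records: the dict keys, the function names and the use of SHA-256 are all assumptions.

```python
import hashlib
import os


def sha256_of(path: str) -> str:
    """Hash the file contents in 1 MiB chunks (SHA-256 is only a stand-in)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def check_and_refresh(rec: dict) -> bool:
    """Return True if the file is unchanged.

    rec is a hypothetical record with keys "target", "hash", "size" and
    "ctime_ns". When the ctime is stale but the hash still matches, the
    stored ctime is refreshed so the next check can take the fast path.
    """
    st = os.stat(rec["target"])
    if st.st_size != rec["size"]:
        return False  # size differs, so the contents differ
    if st.st_ctime_ns == rec["ctime_ns"]:
        return True  # fast path: nothing is read
    if sha256_of(rec["target"]) != rec["hash"]:
        return False  # contents really changed
    # Contents verified: store the ctime observed before hashing.
    rec["ctime_ns"] = st.st_ctime_ns
    return True
```

With something like this, the slow hash check would run once per file after a copy, and later checks would be fast again.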

It’s just hard for me to accept that after copying the project (or even renaming it with mv) it will be forever doomed to have a slow redo-ood return.
Seems like this would negate a part of advantage that comes with having .redo/ info being stored within each directory separately.

Sorry in advance if my suggestion doesn’t make sense.
Warm regards,
Karolis K.
