public inbox for goredo-devel@lists.cypherpunks.ru
* Slowness (caching issues?) in dependency checking
@ 2022-08-25 16:05 Jan Niklas Böhm
  2022-08-26 16:09 ` Sergey Matveev
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Niklas Böhm @ 2022-08-25 16:05 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 3168 bytes --]

Hello all,

while using goredo I encountered some slowness when checking targets.
I am not sure how exactly it is occurring and was wondering whether it
can be solved somehow.

I have attached an MWE that hopefully illustrates my point.  If the tar 
archive is decompressed there are five do files:
   all.do: does some computation by calling default.run.do
           a bunch of times in a deep folder hierarchy
   default.run.do: recursively prints some stuff
   default.suffix{,2}.do: both simply append some text to a target
   default.cat.do: prints a part of its file, used for side-effect

The target `all` creates a lot of folders in out/ so that the dependency 
tracking requires a bit of work (the default.run.do is separated by a 
lot of folders from the .run file `out/a/.../z/$i.$j.run`).

If the target is called with `redo-ifchange all.suffix.suffix2` then 
everything is being computed in order, as it should (takes ~25 sec. on 
my machine).

When the same line is executed again, all the dependencies need to be 
checked, which takes some time.  But what surprises me is that the time 
to check whether the .suffix or .suffix2 rule is out of date (OOD) is 
quite significant.  I used the following to benchmark the process:

      hyperfine 'redo-ifchange all.suffix' 'redo-ifchange all.suffix.suffix2'

and the results I get are:

     Benchmark 1: redo-ifchange all.suffix
       Time (mean ± σ):      7.107 s ±  0.587 s    [User: 7.464 s, System: 1.501 s]
       Range (min … max):    6.541 s …  8.101 s    10 runs

     Benchmark 2: redo-ifchange all.suffix.suffix2
       Time (mean ± σ):      8.161 s ±  0.093 s    [User: 8.626 s, System: 1.750 s]
       Range (min … max):    7.991 s …  8.234 s    10 runs

     Summary
       'redo-ifchange all.suffix' ran
         1.15 ± 0.10 times faster than 'redo-ifchange all.suffix.suffix2'

I did not expect this much of a discrepancy between those two targets; 
the overhead for .suffix2 should be negligible in comparison, yet it 
still takes about a second longer to check this target with 
`redo-ifchange`.  From how this looks to me, I would guess it is 
something related to caching.  Or is there another explanation for 
this?


When running redo with the debug flag the program gets to the line (from 
`ifchange.go`)

      [127847] dbg  collecting deps

quite fast, but then seems to be stuck at that stage for a while before 
it proceeds to print a lot of dependency-checking information.

If the file `default.suffix.do` is changed trivially (e.g. by changing 
the string it appends), then redo pauses again at the same line as 
above when running the .suffix dependency rule.  Doing this shows the 
issue quite clearly, as there is a significant delay between the two 
targets:

     redo . all.suffix (default.suffix.do) (11.107s)
     redo all.suffix.suffix2 (default.suffix2.do) (15.130s)

I hope this mail outlines the issue well enough; I am happy to provide 
more information or explanations.  I took a stab at the code myself, 
but it was not really clear to me how to solve the issue.

Thanks in advance and for reading all that text
Nik

[-- Attachment #2: redotest.tar --]
[-- Type: application/x-tar, Size: 10240 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Slowness (caching issues?) in dependency checking
  2022-08-25 16:05 Slowness (caching issues?) in dependency checking Jan Niklas Böhm
@ 2022-08-26 16:09 ` Sergey Matveev
  2022-08-27  9:11   ` Jan Niklas Böhm
  0 siblings, 1 reply; 6+ messages in thread
From: Sergey Matveev @ 2022-08-26 16:09 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 1362 bytes --]

Greetings!

I tried to reproduce any difference between the all.suffix and
all.suffix.suffix2 targets, but I do not see any at all.
All my measurements of those targets take nearly the same
amount of time. There is no noticeable overhead for .suffix2.

Various temporary and lock files are created during each redo
invocation, so maybe that is where the huge filesystem overhead comes
from? I ran those commands both on a tmpfs and on a ZFS dataset; the
latter is slower, but again with no noticeable difference between the
two targets. I also looked at the debug output of both commands and, as
expected, the only difference is an additional OOD level check, which
costs virtually nothing.

I tried playing with REDO_NO_SYNC and the sync attribute on the
filesystem (actually, right now I am too lazy to check whether any
write happens during redo-ifchange) and it plays no role.

So I really have no ideas, except for OS overhead, like new process
invocation; however, I hardly believe in that, because the startup time
of one additional statically linked Go process should be negligible.
There is no caching of OOD information in goredo, except for a
temporary file with (already determined) OOD targets, which is not used
during the "redo-ifchange all.suffix*" commands.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Slowness (caching issues?) in dependency checking
  2022-08-26 16:09 ` Sergey Matveev
@ 2022-08-27  9:11   ` Jan Niklas Böhm
  2022-08-28 14:49     ` Sergey Matveev
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Niklas Böhm @ 2022-08-27  9:11 UTC (permalink / raw)
  To: goredo-devel

First of all, thank you for the quick response.

> I tried to reproduce any difference between the all.suffix and
> all.suffix.suffix2 targets, but I do not see any at all.
> All my measurements of those targets take nearly the same
> amount of time. There is no noticeable overhead for .suffix2.

That is quite surprising to me.  I tried the following on a tmpfs and 
got these results:

$ redo-ifchange all.suffix.suffix2 # first run
...
$ # change in default.suffix.do
$ time redo-ifchange all.suffix # 17.5 sec
redo all.suffix (default.suffix.do) (12.957s)
$ # change in default.suffix.do
$ time redo-ifchange all.suffix.suffix2 # 22 secs
redo . all.suffix (default.suffix.do) (12.893s)
redo all.suffix.suffix2 (default.suffix2.do) (17.424s)

I would expect that in both cases the amount of time taken would be 
almost equal and not multiple seconds apart.  Does this behavior not 
arise on your machine?  I've tried this out with the new version 1.26 now.

(As an aside, it is a bit odd that `time` reports 5 seconds more than 
redo does.)

> Various temporary and lock files are created during each redo
> invocation, so maybe that is where the huge filesystem overhead comes
> from? I ran those commands both on a tmpfs and on a ZFS dataset; the
> latter is slower, but again with no noticeable difference between the
> two targets. I also looked at the debug output of both commands and, as
> expected, the only difference is an additional OOD level check, which
> costs virtually nothing.

Yes, this issue is exacerbated if the fs is slower.  I first encountered 
it on an NFS, where the test above takes around 20 minutes, with a 
delay between .suffix and .suffix.suffix2 of 25 seconds (tested with 
v1.25).  So I would conclude that it is not an issue of the fs itself; 
a slower fs only amplifies it.

> I tried playing with REDO_NO_SYNC and the sync attribute on the
> filesystem (actually, right now I am too lazy to check whether any
> write happens during redo-ifchange) and it plays no role.

Thank you for trying this out.  I did not change this variable at all, 
but it also does not seem to have an effect on my side.

> So I really have no ideas, except for OS overhead, like new process
> invocation; however, I hardly believe in that, because the startup time
> of one additional statically linked Go process should be negligible.
> There is no caching of OOD information in goredo, except for a
> temporary file with (already determined) OOD targets, which is not used
> during the "redo-ifchange all.suffix*" commands.

What I was wondering about is whether the OOD information regarding 
`default.run.do` is cached.  Since this file is the same for all of 
`out/a/.../z/$i.$j.run`, it need only be checked once.  But the debug 
output led me to believe that this file (and whether any 
`default.{,run.}.do` exists in the folders in between) is checked 
separately for each target.  I would assume that this information could 
be cached for a single invocation of `redo-ifchange`, or is that not 
intended?

As another note: the apenwarr-redo implementation takes roughly 0.5 
seconds to `redo-ifchange` the target on a tmpfs and 1 minute on an 
NFS, so it is quite a lot faster.  Is this expected due to that 
implementation storing the OOD information in a single sqlite file?  I 
would assume that it makes things faster, but not by orders of 
magnitude.  (That is one of the design decisions that I do not like 
about the apenwarr-redo implementation.)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Slowness (caching issues?) in dependency checking
  2022-08-27  9:11   ` Jan Niklas Böhm
@ 2022-08-28 14:49     ` Sergey Matveev
  2022-08-28 18:30       ` Niklas Böhm
  0 siblings, 1 reply; 6+ messages in thread
From: Sergey Matveev @ 2022-08-28 14:49 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 4261 bytes --]

*** Jan Niklas Böhm [2022-08-27 11:11]:
>I would expect that in both cases the amount of time taken would be almost
>equal and not multiple seconds apart.  Does this behavior not arise on your
>machine?  I've tried this out with the new version 1.26 now.

That is indeed reproducible on my machine:

    % redo all.suffix
    [it took ~3.38sec]

    % vi default.suffix.do
    % time redo-ifchange all.suffix
    redo all.suffix (default.suffix.do) (10.258s)
    user:9,19s sys:6,35s mem:191902/11 io:13/36 13,162

    % vi default.suffix.do
    % time redo-ifchange all.suffix.suffix2
    redo . all.suffix (default.suffix.do) (10.266s)
    redo all.suffix.suffix2 (default.suffix2.do) (13.166s)
    user:12,45s sys:6,79s mem:200804/11 io:10/75 16,138

    % time redo-ifchange all.suffix.suffix2
    user:6,38s sys:5,47s mem:193014/11 io:0/0 10,204

>(As an aside, it is a bit odd that `time` reports 5 seconds more than redo
>does.)

Well, goredo tries to report exactly the time spent running the target.
Its "started=now()" (analogue) is placed as close as possible to the
exec() call, and finished=now() is placed just after exec() finishes. So
that time does not include supplementary things like dealing with
temporary files, committing state files, working with lockfiles and so
on. Moreover, multiple goroutines are run, which can create additional
processes, meaning additional context switches.
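
Roughly, the measurement sits around the exec() call like this (a
simplified Go sketch, not the actual goredo code; the .do arguments and
file names below are only illustrative):

    // Simplified sketch (not the actual goredo code) of where the
    // reported time is measured: the clock starts right before the .do
    // script is executed and stops right after it exits, so temporary
    // files, state commits and lockfile handling are not included.
    package main

    import (
        "fmt"
        "os/exec"
        "time"
    )

    // runDo runs a .do file with the conventional $1/$2/$3 arguments
    // and returns only the time spent inside the child process.
    func runDo(doFile, target, basename, tmp string) (time.Duration, error) {
        cmd := exec.Command("/bin/sh", "-e", doFile, target, basename, tmp)
        started := time.Now()          // as close as possible to the exec
        err := cmd.Run()               // the .do script itself
        elapsed := time.Since(started) // just after exec() has finished
        // temporary files, .rec commits and lockfiles are handled
        // outside of the measured interval
        return elapsed, err
    }

    func main() {
        d, err := runDo("default.suffix.do", "all.suffix", "all", "all.suffix.redo.tmp")
        fmt.Printf("redo all.suffix (%.3fs) err=%v\n", d.Seconds(), err)
    }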

As I can see, OOD checking+building (the first redo-ifchange call in my
output) takes ~10sec according to goredo's measurement and ~13sec
according to the time command. So there is ~3sec of overhead not
measured by goredo. The second redo-ifchange reports about the same
~10sec of work spent on all.suffix, ~13sec for all.suffix.suffix2 (the
same time the time command reported for the first redo-ifchange), and
yet another ~3sec of overhead, giving a total execution time of ~16sec.

Those 3sec of "overhead" look scarily big. However, after each target is
finished, goredo deals with temporary files and lockfiles, syncs them,
and fully copies the dependency .rec files by fully parsing them.

>What I was wondering about is whether the OOD information regarding
>`default.run.do` is cached.  Since this file is the same for all of
>`out/a/.../z/$i.$j.run`, it need only be checked once.

No caching is done at all there.

>I would
>assume that this information could be cached for a single invocation of
>`redo-ifchange`, or is that not intended?

Well, currently I do not see any drawbacks to caching that kind of
information. There should not be any kind of races: OOD checking is
done at the very beginning of a goredo invocation, and only afterwards
are the targets run.

I made a 1.27.0 release that caches OOD decisions and file-existence
checks. Profiling shows that the majority of goredo's time is spent in
syscalls. Actually my time command's output shows that too.
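
The cache itself is nothing more than per-invocation memoization, along
these lines (a simplified, single-goroutine sketch, not the actual
goredo code):

    // Per-invocation memoization of OOD decisions and file-existence
    // checks: each path is stat'ed at most once, and the expensive
    // out-of-date decision for a shared .do file (like default.run.do
    // here) is made only once.
    package main

    import (
        "fmt"
        "os"
    )

    type oodCache struct {
        ood    map[string]bool // target path -> out-of-date decision
        exists map[string]bool // path -> does the file exist?
    }

    func newOODCache() *oodCache {
        return &oodCache{ood: map[string]bool{}, exists: map[string]bool{}}
    }

    // FileExists stats each path only once per redo-ifchange invocation.
    func (c *oodCache) FileExists(p string) bool {
        if v, ok := c.exists[p]; ok {
            return v
        }
        _, err := os.Stat(p)
        c.exists[p] = err == nil
        return c.exists[p]
    }

    // IsOOD memoizes the result of the real dependency walk.
    func (c *oodCache) IsOOD(tgt string, check func(string) bool) bool {
        if v, ok := c.ood[tgt]; ok {
            return v
        }
        c.ood[tgt] = check(tgt) // the real (expensive) dependency walk
        return c.ood[tgt]
    }

    func main() {
        c := newOODCache()
        check := func(t string) bool { return !c.FileExists(t) } // toy OOD rule
        fmt.Println(c.IsOOD("all.suffix", check))
        fmt.Println(c.IsOOD("all.suffix", check)) // answered from the cache
    }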

The next biggest bottleneck was the parser of .rec files. It applied a
regular expression to nearly every line of those .rec files, which took
nearly 2 seconds in total. I threw out the regexp and now:

    % time redo-ifchange all.suffix.suffix2
    user:1,67s sys:0,17s mem:207956/11 io:0/0 1,198

But profiling showed that nearly 800ms is still spent on parsing
dependencies. However your redotest created:

    % cat **.rec G "^Type:" W
    87785
    % cat **.rec W
    292957

87k records with 292k lines in them.
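
For reference, those .rec lines are plain "Name: value" fields in
blank-line separated records, so splitting each line on the first ": "
is enough; something along these lines (a sketch, not the exact goredo
parser; the path in main() is made up):

    // Regexp-free parsing of simple "Name: value" recfile lines.
    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strings"
    )

    // parseRec reads records separated by blank lines; each non-blank
    // line is split on the first ": " instead of being matched against
    // a regular expression.
    func parseRec(path string) ([]map[string]string, error) {
        f, err := os.Open(path)
        if err != nil {
            return nil, err
        }
        defer f.Close()

        var recs []map[string]string
        cur := map[string]string{}
        sc := bufio.NewScanner(f)
        for sc.Scan() {
            line := sc.Text()
            if line == "" { // a blank line ends the current record
                if len(cur) > 0 {
                    recs = append(recs, cur)
                    cur = map[string]string{}
                }
                continue
            }
            name, value, ok := strings.Cut(line, ": ")
            if !ok {
                return nil, fmt.Errorf("malformed line: %q", line)
            }
            cur[name] = value
        }
        if len(cur) > 0 {
            recs = append(recs, cur)
        }
        return recs, sc.Err()
    }

    func main() {
        recs, err := parseRec(".redo/example.rec") // made-up path
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            return
        }
        fmt.Println("records:", len(recs))
    }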

>As another note: the apenwarr-redo implementation takes roughly 0.5 seconds
>to `redo-ifchange` the target on a tmpfs and 1 minute on an NFS, so it is
>quite a lot faster.  Is this expected due to that implementation storing the
>OOD information in a single sqlite file?

After seeing the 292k lines goredo parses in redotest, I am confident
that apenwarr/redo is faster exactly because of that: dealing with a
single efficient database file vs parsing 87k records spread over 1028
files (though I doubt that reading ~1k cached files takes a
non-negligible amount of time). Of course, previously goredo also
produced a huge number of syscalls, which was the major obstacle.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Slowness (caching issues?) in dependency checking
  2022-08-28 14:49     ` Sergey Matveev
@ 2022-08-28 18:30       ` Niklas Böhm
  2022-08-29 18:25         ` Sergey Matveev
  0 siblings, 1 reply; 6+ messages in thread
From: Niklas Böhm @ 2022-08-28 18:30 UTC (permalink / raw)
  To: goredo-devel

>> (As an aside, it is a bit odd that `time` reports 5 seconds more than redo
>> does.)
> 
> Well, goredo tries to report exactly the time spent running the target.
> Its "started=now()" (analogue) is placed as close as possible to the
> exec() call, and finished=now() is placed just after exec() finishes. So
> that time does not include supplementary things like dealing with
> temporary files, committing state files, working with lockfiles and so
> on. Moreover, multiple goroutines are run, which can create additional
> processes, meaning additional context switches.
> 
> As I can see, OOD checking+building (the first redo-ifchange call in my
> output) takes ~10sec according to goredo's measurement and ~13sec
> according to the time command. So there is ~3sec of overhead not
> measured by goredo. The second redo-ifchange reports about the same
> ~10sec of work spent on all.suffix, ~13sec for all.suffix.suffix2 (the
> same time the time command reported for the first redo-ifchange), and
> yet another ~3sec of overhead, giving a total execution time of ~16sec.
> 
> Those 3sec of "overhead" look scarily big. However, after each target is
> finished, goredo deals with temporary files and lockfiles, syncs them,
> and fully copies the dependency .rec files by fully parsing them.

That makes sense; then I guess this could be decreased further by 
setting REDO_NO_SYNC=1, although that was just something I noticed and 
am not bothered by.

> Well, currently I do not see any drawbacks to caching that kind of
> information. There should not be any kind of races: OOD checking is
> done at the very beginning of a goredo invocation, and only afterwards
> are the targets run.
> 
> I made a 1.27.0 release that caches OOD decisions and file-existence
> checks. Profiling shows that the majority of goredo's time is spent in
> syscalls. Actually my time command's output shows that too.
> 
> The next biggest bottleneck was the parser of .rec files. It applied a
> regular expression to nearly every line of those .rec files, which took
> nearly 2 seconds in total. I threw out the regexp and now:
> 
>      % time redo-ifchange all.suffix.suffix2
>      user:1,67s sys:0,17s mem:207956/11 io:0/0 1,198
> 
> But profiling showed that nearly 800ms is still spent on parsing
> dependencies. However your redotest created:
> 
>      % cat **.rec G "^Type:" W
>      87785
>      % cat **.rec W
>      292957
> 
> 87k records with 292k lines in them.

Thank you, that resolves my issue and helps a lot.  I have played 
around with it for a bit and now the program checks the dependencies a 
lot faster in the example case.  On a local disk it went from 10 seconds 
down to 1.5, and on the NFS it went from 22 minutes down to about 3, so 
quite a big improvement.

>> As another note: the apenwarr-redo implementation takes roughly 0.5 seconds
>> to `redo-ifchange` the target on a tmpfs and 1 minute on an NFS, so it is
>> quite a lot faster.  Is this expected due to that implementation storing the
>> OOD information in a single sqlite file?
> 
> After seeing the 292k lines goredo parses in redotest, I am confident
> that apenwarr/redo is faster exactly because of that: dealing with a
> single efficient database file vs parsing 87k records spread over 1028
> files (though I doubt that reading ~1k cached files takes a
> non-negligible amount of time). Of course, previously goredo also
> produced a huge number of syscalls, which was the major obstacle.

I agree that this is probably the explanation of the remaining 
difference, but that is an acceptable trade-off since the difference is 
not that large anymore.  By using separate files goredo plays much nicer 
with distributed systems; the sqlite database gets corrupted when more 
than one computer accesses it via a networked FS.

Thank you for the fast response and improving the software so quickly.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Slowness (caching issues?) in dependency checking
  2022-08-28 18:30       ` Niklas Böhm
@ 2022-08-29 18:25         ` Sergey Matveev
  0 siblings, 0 replies; 6+ messages in thread
From: Sergey Matveev @ 2022-08-29 18:25 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 1334 bytes --]

*** Niklas Böhm [2022-08-28 20:30]:
>That makes sense; then I guess this could be decreased further by setting
>REDO_NO_SYNC=1, although that was just something I noticed and am not
>bothered by.

Nothing is written during the OOD check, so NO_SYNC won't help in that
case. But during the build, when the .redo/*.rec files are created, it
will affect filesystem operations.
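
The effect of that variable is just a conditional fsync after writing
such state files, roughly like this (a sketch, not the exact goredo
code; the file name is made up):

    // Conditional fsync of a freshly written state file, toggled by
    // REDO_NO_SYNC. Only the build path writes these files; the pure
    // OOD check does not, which is why the variable cannot help there.
    package main

    import (
        "fmt"
        "os"
    )

    var noSync = os.Getenv("REDO_NO_SYNC") == "1"

    func writeState(path string, data []byte) error {
        f, err := os.Create(path)
        if err != nil {
            return err
        }
        if _, err := f.Write(data); err != nil {
            f.Close()
            return err
        }
        if !noSync {
            if err := f.Sync(); err != nil { // flush to stable storage
                f.Close()
                return err
            }
        }
        return f.Close()
    }

    func main() {
        if err := writeState("example.rec", []byte("Type: ifchange\n")); err != nil {
            fmt.Fprintln(os.Stderr, err)
        }
    }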

>Thank you, that resolves my issue and helps a lot.

Great, glad to hear about that hugely reduced time! There was a huge
number of syscalls, which are relatively slow, especially because of
network filesystem delays.

>I agree that this is probably the explanation of the remaining difference,
>but that is an acceptable trade-off since the difference is not that large
>anymore.

I think it is an acceptable tradeoff now too. Maybe there will be
another, simpler format, but its processing time will be negligible on
NFS anyway.

>By using separate files goredo plays much nicer with distributed
>systems; the sqlite database gets corrupted when more than one computer
>accesses it via a networked FS.

Somewhere apenwarr wrote that he also thinks his decision to use a
single database was wrong and that a database per directory would be
preferable.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-08-29 18:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-25 16:05 Slowness (caching issues?) in dependency checking Jan Niklas Böhm
2022-08-26 16:09 ` Sergey Matveev
2022-08-27  9:11   ` Jan Niklas Böhm
2022-08-28 14:49     ` Sergey Matveev
2022-08-28 18:30       ` Niklas Böhm
2022-08-29 18:25         ` Sergey Matveev