public inbox for goredo-devel@lists.cypherpunks.ru
Atom feed
* Slowness (caching issues?) in dependency checking
@ 2022-08-25 16:05 Jan Niklas Böhm
2022-08-26 16:09 ` Sergey Matveev
0 siblings, 1 reply; 6+ messages in thread
From: Jan Niklas Böhm @ 2022-08-25 16:05 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 3168 bytes --]
Hello all,
while using goredo I encountered some slowness when checking targets. I
am not sure how exactly it occurs and was wondering whether this can be
solved somehow.
I have attached an MWE that hopefully illustrates my point. If the tar
archive is decompressed there are five do files:
all.do: does some computation by calling default.run.do
a bunch of times in a deep folder hierarchy
default.run.do: recursively prints some stuff
default.suffix{,2}.do: both simply append some text to a target
default.cat.do: prints a part of its file, used for side-effect
The target `all` creates a lot of folders in out/ so that the dependency
tracking requires a bit of work (the default.run.do is separated from
the .run file `out/a/.../z/$i.$j.run` by a lot of folders).
If the target is called with `redo-ifchange all.suffix.suffix2` then
everything is being computed in order, as it should (takes ~25 sec. on
my machine).
When the same line is executed again, all the dependencies need to be
checked, which takes some time. But what surprises me is that the time
to check whether the .suffix or .suffix2 rule is out-of-date (OOD) is
quite significant. I used the following to benchmark the process:
hyperfine 'redo-ifchange all.suffix' 'redo-ifchange all.suffix.suffix2'
and the results I get are:
Benchmark 1: redo-ifchange all.suffix
  Time (mean ± σ):     7.107 s ± 0.587 s    [User: 7.464 s, System: 1.501 s]
  Range (min … max):   6.541 s … 8.101 s    10 runs

Benchmark 2: redo-ifchange all.suffix.suffix2
  Time (mean ± σ):     8.161 s ± 0.093 s    [User: 8.626 s, System: 1.750 s]
  Range (min … max):   7.991 s … 8.234 s    10 runs

Summary
  'redo-ifchange all.suffix' ran
    1.15 ± 0.10 times faster than 'redo-ifchange all.suffix.suffix2'
I did not expect this much of a discrepancy between those two targets;
the overhead for .suffix2 should be negligible in comparison, but it
still takes a second longer to check this target with `redo-ifchange`.
From how this looks to me I would think it is something related to the
cache. Or is there another explanation for this?
When running redo with the debug flag the program gets to the line (from
`ifchange.go`)
[127847] dbg collecting deps
quite fast but then seems to get stuck at that stage for a while until
it proceeds to spit out a lot of dependency-checking information.
If the file `default.suffix.do` is changed trivially (e.g. by changing
the string it appends), then redo will again pause with the same line as
above when running the .suffix dependency rule. Doing this shows the
issue quite clearly, as there is a significant delay between the two
targets:
redo . all.suffix (default.suffix.do) (11.107s)
redo all.suffix.suffix2 (default.suffix2.do) (15.130s)
I hope the mail somewhat outlined the issue, but I am happy to provide
more information or explanations. I took a stab at the code, but it was
not really clear how I could solve my issue.
Thanks in advance, and for reading all that text
Nik
[-- Attachment #2: redotest.tar --]
[-- Type: application/x-tar, Size: 10240 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Slowness (caching issues?) in dependency checking
2022-08-25 16:05 Slowness (caching issues?) in dependency checking Jan Niklas Böhm
@ 2022-08-26 16:09 ` Sergey Matveev
2022-08-27 9:11 ` Jan Niklas Böhm
0 siblings, 1 reply; 6+ messages in thread
From: Sergey Matveev @ 2022-08-26 16:09 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 1362 bytes --]
Greetings!
I tried to reproduce any difference between the all.suffix and
all.suffix.suffix2 targets, but do not see any at all.
All my measurements of running those targets take nearly
the same amount of time. No noticeable overhead for .suffix2.
Various temporary and lock files are created during each redo
invocation, so maybe that is the source of a huge filesystem overhead? I
ran those commands both on a tmpfs and on a ZFS dataset, and the latter
works slower, but again with no noticeable difference between those two
targets. I looked at the debug output of both commands and, as expected,
the only difference is an additional OOD level check, which has
virtually no cost.
I tried playing with REDO_NO_SYNC and the sync attribute on the
filesystem (actually I am now too lazy to check whether any write
happens during redo-ifchange) and it plays no role.
So I really have no ideas, except for OS overhead, like new process
invocation; however, I hardly believe in it, because the startup time of
one additional statically-linked Go process should be negligible. There
is no caching of OOD information in goredo, except for a temporary file
with (already determined) OOD targets, which is not used during
"redo-ifchange all.suffix*" commands.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
* Re: Slowness (caching issues?) in dependency checking
2022-08-26 16:09 ` Sergey Matveev
@ 2022-08-27 9:11 ` Jan Niklas Böhm
2022-08-28 14:49 ` Sergey Matveev
0 siblings, 1 reply; 6+ messages in thread
From: Jan Niklas Böhm @ 2022-08-27 9:11 UTC (permalink / raw)
To: goredo-devel
First of all, thank you for the quick response.
> I tried to reproduce any difference between the all.suffix and
> all.suffix.suffix2 targets, but do not see any at all.
> All my measurements of running those targets take nearly
> the same amount of time. No noticeable overhead for .suffix2.
That is quite surprising to me. I tried the following on a tmpfs and
got these results:
$ redo-ifchange all.suffix.suffix2 # first run
...
$ # change in default.suffix.do
$ time redo-ifchange all.suffix # 17.5 sec
redo all.suffix (default.suffix.do) (12.957s)
$ # change in default.suffix.do
$ time redo-ifchange all.suffix.suffix2 # 22 secs
redo . all.suffix (default.suffix.do) (12.893s)
redo all.suffix.suffix2 (default.suffix2.do) (17.424s)
I would expect that in both cases the amount of time taken would be
almost equal and not multiple seconds apart. Does this behavior not
arise on your machine? I've tried this out with the new version 1.26 now.
(As an aside, it is a bit odd that `time` reports 5 seconds more than
redo does.)
> Various temporary and lock files are created during each redo
> invocation, so maybe that is the source of a huge filesystem overhead? I
> ran those commands both on a tmpfs and on a ZFS dataset, and the latter
> works slower, but again with no noticeable difference between those two
> targets. I looked at the debug output of both commands and, as expected,
> the only difference is an additional OOD level check, which has
> virtually no cost.
Yes, this issue is exacerbated if the fs is slower. I first encountered
it on an NFS, where the test above takes around 20 minutes, with a delay
of 25 seconds between .suffix and .suffix.suffix2 (tested with v1.25).
So I would conclude from this that it is not an issue of the fs itself.
> I tried playing with REDO_NO_SYNC and the sync attribute on the
> filesystem (actually I am now too lazy to check whether any write
> happens during redo-ifchange) and it plays no role.
Thank you for trying this out. I did not change this variable at all,
but it also does not seem to have an effect on my side.
> So I really have no ideas, except for OS overhead, like new process
> invocation; however, I hardly believe in it, because the startup time of
> one additional statically-linked Go process should be negligible. There
> is no caching of OOD information in goredo, except for a temporary file
> with (already determined) OOD targets, which is not used during
> "redo-ifchange all.suffix*" commands.
What I was wondering about is whether the OOD information regarding
`default.run.do` is cached. Since this file is the same for all of
`out/a/.../z/$i.$j.run`, it need only be checked once. But the debug
output led me to believe that this file (and whether any
`default.{,run.}.do` exists in the folders in between) is checked
separately for each target. I would assume that this information could
be cached for a single invocation of `redo-ifchange`, or is this not
intended?
As another note: the apenwarr-redo implementation takes roughly 0.5
seconds to `redo-ifchange` the target on a tmpfs and 1 minute on an
NFS, so it is quite a lot faster. Is this expected, due to that
implementation storing the OOD information in a single sqlite file? I
would assume that it makes things faster, but not that the difference
would be orders of magnitude. (That is one of the design decisions that
I do not like about the apenwarr-redo implementation.)
* Re: Slowness (caching issues?) in dependency checking
2022-08-27 9:11 ` Jan Niklas Böhm
@ 2022-08-28 14:49 ` Sergey Matveev
2022-08-28 18:30 ` Niklas Böhm
0 siblings, 1 reply; 6+ messages in thread
From: Sergey Matveev @ 2022-08-28 14:49 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 4261 bytes --]
*** Jan Niklas Böhm [2022-08-27 11:11]:
>I would expect that in both cases the amount of time taken would be almost
>equal and not multiple seconds apart. Does this behavior not arise on your
>machine? I've tried this out with the new version 1.26 now.
That one is indeed reproducible on my machine:
% redo all.suffix
[it took ~3.38 sec]
% vi default.suffix.do
% time redo-ifchange all.suffix
redo all.suffix (default.suffix.do) (10.258s)
user:9,19s sys:6,35s mem:191902/11 io:13/36 13,162
% vi default.suffix.do
% time redo-ifchange all.suffix.suffix2
redo . all.suffix (default.suffix.do) (10.266s)
redo all.suffix.suffix2 (default.suffix2.do) (13.166s)
user:12,45s sys:6,79s mem:200804/11 io:10/75 16,138
% time redo-ifchange all.suffix.suffix2
user:6,38s sys:5,47s mem:193014/11 io:0/0 10,204
>(As an aside, it is a bit odd that `time` reports 5 seconds more than redo
>does.)
Well, goredo tries to report the time spent exactly on running the
target. Its "started=now()" (analogue) is placed as near as possible to
the exec() call, and finished=now() is placed just after exec()
finishes. So that time does not include related supplementary things
like dealing with temporary files, committing state files, working with
lockfiles and similar. Moreover, multiple goroutines are run, which can
create additional processes, meaning additional context switches.
As I can see, OOD checking+building (the first redo-ifchange call in my
output) takes ~10 sec according to goredo's measurement and ~13 sec
according to the time command. So there is ~3 sec of overhead not
measured by goredo. The second redo-ifchange reports about the same
~10 sec of work spent on all.suffix, ~13 sec (the same time the time
command reported for the first redo-ifchange) for all.suffix.suffix2,
and yet another ~3 sec of overhead, giving a resulting ~16 sec of total
execution time.
Those 3 sec of "overhead" look scarily big. However, after each target
is finished, goredo deals with temporary files, lockfiles and their
syncing, and fully copies the dependency .rec files by completely
parsing them.
>What I was wondering about is whether the OOD information regarding
>`default.run.do` is cached. Since this file is the same for all of
>`out/a/.../z/$i.$j.run`, it need only be checked once.
No caching is done at all there.
>I would
>assume that this information could be cached for a single invocation of
>`redo-ifchange`, or is this not intended?
Well, currently I do not see any drawbacks of caching that kind of
information. It seems there should not be any kind of races: OOD
checking is done at the very beginning of a goredo invocation, and
later only the targets are run.
I made the 1.27.0 release that caches OOD decisions and file existence
checks. Profiling shows that goredo spends the majority of its time in
syscalls. Actually, my time command's output shows that too.
The next biggest bottleneck was the parser of .rec files. It applied a
regular expression to nearly each line of those .rec files, which took
nearly 2 seconds in total. I threw out the regexp and now:
% time redo-ifchange all.suffix.suffix2
user:1,67s sys:0,17s mem:207956/11 io:0/0 1,198
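(The flavor of that change can be sketched as follows. This is a
hypothetical simplification, not goredo's actual parser: splitting each
"Name: value" line of a .rec file with the standard library's
strings.Cut, a single separator scan, instead of running a regular
expression per line.)

```go
package main

import (
	"fmt"
	"strings"
)

// parseRecLine splits one "Name: value" recfile line without
// invoking the regexp engine; strings.Cut does a single scan
// for the separator, which is far cheaper per line.
func parseRecLine(line string) (name, value string, ok bool) {
	name, value, ok = strings.Cut(line, ":")
	if !ok {
		return "", "", false
	}
	return name, strings.TrimSpace(value), true
}

func main() {
	rec := []string{
		"Type: ifchange",
		"Target: out/a/z/0.0.run",
		"", // a blank line terminates a record
	}
	for _, line := range rec {
		if name, value, ok := parseRecLine(line); ok {
			fmt.Printf("%s = %s\n", name, value)
		}
	}
}
```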
But profiling showed that nearly 800 ms is still spent on parsing
dependencies. However, your redotest created:
% cat **.rec G "^Type:" W
87785
% cat **.rec W
292957
87k records with 292k lines in them.
>As another note: the apenwarr-redo implementation takes roughly 0.5 seconds
>to `redo-ifchange` the target on a tmpfs and 1 minute on an NFS, so it is
>quite a lot faster. Is this expected, due to that implementation storing the
>OOD information in a single sqlite file?
After seeing the 292k lines goredo parses in redotest, I am confident
that apenwarr/redo is faster exactly because of that: dealing with a
single efficient database file vs parsing 87k records spread over 1028
files (however, I doubt that reading 1k cached files takes a
non-negligible time). Of course, previously goredo also produced a huge
amount of syscalls, which was the major obstacle.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
* Re: Slowness (caching issues?) in dependency checking
2022-08-28 14:49 ` Sergey Matveev
@ 2022-08-28 18:30 ` Niklas Böhm
2022-08-29 18:25 ` Sergey Matveev
0 siblings, 1 reply; 6+ messages in thread
From: Niklas Böhm @ 2022-08-28 18:30 UTC (permalink / raw)
To: goredo-devel
>> (As an aside, it is a bit odd that `time` reports 5 seconds more than redo
>> does.)
>
> Well, goredo tries to report the time spent exactly on running the
> target. Its "started=now()" (analogue) is placed as near as possible to
> the exec() call, and finished=now() is placed just after exec()
> finishes. So that time does not include related supplementary things
> like dealing with temporary files, committing state files, working with
> lockfiles and similar. Moreover, multiple goroutines are run, which can
> create additional processes, meaning additional context switches.
>
> As I can see, OOD checking+building (the first redo-ifchange call in my
> output) takes ~10 sec according to goredo's measurement and ~13 sec
> according to the time command. So there is ~3 sec of overhead not
> measured by goredo. The second redo-ifchange reports about the same
> ~10 sec of work spent on all.suffix, ~13 sec (the same time the time
> command reported for the first redo-ifchange) for all.suffix.suffix2,
> and yet another ~3 sec of overhead, giving a resulting ~16 sec of total
> execution time.
>
> Those 3 sec of "overhead" look scarily big. However, after each target
> is finished, goredo deals with temporary files, lockfiles and their
> syncing, and fully copies the dependency .rec files by completely
> parsing them.
That makes sense; then I guess this could be further decreased by
setting REDO_NO_SYNC=1, although that was just something I noticed and
am not bothered by.
> Well, currently I do not see any drawbacks of caching that kind of
> information. It seems there should not be any kind of races: OOD
> checking is done at the very beginning of a goredo invocation, and
> later only the targets are run.
>
> I made the 1.27.0 release that caches OOD decisions and file existence
> checks. Profiling shows that goredo spends the majority of its time in
> syscalls. Actually, my time command's output shows that too.
>
> The next biggest bottleneck was the parser of .rec files. It applied a
> regular expression to nearly each line of those .rec files, which took
> nearly 2 seconds in total. I threw out the regexp and now:
>
> % time redo-ifchange all.suffix.suffix2
> user:1,67s sys:0,17s mem:207956/11 io:0/0 1,198
>
> But profiling showed that nearly 800 ms is still spent on parsing
> dependencies. However, your redotest created:
>
> % cat **.rec G "^Type:" W
> 87785
> % cat **.rec W
> 292957
>
> 87k records with 292k lines in them.
Thank you, that resolves my issue and helps a lot. I have played around
with it for a bit, and now the program checks the dependencies a lot
faster in the example case. On a local disk it went from 10 seconds down
to 1.5, and on the NFS it went from 22 minutes down to about 3, so quite
big improvements.
>> As another note: the apenwarr-redo implementation takes roughly 0.5 seconds
>> to `redo-ifchange` the target on a tmpfs and 1 minute on an NFS, so it is
>> quite a lot faster. Is this expected, due to that implementation storing the
>> OOD information in a single sqlite file?
>
> After seeing the 292k lines goredo parses in redotest, I am confident
> that apenwarr/redo is faster exactly because of that: dealing with a
> single efficient database file vs parsing 87k records spread over 1028
> files (however, I doubt that reading 1k cached files takes a
> non-negligible time). Of course, previously goredo also produced a huge
> amount of syscalls, which was the major obstacle.
I agree that this is probably the explanation of the remaining
difference, but that is an acceptable trade-off since the difference is
not that large anymore. By using separate files goredo plays much nicer
with distributed systems; the sqlite database gets corrupted when more
than one computer accesses it via a networked FS.
Thank you for the fast response and improving the software so quickly.
* Re: Slowness (caching issues?) in dependency checking
2022-08-28 18:30 ` Niklas Böhm
@ 2022-08-29 18:25 ` Sergey Matveev
0 siblings, 0 replies; 6+ messages in thread
From: Sergey Matveev @ 2022-08-29 18:25 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 1334 bytes --]
*** Niklas Böhm [2022-08-28 20:30]:
>That makes sense; then I guess this could be further decreased by setting
>REDO_NO_SYNC=1, although that was just something I noticed and am not
>bothered by.
Nothing is written during the OOD check, so NO_SYNC won't help in that
case. But during the build, when .redo/*.rec files are created, it will
affect filesystem operations.
>Thank you, that resolves my issue and helps a lot.
Great, glad to hear about that hugely reduced time! There was a huge
amount of syscalls, which are relatively slow, especially because of
network filesystem delays.
>I agree that this is probably the explanation of the remaining difference,
>but that is an acceptable trade-off since the difference is not that large
>anymore.
I think it is an acceptable tradeoff now too. Maybe there will be
another, simpler format, but its runtime will be negligible on NFS
anyway.
>By using separate files goredo plays much nicer with distributed
>systems; the sqlite database gets corrupted when more than one computer
>accesses it via a networked FS.
Somewhere apenwarr wrote that he also thinks his decision of a single
database was wrong and that a database-per-directory would be
preferable.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
end of thread, other threads:[~2022-08-29 18:25 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-25 16:05 Slowness (caching issues?) in dependency checking Jan Niklas Böhm
2022-08-26 16:09 ` Sergey Matveev
2022-08-27 9:11 ` Jan Niklas Böhm
2022-08-28 14:49 ` Sergey Matveev
2022-08-28 18:30 ` Niklas Böhm
2022-08-29 18:25 ` Sergey Matveev