public inbox for goredo-devel@lists.cypherpunks.ru
* Slowness (caching issues?) in dependency checking
@ 2022-08-25 16:05 Jan Niklas Böhm
  2022-08-26 16:09 ` Sergey Matveev
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Niklas Böhm @ 2022-08-25 16:05 UTC (permalink / raw)
To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 3168 bytes --]

Hello all,

while using goredo I encountered some slowness when checking targets. I am
not sure how exactly it occurs and was wondering whether it can be solved
somehow. I have attached an MWE that hopefully illustrates my point. If the
tar archive is decompressed, there are five do files:

all.do: does some computation by calling default.run.do a bunch of times
  in a deep folder hierarchy
default.run.do: recursively prints some stuff
default.suffix{,2}.do: both simply append some text to a target
default.cat.do: prints a part of its file, used for its side effect

The target `all` creates a lot of folders in out/ so that the dependency
tracking requires a bit of work (default.run.do is separated by a lot of
folders from the .run file `out/a/.../z/$i.$j.run`).

If the target is built with `redo-ifchange all.suffix.suffix2`, then
everything is computed in order, as it should be (takes ~25 sec. on my
machine). When the same line is executed again, all the dependencies need
to be checked, which takes some time. But what surprises me is that the
time needed to check whether the .suffix or .suffix2 rule is out-of-date
(OOD) is quite significant.
I used the following to benchmark the process:

    hyperfine 'redo-ifchange all.suffix' 'redo-ifchange all.suffix.suffix2'

and the results I get are:

    Benchmark 1: redo-ifchange all.suffix
      Time (mean ± σ):      7.107 s ±  0.587 s    [User: 7.464 s, System: 1.501 s]
      Range (min … max):    6.541 s …  8.101 s    10 runs

    Benchmark 2: redo-ifchange all.suffix.suffix2
      Time (mean ± σ):      8.161 s ±  0.093 s    [User: 8.626 s, System: 1.750 s]
      Range (min … max):    7.991 s …  8.234 s    10 runs

    Summary
      'redo-ifchange all.suffix' ran
        1.15 ± 0.10 times faster than 'redo-ifchange all.suffix.suffix2'

I did not expect this much of a discrepancy between the two targets: the
overhead for .suffix2 should be negligible in comparison, but it still
takes a second longer to check this target with `redo-ifchange`. Just from
how this appears to me, I would think it is something related to caching.
Or is there another explanation?

When running redo with the debug flag, the program reaches the line (from
`ifchange.go`)

    [127847] dbg collecting deps

quite fast, but then seems to get stuck at that stage for a while until it
proceeds to print a lot of dependency-checking information. If the file
`default.suffix.do` is changed trivially (e.g. by changing the string it
appends), then redo will also pause at the same line when running the
.suffix dependency rule. Doing this shows the issue quite clearly, as
there is a significant delay between the two targets:

    redo . all.suffix (default.suffix.do) (11.107s)
    redo all.suffix.suffix2 (default.suffix2.do) (15.130s)

I hope this mail somewhat outlined the issue; I am happy to provide more
information or explanations. I took a stab at the code, but it was not
really clear how I could solve my issue.

Thanks in advance and for reading all that text
Nik

[-- Attachment #2: redotest.tar --]
[-- Type: application/x-tar, Size: 10240 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: Slowness (caching issues?) in dependency checking
  2022-08-25 16:05 Slowness (caching issues?) in dependency checking Jan Niklas Böhm
@ 2022-08-26 16:09 ` Sergey Matveev
  2022-08-27  9:11   ` Jan Niklas Böhm
  0 siblings, 1 reply; 6+ messages in thread
From: Sergey Matveev @ 2022-08-26 16:09 UTC (permalink / raw)
To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 1362 bytes --]

Greetings!

I tried to reproduce any difference between the all.suffix and
all.suffix.suffix2 targets, but I do not see any at all. All my
measurements of running those targets take nearly the same amount of
time; there is no noticeable overhead for .suffix2.

Various temporary and lock files are created during each redo invocation,
so maybe that is the source of the huge filesystem overhead? I ran those
commands both on tmpfs and on a ZFS dataset, and the latter works slower,
but again with no noticeable difference between the two targets. I looked
at the debug output of both commands and, as expected, the only difference
is an additional OOD level check, which has virtually no cost.

I tried playing with REDO_NO_SYNC and the sync attribute on the filesystem
(actually I am now just too lazy to check whether any write happens during
redo-ifchange) and it plays no role. So I really have no ideas, except for
OS overhead, like new process invocation; however, I hardly believe in
that, because the startup time of one additional statically linked Go
process should be negligible. There is no caching of OOD information in
goredo, except for a temporary file with (already determined) OOD targets,
which is not used during "redo-ifchange all.suffix*" commands.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: Slowness (caching issues?) in dependency checking
  2022-08-26 16:09 ` Sergey Matveev
@ 2022-08-27  9:11   ` Jan Niklas Böhm
  2022-08-28 14:49     ` Sergey Matveev
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Niklas Böhm @ 2022-08-27 9:11 UTC (permalink / raw)
To: goredo-devel

First of all, thank you for the quick response.

> I tried to reproduce any difference between the all.suffix and
> all.suffix.suffix2 targets, but I do not see any at all. All my
> measurements of running those targets take nearly the same amount of
> time; there is no noticeable overhead for .suffix2.

That is quite surprising to me. I tried the following on a tmpfs and got
these results:

    $ redo-ifchange all.suffix.suffix2 # first run
    ...
    $ # change in default.suffix.do
    $ time redo-ifchange all.suffix # 17.5 sec
    redo all.suffix (default.suffix.do) (12.957s)
    $ # change in default.suffix.do
    $ time redo-ifchange all.suffix.suffix2 # 22 secs
    redo . all.suffix (default.suffix.do) (12.893s)
    redo all.suffix.suffix2 (default.suffix2.do) (17.424s)

I would expect the amount of time taken in both cases to be almost equal,
not multiple seconds apart. Does this behavior not arise on your machine?
I have tried this with the new version 1.26 now. (As an aside, it is a
bit odd that `time` reports 5 seconds more than redo does.)

> Various temporary and lock files are created during each redo
> invocation, so maybe that is the source of the huge filesystem
> overhead? I ran those commands both on tmpfs and on a ZFS dataset, and
> the latter works slower, but again with no noticeable difference
> between the two targets. I looked at the debug output of both commands
> and, as expected, the only difference is an additional OOD level check,
> which has virtually no cost.

Yes, this issue is exacerbated if the fs is slower. I first encountered
it on an NFS, where the test above takes around 20 minutes, with a delay
between .suffix and .suffix.suffix2 of 25 seconds (tested with v1.25).
So I would conclude from this that it is not an issue of the fs itself.

> I tried playing with REDO_NO_SYNC and the sync attribute on the
> filesystem (actually I am now just too lazy to check whether any write
> happens during redo-ifchange) and it plays no role.

Thank you for trying this out. I did not change this variable at all,
but it also does not seem to have an effect on my side.

> So I really have no ideas, except for OS overhead, like new process
> invocation; however, I hardly believe in that, because the startup
> time of one additional statically linked Go process should be
> negligible. There is no caching of OOD information in goredo, except
> for a temporary file with (already determined) OOD targets, which is
> not used during "redo-ifchange all.suffix*" commands.

What I was wondering about is whether the OOD information regarding
`default.run.do` is cached. Since this file is the same for all of
`out/a/.../z/$i.$j.run`, it need only be checked once. But the debug
output led me to believe that this file (and any `default.{,run.}do` in
the folders in between) is checked separately for each target. I would
assume that this information could be cached for a single invocation of
`redo-ifchange`, or is that not intended?

As another note: the apenwarr-redo implementation takes roughly 0.5
seconds to `redo-ifchange` the target on a tmpfs and 1 minute on an NFS,
so it is quite a lot faster. Is this expected because that implementation
stores the OOD information in a single sqlite file? I would assume that
it makes things faster, but not by orders of magnitude. (That is one of
the design decisions I do not like about the apenwarr-redo
implementation.)

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: Slowness (caching issues?) in dependency checking
  2022-08-27  9:11 ` Jan Niklas Böhm
@ 2022-08-28 14:49   ` Sergey Matveev
  2022-08-28 18:30     ` Niklas Böhm
  0 siblings, 1 reply; 6+ messages in thread
From: Sergey Matveev @ 2022-08-28 14:49 UTC (permalink / raw)
To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 4261 bytes --]

*** Jan Niklas Böhm [2022-08-27 11:11]:
>I would expect the amount of time taken in both cases to be almost equal,
>not multiple seconds apart. Does this behavior not arise on your machine?
>I have tried this with the new version 1.26 now.

That one is indeed reproducible on my machine:

    % redo all.suffix
    [it took ~3.38 sec]
    % vi default.suffix.do
    % time redo-ifchange all.suffix
    redo all.suffix (default.suffix.do) (10.258s)
    user:9,19s sys:6,35s mem:191902/11 io:13/36 13,162
    % vi default.suffix.do
    % time redo-ifchange all.suffix.suffix2
    redo . all.suffix (default.suffix.do) (10.266s)
    redo all.suffix.suffix2 (default.suffix2.do) (13.166s)
    user:12,45s sys:6,79s mem:200804/11 io:10/75 16,138
    % time redo-ifchange all.suffix.suffix2
    user:6,38s sys:5,47s mem:193014/11 io:0/0 10,204

>(As an aside, it is a bit odd that `time` reports 5 seconds more than
>redo does.)

Well, goredo tries to report the time exactly spent on running the
target. Its "started=now()" (analogue) is placed as near as possible to
the exec() call, and finished=now() is placed just after exec() is
finished. So that time does not include related supplementary things
like dealing with temporary files, committing state files, working with
lockfiles, and similar. Moreover, multiple goroutines are run, which can
create additional processes, meaning additional context switches.

As I can see, OOD checking+building (the first redo-ifchange call in my
output) takes ~10 sec according to goredo's measurement and ~13 sec
according to the time command. So there is ~3 sec of overhead not
measured by goredo.
The second redo-ifchange reports about the same ~10 sec of work spent on
all.suffix, ~13 sec (the same time reported by the time command for the
first redo-ifchange) for all.suffix.suffix2, and yet another ~3 sec of
overhead, giving a total execution time of ~16 sec.

That ~3 sec of "overhead" looks scarily big. However, after each target
is finished, goredo deals with temporary files and lockfiles, syncs
them, and fully copies dependency .rec files by fully parsing them.

>What I was wondering about is whether the OOD information regarding
>`default.run.do` is cached. Since this file is the same for all of
>`out/a/.../z/$i.$j.run`, it need only be checked once.

No caching is done at all there.

>I would assume that this information could be cached for a single
>invocation of `redo-ifchange`, or is that not intended?

Well, currently I do not see any drawbacks to caching that kind of
information. There should not be any races: OOD checking is done at the
very beginning of a goredo invocation, and only afterwards are the
targets run.

I made the 1.27.0 release, which caches OOD decisions and file existence
checks. Profiling shows that goredo spends the majority of its time in
syscalls; actually my time command's output shows that too.

The next biggest bottleneck was the parser of .rec files. It applied a
regular expression to nearly every line of those .rec files, which took
nearly 2 seconds in total. I threw out the regexp and now:

    % time redo-ifchange all.suffix.suffix2
    user:1,67s sys:0,17s mem:207956/11 io:0/0 1,198

But profiling showed that nearly 800 ms is still spent on parsing
dependencies. However, your redotest created:

    % cat **.rec G "^Type:" W
    87785
    % cat **.rec W
    292957

87k records with 292k lines in them.

>As another note: the apenwarr-redo implementation takes roughly 0.5
>seconds to `redo-ifchange` the target on a tmpfs and 1 minute on an
>NFS, so it is quite a lot faster. Is this expected because that
>implementation stores the OOD information in a single sqlite file?
After seeing the 292k lines goredo parses in redotest, I am confident
that apenwarr/redo is faster exactly because of that: dealing with a
single efficient database file vs parsing 87k records across 1028 files
(though I doubt that reading even ~1k cached files takes negligible
time). Of course, previously goredo also produced a huge amount of
syscalls, which was the major obstacle.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: Slowness (caching issues?) in dependency checking
  2022-08-28 14:49 ` Sergey Matveev
@ 2022-08-28 18:30   ` Niklas Böhm
  2022-08-29 18:25     ` Sergey Matveev
  0 siblings, 1 reply; 6+ messages in thread
From: Niklas Böhm @ 2022-08-28 18:30 UTC (permalink / raw)
To: goredo-devel

>> (As an aside, it is a bit odd that `time` reports 5 seconds more than
>> redo does.)
>
> Well, goredo tries to report the time exactly spent on running the
> target. Its "started=now()" (analogue) is placed as near as possible
> to the exec() call, and finished=now() is placed just after exec() is
> finished. So that time does not include related supplementary things
> like dealing with temporary files, committing state files, working
> with lockfiles, and similar. Moreover, multiple goroutines are run,
> which can create additional processes, meaning additional context
> switches.
>
> As I can see, OOD checking+building (the first redo-ifchange call in
> my output) takes ~10 sec according to goredo's measurement and ~13 sec
> according to the time command. So there is ~3 sec of overhead not
> measured by goredo. The second redo-ifchange reports about the same
> ~10 sec of work spent on all.suffix, ~13 sec for all.suffix.suffix2,
> and yet another ~3 sec of overhead, giving a total execution time of
> ~16 sec.
>
> That ~3 sec of "overhead" looks scarily big. However, after each
> target is finished, goredo deals with temporary files and lockfiles,
> syncs them, and fully copies dependency .rec files by fully parsing
> them.

That makes sense; then I guess this could be decreased further by
setting REDO_NO_SYNC=1, although that was just something I noticed and
am not bothered by.

> Well, currently I do not see any drawbacks to caching that kind of
> information. There should not be any races: OOD checking is done at
> the very beginning of a goredo invocation, and only afterwards are the
> targets run.
>
> I made the 1.27.0 release, which caches OOD decisions and file
> existence checks. Profiling shows that goredo spends the majority of
> its time in syscalls; actually my time command's output shows that too.
>
> The next biggest bottleneck was the parser of .rec files. It applied a
> regular expression to nearly every line of those .rec files, which
> took nearly 2 seconds in total. I threw out the regexp and now:
>
> % time redo-ifchange all.suffix.suffix2
> user:1,67s sys:0,17s mem:207956/11 io:0/0 1,198
>
> But profiling showed that nearly 800 ms is still spent on parsing
> dependencies. However, your redotest created:
>
> % cat **.rec G "^Type:" W
> 87785
> % cat **.rec W
> 292957
>
> 87k records with 292k lines in them.

Thank you, that resolves my issue and helps a lot. I have played around
with it for a bit, and now the program checks the dependencies a lot
faster in the example case. On a local disk it went from 10 seconds down
to 1.5, and on the NFS from 22 minutes down to about 3, so quite big
improvements.

>> As another note: the apenwarr-redo implementation takes roughly 0.5
>> seconds to `redo-ifchange` the target on a tmpfs and 1 minute on an
>> NFS, so it is quite a lot faster. Is this expected because that
>> implementation stores the OOD information in a single sqlite file?
>
> After seeing the 292k lines goredo parses in redotest, I am confident
> that apenwarr/redo is faster exactly because of that: dealing with a
> single efficient database file vs parsing 87k records across 1028
> files (though I doubt that reading even ~1k cached files takes
> negligible time). Of course, previously goredo also produced a huge
> amount of syscalls, which was the major obstacle.

I agree that this is probably the explanation of the remaining
difference, but that is an acceptable trade-off since the difference is
not that large anymore.
By using separate files goredo plays much nicer with distributed
systems; the sqlite database gets corrupted when more than one computer
accesses it via a networked FS.

Thank you for the fast response and improving the software so quickly.

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: Slowness (caching issues?) in dependency checking
  2022-08-28 18:30 ` Niklas Böhm
@ 2022-08-29 18:25   ` Sergey Matveev
  0 siblings, 0 replies; 6+ messages in thread
From: Sergey Matveev @ 2022-08-29 18:25 UTC (permalink / raw)
To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 1334 bytes --]

*** Niklas Böhm [2022-08-28 20:30]:
>That makes sense; then I guess this could be decreased further by
>setting REDO_NO_SYNC=1, although that was just something I noticed and
>am not bothered by.

Nothing is written during the OOD check, so NO_SYNC will not help there.
But during the build, when the .redo/*.rec files are created, it will
affect filesystem operations.

>Thank you, that resolves my issue and helps a lot.

Great, glad to hear about that hugely reduced time! There was a huge
number of syscalls, which are relatively slow, especially because of
network filesystem delays.

>I agree that this is probably the explanation of the remaining
>difference, but that is an acceptable trade-off since the difference
>is not that large anymore.

I think it is an acceptable tradeoff now too. Maybe there will be
another, simpler format, but its processing time will anyway be
negligible on NFS.

>By using separate files goredo plays much nicer with distributed
>systems; the sqlite database gets corrupted when more than one computer
>accesses it via a networked FS.

Somewhere apenwarr wrote that he also thinks his decision to use a
single database was wrong and that a database per directory would be
preferable.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-08-29 18:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-25 16:05 Slowness (caching issues?) in dependency checking Jan Niklas Böhm
2022-08-26 16:09 ` Sergey Matveev
2022-08-27  9:11   ` Jan Niklas Böhm
2022-08-28 14:49     ` Sergey Matveev
2022-08-28 18:30       ` Niklas Böhm
2022-08-29 18:25         ` Sergey Matveev