public inbox for goredo-devel@lists.cypherpunks.ru
Atom feed
* Dependency collection takes very long
@ 2023-09-28 12:05 spacefrogg
  2023-09-30 16:47 ` Sergey Matveev
  2023-10-07 15:29 ` Sergey Matveev
  0 siblings, 2 replies; 6+ messages in thread
From: spacefrogg @ 2023-09-28 12:05 UTC (permalink / raw)
  To: Goredo Devel

Hi!

I have a not-so-small project. The final target has about 5700 
dependencies boiling down to 1300 unique dependencies, both numbers 
include transient dependencies. redo-targets reports about 270 targets, 
so the rest are source files (≈1050).

The first build runs very smooth (2min single job, 50secs with 11 jobs). 
Starting with the second build, though, goredo takes tens of seconds of 
busy CPU work after logging "dbg collecting deps" in the debug output. 
`iotop` shows that it is busy doing I/O.

N.B. `time` records about 2m50s real time while redo only counts 1m23s. 
So, some significant code executes before redo starts counting the time.

As a comparison, apenwarr/redo needs a fraction of a second to check all 
dependencies. (It does no source file hashing, though.)

redo-targets returns instantly, but redo-sources gets stuck as well, 
thrashing the disk at a rate of 12MB/s taking almost 11m to finish. This 
means it touched almost 8GB!  So, I assume that something is off when 
redo encounters a source file.

I did not change the default REDO_INODE_TRUST. So, I assume redo is 
using ctime checking. My ctimes are reliable. I tested with goredo 1.30 
and 1.32.

By the looks of it, redo always hashes source files. But even if, this 
takes way too much time for the actual amount of data (≈120MB).

–Michael

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dependency collection takes very long
  2023-09-28 12:05 Dependency collection takes very long spacefrogg
@ 2023-09-30 16:47 ` Sergey Matveev
  2023-09-30 21:02   ` Sergey Matveev
  2023-10-07 15:29 ` Sergey Matveev
  1 sibling, 1 reply; 6+ messages in thread
From: Sergey Matveev @ 2023-09-30 16:47 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 3054 bytes --]

Greetings!

I investigated why it is so slow and found various issues: mainly
.redo/*.rec files were read too often, when completely unnecessary, that
gave huge I/O load. That happened indeed during dependencies collection.
There were often reads of the whole .rec, but only its first line
(Build: ...) was actually used.

>By the looks of it, redo always hashes source files.

No, I did not notice any unnecessary hashing.

>redo-targets returns instantly, but redo-sources gets stuck as well,

redo-sources was written incredibly algorithmically badly.

I created several default.do targets producing ten thousand of targets
with thousands of partly shared dependencies. Only .rec directory
weights more than gigabyte. Just its parsing (located on tmpfs, so no
using only memory) takes nearly 20 seconds on my machine. Current
optimised version of redo-sources works 22 seconds now, comparing to...
I do not know, because I never waited till it finishes. redo-ifchange
works for 60 seconds, where 20 seconds is parsing of all .redo/*.rec
during "dependencies collection", another 20 seconds is again for
parsing of all .redo/*.rec files during OOD decision process.

I can cache (just two lines of code) already loaded/parsed *.rec files
during "dependencies collection", so 1/3 of time will be reduced. But
that took much RAM (several gigabytes, because of gigabyte .redo
directory). It stored all dependencies information as an ordinary map
with all those huge field names. I also optimised the code to reduce
that memory usage.

I am thinking also about using binary files for storing dependency
information. Currently working with that, heavily reducing their size
and that should also speed their parsing up.

There are still left several *.rec-reading places, that check if the
target was already built by parallel process. That can be mitigated in
practice, but currently no plans for that.

With so many targets and processes I also often catch:
    read /tmp/foo/31.2nd: bad file descriptor
error and *completely* do not understand why that is happening.

I already created several commits that heavily optimises many places.
But work is still in progress. I thought that will finish today, but no.
I can return to work on goredo only after several days unfortunately.

>As a comparison, apenwarr/redo needs a fraction of a second to check all
>dependencies. (It does no source file hashing, though.)

In general, apenwarr/redo should be faster anyway, because of
centralised SQlite3 database with indexing capabilities and pretty
compact binary storage. For my synthetic workload with thousands of
targets and dependencies, that means more than dozen of thousands of
files reading and parsing of pretty verbose recfile-format. My current
testing binary format reduces .redo directory size for three times, but
that is still hundreds of megabytes of data, that is parsed.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dependency collection takes very long
  2023-09-30 16:47 ` Sergey Matveev
@ 2023-09-30 21:02   ` Sergey Matveev
  2023-10-01 11:53     ` goredo
  0 siblings, 1 reply; 6+ messages in thread
From: Sergey Matveev @ 2023-09-30 21:02 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 379 bytes --]

*** Sergey Matveev [2023-09-30 19:47]:
>With so many targets and processes I also often catch:
>   read /tmp/foo/31.2nd: bad file descriptor
>error and *completely* do not understand why that is happening.

Found the problem. Also will be fixed in new release.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dependency collection takes very long
  2023-09-30 21:02   ` Sergey Matveev
@ 2023-10-01 11:53     ` goredo
  2023-10-02 11:01       ` Sergey Matveev
  0 siblings, 1 reply; 6+ messages in thread
From: goredo @ 2023-10-01 11:53 UTC (permalink / raw)
  To: goredo-devel

Hello!

No need to rush! goredo is excellent software and you have no obligation to react anyhow else than what suits your life schedule. It is ready when it's ready.

What was the error?

–Michael

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dependency collection takes very long
  2023-10-01 11:53     ` goredo
@ 2023-10-02 11:01       ` Sergey Matveev
  0 siblings, 0 replies; 6+ messages in thread
From: Sergey Matveev @ 2023-10-02 11:01 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 611 bytes --]

*** goredo [2023-10-01 13:53]:
>What was the error?

One of the variables with *os.File was out of scope and garbage
collector could collect it and variable's finalizer will close
the file descriptor, that invalidly could be reused in a wrong way.
That problem arose only when I dealt with thousands of targets and
dependencies, which presumably pressures garbage collector more,
leading to unexpected behaviour with files. I did not see target's
output corruption, but building may fail.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dependency collection takes very long
  2023-09-28 12:05 Dependency collection takes very long spacefrogg
  2023-09-30 16:47 ` Sergey Matveev
@ 2023-10-07 15:29 ` Sergey Matveev
  1 sibling, 0 replies; 6+ messages in thread
From: Sergey Matveev @ 2023-10-07 15:29 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 2206 bytes --]

Greetings!

Next release will contain HUGE number of various optimisations.
redo-sources will be magnitudes faster with many targets.
redo(-ifchange) will be many times faster.

I tested all of that with synthetic 10k targets depending on another 1k
common targets. .redo directory contained more than 2.1GB of dependency
files. Big part of the whole time was spent on just parsing so much
data. So I decided to use binary .dep files instead of recfile .rec
ones. They are just trivial concatenation of 64-bit integers, raw
hashes, single-byte types and so on. They are nearly 3 times smaller
than .rec ones.

My test case was (just whatever came into my head):

    all.do:
    ------------------------ >8 ------------------------
    for i in `seq 10000` ; do deps="$deps $i" ; done
    redo-ifchange $deps
    echo ok
    ------------------------ >8 ------------------------

    default.do:
    ------------------------ >8 ------------------------
    for i in `seq 1000` ; do deps="$deps $i.2nd" ; done
    redo-ifchange $deps
    dd if=/dev/random of=$3 bs=1K count=1 2>/dev/null
    ------------------------ >8 ------------------------

    default.2nd.do:
    ------------------------ >8 ------------------------
    echo 'print $RANDOM' | zsh
    ------------------------ >8 ------------------------

With all of that, with 8 parallel jobs (-j 8), REDO_NO_SYNC=1 (apenwarr
just always has fsync disabled for his sqlite3 database),
goredo's initial run took 79 seconds, .redo holds 865MB.
apenwarr/redo's took 870 seconds, .redo holds 784MB.
goredo's "redo-ifchange all" took 18 seconds.
apenwarr/redo's took 68 seconds.

However goredo consumes much more memory, loading all the dependency
information in memory. It heavily pressures Go's garbage collector, so
memory usage can be decreased for example by 1/3, but taking 2x/3x
longer runtime to run (anyway faster than apenwarr).

Because of dependency format change, major version number will be
raised and you have to run redo-depfix to convert existing .redo/*.rec
to .redo/*.dep.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-10-07 15:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-28 12:05 Dependency collection takes very long spacefrogg
2023-09-30 16:47 ` Sergey Matveev
2023-09-30 21:02   ` Sergey Matveev
2023-10-01 11:53     ` goredo
2023-10-02 11:01       ` Sergey Matveev
2023-10-07 15:29 ` Sergey Matveev