Greetings! I investigated why it is so slow and found various issues. The main one: .redo/*.rec files were read far too often, when it was completely unnecessary, and that gave a huge I/O load. It indeed happened during dependency collection: the whole .rec file was often read, although only its first line (Build: ...) was actually used (a rough sketch of that idea is in the P.S. below).

> By the looks of it, redo always hashes source files.

No, I did not notice any unnecessary hashing.

> redo-targets returns instantly, but redo-sources gets stuck as well,

redo-sources was written in an algorithmically awful way. I created several default.do targets producing ten thousand targets with thousands of partly shared dependencies. The .rec directory alone weighs more than a gigabyte. Just parsing it (it lives on tmpfs, so only memory is involved, no disk I/O) takes nearly 20 seconds on my machine. The current optimised version of redo-sources finishes in 22 seconds, compared to... I do not know what, because I never waited for the old one to finish.

redo-ifchange runs for 60 seconds: 20 seconds is parsing all .redo/*.rec files during "dependencies collection", and another 20 seconds is parsing all .redo/*.rec files again during the out-of-date (OOD) decision. I can cache (just two lines of code, also sketched in the P.S.) the already loaded/parsed *.rec files during "dependencies collection", so 1/3 of the time goes away. But that took a lot of RAM (several gigabytes, because of the gigabyte-sized .redo directory), since all dependency information was stored as an ordinary map with all those huge field names. I also optimised the code to reduce that memory usage.

I am also thinking about using binary files for storing the dependency information. I am currently working on that: it heavily reduces their size and should also speed up their parsing.

There are still several *.rec-reading places left that check whether the target was already built by a parallel process. That can be mitigated in practice, but currently there are no plans for that.

With so many targets and processes I also often catch a "read /tmp/foo/31.2nd: bad file descriptor" error and *completely* do not understand why that is happening.

I have already created several commits that heavily optimise many places, but the work is still in progress. I thought I would finish today, but no. Unfortunately I can return to working on goredo only after several days.

> As a comparison, apenwarr/redo needs a fraction of a second to check all
> dependencies. (It does no source file hashing, though.)

In general, apenwarr/redo should be faster anyway, because of its centralised SQLite3 database with indexing capabilities and pretty compact binary storage. For my synthetic workload with thousands of targets and dependencies, goredo instead has to read more than a dozen thousand files and parse the rather verbose recfile format. My current test binary format reduces the .redo directory size threefold, but that is still hundreds of megabytes of data to parse.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
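
P.S. A few rough sketches of the ideas mentioned above; they are simplified illustrations, not the actual goredo code. First, the "read only the first line" change: instead of parsing a whole .rec file during dependency collection, open it and stop right after the Build: line (firstBuildLine is a made-up name here):

    package deps

    import (
        "bufio"
        "os"
        "strings"
    )

    // firstBuildLine reads only the first "Build: ..." line of a .rec file
    // and stops, instead of reading and parsing the whole file.
    // (Illustration only; name and exact line handling are made up to
    // match the "Build: ..." first line mentioned above.)
    func firstBuildLine(path string) (string, error) {
        f, err := os.Open(path)
        if err != nil {
            return "", err
        }
        defer f.Close()
        sc := bufio.NewScanner(f)
        if !sc.Scan() {
            return "", sc.Err() // empty file or read error
        }
        return strings.TrimPrefix(sc.Text(), "Build: "), nil
    }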
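
The "just two lines of code" cache is essentially the following pattern; Dep and the parse function are hypothetical stand-ins for goredo's internals:

    package deps

    // Dep is a hypothetical stand-in for one parsed dependency record.
    type Dep struct {
        Type, Name, Hash string
    }

    // recCache remembers already parsed .rec files by path, so each file
    // is read and parsed at most once during dependency collection.
    // A real implementation would need a mutex around it for parallel builds.
    var recCache = make(map[string][]Dep)

    // cachedParse consults the cache before calling the real parser
    // (passed in as a function here, since this sketch has no real one).
    func cachedParse(path string, parse func(string) ([]Dep, error)) ([]Dep, error) {
        if deps, ok := recCache[path]; ok {
            return deps, nil
        }
        deps, err := parse(path)
        if err != nil {
            return nil, err
        }
        recCache[path] = deps
        return deps, nil
    }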
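
And the binary storage idea, shown here with encoding/gob purely as an example (the format I am actually testing differs): field names are written once per stream instead of being repeated for every record as in the textual recfile format, so both the size and the parsing time drop.

    package deps

    import (
        "bytes"
        "encoding/gob"
    )

    // Dep is repeated here so the sketch is self-contained.
    type Dep struct {
        Type, Name, Hash string
    }

    // encodeDeps stores the dependency list as a compact binary stream.
    func encodeDeps(deps []Dep) ([]byte, error) {
        var buf bytes.Buffer
        if err := gob.NewEncoder(&buf).Encode(deps); err != nil {
            return nil, err
        }
        return buf.Bytes(), nil
    }

    // decodeDeps is the corresponding parse step when reading state back.
    func decodeDeps(data []byte) ([]Dep, error) {
        var deps []Dep
        err := gob.NewDecoder(bytes.NewReader(data)).Decode(&deps)
        return deps, err
    }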