public inbox for goredo-devel@lists.cypherpunks.ru
Atom feed
* Several issues since 1.30
@ 2023-09-21 19:12 spacefrogg
  2023-09-23 19:16 ` resource deadlock avoided messages Sergey Matveev
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: spacefrogg @ 2023-09-21 19:12 UTC (permalink / raw)
  To: goredo-devel

Hi,

I am experiencing several issues since updating to 1.30 (maybe present 
one or two releases earlier, too).

1. I have a C project with a very flat hierarchy. About 90 C files are 
to be compiled and linked into 5 static libraries. The compiler command 
is pre-computed in a separate script and used (and redo-ifchange'd) in 
the default.o target that compiles each C file. On a fresh run, which 
must establish the dependencies for the first time, I run into the 
following warning issued by redo:

     run.go:247: resource deadlock avoided 
/home/user/project/out/.redo/cc.precompile.lock

N.B. The particular message shown above refers to the precompile script 
that generates precompiled header files. It bears the same dependency 
relation as the compile script: each C file depends on it. Sometimes the 
warning hits that target, sometimes the compile script. It is issued 
about 320 times, i.e. many more times than there are files depending 
on it.

To be clear, everything works as expected, but it seems that redo has 
become greedy of sorts and exhausts some of its resources. My toolchain 
does not limit the number of targets passed to a single `redo-ifchange` 
call, so I suspect the number of open files is an issue.

2. Sometimes, again for projects with many independent targets that can 
run in parallel, redo *at least* prints twice that it ran one of the 
targets that all the others depend on (e.g. the compile script). I 
suspect a serious issue here that might relate to no. 1. Maybe redo 
sometimes loses track of whether a target's dependency is already 
handled, due to the resource exhaustion. I could not verify whether 
redo actually runs the target twice or just prints the line a second 
time. The two status lines are not always adjacent to each other (some 
other status line(s) might be in between), which, to me, hints that the 
target is actually executed multiple times.

3. The output of redo has become useless for larger numbers of targets 
that can run in parallel. Note that this effect is independent of 
whether I run redo with a single job or multiple parallel jobs. What 
happens is that the output of each target is printed one after another 
without properly inserting the separators that conclude the run of each 
target (e.g. redo . . yadda/yadda (yadda.do) (10.123s)). All the missing 
separators are printed in a batch at some point, so the general order 
of things, i.e. log output first, separator line second, is preserved. 
I suspect that some terminal flow control/buffering and stream handling 
has changed internally and that this is a race condition between the log 
output handler and the status printing (job manager?) handler. I also 
noticed that sometimes redo does not print any separators for quite a 
while (while counting up the executed targets) and then prints many at 
once at some point. I noticed this for targets that do not produce any 
output; I suspect that this effect has the same cause.

4. In light of no. 3, I have a small feature request: the output of 
targets can be very confusing, because at first it is unclear which 
target the output relates to. It just starts printing and only tells you 
in the very last line which target actually produced the output. 
apenwarr redo does a better job here by printing two similar status 
lines, once as a header and once as a footer. So the feature request is 
as follows: as long as a target produces no output, do not print 
anything. If it starts producing output, print a header and then its 
output. Always print a footer after the end of a target. So, as long as 
targets produce no output, the list of completed targets looks the same 
as now, but targets that produce output are properly introduced by a 
header line for better context.

Thanks for reading this far!
–Michael

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: resource deadlock avoided messages
  2023-09-21 19:12 Several issues since 1.30 spacefrogg
@ 2023-09-23 19:16 ` Sergey Matveev
  2023-09-24 11:17   ` goredo
  2023-09-23 19:16 ` Target runs multiple times in single session Sergey Matveev
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Sergey Matveev @ 2023-09-23 19:16 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 1313 bytes --]

Greetings!

*** spacefrogg [2023-09-21 19:12]:
>run.go:247: resource deadlock avoided
>/home/user/project/out/.redo/cc.precompile.lock

I also saw that message for a long time in previous releases. Currently
I cannot reproduce it on my system (FreeBSD). I think I still see it on
some GNU/Linuxes; however, today I ran a project that previously tended
to cause those deadlocks and got none of them. Currently I assume it is
some OS-specific behaviour, because I cannot reproduce it on FreeBSD
at all.

Actually that message is an error returned when trying to take a lock.
In some previous releases it led to redo exiting with a failure. I added
a loop that tries to take the lock again after 10ms. That seems to have
worked fine, although it produced that noisy error message. In the
1.31.0 release I print the error message only every 10 unsuccessful
attempts. In my practice, taking into account what I saw on GNU/Linuxes,
there are only a few of those messages per target as a rule.
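The retry-with-throttled-reporting behaviour described above can be sketched roughly like this (illustrative names only, not goredo's actual code; the lock itself is abstracted behind a callback):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errLocked stands in for the "resource deadlock avoided" error that
// fcntl locking may return.
var errLocked = errors.New("resource deadlock avoided")

// acquireWithRetry keeps calling tryLock every interval until it
// succeeds, printing the noisy message only once per 10 failed
// attempts, mirroring the 1.31.0 behaviour described above.
// It returns the number of failed attempts.
func acquireWithRetry(tryLock func() error, interval time.Duration) int {
	attempts := 0
	for {
		err := tryLock()
		if err == nil {
			return attempts
		}
		attempts++
		if attempts%10 == 0 {
			fmt.Printf("lock still busy: %v (attempt %d)\n", err, attempts)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulated lock that only succeeds on the 25th attempt:
	// messages appear at attempts 10 and 20 instead of 24 times.
	calls := 0
	tryLock := func() error {
		calls++
		if calls < 25 {
			return errLocked
		}
		return nil
	}
	fmt.Println("failed attempts:", acquireWithRetry(tryLock, time.Microsecond))
}
```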

>My toolchain does not
>limit the amount of targets discharged to a `redo-ifchange` call. So, I
>suspect the number of open files is an issue.

Nah, I saw those messages in projects with only a dozen files.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: Target runs multiple times in single session
  2023-09-21 19:12 Several issues since 1.30 spacefrogg
  2023-09-23 19:16 ` resource deadlock avoided messages Sergey Matveev
@ 2023-09-23 19:16 ` Sergey Matveev
  2023-09-23 19:17 ` Slow/buffered output of targets finished messages Sergey Matveev
  2023-09-23 19:17 ` Unclear what target the output relates to Sergey Matveev
  3 siblings, 0 replies; 12+ messages in thread
From: Sergey Matveev @ 2023-09-23 19:16 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 1631 bytes --]

Greetings!

*** spacefrogg [2023-09-21 19:12]:
>2. Sometimes, again for projects with many independent targets that can run
>in parallel, redo *at least* prints out that it ran one of those targets
>twice that all the targets depended upon (e.g. the compile script).

I investigated that issue and was able to reproduce it with 1k targets
depending on a single common one. That was a race in goredo's behaviour
that I was certainly aware of and intended to take into account, but for
some reason I do not see it handled in the code, which is an awful and
serious issue indeed. It actually could run a target multiple times.

There is a possibility of two runScript()s running at the same time,
because the target is out of date. Both take the lock of the given
target. Only one succeeds; the other fails, knowing that somebody got
the lock earlier, which means the target is being built. In that case,
runScript() waits until the lock is released and exits without trying
to rerun the target. But a runScript() can also be started right after
the other runScript() has already released its lock, so it won't know
that the target was just rebuilt. Now I check the target's dependency
record, and if it contains the current BuildUUID, that means it was
already built in the current "session" and does not need running.
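The essence of that check can be sketched as follows (a minimal illustration with invented names; goredo's real dependency records live on disk, not in a map):

```go
package main

import "fmt"

// buildUUID identifies the current redo "session".
var buildUUID = "session-1"

// depRecord is a stand-in for a target's recorded dependency state,
// remembering which session last built it.
type depRecord struct{ builtBy string }

var deps = map[string]*depRecord{}

// needsRun reports whether a target must be (re)built after we waited
// for another runner's lock: if the recorded BuildUUID matches the
// current session, it was already rebuilt moments ago and can be
// skipped instead of run a second time.
func needsRun(target string) bool {
	rec, ok := deps[target]
	return !ok || rec.builtBy != buildUUID
}

// run returns true if the target was actually built.
func run(target string) bool {
	if !needsRun(target) {
		return false // already built in this session: skip
	}
	deps[target] = &depRecord{builtBy: buildUUID}
	return true
}

func main() {
	fmt.Println(run("cc")) // first runner actually builds it: true
	fmt.Println(run("cc")) // late runner sees the session UUID: false
}
```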

Fixed in the 1.31.0 release. Thanks for raising the issue! I seldom saw
"doubled" target output, but did not think a target was really run twice
(or more), because all builds were fine and successful.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: Slow/buffered output of targets finished messages
  2023-09-21 19:12 Several issues since 1.30 spacefrogg
  2023-09-23 19:16 ` resource deadlock avoided messages Sergey Matveev
  2023-09-23 19:16 ` Target runs multiple times in single session Sergey Matveev
@ 2023-09-23 19:17 ` Sergey Matveev
  2023-09-23 19:17 ` Unclear what target the output relates to Sergey Matveev
  3 siblings, 0 replies; 12+ messages in thread
From: Sergey Matveev @ 2023-09-23 19:17 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 1354 bytes --]

Greetings!

*** spacefrogg [2023-09-21 19:12]:
>3. The output of redo has become useless for larger amounts of targets that
>can run in parallel.

Although I use goredo every day with projects of at least dozens of
dependencies, they build pretty quickly, so I did not notice that
behaviour. I created 1k targets with sleep 0.1 and indeed I see the
progress bar constantly refreshing, but no "redo" lines appearing until
they are all printed at once at the very end.

That was not about buffering: when goredo makes the list of targets
that need to be done, it starts them in a for-loop. runScript() is not
a very quick operation itself, so just starting each target takes a
relatively considerable amount of time. Only after all of the targets
are started does the next for-loop begin collecting their results,
recording whether each succeeded or not. By the time that loop starts,
most of the targets may already be done, but the code that prints that
information simply has not been started at all.

In the 1.31.0 release I moved that status-collecting code into a
separate goroutine working in parallel with the running targets. Now it
shows the success/failure message about a finished target nearly
instantly. Thanks for the report!
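The shape of that change can be sketched like this (illustrative only: runAll and its channel plumbing are invented names, and a channel send stands in for runScript()):

```go
package main

import (
	"fmt"
	"sync"
)

// runAll launches every target concurrently while a separate collector
// goroutine reports each result as soon as it arrives, instead of only
// after the launch loop has started everything. It returns the number
// of results reported.
func runAll(targets []string) int {
	results := make(chan string)
	reported := 0
	done := make(chan struct{})

	// Collector runs in parallel with the launch loop below, so a
	// finished target is reported nearly instantly.
	go func() {
		for r := range results {
			fmt.Printf("redo %s (done)\n", r)
			reported++
		}
		close(done)
	}()

	var wg sync.WaitGroup
	for _, t := range targets {
		wg.Add(1)
		go func(t string) { // stands in for one runScript() invocation
			defer wg.Done()
			results <- t
		}(t)
	}
	wg.Wait()
	close(results)
	<-done // collector finished: reported is now safe to read
	return reported
}

func main() {
	n := runAll([]string{"a.o", "b.o", "c.o"})
	fmt.Println("targets reported:", n)
}
```

Closing `done` after the collector drains `results` gives the happens-before edge that makes reading `reported` afterwards race-free.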

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: Unclear what target the output relates to
  2023-09-21 19:12 Several issues since 1.30 spacefrogg
                   ` (2 preceding siblings ...)
  2023-09-23 19:17 ` Slow/buffered output of targets finished messages Sergey Matveev
@ 2023-09-23 19:17 ` Sergey Matveev
  2023-09-24 11:10   ` goredo
  3 siblings, 1 reply; 12+ messages in thread
From: Sergey Matveev @ 2023-09-23 19:17 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 2134 bytes --]

Greetings!

*** spacefrogg [2023-09-21 19:12]:
>3. The output of redo has become useless for larger amounts of targets that
>can run in parallel. [...]
>4. In light of no. 3, I have a small feature request: The output of targets
>can be very confusing, because it is unclear at first, what target the
>output relates to.

Do I correctly understand that by "target's output" you mean the stderr
output of the running target? I understand that the target's stdout is
saved in a file; I am just asking to be sure I understand correctly.

By default stderr is not intercepted at all: if a target writes
something to it, it just passes through goredo transparently as is. If
you enable log keeping (-k/REDO_LOGS=1), then goredo intercepts stderr,
saving it to the corresponding target's file (.redo/TGT.log), and each
line read is also immediately printed to goredo's stderr, again without
any long-term buffering.

But I do not understand how anything can be human-readable if you run
many targets in parallel. You can buffer their stderr output in memory
and print it only after they finish, but in that case you won't see
their lines in real time as they are written. As far as I remember,
apenwarr/redo can show kept logs grouped together and ordered
chronologically. I made goredo's "redo-log -r" command work the same
way: you can view kept logs after the build in a friendly and sane
manner. I think you need either convenient, properly ordered
(linearized, https://apenwarr.ca/log/20181106) log viewing, which
"redo-log -r" tries to give, or instant unbuffered output just to see
that your build is making progress and nothing is stuck. Imagine
hundreds of targets printing to stderr in parallel, where each line of
output can be produced by a different target.

Maybe (optionally) prepending the target's name to each stderr line
would be convenient enough for watching the progress? I hope to check
its usefulness tomorrow.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: Unclear what target the output relates to
  2023-09-23 19:17 ` Unclear what target the output relates to Sergey Matveev
@ 2023-09-24 11:10   ` goredo
  2023-09-25 16:20     ` Sergey Matveev
  0 siblings, 1 reply; 12+ messages in thread
From: goredo @ 2023-09-24 11:10 UTC (permalink / raw)
  To: goredo-devel

Hi,

first, thank you for your comprehensive responses!

You are correct. I was referring to said log linearization, and I now understand how stderr output happens in goredo. I was under the impression that goredo also applied log linearization to stderr, not only in redo-log. I still think it has strong practical implications that goredo does not linearize the log while running: for 99% of the cases it is just so much more convenient to use the terminal's scrollback buffer than to run redo-log for the same effect.

Apart from log linearization, what is important to me is that goredo's status lines and the targets' stderr output are coherent. After a target finishes, its status line should follow before any other target's stderr output. In 1.30 I did not see that happen, which was confusing: several targets' output was printed one after another without separation, even when running a single job at a time. I realize that as soon as I run multiple jobs in parallel, stderr output gets mixed up.

The problem is that, as it is now, the noise of not being able to instantly identify the target behind some stderr output becomes frustrating quickly. The reason is the missing coherence with status lines and the missing log linearization on the terminal. I will try to work more with redo-log and see if that solves my issues.

–Michael


* Re: resource deadlock avoided messages
  2023-09-23 19:16 ` resource deadlock avoided messages Sergey Matveev
@ 2023-09-24 11:17   ` goredo
  2023-09-25 16:14     ` Sergey Matveev
  0 siblings, 1 reply; 12+ messages in thread
From: goredo @ 2023-09-24 11:17 UTC (permalink / raw)
  To: goredo-devel

Is it really an issue when goredo is unable to grab a lock? Maybe print the warning only after 10 seconds, if the situation persists, so that the user gets an answer to why redo is stuck on some target. The warning was confusing because, to me, it hinted at some error in my set of redo targets, as if I were constructing dependency loops or so. It looks strange to be informed about an issue which I have no means to fix. Let's see how 1.31 performs for that project...

–Michael


* Re: resource deadlock avoided messages
  2023-09-24 11:17   ` goredo
@ 2023-09-25 16:14     ` Sergey Matveev
  0 siblings, 0 replies; 12+ messages in thread
From: Sergey Matveev @ 2023-09-25 16:14 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 584 bytes --]

*** goredo [2023-09-24 13:17]:
>Is it really an issue when goredo is unable to grab a lock?

I read the man page about fcntl's locking mechanisms and discovered
that the Linux kernel can issue an EDEADLK error even if no deadlock
actually exists; that is expected/known behaviour. So in the 1.32.0
release I explicitly check for that error and silently retry taking
the lock. At last that message will disappear. I do not know why I
missed that description in the man page.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: Unclear what target the output relates to
  2023-09-24 11:10   ` goredo
@ 2023-09-25 16:20     ` Sergey Matveev
  2023-09-28 13:55       ` spacefrogg
  0 siblings, 1 reply; 12+ messages in thread
From: Sergey Matveev @ 2023-09-25 16:20 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 961 bytes --]

*** goredo [2023-09-24 13:10]:
> It is just so convenient to use the terminals scrollback buffer than to run redo-log for the same effect for 99% of the cases.

But that will buffer the targets' output and delay it. Currently I do
not have a strong opinion on that subject, except that the current
implementation just leaves the target's stderr alone (if no log capture
is enabled, of course) and it passes through immediately as is. I will
need to think about all of that more later.

>goredo's status lines and target's stderr output is coherent.

By default, status lines are printed only for *finished* targets, so a
target's stderr will definitely appear before its finished status. You
can enable -log-wait (REDO_LOG_WAIT=1) to also print the fact that a
target has started. Personally I use that sometimes to understand what
is actually running.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: Unclear what target the output relates to
  2023-09-25 16:20     ` Sergey Matveev
@ 2023-09-28 13:55       ` spacefrogg
  2023-09-30 16:57         ` Sergey Matveev
  2023-10-16 13:24         ` Sergey Matveev
  0 siblings, 2 replies; 12+ messages in thread
From: spacefrogg @ 2023-09-28 13:55 UTC (permalink / raw)
  To: goredo-devel

Hi!

I thought about the problem further and have found the precise output 
that is problematic for me.
Definitions:
target's output - the log messages sent to stderr; example lines are 
prefixed with (to)
status line - the (green, red) messages from goredo printed after a 
target finishes; example lines are prefixed with (sl)

The problematic output appears when a target A produces output and 
after that redo-ifchange's other targets B and C. It looks as follows:

(to) some log message of A
(to) some other message of A
(to) message that belongs to B
(sl) . B (B.do) (1.234s)
(sl) . C (C.do) (2.345s)
(sl) A (A.do) (5.678s)

Especially when B or C redo-ifchange further targets, the distance from 
the (to) lines to the matching (sl) line grows so much that it is 
virtually impossible to reliably determine which target the (to) lines 
belong to. Also, the output from A and B is concatenated, because no 
other target finished in the meantime.

I would prefer the following output:

(sl) A (A.do) ...
(to) some log message of A
(to) some other message of A
(sl) . B (B.do) ...
(to) message that belongs to B
(sl) . B (B.do) (1.234s)
(sl) . C (C.do) (2.345s)
(sl) A (A.do) (5.678s)

So, while C produces no output, no header is printed and the output 
stays short and nice. The other targets get two (sl) lines, one header 
and one footer. The output of A and B also cannot interfere, because A 
is waiting on redo-ifchange of B and C at that point and produces no 
output.

I do realize that, without complete log linearization, the following 
case still remains when running parallel jobs:

(sl) A (A.do) ...
(to) some log message of A
(to) some other message of A
(sl) . B (B.do) ...
(to) message that belongs to B
(sl) D (D.do) ...
(to) message of D
(to) late message that belongs to A
(sl) . B (B.do) (1.234s)
(sl) . C (C.do) (2.345s)
(sl) A (A.do) (5.678s)
(sl) D (D.do) (6.789s)

The last two (to) lines could belong to any of A, B or D; proper 
linearization does incur real cost. But even if lines occasionally get 
mixed up, I maintain that the output as a whole is much more informative 
at very little cost. The redo process knows which target it operates on 
and when that target first starts to produce output, so it could inject 
the (sl) header just before. Especially when running with a single job, 
there should be no race conditions, because targets A and D are executed 
one after another.

–Michael


* Re: Unclear what target the output relates to
  2023-09-28 13:55       ` spacefrogg
@ 2023-09-30 16:57         ` Sergey Matveev
  2023-10-16 13:24         ` Sergey Matveev
  1 sibling, 0 replies; 12+ messages in thread
From: Sergey Matveev @ 2023-09-30 16:57 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 1335 bytes --]

Greetings!

*** spacefrogg [2023-09-28 13:55]:
>I do realize, that without complete log linearization, there still remains
>the following case when running parallel jobs: [...]

Yeah, I thought about that exact situation. But I assume that in the
majority of cases that "(sl) TGT ..." output will be convenient and
sufficient. I will catch every stderr line and prepend a status line
telling which process's output follows, prepending it only when a
different process starts to output something. That buffers only a
single line, which costs and delays nothing. When A and B produce
output simultaneously, it will print something like:

    redo A (A.do) ...
    [some message from A]
    [some message from A]
    redo B (B.do) ...
    [some message from B]
    [some message from B]
    [some message from B]
    redo A (A.do) ...
    [some message from A]
    redo B (B.do) ...
    [some message from B]
    ...

I think that is good enough: in general you always understand who
produced the message. And there must not be any collision between a
target's stderr and the simultaneous printing of goredo's
status/progress messages. I will implement all of that in the next
release.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: Unclear what target the output relates to
  2023-09-28 13:55       ` spacefrogg
  2023-09-30 16:57         ` Sergey Matveev
@ 2023-10-16 13:24         ` Sergey Matveev
  1 sibling, 0 replies; 12+ messages in thread
From: Sergey Matveev @ 2023-10-16 13:24 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 343 bytes --]

Greetings!

*** spacefrogg [2023-09-28 13:55]:
>The redo process knows which target it operates on and when it
>first starts to produce output. So, it could inject the (sl) header just
>before that.

Done in 2.2.0 release.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


end of thread, other threads:[~2023-10-16 13:24 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-21 19:12 Several issues since 1.30 spacefrogg
2023-09-23 19:16 ` resource deadlock avoided messages Sergey Matveev
2023-09-24 11:17   ` goredo
2023-09-25 16:14     ` Sergey Matveev
2023-09-23 19:16 ` Target runs multiple times in single session Sergey Matveev
2023-09-23 19:17 ` Slow/buffered output of targets finished messages Sergey Matveev
2023-09-23 19:17 ` Unclear what target the output relates to Sergey Matveev
2023-09-24 11:10   ` goredo
2023-09-25 16:20     ` Sergey Matveev
2023-09-28 13:55       ` spacefrogg
2023-09-30 16:57         ` Sergey Matveev
2023-10-16 13:24         ` Sergey Matveev