public inbox for goredo-devel@lists.cypherpunks.ru
Atom feed
* Several issues since 1.30
@ 2023-09-21 19:12 spacefrogg
2023-09-23 19:16 ` resource deadlock avoided messages Sergey Matveev
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: spacefrogg @ 2023-09-21 19:12 UTC (permalink / raw)
To: goredo-devel
Hi,
I am experiencing several issues since updating to 1.30 (maybe present
one or two releases earlier, too).
1. I have a C project with a very flat hierarchy. About 90 C files are
to be compiled and linked into 5 static libraries. The compiler command
is pre-computed in a separate script and used (and redo-ifchange'd) in
the default.o target that compiles each C file. On a fresh run which
must establish the dependency the first time, I run into the following
warning issued by redo:
run.go:247: resource deadlock avoided
/home/user/project/out/.redo/cc.precompile.lock
N.B. The particular message shown above refers to the precompile script
that generates precompiled header files. It bears the same dependency
relation as the compile script: Each C file depends on it. Sometimes it
hits at that target, sometimes it hits the compiler script. This warning
is issued about 320 times. So, many times more than files depending on
it.
To be clear, everything works as expected, but it seems that redo has
become greedy of sorts and exhausts some of its resources. My toolchain
does not limit the amount of targets discharged to a `redo-ifchange`
call. So, I suspect the number of open files is an issue.
2. Sometimes, again for projects with many independent targets that can
run in parallel, redo *at least* prints out that it ran one of those
targets twice that all the targets depended upon (e.g. the compile
script). I suspect a serious issue here that might relate to no. 1.
Maybe, sometimes redo looses track of whether a target's dependency is
already handled due to the resource exhaustion. I could not verify
whether redo actually runs the target twice or just prints out the line
a second time. The two status lines are not always adjacent to each
other (some other status line(s) might be in between), which, to me,
hints to that it is actually executed multiple times.
3. The output of redo has become useless for larger amounts of targets
that can run in parallel. Note, that this effect is independent of
whether I run redo with a single job or multiple parallel jobs. What
happens, is, that the output of each target is printed one after another
without properly inserting the separators that conclude the run of each
target (e.g. redo . . yadda/yadda (yadda.do) (10.123s)). All the missing
separators are printed in a batch at some point. So, the general order
of things, i.e., log output first, separator line second, is preserved.
I suspect that some terminal flow control/buffering and stream handling
has changed internally and that this is a race condition between the log
output handler and the status printing (job manager?) handler. I also
noticed that sometimes, redo does not print any separators for quite a
while (while counting up the executed targets) and then printing many at
once at some point. I noticed this for targets that do not produce any
output. I suspect that this effect has the same cause.
4. In light of no. 3, I have a small feature request: The output of
targets can be very confusing, because it is unclear at first, what
target the output relates to. It just starts printing and only tells you
in the very last line, which target actually produced the output.
apenwarr redo does a better job, here, by printing two similar status
lines, once as a header and once as a footer. So the feature request is
as follows: As long as a target produces no output, do not print
anything. If it starts producing output, print a header and then its
output. Always print a footer after the end of a target. So, as long as
targets produce no output, the list of completed targets looks the same
as now, but targets that produce output are properly introduced by a
header line for better context.
Thanks for reading this far!
–Michael
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: resource deadlock avoided messages
2023-09-21 19:12 Several issues since 1.30 spacefrogg
@ 2023-09-23 19:16 ` Sergey Matveev
2023-09-24 11:17 ` goredo
2023-09-23 19:16 ` Target runs multiple times in single session Sergey Matveev
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Sergey Matveev @ 2023-09-23 19:16 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 1313 bytes --]
Greetings!
*** spacefrogg [2023-09-21 19:12]:
>run.go:247: resource deadlock avoided
>/home/user/project/out/.redo/cc.precompile.lock
I also saw that message for a long time in previous releases. Currently
I can not reproduce it on my system (FreeBSD). I think I still see it on
some GNU/Linuxes, however today I run project that previously tend to
cause those deadlocks and still none of them. Currently I assume that is
some OS-specific behaviour, because I can not reproduce on FreeBSD
anyhow.
Actually that message is an error returned when trying to take a lock.
In some previous releases that lead to exiting the redo with a failure.
I added a loop to try to take the lock again after 10ms. Seems that
worked fine, although produced that noisy error message. In 1.31.0
release I print error message every 10 unsuccessful attempts. In my
practice, taking into account what I saw on GNU/Linuxes, there are few
of those messages per each target as a rule.
>My toolchain does not
>limit the amount of targets discharged to a `redo-ifchange` call. So, I
>suspect the number of open files is an issue.
Nah, I saw that messages in a projects with only a dozens of files.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Target runs multiple times in single session
2023-09-21 19:12 Several issues since 1.30 spacefrogg
2023-09-23 19:16 ` resource deadlock avoided messages Sergey Matveev
@ 2023-09-23 19:16 ` Sergey Matveev
2023-09-23 19:17 ` Slow/buffered output of targets finished messages Sergey Matveev
2023-09-23 19:17 ` Unclear what target the output relates to Sergey Matveev
3 siblings, 0 replies; 12+ messages in thread
From: Sergey Matveev @ 2023-09-23 19:16 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 1631 bytes --]
Greetings!
*** spacefrogg [2023-09-21 19:12]:
>2. Sometimes, again for projects with many independent targets that can run
>in parallel, redo *at least* prints out that it ran one of those targets
>twice that all the targets depended upon (e.g. the compile script).
I investigated that issues and was able to reproduce it with 1k of
targets depending on a single common one. That was a race in goredo's
behaviour, that I certainly was aware about and desired to take into
account, but for some reason I do not see that in the code, that is
awful and serious issues indeed. It actually could run target multiple
times.
There is possibility of too runScript()s run at the same time, because
the target is out of date. They take the lock of the given target. Only
one succeeds, other fails, knowing that somebody get the lock earlier
and that means that target is being built. In that case, runScript()
awaits till the lock is release and exits, without trying to rerun the
target again. But runScript() can be started at the time, when other
runScript() already release its lock. So it won't know that the target
was just rebuilt. Now I check its dependency and if it contains current
BuildUUID, then it means that it was already built in current "session"
and do not need running.
Fixed in 1.31.0 release. Thanks for raising the issues about it! I
seldom saw "doubled" targets output, but did not think that it was
really run twice (or more), because all builds were fine and successful.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Slow/buffered output of targets finished messages
2023-09-21 19:12 Several issues since 1.30 spacefrogg
2023-09-23 19:16 ` resource deadlock avoided messages Sergey Matveev
2023-09-23 19:16 ` Target runs multiple times in single session Sergey Matveev
@ 2023-09-23 19:17 ` Sergey Matveev
2023-09-23 19:17 ` Unclear what target the output relates to Sergey Matveev
3 siblings, 0 replies; 12+ messages in thread
From: Sergey Matveev @ 2023-09-23 19:17 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 1354 bytes --]
Greetings!
*** spacefrogg [2023-09-21 19:12]:
>3. The output of redo has become useless for larger amounts of targets that
>can run in parallel.
Although I use goredo everyday with projects of at least dozens of
dependencies, they build pretty quickly so I did not notice that
behaviour. I created 1k of targets with sleep 0.1 and indeed I see the
progress bar constantly refreshing, but no "redo" lines appearing until
they all printed at once at the very end.
That was not about buffering, but when goredo make the list of targets
needed to be done, it starts them in a for-loop. runScript() is not a
very quick operation itself. So just running each target takes
relatively considerable amount of time. And only after all of targets
are started, next for-loop starts collecting their results, collecting
either they succeeded or not. When that loop is started, actually most
of the targets can be already done, but the code that prints that
information just was not started at all.
In 1.31.0 release I moved that status-collecting code to a separate
goroutine working in parallel with targets running. Now it will show
success/failure message about the finished target nearly instantly.
Thanks for the report!
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Unclear what target the output relates to
2023-09-21 19:12 Several issues since 1.30 spacefrogg
` (2 preceding siblings ...)
2023-09-23 19:17 ` Slow/buffered output of targets finished messages Sergey Matveev
@ 2023-09-23 19:17 ` Sergey Matveev
2023-09-24 11:10 ` goredo
3 siblings, 1 reply; 12+ messages in thread
From: Sergey Matveev @ 2023-09-23 19:17 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 2134 bytes --]
Greetings!
*** spacefrogg [2023-09-21 19:12]:
>3. The output of redo has become useless for larger amounts of targets that
>can run in parallel. [...]
>4. In light of no. 3, I have a small feature request: The output of targets
>can be very confusing, because it is unclear at first, what target the
>output relates to.
Do I correctly understand that by "target's output" you mean stderr's
output of the running target? I understand that target's stdout will be
saved in file, but just asking for correct understanding.
By default stderr is not intercepted at all -- if target write something
to it, then it just passes as is transparently to goredo. If you enable
logs keeping (-k/REDO_LOGS=1), then goredo will intercept stderr, saving
it to the file of the corresponding target (.redo/TGT.log), and each
read line is immediately also printed to goredo's stderr, without any
long-term buffering at all too.
But I do not understand how anything can be human readable if you run
many targets in parallel. You can buffer their stderr output in memory
and print only after they are finished -- but in that case you won't see
their real-time instantly written lines. As far as I remember,
apenwarr/redo can show kept logs grouped together and ordered in
chronological order. I made that in goredo's "redo-log -r" command the
same way. You can watch kept logs after the build in a friendly and sane
manner. I think that you need either convenient, properly ordered
(linearized, https://apenwarr.ca/log/20181106) logs viewing, that
"redo-log -r" tries to give, or you need instant unbuffered output just
to understand how that the progress of your build is going further,
nothing is stuck. Imagine hundreds of targets printing to stderr in
parallel, where each line of the output can be produced by different
target.
Maybe (optional) prepending of target's name to each stderr's line may
be enough convenient to watch the progress? I hope I will try to check
its usefulness tomorrow.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Unclear what target the output relates to
2023-09-23 19:17 ` Unclear what target the output relates to Sergey Matveev
@ 2023-09-24 11:10 ` goredo
2023-09-25 16:20 ` Sergey Matveev
0 siblings, 1 reply; 12+ messages in thread
From: goredo @ 2023-09-24 11:10 UTC (permalink / raw)
To: goredo-devel
Hi,
first, thank you for your comprehensive responses!
You are correct. I was referring to said log linearization. And I understand now, how stderr output happens in goredo. I was under the impression that goredo was also applying log linearization on stderr and not only in redo-log. I still think it has strong practical implications that goredo does not do log linearization while running. It is just so convenient to use the terminals scrollback buffer than to run redo-log for the same effect for 99% of the cases.
Apart from log linearization, what is important to me, is that goredo's status lines and target's stderr output is coherent. After a target finished, the status lines should follow before any other target stderr output. In 1.30, I did not see that happen, which was confusing. Several targets' output was printed one after another without separation, even when running a single job at a time. I realize that as soon as I run multiple jobs in parallel, stderr output gets mixed up.
The problem is, as it is now, the noise of not being able to instantly identify the target of some stderr output becomes frustrating quickly. And the reason is missing coherence with status lines and missing log linearization on the terminal. I try to work more with redo-log and see if that solves my issues.
–Michael
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: resource deadlock avoided messages
2023-09-23 19:16 ` resource deadlock avoided messages Sergey Matveev
@ 2023-09-24 11:17 ` goredo
2023-09-25 16:14 ` Sergey Matveev
0 siblings, 1 reply; 12+ messages in thread
From: goredo @ 2023-09-24 11:17 UTC (permalink / raw)
To: goredo-devel
Is it really an issue when goredo is unable to grab a lock? Maybe print the warning after 10 seconds if the situation persist. So, the user gets a response why redo is stuck on some target. The warning was confusing, because, to me, it hinted at some error in my set of redo targets, as if I was constructing dependency loops or so. It looks strange to me to get informed about some issues, which I have no means to fix. Let's see, how 1.31 performs for that project...
–Michael
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: resource deadlock avoided messages
2023-09-24 11:17 ` goredo
@ 2023-09-25 16:14 ` Sergey Matveev
0 siblings, 0 replies; 12+ messages in thread
From: Sergey Matveev @ 2023-09-25 16:14 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 584 bytes --]
*** goredo [2023-09-24 13:17]:
>Is it really an issue when goredo is unable to grab a lock?
I read man page about fcntl's locking mechanisms and discovered that
Linux kernel can issue EDEADLK error even if no deadlock actually
exists, that is expected/known behaviour. So in 1.32.0 release I
explicitly check for that error and silently retry taking the lock. At
last that message will disappear. Do not know why I missed that issue
description in the man page.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Unclear what target the output relates to
2023-09-24 11:10 ` goredo
@ 2023-09-25 16:20 ` Sergey Matveev
2023-09-28 13:55 ` spacefrogg
0 siblings, 1 reply; 12+ messages in thread
From: Sergey Matveev @ 2023-09-25 16:20 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 961 bytes --]
*** goredo [2023-09-24 13:10]:
> It is just so convenient to use the terminals scrollback buffer than to run redo-log for the same effect for 99% of the cases.
But that will buffer targets output and bring delays for it.
Currently I do not have strong opinion on that subject, except that
current implementation just leaves target's stderr alone (of no log
capture is enabled of course) and it passes as is immediately. Will need
to think about all of that more later.
>goredo's status lines and target's stderr output is coherent.
By default status lines are printed only for *finished* targets. So
target's stderr definitely will be before that finished status. You can
enable -log-wait (REDO_LOG_WAIT=1) to print the fact that some target is
started. Personally I use that sometimes to understand what is actually
running now.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Unclear what target the output relates to
2023-09-25 16:20 ` Sergey Matveev
@ 2023-09-28 13:55 ` spacefrogg
2023-09-30 16:57 ` Sergey Matveev
2023-10-16 13:24 ` Sergey Matveev
0 siblings, 2 replies; 12+ messages in thread
From: spacefrogg @ 2023-09-28 13:55 UTC (permalink / raw)
To: goredo-devel
Hi!
I thought about the problem further and I've found the precise output,
that is problematic for me.
Definitions:
target's output - the log messages sent to stderr. Example lines
prefixed with (to)
status line - the (green, red) messages from goredo printed after a
target finishes. Example lines prefixed (sl)
The problematic output appears, when a target A produces output and
after that redo-ifchange's other targets B and C. It looks as follows:
(to) some log message of A
(to) some other message of A
(to) message that belongs to B
(sl) . B (B.do) (1.234s)
(sl) . C (C.do) (2.345s)
(sl) A (A.do) (5.678s)
Especially, when B or C redo-ifchange further targets, the distance from
(to) lines to the matching (sl) line increases so much, that it is
virtually impossible to reliably determine to which target the (to)
lines belong. Also, output from A and B are concatenated because no
other target has finished in the meantime.
I would prefer the following output:
(sl) A (A.do) ...
(to) some log message of A
(to) some other message of A
(sl) . B (B.do) ...
(to) message that belongs to B
(sl) . B (B.do) (1.234s)
(sl) . C (C.do) (2.345s)
(sl) A (A.do) (5.678s)
So, while C is not producing output, no header is produced and the
output stays short and nice. The other targets get two (sl) lines, one
header one footer. The output of A and B also cannot interfere because A
is currently waiting on redo-ifchange of B and C and does not produce
output.
I do realize, that without complete log linearization, there still
remains the following case when running parallel jobs:
(sl) A (A.do) ...
(to) some log message of A
(to) some other message of A
(sl) . B (B.do) ...
(to) message that belongs to B
(sl) D (D.do) ...
(to) message of D
(to) late message that belongs to A
(sl) . B (B.do) (1.234s)
(sl) . C (C.do) (2.345s)
(sl) A (A.do) (5.678s)
(sl) D (D.do) (6.789s)
The last two (to) lines could belong to any of A, B or D. Proper
linearization does incur real cost. But even if they do get mixed up, I
maintain that the output as a whole is much more informative now with
very little cost. The redo process knows which target it operates on and
when it first starts to produce output. So, it could inject the (sl)
header just before that. Especially when running with a single job,
there should be no race conditions because targets A and D are executed
one after another.
–Michael
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Unclear what target the output relates to
2023-09-28 13:55 ` spacefrogg
@ 2023-09-30 16:57 ` Sergey Matveev
2023-10-16 13:24 ` Sergey Matveev
1 sibling, 0 replies; 12+ messages in thread
From: Sergey Matveev @ 2023-09-30 16:57 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 1335 bytes --]
Greetings!
*** spacefrogg [2023-09-28 13:55]:
>I do realize, that without complete log linearization, there still remains
>the following case when running parallel jobs: [...]
Yeah, I thought about that exact situation. But I assume that in
majority of cases that "(sl) TGT ..." output will be convenient and
enough. I will catch every stderr and prepend status line telling what
process output will follow now. And it will be prepended only if
different process started to output something. That will buffer only a
single line, that does not cost/delays anything. When A and B produces
output simultaneously, that will print something like:
redo A (A.do) ...
[some message from A]
[some message from A]
redo B (B.do) ...
[some message from B]
[some message from B]
[some message from B]
redo A (B.do) ...
[some message from A]
redo B (B.do) ...
[some message from B]
...
I think that is good enough. In general you always understand who
produced the message. And there must not be any collision between
target's stderror and simultaneous printing of the status/progress
message from goredo. Will implement all of that in the next release.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Unclear what target the output relates to
2023-09-28 13:55 ` spacefrogg
2023-09-30 16:57 ` Sergey Matveev
@ 2023-10-16 13:24 ` Sergey Matveev
1 sibling, 0 replies; 12+ messages in thread
From: Sergey Matveev @ 2023-10-16 13:24 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 343 bytes --]
Greetings!
*** spacefrogg [2023-09-28 13:55]:
>The redo process knows which target it operates on and when it
>first starts to produce output. So, it could inject the (sl) header just
>before that.
Done in 2.2.0 release.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-10-16 13:24 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-21 19:12 Several issues since 1.30 spacefrogg
2023-09-23 19:16 ` resource deadlock avoided messages Sergey Matveev
2023-09-24 11:17 ` goredo
2023-09-25 16:14 ` Sergey Matveev
2023-09-23 19:16 ` Target runs multiple times in single session Sergey Matveev
2023-09-23 19:17 ` Slow/buffered output of targets finished messages Sergey Matveev
2023-09-23 19:17 ` Unclear what target the output relates to Sergey Matveev
2023-09-24 11:10 ` goredo
2023-09-25 16:20 ` Sergey Matveev
2023-09-28 13:55 ` spacefrogg
2023-09-30 16:57 ` Sergey Matveev
2023-10-16 13:24 ` Sergey Matveev