public inbox for nncp-devel@lists.cypherpunks.ru
* I/O timeout in nncp-daemon
From: John Goerzen @ 2021-01-21 23:55 UTC
To: nncp-devel

Hi,

I've had a bit of a recurring problem: lines like this from nncp-daemon:

    Jan 21 17:44:31 nncp nncp-daemon[867]: E 2021-01-21T23:44:31.001040425Z [sp-xmit err="write tcp [elided]: i/o timeout" nice="255"...

The conditions that cause this tend to be:

1) The remote nncp-caller has just received a 4GB packet and is now checking it, and nncp-daemon still has a bunch more packets for it

2) The remote nncp-call has just started up and is waiting for a drive to spin up (apparently it makes the TCP connection before it does whatever it is that makes the drive spin up)

3) Basically, situations where nncp-call(er) is a bit slow for some reason

This timeout is much lower than any of the deadlines set in the configuration file, and appears to be a few seconds.

What I don't know is whether this could happen in nncp-call(er) as well; in my case, the nncp-call(er) machine has its spool on a USB HDD and the nncp-daemon is on a raidz2 array with fast drives.

I looked, unsuccessfully, for this in the source code. Would it be possible to make this configurable, or larger?

Thanks!

- John
* Re: I/O timeout in nncp-daemon
From: Sergey Matveev @ 2021-01-23 9:28 UTC
To: nncp-devel

Greetings!

*** John Goerzen [2021-01-21 17:55]:
>This timeout is much lower than any of the deadlines set in the configuration
>file, and appears to be a few seconds.

10 seconds. These are the various
    conn.Set*Deadline(time.Now().Add(DefaultDeadline))
lines.

>Would it be possible to make this configurable, or larger?

Well... when I wrote that, I assumed that the remote system has to give an answer anyway (from the TCP point of view) within 10 seconds.

The overall file retrieval algorithm is simple:

* after the handshake is made, each side sends a list of the packets (with their nice values) available for the remote side (INFO)
* then each side, if it wants to (depending on configuration), sends a request (FREQ) to the remote side to start sending the specified packet's contents from the specified offset
* after that FREQ is received, sending of many FILE packets begins. Each FILE contains a chunk of the encrypted packet. The packet's contents are saved to "PKT-HASH.part"
* when the encrypted packet is fully retrieved (its length equals the INFO length known in advance), a background checksum checker is started. If the checksum is good, it renames "PKT-HASH.part" to "PKT-HASH"
* but it runs in the background, while receiving/sending of other packets continues
* when another packet is fully retrieved, but the background checker is still busy -- NNCP waits for the checker to complete

And it seems (I am sure) that exactly because of the last step, the program "hangs" and does not read anything more from the socket. Of course some buffers may still hold another FILE chunk, but they fill up quickly and then wait for the program to issue Read() on the socket.
Checksumming is required to be completely sure that the received file is good and that we can send a "DONE" message to the remote side, which will then delete the packet from its spool. You know, when I was writing that online part of NNCP, I was not thinking about huge files at all. Initially NNCP lacked any online communication at all.

What can be done?

* the receiver can send a HALT packet to stop the remote side from sending any data. I do not like that option, because it can easily lead to constant HALT+FREQ exchanges. It is hard to determine whether we really need to stop reception because we have got many gigabytes of data on a USB2 HDD
* the receiver can send FREQs not for a bunch of files but for a single one, wait for its reception and checksumming, and only then send the next FREQ. I do not like that, because it leads to many round-trips. Currently it can send many KiBs of FREQs in just a single TCP segment, leading to a non-stop stream of sent packets
* the receiver can keep a queue of packets that need to be checked. Instead of waiting for the checksummer to finish, it just fills up its queue

But! Anyway, I do not like the fact that the checker and the receiver work simultaneously, leading to constantly interleaved read/write operations that kill HDD performance. It greatly decreases the overall receiving speed. Ideally we should either only send data or only receive data, to be able to write it to the disk *sequentially* (that can be done now by specifying rx/tx modes). Then we should read it sequentially, for checksum verification, without any network transmission at all.

Moreover, what if we deal with a 1TiB file? There is a high probability that the daemon/caller will be restarted, and no one will start checking that .part file until the next online session.

I think there should be another intermediate step in packet processing. "PKT-HASH.part" is a partly received file. Now it is time to create some kind of "PKT-HASH.done-but-unchecked" one.
And another nncp-checker daemon, which just verifies the checksum and renames ".done-but-unchecked" to "PKT-HASH". And the possibility for nncp-daemon *not* to do checksumming immediately needs to be added, so that the checksum check can be done completely asynchronously from the transmission. Of course tossing must then be done with the -seen option, to record the fact that some file was seen and (possibly) already processed.

--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263 6422 AE1A 8109 E498 57EF
* Re: I/O timeout in nncp-daemon
From: John Goerzen @ 2021-01-27 22:48 UTC
To: Sergey Matveev; +Cc: nncp-devel

On Sat, Jan 23 2021, Sergey Matveev wrote:

> Greetings!
>
> *** John Goerzen [2021-01-21 17:55]:
>>This timeout is much lower than any of the deadlines set in the configuration
>>file, and appears to be a few seconds.
>
> 10 seconds. These are various:
>     conn.Set*Deadline(time.Now().Add(DefaultDeadline))
> lines.
>
>>Would it be possible to make this configurable, or larger?
>
> Well... when I wrote that, I assumed that remote system anyway has to
> give an answer (from TCP point of view) during 10 seconds.

I have a number of situations where that assumption wouldn't be true -- high-latency low-bandwidth links would be one, especially if bufferbloat is involved.

> * when encrypted packet is fully retrieved (its length equals to INFO's
>   length known in advance), then background checksum checker is started.
>   If checksum is good, then it renames "PKT-HASH.part" to "PKT-HASH"

Could it be doing the checksum inline while it's writing it to disk in the first place?

- John
* Re: I/O timeout in nncp-daemon
From: Sergey Matveev @ 2021-01-28 7:40 UTC
To: nncp-devel

*** John Goerzen [2021-01-27 16:48]:
>> give an answer (from TCP point of view) during 10 seconds.
>I have a number of situations where that assumption wouldn't be true

I understand that now. My bad assumptions.

>Could it be doing the checksum inline while it's writing it to disk in the
>first place?

Of course! But only if it gets the file from the very beginning. Or if Merkle trees are used, so that part of the computation can be done in place. I will look into that whole subject (including checking file integrity asynchronously) when I find time.

PS: please do not add me as a recipient of the email message -- I am already subscribed to the mailing list, and my Mail-Followup-To header suggests that: https://cr.yp.to/proto/replyto.html

--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263 6422 AE1A 8109 E498 57EF
* Re: I/O timeout in nncp-daemon
From: John Goerzen @ 2021-01-28 15:03 UTC
To: nncp-devel

On Thu, Jan 28 2021, Sergey Matveev wrote:

>>Could it be doing the checksum inline while it's writing it to disk in the
>>first place?
>
> Of course! But only if it gets the file from the very beginning. Or if

Ah, true. Perhaps in those cases, before reading the remaining parts from the network, it would read in the existing bits? (Or just do the check afterwards as it does now.)

I suppose that could lead to inefficiencies if, say, it's getting a 1GB packet slowly and gets interrupted a bunch of times, but would probably help in the general case?

> I will look at all of that subject (with checking files integrity
> asynchronously) when will find time.
>
> PS: please, do not add me as one of recipient of the email message -- I
> am already subscribed to the maillist, and my Mail-Followup-To header
> suggests that: https://cr.yp.to/proto/replyto.html

Hmm. mu4e usually does the right thing, but in this case a basic reply is sending it to you personally, and a "reply all" is including you and the list on the reply. I will try to remember to manually munge the headers on future replies. Sorry about that.

- John
* Re: I/O timeout in nncp-daemon
From: Sergey Matveev @ 2021-01-28 15:25 UTC
To: nncp-devel

*** John Goerzen [2021-01-28 09:03]:
>Ah, true. Perhaps in those cases, before reading the remaining parts from
>the network, it would read in the existing bits? (Or just do the check
>afterwards as it does now.)

Reading the file beforehand is a bad idea, because it can be huge and that would add a big delay before the actual network transmission begins. Network transmission time is what we should reduce first.

>I suppose that could lead to inefficiences if, say, it's getting a 1GB packet
>slowly and gets interrupted a bunch of times, but would probably help in the
>general case?

As far as I can see, there are no "general cases" -- everyone uses NNCP very differently :-). Time spent online is what should be reduced as much as we can, in my opinion, so no delays due to disk activity should be introduced (that is why an asynchronous integrity check should be added anyway). The local disk will not go missing or become unavailable, but network availability is something that can be lost for a very long time.

Merkle trees should help with transparent hashing while the network transmission is performed, followed by reading the beginning of the file after it has been received. If a ready-to-use library exists that allows checksums to be calculated partially like that, then possibly I will switch to it in the next release. Writing my own requires time :-(

>Hmm. mu4e usually does the right thing, but in this case a basic reply is
>sending it to you personally, and a "reply all" is including you and the list
>on the reply.

Unfortunately I do not know mu4e, but there should be some kind of "reply to list" command.
Not a huge problem, but doubled messages are a little annoying (you literally tell your mail server to send a letter both to me personally and to the mailing list, which sends its own copy to me as a subscriber).

--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263 6422 AE1A 8109 E498 57EF
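The Merkle-tree idea raised earlier in this message, hashing the part that arrives over the network first and the part already on disk afterwards, can be sketched as follows. Everything here is an assumption made for illustration: the tree layout (prefixed leaves, an unpaired node carried up unchanged), the tiny chunk size, SHA-256 as the hash, and the names `leafHash`, `nodeHash`, `merkleRoot`, `splitChunks` and `leavesResumed`. It also assumes the resume offset falls on a chunk boundary. It is not the scheme NNCP uses.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const chunkSize = 4 // deliberately tiny so the demo has several leaves

// leafHash hashes one chunk, with a 0x00 prefix to separate leaves from nodes.
func leafHash(chunk []byte) [32]byte {
	return sha256.Sum256(append([]byte{0x00}, chunk...))
}

// nodeHash combines two child hashes under a 0x01 prefix.
func nodeHash(l, r [32]byte) [32]byte {
	buf := make([]byte, 0, 65)
	buf = append(buf, 0x01)
	buf = append(buf, l[:]...)
	buf = append(buf, r[:]...)
	return sha256.Sum256(buf)
}

// merkleRoot folds the leaves pairwise; an unpaired node moves up unchanged.
func merkleRoot(leaves [][32]byte) [32]byte {
	if len(leaves) == 0 {
		return sha256.Sum256(nil)
	}
	level := leaves
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 < len(level) {
				next = append(next, nodeHash(level[i], level[i+1]))
			} else {
				next = append(next, level[i])
			}
		}
		level = next
	}
	return level[0]
}

// splitChunks cuts data into chunkSize pieces (the last one may be shorter).
func splitChunks(data []byte) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := chunkSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

// leavesResumed mimics a resumed transfer: chunks from firstNetChunk onward
// are hashed first (while "receiving"), earlier chunks later ("from disk").
func leavesResumed(data []byte, firstNetChunk int) [][32]byte {
	chunks := splitChunks(data)
	leaves := make([][32]byte, len(chunks))
	for i := firstNetChunk; i < len(chunks); i++ {
		leaves[i] = leafHash(chunks[i])
	}
	for i := 0; i < firstNetChunk && i < len(chunks); i++ {
		leaves[i] = leafHash(chunks[i])
	}
	return leaves
}

func main() {
	data := []byte("0123456789")
	fromStart := merkleRoot(leavesResumed(data, 0))
	resumed := merkleRoot(leavesResumed(data, 1))
	fmt.Println("roots equal:", fromStart == resumed)
}
```

Since each leaf depends only on its own chunk, the order in which the leaves are computed does not affect the root, so only the already-received prefix needs to be read back from disk after the transfer.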