Greetings!

*** John Goerzen [2021-01-21 17:55]:
>This timeout is much lower than any of the deadlines set in the configuration
>file, and appears to be a few seconds.

10 seconds. It comes from the various
    conn.Set*Deadline(time.Now().Add(DefaultDeadline))
lines.

>Would it be possible to make this configurable, or larger?

Well... when I wrote that, I assumed that the remote system has to give
an answer (from the TCP point of view) within 10 seconds anyway.

The overall file retrieval algorithm is simple:

* after the handshake is made, each side sends a list of the packets
  (with their nice values) available for the remote side (INFO)
* then each side, if it wants to (depending on the configuration), sends
  a request (FREQ) to the remote side to start sending the specified
  packet's contents from the specified offset
* after that FREQ is received, the sending of many FILE packets begins.
  Each FILE contains a chunk of the encrypted packet. The packet's
  contents are saved to "PKT-HASH.part"
* when the encrypted packet is fully retrieved (its length equals the
  length known in advance from INFO), a background checksum checker is
  started. If the checksum is good, it renames "PKT-HASH.part" to
  "PKT-HASH"
* but it runs in the background, while receiving/sending of other
  packets continues
* when another packet is fully retrieved, but the background checker is
  still busy -- NNCP waits for the checker to complete

And it seems (I am sure) that it is exactly because of the last step
that the program "hangs" and does not read anything more from the
socket. Of course, some buffers may hold further FILE chunks, but they
fill up quickly and then wait for the program to issue a Read() on the
socket.

Checksumming is required to be completely sure that the received file is
good and that we can send the "DONE" message to the remote side, which
will then delete the packet from its spool.

You know, when I was writing that online part of NNCP, I was not
thinking about huge files at all. Initially NNCP lacked any online
communication at all.

What can be done?

* the receiver can send a HALT packet to stop the remote side from
  sending any data. I do not like that option, because it can easily
  lead to a constant HALT+FREQ exchange. It is hard to determine whether
  we really need to stop reception because we have got many gigabytes of
  data on a USB2 HDD
* the receiver can send FREQs not for a bunch of files, but for just a
  single one, waiting for its reception and checksumming, and only
  sending another FREQ after that. I do not like that, because it leads
  to many round-trips. Currently it can send many KiBs of FREQs in just
  a single TCP segment, leading to a non-stop stream of sent packets
* the receiver can keep a queue of the packets that need to be checked.
  Instead of waiting for the checksummer to finish, it just fills up its
  queue

But! Anyway, I do not like the fact that the checker and the receiver
work simultaneously, leading to constant interleaved read/write
operations that kill HDD performance. That greatly decreases the overall
receiving speed. Ideally we should either only send the data, or only
receive the data, to be able to write it to the disk *sequentially*
(that can be done now by specifying rx/tx modes). Then we should read it
sequentially, for checksum verification, without any network
transmission at all.

Moreover, what if we deal with a 1TiB file? There is a high probability
that the daemon/caller will be restarted, and no one will start checking
that .part until the next online session.

I think there should be another intermediate step in packet processing.
"PKT-HASH.part" is a partly received file. Now it is time to create some
kind of "PKT-HASH.done-but-unchecked" one, and another nncp-checker
daemon that just checks the checksum and renames ".done-but-unchecked"
to "PKT-HASH". We also need to add the possibility for nncp-daemon *not*
to do the checksumming immediately. Then it will be possible to do the
checksum check completely asynchronously from the transmission.

Of course, tossing must be done with the -seen option, to record the
fact that some file was seen and (possibly) already processed.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF