Greetings!

*** John Goerzen [2021-01-21 17:55]:
>This timeout is much lower than any of the deadlines set in the configuration
>file, and appears to be a few seconds.

10 seconds. It comes from the various
    conn.Set*Deadline(time.Now().Add(DefaultDeadline))
lines.
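
To illustrate how such a per-operation deadline behaves when the other
side stalls, here is a minimal sketch. It is not NNCP's code:
readWithDeadline, isTimeout and demoStalledRead are hypothetical helpers,
and net.Pipe stands in for a real TCP connection.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// DefaultDeadline is the 10-second value mentioned above.
const DefaultDeadline = 10 * time.Second

// readWithDeadline (hypothetical helper) arms a fresh read deadline and
// then performs a single Read on the connection.
func readWithDeadline(conn net.Conn, buf []byte, d time.Duration) (int, error) {
	if err := conn.SetReadDeadline(time.Now().Add(d)); err != nil {
		return 0, err
	}
	return conn.Read(buf)
}

// isTimeout reports whether err was caused by an expired deadline.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// demoStalledRead reads from an in-memory pipe whose peer never writes,
// using a short deadline so that the demo finishes quickly.
func demoStalledRead() error {
	a, b := net.Pipe()
	defer a.Close()
	defer b.Close()
	_, err := readWithDeadline(a, make([]byte, 16), 50*time.Millisecond)
	return err
}

func main() {
	fmt.Println("default deadline:", DefaultDeadline)
	fmt.Println("stalled read timed out:", isTimeout(demoStalledRead()))
}
```

The same happens in the other direction: a side that stops calling
Read() makes the peer's deadline-guarded Write fail once the buffers
in between are full.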

>Would it be possible to make this configurable, or larger?

Well... when I wrote that, I assumed that the remote system has to give
an answer anyway (from the TCP point of view) within 10 seconds.

The overall file retrieval algorithm is simple:

* after the handshake is made, each side sends a list of the packets
  (with their nice values) available for the remote side (INFO)
* then each side, if it wants to (depending on its configuration),
  sends a request (FREQ) to the remote side to start sending the
  specified packet's contents from the specified offset
* after that FREQ is received, the sending of many FILE packets begins.
  Each FILE contains a chunk of the encrypted packet. The packet's
  contents are saved to "PKT-HASH.part"
* when the encrypted packet is fully retrieved (its length equals the
  length from INFO, known in advance), a background checksum checker is
  started. If the checksum is good, it renames "PKT-HASH.part" to
  "PKT-HASH"
* the checker runs in the background, while receiving/sending of other
  packets continues
* when another packet is fully retrieved, but the background checker is
  still busy, NNCP waits for the checker to complete
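
The "from specified offset" and "fully retrieved" parts of the list
above can be sketched roughly like this. It is only an illustration
under my own assumptions, not the actual NNCP code: resumeOffset,
fullyRetrieved and demoResume are hypothetical names, and the spool is
modelled as a plain directory.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// PartSuffix marks a partly received packet, as in "PKT-HASH.part".
const PartSuffix = ".part"

// resumeOffset returns how many bytes of the packet are already in the
// spool, i.e. the offset a FREQ would ask the remote side to start from.
// A missing .part file means the transfer starts from zero.
func resumeOffset(spool, pktHash string) (int64, error) {
	fi, err := os.Stat(filepath.Join(spool, pktHash+PartSuffix))
	if errors.Is(err, fs.ErrNotExist) {
		return 0, nil
	}
	if err != nil {
		return 0, err
	}
	return fi.Size(), nil
}

// fullyRetrieved reports whether the received length has reached the
// length announced in INFO, which is the trigger for the checksum check.
func fullyRetrieved(offset, infoSize int64) bool {
	return offset == infoSize
}

// demoResume checks the offset before and after 5 bytes were "received".
func demoResume() (int64, int64, error) {
	spool, err := os.MkdirTemp("", "spool")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(spool)
	before, err := resumeOffset(spool, "PKT-HASH")
	if err != nil {
		return 0, 0, err
	}
	part := filepath.Join(spool, "PKT-HASH"+PartSuffix)
	if err := os.WriteFile(part, []byte("12345"), 0o600); err != nil {
		return 0, 0, err
	}
	after, err := resumeOffset(spool, "PKT-HASH")
	return before, after, err
}

func main() {
	before, after, err := demoResume()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("offsets:", before, after)
	fmt.Println("complete at INFO size 5:", fullyRetrieved(after, 5))
}
```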

And it seems (I am sure) that exactly because of the last step the
program "hangs" and does not read anything more from the socket. Of
course some buffers may hold another FILE chunk, but they fill up
quickly and then wait for the program to issue a Read() on the socket.

Checksumming is required to be completely sure that the received file
is good and that we can send a "DONE" message to the remote side, which
will then delete the packet from its spool.

You know, when I was writing the online part of NNCP, I was not
thinking about huge files at all. Initially NNCP lacked any online
communication at all.

What can be done?

* the receiver can send a HALT packet to stop the remote side from
  sending any data. I do not like that option, because it can easily
  lead to a constant HALT+FREQ exchange. It is hard to determine
  whether we really need to stop reception because we have got many
  gigabytes of data on a USB2 HDD
* the receiver can send FREQs not for a bunch of files, but for just a
  single one, waiting for its reception and checksumming, and only
  then sending the next FREQ. I do not like that, because it leads to
  many round-trips. Currently it can send many KiBs of FREQs in just a
  single TCP segment, leading to a non-stop stream of sent packets
* the receiver can keep a queue of packets that need to be checked.
  Instead of waiting for the checksummer to finish, it just fills up
  its queue
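
The queue from the last item could look roughly like the sketch below.
It is an assumption about one possible shape, not existing NNCP code:
checkQueue, newCheckQueue and demoQueue are hypothetical names, and the
check callback is a placeholder for the real checksum verification.

```go
package main

import (
	"fmt"
	"sync"
)

// checkQueue hands fully received packets to a single background
// checker, so the receiver does not have to wait for it.
type checkQueue struct {
	jobs chan string
	wg   sync.WaitGroup
}

// newCheckQueue starts one worker that calls check for every queued
// packet, one at a time and in arrival order.
func newCheckQueue(capacity int, check func(pkt string)) *checkQueue {
	q := &checkQueue{jobs: make(chan string, capacity)}
	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		for pkt := range q.jobs {
			check(pkt)
		}
	}()
	return q
}

// Enqueue returns immediately while there is room in the queue and
// blocks only when the queue is full.
func (q *checkQueue) Enqueue(pkt string) { q.jobs <- pkt }

// Close stops accepting packets and waits for the pending checks.
func (q *checkQueue) Close() {
	close(q.jobs)
	q.wg.Wait()
}

// demoQueue queues three packets and returns the order they were checked in.
func demoQueue() []string {
	var checked []string // written only by the worker, read after Close
	q := newCheckQueue(8, func(pkt string) {
		checked = append(checked, pkt)
	})
	for _, pkt := range []string{"pkt-a", "pkt-b", "pkt-c"} {
		q.Enqueue(pkt)
	}
	q.Close()
	return checked
}

func main() {
	fmt.Println("checked in order:", demoQueue())
}
```

The capacity is still a limit: once it is reached, Enqueue blocks and
the receiver stalls exactly as before, only later.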

But! Anyway, I do not like the fact that the checker and the receiver
work simultaneously, leading to constant interleaved read/write
operations that kill HDD performance. It greatly decreases the overall
receiving speed. Ideally we should either only send the data or only
receive the data, to be able to write it to the disk *sequentially*
(this can be done now by specifying rx/tx modes). Then we should read
it sequentially for checksum verification, without any network
transmission at all.

Moreover, what if we deal with a 1 TiB file? There is a high probability
that the daemon/caller will be restarted, and no one will start checking
that .part until the next online session.

I think that there should be another intermediate step in packet
processing. "PKT-HASH.part" is a partly received file. Now it is time to
create some kind of "PKT-HASH.done-but-unchecked" one, and a separate
nncp-checker daemon that just verifies the checksum and renames
"PKT-HASH.done-but-unchecked" to "PKT-HASH". We also need to add the
possibility for nncp-daemon *not* to do checksumming immediately, so
that the checksum check can be done completely asynchronously from the
transmission. Of course tossing must be done with the -seen option, to
save the fact that some file was seen and (possibly) already processed.
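
A single pass of such a checker could be sketched as follows. This is
only an idea of its shape, under loud assumptions: the file name is
taken to be the hex SHA-256 of the contents (a stand-in from the Go
standard library, not claimed to be what NNCP uses), the spool is a
flat directory, and checkSpool, fileDigest and demoCheckSpool are
hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// uncheckedSuffix marks a fully received, not yet verified packet.
const uncheckedSuffix = ".done-but-unchecked"

// fileDigest reads the file sequentially and returns its hex SHA-256.
func fileDigest(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// checkSpool verifies every unchecked file in spool and renames the good
// ones to their final name. Bad files are left in place. It returns the
// names of the verified packets.
func checkSpool(spool string) ([]string, error) {
	entries, err := os.ReadDir(spool)
	if err != nil {
		return nil, err
	}
	var verified []string
	for _, e := range entries {
		name := e.Name()
		if e.IsDir() || !strings.HasSuffix(name, uncheckedSuffix) {
			continue
		}
		want := strings.TrimSuffix(name, uncheckedSuffix)
		got, err := fileDigest(filepath.Join(spool, name))
		if err != nil {
			return verified, err
		}
		if got != want {
			continue // checksum mismatch: do not rename
		}
		err = os.Rename(filepath.Join(spool, name), filepath.Join(spool, want))
		if err != nil {
			return verified, err
		}
		verified = append(verified, want)
	}
	return verified, nil
}

// demoCheckSpool runs one pass over a temporary spool holding one good
// and one bad file; it reports whether the bad file was left untouched.
func demoCheckSpool() ([]string, bool, error) {
	spool, err := os.MkdirTemp("", "spool")
	if err != nil {
		return nil, false, err
	}
	defer os.RemoveAll(spool)
	sum := sha256.Sum256([]byte("hello"))
	good := filepath.Join(spool, hex.EncodeToString(sum[:])+uncheckedSuffix)
	bad := filepath.Join(spool, "deadbeef"+uncheckedSuffix)
	for _, p := range []string{good, bad} {
		if err := os.WriteFile(p, []byte("hello"), 0o600); err != nil {
			return nil, false, err
		}
	}
	verified, err := checkSpool(spool)
	if err != nil {
		return nil, false, err
	}
	_, statErr := os.Stat(bad)
	return verified, statErr == nil, nil
}

func main() {
	verified, badLeft, err := demoCheckSpool()
	fmt.Println(len(verified), badLeft, err)
}
```

Because the pass only reads files and renames them, it can run while no
transfer is active, which keeps the disk access sequential.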

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF