Greetings!

*** John Goerzen [2021-08-18 08:56]:
>So while looking into the question of "how could I have the quickest
>delivery and execution of packets between machines on a LAN"

I am sure that if we are dealing with <=1Gbps Ethernet, then the main
bottleneck is the network itself and the TCP-related algorithms. If we
deal with >=10Gbps links, and especially high-latency ones, then TCP is
the thing you will likely have to tune. That is why I played with
various protocols like UDT, Tsunami, QUIC and some others I do not
remember now. It is better to possibly lose some traffic because of
congestion and to send more data overall than necessary, but deliver
the whole packet as fast as possible. For example, flush it with a
"tsunami" of UDP packets and then resend the lost chunks (and the new
MTH hash algorithm allows immediate integrity checking too). I did not
dive deeply into all of that, but with an ordinary 1Gbps Ethernet
adapter and a short home network all of that stays behind ordinary TCP.
Possibly fine TCP tuning will always be enough for NNCP.
https://en.wikipedia.org/wiki/TCP_congestion_avoidance_algorithm

>2) nncp-caller seems to be doing frequent calls to nanosleep, futex,
>clock_gettime, and epoll while it has a connection to a remote.

Yeah, that is what the Go runtime uses for the goroutines running for
an established session. And many goroutines in NNCP are in an endless
loop with a sleep, constantly checking whether there is anything new in
the spool directory.

>The broad question is: what is the most efficient way to do fast data
>exchange? (Efficient in terms of both SSD life and battery life on a
>laptop)

For me, the first thing about efficiency is dealing with the network.
Transport protocol: currently just ordinary TCP, with the administrator
tuning it for the necessary purposes. And the application protocol atop
of it: NNCP's SP, which can aggregate multiple SP-packets in a single
TCP segment. In theory. In practice that is done during the handshake,
but afterwards each event about a newly appeared packet is sent
immediately, to notify the remote side as quickly as possible. And the
Noise_IK pattern is used because of its reduced number of round-trips,
compared to Noise_XK, which hides identity.

Then comes CPU and memory. I assume that battery life depends mainly on
CPU. The cryptographic algorithms used in NNCP are among the fastest
ones: ChaCha20-Poly1305 and BLAKE3. AES-GCM with hardware acceleration
could be faster (and less CPU hungry), but that would complicate the SP
protocol with algorithm negotiation, which I won't do. But neither the
ChaCha20-Poly1305 nor the BLAKE3 implementations use multiple CPUs now.
Multiple connections are parallelized, because they work in multiple
independent goroutines.

SSD life depends on disk activity. Because I mainly use hard drives
everywhere, I tend to minimize and serialize all disk operations.
Obviously :-). Of course the most optimal way would be to transparently
receive data, checksum it, decipher, authenticate and write only the
deciphered/processed payload to the disk. But because of the
reliability requirement we have to save the encrypted packet, do
various fsync calls, and only after that begin its processing, with
more fsyncs (a rough sketch of that write path is below). Performance
and reliability guarantees are opposites. Turning off fsync (zfs set
sync=disabled, mount -o nosync), atime and .hdr files will of course
speed NNCP up.

Constant rereading of the spool directory, stat-ing files in it,
locking -- generally that won't create any real I/O operations on the
disk, because of filesystem caching. And of course it won't wear out
SSDs, because these are read operations. But it does consume CPU,
indeed. Instead of constantly rereading the directory contents,
software can use frameworks like kqueue and inotify, which explicitly
and immediately notify about changes, without the need for an endless
expensive loop with a sleep. But all of that is OS-specific, which is
why I am not looking in that direction. I am not against that kind of
optimization, I just have not seen those loops eating enough CPU to
worry about. But they are not free of course -- any kind of syscall is
relatively expensive. There are many places where NNCP can be
optimized, especially in the SP-related code, to do fewer loops with
sleeps and syscalls. Especially with OS-specific things like
kqueue/epoll event notification.
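
For illustration, here is a rough sketch (not NNCP's actual code) of
the difference between such an endless poll-with-sleep loop and a
notification-based watcher; the spool path and the third-party
github.com/fsnotify/fsnotify wrapper are just assumptions made for the
example:

    // A rough sketch, NOT NNCP's actual code: the current
    // poll-with-sleep approach versus kqueue/inotify-style
    // notification (via the third-party fsnotify wrapper).
    package main

    import (
        "log"
        "os"
        "time"

        "github.com/fsnotify/fsnotify"
    )

    // pollSpool is roughly what the endless loops do now: wake up,
    // reread the directory, sleep. Cheap I/O thanks to the page
    // cache, but constant syscalls and CPU wakeups.
    func pollSpool(dir string, every time.Duration) {
        for {
            entries, err := os.ReadDir(dir)
            if err == nil && len(entries) > 0 {
                log.Printf("%s: %d entries", dir, len(entries))
            }
            time.Sleep(every)
        }
    }

    // watchSpool sleeps inside the kernel until something actually
    // changes: no periodic wakeups at all, but OS-specific machinery.
    func watchSpool(dir string) error {
        w, err := fsnotify.NewWatcher()
        if err != nil {
            return err
        }
        defer w.Close()
        if err := w.Add(dir); err != nil {
            return err
        }
        for ev := range w.Events {
            if ev.Op&fsnotify.Create != 0 {
                log.Println("new spool entry:", ev.Name)
            }
        }
        return nil
    }

    func main() {
        spool := "/var/spool/nncp/example" // made-up path
        go pollSpool(spool, 5*time.Second)
        if err := watchSpool(spool); err != nil {
            log.Fatal(err)
        }
    }
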
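And the reliability-first write path mentioned above boils down to
roughly the following: write the encrypted packet to a temporary file,
fsync it, atomically rename it into place, fsync the directory. Again
this is only an illustrative sketch with made-up names, not what NNCP
actually runs:

    package main

    import (
        "log"
        "os"
        "path/filepath"
    )

    // reliableWrite stores a just-received encrypted packet durably
    // before any processing of it may start.
    func reliableWrite(spoolDir, name string, data []byte) error {
        tmp, err := os.CreateTemp(spoolDir, "part")
        if err != nil {
            return err
        }
        defer os.Remove(tmp.Name()) // no-op once the rename succeeds
        if _, err := tmp.Write(data); err != nil {
            tmp.Close()
            return err
        }
        if err := tmp.Sync(); err != nil { // fsync #1: the data itself
            tmp.Close()
            return err
        }
        if err := tmp.Close(); err != nil {
            return err
        }
        // Atomically give the packet its final name in the spool.
        if err := os.Rename(tmp.Name(), filepath.Join(spoolDir, name)); err != nil {
            return err
        }
        dir, err := os.Open(spoolDir)
        if err != nil {
            return err
        }
        defer dir.Close()
        return dir.Sync() // fsync #2: the directory entry itself
    }

    func main() {
        if err := reliableWrite(os.TempDir(), "pkt.example", []byte("payload")); err != nil {
            log.Fatal(err)
        }
    }
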
>I have been using persistent connections (very high onlinedeadline and
>maxonlinetime) with nncp-caller, even when that's not strictly necessary,
>reasoning that there is no particular overhead for establishing a new
>connection periodically and all the logging associated with that. However,
>if nncp-caller is using CPU time/battery power to maintain that, then
>perhaps I'm a bit off there. (Though it does seem to be negligible)

NNCP sends PING packets from time to time and runs various goroutines
that check whether anything new has appeared in the spool directories.
We should do benchmarks of course, but session establishment is several
TCP/SP round-trips, with asymmetric cryptography involved (which is
*very* expensive from the CPU point of view: 0.5-1M CPU cycles), and
with the first handshake packets padded to their maximal size of ~64KB.
So a handshake should be very expensive (traffic, delays, CPU) compared
to a long-lived session.

>The bigger question is around tossing. Does autotoss do something more
>restrictive than nncp-toss (perhaps only toss from a particular machine)?

Yes, it runs the tosser only for the node we have got a connection
with.

>Is there a way, since autotoss is in-process with nncp-caller, to only
>trigger the toss algorithm when a new packet has been received, rather than
>by cycle interval?

Can be done. Should be done :-). The current autotosser runs literally
the same toss functions as nncp-toss.

>One other concern about a very short cycle interval is that a failing packet
>can cause a large number of log entries.

I remember about that issue and about the whole problem of
(non-existent) error processing. I simply have had no time to think
about it yet. And I won't start thinking about it in the nearest weeks
either... various other things in real life I have to finish :-)

>A final question about when-tx-exists being true. I am a bit unclear how
>that interacts with cron. Is it:
>2) Calls are made only when cron says to, but only if an outgoing packet
>exists. (when-tx-exists causes FEWER calls than cron alone)
>I'm guessing it's #2 but I'm not certain.

Yes, exactly as you wrote here. when-tx-exists just tells us, every
time we are about to make a call, to check whether there really exists
any outgoing packet (with the specified niceness).

--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263 6422 AE1A 8109 E498 57EF