public inbox for nncp-devel@lists.cypherpunks.ru
* Issues with very large packets
@ 2021-02-18 21:35 John Goerzen
  2021-02-19 12:36 ` Sergey Matveev
  0 siblings, 1 reply; 8+ messages in thread
From: John Goerzen @ 2021-02-18 21:35 UTC (permalink / raw)
  To: nncp-devel

Hi folks!

So this week I had the occasion to send some 250GB packets through 
NNCP.  This generally worked in the end, but had two interesting 
foibles along the way.

First, nncp-stat was extremely slow on a system that had packets 
like that queued for transmission.  I'm wondering if it is trying 
to read all the packets, even when called with no parameters?

The other is a new twist on the timeout problem I mentioned 
previously.  I previously encountered it post-receive when the 
receiving end would do a time-consuming check on a largeish (say, 
10GB) packet.

With these very large packets, I also encountered it at the 
beginning of a call.  nncp-daemon on the transmitting side 
apparently was taking too long at call establishment, and 
nncp-call would time out (with the same error I'd mentioned 
previously) without receiving anything.  Interestingly, if 
nncp-call tried again immediately, it would work.  By cycling 
through this timeout/working/timeout/working pattern, eventually 
all the data was transmitted.

It makes me wonder if nncp-daemon was doing some sort of expensive 
scan at the beginning of the call, and either it or the OS cached 
the results?  Not quite sure there.

Anyhow, thanks again!

- John


* Re: Issues with very large packets
  2021-02-18 21:35 Issues with very large packets John Goerzen
@ 2021-02-19 12:36 ` Sergey Matveev
  2021-02-19 19:18   ` John Goerzen
  0 siblings, 1 reply; 8+ messages in thread
From: Sergey Matveev @ 2021-02-19 12:36 UTC (permalink / raw)
  To: nncp-devel

Greetings!

*** John Goerzen [2021-02-18 15:35]:
>First, nncp-stat was extremely slow on a system that had packets like that
>queued for transmission.  I'm wondering if it is trying to read all the
>packets, even when called with no parameters?
>It makes me wonder if nncp-daemon was doing some sort of expensive scan at
>the beginning of the call, and either it or the OS cached the results?

Both of them, like all other commands, use the (very simple) src/jobs.go
code to get the list of available encrypted packets in the spool. It does
I/O, but I would not call it expensive:

* list the files in the directory (the node's spool)
* open each file and read its XDR-encoded header
* this header (currently) takes only 172 bytes of data
* seek each file back to the beginning
* return all that job metainformation together with the opened file descriptors

So the only I/O is reading the directory and 172 bytes of each file. If
you have many files, then yes, it will take some time. But on ZFS it will
read the whole block/record (up to 128 KiB by default). And if you repeat
the operation, the ZFS ARC cache should already contain those blocks,
which is why it is much faster.
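
In Go-like pseudocode the whole scan is roughly this (a simplified
sketch of the idea, not the actual src/jobs.go; the Job type and the
names are illustrative):

    package sketch

    import (
        "io"
        "os"
        "path/filepath"
    )

    // Job is an illustrative stand-in for the jobs metainformation.
    type Job struct {
        Fd  *os.File
        Hdr []byte // the 172-byte XDR-encoded packet header
    }

    // scanSpool lists the node's spool directory, reads each packet's
    // fixed-size header, rewinds the file and returns the
    // metainformation with the descriptors still open.
    func scanSpool(dir string) ([]Job, error) {
        entries, err := os.ReadDir(dir)
        if err != nil {
            return nil, err
        }
        var jobs []Job
        for _, e := range entries {
            fd, err := os.Open(filepath.Join(dir, e.Name()))
            if err != nil {
                continue
            }
            hdr := make([]byte, 172)
            if _, err = io.ReadFull(fd, hdr); err != nil {
                fd.Close()
                continue
            }
            if _, err = fd.Seek(0, io.SeekStart); err != nil {
                fd.Close()
                continue
            }
            jobs = append(jobs, Job{Fd: fd, Hdr: hdr})
        }
        return jobs, nil
    }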

First of all: I have no idea how the file's size could affect the
algorithm above. Possibly a read-ahead configuration may read more than a
single ZFS block, but that is a question of just a few blocks in the
worst case. So I see no difference between getting the metainformation of
a 1 TiB or a 1 GiB file. I mention ZFS so much only because reading that
little piece of information is more expensive on it compared to other
filesystems (and I think that is fine, because in most real-world
use-cases you want to read the whole file afterwards).

If we keep that metainformation nearby, it should help a lot:

* Keeping a separate database/cache file is of course not an option,
  because of the complexity and the consistency questions it raises.
* In nearly all the code I see in NNCP, the niceness level is the only
  thing used everywhere. A long time ago I actually thought about
  keeping it in the filename itself. But I did not like that clumsy
  optimization and still do not, because the niceness is valuable
  private metainformation in itself (if someone sends a packet with an
  uncommon niceness of 145, the mere fact that such a packet appeared
  somewhere may be valuable). That is not nice :-)
* We can keep a copy of that (172-byte) header in a ".meta"/whatever
  file nearby. If it does not exist, then read the beginning of the
  file as before. It can be recreated atomically at any time. I like
  that solution.
* That kind of information can also be kept in the filesystem's
  extended attributes. Honestly, I have never worked with xattrs at
  all; today was my first setextattr/getextattr invocation. But it
  seems that they should give even faster access to that information
  than a separate file on any filesystem. If xattrs are missing
  (disabled?), then fall back to ordinary file reading.
* But as far as I can see, OpenBSD does not support xattrs at all, so
  a fallback to a separate ".meta" file would be valuable anyway.

So it seems I am going to implement keeping the header in xattrs, with a
fallback to the separate file storage.
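
The separate-file side of it could be as simple as this (just a sketch
of what I have in mind; the ".meta" suffix and the function name are
placeholders, nothing is final):

    package sketch

    import (
        "io"
        "os"
    )

    // readHeader prefers the small ".meta" sidecar with a copy of the
    // header and falls back to reading the beginning of the packet
    // itself, recreating the sidecar atomically on the way.
    func readHeader(pktPath string) ([]byte, error) {
        if hdr, err := os.ReadFile(pktPath + ".meta"); err == nil && len(hdr) == 172 {
            return hdr, nil
        }
        fd, err := os.Open(pktPath)
        if err != nil {
            return nil, err
        }
        defer fd.Close()
        hdr := make([]byte, 172)
        if _, err = io.ReadFull(fd, hdr); err != nil {
            return nil, err
        }
        // Write to a temporary file and rename it into place, so a
        // reader never sees a partially written sidecar.
        tmp := pktPath + ".meta.tmp"
        if err = os.WriteFile(tmp, hdr, 0o666); err == nil {
            os.Rename(tmp, pktPath+".meta")
        }
        return hdr, nil
    }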

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF


* Re: Issues with very large packets
  2021-02-19 12:36 ` Sergey Matveev
@ 2021-02-19 19:18   ` John Goerzen
  2021-02-19 19:46     ` Sergey Matveev
  0 siblings, 1 reply; 8+ messages in thread
From: John Goerzen @ 2021-02-19 19:18 UTC (permalink / raw)
  To: Sergey Matveev; +Cc: nncp-devel


On Fri, Feb 19 2021, Sergey Matveev wrote:

> Greetings!
>
> *** John Goerzen [2021-02-18 15:35]:
>>First, nncp-stat was extremely slow on a system that had packets
>>like that queued for transmission.  I'm wondering if it is trying
>>to read all the packets, even when called with no parameters?
>>It makes me wonder if nncp-daemon was doing some sort of expensive
>>scan at the beginning of the call, and either it or the OS cached
>>the results?
>
> Both of them, like all other commands, use the (very simple)
> src/jobs.go code to get the list of available encrypted packets in
> the spool. It does I/O, but I would not call it expensive:

Indeed that doesn't seem like it should be very slow.  I will see 
if I can do some experimentation with strace next week (when I'll 
next have packets of this size) and see if I can validate it.

I do also sometimes have about a thousand packets in the rx 
directory.  That slows it down too, but to the point of taking a 
few dozen seconds, not to the point of taking a minute-plus as this 
did.  Though of course reading an entire file of this size would 
take a lot more than a minute, so it wasn't that.  I am sure I saw 
this when there were far fewer than 1000 packets in the directory, 
and I am also reasonably sure that I saw it when there were 3.

Random other question: do .seen files impact this at all?

> I mention ZFS so much only because reading that little piece of
> information is more expensive on it compared to other filesystems
> (and I think that is fine, because in most real-world use-cases you
> want to read the whole file afterwards).

It is indeed ZFS here.

> If we will keep that metainformation nearby, then it should help 
> much:
>
> * Keeping a separate database/cache file is of course not an
>   option, because of the complexity and the consistency questions
>   it raises.

Agreed.  This use case is not worth a lot of optimization; it 
makes sense not to have it in the filename.  .meta could work but 
again this one little thing may not justify it.  (Of course it 
should still be encrypted if it's in .meta or xattrs)

> * That kind of information can also be kept in the filesystem's
>   extended attributes. Honestly, I have never worked with xattrs at
>   all; today was my first setextattr/getextattr invocation. But it
>   seems that they should give even faster access to that
>   information than a separate file on any filesystem. If xattrs are
>   missing (disabled?), then fall back to ordinary file reading.
> * But as far as I can see, OpenBSD does not support xattrs at all,
>   so a fallback to a separate ".meta" file would be valuable anyway.

I'd discourage that path.  xattrs are really add-ons in POSIXland. 
The support for them is often not present, sometimes broken, 
sometimes has weird compatibility issues.  Probably not worth the 
hassle.

> So it seems I am going to implement keeping the header in xattrs,
> with a fallback to the separate file storage.

Hey, if you want to <grin>  But I don't think it's really worth 
doing this for my (or really most any) use case.  If indeed you're 
reading just a few bytes from the start of the file, this wouldn't 
make much difference (or at least shouldn't) - and may even hurt, 
because now things like ls -l could have 2000 files to look at 
instead of 1000.

Thanks again!

- John


* Re: Issues with very large packets
  2021-02-19 19:18   ` John Goerzen
@ 2021-02-19 19:46     ` Sergey Matveev
  2021-02-19 20:34       ` John Goerzen
  0 siblings, 1 reply; 8+ messages in thread
From: Sergey Matveev @ 2021-02-19 19:46 UTC (permalink / raw)
  To: nncp-devel

*** John Goerzen [2021-02-19 13:18]:
>Random other question: do .seen files impact this at all?

Only their existence is checked. Of course that can lead to additional
I/O (reading the directory's "inodes"), but it should be cached anyway.
I am sure the .seen impact is negligible.

>I'd discourage that path.  xattrs are really add-ons in POSIXland. The
>support for them is often not present, sometimes broken, sometimes has weird
>compatibility issues.  Probably not worth the hassle.

I believe you. And that is exactly the feeling I had when reading about
all that xattrs stuff. Well, better to forget about xattrs.

>Hey, if you want to <grin>  But I don't think it's really worth doing this
>for my (or really most any) use case.  If indeed you're reading just a few
>bytes from the start of the file, this wouldn't make much difference (or at
>least shouldn't) - and may even hurt, because now things like ls -l could
>have 2000 files to look at instead of 1000.

Probably I am wrong, but I really believe that, especially on ZFS, that
leads to huge read amplification. With the default recordsize=128KiB it
makes no difference whether you read 200 B or 100 KiB -- ZFS will read
the whole record anyway (it has to, in order to check the integrity;
assume compression plays no role because of the encryption). But reading
a 200 B file leads to reading only those 200 B, which is even smaller
than the disk sector size. So a thousand files means many megabytes of
random reads, which is really heavy.
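
To put rough numbers on my own claim: with a thousand packets that is
about 1000 * 128 KiB = 125 MiB of random record reads just to list the
headers, versus roughly 1000 * 200 B, about 200 KiB, if only tiny
sidecar files had to be read.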

Looking up the metainformation of 2000 files is of course additional
I/O, but ZFS keeps directory information in pretty compact dictionaries,
whose chunks are stored in recordsize blocks on the disk. And many of
them are located near each other on the disk because of ZFS's sequential
transaction writes. So even thousands of additional files are not a
comparable load at all to the read-amplified reading of each file's
first record. So it is worth it, although it is an optimization for ZFS.
But it is pretty simple and optional.

>.meta could work but again this one little
>thing may not justify it.  (Of course it should still be encrypted if it's in
>.meta or xattrs)

That header is already plaintext inside the encrypted packet. .meta will
(I assume) keep just a copy of the file's header. If someone can read
the encrypted packets, they can read the corresponding .meta ones too.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF


* Re: Issues with very large packets
  2021-02-19 19:46     ` Sergey Matveev
@ 2021-02-19 20:34       ` John Goerzen
  2021-02-20 19:56         ` Sergey Matveev
  0 siblings, 1 reply; 8+ messages in thread
From: John Goerzen @ 2021-02-19 20:34 UTC (permalink / raw)
  To: Sergey Matveev; +Cc: nncp-devel


On Fri, Feb 19 2021, Sergey Matveev wrote:

> Probably I am wrong, but I really believe that, especially on ZFS,
> that leads to huge read amplification. With the default
> recordsize=128KiB it makes no difference whether you read 200 B or
> 100 KiB -- ZFS will read the whole record anyway (it has to, in
> order to check the integrity; assume compression plays no role
> because of the encryption). But reading a 200 B file leads to
> reading only those 200 B, which is even smaller than the disk
> sector size. So a thousand files means many megabytes of random
> reads, which is really heavy.

I don't think you're wrong, but in my experience it just hasn't 
been a huge issue.  Yes, nncp-stat can take a dozen seconds when 
there are a thousand packets there.  But how much of that is 
caused by head seeking vs. reading an extra 128ish K?  I mean, I 
would expect the cost of reading 128K vs. reading 200 bytes to be 
tiny compared to the latency of the seeks to get to the file in 
the first place.  This is all on HDD, of course; with SSD, I would 
imagine the 128K sequential read also to be fairly 
inconsequential.  But I guess the one way to find out is to test 
it!
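
(Rough numbers off the top of my head: a thousand random seeks at
~10 ms each is already ~10 seconds, while reading an extra 128 KiB per
file at, say, 150 MB/s sequential is under a millisecond each -- well
under a second in total.)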

-- John


* Re: Issues with very large packets
  2021-02-19 20:34       ` John Goerzen
@ 2021-02-20 19:56         ` Sergey Matveev
  2021-02-21  4:31           ` John Goerzen
  0 siblings, 1 reply; 8+ messages in thread
From: Sergey Matveev @ 2021-02-20 19:56 UTC (permalink / raw)
  To: nncp-devel

*** John Goerzen [2021-02-19 14:34]:
>But how much of that is caused by head seeking vs. reading an
>extra 128ish K?  I mean, I would expect the cost of reading 128K vs. reading
>200 bytes to be tiny compared to the latency of the seeks to get to the file
>in the first place.

That is also true.

Well, I implemented support for ".hdr" files, which keep a copy of the
header. I tried it on the following spool directory: Tx:    303 GiB,  839 pkts.
The unmodified nncp-stat takes:
       39.61 real         0.07 user         0.10 sys
I ran it on a ZFS pool that had just been exported/imported, so completely
lacking any cached information. The .hdr-supporting version took:
       16.97 real         0.05 user         0.08 sys
So it helps. There is also a kind of hack: you can remove all the .hdr
files and recreate them simply by running nncp-stat. Because there is no
other heavy writing on the filesystem, those .hdr files will be laid out
much more linearly and much more densely on the disk, because of ZFS's
sequential checkpoint writes. And after that nncp-stat works an order of
magnitude faster:
        1.62 real         0.01 user         0.04 sys

So random I/O is the bottleneck here. A sequential ZFS write of a bunch
of buffered operations helps a lot (as in the last case), but it is hacky
of course. The use of .hdr files can be disabled entirely in the
configuration.

I have also added support for ".nock" (non-checksummed) files. After the
online daemon receives a file, it is renamed from ".part" to ".nock". By
default it is then checksummed in the background and renamed to an
ordinary, fully verified packet. But with the -nock
command-line/configuration option you can disable that checksumming
altogether and do it after the daemon/caller invocations with the
nncp-check command.

There is only one exception: if the file is received from the very
beginning (from offset zero), then it is hashed on the fly during
reception. And if it is received to the end, then no ".nock" is created
and the file is immediately renamed to an ordinary received packet
(provided, of course, the checksum is good). If the transfer is
interrupted and resumed later, then ".nock" is used anyway.
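
Roughly, the receive path now ends like this (a simplified sketch, not
the actual daemon code; the names and the queue are illustrative):

    package sketch

    import "os"

    // checkerQueue feeds the background checksummer with ".nock" files.
    var checkerQueue = make(chan string, 64)

    // finishReception sketches what happens once a transfer stops:
    // fully received files that were hashed on the fly become ordinary
    // packets at once, resumed ones are parked as ".nock" for the
    // background checker.
    func finishReception(base string, completed, hashedOnTheFly, hashOK bool) error {
        if !completed {
            return nil // keep the ".part" file for a later resume
        }
        if hashedOnTheFly {
            if !hashOK {
                return os.Remove(base + ".part")
            }
            return os.Rename(base+".part", base) // ordinary verified packet
        }
        // Resumed transfer: postpone the checksumming.
        if err := os.Rename(base+".part", base+".nock"); err != nil {
            return err
        }
        checkerQueue <- base + ".nock"
        return nil
    }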

All of that is in the "develop" branch in git and everything seems to
work. But I have not tested it much yet, so there may be some bugs. I
will run these updated versions for a couple of days and then make a
release.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF


* Re: Issues with very large packets
  2021-02-20 19:56         ` Sergey Matveev
@ 2021-02-21  4:31           ` John Goerzen
  2021-02-21  8:27             ` Sergey Matveev
  0 siblings, 1 reply; 8+ messages in thread
From: John Goerzen @ 2021-02-21  4:31 UTC (permalink / raw)
  To: Sergey Matveev; +Cc: nncp-devel


On Sat, Feb 20 2021, Sergey Matveev wrote:

> much more linearly and much more densely on the disk, because of
> ZFS's sequential checkpoint writes. And after that nncp-stat works
> an order of magnitude faster:
>         1.62 real         0.01 user         0.04 sys

Impressive!

> I have also added support for ".nock" (non-checksummed) files.
> After the online daemon receives a file, it is renamed from ".part"
> to ".nock". By default it is then checksummed in the background and
> renamed to an ordinary, fully verified packet. But with the -nock
> command-line/configuration option you can disable that checksumming
> altogether and do it after the daemon/caller invocations with the
> nncp-check command.

Interesting.  How do these different options interact with 
removing the packet from the remote end?

I gather that, right now, the packet is preserved on the remote 
end until the recipient has checked and verified the checksum. 
With background checksumming, I assume that could happen later, 
possibly even after the call drops.  Will the sender still keep 
the packet around until the recipient confirms the checksum?  If 
not, that could result in data loss.  And if so, how exactly would 
that happen, especially since the call may drop by then?  (e.g., 
onlinedeadline of 20s and it takes 200s to checksum a big file)  I 
worry there may be a race condition where nncp-toss may see and 
process a successfully-received file before this can be indicated 
back to the sender.

> There is only one exception: if the file is received from the very
> beginning (from offset zero), then it is hashed on the fly during
> reception. And if it is received to the end, then no ".nock" is
> created and the file is immediately renamed to an ordinary received
> packet (provided, of course, the checksum is good). If the transfer
> is interrupted and resumed later, then ".nock" is used anyway.

That's great!

> All of that is in the "develop" branch in git and everything seems
> to work. But I have not tested it much yet, so there may be some
> bugs. I will run these updated versions for a couple of days and
> then make a release.

Thanks for all your work on this!

- John


* Re: Issues with very large packets
  2021-02-21  4:31           ` John Goerzen
@ 2021-02-21  8:27             ` Sergey Matveev
  0 siblings, 0 replies; 8+ messages in thread
From: Sergey Matveev @ 2021-02-21  8:27 UTC (permalink / raw)
  To: nncp-devel

*** John Goerzen [2021-02-20 22:31]:
>Interesting.  How do these different options interact with removing the
>packet from the remote end?

A packet is deleted only when the confirmation has been sent. And the
confirmation is sent only if either a packet with the same hash exists in
the spool directory (fully downloaded and completely checksummed), or the
corresponding .seen file exists (which can appear only after tossing a
fully downloaded and completely checksummed packet).

By default, after a packet is received, it is renamed .part->.nock and
added to the background checker's queue. When the checker finishes
checksumming, it renames .nock->"no extension" and sends a confirmation
to the remote side. If the remote side is unavailable (the connection is
lost), then that confirmation is lost. But the ordinary packet (or .seen
after its tossing) will remain in the spool. Each online session starts
with sending the whole list of packets (taking niceness into account) to
the remote side. So during the next session, because of the packet/.seen,
the confirmation will be sent and the packet deleted on the remote side.
Also, when a session begins, the background checker is started with a
queue pre-filled with the currently existing .nock files (by default, if
the -nock option is not turned on), and it will send confirmations when
the .nock-s are checked.

So no loss is possible (.nock files are not enough for a confirmation).
Lost connections are not a problem if you keep the state of seen packets
-- the confirmation will be sent during the next sessions.
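
In other words, the rule for sending a confirmation boils down to
something like this (a sketch; the function and layout names are mine,
not the actual code):

    package sketch

    import (
        "os"
        "path/filepath"
    )

    // shouldConfirm: acknowledge a packet to the remote side only if
    // the fully downloaded and checksummed packet, or its .seen
    // marker, exists in the spool.
    func shouldConfirm(spoolDir, pktName string) bool {
        full := filepath.Join(spoolDir, pktName)
        if _, err := os.Stat(full); err == nil {
            return true // fully verified packet is in the spool
        }
        if _, err := os.Stat(full + ".seen"); err == nil {
            return true // packet was already tossed earlier
        }
        return false // ".part"/".nock" are never enough for a confirmation
    }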

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

