public inbox for nncp-devel@lists.cypherpunks.ru
* Avoiding double writes
@ 2021-11-01 17:47 John Goerzen
  2021-11-01 19:46 ` Sergey Matveev
  0 siblings, 1 reply; 7+ messages in thread
From: John Goerzen @ 2021-11-01 17:47 UTC (permalink / raw)
  To: nncp-devel

Hi,

A while back, we were discussing the temporary files that were 
needed for reading stdin for nncp-exec or nncp-file.  I believe 
the reason for this is that the header contains a signature of the 
data that follows, and it's not practical to seek back and write 
that later.

That raises a question... since the signature can't be verified 
without reading the entirety of the data anyhow, why not put the 
signature after the data instead of before it?

To do that, there needs to be some way of recognizing the end of 
the data.  I'm not sure how that happens now, since we also don't 
know the size in advance in those cases.  Is there some sort of 
blocking for data chunks?

If time permits, I may see about adding this feature if there's a 
design path that would be suitable.

- John

* Re: Avoiding double writes
  2021-11-01 17:47 Avoiding double writes John Goerzen
@ 2021-11-01 19:46 ` Sergey Matveev
  2021-11-02  0:11   ` John Goerzen
  0 siblings, 1 reply; 7+ messages in thread
From: Sergey Matveev @ 2021-11-01 19:46 UTC (permalink / raw)
  To: nncp-devel

Greetings!

*** John Goerzen [2021-11-01 12:47]:
>A while back, we were discussing the temporary files that were needed for
>reading stdin for nncp-exec or nncp-file.  I believe the reason for this is
>that the header contains a signature of the data that follows, and it's not
>practical to seek back and write that later.

http://www.nncpgo.org/Encrypted.html
It is not because of the signature, but mainly because of the SIZE
field. The signed header contains everything you need to authenticate
the remote side and create the encryption keys for processing the
ENCRYPTED data. There is no signature over the whole packet. The
ENCRYPTED data contains the encrypted size as a first short block, and
then a pile of 128KiB blocks. Each block is AEAD-encrypted, so it is
authenticated. SIZE holds the payload size, which can be smaller than
the whole packet because of junk added to hide the actual payload size.
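
Schematically, the current layout is roughly this (just a sketch; see
that page for the authoritative description):

    header              <- cleartext: sender, recipient, key material
    signature           <- over the header only
    AEAD(SIZE)          <- short first block with the payload size
    AEAD(128KiB block)  <- payload, each block authenticated
    ...
    junk                <- BLAKE3-XOF output padding the packet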

What can we do with the size, which has to be known somehow anyway?
We could store it as the very last block. But then we would have to
seek to the end (to read/decrypt it), then seek back to decrypt and copy
the actual payload. We cannot do that on the fly, because we do not know
where the actual payload stops and the junk begins. Not an option.

We could store some structure inside each block (a single signalling
byte) telling whether it is a payload block or a junk one. Then we could
process the packet sequentially. But in that case the junk has to be
real data that we actually encrypt and authenticate. The junk could be
zeroes, but AEAD encryption is much more expensive than the current junk
generator, which is just BLAKE3-XOF output. More complication, more CPU.
On the other hand, once a junk block has started, we can be sure that
everything after it is junk, so we only have to authenticate the single
block where the junk begins and can then generate the rest quickly,
without real AEAD processing -- except for the very last block holding
the size.

So that is one option. It adds metadata to each block and moves the
encrypted SIZE to the end. However, we then cannot read the packet
sequentially and determine its size immediately: we either have to seek,
or decrypt and parse the data. So my main uncertainty is: is it worth it?

Previously, another blocker was the fact that the whole generated
encrypted packet was hashed on the fly, so we could not, for example,
leave a fixed-size SIZE block and fill it in later (seek back, then
write) once all the data had been read from stdin. Now, with MTH
(http://www.nncpgo.org/MTH.html), that can be done pretty efficiently:
we can hash part of the data first and then hash the missing parts later.

But NNCP allows encapsulation of transitional packets. So when I do
nncp-file -via alice,bob - carol:dst, three encrypted packets are
generated on the fly, each feeding into the next: one for carol, and
two transitional ones. That complicates the task of rewriting the SIZE
field after the header even further.

>To do that, there needs to be some way of recognizing the end of the data.
>I'm not sure how that happens now, since we also don't know the size in
>advance in those cases.  Is there some sort of blocking for data chunks?

Storing the whole data in a temporary file is exactly what lets us know
the resulting size in advance. The plaintext is split into blocks, which
are independently AEAD-encrypted: 1) most AEAD cipher interfaces only
allow processing the data as a whole, without intermediate updates, so
we have to split it; 2) it lets decryption abort immediately if one of
the blocks fails authentication, without waiting until the whole packet
has been processed and an invalid MAC is seen at the end. Initially NNCP
used an ordinary block cipher mode with an ordinary MAC at the very end;
a modern AEAD like ChaCha20-Poly1305 is simply faster, and the overhead
of 16*8=128 bytes per MiB of payload is negligible.
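
A simplified sketch of that splitting (not the actual nncp-go code:
key/nonce/associated-data derivation and the SIZE block are omitted,
and all names here are invented):

// Hypothetical sketch.  Needs "encoding/binary", "io" and
// golang.org/x/crypto/chacha20poly1305.
const encBlkSize = 128 * 1024

func encryptBlocks(w io.Writer, r io.Reader, key, ad []byte) error {
    aead, err := chacha20poly1305.New(key) // 32-byte key
    if err != nil {
        return err
    }
    buf := make([]byte, encBlkSize)
    nonce := make([]byte, aead.NonceSize())
    for counter := uint64(0); ; counter++ {
        n, rerr := io.ReadFull(r, buf)
        if n > 0 {
            binary.BigEndian.PutUint64(nonce[len(nonce)-8:], counter)
            // every block carries its own 16-byte tag, so the reader
            // can abort on the first block that fails to authenticate
            ct := aead.Seal(nil, nonce, buf[:n], ad)
            if _, werr := w.Write(ct); werr != nil {
                return werr
            }
        }
        if rerr == io.EOF || rerr == io.ErrUnexpectedEOF {
            return nil
        }
        if rerr != nil {
            return rerr
        }
    }
}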

And of course I may just be missing some damn simple solution.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

* Re: Avoiding double writes
  2021-11-01 19:46 ` Sergey Matveev
@ 2021-11-02  0:11   ` John Goerzen
  2021-11-02 10:03     ` Sergey Matveev
  0 siblings, 1 reply; 7+ messages in thread
From: John Goerzen @ 2021-11-02  0:11 UTC (permalink / raw)
  To: Sergey Matveev; +Cc: nncp-devel

Hi Sergey!

On Mon, Nov 01 2021, Sergey Matveev wrote:

> *** John Goerzen [2021-11-01 12:47]:
>>A while back, we were discussing the temporary files that were needed for
>>reading stdin for nncp-exec or nncp-file.  I believe the reason for this is
>>that the header contains a signature of the data that follows, and it's not
>>practical to seek back and write that later.
>
> http://www.nncpgo.org/Encrypted.html
> It is not because of the signature, but mainly because of the SIZE
> field. The signed header contains everything you need to authenticate
> the remote side and create the encryption keys for processing the
> ENCRYPTED data. There is no signature over the whole packet. The
> ENCRYPTED data contains the encrypted size as a first short block, and
> then a pile of 128KiB blocks. Each block is AEAD-encrypted, so it is
> authenticated. SIZE holds the payload size, which can be smaller than
> the whole packet because of junk added to hide the actual payload size.

Got it.

So, from a glance at the code, this size is primarily used for:

- differentiating encrypted data from padding

- display output

> We could store some structure inside each block (a single signalling
> byte) telling whether it is a payload block or a junk one. Then we
> could process the packet sequentially. But in that case the junk has
> to be real data that we actually encrypt and authenticate. The junk
> could be zeroes, but AEAD encryption is much more expensive than the
> current junk generator, which is just BLAKE3-XOF output. More
> complication, more CPU. On the other hand, once a junk block has
> started, we can be sure that everything after it is junk, so we only
> have to authenticate the single block where the junk begins and can
> then generate the rest quickly, without real AEAD processing -- except
> for the very last block holding the size.

Is it even really necessary to store a size?  I was thinking of
something along these lines also, with the signaling byte (or, perhaps
more accurately, a blocksize indicator).  For instance, each block would
begin with the size of the encrypted data it contains, and after all the
encrypted data, a block size of zero could be used.  There would be no
need to explicitly give a size because the stream of blocks would
contain all the needed information.

But I'm also confused about the signature - since that comes before the
encrypted blocks, isn't that also a problem?

Thanks for the conversation,

John

* Re: Avoiding double writes
  2021-11-02  0:11   ` John Goerzen
@ 2021-11-02 10:03     ` Sergey Matveev
  2021-11-02 15:26       ` John Goerzen
  0 siblings, 1 reply; 7+ messages in thread
From: Sergey Matveev @ 2021-11-02 10:03 UTC (permalink / raw)
  To: nncp-devel

Greetings!

*** John Goerzen [2021-11-01 19:11]:
>So, from a glance at the code, this size is primarily used for:
>- differentiating encrypted data from padding
>- display output

Exactly.

>Is it even really necessary to store a size?  I was thinking of
>something along these lines also, with the signaling byte (or, perhaps
>more accurately, a blocksize indicator).

Well, actually I do not currently see a strong need for the size
either. Of course it is nice to be able to determine the real payload
size quickly (by deciphering just a single block of data), but for years
no code/command has done this, except nncp-pkt for debugging.

So I agree with you that signalling would be more than enough, allowing
"streaming" creation of encrypted packets.

However, with MTH it should be possible (if I am not wrong) to write
everything encrypted except for the first block(s), then go back and
prepend those first blocks with the now-known size. I am not sure, but
that would require changing a lot of NNCP internal code, which is aimed
exclusively at streaming. On the other hand, it would not require
changing the existing encrypted format.

But! When I woke up, I realized that the junk (padding) is not
authenticated at all right now! If a packet has two bytes of padding,
you can process it, then strip off the last byte; because the MTH hash
changes, it will bypass the .seen check and be processed successfully
again. Then you can strip another byte and see whether it was still
processed successfully (for example, as an adversary, you watch whether
some message/file appears somewhere after you sent the modified
encrypted packet). Repeat until you have stripped off all the junk
padding, and you then know the real payload size.

So a format change is definitely needed now :-). It is a vulnerability,
though of course not a crucial one and hardly seriously exploitable,
leading only to a possible leak of the real payload size. The padding
length has to be authenticated.

But since we are going to change the format anyway, I think it is safe
to get rid of the SIZE field completely, adding some "signalling"
metainformation to the blocks while not forgetting about padding
authentication. And I presume that won't affect much code.

>But I'm also confused about the signature - since that comes before the
>encrypted blocks, isn't that also a problem?

The signature is made over the encrypted packet's header only. It comes
after the header itself, so I do not see any problems :-)

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

* Re: Avoiding double writes
  2021-11-02 10:03     ` Sergey Matveev
@ 2021-11-02 15:26       ` John Goerzen
  2021-11-02 17:49         ` Sergey Matveev
  2021-11-02 20:48         ` Sergey Matveev
  0 siblings, 2 replies; 7+ messages in thread
From: John Goerzen @ 2021-11-02 15:26 UTC (permalink / raw)
  To: Sergey Matveev; +Cc: nncp-devel


On Tue, Nov 02 2021, Sergey Matveev wrote:

> Well, actually I do not currently see a strong need for the size
> either. Of course it is nice to be able to determine the real payload
> size quickly (by deciphering just a single block of data), but for
> years no code/command has done this, except nncp-pkt for debugging.

Yes, and for giving user output, the size of the encrypted packet 
could be used too.

> However, with MTH it should be possible (if I am not wrong) to write
> everything encrypted except for the first block(s), then go back and
> prepend those first blocks with the now-known size. I am not sure, but
> that would require changing a lot of NNCP internal code, which is
> aimed exclusively at streaming. On the other hand, it would not
> require changing the existing encrypted format.

But I don't think even this is needed, since as we're saying, we 
don't have a strong need for the size.

> So a format change is definitely needed now :-). It is a
> vulnerability, though of course not a crucial one and hardly seriously
> exploitable, leading only to a possible leak of the real payload size.
> The padding length has to be authenticated.

Interestingly, AFAIK, OpenPGP has no provision for padding and its 
packet headers are unencrypted, so agreed that it isn't a big 
deal.

> But since we are going to change the format anyway, I think it is
> safe to get rid of the SIZE field completely, adding some "signalling"
> metainformation to the blocks while not forgetting about padding
> authentication. And I presume that won't affect much code.

Yes, exactly.  This metadata could be as simple as a u32 
indicating how much of the following block is actual data.  Any 
u32 value beneath 128K (including zero) would indicate we've 
reached EOF of the original data and everything past that should 
be authenticated but discarded, I think.
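
To make that concrete, here is a rough sketch of how the reader side
might handle one decrypted block (the names and framing are purely
hypothetical, just to illustrate the idea, not a wire-format proposal):

// Hypothetical sketch.  Needs "encoding/binary", "errors" and "io".
const blockSize = 128 * 1024

// readFramed handles one decrypted block: a u32 count of real payload
// bytes, the bytes themselves, then padding that is simply discarded.
func readFramed(w io.Writer, block []byte) (last bool, err error) {
    if len(block) < 4 {
        return false, errors.New("block too short")
    }
    n := binary.BigEndian.Uint32(block)
    if int(n) > len(block)-4 {
        return false, errors.New("length prefix exceeds block")
    }
    if _, err := w.Write(block[4 : 4+n]); err != nil {
        return false, err
    }
    return n < blockSize, nil // any count below 128K ends the payload
}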


>>But I'm also confused about the signature - since that comes before
>>the encrypted blocks, isn't that also a problem?
>
> The signature is made over the encrypted packet's header only. It
> comes after the header itself, so I do not see any problems :-)

Ahh, that makes sense.  I'm not all up on my crypto algorithms, 
but if I understand correctly, each encrypted block is 
authenticated with the BLAKE3 hash of that block plus the unsigned 
portion of the header?  And since that portion of the header 
contains the public part of the session key, that prevents data 
injection attacks, right?

So it would be possible with this streaming approach to still 
determine with certainty if we have received the entire file's 
data, and the correct data, by processing the hash and header for 
each block, right?

Thanks,

- John

* Re: Avoiding double writes
  2021-11-02 15:26       ` John Goerzen
@ 2021-11-02 17:49         ` Sergey Matveev
  2021-11-02 20:48         ` Sergey Matveev
  1 sibling, 0 replies; 7+ messages in thread
From: Sergey Matveev @ 2021-11-02 17:49 UTC (permalink / raw)
  To: nncp-devel

*** John Goerzen [2021-11-02 10:26]:
>Yes, and for giving user output, the size of the encrypted packet could be
>used too.

Agreed.

>Yes, exactly.  This metadata could be as simple as a u32 indicating how much
>of the following block is actual data.  Any u32 value beneath 128K
>(including zero) would indicate we've reached EOF of the original data and
>everything past that should be authenticated but discarded, I think.

I assume something like that, indeed. Plus an included size of the
padding/junk, to be sure that we received all of it and the packet was
not stripped. We have holidays coming soon, so I hope to implement that.

>Ahh, that makes sense.  I'm not all up on my crypto algorithms, but if I
>understand correctly, each encrypted block is authenticated with the BLAKE3
>hash of that block plus the unsigned portion of the header?  And since that
>portion of the header contains the public part of the session key, that
>prevents data injection attacks, right?

Mostly you are right. Each block (the 128KiB ones and SIZE) uses the
BLAKE3 hash of the unsigned part of the header as associated data, an
additional input to the AEAD encryption of the block. So each block is
"tied" to the context it is used in: exactly that sender, recipient and
ephemeral public key. And each block uses an implicit increasing nonce
counter, so blocks cannot be reordered, thrown away or injected. Blocks
cannot be taken from another "context" (another encrypted packet)
because of the associated data.
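
Schematically it is something like this (a simplified sketch; the real
code differs in details and the names are invented):

// Hypothetical sketch.  Needs "crypto/cipher" and "encoding/binary";
// the nonce is assumed to be at least 8 bytes (ChaCha20-Poly1305 has 12).
func openBlock(aead cipher.AEAD, hdrHash []byte, counter uint64,
    ct []byte) ([]byte, error) {
    nonce := make([]byte, aead.NonceSize())
    binary.BigEndian.PutUint64(nonce[len(nonce)-8:], counter)
    // a block copied from another packet (different header hash) or
    // moved to another position (different counter) fails to open here
    return aead.Open(nil, nonce, ct, hdrHash)
}

Here hdrHash would be something like blake3.Sum256() of the unsigned
header bytes (lukechampine.com/blake3), and aead a ChaCha20-Poly1305
instance keyed from the session keys.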

>So it would be possible with this streaming approach to still determine with
>certainty if we have received the entire file's data, and the correct data,
>by processing the hash and header for each block, right?

I do not fully understand you :-(. But yes, we can be sure whether we
have received the whole data or not by checking whether we have reached
the "final" payload block, which also contains the size of the padding.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

* Re: Avoiding double writes
  2021-11-02 15:26       ` John Goerzen
  2021-11-02 17:49         ` Sergey Matveev
@ 2021-11-02 20:48         ` Sergey Matveev
  1 sibling, 0 replies; 7+ messages in thread
From: Sergey Matveev @ 2021-11-02 20:48 UTC (permalink / raw)
  To: nncp-devel

*** John Goerzen [2021-11-02 10:26]:
>Yes, exactly.  This metadata could be as simple as a u32 indicating how much
>of the following block is actual data.  Any u32 value beneath 128K
>(including zero) would indicate we've reached EOF of the original data and
>everything past that should be authenticated but discarded, I think.

Well, after some thought I came to the following construction.

* I do not like the fact that every payload block would hold a constant
  u32/whatever integer that differs in only a single block of the whole
  stream. It is a waste of space. Of course I understand that we are
  talking about a mostly negligible 32 bits per 128KiB of data, but I
  still do not like that waste :-)
* It can be replaced with a "signalling" bit, or actually a single byte
  (for convenience), telling whether the current block is fully
  "payloaded" or holds additional metadata signalling that the end of
  the payload stream has been reached
* So actually we just have to distinguish a single special block with
  metadata inside. That can be done by using a different encryption key
  for it. The same trick is used in the widely-used CMAC, for example:
  it uses one key to encrypt the block with padding, and another to
  signal that the encrypted block has no padding. CMAC deals with just a
  single 64-128 bit block, while NNCP deals with a huge 128KiB one -- I
  think it is still an acceptable CPU burn, because an excess 128KiB
  symmetric AEAD decryption is much cheaper than any curve25519/ed25519
  operation

So we derive two encryption keys: an "ordinary" one and a "signalling"
one. When a block is encrypted with the signalling key, that means it
holds two 64-bit integers at the beginning: the full payload size and
the padding size. Period. That change to the packet format is completely
sufficient. For the examples below, let's assume each block holds 128
bytes of plaintext:

* If we are sending 200 bytes of data, then we generate two blocks:
  0: key=ordinary, 128 bytes of payload
  1: key=signalling, 64-bit integer with value 200 (full payload size)
                     64-bit integer with value 0 (no padding)
                     72 bytes of remaining payload
* If we wish to pad it with 30 bytes, then:
  1: key=signalling, 64-bit integer with value 200
                     64-bit integer with value 30
                     72 bytes of remaining payload
                     30 bytes of zeros
* If we are sending 128 bytes of data, then:
  0: key=ordinary, 128 bytes of payload
  1: key=signalling, 64-bit integer with value 128
                     64-bit integer with value 0 (no padding)
                     nothing else, 0 payload bytes remaining to read
* If we are sending 10 bytes of data, then:
  0: key=signalling, 64-bit integer with value 10
                     64-bit integer with value 0 (no padding)
                     10 bytes of payload
* If we are sending 126 bytes of data plus 50 padding bytes, then:
  0: key=signalling, 64-bit integer with value 126
                     64-bit integer with value 50
                     110 bytes of payload
  1: key=ordinary, 16 bytes of remaining payload
                   50 bytes of padding

If the pad size exceeds the free space inside the block, then I would
like to keep using the current BLAKE3-XOF as the generator of the random
sequence: no real AEAD-encrypted blocks, just a stream of XOF output. We
do not need cryptographic authentication there, because the XOF is
completely deterministic (when we know the session keys, of course; the
adversary does not), so the receiver can simply generate the same stream
and compare it byte-by-byte, which is much faster. And since we know the
exact pad size, we can be sure that no one stripped it off.

Slightly bigger code (which is actually very simple, just some state
transitions), slightly more CPU time spent on the failed (initial)
decryption of the signalled block, but minimal waste of additional space
in the packets.

If we see that the first block is already less than 128(KiB), then we
can decrypt it with the signalling key immediately: so for very short
packets everything will be even more compact and faster than the current
implementation, because in that "new" scheme there is only a single
encrypted block instead of two (one for the SIZE, another for the
payload).
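
On the reader side the key switching would look roughly like this (a
sketch only: key derivation, nonce handling and associated data are
omitted, and the names are invented):

// Hypothetical sketch.  Needs "crypto/cipher" and "errors".
// A block that fails under the ordinary key is retried under the
// signalling key; success there means its first 16 bytes are the two
// 64-bit integers: full payload size and padding size.
func openNext(ordinary, signalling cipher.AEAD, nonce, ct []byte) (
    pt []byte, signalled bool, err error) {
    if pt, err = ordinary.Open(nil, nonce, ct, nil); err == nil {
        return pt, false, nil
    }
    if pt, err = signalling.Open(nil, nonce, ct, nil); err == nil {
        return pt, true, nil
    }
    return nil, false, errors.New("block opens under neither key")
}

The single failed Open() on the signalled block is exactly the extra
CPU cost mentioned above.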

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF

end of thread, other threads:[~2021-11-02 20:48 UTC | newest]

Thread overview: 7+ messages
2021-11-01 17:47 Avoiding double writes John Goerzen
2021-11-01 19:46 ` Sergey Matveev
2021-11-02  0:11   ` John Goerzen
2021-11-02 10:03     ` Sergey Matveev
2021-11-02 15:26       ` John Goerzen
2021-11-02 17:49         ` Sergey Matveev
2021-11-02 20:48         ` Sergey Matveev