[katzenpost] mixnet notes from visit to KU Leuven

Yawning Angel yawning at schwanenlied.me
Thu Feb 8 15:38:01 UTC 2018


On Tue, 6 Feb 2018 12:39:56 +0000
dawuud <dawuud at riseup.net> wrote:
>   3. What other features should such an emulator have?
>      Which Loopix parameters will you want to specify?
>      Currently we have not implemented any decoy traffic (dummy
>      messages). Do you require decoy traffic (Loopix design) in order
>      to perform meaningful tests with our emulator?

I have client->provider->client loops working in a branch
(https://github.com/katzenpost/mailproxy/tree/bug17), though I am
totally uncertain of what the tuning should be and may change the
design based on feedback.

A rough sketch of the design I implemented (which differs from the
Loopix paper) is as follows:

 * For various reasons, it is useful to have Providers be able to run
   services that run as part of the provider process, with standardized
   service IDs that are broadcasted to the entire network.

   This is specified in draft form, and is already implemented and
   merged (because people wanted a key server).

 * Decoy traffic is implemented via this mechanism, as opposed to end
   to end (client to client) loops.

   For example, `alice at provider.invalid` will send decoy messages to
   `+loop at provider.invalid`, where a response will be generated if the
   decoy message has a SURB, and silently discarded otherwise.

   This mechanism will also work for mixes looping traffic back to
   themselves.
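
   As a sketch of the dedicated decoy recipient behavior described
   above (the names and types here are hypothetical illustrations, not
   the actual katzenpost API):

```go
package main

import "fmt"

// DecoyPacket is a hypothetical stand-in for a decrypted payload
// delivered to the provider-side "+loop" service.
type DecoyPacket struct {
	SURB    []byte // single-use reply block; nil if absent
	Payload []byte
}

// handleLoop models the described behavior: if the decoy message
// carries a SURB, a response is generated and routed via that SURB;
// otherwise the packet is silently discarded.
func handleLoop(p DecoyPacket) (reply []byte, ok bool) {
	if p.SURB == nil {
		return nil, false // no SURB: silently discard
	}
	// A real implementation would encrypt the reply to the SURB;
	// here we just signal that a reply would be generated.
	return []byte("loop-ack"), true
}

func main() {
	_, ok := handleLoop(DecoyPacket{SURB: []byte{0x01}})
	fmt.Println(ok) // true: reply generated
	_, ok = handleLoop(DecoyPacket{})
	fmt.Println(ok) // false: silently discarded
}
```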

This is likely different from how people expected this to work, but the
rationale is as follows:

 * If client decoy traffic runs end to end, it is trivial for providers
   to distinguish mix decoy traffic from client decoy traffic.

   With the dedicated decoy recipient scheme, providers know that
   someone sent decoy traffic, but do not know whether the originator
   of the decoy traffic is a mix or a client.

   It is possible to have the final layer of mixes be responsible for
   looping traffic, but that merely shifts the ability to distinguish
   the class of decoy traffic from the providers to a set of mixes.

 * This saves bandwidth and provider side spool storage, because
   messages end at the provider and are discarded after processing.
   Client to client decoy traffic requires that the client retrieve
   and process the decoy messages, as opposed to just the SURB
   replies.

Note: If things should be the other way, please let me know and I can
change it.

>      We can implement essentially any features you want so please
> elaborate.
>   We would like to make a useful mixnet emulator that you can use
> based on https://github.com/katzenpost/demotools/tree/master/kimchi
> * When should mixes start counting the poisson mix delay?
>   There's two ways to do this:
>   1. what we do now: start the poisson counter the moment the packet
> is received. 
>   2. an alternate implementation: start counting the poisson mix delay
>      when the packet arrives in the mix queue scheduler. Our
>      retransmission timer is set to:
>     timer_duration = Poisson_RTT + send_slack
>     where send_slack is some constant value to account for additional
>     network and processing delays. We might increase send_slack to
> account for the increase in variance.

Because I am somewhat old fashioned, the server code is currently
written along the lines of SEDA (Staged event-driven architecture).
The max queue dwell time at each stage is runtime configurable and is
primarily how the server attempts to defend itself from CPU overload.

The current default max dwell time values are as follows:

 * Crypto worker (Sphinx) queue: 250 ms (Does not include unwrap/replay
   check processing time)
 * Scheduling queue: 10 ms
 * Transmit queue: 50 ms

There hasn't been much rigor behind the default values, and my
current inclination is to dramatically reduce the Crypto worker queue
dwell time to be similar to the scheduling queue's, and to increase
the transmit queue's considerably (because the network should be
resilient to changing network conditions).

It is highly likely that the scheduling queue slack will need to be
increased in the future (the scheduling queue right now is entirely in
memory, but eventually needs to be partially on disk).

As far as the client's view of retransmissions goes, the amount of
allowed slack is currently set to 5 mins.  It is unlikely that needs to
be increased.


 * Under normal operation I do not expect any of the queue dwell
   thresholds to be tripped.  The server could very likely avoid
   fudging the delay based on dwell time entirely, because whenever
   the fudged amount is significant, the packet probably should have
   been dropped due to overload anyway.

 * If people are planning on deploying mixes or providers on links
   that exceed 1 Gbit, they should let me know so I can re-tune the
   replay filter.

 * Unwrap performance is quite fast (approx 500 usec for the ~50 KiB
   packet size), but could be improved by (in decreasing order of gain):

    * Altering the Sphinx code to avoid heap allocations.

    * Switching the Sphinx MAC from HMAC-SHA256 to something else
      (Eg: HMAC-SHA512/256).

    * Building with Go 1.10 when it is out (Go's CTR-AES performance is
      garbage prior to the upcoming release where they fixed it because
      I complained about it).

    (The latter two optimizations are likely a wash as they only apply
     to the Sphinx header.  On "suitable for large deployment
     hardware", the other "expensive" operations (X25519, AEZv5) will
     use optimized assembly language implementations, so additional
     performance would be hard to extract from there.)

> Have I missed anything?
> Please do reply if this e-mail beckons you to.

Questions all decoy traffic related:

 * How is the client supposed to schedule sends?

   Right now there is one (network wide) parameter `LambdaP`, that
   specifies the mean of a Poisson distribution (in seconds).  The
   client will attempt to send traffic (be it cover or otherwise) by
   sampling from this distribution.

   Note: This is distinct from the (network wide) `Lambda` which is
   used to derive mixing delay.

   Should there be separate parameter(s) for decoy traffic?  If so how
   should it be parameterized?

 * What fraction of client decoy traffic should be loop vs discard?

 * What actions if any should a client take if loop decoy traffic fails
   to loop?  How is the client supposed to combat false positives?

 * (Most of these questions apply to mix loops as well.)

General questions:

 * How should a mix behave when it encounters a packet with exactly
   `0` delay?

   (This is very likely the only non-overload/network congestion
    condition when adjusting for server processing delay matters.)


Yawning Angel