Please login or register.

Using "time-neighbors" in mixin selection in order to solve[...]

Section 3.1 of MRL-0004 explains how an attacker can use temporal associations in order to considerably increase his chance of tracking Monero transactions.

The problem is basically that output usage is not random on their age. There's a non-random distribution. So any random selection of outputs for the mixin would not be ideal, as the attacker can use the age to guess which input is more likely to be the true one.

So, if that's the problem, why not simply always use outputs which are "time-neighbor" to the real one?

Suppose you want to spend your output K with mixin 4. We sort all outputs in the chain with the same denomination as K by their block height and position in block. Let's suppose we end up with the following list segment:

... A, B, C, D, K, W, X, Y, Z ...

These are K's "time-neighbors". Our actual mixing must be composed exclusively of outputs from that list, and it must be sorted. The only random choice would be to select which of the following equally probable list to use:

  • [A, B, C, D, K]
  • [B, C, D, K, W]
  • [C, D, K, W, X]
  • [D, K, W, X, Y]
  • [K, W, X, Y, Z]

This reveals a very good estimate of which is the age of the true input. This problem can be mitigated by implementing the recommendation of sending one's money to self periodically. Plus, I believe this is a more than worthy trade-off when we consider that by implementing this, we'd be fixing the temporal association problem, making the probability of any input in a mixin being the true one equal to each other. By not taking input ages into account, we reveal with a much higher probability which is the true input, which is much worse than just revealing a good age estimate.

What do you guys think?

Replies: 8
Gingeropolous posted 8 years ago Replies: 1 | Weight: -220 | Link [ - ]

I'm confused. I thought that the combination of ring signatures AND stealthing made this an impossible attack.

As in, if the set A, B, C, X, Y, Z were on the blockchain and are used in your transaction. X, Y, Z are your outputs and can be determined due to their temporal association...

But in reality, in the transaction the whole set shows up as inputs as L, M, N, O, P, Q in a transaction. So a blockchain observer really can't even see that inputs L, M, N, O, P, Q came from A, B, C, X, Y, Z.

Or I might have the architecture wrong, and stealthing may only occur when creating outputs.... but inputs are outputs.... argh.

Reply to: Gingeropolous
EhVedadoOAnonimato posted 8 years ago Replies: 1 | Weight: -220 | Link [ - ]

Stealth addresses are for the generation of outputs. They make it so that there's no visible link between the outputs and the addresses. The Monero devs call that "unlinkability", and let's say that works pretty fine.

The problem is traceability, i.e., linking the output with an input in another transaction spending it. Ring signatures do make that harder, but there are many potential problems. Temporal association is one of them. It's well explained in the PDF. But basically, the problem is that actual spending does not follow a random distribution concerning output age. Younger outputs have a higher chance of being the ones truly spent in a transaction. So if you select your mixin in a random way, the probabilities will not be equal for each input.

Reply to: EhVedadoOAnonimato Gingeropolous
smooth posted 8 years ago Replies: 1 | Weight: -217 | Link [ - ]

I think intuitively it should be possible to remove, or at least greatly reduce, the probability bias. My argument is that if the attacker can infer a bias (by a distribution for real outputs), then the defender ought to be able to infer the same bias and eliminate it by choosing fake outputs in that manner. Thus (with one exception stated below) I see no advantage to the attacker, and people behaving rationally in choosing fake outputs using the available information should effectively negate the attack.

The one exception is that the attacker gets to see activity that occurs after the transaction. I don't know how significant that is to the distribution of outputs, but it might be.

Reply to: smooth EhVedadoOAnonimato Gingeropolous
EhVedadoOAnonimato posted 8 years ago Weight: -212 | Link [ - ]

It's not only a matter of how significant this after-fact knowledge is to the distribution. It's the problem of the history being there to be read at any time in the future that's more worrying.

Okay, if an attacker figures out a distribution that gives him good estimates about the probability bias, and if his discovery "leaks", future wallets could counter it. But all transactions done up until that moment could potentially be traced and linked...

smooth posted 8 years ago Replies: 1 | Weight: -221 | Link [ - ]

> This reveals a very good estimate of which is the age of the true input

I'm not sure this is a good tradeoff from a privacy perspective. The temporal association doesn't really give a "very good estimate" of anything, just a skewing of the probabilities that can be used to make inferences. But by revealing the age quite accurately you may make it much easier to correlate transactions with external events. So in a sense the cure may be worse than the disease.

Reply to: smooth
EhVedadoOAnonimato posted 8 years ago Replies: 1 | Weight: -221 | Link [ - ]

The skew of probabilities make transactions more easily traceable, this approach doesn't.

Assuming the recommendation of periodically sending one's money to self is implemented, and these transactions are random enough in what concern their timings, how could the age of the true input be correlated with external events?

Additionally, why wouldn't these correlations with external events be possible the same way with randomly selected outputs? And even worse, wouldn't these correlation have a higher chance of being correct if the probabilities are skewed? Basically, what I understand by "correlation with external events" would be some attacker trying to monitor outputs of a certain age range. In my example, any transaction on that range would give the attacker a 1/(mixin+1) chance of finding the correct path of the money. In a random selection, he would have a better chance of following the right path the moment he sees any transaction spending an output in that age range he's monitoring.

Reply to: EhVedadoOAnonimato smooth
smooth edited 8 years ago Replies: 1 | Weight: -219 | Link [ - ]

The correlations with external events (or simply with a timestamp) certainly wouldn't have a higher chance of being correct with random selection, because they would always be correct under this proposed method. At best with random selection one can identify a larger set of candidate transactions that spent outputs from that time window with some probability (even if not specifically 1/N). With the time-peer method you can identify 100% of the transactions that spent outputs from that time window, and also of course identify 100% of the transactions that did not.

Another problem. This makes targeting specific activity for unmixing via Sybil attacks (by controlling or gaining information about the non-spent mixins) much easier, because the eventual choice of mixins is known in advance. So if you want to Sybil attack an output that is created right now, you need only create many of your own sybil outputs right now. This could even happen by accident. Example: a company processes its payroll and pays 100 000 employees (or it could be any other bulk payment) at one time, making those payments time-peers. Employees will be able to Sybil attack each other, since their eventual spends will always use other employees' outputs.

Reply to: smooth EhVedadoOAnonimato smooth
EhVedadoOAnonimato posted 8 years ago Weight: -220 | Link [ - ]

> The correlations with external events (or simply with a timestamp) certainly wouldn't have a higher chance of being correct with random selection, because they would always be correct under this proposed method.

No, what I meant is that the correlation with the right input would be more likely to be done. You are watching a particular input, you see a transaction spending it. In a random distribution, you're more likely to guess correctly whether or not that input was truly spent on that transaction.

You're right in your second paragraph though. That's an important attack vector. Damn...