Hey, everyone, I wanted to write up a 2-month update. As usual, I am seeking feedback and commentary from the community. I spent about half of my time this month reading papers and textbooks. As with last month, I spent time in #monero-dev on freenode, leading to many productive conversations. We started swamping that room with discussion, though, so we opened the newer #monero-research-lab channel to discuss this stuff. Knaccc, luigi, moneromooo, and kenshi84 are throwing down lots and lots of crypto ideas in there. It's been highly productive. At the end of August I will release the first Monero newsletter as my “final update” for the 3-month period of June-August. My first order of business in September will be to revise and update the research road map and release MRL-R002.
My motivations in the past month have shifted enormously because of some security concerns with the EABE "attack" described by knaccc, as well as some of the security concerns presented in the papers by Miller et al. and Kumar et al. So I'll discuss those first.
Criticisms by Miller and Kumar: These are serious papers, in the sense that I am taking them seriously, but I don't think there is much cause for serious concern. Both papers conflate the sensitivity and specificity of a test, and ignore the false positive paradox in favor of sensationalist-sounding results.
The Miller paper, from my reading, makes a certain assumption about non-deducible transactions based on the deducible set, then develops a decision rule from that assumption. Disregard for a moment that we have a minimum ring size now, and thus the deducibility of transactions "damps out" over time as transactions ripple through the network. Note that they don't bother testing whether the properties of non-deducible transactions correlate with the properties of the deducible set, because they can't test such hypotheses with non-deducible transactions... they are non-deducible. So, to be clear: these authors develop a decision rule based on an unfalsifiable hypothesis (e.g. that the timing distributions of deducible and non-deducible transactions are similar: since the most recent signer of a deduced transaction is usually the one deduced to be the true signer, the most recent signer of a non-deduced transaction is usually the true signer, and so on), then construct a Monte Carlo simulation based on that same unfalsifiable hypothesis, and then show that their decision rule does a good job on their MC simulation. Okay, so far I'm not seeing a security concern here.
The so-called distributional problem put forth by Miller is still interesting and concerning to me, but for non-crypto reasons. I will elaborate some time in the coming weeks on this.
The Kumar paper is a little better; I actually think somewhat highly of it. They bother to define their true positive sets, for example, and they try to justify their work; they seem to grasp the idea of sensitivity vs. specificity and that they can't quite nail down both with Monero. Unfortunately, most of the criticisms from the Kumar paper are no longer relevant (although they certainly were at one time) because RingCT transactions now obfuscate amounts. Some of the arguments in that paper are still valid, but those have a few subtle flaws to them... I can see how someone doing something similar to this paper could have a pretty strong impact, but the specific criticisms in that paper are somewhat weak. Having said that, a few ideas from that paper have helped me come up with models for the EABE stuff (see below). If I'm going to respond to either or both, it will probably only be the Kumar paper.
EABE, Churn, Min mix-ins, Blockchain compression: If very few transactions occur between two transaction outputs owned by a KYC exchange Eve, then Eve can leverage her KYC knowledge of the identities of her users, as well as the one-time receiving addresses of her users, to reduce the variance in her estimates of culpability in that chain of transactions. This is a very serious problem because many cryptocurrency users and merchants have the following habit: Alice sees a shiny object on sale from Bob, Alice buys crypto from Eve to send to Bob to buy the shiny object, and Bob cashes out with Eve with no intervening transactions. In this scenario, when Alice makes regular purchases of shiny objects from Bob, Eve can establish a pattern of Alice's purchasing habits with Bob. The problem with EABE? Alice gave Eve her personal identity and allowed Eve to link it to certain one-time addresses of Alice's. Alice willfully, by using a KYC exchange, sacrificed the anonymity properties granted by stealth addresses. This isn't really a cryptography problem, it's a human behavior problem.
The current solution, churn, is essentially "let's leverage the privacy properties of Monero such that most users are sufficiently protected from exposure." In this case, protection from exposure means ensuring that most users are implicated in most transactions, up to some definition of most.
My first-order estimates for using a churn-based solution for EABE suggested that with a ring size of around 10, users can obfuscate their purchasing habits with at most 7 churns. Bigger ring sizes mean fewer churns, but the effect is rather mild; you still need 6 churns with a ring size of around 20. This computation is based solely on the pigeonhole principle.
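The pigeonhole estimate can be sketched in a few lines. Note that the total output count used here is my own illustrative assumption (not a figure from the post or the chain); the idea is just that each churn multiplies the set of possibly-implicated outputs by the ring size, and we ask how many churns it takes to cover the whole chain:

```python
# Toy pigeonhole estimate: after k churns with ring size m, a single
# output's history can implicate up to m**k prior outputs. We find the
# smallest k such that m**k covers all N outputs on the chain.
# N below is an illustrative assumption, not a measured chain statistic.

def churns_needed(ring_size: int, total_outputs: int) -> int:
    """Smallest k with ring_size**k >= total_outputs."""
    k, covered = 0, 1
    while covered < total_outputs:
        covered *= ring_size
        k += 1
    return k

N = 10_000_000  # assumed total number of transaction outputs (illustrative)
print(churns_needed(10, N))  # -> 7 churns at ring size 10
print(churns_needed(20, N))  # -> 6 churns at ring size 20
```

With this assumed N, the sketch reproduces the "7 churns at ring size 10, 6 churns at ring size 20" figures above, and it shows why the effect of a bigger ring size is mild: churns enter the exponent, ring size only the base.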
My second-order estimates were based on the hypergeometric distribution. These results suggest that as minimum ring sizes get bigger than 15-20, "most" old CryptoNote transaction outputs are implicated in "most" new randomly fashioned transaction outputs. Formally: for any one known (KYC) user, for any one known (revealed) transaction output owned by that user, and for any new randomly fashioned transaction output, as ring sizes grow beyond 15-20, the probability that the revealed output is implicated somewhere in the history of the new randomly fashioned transaction output approaches 1.0 very quickly, up to some definition of quickly. I was startled by how rapidly the probability "snapped" to 1.0 as ring size increased.
Also, think about the expected number of steps backward we need to take, proceeding through the history of the new randomly fashioned transaction output, before we hit the known revealed transaction output. As the ring size gets big, this number of steps gets small (with a ring size equal to "all transaction outputs" this step size is guaranteed to be 1.)
So, this was all good news: it suggests that a larger minimum ring size will allow churning to have a maximal effect. And since the beefy part of our transactions right now is range proofs, not ring signatures, this seems like a not-so-bad solution. I'm working on a third-order estimate right now in which an observer sees many transaction outputs bundled into transactions, many of which are bundled into blocks, which are observed over time. The goal is to select system parameters that get close to the "all transaction outputs" scenario without causing huge blockchain bloat. Yet rather than putting effort into detailing a computational model to get precise security bounds on this particular mode of attack, it would be far better to set ring sizes to something absurdly large, say 300 or 3000, and simply recommend that users send funds to themselves once before transacting with a merchant.
Of course, in order to justify ring signatures that large (heck, in order to compute ring signatures that large), we need more efficient ring signatures. I have found at least one construction for O(sqrt(N)) ring signatures that doesn't require a trusted setup, but it uses pairings-based cryptography. I'm currently working on converting this into something more suitable for our purposes.
So, after all this effort, it turns out that the best way to protect against EABE is blockchain compression and making signatures and range proofs smaller. Consequently, this idea has taken up the vast majority of my time this past month. I have a few hot leads in that regard.
Recruitment: Sarang posted to the FFS; go take a look. Once he comes on board, we will together take another, deeper look at the MRL Research Road Map and come up with a more organized action plan. I recommend folks take a look at that funding thread, offer advice/criticism, and throw some micronero that way if they are so inclined.
Paper repository: I'm considering the idea of getting a repository going of all the papers I've read (or at least skimmed) recently. I'm not sure about the copyright implications of gathering all those PDFs together on github. If anyone has any ideas or suggestions on that front, please let me know. I've skimmed over 100 papers, but I've read around 40 carefully.
Zero-Knowledge Lit Review: Due to some security issues regarding the EABE attack, churn, etc., I got distracted from working on this. I plan on finishing up my current round of edits for Jeffrey very soon.
RingCT: I have found a few promising papers on making range proofs smaller, and we have had some discussions about embedding more information in range proofs, like some encrypted messages and so on. I have a single paragraph written, somewhere, on the proofs in Shen's paper.
Threshold multisignatures and subaddresses: My contribution to this is temporarily on hold. Knaccc and kenshi are still working on subaddresses regularly. I'm going to be careful to say nothing more about this here.