Building on blockchains

Last semester I participated in a research seminar on applications of blockchain technology at Aalto University. The aim of the seminar was to see which fields could benefit from the merits of blockchains, and to gain some insight into how to design systems based on blockchain technologies. To that end we split into groups of two to work on selected topics, and I worked with Taneli (a fellow student at Aalto) to investigate whether it is feasible to create trackable supply chains using blockchain technologies.

Throughout the project I came to realize that a lot of the pillars on which the average blockchain project relies turn out to be quite shaky when you inspect them up close, especially in the scalability department. This has left me wondering how there can be so many blockchain-powered projects out there with huge investments backing them, and whether people actually realize where the limitations of current-day blockchain tech lie. Don’t get me wrong here: blockchains definitely hold a lot of potential, but adding some blockchain fairy dust to your project does not automatically dissolve all your problems and elevate you to some sort of Elysium. From what I’ve experienced so far, it is likely to add as many problems as it solves, and some of those are really challenging to overcome.

In this post I will try to shed some light on the problems we identified during our research. These problems do not affect every distributed application or every transaction out there; rather, they are properties of blockchain technologies in general. You certainly don’t need to take them into account for every application, but should you want to create the first blockchain Google, then some careful consideration is definitely recommended in these areas.

Consensus

Every blockchain system out there implements some distributed consensus protocol, so that nodes can agree on which block is the ‘true’ successor to the current end of the chain. Bitcoin uses a consensus model which relies on proof-of-work: a node has to spend computational effort before its block is accepted. While this is currently regarded as the ‘safe pick’ for consensus protocols (it helped make Bitcoin big after all!), it also wastes a lot of resources in achieving consensus.

There are some alternatives out there, such as proof-of-stake, where nodes prove that they have a stake in the system (and thus an interest in creating correct blocks) before they are allowed to create them. To my knowledge there is currently no system in use where proof-of-stake is used without at least a second consensus model supporting it, mainly because there are some caveats with the proof-of-stake approach that nobody has solved in an elegant way yet.

In my opinion, the most interesting alternative to proof-of-work so far is Stellar’s federated Byzantine agreement protocol, or SCP (Stellar Consensus Protocol) for short. This protocol allows nodes to choose which other nodes they trust, which, along with a healthy dose of black magic, enables federated consensus. It is especially interesting in that it (maybe?) creates the possibility of having different, slightly overlapping branches of blockchains that grow in parallel, where nodes decide which branch they’re following based on which other nodes they trust. Note: I’m not 100% sure at this moment whether this is actually possible, as I have not yet reached a full understanding of the inner workings of the SCP.
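To make the proof-of-work idea a bit more concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not Bitcoin's actual algorithm: the function names (`pow_hash`, `mine`, `verify`) are made up for this post, the "block" is just arbitrary bytes, and a single SHA-256 is used, whereas Bitcoin hashes a structured block header with double SHA-256 and adjusts the difficulty over time. What it does show is the asymmetry that makes proof-of-work useful and wasteful at the same time: finding a valid nonce takes many hash attempts, while checking one takes a single hash.

```python
import hashlib


def pow_hash(block_data: bytes, nonce: int) -> int:
    """Hash the block data together with a nonce and return it as an integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """A nonce is valid if the hash has at least `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    return pow_hash(block_data, nonce) < target


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Brute-force search for the smallest valid nonce (this is the 'work')."""
    nonce = 0
    while not verify(block_data, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    data = b"example block"  # hypothetical block contents
    nonce = mine(data, difficulty_bits=12)
    print("nonce found:", nonce, "valid:", verify(data, nonce, 12))
```

Each extra bit of difficulty doubles the expected number of attempts in `mine`, while `verify` stays a single hash. All of those discarded attempts are the resources that get burned to reach consensus.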

Scalability

I do not think it is feasible right now to create a scalable (to the size of, say, Visa, Netflix, and so forth) system that relies upon blockchains for its core functionality. The first hurdle here is performance: if you want to create a system that can handle a large number of transactions per second (tps), blockchains are not your best bet. Bitcoin itself is limited to roughly 7 tps. While this is the result of a manually imposed limit in the Bitcoin protocol (the maximum block size), it is not clear how changing this limit would affect Bitcoin as a whole. For example, one consequence is that blocks will take longer to propagate over the network, affecting the speed at which they are verified by nodes.

Other blockchain systems such as Ethereum are not (to my understanding) affected by this problem to the same degree, but unfortunately there are other problems that hamper scalability. In Ethereum’s case, I think the biggest limiting factor is the size of the blockchain: Ethereum allows you to store data structures in the blockchain. Now imagine there being two gazillion contracts, each storing some data in the chain with every invocation. Soon enough some nodes will not be able to store full copies of the chain, leading to other interesting problems (see the Ethereum white paper). Increasing the price for storing data in the chain might counter this, but at the same time it increases the cost of operating on the Ethereum chain, which is not exactly something that stimulates adoption. In my eyes that is more of a mitigation than a solution.

Every other blockchain-based project I looked at has similar scalability issues. One interesting mention is BigchainDB, which aims to create a distributed database system where there is no central authority involved. BigchainDB builds upon an existing distributed database and then adds blocks and a consensus mechanism on top. The problem with this project (apart from the fact that all nodes actually participating in a DB need to have quite some resources at their disposal) is that to achieve its advertised sub-second performance, it assumes that all nodes are in the same data center. That’s not really practical for a big open system like Ethereum, and in my eyes it seems fit for only a limited number of applications. Nonetheless, it is an interesting project.

The problem of scalability is widely acknowledged as an important one, and many extremely smart people are working on solutions, but as of yet I do not think it has been solved to the degree necessary for large-scale blockchain applications to be feasible.
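The 7 tps figure for Bitcoin mentioned above can be reproduced with some back-of-the-envelope arithmetic. The sketch below assumes a 1 MB block size limit and a 10 minute block interval (Bitcoin's parameters at the time of writing) and an average transaction size of 250 bytes. That last number is my own assumption for illustration; real transactions vary quite a bit in size. The function names are made up for this post.

```python
def max_tps(block_size_bytes: int, avg_tx_bytes: int, block_interval_s: float) -> float:
    """Upper bound on transactions per second for a fixed block size and interval."""
    tx_per_block = block_size_bytes // avg_tx_bytes
    return tx_per_block / block_interval_s


def chain_growth_gb_per_year(block_size_bytes: int, block_interval_s: float) -> float:
    """Yearly chain growth in (decimal) gigabytes if every block is completely full."""
    seconds_per_year = 365 * 24 * 3600
    blocks_per_year = seconds_per_year / block_interval_s
    return blocks_per_year * block_size_bytes / 1e9


if __name__ == "__main__":
    # Assumed parameters: 1 MB blocks, 250-byte transactions, one block per 600 s.
    print(f"{max_tps(1_000_000, 250, 600):.2f} tps")                 # about 6.67
    print(f"{chain_growth_gb_per_year(1_000_000, 600):.2f} GB/year")  # about 52.56
```

The same numbers also show the trade-off: multiplying the block size by ten gives ten times the throughput on paper, but every full node then has to download, verify and store ten times as much data, which ties straight into the storage concerns.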

Long-term Storage

Long-term storage? Isn’t that related to scalability? Well, yes. But there are some extra things to take into account here. If the underlying blockchain system has some mechanism in place where nodes are not required to store full copies of the blockchain (so-called ‘light nodes’), then this might mean that data stored in the blockchain is only served by a limited number of nodes, or in an extreme case not at all (although that would mean not a single node on earth maintaining a full copy of the chain; quite far-fetched, I know)! When there are only a few nodes serving specific data that is stored in the chain, this opens up new possibilities for attacking the blockchain system, e.g. by constantly harassing those nodes with some form of DoS attack, or by having them hold this unique data hostage in some other way. I think there is an opportunity here to build a blockchain system on top of a content-addressed network (e.g. IPFS), which would make this situation considerably less likely, and could possibly increase performance as well, since popular data is likely to be stored somewhere geographically close to you. It could also possibly lead to a cost decrease for storing data in systems like Ethereum.
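To illustrate why content addressing helps here, below is a rough sketch in Python. It is not how IPFS works internally (IPFS splits data into chunks and uses its own identifier format); `ContentStore` and `fetch` are hypothetical names, and each "peer" is just an in-memory dictionary. The point is only that when data is addressed by its hash, the chain needs to store nothing more than that hash, any peer holding a copy can serve the data, and the receiver can check for itself that it got the right bytes.

```python
import hashlib


class ContentStore:
    """Toy in-memory peer that stores blobs under the SHA-256 hash of their content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]  # raises KeyError if this peer lacks the blob
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


def fetch(address: str, peers) -> bytes:
    """Try each peer in turn; skip peers that lack the data or serve corrupted data."""
    for peer in peers:
        try:
            return peer.get(address)
        except (KeyError, ValueError):
            continue
    raise LookupError(f"no peer could serve {address}")


if __name__ == "__main__":
    peer_a, peer_b = ContentStore(), ContentStore()
    address = peer_a.put(b"contract data")
    peer_b.put(b"contract data")                 # a second, independent copy
    peer_a._blobs[address] = b"tampered"         # simulate a misbehaving peer
    print(fetch(address, [peer_a, peer_b]))      # still served, by peer_b
```

Because the address is derived from the content, a node holding the data hostage or serving garbage can simply be bypassed as long as one honest copy exists somewhere.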

There are still some topics missing from this list, and I might elaborate on those in a future post, but I think this is enough blockchain skepticism for today. The point I’ve been trying to get across with this post is that I don’t see blockchain-based applications becoming widespread before these issues are solved to a satisfactory degree, and that there is still a long way to go before this field can be considered mature.