Are you a blockchain developer looking for solutions to blockchain scaling and performance issues? You may be searching for information on blockchain sharding. In this article, I explain what blockchain sharding is.

The decentralized blockchain and its value

The promising blockchain technology has taken the world by storm thanks to its two central promises:

  1. Decentralization;
  2. Immutable record.

Decentralization enables the creation of entirely new business models, for example:

  1. The Bitcoin decentralized payment network operates outside the control of governments and central banks, and people can send Bitcoin payments to each other over a ‘Peer-to-Peer’ (P2P) network.
  2. Many blockchain and crypto projects have built their crypto tokens on the Ethereum blockchain platform, and they intend to disrupt the centralized economy. For example, Storj is a decentralized cloud storage network that could one day disrupt cloud computing giants like Amazon, Google, Microsoft, and IBM.

The immutable record in a blockchain assures people that their transaction records are tamper-proof, and this generates trust in the system. However, in this article, I will focus on the decentralization aspect of this technology.

How is a decentralized blockchain implemented?

The main concept behind a decentralized blockchain is a P2P network. ‘Nodes’ (more precisely, ‘full nodes’), i.e. the computers on this network, each hold a copy of the entire blockchain, so each node is a ledger of all transactions. This is why blockchain is also called ‘Distributed Ledger Technology’.

As you can see, there is no central administrator in this P2P network, so no one can censor transactions or act as an intermediary. Blockchain technology therefore eliminates middlemen. This enables peer-to-peer transactions, which made many new business models possible after the Ethereum project popularized the ‘Smart Contract’ concept.

Further, consider the advantage the network has against hackers. Hackers enjoy an advantage when they can exploit a ‘Single Point of Failure’. A centralized server is a favorite target of hackers. However, in the blockchain, there are many nodes, and all have the entire ledger of transactions!

Even if hackers take over one node, there are always other nodes, and hackers can’t manage to hijack all of them! Also, in a large distributed network, it is extremely hard for hackers to stage a ‘51% attack’. Such an attack involves capturing a majority of the computing power in the network. How many computers would hackers have to overpower?
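Why a majority matters can be sketched with the simple estimate from Section 11 of the Bitcoin whitepaper: if an attacker controls a fraction q of the hash power and honest nodes control p = 1 - q, the chance that the attacker ever catches up from z blocks behind is 1 when q ≥ p, and (q/p)^z otherwise. The function name below is hypothetical, and the whitepaper goes on to refine this estimate with a Poisson model, so treat it as a rough illustration.

```python
def attacker_success_probability(q: float, z: int) -> float:
    """Chance that an attacker with share q of the hash power ever
    catches up from z blocks behind (simple gambler's-ruin estimate)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    p = 1.0 - q  # honest share of the hash power
    if q >= p:
        return 1.0  # a majority attacker eventually catches up
    return (q / p) ** z


for share in (0.1, 0.3, 0.45, 0.51):
    print(share, attacker_success_probability(share, z=6))
```

With 30% of the hash power and a 6-block lead for the honest chain, the estimate is (0.3/0.7)^6, roughly 0.6%; at 51% it jumps to certainty, which is why the attack hinges on capturing a majority.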

Decentralization makes blockchain very secure. Cryptographic hash functions, public/private key cryptography, and the consensus algorithm add to the security.

Large public permissionless decentralized blockchains such as Bitcoin have proven extremely hard to hack directly. Most of the cryptocurrency hacking incidents you hear about are instances of hackers attacking the centralized servers of crypto exchanges.

Even the DAO hack on Ethereum didn’t target the blockchain network itself. It exploited a loophole in the smart contract code of The DAO, which ran on top of the Ethereum blockchain. Read more about it in “Beginner’s Guide: What is Ethereum Classic?”.

The costs of the blockchain decentralized network

Before I can explain what blockchain sharding is, I need to explain the context in which the idea appeared in the minds of blockchain developers. You have seen the advantages of the decentralized blockchain network; however, it also has a cost.

The most famous blockchain networks, such as Bitcoin and Ethereum (before its switch to PoS), use a consensus algorithm called ‘Proof of Work’ (PoW). It requires all nodes to participate in the transaction validation process. Read more about it in “PoW Vs. PoS: A Comparison Between Two Blockchain Consensus Algorithms”.
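To make the ‘work’ concrete, here is a toy sketch of a PoW puzzle, assuming a simplified block that is just a string: the miner searches for a nonce whose SHA-256 hash starts with a given number of zero hex digits, while any other node can check the result with a single hash. Real Bitcoin mining applies SHA-256 twice to an 80-byte block header and compares the result against a numeric target, so `mine` and `verify` are hypothetical illustrations, not a client implementation.

```python
import hashlib


def mine(block_data: str, difficulty: int) -> tuple[int, str]:
    """Toy Proof of Work: find the smallest nonce whose SHA-256 digest of
    block_data + nonce starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking the work takes one hash, however long mining took."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    nonce, digest = mine("alice->bob:5", difficulty=4)
    print(nonce, digest)
    print(verify("alice->bob:5", nonce, difficulty=4))  # True
```

Each extra zero digit of `difficulty` multiplies the expected number of attempts by 16, which is how the cost of producing a block is tuned while verification stays cheap.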

This requires every node to process all transaction validation requests, hence every node must store all transactions. Bitcoin, Ethereum, and similarly popular blockchain networks are growing every day, with more users and transactions. This means that nodes will have to store a continuously growing number of transactions.

When a new user runs a full Bitcoin node, the ‘Initial Block Download’ (IBD) can take several days! Read this Bitcoin StackExchange discussion thread to see how time-consuming this operation is.

Also, blocks in these networks are validated sequentially: each block builds on the state left by the previous one, so the validation of multiple blocks can’t go on simultaneously. Since every node must participate in the validation, the blockchain network will only be as fast as the slowest node!
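The sequential dependency can be illustrated with a simplified account-balance ledger (an assumption for illustration: Bitcoin actually tracks unspent transaction outputs, and real clients do much more checking). A block is valid only against the balances produced by the blocks before it, so block 2 below cannot be checked until block 1 has been applied. The names `apply_block` and `validate_chain` are hypothetical.

```python
Transfer = tuple[str, str, int]  # (sender, receiver, amount)


def apply_block(balances: dict[str, int], block: list[Transfer]) -> dict[str, int]:
    """Return the balances after applying one block, or raise ValueError
    if any transfer spends more than the sender holds at that point."""
    new_balances = dict(balances)
    for sender, receiver, amount in block:
        if amount <= 0 or new_balances.get(sender, 0) < amount:
            raise ValueError(f"invalid transfer {sender}->{receiver}: {amount}")
        new_balances[sender] -= amount
        new_balances[receiver] = new_balances.get(receiver, 0) + amount
    return new_balances


def validate_chain(genesis: dict[str, int], blocks: list[list[Transfer]]) -> dict[str, int]:
    """Validate blocks strictly in order: block i is checked against the
    state left behind by block i-1, so blocks can't be validated
    independently of each other."""
    state = genesis
    for block in blocks:
        state = apply_block(state, block)
    return state


genesis = {"alice": 10}
blocks = [
    [("alice", "bob", 7)],  # bob holds nothing before this block
    [("bob", "carol", 5)],  # valid only because block 1 funded bob
]
print(validate_chain(genesis, blocks))  # {'alice': 3, 'bob': 2, 'carol': 5}
```

Validating the same two blocks in reverse order fails, because bob has no funds yet when his transfer is checked.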

While requiring every node to store all transactions secures public blockchain networks, it also makes them less scalable. This issue led blockchain developers to look for alternatives.

Database sharding gave rise to the concept of blockchain sharding

The concept of sharding originated in database management, and the word ‘Shard’ means ‘a small part of a whole’. Sharding is the partitioning of a large database into smaller parts, which can be stored on different server instances.

There are indexing mechanisms for shards, and depending on the database query, the system fetches the data from the appropriate ‘shard’. It makes databases more performant and scalable. Read more about database sharding in this TechTarget definition of sharding.
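A minimal sketch of how a query can be routed to the right shard, assuming hash-based partitioning on a string key (one common scheme; range-based partitioning is another). `ShardedStore` is a hypothetical in-memory stand-in for separate database servers.

```python
import hashlib


class ShardedStore:
    """Toy hash-based sharding: each key lives in exactly one of
    `num_shards` partitions, chosen by hashing the key."""

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self.num_shards = num_shards
        # Each dict stands in for a separate database server.
        self.shards: list[dict[str, object]] = [{} for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # A stable hash: Python's built-in hash() for str is randomized per process.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def put(self, key: str, value: object) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str) -> object:
        # Only the single shard that can hold the key is consulted.
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
store.put("user:42", {"name": "Alice"})
print(store.shard_for("user:42"), store.get("user:42"))
```

A design caveat: with a plain modulo rule, changing the number of shards moves most keys to a different shard, which is why production systems often use consistent hashing or range partitioning instead.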

So, what is blockchain sharding? Closely following the database sharding concept, the blockchain database is divided into horizontal partitions (shards). One group of nodes maintains one shard, while another group of nodes maintains another.

This eliminates the need for every node to store the entire blockchain database. With this arrangement, even slower nodes can operate faster, since they don’t need to load the entire ledger. This improves the network’s scalability.
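The storage saving can be sketched as follows, assuming two hypothetical rules: transactions go to a shard by hashing their ID, and nodes are split round-robin into one group per shard. Real sharded designs use their own assignment rules and need extra machinery for transactions that touch more than one shard, which this sketch ignores.

```python
import hashlib


def shard_of(tx_id: str, num_shards: int) -> int:
    """Hypothetical rule: map a transaction to a shard by hashing its ID."""
    digest = hashlib.sha256(tx_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def assign_nodes(nodes: list[str], num_shards: int) -> dict[int, list[str]]:
    """Hypothetical rule: split nodes round-robin into one group per shard."""
    groups: dict[int, list[str]] = {s: [] for s in range(num_shards)}
    for i, node in enumerate(nodes):
        groups[i % num_shards].append(node)
    return groups


def store_transactions(tx_ids: list[str], nodes: list[str],
                       num_shards: int) -> dict[str, set[str]]:
    """Return, for every node, the set of transactions it keeps.
    With num_shards=1 this is full replication: everyone stores everything."""
    groups = assign_nodes(nodes, num_shards)
    stored: dict[str, set[str]] = {node: set() for node in nodes}
    for tx in tx_ids:
        for node in groups[shard_of(tx, num_shards)]:
            stored[node].add(tx)
    return stored


txs = [f"tx{i}" for i in range(10_000)]
nodes = [f"node{i}" for i in range(8)]
full = store_transactions(txs, nodes, num_shards=1)
sharded = store_transactions(txs, nodes, num_shards=4)
print(max(len(s) for s in full.values()))     # 10000: every node has everything
print(max(len(s) for s in sharded.values()))  # roughly a quarter of that
```

With 8 nodes and 4 shards, every transaction is still held by two nodes, but each node stores only about a quarter of the ledger.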

Sharding requires a different blockchain consensus mechanism

By now, you can see that if you implement blockchain sharding, nodes can no longer see the entire blockchain database. How would the PoW consensus algorithm work then? It requires all nodes to participate in transaction validation, and now nodes can’t even see the entire blockchain ledger!

Blockchain sharding therefore calls for a different consensus algorithm, and the one most commonly proposed is ‘Proof of Stake’ (PoS). In this algorithm, some nodes stake their own crypto tokens and take on transaction validation responsibility.

The more tokens a node stakes (and, in some PoS designs, the longer it keeps them staked), the more likely that node is to be given transaction validation responsibility. We call these nodes ‘Stakers’.
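A minimal sketch of stake-weighted selection, assuming each staker is described by a hypothetical (amount, days staked) pair. By default the weight is simply the staked amount; the optional ‘coin age’ variant multiplies amount by time staked to reflect the duration rule mentioned above. This illustrates the idea only, not any particular network’s selection algorithm.

```python
import random
from collections import Counter


def stake_weight(amount: float, days_staked: float, use_coin_age: bool = False) -> float:
    """Weight of a staker: the staked amount, or amount x days for the
    coin-age variant (both rules are illustrative assumptions)."""
    return amount * days_staked if use_coin_age else amount


def pick_validator(stakers: dict[str, tuple[float, float]], rng: random.Random,
                   use_coin_age: bool = False) -> str:
    """Pick one staker with probability proportional to its weight."""
    names = list(stakers)
    weights = [stake_weight(amount, days, use_coin_age)
               for amount, days in stakers.values()]
    return rng.choices(names, weights=weights, k=1)[0]


stakers = {"alice": (900.0, 10.0), "bob": (100.0, 10.0), "carol": (100.0, 90.0)}
rng = random.Random(42)  # fixed seed so the demo is repeatable
print(Counter(pick_validator(stakers, rng) for _ in range(10_000)))
print(Counter(pick_validator(stakers, rng, use_coin_age=True) for _ in range(10_000)))
```

In the first run alice wins most draws because she staked the most; in the coin-age run carol, who staked little but for a long time, catches up with her.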

Since sharding rules out the PoW approach of having every node validate every transaction, the network must identify ‘Stakers’ for each shard who will validate its transactions. This is why sharded blockchain designs generally rely on the PoS algorithm.

Disadvantages of blockchain sharding

A discussion of what blockchain sharding is would be incomplete without its disadvantages. Keep in mind that even database sharding isn’t easy to get right!

You need skilled database experts on your project team who can plan a sound indexing strategy for your database shards. In a similar way, although the details differ, you need to plan the sharding of your blockchain ledger carefully.

You may also occasionally hear that sharding improves the scalability of a blockchain network at the cost of security. However, keep in mind what blockchain sharding is: just a partitioning technique. By itself, partitioning a database doesn’t make it less secure.

It’s actually the PoS algorithm that offers less decentralized security, not sharding. If a hacker buys a lot of crypto tokens and stakes them, he is likely to be selected frequently as a staker. He can then manipulate transactions.

However, natural economic dynamics provide some insurance against this. Anyone buying a very large number of crypto tokens will attract a lot of attention, and the buying will drive up the price. So, besides being in the spotlight, the attacker will have to spend ever more money to acquire enough stake to manipulate transactions.

Besides, the ‘Casper’ protocol proposed for Ethereum’s transition to the PoS algorithm assigns stakers in a randomized manner. This reduces the probability of a malicious staker manipulating transactions. The Casper protocol also proposes locking up the staked amount and confiscating it from malicious stakers, who will never get a chance to stake again.
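The two ideas in the paragraph above can be sketched in a toy model: stakers are shuffled with a shared seed and dealt out to shard committees, so an attacker can’t choose which shard his stakers land in, and a slashed staker loses his stake and is barred from staking again. This mirrors the article’s description rather than Casper’s actual rules; `ValidatorSet` is hypothetical, and the integer seed stands in for a randomness source that no single staker can bias (real protocols need a dedicated mechanism for this).

```python
import random


class ValidatorSet:
    """Toy model: random assignment of stakers to shard committees,
    plus slashing (confiscating the stake and banning the staker)."""

    def __init__(self, stakes: dict[str, float]) -> None:
        self.stakes = dict(stakes)     # staker -> locked stake
        self.banned: set[str] = set()  # slashed stakers

    def deposit(self, staker: str, amount: float) -> None:
        if staker in self.banned:
            raise PermissionError(f"{staker} was slashed and may not stake again")
        self.stakes[staker] = self.stakes.get(staker, 0.0) + amount

    def assign_committees(self, num_shards: int, seed: int) -> dict[int, list[str]]:
        """Shuffle stakers with a shared seed and deal them out to shards."""
        stakers = sorted(self.stakes)  # fixed starting order so every node agrees
        random.Random(seed).shuffle(stakers)
        committees: dict[int, list[str]] = {s: [] for s in range(num_shards)}
        for i, staker in enumerate(stakers):
            committees[i % num_shards].append(staker)
        return committees

    def slash(self, staker: str) -> float:
        """Confiscate the whole stake and bar the staker from staking again."""
        confiscated = self.stakes.pop(staker, 0.0)
        self.banned.add(staker)
        return confiscated


validators = ValidatorSet({f"v{i}": 100.0 for i in range(12)})
print(validators.assign_committees(num_shards=3, seed=2024))
print(validators.slash("v3"))  # 100.0
print(validators.assign_committees(num_shards=3, seed=2024))  # v3 is gone
```

Because every honest node derives the same committees from the same seed, they agree on who validates which shard without a central coordinator.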

Blockchain sharding is a relatively new concept. The SHARD Coin project uses it. We need to see how the technology evolves and whether it adds sustainable value to the scalability and performance of blockchains.