Cassandra cluster management with docker-compose

Erol Kaya
Jul 31, 2019

I would like to describe how we can create a Cassandra cluster locally.

We will refer to this document:

First, make sure your Docker advanced settings look like this:

CPUs: 4 | Memory: 4 GB (or more)

Otherwise, you may encounter this problem:

You may also encounter the following error:

failed to create network cassandra-cluster_dc1ring: Error response from daemon: Pool overlaps with other one on this address space

This means the subnet is already in use by another Docker network. Networks persist even after their containers are stopped, so run the following command to remove the unused ones:

$ docker network prune

Now, we are ready to create our docker-compose.yml file.

Once we have saved the docker-compose.yml file into a local directory, say ~/cassandra-cluster, we change to that directory in the terminal and run:

$ cd ~/cassandra-cluster
$ docker compose -f docker-compose.yml up

This pulls the DataStax Cassandra image from Docker Hub, then creates and starts the containers (3 nodes with the properties specified in the file).

We can check the status of the created containers with the docker ps commands:

$ docker ps
This lists all running containers. There should be three.
$ docker ps -a
This lists all containers, including stopped ones. If a container is not running for some reason, take the container ID of the "Exited" one and start it manually with docker start <container-id>
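The manual check above can also be scripted. Below is a minimal Python sketch, assuming docker is on the PATH and that docker ps is asked to print one "name<TAB>status" pair per line through its --format option. The function names exited_containers and restart_exited are hypothetical helpers introduced for illustration, and the sample text is made up, not real output from this cluster.

```python
import subprocess


def exited_containers(ps_output):
    """Return the names of containers whose status starts with 'Exited'.

    Expects one container per line, name and status separated by a tab.
    """
    names = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if status.startswith("Exited"):
            names.append(name)
    return names


def restart_exited():
    """List all containers with 'docker ps -a' and start the exited ones.

    Note: this restarts every exited container on the machine,
    not only the Cassandra nodes.
    """
    ps = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    for name in exited_containers(ps.stdout):
        subprocess.run(["docker", "start", name], check=True)


# Illustrative input only; real status strings will differ.
sample = "DSE-6_node1\tUp 2 minutes\nDSE-6_node2\tExited (137) 5 seconds ago\n"
print(exited_containers(sample))  # ['DSE-6_node2']
```

Calling restart_exited() performs the same steps as the manual docker ps -a followed by docker start; the parsing is kept in a separate function so it can be tried without a running Docker daemon.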

We can open a shell on each Cassandra node with these commands:

$ docker exec -it DSE-6_node1 bash
$ docker exec -it DSE-6_node2 bash
$ docker exec -it DSE-6_node3 bash

Once we run the command above for one node (e.g. node1), we are inside that Cassandra node and can start using the nodetool utility. nodetool is a command-line interface for managing a cluster. It has many useful commands, and you can see their details with its help command.

$ nodetool help
Take a quick look at all the commands available in nodetool. We will focus on some of them.

If everything went well, we should see our cluster with 3 nodes when we run the nodetool status command.

$ nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID    Rack
UN  172.30.0.4  100.74 KiB  3       100.0%            some uuid  RAC1
UN  172.30.0.2  118.98 KiB  3       100.0%            some uuid  RAC1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID    Rack
UN  172.30.0.3  162.29 KiB  3       77.5%             some uuid  RAC1
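In the first column, the first letter is the status (U for Up, D for Down) and the second is the state (N for Normal, L for Leaving, J for Joining, M for Moving), so UN means the node is up and normal. If we want to check this from a script rather than by eye, a rough Python sketch could look like the following. It assumes the status text has the layout shown in this post; nodes_by_datacenter and all_up_and_normal are hypothetical helper names, and the host IDs are placeholders.

```python
import re

SAMPLE = """\
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID    Rack
UN  172.30.0.4  100.74 KiB  3       100.0%            some-uuid  RAC1
UN  172.30.0.2  118.98 KiB  3       100.0%            some-uuid  RAC1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID    Rack
UN  172.30.0.3  162.29 KiB  3       77.5%             some-uuid  RAC1
"""

# A node line starts with a two-letter code: status (U/D) then state (N/L/J/M).
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def nodes_by_datacenter(status_text):
    """Map each datacenter name to a list of (state_code, address) pairs."""
    result = {}
    current_dc = None
    for line in status_text.splitlines():
        if line.startswith("Datacenter:"):
            current_dc = line.split(":", 1)[1].strip()
            result[current_dc] = []
            continue
        match = NODE_LINE.match(line)
        if match and current_dc is not None:
            result[current_dc].append((match.group(1), match.group(2)))
    return result


def all_up_and_normal(status_text, expected_nodes):
    """True if exactly expected_nodes nodes are listed and all are 'UN'."""
    nodes = [n for dc in nodes_by_datacenter(status_text).values() for n in dc]
    return len(nodes) == expected_nodes and all(code == "UN" for code, _ in nodes)


print(all_up_and_normal(SAMPLE, 3))  # True
```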

Now it is time to create a keyspace and a table in it. To do so, we start by running the cqlsh command. cqlsh is the CQL (Cassandra Query Language) shell through which users communicate with Cassandra.

$ cqlsh

You can list the keyspaces in the cluster with the DESC command:

cqlsh> DESC keyspaces;

If you want to create a new keyspace for your project (let’s say a music store project), here is how it can be done:

cqlsh> CREATE KEYSPACE musicDb WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : '3'};

Notice that creating a keyspace requires replication details: you need to specify the replication strategy and the replication factor. We chose SimpleStrategy as the strategy and 3 as the replication factor, so in our 3-node cluster every node holds a copy of each row. I will explain the details of replication strategies in Cassandra in another post.

Let’s select the keyspace we just created.

cqlsh> USE musicDb;

And let’s create a table in this keyspace:

cqlsh> CREATE TABLE musics_by_genre (
    genre VARCHAR,
    performer VARCHAR,
    year INT,
    title VARCHAR,
    PRIMARY KEY ((genre), performer, year, title)
) WITH CLUSTERING ORDER BY (performer ASC, year DESC, title ASC);
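In this definition, genre is the partition key, and performer, year and title are clustering columns that determine the order of rows inside each partition. The Python sketch below only imitates that ordering in memory to make it concrete; it is not how Cassandra stores data internally. The function name cluster_rows and the sample rows are made up for illustration.

```python
from collections import defaultdict


def cluster_rows(rows):
    """Group rows by genre and order each group like the table's
    CLUSTERING ORDER BY (performer ASC, year DESC, title ASC)."""
    partitions = defaultdict(list)
    for genre, performer, year, title in rows:
        partitions[genre].append((performer, year, title))
    for clustered in partitions.values():
        # Negating the year gives descending order for that column only.
        clustered.sort(key=lambda row: (row[0], -row[1], row[2]))
    return dict(partitions)


# Placeholder data, not real records.
rows = [
    ("Rock", "Performer B", 1990, "Title 1"),
    ("Rock", "Performer A", 1991, "Title 2"),
    ("Rock", "Performer A", 1995, "Title 1"),
    ("Jazz", "Performer C", 1959, "Title 1"),
]

for row in cluster_rows(rows)["Rock"]:
    print(row)
```

For the Rock partition, this prints Performer A's 1995 row first, then Performer A's 1991 row, then Performer B's row, which is the order a query on genre='Rock' would be expected to return them in.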

We can check the details of this table with the DESC command:

cqlsh> DESC TABLE musics_by_genre;

Finally, let’s insert some data into this table:

cqlsh> INSERT INTO musics_by_genre (genre, performer, year, title) VALUES ('Rock', 'Nirvana', 1991, 'Smells Like Teen Spirit');

Whenever you execute a CRUD operation, you can set the consistency level for that operation. If one of your nodes is down and the consistency level is ALL, the Cassandra coordinator cannot return a result to the client. However, if you set the consistency level to ONE, you will still get a result even if 2 of the 3 nodes are down. We can verify this in our local cluster with the following steps.

First, open another tab in your terminal, then run:

$ docker ps -aq
<container-id1>
<container-id2>
<container-id3>
$ docker stop <container-id1>
$ docker stop <container-id2>
$ docker ps
There will be only one container (Cassandra node) running now.

Open a shell on the node that is still running:

$ docker exec -it <container-name3> bash
I assume the running container's name is DSE-6_node3.

Then run cqlsh and switch to the keyspace we created earlier:

$ cqlsh
cqlsh> USE musicDb;

Now we are in the musicDb keyspace. Let’s first set consistency level to ALL.

cqlsh> CONSISTENCY ALL;

And run the select statement on the table we created before:

cqlsh> SELECT * FROM musics_by_genre WHERE genre='Rock';
NoHostAvailable:

The query fails and no rows are returned. Now set the consistency level to ONE:

cqlsh> CONSISTENCY ONE;
cqlsh> SELECT * FROM musics_by_genre WHERE genre='Rock';

Now you should be able to see this result:

 genre | performer | year | title
-------+-----------+------+-------------------------
  Rock |   Nirvana | 1991 | Smells Like Teen Spirit
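The two outcomes above come down to simple arithmetic: a consistency level needs a certain number of replicas to respond, and only one replica was alive. Here is a small Python sketch of that reasoning, assuming a single keyspace with replication factor 3. It is a simplification that covers only a few levels (the datacenter-aware ones such as LOCAL_QUORUM are left out), and the function names are hypothetical.

```python
def required_replicas(level, replication_factor):
    """Number of replicas that must respond for a consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        # A majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def can_serve(level, replication_factor, live_replicas):
    """True if enough replicas are alive to satisfy the level."""
    return live_replicas >= required_replicas(level, replication_factor)


# Our experiment: replication factor 3, only one node still running.
for level in ("ALL", "QUORUM", "ONE"):
    print(level, can_serve(level, replication_factor=3, live_replicas=1))
```

With one live replica out of three, ALL and QUORUM cannot be satisfied while ONE can, which matches what we saw in cqlsh.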
