Apache Cassandra
Apache Cassandra
Apache Cassandra
Contents
1. Introduction to NoSQL systems, Extensible Record Stores and Amazon’s Dynamo
+ Google Bigtable
5. How is it used and installed, requirements and in what platforms does it run on
6. Demo
7. References
1.
Background
NoSQL, Extensible Record Stores, Cassandra’s Parents
NoSQL Systems
NoSQL or Not-Only-SQL systems: Next Generation Databases. The initial movement started in 2009
with the goal of creating modern, web-scale DBs. Currently, they exist more than 225 NoSQL
systems.
• Distributed
• Open Source
Extensible Record Stores (or Wide Column Stores)
• Motivated by Google’s Big Table.
• Basic Scalability Model: Rows and Columns are splitted into nodes.
How it works?
Allows read and write operations to continue even during network partitions and resolves update
conflicts using different conflict resolution mechanisms.
Consistent Hashing, Vector Clocks (not in Cassandra), Gossip Protocol, Hinted Handoff, Read Repair
Cassandra’s Parents - Google Bigtable
What is it?
A high performance data storage system built on Google File System and other Google
technologies.
How it works?
Provides both structure and data distribution but relies on a distributed file system for
durability.
Richer data model from Dynamo. One key, many values. Fast sequential access.
• Cloud availability
Installations in cloud environments
• Peer to peer architecture
Cassandra follows a peer-to-peer architecture, instead of master-slave architecture
• Flexible data model
Supports modern data types with fast writes and reads
• Fault tolerance
Nodes that fail can easily be restored or replaced
• High Performance
Cassandra has demonstrated brilliant performance under large sets of data
Strengths (2)
• ColumnFamily Store
Cassandra stores columns based on the column names, leading to very quick slicing
• Tunable consistency
Support for strong or eventual data consistency across a widely distributed cluster
• Schema-free/Schema-less
In Cassandra, columns can be created at your will within the rows. Cassandra data
model is also famously known as a schema-optional data model
• AP-CAP
Cassandra is typically classified as an AP system, meaning that availability and
partition tolerance are generally considered to be more important than consistency in
Cassandra
CAP and Cassandra
Variable number of columns per row
Weaknesses
Use Cases where is better to avoid using Cassandra
• If there are too many joins required to retrieve the data
• To store configuration data
• During compaction, things slow down and throughput degrades
• Basic things like aggregation operators are not supported
• Range queries on partition key are not supported
• If there are transactional data which require 100% consistency
• Cassandra can update and delete data but it is not designed to do so
Business Insider
“The basic problem Cassandra solved is that when you have a lot of
data sitting on a lot of servers, as Facebook does, you end up with a
house of cards. A single server going down can collapse the whole
stack.”
Cassandra compared to other
NoSQL Systems
Read & Write latency for workload Read/Write
Throughput for workload Read/Write &
Read/Scan/Write
Insert-mostly Workload
Mixed Operational & Analytical Workload
Read-Modify-Write Workload
Balanced Read/Write Mix
Read-mostly Workload
Load Process
VLDB Benchmark (RWS)
Differences between Cassandra and RDBMS
RDBMS Cassandra
relational database keyspace
rows which do not include a particular for each row, only the columns with a value
column value → NULL (in that position) are stored
I. Product Catalog/Playlist
V. Fraud Detection
In other words, applications that need to...
• store and handle time-series data (most common use case)
• scale predictably
• be continuously available
• Most famous:
Cassandra Summit
• Organized by DataStax for 7 consecutive years (in both US and Europe).
• Node
CASE STUDIES
Category: Messaging
Facebook Inbox Search - Requirements
“The system was required to handle a very high write throughput,
billions of writes per day, and also scale with the number of users”
“Since users are served from data centres that are geographically
distributed, being able to replicate data across data centres was key
to keep search latencies down”
• Lakshman, Malik
Facebook Inbox Search
The reason why Cassandra was initially built.
Performance:
Facebook abandoned Cassandra for the Inbox at late 2010
But…
• data size was growing too quickly
So… Cassandra
Instagram Fraud Detection (1)
• Started with 3 nodes and very soon they had grown to a 12 node
cluster.
“Implementing Cassandra cut our costs to the point where we were paying
around a quarter of what we were paying before. Not only that, but it also
freed us to just throw data at the cluster because it was much more scalable
and we could add nodes whenever needed.”
Previously in Redis, with the same (memory) limitations as in the Fraud Detection
case.
• 20.000 writes/sec.
• 15.000 reads/sec.
Category: Sensors and IoT
i2O Water
Description: i2O Water helps utility companies operate more
efficiently through the use of IoT aiming at solving the water crisis.
Challenges:
• Massive volumes of time-series data (>1.5 TB and growing)
Why?
I. performance (50-60.000 writes and 20-40.000 reads/sec instead of 0.5 writes/sec and 5
reads/sec with SQL Server)
Challenges:
• postgreSQL (previously used) and generally RDBMSs cannot
deliver 100% availability
Why?
I. high availability (due to masterclass architecture)
II. stores data for the entire product catalog and key customer
experience capabilities
IV. integration with Apache Spark for real time processing and
analytics
Spotify (2)
Results:
I. 40.000 requests/sec. handled successfully and on-time
III. >1.5 bn playlists created from 40m active users and managed in
real time
Spotify - Data Centres (2 in the US - 2 in Europe)
Data Data
Centre Centre
Data
Centre Data
Centre
Category: Recommendation/Personalization Engine
Netflix
Description: Netflix is the world’s leading internet television network with
more than 48 million users in 40 countries.
Challenges:
• Oracle database (was used until 2010) was approaching its limits
on traffic and capacity
Why?
I. persistent datastore, 100% uptime and cost-effective scalability
Challenges:
• MySQL (previously used for class interaction) was insufficient:
• unstable performance,
• unexpected downtime,
• limitation in introducing new features
Coursera (1)
Solution: After evaluating emerging database technologies, it chose
Cassandra (DataStax).
Why?
• 100% application uptime needed (customers from all over the
world)
• Scalability (enabling storage of growing user data)
Coursera (2)
Results:
I. 3 nodes on AWS in the US East region and plans to expand to multiple data
centers across different regions
“High availability with reliable performance is a big win for us. With Datastax Enterprise,
our customers around the world are able to take any course, anytime through our on-
demand model.”
• Daniel Chia, Software Engineer at Coursera
Coursera (3)
Coursera (4)
Coursera (5)
Coursera (6)
Coursera (7)
Category: Messaging
The Weather Channel
Description: The Weather Channel delivers breaking news to
countless viewers and users from web, desktop and mobile
applications.
Challenges:
• Customer experience in the center of attention (continuous
availability, global and diverse users)
Why?
I. linear scalability
• Memtable
• Bloom Filter
• Index File
Key Terms
• Gossip protocol: helps each node learn about the topology of the cluster
(communication and detection of faulty nodes).
Used in:
• BigTable, HBase, MongoDB, SQLite, RocksDB, InfluxDB
Data Model
• Each Row → Identified by a Unique Key (Primary Key)
• Keyspace → Outermost container for data (one or more column families)
• Column Family → Contains Supercolumns or Columns (but not both)
• Column → Basic data structures with: key, value, timestamp
• Supercolumn → Special column, stores a map of sub-columns. Columns that
you are likely to query together should be placed in the same column family.
• Columns could be of variable number per key. For instance, key K1 could
have 1024 columns/supercolumns while K2 could have 64
columns/supercolumns
Data Model (1)
• Partition key: The first column declared in the primary key. Determines which node stores the
data.
• Clustering Columns: The remaining fields of the primary key, which determine the ordering of the
data in the disk.
• Any column within a column family is accessed using the convention: column_family: column
• Time sorting: exploited by applications such as FB Inbox Search where the results are always
displayed in time sorted order.
Data Model (2)
Data Model (3)
Relational Schema vs Cassandra
SYSTEM ARCHITECTURE
Introduction
The architecture of a storage system that needs to operate in a production setting is complex.
I. Partitioning
II. Replication
III. Membership
V. Scaling
How?
• Dynamically partition the data over the set of nodes in the cluster.
Addressed by:
Analysing load information on the ring and having lightly loaded nodes move
on the ring to alleviate heavily loaded ones.
Partitioning (3)
Node: Storage layer within a server
Before:
● 1 server/machine (machine: physical server or EC2 instance-AWS)
Now:
● 256 vnodes/server (virtual nodes)
Why?
Much easier and faster in case of a node failure
Virtual Nodes (version >=1.2)
Replication
Used to achieve high availability and durability.
How?
• Replication factor: determines how many copies of your data exist.
• Coordinator node: in charge of the replication of the data items that fall within its range.
• Consistency level: refers to how much up-to-date and synchronized a row of Cassandra
is in all of its replicas e.g. quorum → replication_factor/2 + 1.
• Various replication policies: Rack Unaware, Rack Aware and Datacentre Aware.
• Each row is replicated across multiple datacentres which are connected through high
speed network links.
Replication - Rack Unaware
Replication - Zookeeper
• Cassandra elects a leader amongst its nodes using Zookeeper.
• All nodes on joining the cluster contact the leader who tells them for
what ranges they are replicas for.
• All nodes on joining the cluster contact the leader who tells them for what ranges
they are replicas for.
• Leader tries to maintain the invariant that no node is responsible for more than N-1
ranges in the ring.
• Metadata about the ranges a node is responsible is 1) cached locally at each node
and 2) in a fault-tolerant manner inside Zookeeper.
• This way, a node that crashes and comes back knows what ranges it was
responsible for.
Replication - Zookeeper (1)
Membership
Based on Scuttle-butt, a very efficient anti-entropy Gossip based
mechanism.
Benefits:
I. Efficient CPU utilization.
How?
• Make use of Φ Accrual Failure Detector (emits a value which represent a suspicion level
for each of monitored nodes)
• and so on…
Bootstrapping (adding a new node in the cluster)
Process of getting data from other nodes in the ring for a new node that starts
for the first time.
How?
• When the new node enters the cluster, it chooses a random token for its
position in the ring.
• It also reads its configuration file which contains the seeds (initial contact
points) of the cluster.
• Token information is then gossiped around the cluster enabling any node
to route a request for a key to the correct node.
Bootstrapping (adding a new node in the cluster) (1)
In Facebook’s environment…
• Node outages are often transient but may last for extended intervals.
• Failures can be of various forms such as disk failures, bad CPU, etc.
• A node failure rarely signifies a permanent departure and therefore should not result in re-
balancing of the partition assignment.
• To that effect, every message contains the cluster name of each Cassandra instance.
• An admin uses a cmd tool or a browser to connect to a Cassandra node and issue a
membership change to join or leave a cluster.
Scaling the Cluster
Adding a new node on the system in order to alleviate another heavily
loaded node.
How?
Each of these modules has been implemented from the ground up using Java.
The II) is built on top of a network layer which uses non-blocking I/O.
Application relate messages for replication and request routing relies on TCP.
Implementation Details (1)
The request routing modules are implemented using a certain state machine.
When a read/write request arrives at any node in the cluster the state
machine…
I. Identifies the node(s) that own the data for the key
II. Routes the requests to the nodes and wait for the responses to arrive
III. If the replies do not arrive within a configured timeout value fail the
request
IV. Figures out the latest response based on a timestamp
V. Schedules a repair of the data at any replica if they do not have the latest
piece of data.
No coordination at all?
“We have learnt that having some amount of coordination is essential
to making the implementation of some distributed features tractable”
II. When the nodes are available again, the write operation is sent
How is a Memtable flushed on the disk?
• A background thread keeps checking the size of all the
Memtables while the clients keep writing on the cluster
• Spark
• Client Libraries for: Python, Java, .Net, Ruby, PHP, Perl, C++ etc.
APIs
The Cassandra API consists of the following three simple methods:
• Most basic way to interact with Cassandra is using the CQL shell,
cqlsh.
• MS Excel
• Pentaho
• Tableau
• Jaspersoft
• Talend
Monitoring Cassandra
• Integration with Ganglia (distributed performance tool).
Terminal commands
6.
Demo
OpsCenter
OpsCenter (1)
OpsCenter (2)
7.
References
Main Reference
References
1. A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev., 44(2):
35-40, 2010
2. Cassandra.apache.org. (2016). Apache Cassandra. [online] Available at: http://cassandra.apache.org/
3. Cattell, R. (2011). Scalable SQL and NoSQL data stores. ACM SIGMOD Record, 39(4), p.12.
4. Cockcroft, A. (2011). Benchmarking Cassandra Scalability on AWS - Over a million writes per second. [online]
Techblog.netflix.com. Available at: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-
on.html
5. Cs.uwaterloo.ca. (2016). [online] Available at:
https://cs.uwaterloo.ca/~tozsu/courses/CS848/W15/presentations/Cassandra.pdf
6. Chang, F., Dean, J., Ghemawat, S., Hsieh, W., Wallach, D., Burrows, M., Chandra, T., Fikes, A. and Gruber,
R. (2008). Bigtable. ACM Transactions on Computer Systems, 26(2), pp.1-26.
7. DataStax. (2016). Case Studies. [online] Available at: http://www.datastax.com/resources/casestudies
References (1)
8. Docs.datastax.com. (2016). About hinted handoff writes. [online] Available at:
https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_hh_c.html
9. DataStax. (2016). Customers. [online] Available at: http://www.datastax.com/customers
10. Docs.datastax.com. (2016). Introduction to Cassandra Query Language. [online] Available at:
https://docs.datastax.com/en/cql/3.1/cql/cql_intro_c.html
11. DataStax. (2014). What on earth are people using Cassandra for anyway?. [online] Available at:
http://www.datastax.com/2014/06/what-are-people-using-cassandra-for
12. DataStax. (2012). A thrift to CQL3 upgrade guide. [online] Available at:
http://www.datastax.com/dev/blog/thrift-to-cql3
13. DataStax. (2012). Virtual nodes in Cassandra 1.2. [online] Available at:
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
14. DataStax. (2012). Schema in Cassandra 1.1. [online] Available at: http://www.datastax.com/dev/blog/schema-
in-cassandra-1-1
References (2)
15. DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S.,
Vosshall, P. and Vogels, W. (2007). Dynamo. ACM SIGOPS Operating Systems Review, 41(6), p.205.
16. Docs.datastax.com. (2016). Architecture in brief. [online] Available at:
https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureIntro_c.html
17. Docs.datastax.com. (2016). How data is distributed across a cluster (using virtual nodes). [online] Available
at:
http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeDistribute_c.html
18. Docs.datastax.com. (2016). Internode communications (gossip). [online] Available at:
https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureGossipAbout_c.html
19. D0.awsstatic.com. (2016). [online] Available at:
https://d0.awsstatic.com/whitepapers/Cassandra_on_AWS.pdf
20. Edlich, P. (2016). NOSQL Databases. [online] Nosql-database.org. Available at: http://nosql-database.org/
References (3)
21. Edu.dmst.aueb.gr. (2016). Πύλη Τηλεκπαίδευσης Τμήματος Διοικητικής Επιστήμης & Τεχνολογίας: Είσοδος
στο δικτυακό τόπο. [online] Available at:
https://edu.dmst.aueb.gr/pluginfile.php/3614/mod_resource/content/0/BigDataSystems.pdf
22. En.wikipedia.org. (2016). Apache Cassandra. [online] Available at:
https://en.wikipedia.org/wiki/Apache_Cassandra
23. En.wikipedia.org. (2016). DataStax. [online] Available at: https://en.wikipedia.org/wiki/DataStax
24. En.wikipedia.org. (2016). Log-structured merge-tree. [online] Available at: https://en.wikipedia.org/wiki/Log-
structured_merge-tree
25. Exponential.io. (2016). Cassandra terminology - Exponential.io . [online] Available at:
http://exponential.io/blog/2015/01/08/cassandra-terminology/
References (4)
26. Facebook.com. (2016). Cassandra – A structured storage system on a P2P Network. [online] Available at:
https://www.facebook.com/notes/facebook-engineering/cassandra-a-structured-storage-system-on-a-p2p-
network/24413138919/
27. O', P. and Neil, E. (2016). The Log-Structured Merge-Tree (LSM-Tree). [online] Citeseerx.ist.psu.edu.
Available at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.2782
28. YouTube. (2016). Getting Started with Cassandra CQL on a Mac. [online] Available at:
https://www.youtube.com/watch?v=9zQc959w6Ho
29. YouTube. (2016). Installing Apache Cassandra In Windows. [online] Available at:
https://www.youtube.com/watch?v=fspXzjwfii0
30. YouTube. (2016). Part 1 - Apache Cassandra Installation From Scratch - Ubuntu. [online] Available at:
https://www.youtube.com/watch?v=ToztU48UxYE
References (5)
31. Weinberger, M. (2016). The Facebook engineer who taught its data how to dance is solving a new
complicated problem. [online] Business Insider. Available at: http://www.businessinsider.com/hedvig-avinash-
lakshman-facebook-cassandra-data-storage-2015-3
32. Wiki.apache.org. (2016). FrontPage - Cassandra Wiki. [online] Available at:
https://wiki.apache.org/cassandra/
33. www.tutorialspoint.com. (2016). Cassandra Introduction. [online] Available at:
https://www.tutorialspoint.com/cassandra/cassandra_introduction.htm