Cosmos DB for MongoDB Developers: Migrating to Azure Cosmos DB and Using the MongoDB API
About this ebook
Cosmos DB for MongoDB Developers starts with an overview of NoSQL and Azure Cosmos DB and moves on to demonstrate how geo-replication in Azure Cosmos DB differs from geo-replication in MongoDB. Along the way you’ll cover subjects including indexing, partitioning, consistency, and sizing, all of which will help you understand the concept of read units and how this calculation is derived from the usage of an existing MongoDB deployment.
The next part of the book shows you the process and strategies for migrating to Azure Cosmos DB. You will learn the day-to-day scenarios of using Azure Cosmos DB, its sizing strategies, and optimizing techniques for the MongoDB API. This information will help you when planning to migrate from MongoDB or if you would like to compare MongoDB to the Azure Cosmos DB MongoDB API before considering the switch.
What You Will Learn
- Migrate from MongoDB to Azure Cosmos DB and understand the migration strategies
- Develop a sample application using MongoDB’s client driver
- Make use of sizing best practices and performance optimization scenarios
- Optimize MongoDB’s partition mechanism and indexing
This book is for MongoDB developers who wish to learn Azure Cosmos DB. It specifically caters to a technical audience working on MongoDB.
© Manish Sharma 2018
Manish Sharma, Cosmos DB for MongoDB Developers, https://doi.org/10.1007/978-1-4842-3682-6_1
1. Why NoSQL?
Manish Sharma
Faridabad, Haryana, India
From our school days, most of us are taught to structure information so that it can be represented in tabular form. But not all information fits that structure, hence the existence of NULL values. A NULL value represents a cell without information. To avoid NULLs, we must split one table into several, which introduces the concept of normalization. In normalization, we split the tables according to the level of normalization we select. These levels include 1NF (first normal form), 2NF, 3NF, BCNF (Boyce–Codd normal form, or 3.5NF), 4NF, and 5NF. Every level dictates how the tables are split, and most commonly people use 3NF, which is largely free of insert, update, and delete anomalies.
To achieve normalization, one must split information into multiple tables and then, while retrieving, join all the tables to make sense of the split information. This concept poses a few problems, although it is still perfect for online transaction processing (OLTP).
A system that handles data populated from multiple data streams while adhering to one defined structure is extremely difficult to implement and maintain. The volume of data is often huge and mostly unpredictable. In such cases, splitting data into multiple pieces on insert and joining the tables on retrieval adds excessive latency.
We can solve this problem by inserting the data in its natural form. Because little or no transformation is required, the latency of inserting, updating, deleting, and retrieving is drastically reduced. With this, scaling up and scaling out become quick and manageable. Given the flexibility of this solution, it is the most appropriate one for the problem defined. The solution is NoSQL, a name read either as "not only SQL" or as "non-relational."
One can further prioritize performance over consistency, which is possible with a NoSQL solution and defined by the CAP (consistency, availability, and partition tolerance) theorem. In this chapter, I will discuss NoSQL, its diverse types, its comparison with relational database management systems (RDBMS), and its future applications.
Types of NoSQL
In NoSQL, data can be represented in multiple forms. Many forms of NoSQL exist, and the most commonly used ones are key-value, columnar, document, and graph. In this section, I will summarize each of these.
Key-Value Pair
This is the simplest form of data structure, yet it offers excellent performance. All the data is referenced only through keys, which makes retrieval very straightforward. The most popular database in this category is Redis Cache. An example is shown in Table 1-1.
Table 1-1
Key-Value Representation
The keys are kept in an ordered list, and a hash map is used to locate the keys efficiently.
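As a rough illustration of key-based access, the following Python sketch uses an in-memory dictionary (Python's built-in hash map) as a stand-in for a key-value store. The class name `KeyValueStore`, its methods, and the sample keys are hypothetical; they are not the API of Redis or of any other product.

```python
from typing import Any, Dict, List, Optional


class KeyValueStore:
    """Toy in-memory key-value store (illustrative only)."""

    def __init__(self) -> None:
        # A dict is a hash map: lookups by key take O(1) time on average.
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        """Insert a value or overwrite the value stored under an existing key."""
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        """Return the value for a key, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        """Remove a key; report whether it was present."""
        if key in self._data:
            del self._data[key]
            return True
        return False

    def keys_in_order(self) -> List[str]:
        """Return all keys as a sorted list."""
        return sorted(self._data)


store = KeyValueStore()
store.put("user:1001", "Alice")
store.put("user:1000", "Bob")
```

Every operation goes through the key alone; there is no way to query by value, which is the trade-off that keeps this model simple and fast.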
Columnar
This type of database stores the data as columns instead of rows (as RDBMS do) and is optimized for querying large data sets. This type of database is generally known as a wide column store. Some of the most popular databases in this category are Cassandra and Apache Hadoop’s HBase.
Unlike key-value pair databases, columnar databases can store millions of attributes associated with a key, forming a table whose data is stored as columns. However, being NoSQL databases, they do not have fixed column names or a fixed number of columns, which makes them truly schema-free.
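The idea of rows without a fixed set of columns, and of grouping values by column, can be sketched in a few lines of Python. This is only a conceptual model under assumed names (`rows`, `to_columns`, and the sensor data are all hypothetical); the on-disk layout used by Cassandra or HBase is considerably more involved.

```python
from collections import defaultdict
from typing import Any, Dict

# Each row key maps to its own set of columns; rows need not share column names.
rows: Dict[str, Dict[str, Any]] = {
    "sensor-1": {"temperature": 21.5, "humidity": 40},
    "sensor-2": {"temperature": 19.0, "battery": 87, "firmware": "1.2.0"},
}


def to_columns(table: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Regroup row-oriented data so that each column's values sit together."""
    columns: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for row_key, cells in table.items():
        for column_name, value in cells.items():
            columns[column_name][row_key] = value
    return dict(columns)


columns = to_columns(rows)
# Reading one attribute across all rows now touches a single column only.
temperatures = columns["temperature"]
```

Grouping values by column is what makes scans over a single attribute of a large data set cheap, because the other attributes never have to be read.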
Document
This type of NoSQL database manages data in the form of documents. Many implementations exist for this kind of database, and they use various types of document representation. Some of the most popular formats for storing data are JSON, XML, and BSON. The basic idea of storing data in document form is to retrieve it faster by matching on its meta information (see Figures 1-1 and 1-2).
Figure 1-1
Sample document structure (JSON) code
Figure 1-2
Sample document structure (XML) code
Documents can contain many different forms of data: key-value pairs, key-array pairs, or even nested documents. One of the popular databases in this category is MongoDB.
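To make the three forms concrete, here is a small Python sketch of one document that combines a key-value pair, a key-array pair, and a nested document, then round-trips it through JSON. The field names, the sample values, and the helper `find_by_field` are invented for illustration; they are not taken from the book's figures or from any MongoDB API.

```python
import json
from typing import Any, Dict, List

order_document: Dict[str, Any] = {
    "_id": "order-1001",
    "status": "shipped",                                  # key-value pair
    "items": [                                            # key-array pair
        {"sku": "A-17", "quantity": 2},
        {"sku": "B-04", "quantity": 1},
    ],
    "customer": {"name": "Alice", "city": "Faridabad"},   # nested document
}

# Documents are commonly exchanged as JSON text.
serialized = json.dumps(order_document)
restored = json.loads(serialized)


def find_by_field(documents: List[Dict[str, Any]], field: str, value: Any) -> List[Dict[str, Any]]:
    """Return the documents whose top-level field equals the given value."""
    return [doc for doc in documents if doc.get(field) == value]


shipped = find_by_field([restored], "status", "shipped")
```

Because the whole order lives in one document, reading it back needs no joins, in contrast to the normalized tables described at the start of the chapter.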
Graph
This type of database stores data in the form of networks, e.g., social connections, family trees, etc. (see Figure 1-3). Its beauty lies in the way it stores the data: using a graph structure for semantic queries and representing it in the form of edges and nodes.
Nodes hold the information that represents an entity, and the relationship (or relationships) between two nodes is defined using edges. In the real world, our relationship to every other individual is different, and these differences can be captured by various attributes at the edge level.
Figure 1-3
Graph form of data representation
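A minimal way to picture nodes, edges, and edge-level attributes is the Python sketch below. The people, the relation names, and the `neighbors` helper are hypothetical, and the code does not use Gremlin or any graph database API; it only mirrors the node and edge idea in plain data structures.

```python
from typing import Any, Dict, List, Optional, Tuple

# Nodes: entity id -> properties describing the entity.
nodes: Dict[str, Dict[str, Any]] = {
    "alice": {"label": "person", "name": "Alice"},
    "bob": {"label": "person", "name": "Bob"},
    "carol": {"label": "person", "name": "Carol"},
}

# Edges: (source, target, attributes). The attributes distinguish one
# relationship from another, as described in the text.
edges: List[Tuple[str, str, Dict[str, Any]]] = [
    ("alice", "bob", {"relation": "friend", "since": 2015}),
    ("alice", "carol", {"relation": "sibling"}),
]


def neighbors(node_id: str, relation: Optional[str] = None) -> List[str]:
    """Follow outgoing edges from a node, optionally filtered by relation type."""
    result = []
    for source, target, attributes in edges:
        if source == node_id and (relation is None or attributes.get("relation") == relation):
            result.append(target)
    return result


friends_of_alice = neighbors("alice", relation="friend")
```

The edges here are directed, so `neighbors("bob")` returns an empty list; a real graph database would let a query traverse an edge in either direction.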
The graph form of data usually follows the standards defined by Apache TinkerPop, and the most popular database in this category is Neo4j (see Figure 1-4b, which depicts the outcome of the query executed in Figure 1-4a).
Figure 1-4a
Gremlin Query on TinkerPop Console to Fetch All the Records
Figure 1-4b
Result in TinkerPop console
What to Expect from NoSQL
To better understand the need for using NoSQL, let’s compare it to RDBMS from a transactional standpoint. For RDBMS, any transaction will have certain characteristics, which are known as ACID—atomicity, consistency, isolation, and durability.
Atomicity
This property ensures that a transaction either completes in full or has no effect at all. If, for any reason, a transaction fails, the full set of changes made during the course of the transaction is removed. This is called rollback.
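Rollback can be observed with Python's standard `sqlite3` module. In the sketch below, the `accounts` table, the `transfer` function, and the balances are made up for illustration; the only library behavior relied on is that using a connection as a context manager commits on success and rolls back when an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, source: str, target: str, amount: int) -> bool:
    """Move an amount between two accounts as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit was undone too.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balances as a name -> balance mapping."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Alice holds only 100, so this transfer fails after bob has been credited.
ok = transfer(conn, "alice", "bob", 500)
after_failed_transfer = balances(conn)
```

The credit to `bob` is applied first on purpose: when the debit then violates the constraint, the rollback removes the credit as well, so no partial transfer remains.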
Consistency
This property ensures that the system will be in a consistent state after completion of a transaction (failed or successful).
Isolation
This property ensures that every transaction will have exclusivity over the resources, e.g., tables, rows, etc. The reads and writes of the transaction will not be visible to reads and writes of any other transaction.
Durability
This property ensures that the data is persisted and is not lost during a hardware, power, software, or any other failure. To achieve this, the system logs all the steps performed in the transaction so that the state can be re-created whenever required.
By contrast, NoSQL relies on the concept of the CAP theorem, as follows.
Consistency
This ensures that the read performed by any transaction has the latest information/data for all the nodes. It is a bit different from the consistency defined in ACID, as ACID’s consistency states that all the data changes should provide a consistent data view for database connections.
Availability
Every time data is requested, a response is given, without a guarantee that it contains the latest data. This is critical for systems that require high performance and can tolerate eventual consistency of data.
Partition Tolerance
This property ensures that a network failure between nodes will not cause the system to fail or degrade its performance. It helps ensure the availability of the system and consistent performance.
Most of the time, in a durable distributed system, network durability will be built in, which helps keep all the nodes (partitions) available all the time. This means we are left with two choices: consistency or availability. When we choose availability, the system will always process the query and return the most recent data it holds, even if it can’t guarantee that this data is up to date.
Another theorem, PACELC, is an extension of CAP. It states that even when a system is running normally, in the absence of partitions, one must choose between latency and consistency. If the system is designed for high availability, it must be replicated, and replication introduces a trade-off between consistency and latency.
Architects must, therefore, choose the right balance between availability, consistency, and latency while defining the partition tolerance. Following are a few examples.
Example 1: Availability
Consider, for example, a device installed on an elevator for the purpose of monitoring that elevator. The device posts messages to the main server to provide a status report. If something goes wrong, it alerts the relevant personnel to perform an emergency response. Losing such a message would jeopardize the entire emergency response system, so selecting availability over consistency makes the most sense in this case.
Example 2: Consistency
Consider a reward catalog system that keeps track of the allocation and redemption of reward points. During redemption, the system must take into account the rewards accumulated at that point in time, and the transaction should be consistent. Otherwise, one could redeem the same rewards multiple times. In this case, choosing consistency is most critical.
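The difference between the two examples above can be sketched with a toy store that has two replicas and a switch simulating a network partition. Everything here is an assumption made for illustration: the class `TwoReplicaStore`, the replica names, the `"AP"` and `"CP"` mode labels, and the `Unavailable` exception. There is no real networking, and no reconciliation after the partition heals.

```python
from typing import Any, Dict


class Unavailable(Exception):
    """Raised when a CP store refuses a request it cannot serve consistently."""


class TwoReplicaStore:
    """Toy model of a store with two replicas that may become partitioned."""

    def __init__(self, mode: str) -> None:
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.replicas: Dict[str, Dict[str, Any]] = {"east": {}, "west": {}}
        self.partitioned = False  # True when the replicas cannot reach each other

    def write(self, replica: str, key: str, value: Any) -> None:
        if self.partitioned and self.mode == "CP":
            raise Unavailable("write rejected: replicas cannot be kept in sync")
        self.replicas[replica][key] = value
        if not self.partitioned:
            # Healthy network: the write is copied to every replica.
            for data in self.replicas.values():
                data[key] = value

    def read(self, replica: str, key: str) -> Any:
        if self.partitioned and self.mode == "CP":
            raise Unavailable("read rejected: value may be stale")
        return self.replicas[replica].get(key)


# Availability first: every request is answered, but a replica may be stale.
ap = TwoReplicaStore("AP")
ap.write("east", "points", 100)
ap.partitioned = True
ap.write("east", "points", 40)          # reaches only the east replica
stale_read = ap.read("west", "points")  # west still holds the old value

# Consistency first: requests are refused while the partition lasts.
cp = TwoReplicaStore("CP")
cp.write("east", "points", 100)
cp.partitioned = True
try:
    cp.write("east", "points", 40)
    cp_write_accepted = True
except Unavailable:
    cp_write_accepted = False
```

In the availability-first store the elevator's status message would always be accepted, at the cost of replicas that may disagree for a while. In the consistency-first store a redemption is refused during the partition, so the same points cannot be spent against a stale balance.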
NoSQL and Cloud
NoSQL is designed to scale out and can span thousands of compute nodes. It has been in use for quite a while and is gaining popularity because of its performance. However, there is no such thing as a universal database. Hence, we should pick the best technology for the given use case. By design,