10gen-MongoDB Operations Best Practices 2.6
August 2014
Table of Contents

MONGODB OPERATIONS BEST PRACTICES
Roles and Responsibilities
    Application Developer
    Data Architect
    Network Administrator
I. Preparing for a MongoDB Deployment
    Schema Design
        Document Model
        Dynamic Schema
        Collections
        Indexes
        Transactions
        Schema Enforcement
        Document Size
        Capped Collections
        Dropping a Collection
        GridFS
    Indexing
        Query Optimization
        Profiling
        Compound Indexes
        Unique Indexes
        Array Indexes
        TTL Indexes
        Geospatial Indexes
        Sparse Indexes
        Hash Indexes
        Index Limitations
    Working Sets
    Data Migration
    Hardware
        Memory
        Storage
        CPU
    Setup
        Database Configuration
        Upgrades
    Networking
    Production-Proven Recommendations
    Journaling
    Data Redundancy
    Availability of Writes
    Read Preferences
    Geographic Distribution
    Mongodump
    MongoDB Management Service (MMS)
V. CAPACITY PLANNING
    Monitoring Tools
    Hardware Monitoring
    SNMP
    Database Profiler
    Mongotop
    Mongostat
    Linux Utilities
    Windows Utilities
    Things to Monitor
        Application Logs and Database Logs
        Page Faults
        Disk
        CPU
        Connections
        Op Counters
        Queues
        System Configuration
        Shard Balancing
        Replication Lag
VI. SECURITY
    Defense in Depth
    Authentication
    Authorization
    Auditing
    Encryption
    Query Injection
IX. CONCLUSION
MongoDB Operations
Best Practices
MongoDB is the open-source document database popular among both developers and operations professionals given its agile and scalable architecture. MongoDB is used in tens of thousands of production deployments by organizations ranging in size from emerging startups to the largest Fortune 50 companies. This paper provides guidance on best practices for deploying and managing a MongoDB cluster. It assumes familiarity with the architecture of MongoDB and a basic understanding of concepts related to the deployment of enterprise software.

This document discusses many best practices for operating and deploying a MongoDB system. The MongoDB community is vibrant, and new techniques and lessons are shared every day. This document is subject to change; for the most up-to-date version of this document please visit mongodb.com. For the most current and detailed information on specific topics, please see the online documentation at mongodb.org. Many links are provided throughout this document to help guide users to the appropriate resources online.

Fundamentally MongoDB is a database, and the concepts of the system, its operations, policies, and procedures should be familiar to users who have deployed and operated other database systems. While some aspects of MongoDB are different from traditional relational database systems, the skills and infrastructure developed for other database systems are relevant to MongoDB and will help to make deployments successful. Typically MongoDB users find that existing database administrators, system administrators, and network administrators need minimal training to understand MongoDB. Concepts such as database tuning, performance monitoring, data modeling, and index optimization are very relevant to MongoDB. Because MongoDB is designed to be simple to administer and to deploy in large clustered environments, most users of MongoDB find that with minimal training an existing operations professional can become competent with MongoDB, and that MongoDB expertise can be gained in a relatively short period of time.
Roles and
Responsibilities
As with any database, applications deployed on MongoDB require careful planning and the coordination of
a number of roles in an organization's technical teams
to ensure successful maintenance and operation. Organizations tend to find many of the same individuals
and their respective roles for traditional technology
deployments are appropriate for a MongoDB deployment: Data Architects, Database Administrators, System Administrators, Application Developers, and Network Administrators.
APPLICATION DEVELOPER
The application developer works with other members
of the project team to ensure the requirements regarding functionality, deployment, security, and availability are clearly understood. The application itself is
written in a language such as Java, C#, PHP or Ruby.
Data will be stored, updated, and queried in MongoDB,
and language-specific drivers are used to communicate between MongoDB and the application. The application developer works with the data architect to
define and evolve the data model and to define the
query patterns that should be optimized. The application developer works with the database administrator,
sysadmin and network administrator to define the deployment and availability requirements of the application.
DATA ARCHITECT
While modeling data for MongoDB is typically simpler
than modeling data for a relational database, there
tend to be multiple options for a data model, and
tradeoffs with each alternative regarding performance,
resource utilization, ease of use, and other areas. The
data architect can carefully weigh these options with
the development team to make informed decisions regarding the design of the schema. Typically the data
architect performs tasks that are more proactive in nature, whereas the database administrator may perform
tasks that are more reactive.
NETWORK ADMINISTRATOR
A MongoDB deployment typically involves multiple
servers distributed across multiple data centers. Network resources are a critical component of a MongoDB
system. While MongoDB does not require any unusual
configurations or resources as compared to other database systems, the network administrator should be
consulted to ensure the appropriate policies, procedures, configurations, capacity, and security settings
are implemented for the project.
I. Preparing for a
MongoDB Deployment
SCHEMA DESIGN

Developers and data architects should work together to develop the right data model, and they should invest time in this exercise early in the project. The application should drive the data model, updates, and queries of your MongoDB system. Given MongoDB's dynamic schema, the data model can continue to evolve as application requirements change.

Document Model

MongoDB stores data as documents in a binary representation called BSON. The BSON encoding extends the popular JSON representation to include additional types such as int, long, and floating point. BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, sub-documents, and binary data. It may be helpful to think of documents as roughly equivalent to rows in a relational database, and fields as roughly equivalent to columns. However, MongoDB documents tend to have all data for a given record in a single document, whereas in a relational database information for a given record is usually spread across rows in many tables. In other words, data in MongoDB tends to be more localized.

Dynamic Schema

MongoDB documents can vary in structure. For example, documents that describe users might all contain the user id and the last date they logged into the system, but only some of these documents might contain the user's shipping address, and perhaps some of those contain multiple shipping addresses. MongoDB does not require that all documents conform to the same structure. Furthermore, there is no need to declare the structure of documents to the system; documents are self-describing.

Collections

Collections are groupings of documents. Typically all documents in a collection have similar or related purposes for an application. It may be helpful to think of collections as being analogous to tables in a relational database.

Indexes

MongoDB uses B-tree indexes to optimize queries. Indexes are defined in a collection on document fields. MongoDB includes support for many indexes, including compound, geospatial, TTL, text search, sparse, unique, and others. For more information see the section on indexes.

Transactions

MongoDB guarantees atomic updates to data at the document level. It is not possible to update multiple documents in a single atomic operation. Atomicity of updates may influence the schema for your application.

Schema Enforcement

MongoDB does not enforce schemas. Schema enforcement should be performed by the application. For more information on schema design, please see Data Modeling Considerations for MongoDB in the MongoDB Documentation.

DOCUMENT SIZE

The maximum BSON document size in MongoDB is 16MB. Users should avoid certain application patterns that would allow documents to grow unbounded. For instance, applications should not typically update documents in a way that causes them to grow significantly after they have been created, as this can lead to inefficient use of storage. If the document size exceeds its allocated space, MongoDB will relocate the document on disk. This automatic process of moving documents and updating their associated indexes can be resource intensive and time consuming, and can unnecessarily slow down other operations in the database.

To anticipate future growth, the usePowerOf2Sizes attribute is enabled by default on each collection. This setting automatically configures MongoDB to round up allocation sizes to the powers of 2 (e.g., 2, 4, 8, 16, 32, 64, etc.). This setting reduces the chances of increased disk I/O at the cost of using some additional storage.

An additional strategy is to manually pad the documents to provide sufficient space for document growth. If the application will add data to a document in a predictable fashion, the fields can be created in the document before the values are known in order to allocate the appropriate amount of space during document creation. Padding will minimize the relocation of documents and thereby minimize over-allocation. This factor can be viewed as the paddingFactor field in the output of the db.<collection>.stats() command. For example, a value of 1 indicates no padding factor, and a value of 1.5 indicates a padding factor of 50%.

Capped Collections

In some cases a rolling window of data should be maintained in the system based on data size. Capped collections are fixed-size collections that support high-throughput inserts and reads based on insertion order. A capped collection behaves like a circular buffer: data is inserted into the collection, insertion order is preserved, and when the total size reaches the threshold of the capped collection, the oldest documents are deleted to make room for the newest documents. For example, log information from a high-volume system can be stored in a capped collection to quickly retrieve the most recent log entries without designing for storage management.
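As a minimal sketch of the capped collection approach described above, a fixed-size collection for log data might be created from the mongo shell as follows (the collection name and size are illustrative):

```javascript
// Create a 100MB capped collection for log entries. When the
// collection fills, the oldest documents are overwritten in
// insertion order, so no storage management code is needed.
db.createCollection("log", { capped: true, size: 100 * 1024 * 1024 })

// Retrieve the most recent entries by scanning in reverse
// natural (insertion) order.
db.log.find().sort({ $natural: -1 }).limit(10)
```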
Dropping a Collection
It is very efficient to drop a collection in MongoDB. If
your data lifecycle management requires periodically
deleting large volumes of documents, it may be best
to model those documents as a single collection.
Dropping a collection is much more efficient than removing all documents or a large subset of a collection, just as dropping a table is more efficient than
deleting all the rows in a table in a relational database.
GridFS
For files larger than 16MB, MongoDB provides a convention called GridFS, which is implemented by all
MongoDB drivers. GridFS automatically divides large
data into 256KB pieces called chunks and maintains
the metadata for all chunks. GridFS allows for retrieval of individual chunks as well as entire documents. For example, an application could quickly jump
to a specific timestamp in a video. GridFS is frequently
used to store large binary files such as images and
videos in MongoDB.
INDEXING
As in most database management systems, indexes are a crucial mechanism for optimizing performance in MongoDB. While indexes will improve
the performance of some operations by one or more
orders of magnitude, they have associated costs in the
form of slower updates, disk usage, and memory usage. Users should always create indexes to support
queries, but should take care not to maintain indexes
that the queries do not use. Each index incurs some
cost for every insert and update operation: if the application does not use these indexes, then it can adversely affect the overall capacity of the database.
This is particularly important for deployments that
support insert-heavy workloads.
All MongoDB documents have a primary key stored in the _id field; MongoDB will automatically create this index and assign a unique value, or the value can be specified when the document is inserted. All user-defined indexes are secondary indexes. Any field can be used for a secondary index, including fields with arrays.

Query Optimization

MongoDB can return an explain plan for a query via the explain() method, showing which index, if any, was used and how many documents were scanned. This is a useful way to verify that queries are supported by appropriate indexes.
Compound Indexes
MongoDB supports index intersection, allowing more than one index to be used to satisfy a query. This capability is useful when running ad-hoc queries, as data access patterns are typically not known in advance.
Where a query that accesses data based on multiple
predicates is known, it will be more performant to use
Compound Indexes, which use a single index structure
to maintain references to multiple fields. For example,
consider an application that stores data about customers. The application may need to find customers
based on last name, first name, and state of residence.
With a compound index on last name, first name, and
state of residence, queries could efficiently locate
people with all three of these values specified. An additional benefit of a compound index is that any leading field within the index can be used, so fewer indexes on single fields may be necessary: this compound
index would also optimize queries looking for customers by last name.
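A compound index for the customer example above might be declared from the mongo shell as follows (field names are illustrative):

```javascript
// Single index structure covering queries on (last_name),
// (last_name, first_name), and (last_name, first_name, state),
// since any leading prefix of the index can be used.
db.customers.ensureIndex({ last_name: 1, first_name: 1, state: 1 })
```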
Unique Indexes
By specifying an index as unique, MongoDB will reject
inserts of new documents or the update of a document with an existing value for the field for which the
unique index has been created. By default all indexes
are not unique. If a compound index is specified as
unique, the combination of values must be unique.
If a document does not have a value specified for the
field then an index entry with a value of null will be
created for the document. Only one document may
have a null value for the field unless the sparse option
is enabled for the index, in which case index entries
are not made for documents that do not contain the
field.
PROFILING
MongoDB provides a profiling capability called Database Profiler, which logs fine-grained information
about database operations. The profiler can be enabled to log information for all events or only those
events whose duration exceeds a configurable threshold (the default is 100ms). Profiling data is stored in a capped collection where it can easily be searched for relevant events; it may be easier to query this collection than to parse the log files.
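As a sketch, the profiler can be enabled and queried from the mongo shell; the threshold value below is illustrative:

```javascript
// Level 1 profiles operations slower than the threshold (ms);
// level 2 profiles all operations; level 0 disables profiling.
db.setProfilingLevel(1, 100)

// Query the capped system.profile collection directly for
// recent slow operations.
db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 })
```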
Array Indexes
For fields that contain an array, each array value is
stored as a separate index entry. For example, documents that describe recipes might include a field for
ingredients. If there is an index on the ingredient
field, each ingredient is indexed and queries on the
ingredient field can be optimized by this index. There
is no special syntax required for creating array indexes.
Hash Indexes
Hash indexes compute a hash of the value of a field
and index the hashed value. The primary use of this
index is to enable hash-based sharding of a collection, which provides a simple and uniform distribution of documents across shards.
TTL Indexes
In some cases data should expire out of the system
automatically. Time to Live (TTL) indexes allow the
user to specify a period of time after which the data
will automatically be deleted from the database. A
common use of TTL indexes is applications that maintain a rolling window of history (e.g., most recent 100
days) for user actions such as click streams.
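A TTL index for the rolling-window case above might look like the following sketch (collection and field names are illustrative):

```javascript
// Documents expire 100 days (8,640,000 seconds) after the
// time stored in their createdAt field; a background task
// removes expired documents automatically.
db.clicks.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 8640000 })
```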
Geospatial Indexes
MongoDB provides geospatial indexes to optimize
queries related to location within a two-dimensional
space, such as projection systems for the earth. The
index supports data stored as both GeoJSON objects
and as regular 2D coordinate pairs. Documents must have a field with a two-element array, such as latitude and longitude, to be indexed with a geospatial index.
These indexes allow MongoDB to optimize queries
that request all documents closest to a specific point
in the coordinate system.
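As a minimal sketch of such a query, assuming a GeoJSON location field (the collection name and coordinates are illustrative):

```javascript
// Index the GeoJSON field, then find the ten documents
// closest to a given point.
db.places.ensureIndex({ location: "2dsphere" })
db.places.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [ -73.99, 40.73 ] }
    }
  }
}).limit(10)
```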
Sparse Indexes
Sparse indexes only contain entries for documents
that contain the specified field. Because the document
data model of MongoDB allows for flexibility in the
data model from document to document, it is common
for some fields to be present only in a subset of all
documents. Sparse indexes allow for smaller, more efficient indexes when fields are not present in all documents.
By default, the sparse option for indexes is false. Using a sparse index will sometimes lead to incomplete
results when performing index-based operations such
as filtering and sorting. By default, MongoDB will create null entries in the index for documents that are
missing the specified field.
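A sparse index is often combined with the unique option discussed earlier; a sketch with an illustrative optional field:

```javascript
// Documents without a twitter_id field make no index entry,
// so many documents may omit the field without violating
// the unique constraint.
db.users.ensureIndex({ twitter_id: 1 }, { sparse: true, unique: true })
```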
Index Maintenance Operations

Indexes can be built in either the foreground or background on both the primary and secondary members of a replica set. While foreground index builds will be faster, they will block other database operations while running. To avoid blocking, indexes can be built in the background, though this will impose a performance overhead, especially if the index is larger than the available RAM. Therefore the best approach to building indexes on replica sets is to build them in a rolling fashion:

1. Restart one secondary in standalone mode, on a different port.
2. Build the index on that member.
3. Restart the member as part of the replica set and allow it to catch up with the other members.
4. Repeat steps 1-3 for each of the remaining secondaries.
5. When all the indexes have been built on the secondaries, restart the primary in standalone mode; one of the secondaries will be elected as primary so the application can continue to function.
6. Build the index on the former primary, then restart it as part of the replica set.

Make sure that the application checks for the existence of all appropriate indexes on startup and that it terminates if indexes are missing. Index creation should be performed by separate application code and during normal maintenance operations.
Index Limitations

The following points may help to avoid some common mistakes regarding indexes:

- Indexes can impact update performance: an update must first locate the data to change, so an index will help in this regard, but index maintenance itself has overhead, and this work will slow update performance.
- In-memory sorting of data without an index is limited to 32MB. This operation is very CPU-intensive, and in-memory sorts indicate that an index should be created to optimize these queries.

WORKING SETS

Some operations may inadvertently purge a large percentage of the working set from memory, which adversely affects performance. For example, a query that scans all documents in the database, where the database is larger than the RAM on the server, will cause documents to be read into memory and the working set to be written out to disk.

A useful output included with the serverStatus command is a workingSet document that provides an estimated size of the MongoDB instance's working set. Operations teams can track the number of pages accessed by the instance over a given period, and the elapsed time from the oldest to newest document in the working set. By tracking these metrics, it is possible to detect when the working set is approaching current RAM limits and proactively take action to ensure the system is scaled.

If your database working set size exceeds the available RAM of your system, consider increasing the RAM or adding additional servers to the cluster and sharding your database. For a discussion of this topic, see the section on Sharding Best Practices. It is far easier to implement sharding before the resources of the system become limited, so capacity planning is an important element in the successful delivery of the project.

DATA MIGRATION

The mongoimport and mongoexport tools are provided with MongoDB for simple loading or exporting of data in JSON or CSV format. These tools may be useful in moving data between systems as an initial step. Other tools, called mongodump and mongorestore, are useful for moving data between two MongoDB systems.

Setup

MongoDB provides repositories for .deb and .rpm packages for consistent setup, upgrade, system integration, and configuration. This software uses the same binaries as the tarball packages provided at http://www.mongodb.org/downloads.

Database Configuration

MongoDB can be configured through a configuration file or via command-line options to the mongod process.

Upgrades

Users should upgrade software as often as possible so that they can take advantage of the latest features as well as any stability updates or bug fixes. Upgrades should be tested in non-production environments to ensure production applications are not adversely affected by new versions of the software.

Customers can deploy rolling upgrades without incurring any downtime, as each member of a replica set can be upgraded individually without impacting cluster availability. It is possible for each member of a replica set to run under different versions of MongoDB. As a precaution, the release notes for the MongoDB release should be consulted to determine whether there is a particular order of upgrade steps that needs to be followed and whether there are any incompatibilities between two specific versions.

HARDWARE

The following suggestions are only intended to provide high-level guidance for hardware for a MongoDB deployment. The specific configuration of your hardware will be dependent on your data, your queries, your performance SLA, your availability requirements, and the capabilities of the underlying hardware components. MongoDB has extensive experience helping customers to select hardware and tune their configurations, and we frequently work with customers to plan for and optimize their MongoDB systems.
Memory
MongoDB makes extensive use of RAM to increase
performance. Ideally, the full working set fits in RAM.
As a general rule of thumb, the more RAM, the better.
As workloads begin to access data that is not in RAM,
the performance of MongoDB will degrade. MongoDB
delegates the management of RAM to the operating
system. MongoDB will use as much RAM as possible
until it exhausts what is available.
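The working set estimate mentioned earlier can be inspected from the mongo shell; a sketch for MongoDB 2.x, where the workingSet document must be requested explicitly:

```javascript
// pagesInMemory, computationTimeMicros, and overSeconds help
// judge whether the working set still fits in available RAM.
db.serverStatus({ workingSet: 1 }).workingSet
```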
Storage
MongoDB does not require shared storage (e.g., storage area networks). MongoDB can use local attached
storage as well as solid state drives (SSDs). Most disk
access patterns in MongoDB do not have sequential
properties, and as a result, customers may experience
substantial performance gains by using SSDs. Good results and strong price/performance have been observed with SATA SSDs and with PCIe SSDs. Commodity SATA
spinning drives are comparable to higher cost spinning drives due to the non-sequential access patterns
of MongoDB: rather than spending more on expensive
spinning drives, that money may be more effectively
spent on more RAM or SSDs. Another benefit of using
SSDs is that they provide a more gradual degradation
of performance if the working set no longer fits in
memory.
IaaS deployments are especially good for initial testing and development, as they provide a low-risk, low-cost means for getting the database up and running.
MongoDB has partnerships with a number of cloud
and managed services providers, such as AWS, GCE,
IBM with Softlayer and Microsoft Windows Azure, in
addition to partners that provide fully managed instances of MongoDB, like MongoLab and Rackspace
with ObjectRocket.
CPU
MongoDB performance is typically not CPU-bound. As MongoDB rarely encounters workloads able to leverage large numbers of cores, it is preferable to have servers with faster clock speeds than servers with many cores at slower clock speeds.
Within a shard, MongoDB further partitions documents into chunks. MongoDB maintains metadata
about the relationship of chunks to shards in the config server. Three config servers are maintained in
sharded deployments to ensure availability of the
metadata at all times. To estimate the total size of the
shard metadata, multiply the size of the chunk metadata by the total number of chunks in your database; the default chunk size is 64MB. For example, a
64TB database would have 1 million chunks and the
total size of the shard metadata managed by the config servers would be 1 million times the size of the
chunk metadata, which could range from hundreds of
MB to several GB of metadata. Shard metadata access
is infrequent: each mongos maintains a cache of this
data, which is periodically updated by background
processes when chunks are split or migrated to other
shards. The hardware for a config server should therefore be focused on availability: redundant power supplies, redundant network interfaces, redundant RAID
controllers, and redundant storage should be used.
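To make the arithmetic above concrete, a rough sketch of the estimate; the per-chunk metadata size used here is an assumed illustrative figure, not a documented constant:

```javascript
// Approximate number of chunks MongoDB would manage for a
// given total data size (default chunk size: 64MB).
function estimateChunkCount(dataSizeBytes, chunkSizeBytes) {
  chunkSizeBytes = chunkSizeBytes || 64 * 1024 * 1024;
  return Math.floor(dataSizeBytes / chunkSizeBytes);
}

// A 64TB database at the default 64MB chunk size:
var chunks = estimateChunkCount(64 * Math.pow(2, 40));
console.log(chunks); // 1048576 chunks, i.e. roughly 1 million

// If each chunk's metadata entry averaged ~1KB (illustrative),
// the config servers would hold about 1GB of shard metadata.
console.log(chunks * 1024 / Math.pow(2, 30) + " GB");
```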
The following operating system configurations are recommended:

- Do not use hugepages virtual memory pages; MongoDB performs better with normal virtual memory pages.
- Ensure that readahead settings for the block devices that store the database files are relatively small, as most access is non-sequential. For example, setting readahead to 32 (16KB) is a good starting point.
- Synchronize time between your hosts. This is especially important in MongoDB clusters.

Linux provides controls to limit the number of resources and open files on a per-process and per-user basis. The default settings may be insufficient for MongoDB. Generally MongoDB should be the only process on a system, to ensure there is no contention with other processes.

While each deployment has unique requirements, the following settings are a good starting point for mongod and mongos instances. Use ulimit to apply these settings:
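The values below reflect the limits recommended in the MongoDB documentation of this era; confirm them against the current Production Notes before applying:

```shell
# Recommended per-process limits for mongod and mongos
ulimit -f unlimited   # file size
ulimit -t unlimited   # CPU time
ulimit -v unlimited   # virtual memory
ulimit -n 64000       # open files
ulimit -m unlimited   # resident memory size
ulimit -u 64000       # processes/threads
```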
naling helps prevent corruption and increases operational resilience. Journal commits are issued at least
as often as every 100ms by default. In the case of a
server crash, journal entries will be recovered automatically. Therefore the time between journal commits represents the maximum possible data loss. This
setting can be configured to a value that is appropriate for the application.
NETWORKING
Always run MongoDB in a trusted environment with
network rules that prevent access from all unknown
entities. There are a finite number of pre-defined
processes that communicate with a MongoDB system:
application servers, monitoring processes, and MongoDB processes.
It may be beneficial for performance to locate MongoDB's journal files and data files on separate storage
arrays. The I/O patterns for the journal are very sequential in nature and are well suited for storage devices that are optimized for fast sequential writes,
whereas the data files are well suited for storage devices that are optimized for random reads and writes.
Simply placing the journal files on a separate storage
device normally provides some performance enhancements by reducing disk contention.
By default MongoDB processes will bind to all available network interfaces on a system. If your system
has more than one network interface, bind MongoDB
processes to the private or internal network interface.
Detailed information on default port numbers for
MongoDB, configuring firewalls for MongoDB, VPN,
and other topics is available on the MongoDB Documentation page for Security Tutorials.
DATA REDUNDANCY
MongoDB maintains multiple copies of data, called
replica sets, using native replication. Users should use
replica sets to help prevent database downtime. Replica failover is fully automated in MongoDB, so it is not
necessary to manually intervene to recover in the
event of a failure.
PRODUCTION-PROVEN RECOMMENDATIONS
The latest suggestions on specific configurations for
operating systems, file systems, storage devices and
other system-related topics are maintained on the
MongoDB Documentation Production Notes page.
JOURNALING
MongoDB implements write-ahead journaling to enable fast crash recovery and consistency in the storage
engine. Journaling is enabled by default for 64-bit
platforms. Users should never disable journaling; jour-
11
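For example, the commit interval can be tightened in the mongod configuration file; the 50ms value below is illustrative, and the valid range for this setting in MongoDB 2.6 is 2-300ms:

```
# mongod.conf (2.6-era INI-style settings)
journal = true
journalCommitInterval = 50
```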
For more details, please see the MongoDB Documentation on Upgrading MongoDB to 2.6.

AVAILABILITY OF WRITES

MongoDB allows one to specify the level of availability when issuing writes to the system, which is called write concern. The following options can be configured on a per-connection, per-database, per-collection, or per-operation basis:

- Errors Ignored: Write operations are not acknowledged by MongoDB, and may not succeed in the case of connection errors that the client is not aware of.
- Unacknowledged: MongoDB does not confirm the receipt of the write operation, as with a write concern level of ignore; however, the driver will receive and handle network errors as possible given the system's networking configuration. This configuration is sometimes called "fire and forget," and it was the default global write concern for all drivers before late 2012.
- Write Acknowledged: The mongod will confirm the receipt of the write operation, allowing the client to catch network, duplicate key, and other exceptions. This is the default global write concern.

Ensure that the members of the replica set will always be able to elect a primary. Run an odd number of members, or run an arbiter (a replica that exists solely for participating in the election of the primary) on one of your application servers if you have an even number of members.

For more information see the MongoDB Documentation on Data Center Awareness.
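A per-operation write concern can be sketched from the mongo shell as follows; the collection name and timeout are illustrative:

```javascript
// w: 1 (Write Acknowledged) is the default; w: "majority"
// additionally waits until the write has replicated to a
// majority of replica set members, or until wtimeout elapses.
db.orders.insert(
  { item: "abc", qty: 1 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)
```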
READ PREFERENCES

By default, reads are routed to the primary of a replica set. Applications can configure a read preference to route queries to secondary replicas, which can reduce load on the primary at the cost of potentially reading slightly stale data.

Tag-aware sharding: Documents are partitioned according to a user-specified configuration that associates shard key ranges with physical shards. Users can optimize the physical location of documents for application requirements such as locating data in specific data centers.

However, sharding can add operational complexity to a MongoDB deployment, and it has its own infrastructure requirements. As a result, users should shard only as necessary and when indicated by actual operational requirements.
MongoDB partitions data into 64MB chunks by default. Low cardinality (e.g., an attribute such as size) will tend to group documents together on a small number of shards, which in turn will require frequent rebalancing of the chunks. Instead, a shard key should exhibit high cardinality.
Range-based sharding distributes a collection of documents based on a user-specified shard key. Because
shard keys and their values cannot be updated, and
because the values of the shard key determine the
physical storage of the documents as well as how
queries are routed across the cluster, it is very important to select a good shard key.
When using range-based sharding or tag-aware sharding, it is important to carefully select the appropriate
shard key because values close to one another will
be located on the same shard. Using a timestamp, for
example, would result in all new documents being
placed in the same shard, which might result in an uneven distribution of writes in the system.
As an example to illustrate good shard key selection,
consider an email repository. Each document includes
a unique key, in this case created by MongoDB automatically, a user identifier, the date and time the
email was sent, email subject, recipients, email body,
and any attachments. Emails can be large, in this case
up to 16MB, and most users have lots of email, several
GB each. Finally, we know that the most popular query
in the system is to retrieve all emails for a user, sorted
by time. We also know the second most popular query
is on recipients of emails. The indexes needed for this
system are on _id, user+time, and recipients.
Users who choose to shard should consider the following best practices:
{ _id: ObjectId(),
  user: 123,
  time: Date(),
  subject: ". . . ",
  recipients: [ . . . ],
  body: ". . . ",
  attachments: [ . . . ] }
- Add capacity before it is needed. Cluster maintenance is lower risk and simpler to manage if capacity is added before the system is over-utilized.
- Run three configuration servers to provide redundancy. Production deployments must use three config servers. Config servers should be deployed in a topology that is robust and resilient to a variety of failures.
- Use replica sets. Replica sets provide data redundancy and allow MongoDB to continue operating in a number of failure scenarios or during planned maintenance. Replica sets should be used in any deployment.
Shard Key   | Cardinality    | Insert Scaling | Query Isolation
_id         | Doc level      | One shard (1)  | Scatter/gather (2)
hash(_id)   | Hash level (3) | All shards     | Scatter/gather
user        | Many docs (4)  | All shards     | Targeted
user, time  | Doc level      | All shards     | Targeted

1. MongoDB's auto-generated ObjectId() is based on a timestamp and is therefore sequential in nature, and will result in all documents being written to the same shard.
2. The most frequent query is on userId+time, not _id, so all queries will be scatter/gather.
3. Cardinality will be reflective of the extent of hash collisions: the more collisions, the worse the cardinality.
4. Each user has many documents, so the cardinality of the user field is not very high.
15
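The targeted vs. scatter/gather distinction in the table can be illustrated with a toy router. The shard mapping and field names are illustrative: when a query includes the shard key, mongos can send it to a single shard; otherwise every shard must be consulted.

```python
# Toy cluster sharded on the "user" field; the mapping from shard key
# to shard is a stand-in for MongoDB's chunk metadata.
NUM_SHARDS = 3

def shard_for(user_id):
    return user_id % NUM_SHARDS

def shards_hit(query):
    """Which shards must be consulted to answer a query filter?"""
    if "user" in query:                     # shard key present: targeted
        return [shard_for(query["user"])]
    return list(range(NUM_SHARDS))          # shard key absent: scatter/gather

print(shards_hit({"user": 7}))                # → [1]: targeted, one shard
print(shards_hit({"recipients": "a@x.com"}))  # → [0, 1, 2]: scatter/gather
```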
To get started with MMS backup:
- Administrators install the backup agent locally, which then conducts an initial sync of the MongoDB data.
- After the initial sync, a continuous backup is created by streaming encrypted and compressed MongoDB oplog data to MMS as data changes.
- MMS creates snapshots of MongoDB data and retains multiple copies based on a user-defined retention policy.
- By excluding specific namespaces, administrators can be selective about which databases to back up.
- To restore data, administrators choose an existing snapshot. For replica sets, the administrator can also specify a specific point in time up to the last 96 hours (user configurable). The backup service replays the oplog to bring the data to the moment specified.
For more on backup and restore in sharded environments, see the MongoDB Documentation page on
Backup and Restore Sharded Clusters and the tutorial
on Backup a Sharded Cluster with Filesystem Snapshots.
Mongodump
mongodump is a utility that ships with MongoDB. It performs a live read of a database's contents and writes the data out to BSON files, which can later be restored with mongorestore.
V. Capacity Planning
MONITORING TOOLS
MongoDB Management Service (MMS)
In addition to point-in-time MongoDB backups (discussed above), MMS also offers monitoring for MongoDB clusters. MMS monitoring is delivered as a hosted, cloud-based solution and is also available as an
on-premise solution with MongoDB Enterprise.
Hardware Monitoring
Tools such as Scout, Munin, Zabbix, Nagios, Ganglia and Cacti are commonly used to monitor the hardware underlying a MongoDB system, and community plug-ins for MongoDB are available for several of them.
Linux Utilities
Other common utilities, such as iostat and vmstat, should be used to monitor different aspects of a MongoDB system.
SNMP
MongoDB Enterprise can report system information to SNMP-based monitoring tools.
Database Profiler
MongoDB includes a database profiler that logs fine-grained information about database operations. The profiler can be configured to log information on all events, or only on those whose duration exceeds a configurable threshold.
Windows Utilities
Performance Monitor, a Microsoft Management Console snap-in, is a useful tool for measuring a variety of stats in a Windows environment.
THINGS TO MONITOR
Application Logs And Database Logs
Application and database logs should be monitored
for errors and other system information. It is important to correlate your application and database logs in
order to determine whether activity in the application
is ultimately responsible for other issues in the system. For example, a spike in user writes may increase
the volume of writes to MongoDB, which in turn may
overwhelm the underlying storage system. Without
the correlation of application and database logs, it
might take more time than necessary to establish that
the application is responsible for the increase in
writes rather than some process running in MongoDB.
Mongotop
mongotop is a utility that ships with MongoDB. It
tracks and reports the current read and write activity
of a MongoDB cluster. mongotop provides collection-level stats.
Mongostat
mongostat is a utility that ships with MongoDB. It
shows real-time statistics about all servers in your
MongoDB system. mongostat provides a comprehensive overview of all operations, including counts of
updates, inserts, page faults, index misses, and many
other important measures of system health. mongostat is similar to the Linux tool vmstat.
In the event of errors, exceptions or unexpected behavior, the logs should be saved and uploaded to
MongoDB when opening a support case. Logs for mongod processes running on primaries and secondaries,
as well as mongos and config server processes, will enable the support team to diagnose the root cause of any issues more quickly.
Page Faults
When the working set ceases to fit in memory, or when other operations have displaced the working set from memory, the volume of page faults may spike in your MongoDB system. Page faults are part of the normal operation of a MongoDB system, but a sustained spike can indicate that the working set no longer fits in the available RAM.
Disk
Beyond memory, disk I/O is also a key performance
consideration for a MongoDB system because writes
are flushed to disk every 60 seconds and commits to
the journal every 100ms. Under heavy write load, the
underlying disk subsystem may become overwhelmed,
or other processes could be contending with MongoDB, or the RAID configuration may be inadequate
for the volume of writes. Other potential issues could
be the root cause, but the symptom is typically visible
through iostat as showing high disk utilization and
high queuing for writes.
System Configuration
It is not uncommon to make changes to hardware and
software in the course of a MongoDB deployment. For
example, a disk subsystem may be replaced to provide
better performance or increased capacity. When components are changed it is important to ensure their
configurations are appropriate for the deployment.
MongoDB is very sensitive to the performance of the
operating system and underlying hardware, and in
some cases the default values for system configurations are not ideal. For example, the default readahead for the file system could be several MB whereas
MongoDB is optimized for readahead values closer to
32KB. If the new storage system is installed without
making the change to the readahead from the default
to the appropriate setting, the application's performance is likely to degrade substantially.
CPU
A variety of issues could trigger high CPU utilization. This may be normal under most circumstances, but if high CPU utilization is observed without other issues such as disk saturation or page faults, there may be an unusual issue in the system. For example, a MapReduce job with an infinite loop, or a query that sorts and filters a large number of documents from the working set without good index coverage, might cause a spike in CPU without triggering issues in the disk system or page faults.
Connections
MongoDB drivers implement connection pooling to facilitate efficient use of resources. Each connection
consumes 1MB of RAM, so be careful to monitor the
total number of connections so they do not overwhelm the available RAM and reduce the available
memory for the working set. This typically happens when client applications do not properly close their connections, or, particularly with Java, when applications rely on garbage collection to close connections.
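Since each connection consumes roughly 1MB of RAM on the server, a back-of-the-envelope bound on the safe connection count is straightforward; the figures below are illustrative:

```python
# Illustrative sizing: how many connections fit in the RAM left over
# after reserving memory for the working set?
total_ram_mb = 64 * 1024              # a 64GB server
working_set_reserve_mb = 56 * 1024    # RAM kept free for the working set
per_connection_mb = 1                 # approximate cost per connection

max_connections = (total_ram_mb - working_set_reserve_mb) // per_connection_mb
print(max_connections)  # → 8192
```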
Op Counters
The utilization baselines for your application will help you determine a normal count of operations. If these counts start to substantially deviate from your baselines, it may be an indicator that something has changed in the system and warrants investigation.
Shard Balancing
In a sharded cluster, monitoring should also track whether balancing work is currently being performed by the cluster, including the rebalancing of documents across the shards.
Replication Lag
Replication lag is the amount of time it takes a write
operation on the primary to replicate to a secondary.
Some amount of delay is normal, but as replication
lag grows, significant issues may arise. Typical causes
of replication lag include network latency or connectivity issues, and disk latencies such as the throughput
of the secondaries being inferior to that of the primary.
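Replication lag can be computed as the difference between the timestamp of the last operation applied on the primary and on a secondary; in the shell this information is summarized by rs.printSlaveReplicationInfo(). A sketch with illustrative timestamps:

```python
from datetime import datetime, timedelta

# Timestamps of the last oplog entry applied on each member (illustrative).
primary_optime = datetime(2014, 8, 1, 12, 0, 30)
secondary_optime = datetime(2014, 8, 1, 12, 0, 27)

lag = primary_optime - secondary_optime
print(lag.total_seconds())  # → 3.0

# Alert once lag exceeds a threshold appropriate for the deployment.
ALERT_THRESHOLD = timedelta(seconds=30)
print(lag > ALERT_THRESHOLD)  # → False
```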
VI. Security
DEFENSE IN DEPTH
A Defense in Depth approach is recommended for securing MongoDB deployments, addressing a number of different methods for managing risk and reducing risk exposure.
User Rights Management. Control access to sensitive data using industry standard mechanisms for authentication and authorization, including field-level redaction.
AUTHENTICATION
Authentication can be managed from within the database itself or via MongoDB Enterprise integration with external security mechanisms including LDAP, Windows Active Directory, Kerberos and x.509 certificates.
AUTHORIZATION
MongoDB's Aggregation Pipeline includes a stage to implement Field Level Redaction, providing a method to restrict the content of a returned document on a per-field level. The application must pass the redaction logic to the database on each request. It therefore relies on trusted middleware running in the application to ensure the redaction logic is applied consistently.
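The idea behind field level redaction can be sketched in a few lines: keep a sub-document only when the reader holds at least one of the tags attached to it. This is a conceptual illustration only; the tag convention and function below are hypothetical, and MongoDB's actual mechanism is the $redact stage of the Aggregation Pipeline.

```python
# Conceptual field-level redaction: prune any sub-document whose "_tags"
# do not intersect the reader's clearances. "_tags" is a hypothetical
# convention for this sketch, not a MongoDB field.
def redact(doc, clearances):
    out = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            tags = value.get("_tags")
            if tags is not None and not set(tags) & set(clearances):
                continue  # reader lacks every required tag: prune the field
            value = redact({k: v for k, v in value.items() if k != "_tags"},
                           clearances)
        out[key] = value
    return out

report = {
    "subject": "Q3 summary",
    "financials": {"_tags": ["finance"], "revenue": 1000000},
}
print(redact(report, {"hr"}))       # → {'subject': 'Q3 summary'}
print(redact(report, {"finance"}))  # financials returned in full
```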
AUDITING
Security administrators can use MongoDB's native
audit log to track access and administrative actions
taken against the database, with events written to the
console, syslog or a file. The DBA can then merge
these events into a single log using their own tools,
enabling a cluster-wide view of operations that affected multiple nodes.
ENCRYPTION
MongoDB data can be encrypted on the network and on disk. Data encryption software should ensure that the cryptographic keys remain safe and enable compliance with standards such as HIPAA, PCI-DSS and FERPA.
QUERY INJECTION
MongoDB queries are expressed as BSON objects rather than as concatenated strings, which reduces exposure to traditional injection attacks. Applications should still validate user input, avoid passing unvalidated user-supplied objects directly into query documents (which can introduce unintended operators such as $ne or $gt), and take particular care with operators that evaluate JavaScript, such as $where.
IX. Conclusion
MongoDB is the world's most popular NoSQL database, powering tens of thousands of operational and analytical big data applications in mission-critical financial services, telecoms, government and enterprise environments. MongoDB users rely on the best practices discussed in this guide to maintain the highly available, secure and scalable operations demanded by businesses today.
MongoDB provides products and services to help customers get to production faster with less effort and
risk, including:
MongoDB Enterprise provides a management platform
for automating, monitoring, and backing up MongoDB
deployments; advanced security; support from MongoDB engineers; on-demand training; platform certification; and a commercial license.
MongoDB Management Service (MMS) offers automated deployment and zero-downtime upgrades, disaster
recovery and continuous monitoring in the cloud.
Production Support helps customers proactively identify and address potential issues. The same engineers
that build the database are available 24x7x365 to
help teams address their needs quickly and safely.
Resources
For more information, please visit mongodb.com or
mongodb.org, or contact us at sales@mongodb.com.
Resource                      Website URL
MongoDB Enterprise Download   mongodb.com/download
Online Training               university.mongodb.com
Events                        mongodb.com/events
White Papers                  mongodb.com/white-papers
Case Studies                  mongodb.com/customers
Presentations                 mongodb.com/presentations
Documentation                 docs.mongodb.org