
Couchbase Monitoring Integration

Important Metrics to Watch and Alert on

View Operations

The View Operations metric measures the number of view queries executed by the Couchbase cluster.

It is an important metric for monitoring the throughput of view queries in your Couchbase cluster. A high or rising view operation rate means that query load is increasing and that the design of the views may need to be optimized for better performance. A persistently low rate may indicate that the views are rarely used and should be re-evaluated.
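Because views.ops is reported as a counter (see the Metrics table), the view operation rate has to be derived from consecutive samples. The following Python sketch assumes that counter-typed metrics are cumulative and that you already collect (timestamp, value) pairs; the function names and the low/high thresholds are hypothetical and should be tuned to your workload.

```python
from collections.abc import Sequence


def per_second_rates(samples: Sequence[tuple[float, float]]) -> list[float]:
    """Turn (timestamp_sec, cumulative_value) samples into per-second rates.

    Assumes the counter is cumulative. If the value goes down (for example
    after a node restart), the counter is treated as having restarted at zero.
    """
    rates: list[float] = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        delta = v1 - v0 if v1 >= v0 else v1
        rates.append(delta / (t1 - t0))
    return rates


def classify_rate(rate: float, low: float, high: float) -> str:
    """Label a rate against hypothetical low/high thresholds."""
    if rate > high:
        return "high"
    if rate < low:
        return "low"
    return "normal"


if __name__ == "__main__":
    rates = per_second_rates([(0, 1_000), (60, 4_000), (120, 10_000)])
    print(rates)  # [50.0, 100.0]
    print([classify_rate(r, low=10, high=80) for r in rates])  # ['normal', 'high']
```

Treating a drop in the counter as a reset avoids reporting a large negative rate after a restart, at the cost of slightly undercounting the interval in which the reset happened.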

Resident Item Ratio

The Resident Item Ratio metric indicates the percentage of active items in a bucket that are currently residing in the memory of the Couchbase server. In other words, it is the ratio of the number of active items residing in memory to the total number of active items in the bucket.

A high ratio means that a large portion of the active items in the bucket are resident in memory, so most reads can be served without touching disk. A low ratio means that many active items are held only on disk, which increases disk I/O and slows response times.

The Resident Item Ratio is an important metric for Couchbase performance tuning because it shows whether a bucket has enough memory to hold its working set. A falling ratio suggests that the working set is outgrowing the available memory, which results in more disk I/O operations and reduced performance.
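The ratio can be computed from two counts. This minimal Python sketch assumes you can obtain the number of active items and the number of active items that are not resident in memory; the function name is hypothetical, and the integration already reports the result directly as vb.active.resident.items.ratio.

```python
def resident_ratio_pct(active_items: int, active_non_resident: int) -> float:
    """Percentage of active items held in memory (hypothetical helper).

    resident = active - non-resident; the result is resident / active * 100.
    """
    if active_items < 0 or active_non_resident < 0:
        raise ValueError("counts must be non-negative")
    if active_non_resident > active_items:
        raise ValueError("non-resident count cannot exceed active count")
    if active_items == 0:
        return 100.0  # an empty bucket has nothing that lives only on disk
    resident = active_items - active_non_resident
    return 100.0 * resident / active_items


if __name__ == "__main__":
    print(resident_ratio_pct(1_000_000, 250_000))  # 75.0
```

Returning 100% for an empty bucket is a design choice that keeps a "ratio too low" alert from firing on buckets with no data.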

Cache Miss Rate

Cache Miss Rate is the percentage of requests for which the item is not found in the memory cache and must be fetched from disk. A high cache miss rate suggests that the working set is too large for the available cache or that the cache eviction policy is not effective, and it leads to more disk I/O, longer response times, and lower throughput. A low cache miss rate means that most requested items are served from the cache, which gives faster response times and better overall performance. Keep this number as close to zero as possible.
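One common way to compute the cache miss rate over an interval is to divide the number of background disk fetches by the number of get operations in that interval. The Python sketch below assumes that definition, using the interval deltas of ep.background.fetched and cmd.get from the Metrics table as inputs; verify it against how your Couchbase version computes ep.cache.miss.rate before relying on it. The function name is hypothetical.

```python
def cache_miss_rate_pct(bg_fetches: float, gets: float) -> float:
    """Percentage of gets that needed a disk fetch during one interval.

    Both arguments are interval deltas (not cumulative totals), e.g. the change
    in ep.background.fetched and cmd.get between two samples.
    """
    if bg_fetches < 0 or gets < 0:
        raise ValueError("deltas must be non-negative")
    if gets == 0:
        return 0.0  # no reads in the interval, so no misses
    return 100.0 * bg_fetches / gets


if __name__ == "__main__":
    print(cache_miss_rate_pct(50, 10_000))  # 0.5
```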

Total Items

This metric counts the total number of items currently stored in a bucket, including those that are not active (replica, dead, and pending states). It is an important indicator of the size and growth of a Couchbase bucket and of the overall workload on the cluster. Use it to monitor the storage capacity and performance of a Couchbase deployment and to plan the allocation of resources such as memory, disk space, and network bandwidth.
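Because Total Items also counts replica, dead, and pending items, subtracting the active item count leaves the non-active portion, and comparing two samples gives a growth rate for capacity planning. In this Python sketch the function names are hypothetical; it assumes the active count comes from a metric such as items.current (items in active vBuckets) and that both values are sampled at the same time.

```python
SECONDS_PER_DAY = 86_400


def non_active_items(total_items: int, active_items: int) -> int:
    """Items in replica, dead, and pending states (total minus active)."""
    if active_items > total_items:
        raise ValueError("active items cannot exceed total items")
    return total_items - active_items


def items_per_day(prev_total: int, curr_total: int, elapsed_sec: float) -> float:
    """Average change in total items per day between two samples."""
    if elapsed_sec <= 0:
        raise ValueError("elapsed_sec must be positive")
    return (curr_total - prev_total) * SECONDS_PER_DAY / elapsed_sec


if __name__ == "__main__":
    print(non_active_items(3_000_000, 1_000_000))       # 2000000
    print(items_per_day(1_000_000, 1_050_000, 43_200))  # 100000.0
```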

Memory Usage High Watermark

The Memory High Watermark metric reports a configurable threshold on a bucket's memory usage. When memory usage reaches the high watermark, the data service begins ejecting items from memory, and it continues ejecting until usage falls to the memory low watermark (reported as ep.mem.low.wat).

The high watermark is configured as a percentage of the bucket's memory quota and reported in bytes (ep.mem.high.wat). By default it is set to 85%, which means that ejection begins once the bucket's memory usage reaches 85% of its quota.

Monitoring memory usage against this threshold shows how close a bucket is to ejecting items. Tuning the threshold lets you control how memory is allocated and keeps the data service from consuming all available memory, which could otherwise degrade performance or even cause crashes.
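The high watermark arithmetic is simple enough to sketch. This Python example assumes the watermark is a percentage (85% by default, as stated above) of some memory capacity and that you compare it with the bucket's current memory usage; `capacity_bytes`, `mem_used_bytes`, and the helper names are hypothetical. Since the integration also reports the watermark in bytes as ep.mem.high.wat, in practice you can read it directly instead of computing it.

```python
DEFAULT_HIGH_WAT_PCT = 85.0  # default percentage described in this section


def high_watermark_bytes(capacity_bytes: int, high_wat_pct: float = DEFAULT_HIGH_WAT_PCT) -> int:
    """Byte threshold at which ejection starts, for a given percentage."""
    if not 0 < high_wat_pct <= 100:
        raise ValueError("high_wat_pct must be in (0, 100]")
    return int(capacity_bytes * high_wat_pct / 100)


def headroom_bytes(mem_used_bytes: int, high_wat_bytes: int) -> int:
    """Bytes left before the high watermark; negative means it was crossed."""
    return high_wat_bytes - mem_used_bytes


if __name__ == "__main__":
    hwm = high_watermark_bytes(1_000_000_000)
    print(hwm)                               # 850000000
    print(headroom_bytes(900_000_000, hwm))  # -50000000
```

Alerting on shrinking headroom, rather than on raw memory usage, gives warning before ejection starts regardless of the bucket's size.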

Current Connections

The Current Connections metric indicates the number of active network connections between clients and the Couchbase cluster. This includes connections for data access, management operations, and other network traffic.

Use it to monitor the load on the Couchbase cluster and to identify potential bottlenecks or capacity issues. A high number of current connections can indicate that the cluster is under heavy load and may need additional resources, such as more network bandwidth or additional nodes.
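To avoid alerting on a brief spike in connections.current, a common pattern is to fire only when the value stays above a limit for several consecutive samples. This Python sketch shows one way to do that; the class name, the threshold of 1,000 connections, and the three-sample window are hypothetical values, not Couchbase recommendations.

```python
from collections import deque


class SustainedThresholdAlert:
    """Fire only when `window` consecutive samples exceed `threshold`."""

    def __init__(self, threshold: int, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold = threshold
        # Remembers, for the last `window` samples, whether each was too high.
        self._recent = deque(maxlen=window)

    def observe(self, value: int) -> bool:
        """Record one sample; return True if the alert should fire now."""
        self._recent.append(value > self.threshold)
        return len(self._recent) == self._recent.maxlen and all(self._recent)


if __name__ == "__main__":
    alert = SustainedThresholdAlert(threshold=1_000, window=3)
    for value in [1_200, 1_300, 900, 1_100, 1_150, 1_400]:
        print(value, alert.observe(value))  # only the last sample prints True
```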

Metrics

Metric Name
Key (Type) (Unit)
Description
average background wait
background.wait.time.avg
(double_gauge) (sec)
Average background wait time
background wait time
background.wait.total
(double_counter) (sec)
Total background wait time
average commit time
disk.commit.time.avg
(double_gauge) (sec)
Average disk commit time
average update time
disk.update.time.avg
(double_gauge) (microsec)
Average disk update time
bytes read
bytes.read
(double_counter) (bytes)
Number of bytes per second sent into a bucket.
bytes written
bytes.written
(double_counter) (bytes)
Number of bytes per second sent from a bucket
cas bad values
cas.badval
(double_counter) ()
Compare and Swap bad values
cas hits
cas.hits
(double_counter) ()
Compare and Swap hits
cas misses
cas.misses
(double_counter) ()
Compare and Swap misses
cmd gets
cmd.get
(double_counter) ()
Number of get commands
cmd sets
cmd.set
(double_counter) ()
Number of set commands
doc actual disk size
docs.disk.actual.size
(long_gauge) (bytes)
Couch docs total size on disk
doc data disk size
docs.data.size
(long_gauge) (bytes)
Couch docs data size
docs disk size
docs.disk.size
(long_gauge) (bytes)
Couch docs total size
doc fragmentation
docs.fragmentation
(double_gauge) (%)
Couch docs fragmentation
spatial data size
data.spatial.size
(long_gauge) (bytes)
Size of object data for spatial views
spatial disk size
disk.spatial.size
(long_gauge) (bytes)
Amount of disk space occupied by spatial views
spatial ops
ops.spatial
(double_counter) ()
Spatial operations
total disk size
disk.size
(long_gauge) (bytes)
Couch total disk size.
view data size
views.data.size
(long_gauge) (bytes)
Size of object data for views.
view disk size
views.disk.size
(long_gauge) (bytes)
Amount of disk space occupied by views.
view fragmentation
views.fragmentation
(double_gauge) (%)
View fragmentation
view ops
views.ops
(double_counter) ()
View operations
cpu utilization
cpu.utilization
(double_gauge) (%)
CPU utilization percentage.
connections
connections.current
(long_gauge) ()
Current bucket connections.
total items
items.current.total
(long_gauge) ()
Number of current items, including those not active (replica, dead, and pending states)
memory items
items.current
(long_gauge) ()
Number of items in active vBuckets (temp + live)
decrement hits
decrement.hits
(double_counter) ()
Decrement hits
decrement misses
decrement.misses
(double_counter) ()
Decrement misses
delete hits
delete.hits
(double_counter) ()
Delete hits
delete misses
delete.misses
(double_counter) ()
Delete misses
commits
disk.commits.count
(double_counter) ()
Disk commits
updates
disk.updates.count
(double_counter) ()
Disk updates
writes
disk.write.queue
(long_gauge) ()
Disk write queue depth
reads
ep.background.fetched
(double_counter) ()
Disk reads
cache miss rate
ep.cache.miss.rate
(double_gauge) (%)
Cache miss rate
cache miss ratio
ep.cache.miss.ratio
(double_gauge) (%)
Cache miss ratio
DCP fts backoff
ep.dcp.fts.backoff
(double_counter) ()
Number of backoffs for fts DCP connections
DCP fts count
ep.dcp.fts.count
(double_gauge) ()
Number of fts DCP connections
DCP fts items remaining
ep.dcp.fts.items.remaining
(double_gauge) ()
Number of fts items remaining to be sent
DCP fts items sent
ep.dcp.fts.items.sent
(double_counter) ()
Number of fts items sent
DCP fts producers
ep.dcp.fts.producer.count
(double_gauge) ()
Number of fts producers
DCP indexes total bytes
ep.dcp.2i.total.bytes
(double_counter) (bytes)
Number of bytes being sent for indexes DCP connections
DCP indexes backoff
ep.dcp.2i.backoff
(double_counter) ()
Number of backoffs for indexes DCP
DCP indexes count
ep.dcp.2i.count
(double_gauge) ()
Number of indexes DCP connections
DCP indexes items remaining
ep.dcp.2i.items.remaining
(double_gauge) ()
Number of indexes items remaining to be sent
DCP indexes items sent
ep.dcp.2i.items.sent
(double_counter) ()
Number of indexes items sent
DCP indexes producer
ep.dcp.2i.producer.count
(double_gauge) ()
Number of indexes producers
DCP other backoff
ep.dcp.other.backoff
(double_counter) ()
Number of backoffs for other DCP connections
DCP other count
ep.dcp.other.count
(double_gauge) ()
Number of other DCP connections
DCP other items remaining
ep.dcp.other.items.remaining
(double_gauge) ()
Number of other items remaining to be sent
DCP other items sent
ep.dcp.other.items.sent
(double_counter) ()
Number of other items sent
DCP other producers
ep.dcp.other.producer.count
(double_gauge) ()
Number of other producers
DCP other total bytes
ep.dcp.other.total.bytes
(double_counter) (bytes)
Number of bytes being sent for other DCP connections
DCP replica backoff
ep.dcp.replica.backoff
(double_counter) ()
Number of backoffs for replica DCP connections
DCP replica count
ep.dcp.replica.count
(double_gauge) ()
Number of replica DCP connections
DCP replica items remaining
ep.dcp.replica.items.remaining
(double_gauge) ()
Number of replica items remaining to be sent
DCP replica items sent
ep.dcp.replica.items.sent
(double_counter) ()
Number of replica items sent
DCP replica producer
ep.dcp.replica.producer.count
(double_gauge) ()
Number of replica producers
DCP replica total bytes
ep.dcp.replica.bytes.total
(double_counter) (bytes)
Number of bytes being sent for replica DCP connections
DCP views backoff
ep.dcp.views.backoff
(double_counter) ()
Number of backoffs for views DCP connections
DCP views count
ep.dcp.views.count
(double_gauge) ()
Number of views DCP connections
DCP views items remaining
ep.dcp.views.items.remaining
(double_gauge) ()
Number of views items remaining to be sent
DCP views items sent
ep.dcp.views.items.sent
(double_counter) ()
Number of views items sent
DCP views producer
ep.dcp.views.producer.count
(double_gauge) ()
Number of views producers
DCP views total bytes
ep.dcp.views.bytes.total
(double_counter) (bytes)
Number of bytes being sent for views DCP connections
DCP XDCR backoff
ep.dcp.xdcr.backoff
(double_counter) ()
Number of backoffs for XDCR DCP connections
DCP XDCR count
ep.dcp.xdcr.count
(double_gauge) ()
Number of XDCR DCP connections
DCP XDCR items remaining
ep.dcp.xdcr.items.remaining
(double_gauge) ()
Number of XDCR items remaining to be sent
DCP XDCR items sent
ep.dcp.xdcr.items.sent
(double_counter) ()
Number of XDCR items sent
DCP XDCR producer
ep.dcp.xdcr.producer.count
(double_gauge) ()
Number of XDCR producers
DCP XDCR total bytes
ep.dcp.xdcr.total.bytes
(double_counter) (bytes)
Number of bytes being sent for XDCR DCP connections
queue drained
ep.diskqueue.drain
(double_counter) ()
Total drained items in disk queue
queued
ep.diskqueue.fill
(double_counter) ()
Total enqueued items in disk queue
queue waiting items
ep.diskqueue.items
(long_gauge) ()
Total number of items waiting to be written to disk
current flushing items
ep.flusher.todo
(long_gauge) ()
Number of items currently being written
failed commits
ep.item.commit.failed
(double_gauge) ()
Number of times a transaction failed to commit due to storage errors
kv size
ep.kv.size
(long_gauge) (bytes)
Total amount of user data cached in RAM in this bucket
max size
ep.max.size
(long_gauge) (bytes)
The maximum amount of memory this bucket can use
memory high water mark
ep.mem.high.wat
(long_gauge) (bytes)
Memory usage high water mark for auto-evictions
memory low water mark
ep.mem.low.wat
(long_gauge) (bytes)
Memory usage low water mark for auto-evictions
metadata mem
ep.meta.data.memory
(long_gauge) (bytes)
Total amount of item metadata consuming RAM in this bucket
non-resident items
ep.num.non.resident
(long_gauge) ()
Number of non-resident items
ops del meta
ep.num.ops.del.meta
(double_counter) ()
Number of delete operations for this bucket as the target for XDCR
ops del ret meta
ep.num.ops.del.ret.meta
(double_counter) ()
Number of delRetMeta operations for this bucket as the target for XDCR
ops get meta
ep.num.ops.get.meta
(double_counter) ()
Number of read operations for this bucket as the target for XDCR
ops set meta
ep.num.ops.set.meta
(double_counter) ()
Number of set operations for this bucket as the target for XDCR
ops set ret meta
ep.num.ops.set.ret.meta
(double_counter) ()
Number of setRetMeta operations for this bucket as the target for XDCR
ejects
ep.num.value.ejects
(double_counter) ()
Number of times item values got ejected from memory to disk
ooms
ep.oom.errors
(long_gauge) ()
Number of times unrecoverable OOMs happened while processing operations
create ops
ep.ops.create
(double_counter) ()
Create operations
update ops
ep.ops.update
(double_counter) ()
Update operations
overhead
ep.overhead
(long_gauge) ()
Extra memory used by transient data like persistence queues or checkpoints
queue size
ep.queue.size
(long_gauge) ()
Number of items queued for storage
resident items
ep.resident.items.rate
(double_gauge) ()
Number of resident items
drain items
ep.tap.replica.queue.drain
(double_counter) ()
Total drained items in the replica queue
drain items
ep.tap.total.queue.drain
(double_counter) ()
Total drained items in the queue
queued
ep.tap.total.queue.fill
(double_gauge) ()
Total enqueued items in the queue
backlog size
ep.tap.total.total.backlog.size
(long_gauge) ()
Number of remaining items for replication
ooms
ep.tmp.oom.errors
(double_counter) ()
Number of times recoverable OOMs happened while processing operations
vb total
ep.vb.total
(long_gauge) ()
Total number of vBuckets for this bucket
evictions
evictions
(double_counter) ()
Number of evictions
get hits
get.hits
(double_counter) ()
Number of get hits
get misses
get.misses
(double_counter) ()
Number of get misses
hibernated requests
hibernated.requests
(double_gauge) ()
Number of streaming requests idle
hibernated waked
hibernated.waked
(double_counter) ()
Rate of streaming request wakeups
hit ratio
hit.ratio
(double_gauge) ()
Hit ratio
increment hits
increment.hits
(double_counter) ()
Number of increment hits
increment misses
increment.misses
(double_counter) ()
Number of increment misses
actual free
mem.actual.free
(long_gauge) (bytes)
Actual free memory
actual used
mem.actual.used
(long_gauge) (bytes)
Used memory
free
mem.free
(long_gauge) (bytes)
Free memory
total
mem.total
(long_gauge) (bytes)
Total available memory
used
mem.used
(long_gauge) (bytes)
Engine's total memory usage (deprecated)
used sys
mem.used.sys
(long_gauge) (bytes)
System memory usage
misses
misses
(double_counter) ()
Total number of misses
ops
ops
(double_counter) ()
Total number of operations
faults
page.faults
(double_gauge) ()
Number of page faults
repl docs queue
replication.docs.rep.queue
(double_gauge) ()
repl meta latency aggr
replication.meta.latency.aggr
(double_gauge) ()
rest requests
rest.requests
(double_counter) (request)
Number of HTTP requests
swap total
swap.total
(long_gauge) (bytes)
Total amount of swap available
swap used
swap.used
(long_gauge) (bytes)
Amount of swap used
vb active eject
vb.active.eject
(double_counter) (items)
Number of items being ejected to disk from active vBuckets
vb active item mem
vb.active.itm.memory
(long_gauge) ()
Amount of active user data cached in RAM in this bucket
vb active meta mem
vb.active.meta.data.memory
(long_gauge) ()
Amount of active item metadata consuming RAM in this bucket
vb active num non resident
vb.active.num.non.resident
(long_gauge) ()
Number of non resident vBuckets in the active state for this bucket
vb active num
vb.active.num
(long_gauge) ()
Number of active vBuckets
vb active ops create
vb.active.ops.create
(double_counter) (items)
New items being inserted into active vBuckets in this bucket
vb active ops update
vb.active.ops.update
(double_counter) (items)
Number of items updated on active vBucket for this bucket
vb active queue age
vb.active.queue.age
(long_gauge) (ms)
Sum of disk queue item age
vb active queue drain
vb.active.queue.drain
(double_counter) ()
Total drained items in the queue
vb active queue fill
vb.active.queue.fill
(double_counter) (items)
Number of active items being put on the active item disk queue
vb active queue size
vb.active.queue.size
(long_gauge) ()
Number of active items in the queue
vb active resident items ratio
vb.active.resident.items.ratio
(double_gauge) (%)
Percentage of active items resident in memory
vb avg active queue age
vb.avg.active.queue.age
(double_gauge) (sec)
Average age in seconds of active items in the active item queue
vb avg pending queue age
vb.avg.pending.queue.age
(double_gauge) (sec)
Average age in seconds of pending items in the pending item queue
vb avg replica queue age
vb.avg.replica.queue.age
(double_gauge) (sec)
Average age in seconds of replica items in the replica item queue
vb avg total queue age
vb.avg.total.queue.age
(double_gauge) (sec)
Average age of items in the queue
vb pending curr item
vb.pending.curr.items
(long_gauge) ()
Number of items in pending vBuckets
vb pending eject
vb.pending.eject
(double_counter) (items)
Number of items being ejected to disk from pending vBuckets
vb pending item mem
vb.pending.itm.memory
(double_gauge) ()
Amount of pending user data cached in RAM in this bucket
vb pending meta mem
vb.pending.meta.data.memory
(double_gauge) ()
Amount of pending item metadata consuming RAM in this bucket
vb pending num non resident
vb.pending.num.non.resident
(double_gauge) ()
Number of non resident vBuckets in the pending state for this bucket
vb pending num
vb.pending.num
(double_gauge) ()
Number of pending vBuckets
vb pending ops create
vb.pending.ops.create
(double_counter) ()
Number of pending create operations
vb pending ops update
vb.pending.ops.update
(double_counter) (items)
Number of items updated on pending vBucket for this bucket
vb pending queue age
vb.pending.queue.age
(double_gauge) (ms)
Sum of disk pending queue item age
vb pending queue drain
vb.pending.queue.drain
(double_counter) ()
Total drained pending items in the queue
vb pending queue fill
vb.pending.queue.fill
(double_counter) ()
Total enqueued pending items in disk queue
vb pending queue size
vb.pending.queue.size
(double_gauge) ()
Number of pending items in the queue
vb pending resident items ratio
vb.pending.resident.items.ratio
(double_gauge) ()
Percentage of pending items resident in memory
vb replica curr items
vb.replica.curr.items
(long_gauge) ()
Number of items in replica vBuckets
vb replica eject
vb.replica.eject
(double_counter) (items)
Number of items being ejected to disk from replica vBuckets
vb replica item mem
vb.replica.itm.memory
(long_gauge) ()
Amount of replica user data cached in RAM in this bucket
vb replica meta data mem
vb.replica.meta.data.memory
(long_gauge) (bytes)
Amount of replica item metadata consuming RAM in this bucket
vb replica num non resident
vb.replica.num.non.resident
(long_gauge) ()
Number of non resident vBuckets in the replica state for this bucket
vb replica num
vb.replica.num
(long_gauge) ()
Number of replica vBuckets
vb replica ops create
vb.replica.ops.create
(double_counter) ()
Number of replica create operations
vb replica ops update
vb.replica.ops.update
(double_counter) (items)
Number of items updated on replica vBucket for this bucket
vb replica queue age
vb.replica.queue.age
(long_gauge) (ms)
Sum of disk replica queue item age
vb replica queue drain
vb.replica.queue.drain
(double_counter) ()
Total drained replica items in the queue
vb replica queue fill
vb.replica.queue.fill
(double_counter) ()
Total enqueued replica items in disk queue
vb replica queue size
vb.replica.queue.size
(long_gauge) ()
Replica items in disk queue
vb replica resident items ratio
vb.replica.resident.items.ratio
(double_gauge) (%)
Percentage of replica items resident in memory
vb total queue age
vb.queue.age.total
(long_gauge) (ms)
Sum of disk queue item age
XDCR ops
xdc.ops
(double_counter) ()
Number of cross-datacenter replication operations
active items
items.active
(long_gauge) ()
Number of active items in memory
total items
items.total
(long_gauge) ()
Total number of items
data size
docs.size
(long_gauge) (bytes)
Couch docs data size
data disk size
docs.disk.actual.size
(long_gauge) (bytes)
Couch docs total size on disk
views size
views.size
(long_gauge) (bytes)
Couch views data size
views disk size
views.disk.size
(long_gauge) (bytes)
Couch views data size on disk
memory items
items.replica
(long_gauge) ()
Number of in memory items
cores
cores
(long_gauge) ()
Cores
gc num
gc.num
(counter) ()
Number of objects garbage collected
gc pause percent
gc.pause.percent
(gauge) (%)
Garbage collection pause percentage
gc pause time
gc.pause.time
(counter) (seconds)
Garbage collection pause time
system memory
memory.system
(long_gauge) (bytes)
Memory used by the system
total memory
memory.total
(long_gauge) (bytes)
Memory used by Couchbase over the total period of time
usage memory
memory.usage
(long_gauge) (bytes)
Memory currently used by Couchbase
active requests
request.active.count
(long_gauge) ()
Number of active requests
requests completed
request.completed.count
(counter) ()
Number of requests completed
request prepared percent
request.prepared.percent
(gauge) (%)
Percentage of requests prepared
request time mean
request.time.mean
(gauge) (seconds)
Average request time
total threads
threads.total
(long_gauge) ()
Total number of threads
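The Unit column mixes several time units (sec, seconds, ms, and microsec), so durations such as disk.update.time.avg (microsec) and vb.active.queue.age (ms) should be normalized before they are compared or plotted together. The helper below is a hypothetical Python sketch; it assumes that sec and seconds denote the same unit.

```python
# Divisors that convert each time unit used in the Metrics table to seconds.
_DIVISOR_TO_SECONDS = {
    "sec": 1,
    "seconds": 1,
    "ms": 1_000,
    "microsec": 1_000_000,
}


def to_seconds(value: float, unit: str) -> float:
    """Convert a duration reported in one of the table's time units to seconds."""
    try:
        return value / _DIVISOR_TO_SECONDS[unit]
    except KeyError:
        raise ValueError(f"not a time unit: {unit!r}") from None


if __name__ == "__main__":
    print(to_seconds(2_500, "ms"))        # 2.5
    print(to_seconds(1_500, "microsec"))  # 0.0015
```

Dividing by an exact integer, rather than multiplying by a fractional factor, keeps common values such as 2,500 ms exact after conversion.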