Explained: Monitoring & Telemetry in DevOps

Muhammad Raza

DevOps is a data-driven software development lifecycle (SDLC) framework. DevOps engineers analyze logs and metrics data generated across all software components and the underlying hardware infrastructure. This helps them understand a variety of areas:

  • Application and system performance
  • Usage patterns
  • Bugs
  • Security and regulatory issues
  • Opportunities for improvement

Extensive application monitoring and telemetry are required before an application can achieve the coveted Service Level Agreement (SLA) uptime of five 9’s or more: available at least 99.999% of the time. But what exactly are monitoring and telemetry, and how do they fit into a DevOps environment? Let’s discuss.
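To put the five 9’s target in concrete terms, the short sketch below converts an availability percentage into a yearly downtime budget. The function name and the choice of an average 365.25-day year are assumptions made for illustration, not part of any SLA standard.

```python
# Downtime budget implied by an availability target (illustrative arithmetic).
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes per year")
```

Under these assumptions, 99.999% availability leaves a budget of roughly 5.26 minutes of downtime per year, which is why early detection through monitoring matters so much at that level.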

What is monitoring?

Monitoring is a common IT practice. In the context of DevOps, monitoring is the process of collecting log and metric data to observe performance and verify compliance at every stage of the SDLC pipeline. Monitoring involves tooling that can be programmed to:

  • Procure specific log data streams
  • Produce an intuitive visual representation of how the metrics are performing
  • Create alerts based on specified criteria
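As a minimal sketch of the third capability, alerting on specified criteria, the example below evaluates a set of rules against one metrics sample. The `AlertRule` class, the metric names, and the thresholds are all hypothetical; real monitoring tools expose their own rule formats.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    """A hypothetical alert rule: fire when `predicate` holds for a metric value."""
    metric: str
    predicate: Callable[[float], bool]
    message: str


def evaluate(rules: list[AlertRule], sample: dict[str, float]) -> list[str]:
    """Return a message for every rule whose metric is present and in breach."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and rule.predicate(value):
            alerts.append(f"{rule.message} ({rule.metric}={value})")
    return alerts


# Illustrative thresholds only; suitable values depend on the system.
rules = [
    AlertRule("cpu_percent", lambda v: v > 90, "CPU utilization high"),
    AlertRule("error_rate", lambda v: v > 0.05, "Error rate above 5%"),
]

sample = {"cpu_percent": 97.0, "error_rate": 0.01}
print(evaluate(rules, sample))
```

Keeping the criteria as data rather than hard-coded checks means new alerts can be added without touching the evaluation loop.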

The goals of monitoring in DevOps include:

  • Improve visibility and control of app components and IT infrastructure operations. Use cases range from cybersecurity to resource optimization. For instance, monitoring tools can alert on incidents such as network breaches or excessive network traffic at a specific node.
  • Monitor application performance issues, identify bugs, and understand how specific app components behave in production and test environments. Once an app is deployed, monitoring tools alert on several metrics to track resource utilization and workload distribution. With this information, engineers can allocate resources to account for dynamic traffic and workload demands.
  • Understand user and market behavior. This information can help engineers make technical decisions such as adding a specific feature, removing a button, or investing in cloud resources to further improve the SLA performance. Proactive decision making in this regard helps organizations maintain and expand their market share in the competitive business landscape.

What is telemetry?

Telemetry is a subset of monitoring: it is the mechanism for representing the measurement data that a monitoring tool provides. Telemetry can be seen as agents that can be programmed to extract specific monitoring data such as:

  • High-volume time-series information on resource utilization
  • Real-time alerting for specific incidents
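The idea of a programmable agent that extracts time-series data can be sketched as follows. `TelemetryAgent` and its parameters are hypothetical names, and the data source is a stand-in list of readings; a real agent would read from the operating system or the application itself.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class TelemetryAgent:
    """Hypothetical agent that samples one metric into a bounded time series."""

    def __init__(self, name: str, read_metric: Callable[[], float], max_points: int = 1000):
        self.name = name
        self.read_metric = read_metric
        # A bounded deque drops the oldest points once max_points is reached.
        self.series: Deque[Tuple[float, float]] = deque(maxlen=max_points)

    def sample(self) -> Tuple[float, float]:
        """Read the metric once and store a (timestamp, value) point."""
        point = (time.time(), self.read_metric())
        self.series.append(point)
        return point


# Stand-in data source for the sketch.
readings = iter([41.0, 43.5, 88.2])
agent = TelemetryAgent("cpu_percent", lambda: next(readings), max_points=2)

for _ in range(3):
    agent.sample()

print([value for _, value in agent.series])
```

Bounding the series is one simple way to cope with high-volume data in memory; with `max_points=2`, only the two most recent readings remain after three samples.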

DevOps monitoring vs telemetry

Consider the case of motor racing, where fans get to see metrics such as top speed, G-forces, lap times, race position, and other information displayed on TV screens. These displayed measurements are the telemetry.

Conversely, the entire process of installing sensors, extracting data, and delivering a limited set of metrics to TV screens is called monitoring.

In the context of DevOps, some of the most commonly measured metrics relate to the health and performance of an application, and the corresponding metrics are always visible on the dashboard.

Monitoring challenges

Before discussing the various DevOps use cases of telemetry, let’s discuss the most common monitoring challenges facing DevOps organizations:

  • Operations personnel invest significant time and resources to find performance issues in the infrastructure and apps.
  • Devs frequently interrupt their development work to address new bugs and issues identified at the production stage.
  • The rapid release cycle approach makes apps prone to performance issues, because thorough testing takes time and resources that may not be justified from a business perspective.
  • The deployment procedure is complex: engineers need to synchronize and coordinate multiple development workstreams across microservices, multi-cloud, and hybrid IT environments.
  • Anomalies are a sign of potential emerging issues. It’s important to identify and contain the damage before the impact is realized and spreads across the global user base.
  • Security and regulatory restrictions require organizations to exercise deep control and maintain visibility into the hardware resources that handle sensitive user data and applications. This is challenging, especially when the underlying infrastructure is a cloud network operated off-premises by a third-party vendor that can offer only limited log data, metrics information, and insights into the hardware components.
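To illustrate the point about anomalies in the list above, here is a minimal sketch that flags outliers in a metric series with a z-score test. The z-score is only one simple technique chosen for illustration, and the function name, threshold, and latency values are assumptions rather than anything a specific monitoring product prescribes.

```python
from statistics import mean, stdev


def find_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return indexes of points more than `threshold` sample standard
    deviations away from the mean of the series."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # a flat series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up response times in milliseconds, with one obvious spike.
latencies_ms = [120, 118, 125, 122, 119, 121, 480, 123, 117, 120]
print(find_anomalies(latencies_ms, threshold=2.0))
```

In this sample the spike at index 6 is the only point flagged. A lower threshold is used here because a single outlier in a short series inflates the standard deviation and therefore caps its own z-score; production systems typically rely on longer baselines or more robust statistics.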

Monitoring & telemetry use cases

To address these challenges, DevOps teams use a variety of monitoring tools to identify and understand patterns that could predict the future performance of an app, service, or the underlying infrastructure.

Some of the common use cases of telemetry in DevOps include the following metrics and use cases:

Data analysis is necessary

Analysis follows monitoring. Telemetry doesn’t necessarily include analyzed and processed logs or metrics information. Decisions based on telemetry require extensive analysis of a variety of KPIs, and that analysis can be integrated with monitoring systems to trigger automated actions when necessary.
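As a rough sketch of analysis feeding an automated action, the example below derives one KPI (error rate) from raw request records and calls a hook when it exceeds a limit. The record shape, the limit, and the "rollback" action are hypothetical placeholders for whatever a team's tooling actually provides.

```python
from typing import Callable


def error_rate(records: list[dict]) -> float:
    """Fraction of requests whose HTTP status is 5xx."""
    if not records:
        return 0.0
    failures = sum(1 for r in records if r["status"] >= 500)
    return failures / len(records)


def analyze_and_act(
    records: list[dict],
    max_error_rate: float,
    action: Callable[[float], None],
) -> bool:
    """Compute the KPI and invoke `action` if it exceeds the limit."""
    rate = error_rate(records)
    if rate > max_error_rate:
        action(rate)
        return True
    return False


# The action here only records a message; a real hook might page an
# engineer or start a rollback.
triggered: list[str] = []
records = [{"status": 200}, {"status": 503}, {"status": 200}, {"status": 500}]
analyze_and_act(
    records,
    max_error_rate=0.25,
    action=lambda rate: triggered.append(f"rollback requested at error rate {rate:.0%}"),
)
print(triggered)
```

Separating the KPI calculation from the action keeps the analysis testable on its own and lets the same check drive different responses.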


These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.


About the author

Muhammad Raza

Muhammad Raza is a Stockholm-based technology consultant working with leading startups and Fortune 500 firms on thought leadership branding projects across DevOps, Cloud, Security and IoT.