Datadog vs Grafana vs AWS CloudWatch

Rajith
Innovation Incubator
Apr 4, 2022


Today, enterprises operate large numbers of servers both in the cloud and in their own data centers to meet ever-increasing demand. As organizations adopt more cloud-native technologies and IT infrastructure becomes increasingly distributed, they must align business objectives and end-user experience with the availability and performance of that infrastructure. This shift requires infrastructure monitoring to ensure that all components work together across cloud environments, operating systems, storage, servers, virtualized systems, and so on. In this post, we’ll compare three widely used monitoring tools.

List of tools we will compare:

  • Datadog
  • AWS CloudWatch
  • Grafana

Datadog

Datadog is a monitoring, security, and analytics platform for developers, IT operations teams, security engineers, and business users in the cloud age. It can monitor servers, tools, and databases.

It helps users see inside any stack and any app, at any scale, anywhere. It was one of the early tools to focus on infrastructure monitoring. What makes it stand out is that it combines application performance monitoring, infrastructure monitoring, logs, and user experience monitoring in a single platform.

Grafana

Grafana allows us to query, visualize, alert on, and understand our metrics no matter where they are stored. We can create, explore, and share dashboards with our team and foster a data-driven culture. Grafana is primarily used to turn time-series data into meaningful charts from which we can draw insights. It can also be used to build an open-source stack for APM, time-series, and log monitoring.

AWS CloudWatch

Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), IT managers, and product owners. CloudWatch provides us with data and actionable insights to monitor our applications, respond to system-wide performance changes, and optimize resource utilization. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events. We get a unified view of operational health and gain complete visibility of our AWS resources, applications, and services running on AWS and on-premises. We can use CloudWatch to detect anomalous behavior in our environments, set alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to keep our applications running smoothly.

Key Differences

1. Getting started

If we are using AWS services, then CloudWatch already offers a default console to monitor the services we use in our AWS account.

To use Datadog, we first need to sign up for a Datadog account. Once we have signed up, we can install Datadog agents on our hosts. The Datadog agent reports metrics and events from each host to Datadog.

To get started with Grafana, we first need to install it. Once Grafana is installed, we can connect it to our desired data source and start visualizing the data.

Some of the popular data sources that Grafana supports are:

  • Prometheus
  • Jaeger
  • Zipkin
  • AWS CloudWatch
  • Graphite
  • Azure Monitor

2. Multi-cloud support

Datadog supports multi-cloud monitoring across providers such as AWS, Azure, and Google Cloud.

CloudWatch is used to monitor AWS resources and the applications that run on them. We can use the CloudWatch Logs agent installer on an instance to install and configure the CloudWatch Logs agent. After installation is complete, logs automatically flow from the instance to the log stream we create while installing the agent. The agent confirms that it has started, and it stays running until we disable it.

Grafana supports multi-cloud monitoring with the help of plugins.

3. Pricing

With Amazon CloudWatch, there is no up-front commitment or minimum fee; we simply pay for what we use and are charged at the end of the month. CloudWatch provides a free tier that we can explore. Beyond that, EC2 detailed monitoring starts at $2.10 per instance per month (assuming 7 metrics per instance, that is, 7 × $0.30). The cost depends on the number of metrics sent and is divided into multiple tiers: the first 10k metrics are charged at $0.30 per metric per month.
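
To make the arithmetic concrete, here is a minimal sketch of that estimate. It uses only the figures quoted in this post (7 metrics per instance, $0.30 per metric per month within the first 10k metrics). The function name is our own, later tiers are deliberately not modeled, and real bills depend on region and on current AWS pricing.

```python
# Figures quoted in this post; re-check current AWS pricing before relying on them.
PRICE_PER_METRIC_FIRST_TIER = 0.30  # USD per metric per month (first 10k metrics)
FIRST_TIER_LIMIT = 10_000           # metrics covered by the first tier
METRICS_PER_INSTANCE = 7            # assumption used in this post


def detailed_monitoring_cost(instances: int) -> float:
    """Estimate the monthly cost while all metrics stay within the first tier."""
    metrics = instances * METRICS_PER_INSTANCE
    if metrics > FIRST_TIER_LIMIT:
        # Prices for later tiers are not given in this post, so we stop here.
        raise ValueError("estimate only covers the first pricing tier")
    return round(metrics * PRICE_PER_METRIC_FIRST_TIER, 2)


print(detailed_monitoring_cost(1))   # one instance: 7 metrics
print(detailed_monitoring_cost(50))  # fifty instances: 350 metrics
```

For one instance this reproduces the $2.10 figure above; the cost then grows linearly with the number of instances until the first tier is exhausted.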

Datadog is an expensive enterprise monitoring tool with many different pricing tiers that vary by use case. For example, enterprise infrastructure monitoring starts at $23 per host per month, while APM with Continuous Profiler starts at $40 per host per month.

The open-source version of Grafana is free, although we do need to account for the cost of data storage and networking. Grafana Labs offers paid cloud plans starting at $49 per month, which scale up based on usage.
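
As a rough illustration of how these list prices scale, the sketch below estimates a Datadog bill by host count and compares it with the Grafana Cloud starting price. The prices are the ones quoted in this post; adding the APM price on top of the infrastructure price is our assumption (features are purchased individually), and Grafana Cloud's usage-based scaling is not modeled.

```python
# List prices quoted in this post (2022); treat them as assumptions to re-check.
DATADOG_INFRA_PER_HOST = 23.0  # USD per host per month
DATADOG_APM_PER_HOST = 40.0    # USD per host per month
GRAFANA_CLOUD_BASE = 49.0      # USD per month, before usage-based scaling


def datadog_monthly(hosts: int, with_apm: bool = False) -> float:
    """Estimate a monthly Datadog bill for a number of hosts."""
    per_host = DATADOG_INFRA_PER_HOST
    if with_apm:
        # Assumption: APM is billed in addition to infrastructure monitoring.
        per_host += DATADOG_APM_PER_HOST
    return hosts * per_host


for hosts in (1, 10, 50):
    print(hosts, datadog_monthly(hosts), datadog_monthly(hosts, with_apm=True))
print("Grafana Cloud starting price:", GRAFANA_CLOUD_BASE)
```

Under these assumptions, per-host pricing overtakes the Grafana Cloud starting price after only a few hosts, which is why host count matters so much when budgeting for Datadog.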

4. Dashboard

Datadog offers better visibility and UI experience. Because Datadog integrates with CloudWatch through the CloudWatch API, it exposes the CloudWatch metrics in addition to its own.

By default, this is the landing page after we log in to Datadog

CloudWatch enables us to create custom dashboards with metrics and logs. Some features, such as grouping servers, are missing.

CloudWatch dashboard

Grafana also offers good visibility and UI experience for end users, although creating custom dashboards is somewhat complex.

Grafana dashboard example

5. Alerts and notification management

Monitoring all of our infrastructure in one place wouldn’t be complete without the ability to know when critical changes occur.

Datadog integrates with partners like PagerDuty to ensure our on-call team members can be added to incidents and appropriately notified. Datadog also supports email and Slack integrations for notifications.

AWS CloudWatch uses alarms for alerting, and an alarm can trigger several kinds of actions. For example, when CPU or memory usage is high on our web servers, an alarm can trigger autoscaling. Alarms can also send notifications via email and SMS, and they can be integrated with PagerDuty for call alerts.

In Grafana, when an alert changes state, it sends out notifications. Each alert rule can have multiple notifications. To add a notification to an alert rule, we first need to add and configure a notification channel (email, PagerDuty, or another integration).
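
A minimal sketch of such an alarm, assuming boto3 and an existing Auto Scaling group with a scaling policy. The group name, policy ARN, threshold, and evaluation window are hypothetical placeholders; the parameter names are those of CloudWatch's `put_metric_alarm` call.

```python
def high_cpu_alarm_params(asg_name: str, scaling_policy_arn: str) -> dict:
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"{asg_name}-high-cpu",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": 300,             # seconds per evaluation period
        "EvaluationPeriods": 2,    # two periods in a row must breach
        "Threshold": 80.0,         # percent CPU; hypothetical value
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [scaling_policy_arn],  # action run when the alarm fires
    }


def create_alarm(params: dict) -> None:
    """Send the alarm definition to CloudWatch (needs AWS credentials)."""
    import boto3  # imported lazily so the sketch loads without AWS set up

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Hypothetical names; replace with a real group name and scaling policy ARN.
params = high_cpu_alarm_params("web-asg", "arn:aws:autoscaling:EXAMPLE")
print(params["AlarmName"])
```

Calling `create_alarm(params)` would register the alarm. Keeping the parameters in a plain dictionary makes it easy to review or version them before anything is sent to AWS.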
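
The idea of notifying only on a state change, through every channel attached to a rule, can be sketched in a few lines. This is a toy model for illustration, not Grafana's implementation: the class, the two states, and the channel callables are all our own assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AlertRule:
    """Toy alert rule that notifies its channels only when its state changes."""
    name: str
    threshold: float
    channels: List[Callable[[str], None]] = field(default_factory=list)
    state: str = "ok"

    def evaluate(self, value: float) -> None:
        new_state = "alerting" if value > self.threshold else "ok"
        if new_state != self.state:
            self.state = new_state
            for notify in self.channels:  # one rule, many notification channels
                notify(f"{self.name}: {new_state}")


sent: List[str] = []  # stands in for an email or PagerDuty channel
rule = AlertRule("high-cpu", threshold=80.0, channels=[sent.append, print])
for cpu in (50.0, 90.0, 95.0, 40.0):
    rule.evaluate(cpu)
```

Four evaluations produce only two notifications per channel: one when the rule starts alerting and one when it recovers.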

6. Log management

Logging the important parts of our system’s operations is crucial for maintaining infrastructure health. Modern infrastructure can generate thousands of log events per minute. In this situation, we need to choose which logs to send to a log management solution, and which logs to archive. Filtering our logs before sending them, however, may lead to gaps in coverage or the accidental removal of valuable data.

Datadog Log Management, also referred to as Datadog logs or logging, removes these limitations by decoupling log ingestion from indexing. This enables us to cost-effectively collect, process, archive, explore, and monitor all of our logs, an approach Datadog calls Logging without Limits.

CloudWatch Logs enables us to centralize the logs from all of our systems, applications, and AWS services in a single, highly scalable service. We can then easily view them, search them for specific error codes or patterns, filter them based on specific fields, or archive them securely for future analysis. CloudWatch Logs lets us see all of our logs, regardless of their source, as a single and consistent flow of events ordered by time. We can also query and sort them based on other dimensions, group them by specific fields, create custom computations with a powerful query language, and visualize log data in dashboards.

Grafana Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost-effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream.
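
To illustrate what indexing labels rather than log contents means, here is a toy in-memory sketch. It is not Loki's implementation or API; the class and method names are hypothetical. Streams are looked up by their label set, and the log text itself is only scanned after the matching streams have been selected.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Labels = FrozenSet[Tuple[str, str]]


class LabelIndexedLogs:
    """Toy log store: only the label sets are indexed, never the log text."""

    def __init__(self) -> None:
        self.streams: Dict[Labels, List[str]] = defaultdict(list)

    def push(self, labels: Dict[str, str], line: str) -> None:
        # Each distinct label set identifies one log stream.
        self.streams[frozenset(labels.items())].append(line)

    def query(self, selector: Dict[str, str], contains: str = "") -> List[str]:
        wanted = set(selector.items())
        matches: List[str] = []
        for labels, lines in self.streams.items():
            if wanted <= labels:  # stream selection uses labels only
                # Log content is scanned line by line, not looked up in an index.
                matches.extend(line for line in lines if contains in line)
        return matches


store = LabelIndexedLogs()
store.push({"app": "web", "env": "prod"}, "GET /health 200")
store.push({"app": "web", "env": "prod"}, "GET /cart 500 error")
store.push({"app": "db", "env": "prod"}, "slow query error")
print(store.query({"app": "web"}, contains="error"))
```

Keeping the index this small is what makes the approach cheap to operate; the trade-off is that text searches have to scan the selected streams.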

7. Metrics management

CloudWatch metrics are data about the performance of our systems. By default, many services provide free metrics for resources (such as Amazon EC2 instances, Amazon EBS volumes, and Amazon RDS DB instances). We can also enable detailed monitoring for some resources, such as our Amazon EC2 instances, or publish our own application metrics. Amazon CloudWatch can load all the metrics in our account (both AWS resource metrics and the application metrics that we provide) for search, graphing, and alarms.
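
Publishing an application metric could look like the following sketch, assuming boto3. The namespace, metric name, and dimension are hypothetical examples; the payload keys are those accepted by CloudWatch's `put_metric_data` call.

```python
from datetime import datetime, timezone


def orders_metric_payload(count: int, service: str) -> dict:
    """Build keyword arguments for CloudWatch's put_metric_data call."""
    return {
        "Namespace": "MyApp/Business",  # hypothetical custom namespace
        "MetricData": [
            {
                "MetricName": "OrdersProcessed",  # hypothetical metric name
                "Dimensions": [{"Name": "Service", "Value": service}],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(count),
                "Unit": "Count",
            }
        ],
    }


def publish(payload: dict) -> None:
    """Send the data point to CloudWatch (needs AWS credentials)."""
    import boto3  # imported lazily so the sketch loads without AWS set up

    boto3.client("cloudwatch").put_metric_data(**payload)


payload = orders_metric_payload(42, "checkout")
print(payload["MetricData"][0]["MetricName"], payload["MetricData"][0]["Value"])
```

Once published with `publish(payload)`, the metric appears next to the AWS resource metrics and can be searched, graphed, and used in alarms in the same way.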

In Datadog, metric data is ingested and stored as data points with a value and a timestamp. A sequence of data points is stored as a time series. Any metric with a fractional-second timestamp is rounded to the nearest second, and if several points have the same timestamp, the latest point overwrites the previous ones. Metrics that track system health come automatically through Datadog’s integrations with more than 500 services. We can also track metrics that are specific to our business, also known as custom metrics, ranging from the number of user logins or user cart sizes to the frequency of our team’s code commits. In addition, metrics can help us adjust the scale of our environment to meet the demand from our customers. Knowing exactly how many resources we need to consume can help us save money or improve performance.

Metrics visualized in Grafana are stored in a time-series database (TSDB), such as Prometheus, which records each metric value together with a timestamp. Each TSDB uses a slightly different data model, but all combine these two aspects, and Grafana Cloud can accept their different metric formats for visualization. Grafana and Grafana Cloud offer a variety of visualizations to suit different use cases.
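
The rounding and overwrite behavior described above can be illustrated with a small sketch. This is a toy model, not Datadog's code: the function name is ours, and we assume that "latest" means the point that arrives last.

```python
import math
from typing import Dict, Iterable, List, Tuple


def store_points(points: Iterable[Tuple[float, float]]) -> List[Tuple[int, float]]:
    """Round timestamps to the nearest second; later points overwrite earlier ones."""
    series: Dict[int, float] = {}
    for timestamp, value in points:
        second = math.floor(timestamp + 0.5)  # nearest whole second
        series[second] = value                # same second: the last arrival wins
    return sorted(series.items())


# Three points arrive within roughly half a second of each other.
raw = [(100.2, 1.0), (100.4, 2.0), (100.7, 3.0)]
print(store_points(raw))  # [(100, 2.0), (101, 3.0)]
```

The first two points both round to second 100, so only the later value survives. This is worth keeping in mind when submitting metrics more often than once per second.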

8. Compliance

Health organizations need to adhere to HIPAA compliance requirements for application logs as they grow and scale. Audit logs must be safely collected from every service within an organization’s system and stored for six years in case they are needed for an internal or HHS investigation. With Datadog’s HIPAA-compliant log management and security solutions, organizations can capture and store audit logs on a long-term basis, use their logs to verify their level of compliance with other HIPAA provisions, and automatically detect security threats in real time.

Third-party auditors assess the security and compliance of Amazon CloudWatch as part of multiple AWS compliance programs. These include SOC, PCI, FedRAMP, HIPAA, and others. Amazon CloudWatch itself does not produce, store, or transmit PHI. Customers can monitor CloudWatch API calls with AWS CloudTrail.

Grafana Labs maintains PCI compliance through third-party approved scanning vendors. It is also certified for ISO 27001 through an independent third-party audit by A-LIGN.

9. Support

Datadog offers a support platform at help.datadoghq.com and live chat with the Datadog Support Team on any business day between 10:00 and 19:00 ET. There is also a Slack channel where community members and Datadog staff discuss the latest Datadog announcements and features, help with questions, and more.

AWS Support gives customers help on technical issues and additional guidance to operate their infrastructures in the cloud. Customers can choose a tier that meets their specific requirements, continuing the AWS tradition of providing the building blocks of success without bundling or long-term commitments. AWS Support is one-on-one, fast-response support from experienced technical support engineers. The service helps customers use AWS’s products and features. With pay-by-the-month pricing and unlimited support cases, customers are freed from long-term commitments. Customers with operational issues or technical questions can contact a team of support engineers and receive predictable response times and personalized support.

Grafana Cloud offers three types of accounts: Free, Pro, and Advanced. Each has a different feature set and a different level of support. In the Free account, support is limited to the documentation and questions in the public community forums. The Pro account adds email support, and the Advanced account includes both email and phone support.

10. Serverless APM Monitoring

Datadog gives us the ability to view all of the metrics, logs, and traces from our serverless applications in one place. With the help of a Lambda function called the Datadog Forwarder, Datadog can export custom metrics from Lambda.

AWS CloudWatch provides effective serverless APM monitoring with the help of AWS X-Ray.

Prebuilt dashboards are available in Grafana for serverless monitoring. We can also create custom dashboards.

Evaluation Based on Our Organizational Needs

We plan to host both the infrastructure and the application in the AWS Cloud environment. The following image shows the scoring, based on evaluation criteria that reflect our organizational needs.

During the assessment process, we defined two main criteria:

  • Value
  • Ability to Execute

Value is the combination of Functional Capabilities and Non-Functional Capabilities. Ability to Execute covers general properties, which include the ability to integrate, market adoption, install base, and pricing.

The score is the value assigned to rate each product feature, on a scale from 0 to 5.

We assume the following weights:

  • Value: 50%
      • Functional Capabilities: 50%
      • Non-Functional Capabilities: 50%
  • Ability to Execute: 50%
      • Integration: 75%
      • Market Adoption: 25%

We can choose the most suitable tool based on the highest aggregate Value and the Ability to Execute.
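
The aggregation can be sketched as a small weighted sum. The weights are the ones listed above, read as two levels (sub-criteria inside each main criterion); the function name and the example scores are hypothetical and are not the scores from our evaluation.

```python
def aggregate_score(functional: float, non_functional: float,
                    integration: float, market_adoption: float) -> float:
    """Combine 0-5 scores using the weights listed in this post."""
    scores = (functional, non_functional, integration, market_adoption)
    if not all(0 <= s <= 5 for s in scores):
        raise ValueError("scores must be between 0 and 5")
    value = 0.50 * functional + 0.50 * non_functional    # Value
    ability = 0.75 * integration + 0.25 * market_adoption  # Ability to Execute
    return 0.50 * value + 0.50 * ability


# Hypothetical scores for one tool, for illustration only.
print(aggregate_score(functional=4, non_functional=3,
                      integration=5, market_adoption=3))  # 4.0
```

Running the same function over the scores of each tool and picking the highest result gives the aggregate ranking.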

Conclusion

The comparison shows that no single tool is perfect. We have to choose based on our application architecture and organizational needs. For example, if we are planning to launch our infrastructure in AWS, CloudWatch is a natural fit for metrics, logs, and dashboards because it comes preconfigured for AWS services. If we want low-cost monitoring, Grafana is the better choice because it is open source and free. Datadog is compatible with many kinds of infrastructure, but its price is variable and usually determined by the number of agents or hosts installed on our system; we also have to purchase each feature individually. AWS CloudWatch can satisfy almost all monitoring needs, although it does require installing a plugin or agent in most cases, and its pay-as-you-go model keeps the cost low. Each tool can be leveraged depending on its audience, pricing, and ultimate application.

Reference

  1. https://aws.amazon.com/cloudwatch/features/
  2. https://www.datadoghq.com/product/
  3. https://grafana.com/oss/grafana/

Co-Authors: Aaron Anil Augustine Chalissery Subash
