About Platform9 Monitoring
Platform9 deploys monitoring with every cluster so that workloads run alongside a purpose-built metrics platform. Platform9 Monitoring provides metrics from every node in a cluster as well as from the Kubernetes cluster itself, covering pod metrics, cluster metrics, API server metrics and OS-level collections. It consists of Kubernetes-native open source monitoring tools: when enabled, Monitoring deploys a preconfigured, integrated Prometheus, Alertmanager and Grafana stack that immediately provides insight into every aspect of the Kubernetes cluster, including a set of rules that fire alerts for the most critical Kubernetes infrastructure components.
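Alerting rules of this kind generally follow the Prometheus alerting-rule model: an expression is evaluated on a schedule, an alert is "pending" while the condition holds but the rule's `for` duration has not yet elapsed, and "firing" once it has held that long. The Python sketch below models that state machine for a single condition. It is an illustration only: a plain numeric threshold stands in for a PromQL expression, and the `AlertRule` class, the `HighNodeCPU` rule name and the threshold values are hypothetical, not the rules Platform9 actually ships.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    """Hypothetical, simplified model of a Prometheus-style alerting rule.

    A numeric threshold stands in for a real PromQL expression.
    """
    name: str
    threshold: float              # condition is true when value > threshold
    for_seconds: float            # how long the condition must hold before firing
    active_since: Optional[float] = None  # time the condition first became true

    def evaluate(self, value: float, now: float) -> str:
        """Return 'inactive', 'pending' or 'firing' for one evaluation cycle."""
        if value <= self.threshold:
            self.active_since = None      # condition cleared: reset the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = now       # condition just started holding
        if now - self.active_since >= self.for_seconds:
            return "firing"               # held for at least `for_seconds`
        return "pending"
```

For example, with `AlertRule("HighNodeCPU", threshold=0.9, for_seconds=300)` evaluated every 60 seconds, a sustained high value stays "pending" for the first five evaluations and turns "firing" on the sixth; a single low sample resets it to "inactive". The `for` delay is what keeps brief spikes from paging anyone.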
Kubernetes and the cloud native community have made monitoring a standard practice for operating infrastructure and containerized applications.
Why has monitoring become a required standard? Workloads running within Kubernetes are fundamentally different: not only has the infrastructure changed, but so has the application architecture. The result is an environment that is significantly more complex than most virtualized workloads.
Historically, application and infrastructure teams tracked stateful workloads that persisted on relatively stable hypervisor platforms that were infrequently changed or upgraded. Issues manifested as RAM over-allocation and storage IOPS bottlenecks that were easy to find and solve with simple tooling.
Advanced application issues required stepping into the application layer, which complicated monitoring and troubleshooting with the need to understand trace-level data, application heap usage, database performance and API response times. However, these workloads were fairly stable and static, in part due to the complexity of monolithic application deployments. Containers and cloud native applications have transformed previously static, unchanging applications: cloud native applications change faster, scale up and down, and leverage infrastructure that is also more dynamic and needs-based. Simple, easy root-cause analysis is a thing of the past.
As a provider and operator of Managed Virtualization & Kubernetes platforms, we face the challenges of dynamic environments every day.
Platform9 Monitoring is designed to help DevOps and SRE teams, ITOps and application managers quickly, easily and sustainably scale and operate multiple clusters across BareOS and public clouds. It provides certainty in deployments, observability of performance, notifications on availability and a mechanism for root-cause investigation that helps restore service faster.
Why is Monitoring Important?
Because Kubernetes workloads are transient and often stateless, investigating issues retroactively can be almost impossible. This results in prolonged customer-facing issues and costly investigations that impact release schedules and often result in complete outages. With monitoring built in by default, many issues are tracked, captured and persisted in a stateful platform that enables root-cause investigation in every environment: development, QA, staging and production. A repeatable, scalable monitoring platform ensures that issues are found, causes are identified and fixes are implemented, so product quality improves.
Prometheus has quickly become the default open source monitoring tool for Kubernetes and many non-cloud-native environments. Prometheus can run as a binary or a container; it scrapes (remotely collects) data from exporters, stores the data in a time series database, and integrates with Alertmanager for notifications and Grafana for dashboard visualization.
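Exporters publish metrics over HTTP in the Prometheus text exposition format, one sample per line, and Prometheus scrapes that text on an interval. To make the format concrete, here is a minimal Python parser for a simplified subset of it. It is a sketch for illustration, not how Prometheus itself parses scrapes: it skips `# HELP`/`# TYPE` comment lines, discards optional timestamps and does not decode escape sequences in label values. The `parse_exposition` function is hypothetical, and the sample metric names merely mimic typical node_exporter output.

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$'
)
# One name="value" label pair; escaped characters are matched but left undecoded.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse a simplified subset of the Prometheus text format into samples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unparseable line: {line!r}")
        name, raw_labels, raw_value, _timestamp = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        # float() also accepts the special values NaN, +Inf and -Inf.
        samples.append((name, labels, float(raw_value)))
    return samples


if __name__ == "__main__":
    scrape = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
node_cpu_seconds_total{cpu="0",mode="user"} 321.5
up 1
"""
    for sample in parse_exposition(scrape):
        print(sample)
```

Each parsed sample is a metric name, a label set and a value; Prometheus stores exactly this kind of labelled sample, stamped with the scrape time, as one point in a time series.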
Alertmanager listens for alerts generated by rules that Prometheus periodically evaluates against the metrics it collects. Alertmanager can deduplicate, group and route alerts to downstream receivers, and it supports native integrations with email servers, PagerDuty, VictorOps, OpsGenie and any webhook-based interface. Alertmanager can also silence and inhibit alerts, so during change windows or prolonged outages users can selectively pause notifications.
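The deduplication, grouping and silencing steps described above can be sketched as a small pipeline. The Python model below is hypothetical and heavily simplified; it is not Alertmanager's code. Alerts are plain label dictionaries, identical label sets are treated as duplicates, silences support only exact-match label matchers, and `group_by` mirrors the Alertmanager routing option of the same name. Real Alertmanager additionally supports regex and negative matchers, routing trees, inhibition rules and timing settings such as `group_wait` and `group_interval`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # in this sketch an alert is identified only by its labels
GroupKey = Tuple[str, ...]


def fingerprint(alert: Alert) -> Tuple[Tuple[str, str], ...]:
    """Identical label sets yield identical fingerprints, which drives dedup."""
    return tuple(sorted(alert.items()))


def is_silenced(alert: Alert, silences: List[Dict[str, str]]) -> bool:
    """A silence matches when every one of its matchers equals the alert's label."""
    return any(all(alert.get(k) == v for k, v in s.items()) for s in silences)


def route(
    alerts: List[Alert], group_by: List[str], silences: List[Dict[str, str]]
) -> Dict[GroupKey, List[Alert]]:
    """Drop duplicates and silenced alerts, then group the rest by `group_by` labels."""
    seen = set()
    groups: Dict[GroupKey, List[Alert]] = defaultdict(list)
    for alert in alerts:
        fp = fingerprint(alert)
        if fp in seen or is_silenced(alert, silences):
            continue
        seen.add(fp)
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)
```

Grouping by labels such as `alertname` and `namespace` means that many crashing pods in one namespace arrive as a single notification rather than one page per pod, while a silence on a namespace under maintenance suppresses its alerts entirely.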
Grafana is a simple, open source metric analytics and visualization tool. First released in 2014, Grafana is used by thousands of users to visualize time series data from an extensive library of infrastructure and application platforms, including Prometheus.
Why is Platform9 Monitoring in Early Access?
Cluster monitoring is great, but we don’t want to stop there. Monitoring is being built to provide always-on, access-anywhere, cross-cluster monitoring. Today, Platform9 Monitoring deploys into each cluster and relies on that cluster’s availability for data, processing, alerts and visualization. Future versions will leverage the Platform9 Management Plane to centralize the metrics that are critical to your infrastructure health and business performance.
How to: Enabling Monitoring in a Kubernetes Cluster
Monitoring is enabled by default for all newly created Kubernetes clusters (you may opt out in the cluster creation wizard). If a cluster is already running, you can enable Monitoring from the Infrastructure Clusters dashboard by selecting the cluster and clicking the Enable Monitoring button. Once Monitoring is enabled, Platform9 creates a dedicated namespace within the cluster and uses an Operator to set up and run Prometheus, Alertmanager and Grafana. After the deployment completes, a Grafana Dashboard link is available on the Infrastructure Clusters view as well as the Cluster Details view.
For additional details on enabling Monitoring see the PMK Guide.