Configuring Persistent Storage

Platform9 Monitoring deploys Prometheus, Alertmanager, and Grafana on any cluster in a single click; by default this deployment uses ephemeral storage. To configure Platform9 Monitoring to use persistent storage, add a storage class to the cluster and then update the monitoring deployment with kubectl so that it consumes the storage class.


Add a Storage Class to Prometheus

The first step is to set up a storage class. If your cluster is running without storage, follow the guide to set up the Portworx CSI driver.
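To confirm which storage classes are already available before editing the deployment, you can list them with kubectl; the class names returned will vary by cluster:

kubectl get storageclass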

Once you have a storage class configured, run the kubectl command below to edit the deployment:

kubectl -n pf9-monitoring edit prometheus system

Info

Editing the running configuration uses the Linux command-line text editor Vi. For help with Vi, see this guide.

The default configuration is shown below; it needs to be updated with a valid storage specification.

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  creationTimestamp: "2021-01-15T18:09:32Z"
  generation: 1
  managedFields:
  - apiVersion: monitoring.coreos.com/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences: {}
      f:spec:
        .: {}
        f:additionalScrapeConfigs:
          .: {}
          f:key: {}
          f:name: {}
        f:alerting:
          .: {}
          f:alertmanagers: {}
        f:replicas: {}
        f:resources:
          .: {}
          f:requests:
            .: {}
            f:cpu: {}
            f:memory: {}
        f:retention: {}
        f:ruleSelector:
          .: {}
          f:matchLabels:
            .: {}
            f:prometheus: {}
            f:role: {}
        f:rules:
          .: {}
          f:alert: {}
        f:scrapeInterval: {}
        f:serviceAccountName: {}
        f:serviceMonitorSelector:
          .: {}
          f:matchLabels:
            .: {}
            f:prometheus: {}
            f:role: {}
    manager: promplus
    operation: Update
    time: "2021-01-15T18:09:32Z"
  name: system
  namespace: pf9-monitoring
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: false
    kind: Deployment
    name: monhelper
    uid: cbc48a82-3c1f-4a2b-9b2a-ebbc32ae2e65
  resourceVersion: "2733"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/pf9-monitoring/prometheuses/system
  uid: c1722922-4973-4973-8e29-ba0269ad9a79
spec:
  additionalScrapeConfigs:
    key: additional-scrape-config.yaml
    name: scrapeconfig
  alerting:
    alertmanagers:
    - name: sys-alertmanager
      namespace: pf9-monitoring
      port: web
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
  retention: 7d
  ruleSelector:
    matchLabels:
      prometheus: system
      role: alert-rules
  rules:
    alert: {}
  scrapeInterval: 2m
  serviceAccountName: system-prometheus
  serviceMonitorSelector:
    matchLabels:
      prometheus: system
      role: service-monitor

The deployment needs to have the following storage section added. The storage class name must be updated to match your cluster, and the amount of storage to request must also be specified.
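A minimal sketch of the storage section is shown below, using the Prometheus Operator volumeClaimTemplate format; the storage class name (px-db) and the 10Gi request are placeholder values for illustration, not defaults shipped with Platform9 Monitoring:

  # Added under spec:, alongside fields such as retention and replicas
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        # Placeholder class name; replace with a class that exists on your cluster
        storageClassName: px-db
        resources:
          requests:
            # Placeholder size; adjust to your retention period and scrape volume
            storage: 10Gi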

The storage class in this example is backed by Portworx Storage; to add Portworx, see the Portworx CSI guide.

The final configuration should match the example below.
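As a reference, after the storage block is merged in, the spec section of the Prometheus object would look roughly like the following (metadata omitted for brevity; px-db and 10Gi remain placeholders to replace with your own values):

spec:
  additionalScrapeConfigs:
    key: additional-scrape-config.yaml
    name: scrapeconfig
  alerting:
    alertmanagers:
    - name: sys-alertmanager
      namespace: pf9-monitoring
      port: web
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
  retention: 7d
  ruleSelector:
    matchLabels:
      prometheus: system
      role: alert-rules
  rules:
    alert: {}
  scrapeInterval: 2m
  serviceAccountName: system-prometheus
  serviceMonitorSelector:
    matchLabels:
      prometheus: system
      role: service-monitor
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        storageClassName: px-db
        resources:
          requests:
            storage: 10Gi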

Troubleshooting

To see if the deployment is healthy, run kubectl -n pf9-monitoring get all. The resulting output should show all services in a running state. If any pods or services are still in a creating state, rerun the command.

If there is an issue, the prometheus-system-0 Pods will fail to start or will enter CrashLoopBackOff.


Get Monitoring Pod Status

Run kubectl -n pf9-monitoring describe pod prometheus-system-0 and review the events output. The output will show any errors impacting the Pod state, for example, Prometheus failing to start because its PVC cannot be found. To solve this issue, the PVC must be manually recreated by applying a PVC manifest with kubectl, as in the sketch below.
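A sketch of a manually created PVC is shown below. The claim name here follows the naming pattern the Prometheus Operator typically uses for volumeClaimTemplate claims (prometheus-system-db-prometheus-system-0); confirm the exact name from the Pod events or from kubectl -n pf9-monitoring get pvc, and make sure the storage class and size match the values in your Prometheus spec:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Assumed claim name; verify against the name referenced in the Pod events
  name: prometheus-system-db-prometheus-system-0
  namespace: pf9-monitoring
spec:
  accessModes:
  - ReadWriteOnce
  # Must match the storageClassName set in the Prometheus object
  storageClassName: px-db
  resources:
    requests:
      storage: 10Gi

Save the manifest to a file and apply it with kubectl apply -f <file>.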

View Prometheus Container Logs

If the Pod events do not indicate that the issue is within Kubernetes itself, it can be useful to look at the Prometheus container logs. To do this from the Platform9 SaaS Management Plane, navigate to the Workloads dashboard and select the Pods tab. Filter the table to your cluster and set the namespace to pf9-monitoring. Once the table updates, click the view logs link for the prometheus-system-0 container. This opens the container logs in a new browser tab.
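If you prefer the command line, the same logs can also be retrieved with kubectl; the container name prometheus is the name the Prometheus Operator conventionally gives the main container:

kubectl -n pf9-monitoring logs prometheus-system-0 -c prometheus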

Below is an example permissions error preventing the Pod from starting on each node.

Incorrect Storage Class Name

If you specify the storage class name incorrectly, you will first need to update the Prometheus configuration and then delete the persistent volume claim by running: kubectl delete pvc <pvc-name> -n pf9-monitoring

Once the PVC is deleted, the Pods will start up and claim a new PVC.
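To confirm that the replacement claim was created, list the PVCs in the monitoring namespace and check that the new claim shows a Bound status:

kubectl -n pf9-monitoring get pvc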
