Understanding Grafana Metrics

Today let's walk through the Grafana dashboard panels and the formulas behind them.

Kubernetes / Compute Resources / Cluster

CPU Utilisation

1 - avg(rate(node_cpu_seconds_total{mode="idle", cluster=""}[1m]))

Query the metric in Prometheus:

node_cpu_seconds_total{mode="idle"}

node_cpu_seconds_total{cpu="0",endpoint="https",instance="k8s-master",job="node-exporter",mode="idle",namespace="monitoring",pod="node-exporter-t9ljw",service="node-exporter"}	3102.08
node_cpu_seconds_total{cpu="0",endpoint="https",instance="k8s-node1",job="node-exporter",mode="idle",namespace="monitoring",pod="node-exporter-7vq8n",service="node-exporter"} 3046.73
node_cpu_seconds_total{cpu="0",endpoint="https",instance="k8s-node2",job="node-exporter",mode="idle",namespace="monitoring",pod="node-exporter-vg596",service="node-exporter"} 3069.61
node_cpu_seconds_total{cpu="1",endpoint="https",instance="k8s-master",job="node-exporter",mode="idle",namespace="monitoring",pod="node-exporter-t9ljw",service="node-exporter"} 3096.23

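Each series is a counter of idle seconds for one CPU core on one node. Taking rate(...[1m]) gives the fraction of the last minute that core spent idle, avg averages that fraction over every core on every node, and subtracting from 1 turns it into utilisation. So CPU Utilisation is the average CPU utilisation across the nodes (strictly, across all of their cores, which is the same thing when every node has the same number of cores).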

These series are scraped by job="node-exporter".
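
To make the arithmetic behind the CPU Utilisation expression concrete, here is a minimal Python sketch. It assumes two scrapes of the idle counter taken 60 seconds apart. Only the starting values come from the query result above; the later values are invented for illustration. simple_rate is a hypothetical helper that ignores the counter-reset handling and extrapolation done by Prometheus's real rate().

```python
# Sketch of: 1 - avg(rate(node_cpu_seconds_total{mode="idle"}[1m]))
# Assumption: two samples per series, 60 s apart; the "later" values are invented.

WINDOW_SECONDS = 60.0

# (instance, cpu) -> (idle seconds at window start, idle seconds at window end)
idle_samples = {
    ("k8s-master", "0"): (3102.08, 3156.08),  # +54 s idle
    ("k8s-node1", "0"): (3046.73, 3103.73),   # +57 s idle
    ("k8s-node2", "0"): (3069.61, 3123.61),   # +54 s idle
    ("k8s-master", "1"): (3096.23, 3147.23),  # +51 s idle
}


def simple_rate(first, last, window):
    """Per-second increase of a counter (no reset handling, no extrapolation)."""
    return (last - first) / window


def cpu_utilisation(samples, window):
    """1 minus the average idle fraction over every (instance, cpu) series."""
    idle_fractions = [simple_rate(a, b, window) for a, b in samples.values()]
    return 1.0 - sum(idle_fractions) / len(idle_fractions)


utilisation = cpu_utilisation(idle_samples, WINDOW_SECONDS)
print(f"cluster CPU utilisation: {utilisation:.1%}")
```

With these invented samples the four idle fractions are 0.90, 0.95, 0.90 and 0.85, so the sketch reports about 10% utilisation.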

CPU Usage

sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=""}) by (namespace)

Query the metric in Prometheus:

namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=""}

namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="addon-resizer",namespace="monitoring",pod="kube-state-metrics-65d5b4b99d-llrjd"}	0.00022111388787432652
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="alertmanager",namespace="monitoring",pod="alertmanager-main-0"} 0.00275714828677409
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="alertmanager",namespace="monitoring",pod="alertmanager-main-1"} 0.0029093557196228424
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="alertmanager",namespace="monitoring",pod="alertmanager-main-2"} 0.0027905491021107728
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="calico-kube-controllers",namespace="kube-system",pod="calico-kube-controllers-5598cf8794-8mgdz"} 0.0009434578301088127
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="calico-node",namespace="kube-system",pod="calico-node-jtvh8"} 0.017518785546616542
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="calico-node",namespace="kube-system",pod="calico-node-k6m8t"} 0.022689515968190806
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="calico-node",namespace="kube-system",pod="calico-node-rb9qx"} 0.01819155156978804
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="config-reloader",namespace="monitoring",pod="alertmanager-main-0"} 0.0000029602042055748096
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="config-reloader",namespace="monitoring",pod="alertmanager-main-1"} 0.0000024442638833786885
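
The sum(...) by (namespace) step in the CPU Usage panel can be sketched in Python as a plain group-and-add over the namespace label. The values below are the ten rows listed above (a partial result), so the totals cover only those rows. sum_by_namespace is a hypothetical helper name, not part of any Prometheus client.

```python
from collections import defaultdict

# (namespace, pod, container, CPU cores used) copied from the rows listed above
samples = [
    ("monitoring", "kube-state-metrics-65d5b4b99d-llrjd", "addon-resizer", 0.00022111388787432652),
    ("monitoring", "alertmanager-main-0", "alertmanager", 0.00275714828677409),
    ("monitoring", "alertmanager-main-1", "alertmanager", 0.0029093557196228424),
    ("monitoring", "alertmanager-main-2", "alertmanager", 0.0027905491021107728),
    ("kube-system", "calico-kube-controllers-5598cf8794-8mgdz", "calico-kube-controllers", 0.0009434578301088127),
    ("kube-system", "calico-node-jtvh8", "calico-node", 0.017518785546616542),
    ("kube-system", "calico-node-k6m8t", "calico-node", 0.022689515968190806),
    ("kube-system", "calico-node-rb9qx", "calico-node", 0.01819155156978804),
    ("monitoring", "alertmanager-main-0", "config-reloader", 0.0000029602042055748096),
    ("monitoring", "alertmanager-main-1", "config-reloader", 0.0000024442638833786885),
]


def sum_by_namespace(rows):
    """Add up the values of all series that share the same namespace label."""
    totals = defaultdict(float)
    for namespace, _pod, _container, value in rows:
        totals[namespace] += value
    return dict(totals)


cpu_by_namespace = sum_by_namespace(samples)
for namespace, cores in sorted(cpu_by_namespace.items()):
    print(f"{namespace}: {cores:.4f} cores")
```

Each namespace ends up as one line in the panel, which is why the pod and container labels disappear from the result.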

Memory

sum(container_memory_rss{cluster="", container!=""}) by (namespace)

Query the metric in Prometheus:

container_memory_rss

container_memory_rss{container="POD",container_name="POD",endpoint="https-metrics",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod1a047e8b0c961b34e915140fc2a8711c.slice/docker-20e1377aeb77873fcf4ac5e4380d47f28c0f594773ba047442b00dfc6f116837.scope",image="k8s.gcr.io/pause:3.1",instance="172.16.64.233:10250",job="kubelet",name="k8s_POD_etcd-k8s-master_kube-system_1a047e8b0c961b34e915140fc2a8711c_14",namespace="kube-system",node="k8s-master",pod="etcd-k8s-master",pod_name="etcd-k8s-master",service="kubelet"}	45056
container_memory_rss{container="POD",container_name="POD",endpoint="https-metrics",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod31622d49_04a8_4a95_8b80_736012e85215.slice/docker-c4e100bf3571c0fa25537cbd0ea7839bf2fd1486462a7b2626a552dfcf7503ec.scope",image="k8s.gcr.io/pause:3.1",instance="172.16.64.232:10250",job="kubelet",name="k8s_POD_nginx-deployment-6f89946645-pwpf7_default_31622d49-04a8-4a95-8b80-736012e85215_13",namespace="default",node="k8s-node1",pod="nginx-deployment-6f89946645-pwpf7",pod_name="nginx-deployment-6f89946645-pwpf7",service="kubelet"} 45056

These series are scraped by job="kubelet".
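
The Memory panel adds one step before the sum: the container!="" matcher. The Python sketch below shows why that matters, assuming (as is typical for cAdvisor) that series with an empty container label are cgroup-level aggregates that would double count their containers. The first two rows are taken from the result above; the last two are invented for illustration, and memory_by_namespace is a hypothetical helper name.

```python
from collections import defaultdict

series = [
    # From the query result above (pause containers, label container="POD")
    {"namespace": "kube-system", "pod": "etcd-k8s-master", "container": "POD", "value": 45056},
    {"namespace": "default", "pod": "nginx-deployment-6f89946645-pwpf7", "container": "POD", "value": 45056},
    # Invented rows: an application container and an aggregate with an empty container label
    {"namespace": "default", "pod": "nginx-deployment-6f89946645-pwpf7", "container": "nginx", "value": 2000000},
    {"namespace": "default", "pod": "nginx-deployment-6f89946645-pwpf7", "container": "", "value": 2045056},
]


def memory_by_namespace(rows):
    """Sketch of: sum(container_memory_rss{container!=""}) by (namespace)."""
    totals = defaultdict(int)
    for row in rows:
        if row["container"] == "":  # the container!="" matcher drops these series
            continue
        totals[row["namespace"]] += row["value"]
    return dict(totals)


rss_by_namespace = memory_by_namespace(series)
for namespace, rss in sorted(rss_by_namespace.items()):
    print(f"{namespace}: {rss} bytes")
```

Without the filter, the invented aggregate row would be added on top of the containers it already summarises and the default namespace would be counted twice.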

Kubernetes / Compute Resources / Namespace (Pods)

sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster="", namespace="monitoring"}) by (pod)

Kubernetes / Compute Resources / Pod

Shows the status of each container in the selected Pod.

sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{namespace="default", pod="nginx-deployment-6f89946645-pwpf7", container!="POD", cluster=""}) by (container)

The idea now is to attach the pod label to the container metrics:

- job_name: 'kubernetes-cadvisor'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    target_label: __metrics_path__
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
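
To see what the three relabel rules in this job do to a discovered node target, here is a simplified Python simulation. It is a sketch of the replace and labelmap actions only, not Prometheus's implementation. The function names are hypothetical, and the example target (including the kubernetes_io_hostname node label) is an assumption made for illustration.

```python
import re


def _expand(match, replacement):
    """Expand Prometheus-style ${1} references using Python's \\g<1> syntax."""
    template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    return match.expand(template)


def relabel_replace(labels, target_label, replacement,
                    source_labels=(), regex="(.*)", separator=";"):
    """'replace' action: fully match the joined source values, then set target_label."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    return {**labels, target_label: _expand(match, replacement)}


def relabel_labelmap(labels, regex, replacement="${1}"):
    """'labelmap' action: copy every label whose name matches to a new name."""
    out = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            out[_expand(match, replacement)] = value
    return out


# Assumed labels for one discovered node (illustrative values)
target = {
    "__address__": "172.16.64.233:10250",
    "__meta_kubernetes_node_name": "k8s-master",
    "__meta_kubernetes_node_label_kubernetes_io_hostname": "k8s-master",
}

# Rule 1: send every scrape to the API server instead of the node itself
target = relabel_replace(target, "__address__", "kubernetes.default.svc:443")
# Rule 2: build the per-node cAdvisor proxy path from the node name
target = relabel_replace(
    target, "__metrics_path__", "/api/v1/nodes/${1}/proxy/metrics/cadvisor",
    source_labels=["__meta_kubernetes_node_name"], regex="(.+)",
)
# Rule 3: turn node labels into plain target labels
target = relabel_labelmap(target, "__meta_kubernetes_node_label_(.+)")

for name in ("__address__", "__metrics_path__", "kubernetes_io_hostname"):
    print(name, "=", target[name])
```

The result is that every node is scraped through the API server proxy at its own cadvisor path, and the node's Kubernetes labels travel along as ordinary labels on the scraped series.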