Where do the metrics in the Prometheus dropdown list come from?

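If components such as alert-manager and kube-state-metrics are installed, they expose a /metrics endpoint. Prometheus scrapes these endpoints to collect the metric data, and the metrics then show up in the dropdown list of the Prometheus dashboard. Extra labels such as job and instance may also be attached to the metrics along the way.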

Some people online say that a ServiceMonitor is an abstraction of an exporter. I don't think that is right; at most it abstracts part of one.

One ServiceMonitor corresponds to n entries on the Targets page.

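When you delete a ServiceMonitor object, its target is certainly no longer shown on the Targets page, but that does not mean the metrics disappear from the Prometheus dropdown list. The metrics still exist. Some labels may change, although I have not yet verified that label change.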

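When you delete a Pod, for example by deleting the alert-manager deployment so that its Pods are removed automatically, the corresponding metrics still appear in the dropdown list but no longer have values.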

Only after Prometheus is restarted at this point do those metrics disappear.

After the kube-state-metrics pods are deleted, the metrics prefixed with kube_, such as kube_pod_info, no longer have values.

After the node-exporter daemonset is deleted, node metrics such as node_cpu_seconds_total no longer have values.

Today let's go through the Grafana charts and the formulas behind them.

Kubernetes / Compute Resources / Cluster

CPU Utilisation

1 - avg(rate(node_cpu_seconds_total{mode="idle", cluster=""}[1m]))

Query the metric in Prometheus:

node_cpu_seconds_total{mode="idle"}

node_cpu_seconds_total{cpu="0",endpoint="https",instance="k8s-master",job="node-exporter",mode="idle",namespace="monitoring",pod="node-exporter-t9ljw",service="node-exporter"}	3102.08
node_cpu_seconds_total{cpu="0",endpoint="https",instance="k8s-node1",job="node-exporter",mode="idle",namespace="monitoring",pod="node-exporter-7vq8n",service="node-exporter"} 3046.73
node_cpu_seconds_total{cpu="0",endpoint="https",instance="k8s-node2",job="node-exporter",mode="idle",namespace="monitoring",pod="node-exporter-vg596",service="node-exporter"} 3069.61
node_cpu_seconds_total{cpu="1",endpoint="https",instance="k8s-master",job="node-exporter",mode="idle",namespace="monitoring",pod="node-exporter-t9ljw",service="node-exporter"} 3096.23

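So CPU Utilisation is the average CPU utilisation across the nodes: one minus the average per-second idle rate over every CPU core on every node.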
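
As a rough illustration of that arithmetic, the Python sketch below applies `1 - avg(rate(...))` to four idle counters. The first reading of each pair is taken from the query output above; the second reading, 60 seconds later, is a made-up value for the example. The helper names are hypothetical, and the real `rate()` also handles counter resets and extrapolation, which this sketch ignores.

```python
def idle_rate(first, last, seconds):
    """Per-second increase of an idle-seconds counter over the window."""
    return (last - first) / seconds


def cpu_utilisation(windows, seconds=60.0):
    """1 - avg(idle rate) over all (first, last) counter pairs."""
    rates = [idle_rate(first, last, seconds) for first, last in windows]
    return 1.0 - sum(rates) / len(rates)


# (first, last) readings of node_cpu_seconds_total{mode="idle"} per CPU core.
# The "last" values are hypothetical readings taken 60 s after the first ones.
windows = [
    (3102.08, 3156.08),  # k8s-master, cpu 0: idle for 54 of 60 s
    (3046.73, 3094.73),  # k8s-node1,  cpu 0: idle for 48 of 60 s
    (3069.61, 3126.61),  # k8s-node2,  cpu 0: idle for 57 of 60 s
    (3096.23, 3147.23),  # k8s-master, cpu 1: idle for 51 of 60 s
]

utilisation = cpu_utilisation(windows)
print(f"CPU utilisation is roughly {utilisation:.3f}")
```

With these made-up readings the idle rates are about 0.9, 0.8, 0.95 and 0.85, so the result is close to 0.125, that is, about 12.5% utilisation.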

job="node-exporter"

CPU Usage

sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=""}) by (namespace)

Query the metric in Prometheus:

namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=""}

namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="addon-resizer",namespace="monitoring",pod="kube-state-metrics-65d5b4b99d-llrjd"}	0.00022111388787432652
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="alertmanager",namespace="monitoring",pod="alertmanager-main-0"} 0.00275714828677409
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="alertmanager",namespace="monitoring",pod="alertmanager-main-1"} 0.0029093557196228424
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="alertmanager",namespace="monitoring",pod="alertmanager-main-2"} 0.0027905491021107728
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="calico-kube-controllers",namespace="kube-system",pod="calico-kube-controllers-5598cf8794-8mgdz"} 0.0009434578301088127
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="calico-node",namespace="kube-system",pod="calico-node-jtvh8"} 0.017518785546616542
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="calico-node",namespace="kube-system",pod="calico-node-k6m8t"} 0.022689515968190806
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="calico-node",namespace="kube-system",pod="calico-node-rb9qx"} 0.01819155156978804
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="config-reloader",namespace="monitoring",pod="alertmanager-main-0"} 0.0000029602042055748096
namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{container="config-reloader",namespace="monitoring",pod="alertmanager-main-1"} 0.0000024442638833786885
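
To make the `sum(...) by (namespace)` step concrete, here is a small Python sketch that groups a few of the rows above by namespace and adds up their values. The values are rounded copies of the query output, and `sum_by_namespace` is a hypothetical helper, not anything Prometheus provides.

```python
from collections import defaultdict

# (namespace, pod, container, cores) rows copied from the query output,
# with the values rounded to five decimals for readability.
samples = [
    ("monitoring", "kube-state-metrics-65d5b4b99d-llrjd", "addon-resizer", 0.00022),
    ("monitoring", "alertmanager-main-0", "alertmanager", 0.00276),
    ("monitoring", "alertmanager-main-1", "alertmanager", 0.00291),
    ("kube-system", "calico-node-jtvh8", "calico-node", 0.01752),
    ("kube-system", "calico-node-k6m8t", "calico-node", 0.02269),
]


def sum_by_namespace(rows):
    """Drop the pod and container labels and add up values per namespace."""
    totals = defaultdict(float)
    for namespace, _pod, _container, cores in rows:
        totals[namespace] += cores
    return dict(totals)


usage = sum_by_namespace(samples)
for namespace, cores in sorted(usage.items()):
    print(f"{namespace}: {cores:.5f} cores")
```

Each namespace ends up with one series, which is what the CPU Usage panel plots.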

Memory

sum(container_memory_rss{cluster="", container!=""}) by (namespace)

Query the metric in Prometheus:

container_memory_rss

container_memory_rss{container="POD",container_name="POD",endpoint="https-metrics",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod1a047e8b0c961b34e915140fc2a8711c.slice/docker-20e1377aeb77873fcf4ac5e4380d47f28c0f594773ba047442b00dfc6f116837.scope",image="k8s.gcr.io/pause:3.1",instance="172.16.64.233:10250",job="kubelet",name="k8s_POD_etcd-k8s-master_kube-system_1a047e8b0c961b34e915140fc2a8711c_14",namespace="kube-system",node="k8s-master",pod="etcd-k8s-master",pod_name="etcd-k8s-master",service="kubelet"}	45056
container_memory_rss{container="POD",container_name="POD",endpoint="https-metrics",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod31622d49_04a8_4a95_8b80_736012e85215.slice/docker-c4e100bf3571c0fa25537cbd0ea7839bf2fd1486462a7b2626a552dfcf7503ec.scope",image="k8s.gcr.io/pause:3.1",instance="172.16.64.232:10250",job="kubelet",name="k8s_POD_nginx-deployment-6f89946645-pwpf7_default_31622d49-04a8-4a95-8b80-736012e85215_13",namespace="default",node="k8s-node1",pod="nginx-deployment-6f89946645-pwpf7",pod_name="nginx-deployment-6f89946645-pwpf7",service="kubelet"} 45056

job="kubelet"

Kubernetes / Compute Resources / Namespace (Pods)

sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster="", namespace="monitoring"}) by (pod)

Kubernetes / Compute Resources / Pod

Shows the state of each container in the selected Pod.

sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{namespace="default", pod="nginx-deployment-6f89946645-pwpf7", container!="POD", cluster=""}) by (container)

The idea now is to attach the pod label to the container metrics.

- job_name: 'kubernetes-cadvisor'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    target_label: __metrics_path__
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
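
To see what the last two relabel rules do to a single target, here is a rough Python sketch that imitates them. The functions `relabel_replace` and `labelmap` are hypothetical stand-ins written for this post, not Prometheus code, and the discovered labels below are assumed example input.

```python
import re


def relabel_replace(source_value, regex, replacement):
    """Imitate a 'replace' relabel rule with a single capture group."""
    match = re.fullmatch(regex, source_value)  # relabel regexes are fully anchored
    if match is None:
        return None  # no match: the target label is left unchanged
    return replacement.replace("${1}", match.group(1))


def labelmap(labels, regex):
    """Imitate 'labelmap': copy each matching label to the captured name."""
    mapped = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            mapped[match.group(1)] = value
    return mapped


# Assumed labels for one discovered node (example input only).
discovered = {
    "__meta_kubernetes_node_name": "k8s-node1",
    "__meta_kubernetes_node_label_kubernetes_io_hostname": "k8s-node1",
}

metrics_path = relabel_replace(
    discovered["__meta_kubernetes_node_name"],
    r"(.+)",
    "/api/v1/nodes/${1}/proxy/metrics/cadvisor",
)
labels = labelmap(discovered, r"__meta_kubernetes_node_label_(.+)")

print(metrics_path)
print(labels["kubernetes_io_hostname"])
```

In this sketch the node name ends up inside `__metrics_path__`, so every node is scraped through the API server proxy, and each node label is copied onto the target under its short name.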

Configuration files are usually named application-{profile}.yaml:

  • application.yaml
  • application-dev.yaml
  • application-prod.yaml

When no profile is specified, only application.yaml is loaded by default.

So environment-independent properties should all go into application.yaml, and environment-specific ones into the matching profile file.
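
The effect can be pictured with a short Python sketch: the base file is always read, and the profile-specific file, if a profile is active, overrides or adds to it. This only models the behaviour described here; the function names and the property values are hypothetical, and Spring Boot's real loading logic is more involved.

```python
def config_files(active_profile=None):
    """Files consulted for a run: the base file, then the profile file."""
    files = ["application.yaml"]
    if active_profile:
        files.append(f"application-{active_profile}.yaml")
    return files


def effective_properties(base, profile_specific):
    """Profile-specific values override or extend the base values."""
    merged = dict(base)
    merged.update(profile_specific)
    return merged


# Hypothetical contents of application.yaml and application-dev.yaml.
base = {"app.name": "demo", "server.port": "8080"}
dev = {"server.port": "8081", "logging.level.root": "DEBUG"}

print(config_files("dev"))
print(effective_properties(base, dev))
```

Here `app.name` comes from the base file, while `server.port` takes the dev value because the profile file wins on conflicts.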

Activating a profile

Specify it in application.yaml:

spring:
  profiles:
    active: dev

Changing it in IDEA

Open Edit Configurations, then use one of the following methods.

Method 1

Under Environment, in the VM options field, enter: -Dspring.profiles.active=dev

Method 2

Under Environment, in the Program arguments field, enter: --spring.profiles.active=dev

Method 3

Under Spring Boot, in the Active profiles field, enter: dev

Passing the argument when running the jar

java -jar demo.jar --spring.profiles.active=dev

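After startup, the console reports which profiles are active for this run, which tells you which configuration file overrode or extended the base application.yaml.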

[18:25:42.127] INFO  org.springframework.boot.SpringApplication 679 logStartupProfileInfo - The following profiles are active: dev