[Cluster] KubeSphere Setup Notes: ks-installer Analysis
💡 Introduction
After deploying KubeSphere, Prometheus repeatedly failed to deploy. The root cause turned out to be a fault in the underlying NFS system, but there was no obvious way to change Prometheus's storage configuration.
This post therefore analyzes ks-installer, the automatic installer for KubeSphere components.
🫎 How to Modify the Configuration
Based on an analysis of the ks-installer component and its core scripts, the Prometheus configuration of the monitoring module can be modified as follows.

Assume the cluster-configuration corresponding to the config file /root/wxl/cluster-configuration.yaml is already deployed, with `kind: ClusterConfiguration` ("cc" for short).

1. Edit the deployed configuration: `kubectl edit cc -n kubesphere-system ks-installer`.
2. Delete the PVC belonging to the failed Prometheus, because the monitoring role reads the historical configuration back from it; see /kubesphere/roles/roles/ks-monitor/tasks/get_old_config.yaml:4.
3. In the deployed cluster-configuration, delete status.monitoring (or change it so that the condition "status.monitoring is not defined or status.monitoring.status is not defined or status.monitoring.status != 'enabled'" holds), then modify the desired parameters. Once ks-installer detects the ClusterConfiguration change, it redeploys the module.

P.S. To disable Prometheus persistence, simply delete the storage property entirely.
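Putting the three steps together as commands (a sketch only; the PVC and resource names below match a default KubeSphere install, so adjust them to your cluster):

```bash
# 1. Delete the stale PVC so the monitoring role does not inherit the broken storage config
kubectl -n kubesphere-monitoring-system delete pvc prometheus-k8s-db-prometheus-k8s-0

# 2. Remove the recorded monitoring status so the module counts as not-yet-enabled
kubectl -n kubesphere-system patch cc ks-installer --type json \
  -p '[{"op": "remove", "path": "/status/monitoring"}]'

# 3. Edit the desired Prometheus settings; saving triggers ks-installer to redeploy
kubectl -n kubesphere-system edit cc ks-installer
```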
🧠 Analysis: The Automated Prometheus Installation Logic [1]
ks-installer is essentially a script executor (a shell-operator) that runs deployment scripts automatically when changes occur: scripts deployed in the shell-operator subscribe to predefined hooks, and when a hook fires, the corresponding script is triggered.
shell-operator supports three kinds of hooks (see the sketch after this list):
- onStartup: runs once after startup
- schedule: scheduled tasks in crontab format
- kubernetes: watches Kubernetes resources and reacts to the defined event types
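For intuition, here is a minimal hypothetical shell-operator hook (not ks-installer's actual hook): when invoked with --config it prints its bindings, otherwise it runs its reaction logic. The ClusterConfiguration apiVersion shown is an assumption based on ks-installer's CRD group.

```bash
#!/usr/bin/env bash
# Hypothetical shell-operator hook that watches ClusterConfiguration objects.
if [[ "$1" == "--config" ]]; then
  # shell-operator calls every hook with --config at startup to learn its bindings
  cat <<EOF
configVersion: v1
kubernetes:
- name: "watch-cluster-configuration"
  apiVersion: installer.kubesphere.io/v1alpha1   # assumed CRD group/version
  kind: ClusterConfiguration
  executeHookOnEvent: ["Added", "Modified"]
EOF
else
  # Reaction logic runs on every matching event
  echo "ClusterConfiguration changed; re-running deployment playbooks..."
fi
```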
The /hooks/kubesphere directory in the ks-installer pod contains two files:
- installRunner.py: performs the deployment
- schedule.sh: runs periodic tasks such as status checks and registration
View the installation logs:

```bash
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f
```
To understand the Prometheus installation logic, enter the pod and inspect the ks-installer files [2]:

```bash
kubectl -n kubesphere-system exec -it $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- bash
```
From [2], the Prometheus-related configuration lives under the /kubesphere/kubesphere/prometheus directory. The file that caused our error is /kubesphere/kubesphere/prometheus/prometheus/prometheus-prometheus.yaml.
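To see which storage settings the deployed Prometheus actually ended up with, you can also query the Prometheus custom resource directly (a hedged sketch; the CR name k8s matches a default install, as seen in get_old_config.yaml below):

```bash
# Inspect the storage section of the running Prometheus custom resource
kubectl -n kubesphere-monitoring-system get prometheus k8s \
  -o jsonpath='{.spec.storage.volumeClaimTemplate.spec}'
```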
🖼️ Analysis: ks-installer Core Scripts
installRunner.py
File location:
- Enter the ks-installer pod:

```bash
kubectl exec -it -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -- bash
```

- Enter the hooks directory:

```bash
cd /hooks/kubesphere
```
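Listing that directory shows the two hook scripts described above:

```bash
ls /hooks/kubesphere
# installRunner.py  schedule.sh
```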
Logic walkthrough, using the monitoring component as an example:
1. Configure `enabled` in the config file (some components are enabled by default and need no explicit setting):

```yaml
# /kubesphere/config/ks-config.json
monitoring:
  enabled: true
```

2. Configure the Ansible playbook that describes the deployment flow:
```yaml
# /kubesphere/playbooks/monitoring.yaml
- hosts: localhost    # Run the playbook's tasks on localhost.
  gather_facts: false # Skip gathering facts (the system information Ansible collects: OS, network interfaces, hardware, environment variables, etc.); disabling this speeds up execution when the information is not needed.
  roles:              # Ansible roles applied to localhost. A role bundles related tasks, variables, templates, and so on; here the tasks of kubesphere-defaults and ks-monitor will be executed.
    - kubesphere-defaults
    - ks-monitor
```
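installRunner.py drives these playbooks through Ansible; conceptually, the monitoring step amounts to something like the following (a sketch only; the real script uses the ansible-runner Python API rather than a shell call):

```bash
# Conceptual equivalent of what installRunner.py does for the monitoring component
ansible-playbook /kubesphere/playbooks/monitoring.yaml \
  -e @/kubesphere/config/ks-config.json
```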
3. Inspect the tasks defined by each role (the core logic is in 3.2.3).

3.1 kubesphere-defaults
```yaml
# /kubesphere/roles/kubesphere-defaults/tasks/main.yaml
- name: KubeSphere | Setting images' namespace override
  set_fact: # Set namespace_override when local_registry is the Beijing Aliyun registry, or zone is set to cn
    namespace_override: "kubesphereio"
  when: (local_registry is defined and local_registry == "registry.cn-beijing.aliyuncs.com") or (zone is defined and zone == "cn")
- name: KubeSphere | Configuring defaults
  debug: # Print msg to the log
    msg: "Check roles/kubesphere-defaults/defaults/main.yml"
  tags:
    - always
```

3.2 ks-monitor
3.2.1 main
```yaml
# /kubesphere/roles/ks-monitor/tasks/main.yaml
# Imports a series of other task files (import_tasks) and runs one shell command
- import_tasks: prometheus-stack.yaml # Run when common.monitoring.type is undefined or not equal to 'external'
  when:
    - "common.monitoring.type is not defined or common.monitoring.type != 'external'"
- import_tasks: monitoring-dashboard.yaml # Initialize monitoring when there is no monitoring status yet
  when:
    - "status.monitoring is not defined or status.monitoring.status is not defined or status.monitoring.status != 'enabled'"
- import_tasks: ks-istio-monitoring.yaml
  when:
    - "servicemesh.enabled is defined and servicemesh.enabled"
- import_tasks: gpu-monitoring.yaml
  when:
    - "status.monitoring is not defined or status.monitoring.status is not defined or status.monitoring.status != 'enabled'"
- name: Monitoring | Importing ks-monitoring status
  shell: >
    {{ bin_dir }}/kubectl patch cc ks-installer
    --type merge
    -p '{"status": {"monitoring": {"status": "enabled", "enabledTime": "{{ lookup('pipe','date +%Y-%m-%dT%H:%M:%S%Z') }}"}}}'
    -n kubesphere-system
  register: cc_result
  failed_when: "cc_result.stderr and 'Warning' not in cc_result.stderr"
  until: cc_result is succeeded
  retries: 5
  delay: 3
- import_tasks: thanos-ruler.yaml
  when:
    - alerting is defined
    - alerting.enabled is defined
    - alerting.enabled == true
    - "status.alerting is not defined or status.alerting.status is not defined or status.alerting.status != 'enabled'"
- import_tasks: alert-migrate.yaml
  when:
    - alerting is defined and alerting.enabled is defined and alerting.enabled == true
    - "status.alerting is not defined or status.alerting.status is not defined or status.alerting.status != 'enabled'"
```
3.2.2 prometheus-stack

```yaml
# /kubesphere/roles/ks-monitor/tasks/prometheus-stack.yaml
- import_tasks: cleanup.yaml
- import_tasks: generate_manifests.yaml
- import_tasks: prometheus-operator.yaml
  when:
    - "status.monitoring is not defined or status.monitoring.status is not defined or status.monitoring.status != 'enabled'"
- import_tasks: node-exporter.yaml
  when:
    - "status.monitoring is not defined or status.monitoring.status is not defined or status.monitoring.status != 'enabled'"
- import_tasks: kube-state-metrics.yaml
  when:
    - "status.monitoring is not defined or status.monitoring.status is not defined or status.monitoring.status != 'enabled'"
- import_tasks: grafana.yaml
  when:
    - monitoring.grafana is defined
    - monitoring.grafana.enabled is defined
    - monitoring.grafana.enabled == true
- import_tasks: prometheus.yaml
  when:
    - "status.monitoring is not defined or status.monitoring.status is not defined or status.monitoring.status != 'enabled'"
- import_tasks: etcd.yaml
- import_tasks: k8s-monitor.yaml
  when:
    - "status.monitoring is not defined or status.monitoring.status is not defined or status.monitoring.status != 'enabled'"
- import_tasks: ks-core-monitor.yaml
  when:
    - "status.monitoring is not defined or status.monitoring.status is not defined or status.monitoring.status != 'enabled'"
- import_tasks: alertmanager.yaml
  when:
    - "status.monitoring is not defined or status.monitoring.status is not defined or status.monitoring.status != 'enabled'"
- import_tasks: notification-manager.yaml
  when:
    - "status.monitoring is not defined or status.monitoring.status is not defined or status.monitoring.status != 'enabled'"
```
3.2.3 generate_manifests

```yaml
# /kubesphere/roles/ks-monitor/tasks/generate_manifests.yaml
- name: Monitoring | Getting ks-monitoring installation files
  copy:
    src: "{{ item }}"
    dest: "{{ kubesphere_dir }}/"
  loop:
    - "prometheus"
- import_tasks: get_old_config.yaml
- name: Monitoring | Creating manifests
  template:
    src: "{{ item.file }}.j2"
    dest: "{{ kubesphere_dir }}/{{ item.path }}/{{ item.file }}"
  with_items:
    - { path: prometheus/prometheus-operator, file: prometheus-operator-deployment.yaml }
    - { path: prometheus/prometheus, file: prometheus-prometheus.yaml }
    - { path: prometheus/prometheus, file: prometheus-podDisruptionBudget.yaml }
    - { path: prometheus/kube-state-metrics, file: kube-state-metrics-deployment.yaml }
    - { path: prometheus/node-exporter, file: node-exporter-daemonset.yaml }
    - { path: prometheus/alertmanager, file: alertmanager-alertmanager.yaml }
    - { path: prometheus/alertmanager, file: alertmanager-podDisruptionBudget.yaml }
    - { path: prometheus/grafana, file: grafana-deployment.yaml }
    - { path: prometheus/etcd, file: prometheus-serviceMonitorEtcd.yaml }
    - { path: prometheus/etcd, file: prometheus-endpointsEtcd.yaml }
    - { path: prometheus/thanos-ruler, file: thanos-ruler-thanosRuler.yaml }
    - { path: prometheus/thanos-ruler, file: thanos-ruler-podDisruptionBudget.yaml }
```
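Since the playbook runs inside the ks-installer pod, the rendered manifests can be inspected there. Per [2], kubesphere_dir resolves to /kubesphere/kubesphere:

```bash
# The .j2 templates are rendered into plain YAML under kubesphere_dir
ls /kubesphere/kubesphere/prometheus/prometheus/
# prometheus-prometheus.yaml here is the rendered output of prometheus-prometheus.yaml.j2
cat /kubesphere/kubesphere/prometheus/prometheus/prometheus-prometheus.yaml
```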
3.2.4 get_old_config

```yaml
# /kubesphere/roles/roles/ks-monitor/tasks/get_old_config.yaml
- name: Monitoring | Checking Prometheus PersistentVolumeClaim
  shell: >
    {{ bin_dir }}/kubectl get pvc -n kubesphere-monitoring-system prometheus-k8s-db-prometheus-k8s-0 -o jsonpath='{.spec.resources.requests.storage}'
  register: prometheus_pvc
  failed_when: false # Do not abort the playbook even if the command fails
- name: Monitoring | Setting Prometheus data pv size
  set_fact:
    prometheus_pv_size: "{{ prometheus_pvc.stdout }}"
  when:
    - prometheus_pvc.rc == 0
    - prometheus_pvc.stdout != ""
  failed_when: false
- name: Monitoring | Checking Prometheus retention days
  shell: >
    {{ bin_dir }}/kubectl get prometheuses.monitoring.coreos.com -n kubesphere-monitoring-system k8s -o jsonpath='{.spec.retention}'
  register: prometheus_retention
  failed_when: false
- name: Monitoring | Setting Prometheus retention days
  set_fact:
    prometheus_retention_duration: "{{ prometheus_retention.stdout }}"
  when:
    - prometheus_retention.rc == 0
    - prometheus_retention.stdout != ""
  failed_when: false
- name: Monitoring | Checking Prometheus node selector
  shell: |
    {{ bin_dir }}/kubectl get prometheuses.monitoring.coreos.com -n kubesphere-monitoring-system k8s -o go-template --template="{{ '{{' }}range \$key, \$value := .spec.nodeSelector{{ '}}' }} {{ '{{' }}\$key{{ '}}' }}: {{ '{{' }}\$value{{ '}}' }}
    {{ '{{' }}end{{ '}}' }}"
  register: prometheus_node_selector
  failed_when: false
- name: Monitoring | Setting Prometheus node selector
  set_fact:
    prometheus_node_selector_map: "{{ prometheus_node_selector.stdout }}"
  when:
    - prometheus_node_selector.rc == 0
    - prometheus_node_selector.stdout != ""
  failed_when: false
```
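This file is exactly why step 2 of the fix deletes the old PVC: the first task reads the previous volume size back from the PVC and reuses it. The checks can be reproduced by hand:

```bash
# What get_old_config.yaml reads back: the previous PV size and retention period
kubectl -n kubesphere-monitoring-system get pvc prometheus-k8s-db-prometheus-k8s-0 \
  -o jsonpath='{.spec.resources.requests.storage}'
kubectl -n kubesphere-monitoring-system get prometheuses.monitoring.coreos.com k8s \
  -o jsonpath='{.spec.retention}'
```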
3.2.5 monitoring-dashboard

```yaml
# /kubesphere/roles/ks-monitor/tasks/monitoring-dashboard.yaml
- name: Monitoring | Getting monitoring-dashboard installation files
  copy:
    src: "{{ item }}" # Iterate over the loop values; here only monitoring-dashboard
    dest: "{{ kubesphere_dir }}/"
  loop:
    - "monitoring-dashboard"
- name: Monitoring | Installing monitoring-dashboard
  shell: >
    {{ bin_dir }}/kubectl apply -f {{ kubesphere_dir }}/monitoring-dashboard
```
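Once the playbook finishes, a quick sanity check that the monitoring stack came back up:

```bash
# All monitoring workloads live in kubesphere-monitoring-system
kubectl -n kubesphere-monitoring-system get pods
```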
🏥 Reflections
- I hope this post helps! If you have any questions or need further help, feel free to ask.
- If you enjoyed this article, a follow or star would be much appreciated.
🗺 References
- Title: [Cluster] KubeSphere Setup Notes: ks-installer Analysis
- Author: Fre5h1nd
- Created: 2024-01-19 00:05:20
- Updated: 2024-03-08 15:36:48
- Link: https://freshwlnd.github.io/2024/01/19/k8s/k8s-kubesphere-installer/
- Copyright: This post is licensed under CC BY-NC-SA 4.0.