
Monitoring Add-Ons Release Notes

Last updated: 2025-02-24 18:16:10

monitor-agent Release Notes
Change Time	Version Number	Change Content	Restrictions and Impacts
2024-11-28	v1.3.17	Added timeout settings to fix the issue where standalone-metrics gets stuck when obtaining metrics. Made the metric port and protocol adapt to cluster upgrades. Fixed the issue where obtaining mounted disk metrics gets stuck due to NFS failure.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2024-07-30	v1.3.16	Added the systemd mode for cadvisor. Fixed the problem of repeated statistics in disk-related metrics calculation for native nodes. Modified the Job podNormal logic for the Pod in the informer list-watch failed status.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2024-05-10	v1.3.14	Fixed the problem of excluding Pods in the Succeeded and Failed status during list-watch.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2024-03-18	v1.3.12	Exposed chart parameters to support onDelete policy upgrade.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2024-02-29	v1.3.11	Adapted to the GPU metrics, and changed the calculation method of GPU metrics at the Pod and node levels from aggregating container-level metrics to directly using the values exposed by the exporter. Exposed add-on tags for GPU metrics to preferentially use gpu-exporter: "true". If this tag is not available, name: gpu-manager-ds is used. Fixed the problem where a program panic is triggered when the GPU driver of a node is abnormal. In this case, GPU metrics are not collected, and collection of other basic metrics is not affected. Fixed the problem where the program gets stuck in special cases when HTTP requests are sent to crane to pull data. The HTTP request is now canceled upon timeout. Fixed the problem where the monitor-agent add-on runs on different CPU cores at different times on nodes with a large number of cores, so that the Pod's working_set metric grows too large over time and leads to OOM errors. Fixed the problem where the monitoring add-ons fail to collect data due to changes of the /metrics port and protocol in later versions of controller-manager and scheduler. Modified the node CPU utilization calculation to exclude the iowait time, because including it made the calculated utilization too high in high node I/O scenarios.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2024-02-04	v1.3.10	Extracted the monitor-agent privileged mode into a chart parameter. The privileged mode is disabled by default.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-08-17	v1.3.9	Fixed the problem where the workload is reported as normal when the container is in the creating status. Used the client-go mechanism to automatically refresh the Token to prevent Token expiration when kubeletJob is used to send requests to kubelet.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-04-25	v1.3.7	Fixed the problem where Pod-level GPU utilization (node) and GPU memory utilization (node) metrics fail to be collected properly, and the problem where Pods in the terminating status fail to be deleted due to container mounting to the host directory.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-03-21	v1.3.6	Added metrics of native nodes, including 1-minute load, total disk capacity, disk utilization, and write bandwidth of nodes.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-01-18	v1.3.5	Optimized the scenario where related monitoring metrics are not reported when cadvisor does not expose the container_fs_usage_bytes and container_fs_limit_bytes metrics.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-01-12	v1.3.4	Fixed the problem where the file system usage is 0 when the runtime is containerd.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2022-12-13	v1.3.3	Optimized the method of pulling basic monitoring metrics.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2022-11-08	v1.3.2	Fixed the problem where basic monitoring fails to report monitoring metrics properly.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2022-10-20	v1.3.1	Fixed the metric drop problem.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2022-08-25	v1.3.0	Tencent Kubernetes Engine (TKE) basic monitoring supports the following PVC monitoring metrics: PVC cloud disk size, PVC cloud disk utilization, and PVC cloud disk usage.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2022-08-09	v1.2.2	Updated the GPU metrics calculation method.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2022-07-28	v1.2.1	Updated the methods of calculating the node CPU packing rate and node memory packing rate.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2022-07-25	v1.2.0	Added the following metrics: Pod CPU optimizable amount, Pod memory optimizable amount, node CPU packing rate, and node memory packing rate.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2022-07-21	v1.1.1	Fixed the problem where the basic monitoring add-ons do not complete the collection, calculation, and reporting tasks within the corresponding cycles.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2022-07-05	v1.1.0	tke-monitor-agent mounts the host paths /proc/meminfo and /proc/cpuinfo to collect node CPU utilization and memory utilization.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2022-06-23	v1.0.0	Managed the basic monitoring add-ons by using chart.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.

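The v1.3.11 entry changes the node CPU utilization calculation so that iowait time no longer counts as busy time. The release notes do not describe the add-on's implementation, so the following is only a minimal sketch of the general idea, assuming cumulative CPU time counters in the layout of the first line of /proc/stat. The names `CpuSample` and `cpu_utilization` are hypothetical and are not part of monitor-agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuSample:
    """Cumulative CPU time counters (hypothetical type, /proc/stat-style fields)."""
    user: int
    nice: int
    system: int
    idle: int
    iowait: int
    irq: int = 0
    softirq: int = 0
    steal: int = 0

    def total(self) -> int:
        # Sum of all counters, including idle and iowait.
        return (self.user + self.nice + self.system + self.idle
                + self.iowait + self.irq + self.softirq + self.steal)

    def busy_excluding_iowait(self) -> int:
        # iowait is treated like idle time: the CPU is waiting, not executing.
        return self.total() - self.idle - self.iowait


def cpu_utilization(prev: CpuSample, curr: CpuSample) -> float:
    """Utilization in [0, 1] between two samples, with iowait excluded from busy time."""
    total_delta = curr.total() - prev.total()
    if total_delta <= 0:
        return 0.0
    busy_delta = curr.busy_excluding_iowait() - prev.busy_excluding_iowait()
    return busy_delta / total_delta


if __name__ == "__main__":
    prev = CpuSample(user=100, nice=0, system=50, idle=800, iowait=50)
    curr = CpuSample(user=200, nice=0, system=100, idle=1000, iowait=400)
    print(f"{cpu_utilization(prev, curr):.2f}")
```

In this example the I/O-heavy interval yields a utilization of about 0.21. Counting iowait as busy time would give about 0.71 for the same samples, which is the kind of inflated value the entry describes.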
clustermonitor Release Notes
Change Time	Version Number	Change Content	Restrictions and Impacts
2025-01-08	v1.3.2	Added the control plane add-on monitoring capability.	This upgrade will not affect the existing business. During the upgrade, add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2024-11-20	v1.2.0	Added the metric monitoring capability for native node sub-machines, submitting dimension service data. Fixed the monitoring add-on exceptions in the CDC cluster.	This upgrade will not affect the existing business. During the upgrade, add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2024-10-30	v1.1.0	Allowed measurement data reporting to be enabled through the measure-enabled switch.	This upgrade will not affect the existing business. During the upgrade, add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2024-09-24	v1.0.13	Fixed the panic issue caused by clustermonitor failing to initialize proxy in CDC cluster scenarios. Fixed the issue where the total GPU is reported as 0 after the user modifies the node alias in the user cluster. Fixed standaloneMetrics to support nodes whose instanceid changes, so that node-related metrics are reported with the new instanceid.	This upgrade will not affect the existing business. During the upgrade, add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2024-07-30	v1.0.12	Removed the dependency of master monitoring capability on token, and removed --token from startup parameters. Fixed SSRF vulnerabilities. Fixed the concurrency problem when the CPU and memory of cluster nodes are retrieved. Fixed the problem of no value for the workload GPU utilization because the total cluster GPU is calculated as zero.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2024-03-27	v1.0.11	Allowed the managed cluster to report cluster storage object quantity metrics (pods, configmaps, and others). When the total GPU cores and GPU memory of the cluster are calculated, data is collected preferentially from gpu-exporter on each node. If no data can be collected, it is obtained from the Status field of each node.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-11-05	v1.0.10	Allowed metrics of three major add-ons in the managed cluster to be collected by using cluster-monitor.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-08-21	v1.0.9	Fixed the problem where the number of workload replicas associated with the Horizontal Pod Autoscaler (HPA) is scaled out to the maximum due to excessively high CPU usage when the HPA created by the user is based on the CPU usage in core resource metrics. Allowed deployment of CDC scenarios to nodes.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-08-15	v1.0.8	Allowed deployment of CDC scenarios to user clusters.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-06-20	v1.0.7	Optimized cost metrics reporting logic.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-06-08	v1.0.6	Fixed the problem where the k8s_pod_ping_succeed metric is not reported when the Pod is not in the running status. Fixed the problem where the data cache is not cleaned up when the number of data entries reported to barad exceeds 1,000.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-04-03	v1.0.5	Added annotation.service.kubernetes.io/qcloud-loadbalancer-multiplex: "true" for clustermonitor service to reuse ENILB in an independent cluster scenario with the inspection add-on.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-03-29	v1.0.4	Added the Node status, Pod Ready status, and cost metrics collection and reporting. Optimized metric retrieval for the HPA data source hpa-metrics-server. Upgraded the metrics-server version.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-03-24	v1.0.3	Fixed the problem of clustermonitor version upgrade failure.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-03-16	v1.0.2	Fixed the problem of apiserver CPU/mem utilization drop.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
2023-03-14	v1.0.1	Managed the basic monitoring add-ons by using chart.	This upgrade will not affect the existing business. During the upgrade, some add-ons may be unavailable, so it is recommended to perform the upgrade during off-peak periods.
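
The v1.0.11 entry describes a two-step source for the cluster's GPU totals: gpu-exporter data is preferred for each node, and the node's Status field is used when no exporter data can be collected. The sketch below only illustrates that fallback order and is not clustermonitor's code. The resource key `example.com/gpu-core` and the function names are hypothetical placeholders.

```python
from typing import Iterable, Mapping, Optional, Tuple

# Hypothetical resource key, used only for this illustration.
GPU_RESOURCE = "example.com/gpu-core"

# One node: (value read from the exporter, or None if nothing was collected;
#            capacity map taken from the node's Status field).
NodeReading = Tuple[Optional[float], Mapping[str, float]]


def node_gpu_total(exporter_value: Optional[float],
                   status_capacity: Mapping[str, float]) -> float:
    """Prefer the exporter reading; fall back to the node's Status capacity."""
    if exporter_value is not None:
        return exporter_value
    return float(status_capacity.get(GPU_RESOURCE, 0.0))


def cluster_gpu_total(nodes: Iterable[NodeReading]) -> float:
    """Sum the per-node totals, applying the fallback node by node."""
    return float(sum(node_gpu_total(exp, cap) for exp, cap in nodes))


if __name__ == "__main__":
    nodes = [
        (200.0, {GPU_RESOURCE: 400.0}),  # exporter data wins over Status
        (None, {GPU_RESOURCE: 100.0}),   # no exporter data: use Status
        (None, {}),                      # no GPU information at all
    ]
    print(cluster_gpu_total(nodes))
```

Applying the fallback per node rather than per cluster means one node without exporter data does not force the whole cluster back onto Status values.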


