Tencent Cloud

Tencent Kubernetes Engine


Last updated: 2024-12-12 17:59:35

Basic Monitoring

Why is a node assigned more CPU cores and memory than the node resource specification?

Reason: The CPU cores and memory assigned to a node are calculated from the CPU and memory requests of each Pod on the node, but the requests of failed Pods are not subtracted from the calculation.
Example: a node's specification is 4 CPU cores and 8 GB of memory, and three Pods are running on it with the following resource requests:
Pod 1 requests 2 CPU cores and 4 GB of memory during normal operation.
Pod 2 requests 1 CPU core and 2 GB of memory during normal operation.
Pod 3 is in Failed status, and its requests are 0.5 CPU core and 1 GB of memory.
Because the scheduler does not count the failed Pod, the idle resources on the node are 4 - 2 - 1 = 1 CPU core and 8 - 4 - 2 = 2 GB of memory. Pod 4 requests 0.8 CPU core and 1.5 GB of memory, which fits within the idle resources, so it is scheduled to the node normally. The node now has four Pods (three normal and one failed) and is reported as assigned 4.3 CPU cores and 8.5 GB of memory; because the requests of the failed Pod are not subtracted, the reported allocation exceeds the node specification.
This problem was fixed in the new version released in May: the requests of failed Pods are now subtracted when calculating the resources assigned to a node.
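The arithmetic above can be reproduced in a few lines. This is a minimal sketch using the example values from this FAQ, not data read from a real cluster:

```python
# Example Pods from this FAQ (hypothetical values, not cluster output).
pods = [
    {"phase": "Running", "cpu": 2.0, "mem_gb": 4.0},  # Pod 1
    {"phase": "Running", "cpu": 1.0, "mem_gb": 2.0},  # Pod 2
    {"phase": "Failed",  "cpu": 0.5, "mem_gb": 1.0},  # Pod 3
    {"phase": "Running", "cpu": 0.8, "mem_gb": 1.5},  # Pod 4
]

def allocated(pods, skip_failed):
    # Sum Pod requests on the node; the fixed metric skips Failed Pods,
    # the old behaviour counted them.
    counted = [p for p in pods if not (skip_failed and p["phase"] == "Failed")]
    return (sum(p["cpu"] for p in counted), sum(p["mem_gb"] for p in counted))

# Old behaviour: 4.3 cores and 8.5 GB, exceeding the 4-core 8 GB node.
print(allocated(pods, skip_failed=False))
# Fixed behaviour: 3.8 cores and 7.5 GB, within the node specification.
print(allocated(pods, skip_failed=True))
```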

Why is the Pod status normal, but the k8s_workload_abnormal monitoring metric abnormal?

Reason: The metric status reflects whether the Pods of the workload are normal, and whether a Pod is normal is determined by the four condition types in pod.status.conditions. k8s_workload_abnormal reports the workload as normal only when all four conditions are True at the same time; otherwise, it reports the workload as abnormal.
PodScheduled: The Pod has been scheduled to a node.
ContainersReady: All containers in the Pod are ready.
Initialized: All init containers have completed successfully.
Ready: The Pod can serve requests and should be added to the load balancing pools of all matching Services.
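The rule above can be sketched as a small check over pod.status.conditions. The Pod data below are hypothetical examples, not output from a real cluster:

```python
# For the k8s_workload_abnormal metric, a Pod counts as normal only if
# all four condition types are True at the same time.
REQUIRED = ("PodScheduled", "Initialized", "ContainersReady", "Ready")

def pod_is_normal(conditions):
    # `conditions` mirrors pod.status.conditions: a list of
    # {"type": ..., "status": ...} entries with status "True"/"False".
    status = {c["type"]: c["status"] for c in conditions}
    return all(status.get(t) == "True" for t in REQUIRED)

# Hypothetical example Pods:
healthy = [{"type": t, "status": "True"} for t in REQUIRED]
unready = [{"type": t, "status": "False" if t == "Ready" else "True"}
           for t in REQUIRED]

print(pod_is_normal(healthy))  # True
print(pod_is_normal(unready))  # False -> metric reports the workload as abnormal
```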

Causes of tke-monitor-agent DaemonSet errors

Error: The domain name `receiver.barad.tencentyun.com` failed to be resolved, so metrics could not be reported and the cluster had no monitoring data.
Cause: The DNS configuration on the node was modified.
Solution: Add `hostAliases` to the `tke-monitor-agent` DaemonSet as follows:
```
hostAliases:
- hostnames:
  - receiver.barad.tencentyun.com
  ip: 169.254.0.4
```
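If you prefer to apply the fix with `kubectl patch` rather than editing the DaemonSet manifest, the patch body can be generated programmatically. This is a minimal sketch that only builds the JSON; the DaemonSet namespace in the trailing comment is an assumption about a typical TKE setup, so check your cluster:

```python
import json

# Build the strategic-merge patch body that adds the hostAliases entry
# shown above. This only constructs JSON; it does not talk to a cluster.
def host_alias_patch(ip, hostnames):
    return {
        "spec": {
            "template": {
                "spec": {
                    "hostAliases": [{"ip": ip, "hostnames": hostnames}]
                }
            }
        }
    }

patch = host_alias_patch("169.254.0.4", ["receiver.barad.tencentyun.com"])
print(json.dumps(patch))
# Apply it with, for example (namespace is an assumption, verify in your cluster):
#   kubectl -n kube-system patch daemonset tke-monitor-agent --type merge -p '<JSON above>'
```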

