Common High-Risk Operations

Last updated: 2025-12-03 17:57:59
When you deploy or run services, certain high-risk operations at different levels can cause service failures of varying severity. To help you assess and avoid these risks, this document describes the consequences of high-risk operations and the corresponding solutions. The sections below cover high-risk operations for clusters, networking and load balancing, logs, and cloud disks.

Clusters

| Category | High-risk Operation | Consequence | Solution |
| --- | --- | --- | --- |
| Master and etcd nodes | Modifying the security groups of nodes in a cluster | Master node may become unavailable | Configure security groups as recommended by Tencent Cloud |
| Master and etcd nodes | Node expires or is terminated | The master node becomes unavailable | Unrecoverable |
| Master and etcd nodes | Reinstalling operating system | Master components get deleted | Unrecoverable |
| Master and etcd nodes | Upgrading master or etcd component version on your own | Cluster may become unavailable | Roll back to the original version |
| Master and etcd nodes | Deleting or formatting core directory data such as node /etc/kubernetes | The master node becomes unavailable | Unrecoverable |
| Master and etcd nodes | Changing node IP | The master node becomes unavailable | Change back to the old IP |
| Master and etcd nodes | Modifying parameters of core components, e.g. etcd, kube-apiserver, docker, etc., on your own | Master node may become unavailable | Configure parameters as recommended by Tencent Cloud |
| Master and etcd nodes | Changing master or etcd certificate on your own | Cluster may become unavailable | Unrecoverable |
| Worker node | Modifying the security groups of nodes in a cluster | Nodes may become unavailable | Configure security groups as recommended by Tencent Cloud |
| Worker node | Node expires or is terminated | The node becomes unavailable | Unrecoverable |
| Worker node | Reinstalling operating system | Node components get deleted | Remove the node and add it back to the cluster |
| Worker node | Upgrading node component version on your own | Node may become unavailable | Roll back to the original version |
| Worker node | Changing node IP | Node becomes unavailable | Change back to the old IP |
| Worker node | Modifying parameters of core components, e.g. etcd, kube-apiserver, docker, etc., on your own | Node may become unavailable | Configure parameters as recommended by Tencent Cloud |
| Worker node | Modifying operating system configuration | Node may become unavailable | Try to restore the configurations or delete the node and purchase a new one |
| Others | Modifying permissions in CAM | Some cluster resources, such as cloud load balancers, may not be able to be created | Restore the permissions |


Networking and Load Balancing

| High-risk Operation | Consequence | Solution |
| --- | --- | --- |
| Modifying kernel parameter net.ipv4.ip_forward=0 | Network not connected | Modify kernel parameter to net.ipv4.ip_forward=1 |
| Modifying kernel parameter net.ipv4.tcp_tw_recycle=1 | NAT exception | Modify kernel parameter to net.ipv4.tcp_tw_recycle=0 |
| Container CIDR's UDP port 53 is not opened to the Internet in the security group configuration of the node | In-cluster DNS cannot work normally | Configure security groups as recommended by Tencent Cloud |
| Modifying or deleting LB tags added in TKE | A new LB is purchased | Restore the LB tags |
| Creating custom listeners in TKE-managed LB through LB console | Modification gets reset by TKE | Automatically create listeners through service YAML |
| Binding custom backend RS in TKE-managed LB through LB console | Modification gets reset by TKE | Prohibit manual binding of backend RS |
| Modifying certificate of TKE-managed LB through LB console | Modification gets reset by TKE | Automatically manage certificate through ingress YAML |
| Modifying TKE-managed LB listener name through LB console | Modification gets reset by TKE | Prohibit modification of TKE-managed LB listener name |


Logs

| High-risk Operation | Consequence | Solution | Notes |
| --- | --- | --- | --- |
| Deleting the /tmp/ccs-log-collector/pos directory of the host | Logs get collected again | None | The pos directory records the position up to which each file in the Pod has been collected |
| Deleting the /tmp/ccs-log-collector/buffer directory of the host | Logs get lost | None | The buffer directory contains the log cache files |


Cloud Disks

| High-risk Operation | Consequence | Solution |
| --- | --- | --- |
| Manually unmounting cloud disks through console | Writing to Pod reports IO errors | Delete the mount directory of the node and reschedule the Pod |
| Unmounting disk mounting path on the node | Pod data gets written to the local disk | Re-mount the corresponding directory onto the Pod |
| Directly operating CBS block device on the node | Pod data gets written to the local disk | None |
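
Because an unmounted path silently redirects writes to the node's local disk, it can help to confirm that a volume directory is still a mount point. The sketch below is one possible check that uses only the Python standard library. The function name `find_unmounted` and the example path are hypothetical; substitute the volume directories actually used on your node.

```python
import os


def find_unmounted(mount_paths):
    """Return the paths that are expected to be mount points but are not."""
    return [path for path in mount_paths if not os.path.ismount(path)]


if __name__ == "__main__":
    # Hypothetical path: replace with the volume directories used on your node.
    expected = ["/path/to/volume/mount"]
    for path in find_unmounted(expected):
        print(f"{path} is not a mount point; writes would go to the local disk")
```

A path that does not exist is also reported, since `os.path.ismount` returns False for it.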

