Tencent Cloud

Elasticsearch Service

User Guide

High Cluster CPU Utilization

Last updated: 2021-08-11 11:17:23

Problem Description

All nodes in the cluster have high CPU utilization even though read and write volumes are low. The symptom can be observed on the Stack Monitoring page in Kibana.

You can also view the CPU utilization of each node on the node monitoring page in the ES console.

In this case, because the cluster's read and write rates are not high, it is difficult to find the root cause from monitoring alone; you need to examine the details carefully. Below are several possible scenarios and the corresponding troubleshooting approaches.
Note:
It is common for the CPU utilization of an individual node to be much higher than that of the other nodes. In most cases, this is caused by uneven load resulting from improper use of the cluster. For more information, see Uneven Cluster Load.

Troubleshooting

Large query requests cause the CPU utilization to soar

This situation is relatively common, and monitoring usually provides clues: the fluctuation of the query request volume closely tracks the cluster's maximum CPU utilization.

To further identify the problem, you need to enable slow log collection for the cluster. For more information, please see Querying Cluster Logs. You can get more information from the slow logs, such as the indexes that cause slow queries, query parameters, and query content.
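If slow logs are not yet enabled, the search slow log thresholds can be set per index via the index settings API. The following is a sketch: the endpoint address and the index name my-index are placeholders, and the thresholds should be tuned to your own latency expectations.

curl -X PUT "http://<es-endpoint>:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}'

Queries on my-index that exceed a threshold are then recorded in the search slow log at the corresponding level.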

Solutions

Try to avoid large text searches and optimize queries.
Use the slow logs to identify indexes where queries are slow. For indexes with a small amount of data, configure a small number of shards with multiple replicas (for example, one shard with several replicas) to improve query performance.
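As a sketch of the one-shard-multi-replica approach, a small index can be created with a single shard and several replicas (the endpoint address, the index name small-index, and the replica count are placeholders):

curl -X PUT "http://<es-endpoint>:9200/small-index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  }
}'

Note that number_of_shards is fixed at index creation, while number_of_replicas can be changed later on the same _settings endpoint.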

Write requests cause the CPU utilization to soar

If monitoring data shows that the CPU utilization surge is related to writes, enable slow log collection for the cluster, identify slow write requests, and optimize them. You can also get the hot_threads information to identify which threads are consuming the CPU:
curl http://9.15.49.78:9200/_nodes/hot_threads
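Similarly, slow write requests can be surfaced by setting the indexing slow log thresholds on the affected index. This is a sketch: the endpoint address and the index name my-index are placeholders.

curl -X PUT "http://<es-endpoint>:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.indexing.slowlog.threshold.index.warn": "10s",
  "index.indexing.slowlog.threshold.index.info": "5s"
}'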

In the hot_threads output here, for example, there are a large number of ingest pipeline operations, and such operations are very resource intensive.
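When ingest pipelines are suspected, the node ingest statistics show how many documents each pipeline has processed and how much time it has spent (the endpoint address is a placeholder; filter_path trims the response to the ingest section):

curl "http://<es-endpoint>:9200/_nodes/stats/ingest?filter_path=nodes.*.ingest"

A pipeline with a high count and time relative to the others is a good candidate for optimization or removal on the business side.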


Solutions

If you encounter the problems above, optimize accordingly on the business side. The key to troubleshooting such problems is to make good use of the cluster's monitoring metrics to quickly locate the problem, and then use the cluster logs to identify the root cause.
