Tencent Cloud

Content Delivery Network

Statistical Description of Sampled Data

Last updated: 2026-01-14 17:00:49
The data analysis feature of CDN helps you understand traffic patterns by examining large volumes of log data. To keep queries both accurate and fast on large datasets, data analysis uses sampling-based statistics.

What is sampled data statistics?

In data analysis, sampling refers to selecting a representative subset from the entire dataset for analysis, in order to extract valuable information. For example, when conducting a social survey, researchers cannot survey every single person; therefore, they select a portion of the population as a representative sample, using the responses from this sample to reflect the tendencies of the entire population.

Which metrics are sampled?

CDN uses dynamic sampling to adapt to the different log volumes of different users while keeping data analysis accurate and efficient. For data analysis queries such as TOP URLs, TOP 100 client IPs, TOP 100 Referers, and TOP User Agents, sampling is applied when the domain's QPS (queries per second) falls in one of the following ranges:
QPS in [10,000, 100,000): the sampling rate is 10%
QPS in [100,000, 1,000,000): the sampling rate is 1%
QPS in [1,000,000, +∞): the sampling rate is 0.1%
The sampling strategy evaluates QPS over 5-minute intervals. If the QPS of an interval falls in one of the ranges above, that interval is sampled; otherwise, it is not. For example:
If the domain's QPS reaches 10,000 in the 5-minute log data from 00:01 to 00:05, 10% sampling is applied: 10% of the log entries in that interval are used for calculation.
If the domain's QPS reaches 100,000 in the 5-minute log data from 00:06 to 00:10, 1% sampling is applied: 1% of the log entries in that interval are used for calculation.
If the domain's QPS is 5,000 in the 5-minute log data from 00:11 to 00:15, no sampling is applied, and the calculation is based on all request logs.
Note:
The CDN continuously optimizes and adjusts its sampling strategy based on the scale of platform log data and users' actual needs. If you have any questions about the data analysis query results, please feel free to contact us.
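The tier lookup described above can be sketched in a few lines of Python. This is only an illustration of the published thresholds, not CDN's actual implementation: the names `SAMPLING_TIERS`, `average_qps`, and `sampling_rate` are hypothetical, and treating QPS as the interval's request count divided by 300 seconds is an assumption, since this page does not state how the per-interval QPS is derived.

```python
WINDOW_SECONDS = 5 * 60  # QPS is evaluated per 5-minute interval

# (inclusive lower bound, exclusive upper bound, sampling rate),
# copied from the QPS ranges listed above.
SAMPLING_TIERS = [
    (10_000, 100_000, 0.10),
    (100_000, 1_000_000, 0.01),
    (1_000_000, float("inf"), 0.001),
]


def average_qps(request_count: int, window_seconds: int = WINDOW_SECONDS) -> float:
    """Assumption: QPS is the interval's request count divided by its length."""
    return request_count / window_seconds


def sampling_rate(qps: float) -> float:
    """Return the fraction of log entries used; 1.0 means no sampling."""
    for lower, upper, rate in SAMPLING_TIERS:
        if lower <= qps < upper:
            return rate
    return 1.0  # below 10,000 QPS, all request logs are used


if __name__ == "__main__":
    # The three intervals from the example above.
    for qps in (10_000, 100_000, 5_000):
        print(f"QPS {qps}: {sampling_rate(qps):.1%} of log entries used")
```

Because each range is closed on the left and open on the right, a domain at exactly 100,000 QPS falls into the 1% tier rather than the 10% tier.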

How to use full data statistics?

If your business requires in-depth analysis of all log data, we recommend using CDN's Real-time Logs feature. Real-time Logs delivers detailed, complete log data to the log analysis system you designate (such as Tencent Cloud CLS), so you can perform fine-grained processing on the full dataset. In scenarios that demand higher data precision, this gives you more accurate analysis results to support your business decisions.

Explanation of Data Representativeness

CDN assigns a unique identifier (Request ID) to each request log. The sampling system samples your data based on this identifier, which keeps the selection random. Our tests show that when the features you analyze make up a high percentage of the overall data, sampling analysis gives fast and accurate results. However, when the features you analyze make up only a small percentage of the overall data, the results may be skewed because the sample is small.
For example, suppose you have a dataset of 10,000 log entries containing three URL paths A, B, and C, with 7,000 (70%), 2,900 (29%), and 100 (1%) entries, respectively. In the ideal case, after 10% sampling, the sample sizes for URL paths A, B, and C would be 700, 290, and 10. Because the sample for URL C is so small, estimates of the overall population based on it are much less accurate, and the results of a drill-down analysis on URL C may not meet expectations.
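The effect in this example can be put into numbers with a short Python sketch. Everything here is illustrative: hashing the Request ID with SHA-256 is just one common way to sample on a unique identifier and is an assumption, not a description of CDN's sampling system; the synthetic IDs, `is_sampled`, and `relative_std_error` are hypothetical names. The error formula treats each entry as kept independently with probability equal to the sampling rate, and the estimate is the sampled count divided by that rate.

```python
import hashlib
import math
from collections import Counter

RATE = 0.10
TRUE_COUNTS = {"A": 7000, "B": 2900, "C": 100}


def is_sampled(request_id: str, rate: float) -> bool:
    """Keep roughly `rate` of all IDs, deterministically (hash-based; an assumption)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < rate


def relative_std_error(true_count: int, rate: float) -> float:
    """Relative standard error of the scaled-up estimate (sampled_count / rate)."""
    return math.sqrt((1 - rate) / (rate * true_count))


# Build 10,000 synthetic log entries as (request_id, url_path) pairs.
entries = [
    (f"{path}-{i}", path)
    for path, count in TRUE_COUNTS.items()
    for i in range(count)
]

# Sample by Request ID, then scale the sampled counts back up.
sampled = Counter(path for request_id, path in entries if is_sampled(request_id, RATE))
estimates = {path: sampled[path] / RATE for path in TRUE_COUNTS}

for path, true_count in TRUE_COUNTS.items():
    print(
        f"URL {path}: true={true_count}, estimated={estimates[path]:.0f}, "
        f"expected relative error={relative_std_error(true_count, RATE):.1%}"
    )
```

Under these assumptions, the expected relative error is about 3.6% for URL A and 5.6% for URL B, but 30% for URL C, which is why drill-down results on rare values are the least reliable.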

