
Page Store Cache

Last updated: 2025-07-17 17:42:49

Page Store Feature Overview

GooseFS 1.7.0 and later support the new Page Store caching feature. Compared with the traditional Block Store caching, Page Store can significantly improve cache space utilization and cold-read efficiency for discrete I/O access patterns. The architectural comparison between Page Store and Block Store is as follows.




The Page Store storage architecture removes the restriction that the minimum caching unit on a Worker node is a Block (128MB by default). Without adding any metadata burden to the Master node, it allows Worker nodes to cache data as 1MB Pages.
This architecture significantly improves cache space utilization. In addition, for random cold reads within a single file, it noticeably reduces request response time and cache warm-up latency. Below is a comparison of cold read latency between Block mode and Page mode in a random read scenario for an OLAP query.
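
To make the space-utilization claim concrete, the following Python sketch estimates how much cache a set of scattered cold reads occupies when the caching unit is a 128MB Block versus a 1MB Page. It is a back-of-the-envelope model, not GooseFS code: the function `cached_bytes` and the sample offsets are hypothetical, and it assumes that a cold read caches every unit it touches in full.

```python
MB = 1024 * 1024

def cached_bytes(reads, unit_size):
    """Return the bytes cached when each touched unit is cached in full.

    reads: iterable of (offset, length) byte ranges within one file.
    unit_size: caching granularity in bytes (Block size or Page size).
    """
    touched = set()
    for offset, length in reads:
        if length <= 0:
            continue
        first = offset // unit_size
        last = (offset + length - 1) // unit_size
        touched.update(range(first, last + 1))
    return len(touched) * unit_size

# Hypothetical workload: ten 64KB reads, each landing in a different 128MB Block.
reads = [(i * 128 * MB + 5 * MB, 64 * 1024) for i in range(10)]

block_mode = cached_bytes(reads, 128 * MB)  # Block Store granularity
page_mode = cached_bytes(reads, 1 * MB)     # Page Store granularity

print(block_mode // MB, "MB cached in Block mode")
print(page_mode // MB, "MB cached in Page mode")
```

Under these assumptions, the same workload occupies 1280MB in Block mode and 10MB in Page mode.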




Using Page Store

To use the Page Store mode, add the following to the Worker's data storage configuration:
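# ...

# Store Worker data as Pages instead of Blocks (the default type is FILE).
goosefs.worker.block.store.type=PAGE
# Directory that holds the cached Pages.
goosefs.worker.page.store.dirs=/Users/yuyang/goosefs-data/paged-block
# Size of each Page.
goosefs.worker.page.store.page.size=2MB
# Capacity of the Page Store data directory (6MB here; the default is 512MB).
goosefs.worker.page.store.size=6MB
# Fraction of space kept in reserve; 0 reserves nothing.
goosefs.worker.page.store.overhead=0
goosefs.worker.network.reader.buffer.size=1MB

# ...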
Then synchronize the configuration to all nodes and restart the Worker process on each node for the change to take effect.

Directory Structure of Page Store and Block Store




Under the Page Store data directory, a four-level directory structure is used:
The first-level subdirectory is DEFAULT LOCAL and cannot be changed;
The second-level subdirectory is named after the Page size used in it. For example, the subdirectory 2097152 in the figure indicates that each Page in this directory is 2MB;
The third-level subdirectory is a hash bucket, computed from the hash of each Block directory name and ${goosefs.worker.page.store.local.store.file.buckets};
The fourth-level subdirectory is the Block directory, which stores all cached Pages belonging to that Block. It is named paged_block_${blockId}.
Note:
Under the second-level subdirectory, there is usually a .TEMP directory, which stores the data of Blocks that are still being written. Each Block directory under .TEMP is named paged_block_${blockId}session{sessionId}.
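
The layout above can be sketched as a small path-building function. This is an illustration only: `java_string_hash`, `page_block_dir`, and the sample values are hypothetical, and the first-level directory name is passed in by the caller rather than hard-coded. This page does not specify the hash function, so the sketch assumes a Java-style `String.hashCode` reduced modulo the bucket count.

```python
from pathlib import PurePosixPath

def java_string_hash(s: str) -> int:
    """Java-style String.hashCode as a signed 32-bit value (ASSUMED hash, BMP characters only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def page_block_dir(root, first_level, page_size, block_id, buckets=1000):
    """Build the four-level directory of a cached Block under the Page Store root."""
    block_dir = f"paged_block_{block_id}"
    # ASSUMPTION: bucket = hash(directory name) mod bucket count.
    # Python's % returns a non-negative result for a positive modulus.
    bucket = java_string_hash(block_dir) % buckets
    return PurePosixPath(root) / first_level / str(page_size) / str(bucket) / block_dir

path = page_block_dir(
    root="/data/goosefs-data/paged-block",
    first_level="DEFAULT LOCAL",  # placeholder: use the name found on disk
    page_size=2 * 1024 * 1024,    # second level: Page size in bytes
    block_id=16777216,            # hypothetical Block ID
)
print(path)
```

The bucket value printed here depends entirely on the assumed hash, so compare it against a real Worker directory before relying on it.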

Switching between Page Store and Block Store

Currently, when short-circuit read/write is not enabled (the default), you can switch storage types freely by modifying the configuration. Specifically, set the configuration item `goosefs.worker.block.store.type` to the desired storage type, then restart the Worker for the change to take effect. The Worker then loads and caches data from the data path configured for that storage type.
When short-circuit read/write is enabled, the same procedure applies, but the process hosting the Client must be restarted at the same time; otherwise, a short-circuit read/write exception may occur.
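
The switch itself is a one-property change. As a minimal sketch, assuming the Worker reads a properties file with `key=value` lines, the hypothetical helper `set_store_type` below rewrites `goosefs.worker.block.store.type` in the file's text. Restarting the Worker (and the Client process when short-circuit read/write is enabled) remains a separate manual step.

```python
KEY = "goosefs.worker.block.store.type"

def set_store_type(properties_text: str, store_type: str) -> str:
    """Return properties_text with KEY set to store_type (FILE or PAGE)."""
    if store_type not in ("FILE", "PAGE"):
        raise ValueError(f"unsupported store type: {store_type}")
    new_line = f"{KEY}={store_type}"
    lines = properties_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and lines without '='; match only the exact key.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == KEY:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)  # key was absent, so add it at the end
    return "\n".join(lines) + "\n"

original = (
    "# worker settings\n"
    "goosefs.worker.block.store.type=FILE\n"
    "goosefs.worker.page.store.page.size=1MB\n"
)
print(set_store_type(original, "PAGE"))
```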

Configuration Parameters of Page Store

| Parameter | Default Value | Description |
| --- | --- | --- |
| goosefs.worker.block.store.type | FILE | Storage type on the Worker: FILE or PAGE. FILE (the default) is the traditional Block storage mode; PAGE selects the Page storage mode. |
| goosefs.worker.page.store.page.size | 1MB | Size of each Page. It can be set as required, for example 128KB or 256KB. |
| goosefs.worker.page.store.dirs | /tmp/goosefs-cache | Data directory for Page Store, for example /data/goosefs-data/paged-block. |
| goosefs.worker.page.store.size | 512MB | Capacity of the Page Store data directory. When this capacity is exceeded, Page-granularity eviction is triggered. |
| goosefs.worker.cache.request.pending.timeout | 500ms | Wait timeout that mitigates cache breakdown in high-concurrency cold-read scenarios. When concurrent reads on the same Worker node cause cache breakdown, later requests wait up to this long to be served from the cache instead of going through to the underlying storage. If the wait times out, the request reads from the underlying storage and returns. A value less than or equal to 0 disables the cache breakdown optimization. |
| goosefs.worker.page.store.overhead | 0.1 | Fraction of space reserved for Page storage. The default value 0.1 reserves 10 percent of the space. Eviction is triggered when usage reaches this watermark. |
| goosefs.worker.page.store.evictor.class | com.qcloud.cos.goosefs.client.file.cache.evictor.LRUCacheEvictor | Eviction algorithm for Page storage. Supported options: com.qcloud.cos.goosefs.client.file.cache.evictor.LRUCacheEvictor and com.qcloud.cos.goosefs.client.file.cache.evictor.LFUCacheEvictor. |
| goosefs.worker.page.store.eviction.retries | 10 | Maximum number of eviction attempts. |
| goosefs.worker.page.store.evictor.lfu.logbase | 2.0 | Log base used by the LFU eviction algorithm. |
| goosefs.worker.page.store.local.store.file.buckets | 1000 | Number of hash buckets for storing paged Block directories. |
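
To illustrate how the capacity, reserved-space, and eviction parameters interact, here is a minimal Python sketch of Page-granularity LRU eviction. It is not the GooseFS implementation: the class `PageCacheSketch` is hypothetical, and it assumes the usable capacity is `size * (1 - overhead)`, which this page does not spell out.

```python
from collections import OrderedDict

MB = 1024 * 1024

class PageCacheSketch:
    """Toy LRU cache keyed by (block_id, page_index); every page has a fixed size."""

    def __init__(self, store_size, page_size, overhead=0.1):
        # ASSUMPTION: reserved space is subtracted from the configured size.
        usable = int(store_size * (1 - overhead))
        self.max_pages = usable // page_size
        self.pages = OrderedDict()  # least recently used first

    def get(self, key):
        if key not in self.pages:
            return None
        self.pages.move_to_end(key)  # mark as most recently used
        return self.pages[key]

    def put(self, key, data):
        """Insert a page and return the keys evicted to make room."""
        evicted = []
        if self.max_pages == 0:
            return evicted  # nothing fits, so the page is not cached
        if key in self.pages:
            self.pages.move_to_end(key)
        self.pages[key] = data
        while len(self.pages) > self.max_pages:
            old_key, _ = self.pages.popitem(last=False)
            evicted.append(old_key)
        return evicted

# Values from the configuration example: 6MB store, 2MB pages, no reserve.
cache = PageCacheSketch(store_size=6 * MB, page_size=2 * MB, overhead=0)
for index in range(3):
    cache.put((1, index), b"page")
cache.get((1, 0))                     # page 0 becomes most recently used
evicted = cache.put((1, 3), b"page")  # a fourth page forces one eviction
print(cache.max_pages, evicted)       # prints: 3 [(1, 1)]
```

With the default overhead of 0.1, the same 6MB store would hold only two 2MB pages under this assumption.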

About Tiered Storage

GooseFS does not support tiered storage under the Page Store storage type, so only the default first-tier storage medium is available. The reason for this design is that indexing discrete Pages and moving them between storage tiers carries a high overhead, which would significantly affect resource usage and performance on Worker nodes.
