
Resource Guide for Large Model Training

Last updated: 2025-05-09 15:54:30
This document describes the resource configurations that ensure models run normally during large model training on the TI-ONE platform. It is provided for reference only.
The following are the recommended resources for training the platform's built-in open-source large models.

| Model | Recommended Resources (SFT-FULL) BatchSize=1, MaxSequenceLength=2048 | Recommended Resources (SFT-LORA) BatchSize=1, MaxSequenceLength=2048 |
| --- | --- | --- |
| Models of 8B and below | HCCPNV6 instance: 1 GPU for models of 3B and below; 2 GPUs for 7B/8B models | HCCPNV6 instance: 1 GPU |
| 13B model | HCCPNV6 instance: 4 GPUs | HCCPNV6 instance: 1 GPU |
| 32B model | HCCPNV6 instance: 8 GPUs | HCCPNV6 instance: 2 GPUs |
| 70B model | HCCPNV6 instance: 2 machines (16 GPUs) | HCCPNV6 instance: 4 GPUs |
| DeepSeek-R1-671B / DeepSeek-V3-671B | HCCPNV6 instance: 32 machines (256 GPUs) | Not supported |
| Hunyuan-large | HCCPNV6 instance: 8 machines (64 GPUs) | HCCPNV6 instance: 8 GPUs |
By default, the platform's built-in open-source large models are fine-tuned with LoRA; this behavior can be configured through the FinetuningType parameter.
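As a minimal sketch, assuming hyperparameters are passed as plain key-value pairs when a training task is created (FinetuningType is the parameter documented here; the "full" value and the remaining key names are illustrative assumptions, not the platform's documented schema):

```python
# Hypothetical hyperparameter set for fine-tuning a built-in model on TI-ONE.
# Only FinetuningType is documented on this page; "full" as the value for
# full-parameter SFT, and the other key names, are assumptions.
hyperparameters = {
    "FinetuningType": "lora",    # platform default
    # "FinetuningType": "full",  # assumed value for SFT-FULL
    "BatchSize": 1,
    "MaxSequenceLength": 2048,
}
```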
The 7B model requires 100 CPU cores and 500 GB of memory on a single node; the 13B and 70B models require 150 CPU cores and 1 TB of memory on a single node. For larger models, it is recommended to use whole-machine resources.
Some models use the tilearn acceleration technology, which provides roughly a 30% training speedup on the recommended resources.
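For quick lookup, the recommendations above can be restated in code. The following sketch only encodes this page's table and node requirements as Python data; it is not a platform API:

```python
# Recommended HCCPNV6 GPU counts restated from the table above:
# model scale -> (SFT-FULL GPUs, SFT-LORA GPUs); None = not supported.
RECOMMENDED_GPUS = {
    "3b_and_below": (1, 1),
    "7b_8b": (2, 1),
    "13b": (4, 1),
    "32b": (8, 2),
    "70b": (16, 4),                # SFT-FULL spans 2 machines
    "deepseek_671b": (256, None),  # SFT-FULL spans 32 machines
    "hunyuan_large": (64, 8),      # SFT-FULL spans 8 machines
}

# Single-node CPU/memory requirements quoted in the notes above (1 TB = 1024 GB).
NODE_REQUIREMENTS = {
    "7b": {"cpu_cores": 100, "memory_gb": 500},
    "13b": {"cpu_cores": 150, "memory_gb": 1024},
    "70b": {"cpu_cores": 150, "memory_gb": 1024},
}

def recommended_gpus(model_scale, finetuning_type="lora"):
    """Return the recommended GPU count, or None if the combination is unsupported."""
    full, lora = RECOMMENDED_GPUS[model_scale]
    return full if finetuning_type == "full" else lora

print(recommended_gpus("70b", "full"))  # -> 16
```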


