Tencent Cloud TI Platform

Custom Training Image Specification

Last updated: 2025-05-09 15:58:08
If the platform's built-in images do not meet your requirements, you can use a custom image to create training tasks and Notebook instances. This document describes the requirements a custom image must meet and provides Dockerfile examples.

Basic Image Specification

To start a training task in task-based modeling from a custom image, the image must have the openssh-server component installed. For example:
# Replace the base image as needed
FROM ubuntu:20.04

# Install openssh-server
RUN apt-get update && apt-get install -y openssh-server && apt-get clean && mkdir -p /var/run/sshd
Note:
If the base image is CentOS-based, use yum or dnf for package management and adjust the installation command accordingly.
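
As a rough illustration of adjusting the installation command, the sketch below picks whichever package manager the base image provides. The function names detect_pkg_manager and install_openssh_server are hypothetical helpers, not part of the platform, and the dnf/yum branch assumes the package is also named openssh-server on that distribution.

```shell
# Hypothetical helper: print the first supported package manager found on PATH.
detect_pkg_manager() {
  for pm in apt-get dnf yum; do
    if command -v "$pm" >/dev/null 2>&1; then
      printf '%s\n' "$pm"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: install openssh-server with the detected package manager,
# then create the sshd runtime directory as in the Ubuntu example above.
install_openssh_server() {
  pm="$(detect_pkg_manager)" || {
    echo "no supported package manager found" >&2
    return 1
  }
  case "$pm" in
    apt-get) apt-get update && apt-get install -y openssh-server && apt-get clean ;;
    dnf|yum) "$pm" install -y openssh-server && "$pm" clean all ;;
  esac && mkdir -p /var/run/sshd
}
```

One way to use this, under the same assumptions, is to copy the script into the image and call install_openssh_server from a RUN instruction in place of the apt-get line.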

Notebook Image Specification

To launch a Notebook instance from a custom image, the image must meet the basic image specification above and must also have JupyterLab installed and a suitable /opt/dl/run startup script configured. For example:
# Replace the base image as needed
FROM ubuntu:20.04

# Install openssh-server
RUN apt-get update && apt-get install -y openssh-server && apt-get clean && mkdir -p /var/run/sshd

# Install python3, pip3
RUN apt-get update && apt-get install -y python3.8 python3.8-distutils curl && \
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
python3.8 get-pip.py && rm -f get-pip.py

# Install jupyterlab
RUN pip3 install jupyterlab

# Configure the /opt/dl/run startup script
RUN mkdir -p /opt/dl && echo "cd /home/tione/notebook && jupyter lab --allow-root --no-browser --ip=0.0.0.0 --port=8888 --notebook-dir=/home/tione/notebook --NotebookApp.allow_origin='*' --NotebookApp.token=''" > /opt/dl/run && chmod a+x /opt/dl/run
Note:
If the base image already includes a package, skip the corresponding installation command.
/home/tione/notebook is the default disk mount path for a Notebook instance. Whether or not this path exists in the image does not affect its use on the platform.
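
The one-line echo used above to generate /opt/dl/run can be hard to read and maintain. The sketch below writes the same command with a here-document instead. The function name write_run_script and its optional root-prefix argument are hypothetical, and the #!/bin/sh first line is an added assumption (the original script has no shebang).

```shell
# Hypothetical helper: write the Notebook startup script under an optional
# root prefix (empty by default, so the file lands at /opt/dl/run).
write_run_script() {
  root="${1:-}"
  mkdir -p "$root/opt/dl" || return 1
  # The quoted 'EOF' delimiter keeps the content literal (no expansion here).
  cat > "$root/opt/dl/run" <<'EOF'
#!/bin/sh
cd /home/tione/notebook && jupyter lab --allow-root --no-browser --ip=0.0.0.0 --port=8888 \
  --notebook-dir=/home/tione/notebook --NotebookApp.allow_origin='*' --NotebookApp.token=''
EOF
  chmod a+x "$root/opt/dl/run"
}
```

Calling write_run_script with no argument during the image build would produce /opt/dl/run. Passing a temporary directory lets you inspect the generated file without touching the real path.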

Complete Custom Image Dockerfile Example

The following complete example is based on NVIDIA's PyTorch image and supports creating both training tasks and Notebook instances. We recommend building your custom image directly from this example:
# [Recommended] Use NVIDIA's PyTorch image as the base image for compatibility with newer open-source libraries and GPU models.
FROM nvcr.io/nvidia/pytorch:23.07-py3

# [Recommended] Modify the software source (if using in Tencent Cloud, it is recommended to use the private network source).
# [Tencent Public Network Software Source] mirrors.tencent.com
# [Tencent Cloud Private Network Software Source] mirrors.tencentyun.com
ENV TENCENT_MIRRORS="mirrors.tencentyun.com"
RUN sed -i "s/archive.ubuntu.com/${TENCENT_MIRRORS}/g" /etc/apt/sources.list && \
sed -i "s/security.ubuntu.com/${TENCENT_MIRRORS}/g" /etc/apt/sources.list && \
pip config set global.index-url http://${TENCENT_MIRRORS}/pypi/simple && \
pip config set global.no-cache-dir true && \
pip config set global.trusted-host ${TENCENT_MIRRORS}

# [Recommended] If using NVIDIA's PyTorch image, delete the default NVIDIA pip source to speed up pip package lookup and installation.
# (rm -f keeps the build from failing if one of these files does not exist.)
RUN rm -f /etc/xdg/pip/pip.conf /etc/pip.conf /root/.pip/pip.conf /root/.config/pip/pip.conf && pip config unset global.extra-index-url

# [Basic Image Specification] Install openssh-server. The SSH login functionality of notebook and task-based modeling both depend on the openssh-server component.
RUN apt-get update && apt-get install -y openssh-server && apt-get clean && mkdir -p /var/run/sshd

# [Notebook Image Specification] Configure the /opt/dl/run startup entry
RUN mkdir -p /opt/dl && echo "cd /home/tione/notebook && jupyter lab --allow-root --no-browser --ip=0.0.0.0 --port=8888 --notebook-dir=/home/tione/notebook --NotebookApp.allow_origin='*' --NotebookApp.token=''" > /opt/dl/run && chmod a+x /opt/dl/run

# [Recommended] Use tini as the entrypoint so that zombie processes are reaped
RUN apt-get update && apt-get install -y tini && apt-get clean
ENTRYPOINT ["/usr/bin/tini", "-g", "--"]

# [Optional - Recommended installation when using HCC-GPU instances] TCCL RDMA communication optimization
# (If using NVIDIA's PyTorch image, delete the pre-installed NCCL plugin in /opt/hpcx/nccl_rdma_sharp_plugin/lib)
RUN wget https://taco-1251783334.cos.ap-shanghai.myqcloud.com/nccl/nccl-rdma-sharp-plugins_1.2_amd64.deb && \
dpkg -i nccl-rdma-sharp-plugins_1.2_amd64.deb && rm -f nccl-rdma-sharp-plugins_1.2_amd64.deb && \
rm -rf /opt/hpcx/nccl_rdma_sharp_plugin/lib/*

# [Optional] Install Tikit (excluding big data components)
RUN pip install tencentcloud-sdk-python==3.0.955 coscmd==1.8.6.31 && \
pip install --no-dependencies -U tikit

# [Custom] Install required dependency libraries.
RUN pip3 install accelerate==0.21.0 bitsandbytes==0.40.2 datasets==2.14.1 deepspeed==0.10.0 evaluate==0.4.0 peft==0.4.0 protobuf==3.20.3 scipy==1.10.1 sentencepiece==0.1.99 transformers==4.31.0
Note:
We recommend using Tencent Cloud's software source in custom images for faster package installation. The above example already includes this configuration. To configure another software source, see Tencent Cloud Software Source to Accelerate Software Package Download and Update.
If you use HCC-GPU instances for multi-machine training, we recommend installing the TCCL plug-in in the image to optimize RDMA communication over the Tencent Cloud StarPulse network. The above example already includes this configuration. For other installation methods, see Installation Instructions of TCCL for GPU Instances.
We recommend installing Tikit in the Notebook image so that you can easily submit training tasks. The above example uses the simplest installation method. If you need big data components and therefore the full Tikit installation, see Tikit Installation and Initialization.
Training images currently do not support variables declared in bash configuration files such as .bashrc.
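
Before pushing a custom image, it may help to confirm that it contains the components named in the specifications above. The following is a minimal self-check sketch, meant to run inside a container started from the built image. The function names are hypothetical, and the checks only confirm that the sshd and jupyter commands and an executable startup script are present, not that the platform will accept the image.

```shell
# Hypothetical check for the basic image specification: sshd must be on PATH.
check_training_image() {
  if ! command -v sshd >/dev/null 2>&1; then
    echo "missing: sshd (install openssh-server)" >&2
    return 1
  fi
}

# Hypothetical check for the Notebook image specification: the basic checks,
# plus the jupyter command and an executable startup script.
check_notebook_image() {
  run_script="${1:-/opt/dl/run}"
  check_training_image || return 1
  if ! command -v jupyter >/dev/null 2>&1; then
    echo "missing: jupyter (install jupyterlab)" >&2
    return 1
  fi
  if [ ! -x "$run_script" ]; then
    echo "missing or not executable: $run_script" >&2
    return 1
  fi
  echo "image passes the Notebook checks performed here"
}
```

For an image used only for training tasks, check_training_image alone covers the openssh-server requirement. For a Notebook image, call check_notebook_image with no argument to test the default /opt/dl/run path.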

