Python Access

Last updated: 2024-07-31 17:33:57
DLC provides a connector compliant with the Python DB-API 2.0 standard (PEP 249). You can connect to DLC's Presto/Spark engines from Python and conveniently run SQL operations on DLC database tables.

Environment preparations

1. Python 3.9 or later.
2. Install tencentcloud-dlc-connector:
pip install -i https://mirrors.tencent.com/pypi/simple/ tencentcloud-dlc-connector
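
To verify the installation, you can check the installed package with pip (a standard pip command; note that some features described below require connector version 1.1.0 or 1.1.3 and later):
pip show tencentcloud-dlc-connector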

Usage Examples

Step 1: Connect to the engine

Code:
import tdlc_connector
import datetime
from tdlc_connector import constants

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    token=None,
    endpoint=None,
    catalog=constants.Catalog.DATALAKECATALOG,
    engine="<ENGINE>",
    engine_type=constants.EngineType.AUTO,
    result_style=constants.ResultStyles.LIST,
    download=False,
    mode=constants.Mode.LASY,
    database='',
    config={},
    callback=None,
    callback_events=None,
)
Parameter description:
region: Engine region, such as ap-nanjing, ap-beijing, ap-guangzhou, ap-shanghai, ap-chengdu, ap-chongqing, na-siliconvalley, ap-singapore, ap-hongkong.
secret_id: Tencent Cloud SecretId.
secret_key: Tencent Cloud SecretKey.
token: (Optional) Temporary secret token.
endpoint: (Optional) Service endpoint to connect to.
engine: Name of the engine to use, for example "test_python".
engine_type: (Optional) Engine type, matching the type of the named engine. Default value: constants.EngineType.AUTO. Possible values: AUTO, PRESTO, SPARK, SPARK_BATCH.
result_style: (Optional) Format of returned results; possible values: LIST, DICT.
download: (Optional) Whether to download data directly, True/False; see the download mode description below.
mode: (Optional) Fetch mode; possible values: ALL, LASY, STREAM (note that the library spells the constant LASY).
database: (Optional) Default database.
config: (Optional) Configuration submitted to the cluster.
callback: (Optional) Callback function, with signature def cb(statement_id, status).
callback_events: (Optional) Events that trigger the callback; used together with callback. See the callback mechanism description below.
driver_size: (Optional) Driver node size. Default value: constants.PodSize.SMALL (only valid for SPARK_BATCH clusters). Possible values: SMALL, MEDIUM, LARGE, XLARGE, M_SMALL, M_MEDIUM, M_LARGE, M_XLARGE.
executor_size: (Optional) Executor node size. Default value: constants.PodSize.SMALL (only valid for SPARK_BATCH clusters). Possible values: SMALL, MEDIUM, LARGE, XLARGE, M_SMALL, M_MEDIUM, M_LARGE, M_XLARGE.
executor_num: (Optional) Number of executor nodes. Default value: 1 (only valid for SPARK_BATCH clusters).
executor_max_num: (Optional) Maximum number of executor nodes. If not equal to executor_num, dynamic resource allocation is enabled (only valid for SPARK_BATCH clusters).

Download mode description:

1. download=False, mode=ALL: Fetch all data through the API; data can be fetched only after the task completes.
2. download=False, mode=LASY: Fetch data through the API, deferring server-side fetches according to how much data has been consumed.
3. download=False, mode=STREAM: Same as LASY mode.
4. download=True, mode=ALL: Download all results from COS (requires COS read permission) using local temporary storage; recommended for large data volumes.
5. download=True, mode=LASY: Download results from COS (requires COS read permission), deferring file downloads according to how much data has been fetched.
6. download=True, mode=STREAM: Read the result stream from COS in real time (requires COS read permission); slower, but with extremely low local memory and disk usage.
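
For example, for a large result set you can combine download=True with LASY mode so that result files are pulled from COS only as rows are consumed (a minimal sketch based on the parameters above; placeholders as in Step 1, and COS read permission is required):
import tdlc_connector
from tdlc_connector import constants

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    result_style=constants.ResultStyles.LIST,
    download=True,               # read results from COS instead of through the API
    mode=constants.Mode.LASY,    # defer downloading result files until rows are fetched
)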

Step 2: Execute SQL

Code:
# Basic operations

cursor = conn.cursor()
count = cursor.execute("SELECT 1")
print(cursor.fetchone())    # Read one row of data
for row in cursor.fetchall():    # Read the remaining rows
    print(row)

# Use pyformat-style parameter binding

cursor.execute("SELECT * FROM dummy WHERE date < %s", datetime.datetime.now())
cursor.execute("SELECT * FROM dummy WHERE status in %s", (('SUCCESS', 'INIT', 'FAIL'),))
cursor.execute("SELECT * FROM dummy WHERE date < %(date)s AND status = %(status)s", {'date': datetime.datetime.now(), 'status': 'SUCCESS'})

# Bulk insert with executemany

cursor.executemany("INSERT INTO dummy VALUES(%s, %s)", [('Zhang San', 18), ('Li Si', 20)])

Basic Operation Procedure

The preceding code works as follows:
1. A cursor object is created with conn.cursor().
2. A SQL statement is executed with cursor.execute("SELECT 1"), and its return value is assigned to the variable count.
3. One row of the result is read with cursor.fetchone() and printed; the remaining rows are read with cursor.fetchall().
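
Since the connector follows DB-API 2.0, large results can also be consumed in batches with the standard cursor.fetchmany() method (a sketch; assumes the dummy table from the examples above):
cursor.execute("SELECT * FROM dummy")
while True:
    rows = cursor.fetchmany(1000)   # fetch up to 1,000 rows per batch
    if not rows:
        break
    for row in rows:
        print(row)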

Features

Description of the callback mechanism

import tdlc_connector
import datetime
from tdlc_connector import constants


def tdlc_connector_callback(statement_id, state):
    '''
    params: statement_id Task ID
    params: state Task status; an enumeration value from constants.TaskStatus
    '''
    print(statement_id, state)


conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    engine_type=constants.EngineType.SPARK,
    result_style=constants.ResultStyles.LIST,
    callback=tdlc_connector_callback,
    callback_events=[constants.CallbackEvent.ON_INIT, constants.CallbackEvent.ON_SUCCESS]
)

cursor = conn.cursor()
cursor.execute("SELECT 1")
cursor.fetchone()

# The callback is invoked when the task is initialized and when it succeeds
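
Beyond printing, a callback can collect state transitions for later inspection (a minimal sketch using only the documented callback signature; the variable and function names are illustrative):
import tdlc_connector
from tdlc_connector import constants

transitions = []

def audit_callback(statement_id, state):
    # Record each (statement_id, state) pair; state is a constants.TaskStatus value
    transitions.append((statement_id, state))

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    result_style=constants.ResultStyles.LIST,
    callback=audit_callback,
    callback_events=[constants.CallbackEvent.ON_INIT, constants.CallbackEvent.ON_SUCCESS]
)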


Submit the task to the job cluster

Currently, tasks can be submitted to Spark job engines. See the following example.
import tdlc_connector
from tdlc_connector import constants

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",                          # select a Spark job engine
    result_style=constants.ResultStyles.LIST,
    driver_size=constants.PodSize.SMALL,        # driver specification
    executor_size=constants.PodSize.SMALL,      # executor specification
    executor_num=1,                             # number of executors
    executor_max_num=1,                         # maximum number of executors; a value different from executor_num enables dynamic resource allocation
)
Note:
Upgrade the connector to >= 1.1.0 to use this feature.
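
To actually enable dynamic resource allocation, set executor_max_num higher than executor_num (a sketch; the specific numbers are illustrative only):
import tdlc_connector
from tdlc_connector import constants

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    result_style=constants.ResultStyles.LIST,
    executor_num=2,         # baseline number of executors
    executor_max_num=10,    # differs from executor_num, so dynamic resource allocation is enabled
)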

Automatically infer engine type

You do not need to specify the engine type; the connector can infer it automatically. See the following example.
import tdlc_connector
from tdlc_connector import constants

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    engine_type=constants.EngineType.AUTO   # set this parameter to AUTO or omit it to let the driver infer the engine type
)
Note:
Upgrade the connector to >= 1.1.0 to use this feature.

Null conversion

Result sets are currently stored in CSV format, and by default the engine converts NULL values into empty strings. If you need to distinguish NULL values, specify a NULL marker string such as "\\1": the engine will then write "\\1" for NULL values in query results, and the driver will convert "\\1" fields back into None. See the following example.
import tdlc_connector
from tdlc_connector import constants, formats

formats.FORMAT_STRING_NULL = '\\1'

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    result_style=constants.ResultStyles.LIST
)
Note:
Null conversion currently supports only SparkSQL clusters. Upgrade the connector to version 1.1.3 or later.
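
With the marker configured, NULL and the empty string can be told apart in fetched rows (a minimal sketch; assumes the connection above against a SparkSQL engine):
cursor = conn.cursor()
cursor.execute("SELECT NULL, ''")
print(cursor.fetchone())   # the NULL column is returned as None, while the empty string stays ''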

