
Syncing EMR Data with External Table

Last updated: 2024-11-27 15:36:05

Background

In data warehouse construction, Hive is typically used to process raw data at the petabyte level and run time-consuming ETL jobs. The results, at the terabyte level, are then handed over to a quasi-real-time computing engine such as Tencent Cloud TCHouse-P, which connects to BI tools to present reports in quasi real time.
This document describes how to import data from Hive on EMR to Tencent Cloud TCHouse-P via COS.

Directions

Note:
Tencent Cloud TCHouse-P supports only the CSV and GZIP formats; ORC and Parquet are not supported.
The efficiency of importing COS data into Tencent Cloud TCHouse-P depends on the number of files. The recommended file count is a whole multiple (N times) of the number of compute nodes in Tencent Cloud TCHouse-P.
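The file-count recommendation in the note above is simple arithmetic, and a minimal Python sketch makes it concrete. The function names `recommended_file_counts` and `is_whole_multiple` are hypothetical helpers introduced here for illustration; they are not part of any Tencent Cloud SDK. The reading that a whole multiple lets every compute node receive the same number of files is an assumption about the rationale, which the source does not state.

```python
from typing import List


def recommended_file_counts(compute_nodes: int, max_multiple: int = 4) -> List[int]:
    """Return file counts that are 1x, 2x, ... max_multiple x the node count."""
    if compute_nodes < 1 or max_multiple < 1:
        raise ValueError("compute_nodes and max_multiple must be positive")
    return [compute_nodes * n for n in range(1, max_multiple + 1)]


def is_whole_multiple(file_count: int, compute_nodes: int) -> bool:
    """True when file_count is a positive whole multiple of compute_nodes."""
    if compute_nodes < 1:
        raise ValueError("compute_nodes must be positive")
    return file_count > 0 and file_count % compute_nodes == 0


if __name__ == "__main__":
    # Hypothetical cluster with 4 compute nodes.
    print(recommended_file_counts(4))   # [4, 8, 12, 16]
    print(is_whole_multiple(10, 4))     # False: 10 files do not divide over 4 nodes
```

For example, with 4 compute nodes, exporting 8 files from Hive matches the recommendation, while 10 files does not.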
1. Enable EMR to read and write COS data. To do so, click Enable COS when creating the EMR instance.
2. Create a Hive local table and write data into it.
create table hive_local_table(c1 int, c2 string, c3 int, c4 string);
insert into hive_local_table values(1001, 'c2', 99, 'c4'),(1002, 'c2', 100, 'c4'),(1003, 'c2', 101, 'c4'),(1004, 'c2', 100, 'c4'),(1005, 'c2', 101, 'c4');
3. Create a Hive COS external table.
create table hive_cos_table(c1 int, c2 string, c3 int, c4 string)
row format delimited fields terminated by ','
LINES TERMINATED BY '\n'
stored as textfile location 'cosn://{bucket_name}/{dir_name}';
For more information, see Creating Databases Based on COS.
4. Import the local data into COS.
insert into hive_cos_table select * from hive_local_table;
After the data is written successfully, you can see the file in the corresponding COS directory.
5. Create a COS external table in Tencent Cloud TCHouse-P.
CREATE READABLE EXTERNAL TABLE snova_cos_table (c1 int, c2 varchar(32), c3 int, c4 varchar(32))
LOCATION('cos://{BUCKET}-{APPID}.cos.{REGION}.myqcloud.com/{PREFIX} secretKey=**** secretId=***')
FORMAT 'csv';
6. Create a local table in Tencent Cloud TCHouse-P and import data into it.
create table snova_local_table(c1 int, c2 text, c3 int, c4 text);
insert into snova_local_table select * from snova_cos_table;
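
The LOCATION string in step 5 packs the COS address and both credentials into one quoted value, which is easy to mistype by hand. The following Python sketch assembles it from its parts. It only formats text and does not connect to COS or TCHouse-P. The helper names `cos_location` and `external_table_ddl` and the sample values (`demo`, `1250000000`, `ap-guangzhou`, `hive/out/`) are hypothetical. The sketch also assumes that the URL contains no whitespace after `cos://`.

```python
def cos_location(bucket: str, appid: str, region: str, prefix: str,
                 secret_id: str, secret_key: str) -> str:
    """Build a LOCATION value in the shape shown in step 5."""
    parts = (bucket, appid, region, prefix, secret_id, secret_key)
    for part in parts:
        # Quotes or spaces would break the single-quoted, space-separated value.
        if "'" in part or " " in part:
            raise ValueError("values must not contain quotes or spaces")
    return (
        f"cos://{bucket}-{appid}.cos.{region}.myqcloud.com/{prefix.lstrip('/')} "
        f"secretKey={secret_key} secretId={secret_id}"
    )


def external_table_ddl(table: str, columns: str, location: str) -> str:
    """Wrap a LOCATION value in the CREATE READABLE EXTERNAL TABLE statement."""
    return (
        f"CREATE READABLE EXTERNAL TABLE {table} ({columns})\n"
        f"LOCATION('{location}')\n"
        f"FORMAT 'csv';"
    )


if __name__ == "__main__":
    # All values below are placeholders, not real resources or credentials.
    location = cos_location("demo", "1250000000", "ap-guangzhou", "hive/out/",
                            secret_id="ID", secret_key="KEY")
    print(external_table_ddl(
        "snova_cos_table",
        "c1 int, c2 varchar(32), c3 int, c4 varchar(32)",
        location,
    ))
```

Because the generated statement embeds the secret key in plain text, avoid writing it to logs or committing it to version control.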

