# COPY TO

`COPY TO` outputs results from Materialize to standard output or to object storage. This command is useful for outputting `SUBSCRIBE` results to stdout, or for performing bulk exports to Amazon S3.
## Syntax

### Copy to stdout

Copying results to stdout is useful for outputting the stream of updates from
a `SUBSCRIBE` command in interactive SQL clients like `psql`.

```sql
COPY ( <query> ) TO STDOUT [WITH ( <option> = <val> )];
```
| Syntax element | Description |
|---|---|
| `<query>` | The `SELECT` or `SUBSCRIBE` query whose results are copied. |
| `WITH ( <option> = <val> )` | Optional. Supported options include `FORMAT`, which sets the output format (see the `FORMAT binary` example under Examples). |
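For example, to stream updates from a view to your terminal (a minimal sketch; `my_view` is a hypothetical view):

```sql
-- Streams a continuous feed of updates to stdout until interrupted.
-- my_view is a hypothetical view used for illustration.
COPY (SUBSCRIBE my_view) TO STDOUT;
```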
### Copy to Amazon S3 and S3-compatible services

Copying results to Amazon S3 (or S3-compatible services) is useful for tasks like periodic backups for auditing, or downstream processing in analytical data warehouses like Snowflake, Databricks, or BigQuery. For step-by-step instructions, see the integration guide for Amazon S3.

The `COPY TO` command is one-shot: every time you want to export results, you
must run the command. To automate exporting results on a regular basis, you
can set up scheduling, for example using a simple cron-like service or an
orchestration platform like Airflow or Dagster.

```sql
COPY <query> TO '<s3_uri>'
WITH (
    AWS CONNECTION = <connection_name>,
    FORMAT = <format>
    [, MAX FILE SIZE = <size> ]
);
```
| Syntax element | Description |
|---|---|
| `<query>` | The `SELECT` query whose results are copied. |
| `<s3_uri>` | The unique resource identifier (URI) of the Amazon S3 bucket (and prefix) to store the output results in. |
| `AWS CONNECTION = <connection_name>` | The name of the AWS connection to use in the `COPY TO` command. For details on creating connections, check the `CREATE CONNECTION` documentation page. |
| `FORMAT = '<format>'` | The file format to write. Valid formats are `csv` and `parquet`. |
| `MAX FILE SIZE = <size>` | Optional. Sets the approximate maximum file size (in bytes) of each file uploaded to the S3 bucket. |
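Putting the options together, a sketch of a full invocation, assuming hypothetical bucket and connection names and that `MAX FILE SIZE` takes a byte count as described above:

```sql
-- Export a view to S3 as Parquet, splitting the output into files of
-- roughly 256 MiB each. All object names here are hypothetical.
COPY (SELECT * FROM my_view) TO 's3://my-bucket/exports/'
WITH (
    AWS CONNECTION = aws_conn,
    FORMAT = 'parquet',
    MAX FILE SIZE = 268435456  -- approximate maximum, in bytes
);
```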
## Details

### Copy to S3: CSV

#### Writer settings

For the `'csv'` format, Materialize writes CSV files using the following
writer settings:
| Setting | Value |
|---|---|
| delimiter | , |
| quote | " |
| escape | " |
| header | false |
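These settings follow common CSV conventions: fields containing the delimiter or quote character are quoted, and embedded quotes are escaped by doubling them. A hypothetical export illustrating this:

```sql
-- Hypothetical export: no header row is written, fields are
-- comma-delimited, and embedded quotes are escaped by doubling.
COPY (SELECT 1 AS id, 'a "quoted", value' AS note) TO 's3://my-bucket/csv/'
WITH (
    AWS CONNECTION = aws_conn,
    FORMAT = 'csv'
);
-- Resulting file contents (single line):
--   1,"a ""quoted"", value"
```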
### Copy to S3: Parquet

#### Writer settings

For the `'parquet'` format, Materialize writes Parquet files that aim for
maximum compatibility with downstream systems. The following Parquet
writer settings are used:
| Setting | Value |
|---|---|
| Writer version | 1.0 |
| Compression | snappy |
| Default column encoding | Dictionary |
| Fallback column encoding | Plain |
| Dictionary page encoding | Plain |
| Dictionary data page encoding | RLE_DICTIONARY |
If you encounter issues trying to ingest Parquet files produced by Materialize into your downstream systems, please contact our team.
#### Parquet data types

When using the `parquet` format, Materialize converts the values in the
result set to Apache Arrow, and then serializes this Arrow representation
to Parquet. The Arrow schema is embedded in the Parquet file metadata and
allows reconstructing the Arrow representation using a compatible reader.

Materialize also includes Parquet `LogicalType` annotations where possible.
However, many newer `LogicalType` annotations are not supported in the 1.0
writer version.

Materialize also embeds its own type information into the Apache Arrow
schema. The field metadata in the schema contains an `ARROW:extension:name`
annotation to indicate the Materialize native type the field originated from.
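As a hedged illustration of how these mappings apply, a hypothetical export covering a few of the types in the table below:

```sql
-- Hypothetical names. Each output column is written with the Arrow type
-- and materialize.v1.* extension annotation listed in the table below.
COPY (
    SELECT
        42::bigint            AS a_bigint,  -- Arrow int64, Parquet INT64
        1.5::double precision AS a_double,  -- Arrow float64, Parquet DOUBLE
        now()                 AS a_tstz     -- Parquet TIMESTAMP[UTC, MICROS]
) TO 's3://my-bucket/parquet/'
WITH (
    AWS CONNECTION = aws_conn,
    FORMAT = 'parquet'
);
```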
| Materialize type | Arrow extension name | Arrow type | Parquet primitive type | Parquet logical type |
|---|---|---|---|---|
| `bigint` | `materialize.v1.bigint` | `int64` | `INT64` | |
| `boolean` | `materialize.v1.boolean` | `bool` | `BOOLEAN` | |
| `bytea` | `materialize.v1.bytea` | `large_binary` | `BYTE_ARRAY` | |
| `date` | `materialize.v1.date` | `date32` | `INT32` | `DATE` |
| `double precision` | `materialize.v1.double` | `float64` | `DOUBLE` | |
| `integer` | `materialize.v1.integer` | `int32` | `INT32` | |
| `jsonb` | `materialize.v1.jsonb` | `large_utf8` | `BYTE_ARRAY` | |
| `map` | `materialize.v1.map` | `map` (struct with fields `keys` and `values`) | Nested | `MAP` |
| `list` | `materialize.v1.list` | `list` | Nested | |
| `numeric` | `materialize.v1.numeric` | `decimal128[38, 10 or max-scale]` | `FIXED_LEN_BYTE_ARRAY` | `DECIMAL` |
| `real` | `materialize.v1.real` | `float32` | `FLOAT` | |
| `smallint` | `materialize.v1.smallint` | `int16` | `INT32` | `INT(16, true)` |
| `text` | `materialize.v1.text` | `utf8` or `large_utf8` | `BYTE_ARRAY` | `STRING` |
| `time` | `materialize.v1.time` | `time64[nanosecond]` | `INT64` | `TIME[isAdjustedToUTC = false, unit = NANOS]` |
| `uint2` | `materialize.v1.uint2` | `uint16` | `INT32` | `INT(16, false)` |
| `uint4` | `materialize.v1.uint4` | `uint32` | `INT32` | `INT(32, false)` |
| `uint8` | `materialize.v1.uint8` | `uint64` | `INT64` | `INT(64, false)` |
| `timestamp` | `materialize.v1.timestamp` | `time64[microsecond]` | `INT64` | `TIMESTAMP[isAdjustedToUTC = false, unit = MICROS]` |
| `timestamp with time zone` | `materialize.v1.timestampz` | `time64[microsecond]` | `INT64` | `TIMESTAMP[isAdjustedToUTC = true, unit = MICROS]` |
| Arrays (`[]`) | `materialize.v1.array` | `struct` with `list` field `items` and `uint8` field `dimensions` | Nested | |
| `uuid` | `materialize.v1.uuid` | `fixed_size_binary(16)` | `FIXED_LEN_BYTE_ARRAY` | |
| `oid` | Unsupported | | | |
| `interval` | Unsupported | | | |
| `record` | Unsupported | | | |
## Privileges

The privileges required to execute this statement are:

- `USAGE` privileges on the schemas that all relations and types in the query are contained in.
- `SELECT` privileges on all relations in the query.
  - NOTE: if any item is a view, then the view owner must also have the necessary privileges to execute the view definition. Even if the view owner is a superuser, they still must explicitly be granted the necessary privileges.
- `USAGE` privileges on all types used in the query.
- `USAGE` privileges on the active cluster.
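As a sketch, grants covering these requirements for a hypothetical role `exporter` (the schema, relation, and cluster names are likewise hypothetical) might look like:

```sql
-- Hypothetical role and object names; adjust to your environment.
GRANT USAGE ON SCHEMA public TO exporter;       -- schema containing the relations
GRANT SELECT ON TABLE my_view TO exporter;      -- each relation in the query
GRANT USAGE ON CLUSTER quickstart TO exporter;  -- the active cluster
```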
## Examples

### Copy to stdout

```sql
COPY (SUBSCRIBE some_view) TO STDOUT WITH (FORMAT binary);
```

### Copy to S3

#### File format: Parquet

```sql
COPY some_view TO 's3://mz-to-snow/parquet/'
WITH (
    AWS CONNECTION = aws_role_assumption,
    FORMAT = 'parquet'
);
```
For the Parquet writer settings used, see Copy to S3: Parquet above. For the type mappings used, see Parquet data types.
#### File format: CSV

```sql
COPY some_view TO 's3://mz-to-snow/csv/'
WITH (
    AWS CONNECTION = aws_role_assumption,
    FORMAT = 'csv'
);
```
For the CSV writer settings used, see Copy to S3: CSV above.
## Related pages

- `CREATE CONNECTION`
- Integration guides: Amazon S3