title: "COPY TO"
description: "COPY TO
outputs results from Materialize to standard output or object storage."
menu:
main:
parent: "commands"
COPY TO
outputs results from Materialize to standard output or object storage.
This command is useful to output `SUBSCRIBE` results to `stdout`, or to perform bulk exports to Amazon S3.
## Copy to `stdout`

Copying results to `stdout` is useful to output the stream of updates from a `SUBSCRIBE` command in interactive SQL clients like `psql`.
{{< diagram "copy-to-stdout.svg" >}}
Field | Use |
------|-----|
_query_ | The `SELECT` or `SUBSCRIBE` query to output results for. |
_field_ | The name of the option you want to set. |
_val_ | The value for the option. |
### `WITH` options {#copy-to-stdout-with-options}

Name | Values | Default value | Description |
-----|--------|---------------|-------------|
`FORMAT` | `TEXT`, `BINARY`, `CSV` | `TEXT` | Sets the output formatting method. |
### Examples

```sql
COPY (SUBSCRIBE some_view) TO STDOUT WITH (FORMAT binary);
```
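A `SELECT` query can be copied to `stdout` in the same way. The following is a minimal sketch, assuming a view named `some_view` exists:

```sql
-- Hypothetical example: output a one-off query result to stdout as CSV.
COPY (SELECT * FROM some_view) TO STDOUT WITH (FORMAT csv);
```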
## Copy to Amazon S3

{{< public-preview />}}

Copying results to Amazon S3 (or S3-compatible services) is useful to perform tasks like periodic backups for auditing, or downstream processing in analytical data warehouses like Snowflake, Databricks, or BigQuery. For step-by-step instructions, see the integration guide for Amazon S3.
The `COPY TO` command is one-shot: every time you want to export results, you must run the command. To automate exporting results on a regular basis, you can set up scheduling, for example using a simple `cron`-like service or an orchestration platform like Airflow or Dagster.
{{< diagram "copy-to-s3.svg" >}}
Field | Use |
------|-----|
_query_ | The `SELECT` query to copy results out for. |
_object_name_ | The name of the object to copy results out for. |
**AWS CONNECTION** _connection_name_ | The name of the AWS connection to use in the `COPY TO` command. For details on creating connections, check the `CREATE CONNECTION` documentation page. |
_s3_uri_ | The unique resource identifier (URI) of the Amazon S3 bucket (and prefix) to store the output results in. |
**FORMAT** | The file format to write. |
_field_ | The name of the option you want to set. |
_val_ | The value for the option. |
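For reference, the AWS connection named in `AWS CONNECTION` is created separately with `CREATE CONNECTION`. The following is a sketch only, assuming a hypothetical IAM role `MZS3Exporter` that Materialize is allowed to assume; see the `CREATE CONNECTION` documentation for the full syntax and validation steps:

```sql
-- Sketch: an AWS connection based on role assumption.
-- The role ARN below is a placeholder; replace it with a role that has
-- write access to the target S3 bucket.
CREATE CONNECTION aws_role_assumption TO AWS (
    ASSUME ROLE ARN = 'arn:aws:iam::000000000000:role/MZS3Exporter'
);
```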
### `WITH` options {#copy-to-s3-with-options}

Name | Values | Default value | Description |
-----|--------|---------------|-------------|
`MAX FILE SIZE` | `integer` | | Sets the approximate maximum file size (in bytes) of each file uploaded to the S3 bucket. |
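As an illustration, the sketch below (reusing the hypothetical `aws_role_assumption` connection and a view named `some_view`) caps each uploaded file at roughly 256 MiB; the value is given in bytes, per the option description above:

```sql
-- Sketch: limit each output file to approximately 256 MiB (268435456 bytes).
COPY some_view TO 's3://mz-to-snow/csv/'
WITH (
    AWS CONNECTION = aws_role_assumption,
    FORMAT = 'csv',
    MAX FILE SIZE = 268435456
);
```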
### CSV

**Syntax:** `FORMAT = 'csv'`

By default, Materialize writes CSV files using the following writer settings:

Setting | Value |
--------|-------|
delimiter | `,` |
quote | `"` |
escape | `"` |
header | `false` |
### Parquet

**Syntax:** `FORMAT = 'parquet'`

Materialize writes Parquet files that aim for maximum compatibility with downstream systems. By default, the following Parquet writer settings are used:

Setting | Value |
--------|-------|
Writer version | 1.0 |
Compression | snappy |
Default column encoding | Dictionary |
Fallback column encoding | Plain |
Dictionary page encoding | Plain |
Dictionary data page encoding | RLE_DICTIONARY |

If you run into a snag trying to ingest Parquet files produced by Materialize into your downstream systems, please contact our team or open a bug report!
#### Data types

Materialize converts the values in the result set to Apache Arrow, and then serializes this Arrow representation to Parquet. The Arrow schema is embedded in the Parquet file metadata and allows reconstructing the Arrow representation using a compatible reader.

Materialize also includes Parquet `LogicalType` annotations where possible. However, many newer `LogicalType` annotations are not supported in the 1.0 writer version.

Materialize also embeds its own type information into the Apache Arrow schema. The field metadata in the schema contains an `ARROW:extension:name` annotation to indicate the Materialize native type the field originated from.
Materialize type | Arrow extension name | Arrow type | Parquet primitive type | Parquet logical type |
-----------------|----------------------|------------|------------------------|----------------------|
`bigint` | `materialize.v1.bigint` | `int64` | `INT64` | |
`boolean` | `materialize.v1.boolean` | `bool` | `BOOLEAN` | |
`bytea` | `materialize.v1.bytea` | `large_binary` | `BYTE_ARRAY` | |
`date` | `materialize.v1.date` | `date32` | `INT32` | `DATE` |
`double precision` | `materialize.v1.double` | `float64` | `DOUBLE` | |
`integer` | `materialize.v1.integer` | `int32` | `INT32` | |
`jsonb` | `materialize.v1.jsonb` | `large_utf8` | `BYTE_ARRAY` | |
`map` | `materialize.v1.map` | `map` (struct with fields `keys` and `values`) | Nested | `MAP` |
`list` | `materialize.v1.list` | `list` | Nested | |
`numeric` | `materialize.v1.numeric` | `decimal128[38, 10 or max-scale]` | `FIXED_LEN_BYTE_ARRAY` | `DECIMAL` |
`real` | `materialize.v1.real` | `float32` | `FLOAT` | |
`smallint` | `materialize.v1.smallint` | `int16` | `INT32` | `INT(16, true)` |
`text` | `materialize.v1.text` | `utf8` or `large_utf8` | `BYTE_ARRAY` | `STRING` |
`time` | `materialize.v1.time` | `time64[nanosecond]` | `INT64` | `TIME[isAdjustedToUTC = false, unit = NANOS]` |
`uint2` | `materialize.v1.uint2` | `uint16` | `INT32` | `INT(16, false)` |
`uint4` | `materialize.v1.uint4` | `uint32` | `INT32` | `INT(32, false)` |
`uint8` | `materialize.v1.uint8` | `uint64` | `INT64` | `INT(64, false)` |
`timestamp` | `materialize.v1.timestamp` | `time64[microsecond]` | `INT64` | `TIMESTAMP[isAdjustedToUTC = false, unit = MICROS]` |
`timestamp with time zone` | `materialize.v1.timestampz` | `time64[microsecond]` | `INT64` | `TIMESTAMP[isAdjustedToUTC = true, unit = MICROS]` |
Arrays (`[]`) | `materialize.v1.array` | struct with `list` field `items` and `uint8` field `dimensions` | Nested | |
`uuid` | `materialize.v1.uuid` | `fixed_size_binary(16)` | `FIXED_LEN_BYTE_ARRAY` | |
`oid` | Unsupported | | | |
`interval` | Unsupported | | | |
`record` | Unsupported | | | |
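Because `oid`, `interval`, and `record` values cannot be written to Parquet, one workaround is to cast unsupported columns to a supported type such as `text` in the query itself. The following is a sketch, assuming a hypothetical view `events` with an `interval` column `duration`:

```sql
-- Hypothetical workaround: cast an unsupported interval column to text
-- before exporting, so the Parquet writer only sees supported types.
COPY (SELECT id, duration::text AS duration FROM events)
TO 's3://mz-to-snow/parquet/'
WITH (
    AWS CONNECTION = aws_role_assumption,
    FORMAT = 'parquet'
);
```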
### Examples

{{< tabs >}}
{{< tab "Parquet">}}

```sql
COPY some_view TO 's3://mz-to-snow/parquet/'
WITH (
    AWS CONNECTION = aws_role_assumption,
    FORMAT = 'parquet'
);
```

{{< /tab >}}
{{< tab "CSV">}}

```sql
COPY some_view TO 's3://mz-to-snow/csv/'
WITH (
    AWS CONNECTION = aws_role_assumption,
    FORMAT = 'csv'
);
```

{{< /tab >}}
{{< /tabs >}}
## Privileges

The privileges required to execute this statement are:

{{< include-md file="shared-content/sql-command-privileges/copy-to.md" >}}
## Related pages

- `CREATE CONNECTION`