S3 Compatible Object Storage
This guide walks you through the steps required to export results from Materialize to an S3-compatible object storage service, such as Google Cloud Storage or Cloudflare R2.
Before you begin:
- Make sure that you have set up your bucket.
- Obtain the S3-compatible URI for your bucket, as well as S3 access credentials (ACCESS_KEY_ID and SECRET_ACCESS_KEY). Instructions to obtain these vary by provider.
Step 1. Create a connection
- In the SQL Shell, or your preferred SQL client connected to Materialize, create an AWS connection, replacing <ACCESS_KEY_ID> and <SECRET_ACCESS_KEY> with the credentials for your bucket. The AWS connection can be used to connect to any S3-compatible object storage service; see the endpoint sketch at the end of this step for provider-specific settings.

  CREATE SECRET secret_access_key AS '<SECRET_ACCESS_KEY>';

  CREATE CONNECTION bucket_connection TO AWS (
      ACCESS KEY ID = '<ACCESS_KEY_ID>',
      SECRET ACCESS KEY = SECRET secret_access_key
  );
- Validate the connection you created using the VALIDATE CONNECTION command.

  VALIDATE CONNECTION bucket_connection;

  If no validation error is returned, you’re ready to use this connection to run a bulk export from Materialize to your target bucket.
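Depending on your provider, the connection may also need to point at the provider's S3-compatible endpoint. The following is a minimal sketch, assuming Materialize's ENDPOINT and REGION connection options and using Cloudflare R2-style placeholder values; the <ACCOUNT_ID> placeholder and the 'auto' region are illustrative assumptions, so check your provider's documentation for the exact values to use.

-- Minimal sketch: the endpoint and region values below are provider-specific
-- assumptions (Cloudflare R2 shown as an example), not a general recipe.
CREATE CONNECTION bucket_connection TO AWS (
    ENDPOINT = 'https://<ACCOUNT_ID>.r2.cloudflarestorage.com',
    REGION = 'auto',
    ACCESS KEY ID = '<ACCESS_KEY_ID>',
    SECRET ACCESS KEY = SECRET secret_access_key
);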
Step 2. Run a bulk export
To export data to your target bucket, use the COPY TO command and the AWS connection you created in the previous step. Replace '<S3_BUCKET_URI>' with the S3-compatible URI for your target bucket.
COPY some_object TO '<S3_BUCKET_URI>'
WITH (
    AWS CONNECTION = bucket_connection,
    FORMAT = 'parquet'
);
For details on the Parquet writer settings Materialize uses, as well as data type support and conversion, check the reference documentation.

To export the data in CSV format instead, set FORMAT = 'csv':

COPY some_object TO '<S3_BUCKET_URI>'
WITH (
    AWS CONNECTION = bucket_connection,
    FORMAT = 'csv'
);
Step 3. (Optional) Add scheduling
Bulk exports to object storage using the COPY TO command are one-shot: every time you want to export results, you must run the command. To automate running bulk exports on a regular basis, you can set up scheduling, for example using a simple cron-like service or an orchestration platform like Airflow or Dagster.
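As a rough illustration of what such scheduling could look like, the Python sketch below runs the COPY TO statement from the previous step so it can be invoked from cron or wrapped in an Airflow or Dagster task. It assumes the psycopg2 driver and two hypothetical environment variables, MATERIALIZE_DSN and S3_BUCKET_URI; adapt the relation name, connection, and format to your setup.

# Minimal sketch of a scheduled bulk export, assuming the psycopg2 driver.
# MATERIALIZE_DSN and S3_BUCKET_URI are hypothetical environment variables,
# set for example by your cron entry or orchestrator.
import os

import psycopg2


def run_bulk_export() -> None:
    """Run a one-shot COPY TO export against Materialize."""
    bucket_uri = os.environ["S3_BUCKET_URI"]
    copy_to = f"""
        COPY some_object TO '{bucket_uri}'
        WITH (
            AWS CONNECTION = bucket_connection,
            FORMAT = 'parquet'
        );
    """
    conn = psycopg2.connect(os.environ["MATERIALIZE_DSN"])
    try:
        # COPY TO runs server-side as a single statement; autocommit avoids
        # wrapping it in an explicit transaction.
        conn.autocommit = True
        with conn.cursor() as cur:
            cur.execute(copy_to)
    finally:
        conn.close()


if __name__ == "__main__":
    run_bulk_export()

A crontab entry that runs this script on your chosen schedule, or an orchestrator task that calls run_bulk_export, then gives you recurring exports without any changes on the Materialize side.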