# SQL commands

SQL commands reference.

## Create/Alter/Drop Objects

| CREATE | ALTER | DROP |
| --- | --- | --- |
| [`CREATE CLUSTER`](/sql/create-cluster) | [`ALTER CLUSTER`](/sql/alter-cluster) | [`DROP CLUSTER`](/sql/drop-cluster) |
| [`CREATE CLUSTER REPLICA`](/sql/create-cluster-replica) | [`ALTER CLUSTER REPLICA`](/sql/alter-cluster-replica) | [`DROP CLUSTER REPLICA`](/sql/drop-cluster-replica) |
| [`CREATE CONNECTION`](/sql/create-connection) | [`ALTER CONNECTION`](/sql/alter-connection) | [`DROP CONNECTION`](/sql/drop-connection) |
| [`CREATE DATABASE`](/sql/create-database) | [`ALTER DATABASE`](/sql/alter-database) | [`DROP DATABASE`](/sql/drop-database) |
| [`CREATE INDEX`](/sql/create-index) | [`ALTER INDEX`](/sql/alter-index) | [`DROP INDEX`](/sql/drop-index) |
| [`CREATE MATERIALIZED VIEW`](/sql/create-materialized-view) | [`ALTER MATERIALIZED VIEW`](/sql/alter-materialized-view) | [`DROP MATERIALIZED VIEW`](/sql/drop-materialized-view) |
| [`CREATE NETWORK POLICY`](/sql/create-network-policy) | [`ALTER NETWORK POLICY`](/sql/alter-network-policy) | [`DROP NETWORK POLICY`](/sql/drop-network-policy) |
| [`CREATE ROLE`](/sql/create-role) | [`ALTER ROLE`](/sql/alter-role) | [`DROP ROLE`](/sql/drop-role), [`DROP USER`](/sql/drop-user) |
| [`CREATE SCHEMA`](/sql/create-schema) | [`ALTER SCHEMA`](/sql/alter-schema) | [`DROP SCHEMA`](/sql/drop-schema) |
| [`CREATE SECRET`](/sql/create-secret) | [`ALTER SECRET`](/sql/alter-secret) | [`DROP SECRET`](/sql/drop-secret) |
| [`CREATE SINK`](/sql/create-sink) | [`ALTER SINK`](/sql/alter-sink) | [`DROP SINK`](/sql/drop-sink) |
| [`CREATE SOURCE`](/sql/create-source) | [`ALTER SOURCE`](/sql/alter-source) | [`DROP SOURCE`](/sql/drop-source) |
| [`CREATE TABLE`](/sql/create-table) | [`ALTER TABLE`](/sql/alter-table) | [`DROP TABLE`](/sql/drop-table) |
| [`CREATE TYPE`](/sql/create-type) | [`ALTER TYPE`](/sql/alter-type) | [`DROP TYPE`](/sql/drop-type) |
| [`CREATE VIEW`](/sql/create-view) | [`ALTER VIEW`](/sql/alter-view) | [`DROP VIEW`](/sql/drop-view) |

## Create/Read/Update/Delete Data

The following commands perform CRUD operations on materialized views, views, sources, and tables:

| Category | Commands |
| --- | --- |
| Select/Subscribe | [`SELECT`](/sql/select), [`SUBSCRIBE`](/sql/subscribe) |
| Cursor | [`CLOSE`](/sql/close), [`DECLARE`](/sql/declare), [`FETCH`](/sql/fetch) |
| Sink | [`ALTER SINK`](/sql/alter-sink), [`CREATE SINK`](/sql/create-sink), [`DROP SINK`](/sql/drop-sink) |
| Transactions | [`BEGIN`](/sql/begin), [`COMMIT`](/sql/commit), [`ROLLBACK`](/sql/rollback) |
| Copy | [`COPY FROM`](/sql/copy-from), [`COPY TO`](/sql/copy-to) |

## RBAC

Commands to manage roles, privileges, and owners:

| Category | Commands |
| --- | --- |
| Roles | [`ALTER ROLE`](/sql/alter-role), [`CREATE ROLE`](/sql/create-role), [`DROP ROLE`](/sql/drop-role), [`DROP USER`](/sql/drop-user), [`GRANT ROLE`](/sql/grant-role), [`REVOKE ROLE`](/sql/revoke-role), [`SHOW ROLES`](/sql/show-roles) |
| Privileges | [`ALTER DEFAULT PRIVILEGES`](/sql/alter-default-privileges), [`GRANT PRIVILEGE`](/sql/grant-privilege), [`REVOKE PRIVILEGE`](/sql/revoke-privilege) |
| Owners | [`ALTER CLUSTER`](/sql/alter-cluster), [`ALTER CLUSTER REPLICA`](/sql/alter-cluster-replica), [`ALTER CONNECTION`](/sql/alter-connection), [`ALTER DATABASE`](/sql/alter-database), [`ALTER MATERIALIZED VIEW`](/sql/alter-materialized-view), [`ALTER SCHEMA`](/sql/alter-schema), [`ALTER SECRET`](/sql/alter-secret), [`ALTER SINK`](/sql/alter-sink), [`ALTER SOURCE`](/sql/alter-source), [`ALTER TABLE`](/sql/alter-table), [`ALTER TYPE`](/sql/alter-type), [`ALTER VIEW`](/sql/alter-view), [`DROP OWNED`](/sql/drop-owned), [`REASSIGN OWNED`](/sql/reassign-owned) |

## Query Introspection (`Explain`)

- [`EXPLAIN ANALYZE`](/sql/explain-analyze)
- [`EXPLAIN FILTER PUSHDOWN`](/sql/explain-filter-pushdown)
- [`EXPLAIN PLAN`](/sql/explain-plan)
- [`EXPLAIN SCHEMA`](/sql/explain-schema)
- [`EXPLAIN TIMESTAMP`](/sql/explain-timestamp)

## Object Introspection (`SHOW`) { #show }

- [`SHOW`](/sql/show)
- [`SHOW CLUSTER REPLICAS`](/sql/show-cluster-replicas)
- [`SHOW CLUSTERS`](/sql/show-clusters)
- [`SHOW COLUMNS`](/sql/show-columns)
- [`SHOW CONNECTIONS`](/sql/show-connections)
- [`SHOW CREATE CLUSTER`](/sql/show-create-cluster)
- [`SHOW CREATE CONNECTION`](/sql/show-create-connection)
- [`SHOW CREATE INDEX`](/sql/show-create-index)
- [`SHOW CREATE MATERIALIZED VIEW`](/sql/show-create-materialized-view)
- [`SHOW CREATE SINK`](/sql/show-create-sink)
- [`SHOW CREATE SOURCE`](/sql/show-create-source)
- [`SHOW CREATE TABLE`](/sql/show-create-table)
- [`SHOW CREATE TYPE`](/sql/show-create-type)
- [`SHOW CREATE VIEW`](/sql/show-create-view)
- [`SHOW DATABASES`](/sql/show-databases)
- [`SHOW DEFAULT PRIVILEGES`](/sql/show-default-privileges)
- [`SHOW INDEXES`](/sql/show-indexes)
- [`SHOW MATERIALIZED VIEWS`](/sql/show-materialized-views)
- [`SHOW NETWORK POLICIES (Cloud)`](/sql/show-network-policies)
- [`SHOW OBJECTS`](/sql/show-objects)
- [`SHOW PRIVILEGES`](/sql/show-privileges)
- [`SHOW ROLE MEMBERSHIP`](/sql/show-role-membership)
- [`SHOW ROLES`](/sql/show-roles)
- [`SHOW SCHEMAS`](/sql/show-schemas)
- [`SHOW SECRETS`](/sql/show-secrets)
- [`SHOW SINKS`](/sql/show-sinks)
- [`SHOW SOURCES`](/sql/show-sources)
- [`SHOW SUBSOURCES`](/sql/show-subsources)
- [`SHOW TABLES`](/sql/show-tables)
- [`SHOW TYPES`](/sql/show-types)
- [`SHOW VIEWS`](/sql/show-views)

## Session

Commands related to session state and configuration:

- [`DISCARD`](/sql/discard)
- [`RESET`](/sql/reset)
- [`SET`](/sql/set)
- [`SHOW`](/sql/show)

## Validations

- [`VALIDATE CONNECTION`](/sql/validate-connection)

## Prepared Statements

- [`DEALLOCATE`](/sql/deallocate)
- [`EXECUTE`](/sql/execute)
- [`PREPARE`](/sql/prepare)

---

## Namespaces

Namespaces are a way to organize Materialize objects logically. In organizations with many objects, namespaces help avoid naming conflicts and make objects easier to manage.

## Namespace hierarchy

Materialize follows the SQL standard's namespace hierarchy for most objects (for the exceptions, see [Other objects](#other-objects)).

| | |
|---------------------------| ------------|
| 1st/Highest level: | **Database** |
| 2nd level: | **Schema** |
| 3rd level: | **Table**, **View**, **Materialized view**, **Connection**, **Source**, **Sink**, **Index**, **Type**, **Function**, **Secret** |
| 4th/Lowest level: | **Column** |

Each layer in the hierarchy can contain elements from the level immediately beneath it. That is:

- Databases can contain: schemas;
- Schemas can contain: tables, views, materialized views, connections, sources, sinks, indexes, types, functions, and secrets;
- Tables, views, and materialized views can contain: columns.

### Qualifying names

Namespaces enable disambiguation and access to objects across different databases and schemas. Namespaces use dot notation (for example, `<database>.<schema>.<object>`) and allow you to refer to objects by:

- **Fully qualified names**

  Used to reference objects in a different database (Materialize allows cross-database queries); e.g.,

  ```
  <database>.<schema>.<object>
  <database>.<schema>.<object>.<column>
  ```

  > **Tip:** You can use fully qualified names to reference objects within the same
  > database (or within the same database and schema). However, for brevity and
  > readability, you may prefer to use qualified names instead.

- **Qualified names**

  - To reference objects within the same database but a different schema, use the schema and object name; e.g.,

    ```
    <schema>.<object>
    <schema>.<object>.<column>
    ```

  - To reference objects within the same database and schema, use the object name; e.g.,

    ```
    <object>
    <object>.<column>
    ```

## Namespace constraints

All namespaces must adhere to [identifier rules](/sql/identifiers).

## Other objects

The following Materialize objects exist outside the standard SQL namespace hierarchy:

- **Clusters**: Referenced directly by their name. For example, to create a materialized view in the cluster `cluster1`:

  ```mzsql
  CREATE MATERIALIZED VIEW mv IN CLUSTER cluster1 AS ...;
  ```

- **Cluster replicas**: Referenced as `<cluster>.<replica>`. For example, to delete replica `r1` in cluster `cluster1`:

  ```mzsql
  DROP CLUSTER REPLICA cluster1.r1;
  ```

- **Roles**: Referenced by their name. For example, to alter the `manager` role, your SQL statement would be:

  ```mzsql
  ALTER ROLE manager ...
  ```

### Other object namespace constraints

- Two clusters or two roles cannot have the same name. However, a cluster and a role can have the same name.
- Replicas can have the same names as long as they belong to different clusters. Materialize automatically assigns names to replicas (e.g., `r1`, `r2`).

## Database details

- By default, Materialize regions have a database named `materialize`.
- By default, each database has a schema called `public`.
- You can specify which database you connect to either when you connect (e.g. `psql -d my_db ...`) or within SQL using [`SET DATABASE`](/sql/set/) (e.g. `SET DATABASE = my_db`).
- Materialize allows cross-database queries.

---

## ALTER CLUSTER

Use `ALTER CLUSTER` to:

- Change the configuration of a cluster, such as the `SIZE` or `REPLICATION FACTOR`.
- Rename a cluster.
- Change the owner of a cluster.

For completeness, the syntax for the `SWAP WITH` operation is provided. However, in general, you will not need to perform this operation manually.
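
The following sketch shows the reset, rename, and change-owner variations in sequence. It assumes a hypothetical cluster `c1` that you own and a hypothetical role `ops_role` that you are a member of. `c1_prod` is an illustrative new name.

```mzsql
-- Hypothetical names: cluster c1 (owned by you), role ops_role, new name c1_prod.

-- Restore the replication factor of c1 to its default (1).
ALTER CLUSTER c1 RESET (REPLICATION FACTOR);

-- Rename c1 to c1_prod. System clusters cannot be renamed.
ALTER CLUSTER c1 RENAME TO c1_prod;

-- Transfer ownership. This requires owning the cluster and being a member of ops_role.
ALTER CLUSTER c1_prod OWNER TO ops_role;
```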
## Syntax

`ALTER CLUSTER` has the following syntax variations:

### Set a configuration

To set a cluster configuration:

```mzsql
ALTER CLUSTER <cluster_name>
SET (
    SIZE = <text>
    [, REPLICATION FACTOR = <int>]
    [, MANAGED = <bool>]
    [, SCHEDULE = MANUAL|ON REFRESH(...)]
)
[WITH ( <with_option> [,...])]
;
```

| Syntax element | Description |
| --- | --- |
| `<cluster_name>` | The name of the cluster you want to alter. |
| `SIZE` | The size of the resource allocations for the cluster. {{< yaml-list column="Cluster size" data="m1_cluster_sizing" numColumns="3" >}} See [Available sizes](#available-sizes) for details, including the legacy sizes available. {{< warning >}} Changing the size of a cluster may incur downtime. For more information, see [Resizing considerations](#resizing). {{< /warning >}} Not available for `ALTER CLUSTER ... RESET` since there is no default `SIZE` value. |
| `REPLICATION FACTOR` | Optional. The number of replicas to provision for the cluster. Each replica of the cluster provisions a new pool of compute resources to perform exactly the same computations on exactly the same data. For more information, see [Replication factor considerations](#replication-factor). Default: `1` |
| `MANAGED` | Optional. Whether to automatically manage the cluster's replicas based on the configured size and replication factor. If `FALSE`, enables the use of the deprecated [`CREATE CLUSTER REPLICA`](/sql/create-cluster-replica) command. Default: `TRUE` |
| `SCHEDULE` | Optional. The [scheduling type](/sql/create-cluster/#scheduling) for the cluster. Valid values are `MANUAL` and `ON REFRESH`. Default: `MANUAL` |
| `WITH (<with_option> [,...])` | The following `<with_option>`s are supported: \| Option \| Description \| \|--------\|-------------\| \| `WAIT UNTIL READY(...)` \| ***Private preview.** This option has known performance or stability issues and is under active development.* {{< include-from-yaml data="examples/alter_cluster" name="wait-until-ready-cmd-option" >}} \| \| `WAIT FOR` \| ***Private preview.** This option has known performance or stability issues and is under active development.* A fixed duration to wait for the new replicas to be ready. This option can lead to downtime. As such, we recommend using the `WAIT UNTIL READY` option instead.\| |

### Reset to default

To reset a cluster configuration back to its default value:

```mzsql
ALTER CLUSTER <cluster_name> RESET ( REPLICATION FACTOR | MANAGED | SCHEDULE, ... );
```

| Syntax element | Description |
| --- | --- |
| `<cluster_name>` | The name of the cluster you want to alter. |
| `REPLICATION FACTOR` | Optional. The number of replicas to provision for the cluster. Default: `1` |
| `MANAGED` | Optional. Whether to automatically manage the cluster's replicas based on the configured size and replication factor. Default: `TRUE` |
| `SCHEDULE` | Optional. The [scheduling type](/sql/create-cluster/#scheduling) for the cluster. Default: `MANUAL` |

### Rename

To rename a cluster:

```mzsql
ALTER CLUSTER <cluster_name> RENAME TO <new_cluster_name>;
```

| Syntax element | Description |
| --- | --- |
| `<cluster_name>` | The current name of the cluster. |
| `<new_cluster_name>` | The new name of the cluster. |

> **Note:** You cannot rename system clusters, such as `mz_system` and `mz_catalog_server`.

### Change owner

To change the owner of a cluster:

```mzsql
ALTER CLUSTER <cluster_name> OWNER TO <new_owner_role>;
```

| Syntax element | Description |
| --- | --- |
| `<cluster_name>` | The name of the cluster you want to change ownership of. |
| `<new_owner_role>` | The new owner of the cluster. |

To change the owner, you must have ownership of the cluster and membership in the `<new_owner_role>`.
See also [Required privileges](#required-privileges).

### Swap with

> **Important:** Information about the `SWAP WITH` operation is provided for completeness. The
> `SWAP WITH` operation is used for blue/green deployments. In general, you will
> not need to perform this operation manually.

To swap the name of this cluster with another cluster:

```mzsql
ALTER CLUSTER <cluster_name> SWAP WITH <other_cluster_name>;
```

| Syntax element | Description |
| --- | --- |
| `<cluster_name>` | The name of the first cluster. |
| `<other_cluster_name>` | The name of the second cluster. |

## Considerations

### Resizing

> **Tip:** For help sizing your clusters, navigate to **Materialize Console >**
> [**Monitoring**](/console/monitoring/) **> Environment Overview**. This page
> displays cluster resource utilization and sizing advice.

#### Available sizes

**M.1 Clusters:**

> **Note:** The values set forth in the table are solely for illustrative purposes.
> Materialize reserves the right to change the capacity at any time. As such, you
> acknowledge and agree that those values in this table may change at any time,
> and you should not rely on these values for any capacity planning.

| Cluster size | Compute Credits/Hour | Total Capacity | Notes |
| --- | --- | --- | --- |
| M.1-nano | 0.75 | 26 GiB | |
| M.1-micro | 1.5 | 53 GiB | |
| M.1-xsmall | 3 | 106 GiB | |
| M.1-small | 6 | 212 GiB | |
| M.1-medium | 9 | 318 GiB | |
| M.1-large | 12 | 424 GiB | |
| M.1-1.5xlarge | 18 | 636 GiB | |
| M.1-2xlarge | 24 | 849 GiB | |
| M.1-3xlarge | 36 | 1273 GiB | |
| M.1-4xlarge | 48 | 1645 GiB | |
| M.1-8xlarge | 96 | 3290 GiB | |
| M.1-16xlarge | 192 | 6580 GiB | Available upon request |
| M.1-32xlarge | 384 | 13160 GiB | Available upon request |
| M.1-64xlarge | 768 | 26320 GiB | Available upon request |
| M.1-128xlarge | 1536 | 52640 GiB | Available upon request |

**Legacy cc Clusters:**

> **Tip:** In most cases, you **should not** use legacy sizes. [M.1 sizes](#available-sizes)
> offer better performance per credit for nearly all workloads. We recommend using
> M.1 sizes for all new clusters, and recommend migrating existing
> legacy-sized clusters to M.1 sizes. Materialize is committed to supporting
> customers during the transition period as we move to deprecate legacy sizes.
>
> The legacy size information is provided for completeness.

Valid legacy cc cluster sizes are:

* `25cc`
* `50cc`
* `100cc`
* `200cc`
* `300cc`
* `400cc`
* `600cc`
* `800cc`
* `1200cc`
* `1600cc`
* `3200cc`
* `6400cc`
* `128C`
* `256C`
* `512C`

For clusters using legacy cc sizes, resource allocations are proportional to the number in the size name. For example, a cluster of size `600cc` has 2x as much CPU, memory, and disk as a cluster of size `300cc`, and 1.5x as much CPU, memory, and disk as a cluster of size `400cc`. Clusters of larger sizes can process data faster and handle larger data volumes.

See also:

- [M.1 to cc size mapping](/sql/m1-cc-mapping/).
- [Materialize service consumption table](https://materialize.com/pdfs/pricing.pdf).
- [Blog: Scaling Beyond Memory: How Materialize Uses Swap for Larger Workloads](https://materialize.com/blog/scaling-beyond-memory/).

#### Resource allocation

To determine the specific resource allocation for a given cluster size, query the [`mz_cluster_replica_sizes`](/reference/system-catalog/mz_catalog/#mz_cluster_replica_sizes) system catalog table.

> **Warning:** The values in the `mz_cluster_replica_sizes` table may change at any
> time. You should not rely on them for any kind of capacity planning.

#### Downtime

A resizing operation can incur downtime unless you use the `WAIT UNTIL READY` option. See [zero-downtime cluster resizing](#zero-downtime-cluster-resizing) for details.

#### Zero-downtime cluster resizing

You can use the `WAIT UNTIL READY` option to resize a cluster with **no downtime**.
Instead of restarting the cluster, this approach spins up an additional cluster replica under the covers with the desired new size, waits for the replica to be hydrated, and then replaces the original replica.

```mzsql
ALTER CLUSTER c1
SET (SIZE 'M.1-xsmall') WITH (WAIT UNTIL READY (TIMEOUT = '10m', ON TIMEOUT = 'COMMIT'));
```

The `ALTER` statement is blocking and returns only when the new replica becomes ready, which can take as long as the specified timeout. During this operation, any other reconfiguration command issued against this cluster will fail. Additionally, any connection interruption or statement cancellation causes a rollback; in that case, no size change takes effect.

> **Note:** Using `WAIT UNTIL READY` requires that the session remain open: you need to
> make sure the Console tab remains open or that your `psql` connection remains
> stable.
> Any interruption causes a cancellation, and no cluster changes take effect.

### Replication factor

The `REPLICATION FACTOR` option determines the number of replicas provisioned for the cluster. Each replica of the cluster provisions a new pool of compute resources to perform exactly the same computations on exactly the same data.

Each replica incurs cost, so a cluster's cost is calculated as `cluster size * replication factor` per second. See [Usage & billing](/administration/billing/) for more details.

#### Replication factor and fault tolerance

Provisioning more than one replica provides **fault tolerance**. Clusters with multiple replicas can tolerate failures of the underlying hardware that cause a replica to become unreachable. As long as one replica of the cluster remains available, the cluster can continue to maintain dataflows and serve queries.

> **Note:**
> - Each replica incurs cost, so a cluster's cost is calculated as `cluster size *
>   replication factor` per second. See [Usage &
>   billing](/administration/billing/) for more details.
> - Increasing the replication factor does **not** increase the cluster's work
>   capacity. Replicas are exact copies of one another: each replica must do
>   exactly the same work (i.e., maintain the same dataflows and process the same
>   queries) as all the other replicas of the cluster.
>   To increase the capacity of a cluster, you must increase its
>   [size](#resizing).

Materialize automatically assigns names to replicas (e.g., `r1`, `r2`). You can view information about individual replicas in the Materialize console and the system catalog.

#### Availability guarantees

When provisioning replicas:

- For clusters sized **under `3200cc`**, Materialize guarantees that all provisioned replicas in a cluster are spread across the underlying cloud provider's availability zones.
- For clusters sized at **`3200cc` and above**, even distribution of replicas across availability zones **cannot** be guaranteed.

## Required privileges

To execute the `ALTER CLUSTER` command, you need:

- Ownership of the cluster.
- To change the owner of a cluster, you must also have membership in the `<new_owner_role>`.
- To swap names with another cluster, you must also have ownership of the other cluster.

See also:

- [Access control (Materialize Cloud)](/security/cloud/access-control/)
- [Access control (Materialize Self-Managed)](/security/self-managed/access-control/)

### Rename restrictions

You cannot rename system clusters, such as `mz_system` and `mz_catalog_server`.

## Examples

### Replication factor

The following example uses `ALTER CLUSTER` to update the `REPLICATION FACTOR` of cluster `c1` to `2`:

```mzsql
ALTER CLUSTER c1 SET (REPLICATION FACTOR 2);
```

Increasing the `REPLICATION FACTOR` increases the cluster's [fault tolerance](#replication-factor-and-fault-tolerance), not its work capacity.
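
The cost of a cluster is `cluster size * replication factor`, so this change doubles the credit consumption of `c1` compared to the default replication factor of `1`. For instance, using the illustrative M.1 table values, an `M.1-xsmall` cluster (3 credits/hour) with a replication factor of 2 consumes 3 * 2 = 6 credits/hour. To confirm which replicas now exist, you can query the system catalog. The following is a minimal sketch that assumes the `name`, `size`, and `cluster_id` columns of `mz_cluster_replicas` and the `id` and `name` columns of `mz_clusters`.

```mzsql
-- List the replicas of cluster c1 after changing its replication factor.
-- With REPLICATION FACTOR 2, expect two rows; Materialize names them automatically (e.g., r1, r2).
SELECT r.name AS replica, r.size
FROM mz_catalog.mz_cluster_replicas AS r
JOIN mz_catalog.mz_clusters AS c ON r.cluster_id = c.id
WHERE c.name = 'c1';
```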
### Resizing

You can alter the cluster size with **no downtime** (i.e., [zero-downtime cluster resizing](#zero-downtime-cluster-resizing)) by running the `ALTER CLUSTER` command with the `WAIT UNTIL READY` [option](#syntax):

```mzsql
ALTER CLUSTER c1
SET (SIZE 'M.1-xsmall') WITH (WAIT UNTIL READY (TIMEOUT = '10m', ON TIMEOUT = 'COMMIT'));
```

> **Note:** Using `WAIT UNTIL READY` requires that the session remain open: you need to
> make sure the Console tab remains open or that your `psql` connection remains
> stable.
> Any interruption causes a cancellation, and no cluster changes take effect.

Alternatively, you can alter the cluster size immediately, without waiting, by running the `ALTER CLUSTER` command:

```mzsql
ALTER CLUSTER c1 SET (SIZE 'M.1-xsmall');
```

This incurs downtime when the cluster contains objects that need re-hydration before they are ready. These include indexes, materialized views, and some types of sources.

### Schedule

For use cases that require [scheduled clusters](/sql/create-cluster/#scheduling), you can use the `ALTER CLUSTER` command to set or change the originally configured schedule and related options.

```mzsql
ALTER CLUSTER c1 SET (SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '1 hour'));
```

See the reference documentation for [`CREATE CLUSTER`](../create-cluster/#scheduling) or [`CREATE MATERIALIZED VIEW`](../create-materialized-view/#refresh-strategies) for more details on scheduled clusters.

### Converting unmanaged to managed clusters

> **Note:** When getting started with Materialize, we recommend using managed clusters. You
> can convert any unmanaged clusters to managed clusters by following the
> instructions below.

To change a cluster's status to managed:

```mzsql
ALTER CLUSTER c1 SET (MANAGED);
```

Materialize permits converting an unmanaged cluster to a managed cluster if the following conditions are met:

* The cluster replica names are `r1`, `r2`, ..., `rN`.
* All replicas have the same size.
* If there are no replicas, `SIZE` needs to be specified.
* If specified, the replication factor must match the number of replicas.

Note that the converted cluster will not have settings for availability zones or compute-specific settings. If needed, you can set these explicitly.

## See also

- [`CREATE CLUSTER`](/sql/create-cluster/)
- [`DROP CLUSTER`](/sql/drop-cluster/)
- [`SHOW CLUSTERS`](/sql/show-clusters)

---

## ALTER CLUSTER REPLICA

Use `ALTER CLUSTER REPLICA` to:

- Rename a cluster replica.
- Change the owner of a cluster replica.

## Syntax

### Rename

To rename a cluster replica:

```mzsql
ALTER CLUSTER REPLICA <cluster_name>.<replica_name> RENAME TO <new_replica_name>;
```

| Syntax element | Description |
| --- | --- |
| `<cluster_name>.<replica_name>` | The current name of the cluster replica. |
| `<new_replica_name>` | The new name of the cluster replica. |

> **Note:** You cannot rename replicas in system clusters.

### Change owner

To change the owner of a cluster replica:

```mzsql
ALTER CLUSTER REPLICA <cluster_name>.<replica_name> OWNER TO <new_owner_role>;
```

| Syntax element | Description |
| --- | --- |
| `<cluster_name>.<replica_name>` | The name of the cluster replica you want to change ownership of. |
| `<new_owner_role>` | The new owner of the cluster replica. |

To change the owner of a cluster replica, you must be the current owner and have membership in the `<new_owner_role>`.

## Privileges

The privileges required to execute this statement are:

- Ownership of the cluster replica.
- In addition, to change owners:
  - Role membership in `new_owner`.
  - `CREATE` privileges on the containing cluster.

## Example

The following changes the owner of the cluster replica `production.r1` to `admin`. The user running the command must:

- Be the current owner;
- Be a member of `admin`; and
- Have `CREATE` privilege on the `production` cluster.

```mzsql
ALTER CLUSTER REPLICA production.r1 OWNER TO admin;
```

---

## ALTER CONNECTION

Use `ALTER CONNECTION` to:

- Modify the parameters of a connection, such as the hostname to which it points.
- Rotate the key pairs associated with an [SSH tunnel connection].
- Rename a connection.
- Change the owner of a connection.

## Syntax

### SET/DROP/RESET options

To modify connection parameters:

```mzsql
ALTER CONNECTION [IF EXISTS] SET (

[AS ] [, ...] [WITH ()] ;
```

| Syntax element | Description |
| --- | --- |
| `<source_name>` | The name of the PostgreSQL/MySQL/SQL Server source you want to alter. |
| `<table>` | The upstream table to add to the source. |
| **AS** `<subsource_name>` | Optional. The name for the subsource in Materialize. |
| **WITH (TEXT COLUMNS (`<column_name>` [, ...]))** | Optional. List of columns to decode as `text` for types that are unsupported in Materialize. |

> **Note:** When you add a new subsource to an existing source ([`ALTER SOURCE ... ADD
> SUBSOURCE ...`](/sql/alter-source/)), Materialize starts the snapshotting
> process for the new subsource. During this snapshotting, data ingestion for
> the existing subsources of the same source is temporarily blocked. If possible,
> resize the cluster to speed up the snapshotting process and then, once the
> process finishes, resize the cluster for steady state.

### Rename

To rename a source:

```mzsql
ALTER SOURCE <source_name> RENAME TO <new_source_name>;
```

| Syntax element | Description |
| --- | --- |
| `<source_name>` | The current name of the source you want to alter. |
| `<new_source_name>` | The new name of the source. |

See also [Renaming restrictions](/sql/identifiers/#renaming-restrictions).

### Change owner

To change the owner of a source:

```mzsql
ALTER SOURCE <source_name> OWNER TO <new_owner_role>;
```

| Syntax element | Description |
| --- | --- |
| `<source_name>` | The name of the source you want to change ownership of. |
| `<new_owner_role>` | The new owner of the source. |

To change the owner of a source, you must be the owner of the source and have membership in the `<new_owner_role>`. See also [Privileges](#privileges).

### (Re)Set retain history config

To set the retention history for a source:

```mzsql
ALTER SOURCE [IF EXISTS] <source_name> SET (RETAIN HISTORY [=] FOR <retention_period>);
```

| Syntax element | Description |
| --- | --- |
| `<source_name>` | The name of the source you want to alter. |
| `<retention_period>` | ***Private preview.** This option has known performance or stability issues and is under active development.* Duration for which Materialize retains historical data, which is useful to implement [durable subscriptions](/transform-data/patterns/durable-subscriptions/#history-retention-period). Accepts positive [interval](/sql/types/interval/) values (e.g. `'1hr'`). Default: `1s`. |

To reset the retention history to the default for a source:

```mzsql
ALTER SOURCE [IF EXISTS] <source_name> RESET (RETAIN HISTORY);
```

| Syntax element | Description |
| --- | --- |
| `<source_name>` | The name of the source you want to alter. |

### (Re)Set timestamp interval

To set the timestamp interval for a source:

```mzsql
ALTER SOURCE [IF EXISTS] <source_name> SET (TIMESTAMP INTERVAL [=] <interval>);
```

| Syntax element | Description |
| --- | --- |
| `<source_name>` | The name of the source you want to alter. |
| `<interval>` | The interval at which timestamps are assigned to the data read from this source. Accepts positive [interval](/sql/types/interval/) values (e.g. `'500ms'`, `'1s'`). The value must be between the system parameters `min_timestamp_interval` and `max_timestamp_interval`. Default: `1s`. |

To reset the timestamp interval to the system default for a source:

```mzsql
ALTER SOURCE [IF EXISTS] <source_name> RESET (TIMESTAMP INTERVAL);
```

| Syntax element | Description |
| --- | --- |
| `<source_name>` | The name of the source you want to alter. |

## Context

### Adding subsources to a PostgreSQL/MySQL/SQL Server source

Using a combination of dropping and adding subsources lets you change the schema of the ingested PostgreSQL/MySQL/SQL Server tables.

> **Important:** When you add a new subsource to an existing source ([`ALTER SOURCE ... ADD
> SUBSOURCE ...`](/sql/alter-source/)), Materialize starts the snapshotting
> process for the new subsource. During this snapshotting, data ingestion for
> the existing subsources of the same source is temporarily blocked. If possible,
> resize the cluster to speed up the snapshotting process and then, once the
> process finishes, resize the cluster for steady state.
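
The drop-and-re-add pattern can be sketched as follows. This sketch assumes a hypothetical PostgreSQL source `pg_src` that ingests an upstream table `tbl_a` whose schema has changed. Both names are illustrative. Re-adding the table triggers a new snapshot, which temporarily blocks ingestion for the source's other subsources.

```mzsql
-- Hypothetical names: source pg_src, upstream table tbl_a.

-- 1. Drop the subsource that still reflects the old schema. This also discards
--    any state Materialize held for it. If other objects depend on it, you
--    must add CASCADE, which drops those dependent objects as well.
DROP SOURCE tbl_a;

-- 2. Re-add the upstream table so Materialize snapshots it with its new schema.
ALTER SOURCE pg_src ADD SUBSOURCE tbl_a;
```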
### Dropping subsources from a PostgreSQL/MySQL/SQL Server source

Dropping a subsource prevents Materialize from ingesting any data from it, in addition to dropping any state that Materialize previously had for the table (such as its contents).

If a subsource encounters a deterministic error, such as an incompatible schema change (e.g. dropping an ingested column), you can drop the subsource. If you want to ingest it with its new schema, you can then add it as a new subsource.

You cannot drop the "progress subsource".

## Examples

### Adding subsources

```mzsql
ALTER SOURCE pg_src ADD SUBSOURCE tbl_a, tbl_b AS b WITH (TEXT COLUMNS [tbl_a.col]);
```

> **Important:** When you add a new subsource to an existing source ([`ALTER SOURCE ... ADD
> SUBSOURCE ...`](/sql/alter-source/)), Materialize starts the snapshotting
> process for the new subsource. During this snapshotting, data ingestion for
> the existing subsources of the same source is temporarily blocked. If possible,
> resize the cluster to speed up the snapshotting process and then, once the
> process finishes, resize the cluster for steady state.

### Dropping subsources

To drop a subsource, use the [`DROP SOURCE`](/sql/drop-source/) command:

```mzsql
DROP SOURCE tbl_a, b CASCADE;
```

### Changing the timestamp interval

To set a custom timestamp interval for a source:

```mzsql
ALTER SOURCE kafka_src SET (TIMESTAMP INTERVAL = '500ms');
```

To reset the timestamp interval to the system default:

```mzsql
ALTER SOURCE kafka_src RESET (TIMESTAMP INTERVAL);
```

## Privileges

The privileges required to execute this statement are:

- Ownership of the source being altered.
- In addition, to change owners:
  - Role membership in `new_owner`.
  - `CREATE` privileges on the containing schema if the source is namespaced by a schema.
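
The following sketch combines the rename, change-owner, and retain-history variations. It assumes a hypothetical source `pg_src` and a hypothetical role `ingest_owner`. `pg_src_v2` is an illustrative new name. `RETAIN HISTORY` is in private preview.

```mzsql
-- Hypothetical names: source pg_src, new name pg_src_v2, role ingest_owner.

-- Rename the source. See Renaming restrictions for limits on renames.
ALTER SOURCE pg_src RENAME TO pg_src_v2;

-- Transfer ownership. This requires owning the source, membership in
-- ingest_owner, and CREATE privileges on the containing schema.
ALTER SOURCE pg_src_v2 OWNER TO ingest_owner;

-- Retain one hour of history (private preview), then restore the default.
ALTER SOURCE pg_src_v2 SET (RETAIN HISTORY FOR '1hr');
ALTER SOURCE pg_src_v2 RESET (RETAIN HISTORY);
```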
## See also

- [`CREATE SOURCE`](/sql/create-source/)
- [`DROP SOURCE`](/sql/drop-source/)
- [`SHOW SOURCES`](/sql/show-sources)

---

## ALTER SYSTEM RESET

Use `ALTER SYSTEM RESET` to globally restore the value of a configuration parameter to its default value. This command is an alternative spelling for [`ALTER SYSTEM SET...TO DEFAULT`](../alter-system-set).

To see the current value of a configuration parameter, use [`SHOW`](../show).

## Syntax

```mzsql
ALTER SYSTEM RESET <config>;
```

Syntax element | Description
---------------|------------
`<config>` | The configuration parameter's name.

### Key configuration parameters

Name | Default value | Description | Modifiable?
--------------------------------------------|---------------------------|-----------------------------------------------------------------------|--------------
`cluster` | `quickstart` | The current cluster. | Yes
`cluster_replica` | | The target cluster replica for `SELECT` queries. | Yes
`database` | `materialize` | The current database. | Yes
`search_path` | `public` | The schema search order for names that are not schema-qualified. | Yes
`transaction_isolation` | `strict serializable` | The transaction isolation level. For more information, see [Consistency guarantees](/overview/isolation-level/). Accepts values: `serializable`, `strict serializable`. | Yes

### Other configuration parameters

Name | Default value | Description | Modifiable?
--------------------------------------------|---------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------
`allowed_cluster_replica_sizes` | *Varies* | The allowed sizes when creating a new cluster replica. | [Contact support]
`application_name` | | The application name to be reported in statistics and logs. This parameter is typically set by an application upon connection to Materialize (e.g. `psql`). | Yes
`auto_route_catalog_queries` | `true` | Boolean flag indicating whether to force queries that depend only on system tables to run on the `mz_catalog_server` cluster for improved performance. | Yes
`client_encoding` | `UTF8` | The client's character set encoding. The only supported value is `UTF-8`. | Yes
`client_min_messages` | `notice` | The message levels that are sent to the client. Accepts values: `debug5`, `debug4`, `debug3`, `debug2`, `debug1`, `log`, `notice`, `warning`, `error`. Each level includes all the levels that follow it. | Yes
`datestyle` | `ISO, MDY` | The display format for date and time values. The only supported value is `ISO, MDY`. | Yes
`emit_introspection_query_notice` | `true` | Whether to print a notice when querying replica introspection relations. | Yes
`emit_timestamp_notice` | `false` | Boolean flag indicating whether to send a `notice` specifying query timestamps. | Yes
`emit_trace_id_notice` | `false` | Boolean flag indicating whether to send a `notice` specifying the trace ID, when available. | Yes
`enable_rbac_checks` | `true` | Boolean flag indicating whether to apply RBAC checks before executing statements. | Yes
`enable_session_rbac_checks` | `false` | Boolean flag indicating whether RBAC is enabled for the current session. | No
`extra_float_digits` | `3` | Adjusts the number of digits displayed for floating-point values. | Yes
`failpoints` | | Allows failpoints to be dynamically activated. | No
`idle_in_transaction_session_timeout` | `120s` | The maximum allowed duration that a session can sit idle in a transaction before being terminated. If this value is specified without units, it is taken as milliseconds (`ms`). A value of zero disables the timeout. | Yes
`integer_datetimes` | `true` | Boolean flag indicating whether the server uses 64-bit-integer dates and times. | No
`intervalstyle` | `postgres` | The display format for interval values. The only supported value is `postgres`. | Yes
`is_superuser` | | Reports whether the current session is a _superuser_ with admin privileges. | No
`max_aws_privatelink_connections` | `0` | The maximum number of AWS PrivateLink connections in the region, across all schemas. | [Contact support]
`max_clusters` | `10` | The maximum number of clusters in the region. | [Contact support]
`max_connections` | `5000` | The maximum number of concurrent connections in the region. | [Contact support]
`max_credit_consumption_rate` | `1024` | The maximum rate of credit consumption in a region. Credits are consumed based on the size of cluster replicas in use. | [Contact support]
`max_databases` | `1000` | The maximum number of databases in the region. | [Contact support]
`max_identifier_length` | `255` | The maximum length in bytes of object identifiers. | No
`max_kafka_connections` | `1000` | The maximum number of Kafka connections in the region, across all schemas. | [Contact support]
`max_mysql_connections` | `1000` | The maximum number of MySQL connections in the region, across all schemas. | [Contact support]
`max_objects_per_schema` | `1000` | The maximum number of objects in a schema. | [Contact support]
`max_postgres_connections` | `1000` | The maximum number of PostgreSQL connections in the region, across all schemas. | [Contact support]
`max_query_result_size` | `1073741824` | The maximum size in bytes for a single query's result. | Yes
`max_replicas_per_cluster` | `5` | The maximum number of replicas of a single cluster. | [Contact support]
`max_result_size` | `1 GiB` | The maximum size in bytes for a single query's result. | [Contact support]
`max_roles` | `1000` | The maximum number of roles in the region. | [Contact support]
`max_schemas_per_database` | `1000` | The maximum number of schemas in a database. | [Contact support]
`max_secrets` | `100` | The maximum number of secrets in the region, across all schemas. | [Contact support]
`max_sinks` | `1000` | The maximum number of sinks in the region, across all schemas. | [Contact support]
`max_sources` | `25` | The maximum number of sources in the region, across all schemas.
| [Contact support] `max_tables` | `200` | The maximum number of tables in the region, across all schemas | [Contact support] `mz_version` | Version-dependent | Shows the Materialize server version. | No `network_policy` | `default` | The default network policy for the region. | Yes `real_time_recency` | `false` | Boolean flag indicating whether [real-time recency](/get-started/isolation-level/#real-time-recency) is enabled for the current session. | [Contact support] `real_time_recency_timeout` | `10s` | Sets the maximum allowed duration of `SELECT` statements that actively use [real-time recency](/get-started/isolation-level/#real-time-recency). If this value is specified without units, it is taken as milliseconds (`ms`). | Yes `server_version_num` | Version-dependent | The PostgreSQL compatible server version as an integer. | No `server_version` | Version-dependent | The PostgreSQL compatible server version. | No `sql_safe_updates` | `false` | Boolean flag indicating whether to prohibit SQL statements that may be overly destructive. | Yes `standard_conforming_strings` | `true` | Boolean flag indicating whether ordinary string literals (`'...'`) should treat backslashes literally. The only supported value is `true`. | Yes `statement_timeout` | `10s` | The maximum allowed duration of the read portion of write operations; i.e., the `SELECT` portion of `INSERT INTO ... (SELECT ...)`; the `WHERE` portion of `UPDATE ... WHERE ...` and `DELETE FROM ... WHERE ...`. If this value is specified without units, it is taken as milliseconds (`ms`). | Yes `timezone` | `UTC` | The time zone for displaying and interpreting timestamps. The only supported value is `UTC`. 
| Yes [Contact support]: /support ## Privileges The privileges required to execute this statement are: - [_Superuser_ privileges](/security/cloud/users-service-accounts/#organization-roles) ## Related pages - [`SHOW`](../show) - [`ALTER SYSTEM SET`](../alter-system-set) --- ## ALTER SYSTEM SET Use `ALTER SYSTEM SET` to globally modify the value of a configuration parameter. To see the current value of a configuration parameter, use [`SHOW`](../show). ## Syntax ```mzsql ALTER SYSTEM SET [TO|=] ``` Syntax element | Description ---------------|------------ `` | The name of the configuration parameter to modify. `` | The value to assign to the configuration parameter. **DEFAULT** | Reset the configuration parameter's default value. Equivalent to [`ALTER SYSTEM RESET`](../alter-system-reset). ### Key configuration parameters Name | Default value | Description | Modifiable? --------------------------------------------|---------------------------|-----------------------------------------------------------------------|-------------- `cluster` | `quickstart` | The current cluster. | Yes `cluster_replica` | | The target cluster replica for `SELECT` queries. | Yes `database` | `materialize` | The current database. | Yes `search_path` | `public` | The schema search order for names that are not schema-qualified. | Yes `transaction_isolation` | `strict serializable` | The transaction isolation level. For more information, see [Consistency guarantees](/overview/isolation-level/).

Accepts values: `serializable`, `strict serializable`. | Yes ### Other configuration parameters Name | Default value | Description | Modifiable? --------------------------------------------|---------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------- `allowed_cluster_replica_sizes` | *Varies* | The allowed sizes when creating a new cluster replica. | [Contact support] `application_name` | | The application name to be reported in statistics and logs. This parameter is typically set by an application upon connection to Materialize (e.g. `psql`). | Yes `auto_route_catalog_queries` | `true` | Boolean flag indicating whether to force queries that depend only on system tables to run on the `mz_catalog_server` cluster for improved performance. | Yes `client_encoding` | `UTF8` | The client's character set encoding. The only supported value is `UTF-8`. | Yes `client_min_messages` | `notice` | The message levels that are sent to the client.

Accepts values: `debug5`, `debug4`, `debug3`, `debug2`, `debug1`, `log`, `notice`, `warning`, `error`. Each level includes all the levels that follow it. | Yes `datestyle` | `ISO, MDY` | The display format for date and time values. The only supported value is `ISO, MDY`. | Yes `emit_introspection_query_notice` | `true` | Whether to print a notice when querying replica introspection relations. | Yes `emit_timestamp_notice` | `false` | Boolean flag indicating whether to send a `notice` specifying query timestamps. | Yes `emit_trace_id_notice` | `false` | Boolean flag indicating whether to send a `notice` specifying the trace ID, when available. | Yes `enable_rbac_checks` | `true` | Boolean flag indicating whether to apply RBAC checks before executing statements. | Yes `enable_session_rbac_checks` | `false` | Boolean flag indicating whether RBAC is enabled for the current session. | No `extra_float_digits` | `3` | Boolean flag indicating whether to adjust the number of digits displayed for floating-point values. | Yes `failpoints` | | Allows failpoints to be dynamically activated. | No `idle_in_transaction_session_timeout` | `120s` | The maximum allowed duration that a session can sit idle in a transaction before being terminated. If this value is specified without units, it is taken as milliseconds (`ms`). A value of zero disables the timeout. | Yes `integer_datetimes` | `true` | Boolean flag indicating whether the server uses 64-bit-integer dates and times. | No `intervalstyle` | `postgres` | The display format for interval values. The only supported value is `postgres`. | Yes `is_superuser` | | Reports whether the current session is a _superuser_ with admin privileges. | No `max_aws_privatelink_connections` | `0` | The maximum number of AWS PrivateLink connections in the region, across all schemas. 
| [Contact support] `max_clusters` | `10` | The maximum number of clusters in the region | [Contact support] `max_connections` | `5000` | The maximum number of concurrent connections in the region | [Contact support] `max_credit_consumption_rate` | `1024` | The maximum rate of credit consumption in a region. Credits are consumed based on the size of cluster replicas in use. | [Contact support] `max_databases` | `1000` | The maximum number of databases in the region. | [Contact support] `max_identifier_length` | `255` | The maximum length in bytes of object identifiers. | No `max_kafka_connections` | `1000` | The maximum number of Kafka connections in the region, across all schemas. | [Contact support] `max_mysql_connections` | `1000` | The maximum number of MySQL connections in the region, across all schemas. | [Contact support] `max_objects_per_schema` | `1000` | The maximum number of objects in a schema. | [Contact support] `max_postgres_connections` | `1000` | The maximum number of PostgreSQL connections in the region, across all schemas. | [Contact support] `max_query_result_size` | `1073741824` | The maximum size in bytes for a single query's result. | Yes `max_replicas_per_cluster` | `5` | The maximum number of replicas of a single cluster | [Contact support] `max_result_size` | `1 GiB` | The maximum size in bytes for a single query's result. | [Contact support] `max_roles` | `1000` | The maximum number of roles in the region. | [Contact support] `max_schemas_per_database` | `1000` | The maximum number of schemas in a database. | [Contact support] `max_secrets` | `100` | The maximum number of secrets in the region, across all schemas. | [Contact support] `max_sinks` | `1000` | The maximum number of sinks in the region, across all schemas. | [Contact support] `max_sources` | `25` | The maximum number of sources in the region, across all schemas. 
| [Contact support] `max_tables` | `200` | The maximum number of tables in the region, across all schemas | [Contact support] `mz_version` | Version-dependent | Shows the Materialize server version. | No `network_policy` | `default` | The default network policy for the region. | Yes `real_time_recency` | `false` | Boolean flag indicating whether [real-time recency](/get-started/isolation-level/#real-time-recency) is enabled for the current session. | [Contact support] `real_time_recency_timeout` | `10s` | Sets the maximum allowed duration of `SELECT` statements that actively use [real-time recency](/get-started/isolation-level/#real-time-recency). If this value is specified without units, it is taken as milliseconds (`ms`). | Yes `server_version_num` | Version-dependent | The PostgreSQL compatible server version as an integer. | No `server_version` | Version-dependent | The PostgreSQL compatible server version. | No `sql_safe_updates` | `false` | Boolean flag indicating whether to prohibit SQL statements that may be overly destructive. | Yes `standard_conforming_strings` | `true` | Boolean flag indicating whether ordinary string literals (`'...'`) should treat backslashes literally. The only supported value is `true`. | Yes `statement_timeout` | `10s` | The maximum allowed duration of the read portion of write operations; i.e., the `SELECT` portion of `INSERT INTO ... (SELECT ...)`; the `WHERE` portion of `UPDATE ... WHERE ...` and `DELETE FROM ... WHERE ...`. If this value is specified without units, it is taken as milliseconds (`ms`). | Yes `timezone` | `UTC` | The time zone for displaying and interpreting timestamps. The only supported value is `UTC`. 
| Yes [Contact support]: /support ## Privileges The privileges required to execute this statement are: - [_Superuser_ privileges](/security/cloud/users-service-accounts/#organization-roles) ## Related pages - [`ALTER SYSTEM RESET`](../alter-system-reset) - [`SHOW`](../show) --- ## ALTER TABLE Use `ALTER TABLE` to: - Rename a table. - Change owner of a table. - Change retain history configuration for the table. ## Syntax **Rename:** ### Rename To rename a table: ```mzsql ALTER TABLE RENAME TO ; ``` | Syntax element | Description | | --- | --- | | `` | The current name of the table you want to alter. | | `` | The new name of the table. | See also [Renaming restrictions](/sql/identifiers/#renaming-restrictions). **Change owner:** ### Change owner To change the owner of a table: ```mzsql ALTER TABLE OWNER TO ; ``` | Syntax element | Description | | --- | --- | | `` | The name of the table you want to change ownership of. | | `` | The new owner of the table. | To change the owner of a table, you must be the owner of the table and have membership in the ``. See also [Privileges](#privileges). **(Re)Set retain history config:** ### (Re)Set retain history config To set the retention history for a user-populated table: ```mzsql ALTER TABLE SET (RETAIN HISTORY [=] FOR ); ``` | Syntax element | Description | | --- | --- | | `` | The name of the table you want to alter. | | `` | ***Private preview.** This option has known performance or stability issues and is under active development.* Duration for which Materialize retains historical data, which is useful to implement [durable subscriptions](/transform-data/patterns/durable-subscriptions/#history-retention-period). Accepts positive [interval](/sql/types/interval/) values (e.g. `'1hr'`). Default: `1s`. | To reset the retention history to the default for a user-populated table: ```mzsql ALTER TABLE RESET (RETAIN HISTORY); ``` | Syntax element | Description | | --- | --- | | `` | The name of the table you want to alter. 
| ## Privileges The privileges required to execute this statement are: - Ownership of the table being altered. - In addition, to change owners: - Role membership in `new_owner`. - `CREATE` privileges on the containing schema if the table is namespaced by a schema. --- ## ALTER TYPE Use `ALTER TYPE` to: - Rename a type. - Change owner of a type. ## Syntax **Rename:** ### Rename To rename a type: ```mzsql ALTER TYPE RENAME TO ; ``` | Syntax element | Description | | --- | --- | | `` | The current name of the type. | | `` | The new name of the type. | See also [Renaming restrictions](/sql/identifiers/#renaming-restrictions). **Change owner:** ### Change owner To change the owner of a type: ```mzsql ALTER TYPE OWNER TO ; ``` | Syntax element | Description | | --- | --- | | `` | The name of the type you want to change ownership of. | | `` | The new owner of the type. | To change the owner of a type, you must be the current owner and have membership in the ``. ## Privileges The privileges required to execute this statement are: - Ownership of the type being altered. - In addition, to change owners: - Role membership in `new_owner`. - `CREATE` privileges on the containing schema if the type is namespaced by a schema. --- ## ALTER VIEW Use `ALTER VIEW` to: - Rename a view. - Change owner of a view. ## Syntax **Rename:** ### Rename To rename a view: ```mzsql ALTER VIEW RENAME TO ; ``` | Syntax element | Description | | --- | --- | | `` | The current name of the view. | | `` | The new name of the view. | See also [Renaming restrictions](/sql/identifiers/#renaming-restrictions). **Change owner:** ### Change owner To change the owner of a view: ```mzsql ALTER VIEW OWNER TO ; ``` | Syntax element | Description | | --- | --- | | `` | The name of the view you want to change ownership of. | | `` | The new owner of the view. | To change the owner of a view, you must be the current owner and have membership in the ``. 
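The following brief sketch renames a view and then transfers it to another role. The names `my_view`, `my_renamed_view`, and the role `analyst` are hypothetical. The ownership change succeeds only if you own the view and are a member of `analyst`; the full requirements are listed under Privileges.

```mzsql
-- Rename a view (hypothetical names).
ALTER VIEW my_view RENAME TO my_renamed_view;

-- Transfer ownership to another (hypothetical) role.
-- You must own the view and be a member of the target role.
ALTER VIEW my_renamed_view OWNER TO analyst;
```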
## Privileges

The privileges required to execute this statement are:

- Ownership of the view being altered.
- In addition, to change owners:
  - Role membership in `new_owner`.
  - `CREATE` privileges on the containing schema if the view is namespaced by a schema.

---

## BEGIN

`BEGIN` starts a transaction block. Once a transaction is started:

- Statements within the transaction are executed sequentially.
- A transaction ends with either a `COMMIT` or a `ROLLBACK` statement.
  - If all transaction statements succeed and a `COMMIT` is issued, all changes are saved.
  - If all transaction statements succeed and a `ROLLBACK` is issued, all changes are discarded.
  - If an error occurs and either a `COMMIT` or a `ROLLBACK` is issued, all changes are discarded.
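The following minimal sketch shows the lifecycle described above. It uses only inserts in one transaction and only `SELECT` statements in the other, which respects the read-only or insert-only restriction covered on this page. The table `my_table` and its columns are hypothetical. Replacing the first `COMMIT` with `ROLLBACK` would discard both inserts.

```mzsql
-- Hypothetical table used only for illustration.
CREATE TABLE my_table (id int, name text);

-- Write-only (insert-only) transaction: both rows are saved on COMMIT.
BEGIN;
INSERT INTO my_table VALUES (1, 'a');
INSERT INTO my_table VALUES (2, 'b');
COMMIT;

-- Read-only transaction: contains only SELECT statements.
BEGIN;
SELECT count(*) FROM my_table;
SELECT max(id) FROM my_table;
COMMIT;
```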

Materialize only supports [**read-only** transactions](#read-only-transactions) or [**write-only** (specifically, insert-only) transactions](#write-only-transactions). See [Details](#details) for more information.

## Syntax

```mzsql
BEGIN [
```

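For 'csv' format, Materialize writes CSV files using the following writer settings:

| Setting | Value |
| --- | --- |
| delimiter | `,` |
| quote | `"` |
| escape | `"` |
| header | `false` |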

### Copy to S3: Parquet {#copy-to-s3-parquet}

#### Writer settings

For 'parquet' format, Materialize writes Parquet files that aim for maximum compatibility with downstream systems. The following Parquet writer settings are used:

| Setting | Value |
| --- | --- |
| Writer version | 1.0 |
| Compression | `snappy` |
| Default column encoding | Dictionary |
| Fallback column encoding | Plain |
| Dictionary page encoding | Plain |
| Dictionary data page encoding | `RLE_DICTIONARY` |

If you encounter issues trying to ingest Parquet files produced by Materialize into your downstream systems, please contact our team.

#### Parquet data types

When using the parquet format, Materialize converts the values in the result set to Apache Arrow, and then serializes this Arrow representation to Parquet. The Arrow schema is embedded in the Parquet file metadata and allows reconstructing the Arrow representation using a compatible reader.

Materialize also includes Parquet LogicalType annotations where possible. However, many newer LogicalType annotations are not supported in the 1.0 writer version.

Materialize also embeds its own type information into the Apache Arrow schema. The field metadata in the schema contains an `ARROW:extension:name` annotation to indicate the Materialize native type the field originated from.

| Materialize type | Arrow extension name | Arrow type | Parquet primitive type | Parquet logical type |
| --- | --- | --- | --- | --- |
| `bigint` | `materialize.v1.bigint` | `int64` | `INT64` | |
| `boolean` | `materialize.v1.boolean` | `bool` | `BOOLEAN` | |
| `bytea` | `materialize.v1.bytea` | `large_binary` | `BYTE_ARRAY` | |
| `date` | `materialize.v1.date` | `date32` | `INT32` | `DATE` |
| `double precision` | `materialize.v1.double` | `float64` | `DOUBLE` | |
| `integer` | `materialize.v1.integer` | `int32` | `INT32` | |
| `jsonb` | `materialize.v1.jsonb` | `large_utf8` | `BYTE_ARRAY` | |
| `map` | `materialize.v1.map` | `map` (`struct` with fields `keys` and `values`) | Nested | `MAP` |
| `list` | `materialize.v1.list` | `list` | Nested | |
| `numeric` | `materialize.v1.numeric` | `decimal128[38, 10 or max-scale]` | `FIXED_LEN_BYTE_ARRAY` | `DECIMAL` |
| `real` | `materialize.v1.real` | `float32` | `FLOAT` | |
| `smallint` | `materialize.v1.smallint` | `int16` | `INT32` | `INT(16, true)` |
| `text` | `materialize.v1.text` | `utf8` or `large_utf8` | `BYTE_ARRAY` | `STRING` |
| `time` | `materialize.v1.time` | `time64[nanosecond]` | `INT64` | `TIME[isAdjustedToUTC = false, unit = NANOS]` |
| `uint2` | `materialize.v1.uint2` | `uint16` | `INT32` | `INT(16, false)` |
| `uint4` | `materialize.v1.uint4` | `uint32` | `INT32` | `INT(32, false)` |
| `uint8` | `materialize.v1.uint8` | `uint64` | `INT64` | `INT(64, false)` |
| `timestamp` | `materialize.v1.timestamp` | `time64[microsecond]` | `INT64` | `TIMESTAMP[isAdjustedToUTC = false, unit = MICROS]` |
| `timestamp with time zone` | `materialize.v1.timestampz` | `time64[microsecond]` | `INT64` | `TIMESTAMP[isAdjustedToUTC = true, unit = MICROS]` |
| Arrays (`[]`) | `materialize.v1.array` | `struct` with `list` field `items` and `uint8` field `dimensions` | Nested | |
| `uuid` | `materialize.v1.uuid` | `fixed_size_binary(16)` | `FIXED_LEN_BYTE_ARRAY` | |
| `oid` | Unsupported | | | |
| `interval` | Unsupported | | | |
| `record` | Unsupported | | | |
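The table lists `interval`, `oid`, and `record` as unsupported. One possible workaround is to cast such columns to a supported type, such as `text`, in the query you copy. This is a sketch under stated assumptions. It assumes `COPY` to S3 accepts a parenthesized query here, as `COPY ... TO STDOUT` does. It also uses a hypothetical view `events` with an `id` column and an `interval` column named `duration`. The S3 path and the `aws_role_assumption` connection are reused from the examples on this page.

```mzsql
-- Hypothetical view `events` with an `interval` column `duration`.
-- Casting to text avoids writing a type the Parquet writer does not support.
COPY (
    SELECT id, duration::text AS duration
    FROM events
  )
TO 's3://mz-to-snow/parquet/'
WITH (
    AWS CONNECTION = aws_role_assumption,
    FORMAT = 'parquet'
  );
```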

## Privileges

The privileges required to execute this statement are:

- `USAGE` privileges on the schemas that all relations and types in the query are contained in.
- `SELECT` privileges on all relations in the query.
  - **Note:** If any item is a view, then the view owner must also have the necessary privileges to execute the view definition. Even if the view owner is a _superuser_, they still must explicitly be granted the necessary privileges.
- `USAGE` privileges on all types used in the query.
- `USAGE` privileges on the active cluster.

## Examples

### Copy to stdout {#copy-to-stdout-examples}

```mzsql
COPY (SUBSCRIBE some_view) TO STDOUT WITH (FORMAT binary);
```

### Copy to S3 {#copy-to-s3-examples}

#### File format Parquet

```mzsql
COPY some_view TO 's3://mz-to-snow/parquet/'
WITH (
    AWS CONNECTION = aws_role_assumption,
    FORMAT = 'parquet'
  );
```


See also [Copy to S3: Parquet Data Types](#parquet-data-types).

#### File format CSV

```mzsql
COPY some_view TO 's3://mz-to-snow/csv/'
WITH (
    AWS CONNECTION = aws_role_assumption,
    FORMAT = 'csv'
  );
```


## Related pages

- [`CREATE CONNECTION`](/sql/create-connection)
- Integration guides:
  - [Amazon S3](/serve-results/s3/)
  - [Snowflake (via S3)](/serve-results/snowflake/)

---

## CREATE CLUSTER

`CREATE CLUSTER` creates a new [cluster](/concepts/clusters/).

## Syntax

```mzsql
CREATE CLUSTER <cluster_name> (
    SIZE = <text>
    [, REPLICATION FACTOR = <int>]
    [, MANAGED = <bool>]
    [, SCHEDULE = MANUAL|ON REFRESH(...)]
);
```

| Syntax element | Description |
| --- | --- |
| `<cluster_name>` | A name for the cluster. |
| `SIZE` | The size of the resource allocations for the cluster. See [Size](#size) for details, including the legacy sizes available. |
| `REPLICATION FACTOR` | Optional. The number of replicas to provision for the cluster. See [Replication factor](#replication-factor) for details. Default: `1` |
| `MANAGED` | Optional. Whether to automatically manage the cluster's replicas based on the configured size and replication factor. Specify `FALSE` to create an **unmanaged** cluster. With unmanaged clusters, you need to manually manage the cluster's replicas using the [`CREATE CLUSTER REPLICA`](/sql/create-cluster-replica) and [`DROP CLUSTER REPLICA`](/sql/drop-cluster-replica) commands. When creating an unmanaged cluster, you must specify the `REPLICAS` option as well. **Tip:** When getting started with Materialize, we recommend starting with managed clusters. Default: `TRUE` |
| `SCHEDULE` | Optional. The [scheduling type](#scheduling) for the cluster. Valid values are `MANUAL` and `ON REFRESH`. Default: `MANUAL` |

## Details

### Initial state

Each Materialize region initially contains a [pre-installed cluster](/sql/show-clusters/#pre-installed-clusters) named `quickstart` with a size of `25cc` and a replication factor of `1`. You can drop or alter this cluster to suit your needs.

### Choosing a cluster

When performing an operation that requires a cluster, you can specify which cluster you want to use.
If you do not explicitly name a cluster, Materialize uses your session's active cluster. To show your session's active cluster, use the [`SHOW`](/sql/show) command:

```mzsql
SHOW cluster;
```

To switch your session's active cluster, use the [`SET`](/sql/set) command:

```mzsql
SET cluster = other_cluster;
```

### Resource isolation

Clusters provide **resource isolation.** Each cluster provisions a dedicated pool of CPU, memory, and, optionally, scratch disk space.

All workloads on a given cluster compete for access to these compute resources. However, workloads on different clusters are strictly isolated from one another. A given workload has access only to the CPU, memory, and scratch disk of the cluster that it is running on.

Clusters are commonly used to isolate different classes of workloads. For example, you could place your development workloads in a cluster named `dev` and your production workloads in a cluster named `prod`.

### Size

The `SIZE` option determines the amount of compute resources available to the cluster.

**M.1 Clusters:**

> **Note:** The values set forth in the table are solely for illustrative purposes.
> Materialize reserves the right to change the capacity at any time. As such, you
> acknowledge and agree that those values in this table may change at any time,
> and you should not rely on these values for any capacity planning.
| Cluster size | Compute Credits/Hour | Total Capacity | Notes |
| --- | --- | --- | --- |
| M.1-nano | 0.75 | 26 GiB | |
| M.1-micro | 1.5 | 53 GiB | |
| M.1-xsmall | 3 | 106 GiB | |
| M.1-small | 6 | 212 GiB | |
| M.1-medium | 9 | 318 GiB | |
| M.1-large | 12 | 424 GiB | |
| M.1-1.5xlarge | 18 | 636 GiB | |
| M.1-2xlarge | 24 | 849 GiB | |
| M.1-3xlarge | 36 | 1273 GiB | |
| M.1-4xlarge | 48 | 1645 GiB | |
| M.1-8xlarge | 96 | 3290 GiB | |
| M.1-16xlarge | 192 | 6580 GiB | Available upon request |
| M.1-32xlarge | 384 | 13160 GiB | Available upon request |
| M.1-64xlarge | 768 | 26320 GiB | Available upon request |
| M.1-128xlarge | 1536 | 52640 GiB | Available upon request |

**Legacy cc Clusters:**

Materialize offers the following legacy cc cluster sizes:

> **Tip:** In most cases, you **should not** use legacy sizes. [M.1 sizes](#size)
> offer better performance per credit for nearly all workloads. We recommend using
> M.1 sizes for all new clusters, and recommend migrating existing
> legacy-sized clusters to M.1 sizes. Materialize is committed to supporting
> customers during the transition period as we move to deprecate legacy sizes.
> The legacy size information is provided for completeness.

* `25cc`
* `50cc`
* `100cc`
* `200cc`
* `300cc`
* `400cc`
* `600cc`
* `800cc`
* `1200cc`
* `1600cc`
* `3200cc`
* `6400cc`
* `128C`
* `256C`
* `512C`

The resource allocations are proportional to the number in the size name. For example, a cluster of size `600cc` has 2x as much CPU, memory, and disk as a cluster of size `300cc`, and 1.5x as much CPU, memory, and disk as a cluster of size `400cc`. To determine the specific resource allocations for a size, query the [`mz_cluster_replica_sizes`](/reference/system-catalog/mz_catalog/#mz_cluster_replica_sizes) table.

> **Warning:** The values in the `mz_cluster_replica_sizes` table may change at any
> time. You should not rely on them for any kind of capacity planning.
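For example, a query along the following lines compares the allocations of two legacy sizes. This is a sketch that assumes the table exposes a `size` column alongside its allocation columns. Check the linked system catalog reference for the exact column names. The warning above also applies to anything it returns.

```mzsql
-- Compare the resource allocations of two legacy sizes.
-- Assumes mz_cluster_replica_sizes has a `size` column.
SELECT *
FROM mz_catalog.mz_cluster_replica_sizes
WHERE size IN ('300cc', '600cc');
```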
Clusters of larger sizes can process data faster and handle larger data volumes.

**Legacy t-shirt Clusters:**

Materialize also offers some legacy t-shirt cluster sizes for upsert sources.

> **Tip:** In most cases, you **should not** use legacy t-shirt sizes. [M.1 sizes](#size)
> offer better performance per credit for nearly all workloads. We recommend using
> M.1 sizes for all new clusters, and recommend migrating existing
> legacy-sized clusters to M.1 sizes. Materialize is committed to supporting
> customers during the transition period as we move to deprecate legacy sizes.
> The legacy size information is provided for completeness.

> **Warning:** Materialize regions that were enabled after 15 April 2024 do not have access to legacy sizes.

When legacy sizes are enabled for a region, the following sizes are available:

* `3xsmall`
* `2xsmall`
* `xsmall`
* `small`
* `medium`
* `large`
* `xlarge`
* `2xlarge`
* `3xlarge`
* `4xlarge`
* `5xlarge`
* `6xlarge`

See also:

- [M.1 to cc size mapping](/sql/m1-cc-mapping/).
- [Materialize service consumption table](https://materialize.com/pdfs/pricing.pdf).
- [Blog: Scaling Beyond Memory: How Materialize Uses Swap for Larger Workloads](https://materialize.com/blog/scaling-beyond-memory/).

#### Cluster resizing

You can change the size of a cluster to respond to changes in your workload using [`ALTER CLUSTER`](/sql/alter-cluster). Depending on the type of objects the cluster is hosting, this operation **might incur downtime**.

See the reference documentation for [`ALTER CLUSTER`](/sql/alter-cluster#zero-downtime-cluster-resizing) for more details on cluster resizing.

### Replication factor

The `REPLICATION FACTOR` option determines the number of replicas provisioned for the cluster. Each replica of the cluster provisions a new pool of compute resources to perform exactly the same computations on exactly the same data.

Provisioning more than one replica improves **fault tolerance**. Clusters with multiple replicas can tolerate failures of the underlying hardware that cause a replica to become unreachable. As long as one replica of the cluster remains available, the cluster can continue to maintain dataflows and serve queries.

Materialize makes the following guarantees when provisioning replicas:

- Replicas of a given cluster are never provisioned on the same underlying hardware.
- Replicas of a given cluster are spread as evenly as possible across the underlying cloud provider's availability zones.

Materialize automatically assigns names to replicas like `r1`, `r2`, etc. You can view information about individual replicas in the console and the system catalog, but you cannot directly modify individual replicas.
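The following sketch creates a cluster with two replicas, lists those replicas from the system catalog, and later resizes the cluster. The cluster name `prod` and the `M.1-small` and `M.1-medium` sizes are illustrative choices. Based on the naming described above, the replicas would be expected to appear as `r1` and `r2`.

```mzsql
-- Two replicas: more fault tolerance, not more capacity.
CREATE CLUSTER prod (SIZE = 'M.1-small', REPLICATION FACTOR = 2);

-- List the replicas Materialize provisioned for the cluster.
SELECT r.name AS replica_name
FROM mz_catalog.mz_cluster_replicas AS r
JOIN mz_catalog.mz_clusters AS c ON r.cluster_id = c.id
WHERE c.name = 'prod';

-- Resize later; depending on the hosted objects, this might incur downtime.
ALTER CLUSTER prod SET (SIZE = 'M.1-medium');
```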
You can pause a cluster's work by specifying a replication factor of `0`. Doing so removes all replicas of the cluster. Any indexes, materialized views, sources, and sinks on the cluster will cease to make progress, and any queries directed to the cluster will block. You can later resume the cluster's work by using [`ALTER CLUSTER`](/sql/alter-cluster) to set a nonzero replication factor.

> **Note:** A common misconception is that increasing a cluster's replication
> factor will increase its capacity for work. This is not the case. Increasing
> the replication factor increases the **fault tolerance** of the cluster, not its
> capacity for work. Replicas are exact copies of one another: each replica must
> do exactly the same work (i.e., maintain the same dataflows and process the same
> queries) as all the other replicas of the cluster.
>
> To increase a cluster's capacity, you should instead increase the cluster's
> [size](#size).

### Credit usage

Each [replica](#replication-factor) of the cluster consumes credits at a rate determined by the cluster's size:

Size      | Legacy size  | Credits per replica per hour
----------|--------------|-----------------------------
`25cc`    | `3xsmall`    | 0.25
`50cc`    | `2xsmall`    | 0.5
`100cc`   | `xsmall`     | 1
`200cc`   | `small`      | 2
`300cc`   |              | 3
`400cc`   | `medium`     | 4
`600cc`   |              | 6
`800cc`   | `large`      | 8
`1200cc`  |              | 12
`1600cc`  | `xlarge`     | 16
`3200cc`  | `2xlarge`    | 32
`6400cc`  | `3xlarge`    | 64
`128C`    | `4xlarge`    | 128
`256C`    | `5xlarge`    | 256
`512C`    | `6xlarge`    | 512

Credit usage is measured at a one-second granularity. For a given replica, credit usage begins when a `CREATE CLUSTER` or [`ALTER CLUSTER`](/sql/alter-cluster) statement provisions the replica and ends when an [`ALTER CLUSTER`](/sql/alter-cluster) or [`DROP CLUSTER`](/sql/drop-cluster) statement deprovisions the replica.

A cluster with a [replication factor](#replication-factor) of zero uses no credits.
As an example, consider the following sequence of events: Time | Event --------------------|--------------------------------------------------------- 2023-08-29 3:45:00 | `CREATE CLUSTER c (SIZE '400cc', REPLICATION FACTOR 2)` 2023-08-29 3:45:45 | `ALTER CLUSTER c SET (REPLICATION FACTOR 1)` 2023-08-29 3:47:15 | `DROP CLUSTER c` Each `400cc` replica consumes 4 credits per hour, so cluster `c` will have consumed 0.2 credits in total: * Replica `c.r1` was provisioned from 3:45:00 to 3:47:15 (135 seconds), consuming 0.15 credits. * Replica `c.r2` was provisioned from 3:45:00 to 3:45:45 (45 seconds), consuming 0.05 credits. ### Scheduling To support [scheduled refreshes in materialized views](../create-materialized-view/#refresh-strategies), you can configure a cluster to automatically turn on and off using the `SCHEDULE...ON REFRESH` syntax. ```mzsql CREATE CLUSTER my_scheduled_cluster ( SIZE = 'M.1-large', SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '1 hour') ); ``` Scheduled clusters should **only** contain materialized views configured with a non-default [refresh strategy](../create-materialized-view/#refresh-strategies) (and any indexes built on these views). These clusters will automatically turn on (i.e., be provisioned with compute resources) based on the configured refresh strategies, and **only** consume credits for the duration of the refreshes. It's not possible to manually turn on a cluster with `ON REFRESH` scheduling. If you need to turn on a cluster outside its schedule, you can temporarily disable scheduling and provision compute resources using [`ALTER CLUSTER`](../alter-cluster/#schedule): ```mzsql ALTER CLUSTER my_scheduled_cluster SET (SCHEDULE = MANUAL, REPLICATION FACTOR = 1); ``` To re-enable scheduling: ```mzsql ALTER CLUSTER my_scheduled_cluster SET (SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '1 hour')); ``` #### Hydration time estimate

Syntax: `HYDRATION TIME ESTIMATE = <interval>`, where `<interval>` is an interval string such as `'1 hour'`.

By default, scheduled clusters will turn on at the scheduled refresh time. To avoid [unavailability of the objects scheduled for refresh](/sql/create-materialized-view/#querying-materialized-views-with-refresh-strategies) during the refresh operation, we recommend turning the cluster on ahead of the scheduled time to allow hydration to complete. Use the `HYDRATION TIME ESTIMATE` clause to specify how far ahead of each scheduled refresh the cluster should turn on. #### Scheduling strategy To check the scheduling strategy associated with a cluster, you can query the [`mz_internal.mz_cluster_schedules`](/reference/system-catalog/mz_internal/#mz_cluster_schedules) system catalog table: ```mzsql SELECT c.id AS cluster_id, c.name AS cluster_name, cs.type AS schedule_type, cs.refresh_hydration_time_estimate FROM mz_internal.mz_cluster_schedules cs JOIN mz_clusters c ON cs.cluster_id = c.id WHERE c.name = 'my_scheduled_cluster'; ``` To check if a scheduled cluster is turned on, you can query the [`mz_catalog.mz_cluster_replicas`](/reference/system-catalog/mz_catalog/#mz_cluster_replicas) system catalog table: ```mzsql SELECT cs.cluster_id, -- A cluster with scheduling is "on" when it has compute resources -- (i.e. a replica) attached. CASE WHEN cr.id IS NOT NULL THEN true ELSE false END AS is_on FROM mz_internal.mz_cluster_schedules cs JOIN mz_clusters c ON cs.cluster_id = c.id AND cs.type = 'on-refresh' LEFT JOIN mz_cluster_replicas cr ON c.id = cr.cluster_id; ``` You can also use the [audit log](/reference/system-catalog/mz_catalog/#mz_audit_events) to observe the commands that are automatically run when a scheduled cluster is turned on and off for materialized view refreshes: ```mzsql SELECT * FROM mz_audit_events WHERE object_type = 'cluster-replica' ORDER BY occurred_at DESC; ``` Any commands attributed to scheduled refreshes will be marked with `"reason":"schedule"` under the `details` column. 
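To narrow the audit log to the scheduler's actions only, you can filter on the `reason` field. The following is a sketch. It assumes that `details` is a `jsonb` column and that `reason` is a top-level key, as the description above suggests. Adjust the filter if the structure of `details` differs.

```mzsql
-- Sketch: show only replica create/drop events triggered by a refresh
-- schedule. Assumes `details` is jsonb with a top-level "reason" key.
SELECT event_type, details, occurred_at
FROM mz_catalog.mz_audit_events
WHERE object_type = 'cluster-replica'
  AND details ->> 'reason' = 'schedule'
ORDER BY occurred_at DESC;
```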
### Known limitations Clusters have the following known limitation: * When a cluster using a legacy cc size of `3200cc` or larger has multiple replicas, those replicas are not guaranteed to be spread evenly across the underlying cloud provider's availability zones. ## Examples ### Basic Create a cluster with two `M.1-large` replicas: ```mzsql CREATE CLUSTER c1 (SIZE = 'M.1-large', REPLICATION FACTOR = 2); ``` ### Empty Create a cluster with no replicas: ```mzsql CREATE CLUSTER c1 (SIZE 'M.1-xsmall', REPLICATION FACTOR = 0); ``` You can later add replicas to this cluster with [`ALTER CLUSTER`]. ## Privileges The privileges required to execute this statement are: - `CREATECLUSTER` privileges on the system. ## See also - [`ALTER CLUSTER`] - [`DROP CLUSTER`] [AWS availability zone IDs]: https://docs.aws.amazon.com/ram/latest/userguide/working-with-az-ids.html [`ALTER CLUSTER`]: /sql/alter-cluster/ [`DROP CLUSTER`]: /sql/drop-cluster/ [`SELECT`]: /sql/select [`SUBSCRIBE`]: /sql/subscribe [`mz_cluster_replica_sizes`]: /reference/system-catalog/mz_catalog#mz_cluster_replica_sizes --- ## CREATE CLUSTER REPLICA `CREATE CLUSTER REPLICA` provisions a new replica for an [**unmanaged** cluster](/sql/create-cluster/#unmanaged-clusters). > **Tip:** When getting started with Materialize, we recommend starting with managed > clusters. ## Syntax ```mzsql CREATE CLUSTER REPLICA . ( SIZE = ); ``` | Syntax element | Description | | --- | --- | | `` | The cluster you want to attach a replica to. | | `` | A name for this replica. | | `SIZE` | The size of the resource allocations for the replica. See [Size](#size) for details, including the available legacy sizes. | ## Details ### Size The `SIZE` option for replicas is identical to the [`SIZE` option for clusters](/sql/create-cluster/#size), except that the size applies only to the new replica. 
**M.1 Clusters:** > **Note:** The values set forth in the table are solely for illustrative purposes. > Materialize reserves the right to change the capacity at any time. As such, you > acknowledge and agree that those values in this table may change at any time, > and you should not rely on these values for any capacity planning. | Cluster size | Compute Credits/Hour | Total Capacity | Notes | | --- | --- | --- | --- | | M.1-nano | 0.75 | 26 GiB | | | M.1-micro | 1.5 | 53 GiB | | | M.1-xsmall | 3 | 106 GiB | | | M.1-small | 6 | 212 GiB | | | M.1-medium | 9 | 318 GiB | | | M.1-large | 12 | 424 GiB | | | M.1-1.5xlarge | 18 | 636 GiB | | | M.1-2xlarge | 24 | 849 GiB | | | M.1-3xlarge | 36 | 1273 GiB | | | M.1-4xlarge | 48 | 1645 GiB | | | M.1-8xlarge | 96 | 3290 GiB | | | M.1-16xlarge | 192 | 6580 GiB | Available upon request | | M.1-32xlarge | 384 | 13160 GiB | Available upon request | | M.1-64xlarge | 768 | 26320 GiB | Available upon request | | M.1-128xlarge | 1536 | 52640 GiB | Available upon request | **Legacy cc Clusters:** Materialize offers the following legacy cc cluster sizes: > **Tip:** In most cases, you **should not** use legacy sizes. [M.1 sizes](#size) > offer better performance per credit for nearly all workloads. We recommend using > M.1 sizes for all new clusters, and recommend migrating existing > legacy-sized clusters to M.1 sizes. Materialize is committed to supporting > customers during the transition period as we move to deprecate legacy sizes. > The legacy size information is provided for completeness. * `25cc` * `50cc` * `100cc` * `200cc` * `300cc` * `400cc` * `600cc` * `800cc` * `1200cc` * `1600cc` * `3200cc` * `6400cc` * `128C` * `256C` * `512C` The resource allocations are proportional to the number in the size name. For example, a cluster of size `600cc` has 2x as much CPU, memory, and disk as a cluster of size `300cc`, and 1.5x as much CPU, memory, and disk as a cluster of size `400cc`. 
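As a sketch of how to check the proportionality described above, the following query compares the three sizes from that example in the [`mz_cluster_replica_sizes`](/reference/system-catalog/mz_catalog/#mz_cluster_replica_sizes) catalog table. It assumes that legacy sizes are enabled for your region and that the table exposes the `cpu_nano_cores`, `memory_bytes`, `disk_bytes`, and `credits_per_hour` columns. As with any values in this table, treat the results as illustrative and not as a basis for capacity planning.

```mzsql
-- Sketch: compare the allocations of three legacy sizes. If allocations
-- are proportional to the size names, each column for 600cc should be
-- 2x the 300cc value and 1.5x the 400cc value.
SELECT size, cpu_nano_cores, memory_bytes, disk_bytes, credits_per_hour
FROM mz_catalog.mz_cluster_replica_sizes
WHERE size IN ('300cc', '400cc', '600cc')
ORDER BY credits_per_hour;
```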
To determine the specific resource allocations for a size, query the [`mz_cluster_replica_sizes`](/reference/system-catalog/mz_catalog/#mz_cluster_replica_sizes) table. > **Warning:** The values in the `mz_cluster_replica_sizes` table may change at any > time. You should not rely on them for any kind of capacity planning. Clusters of larger sizes can process data faster and handle larger data volumes. See also: - [M.1 to cc size mapping](/sql/m1-cc-mapping/). - [Materialize service consumption table](https://materialize.com/pdfs/pricing.pdf). - [Blog: Scaling Beyond Memory: How Materialize Uses Swap for Larger Workloads](https://materialize.com/blog/scaling-beyond-memory/). ### Homogeneous vs. heterogeneous hardware provisioning Because Materialize uses active replication, all replicas are instructed to do the same work, irrespective of their resource allocation. For the most stable performance, we recommend using the same size and disk configuration for all replicas. However, it is possible to use different replica configurations in the same cluster. In these cases, the replicas with fewer resources will likely be continually burdened with a backlog of work. If all of the faster replicas become unreachable, the system might experience delays in replying to requests while the slower replicas catch up to the last known time that the faster replicas had computed. ## Example ```mzsql CREATE CLUSTER REPLICA c1.r1 (SIZE = 'M.1-large'); ``` ## Privileges The privileges required to execute this statement are: - Ownership of the cluster. ## See also - [`DROP CLUSTER REPLICA`] [AWS availability zone ID]: https://docs.aws.amazon.com/ram/latest/userguide/working-with-az-ids.html [`DROP CLUSTER REPLICA`]: /sql/drop-cluster-replica --- ## CREATE CONNECTION A connection describes how to connect and authenticate to an external system you want Materialize to read from or write to. 
Once created, a connection is **reusable** across multiple [`CREATE SOURCE`](/sql/create-source) and [`CREATE SINK`](/sql/create-sink) statements. To use credentials that contain sensitive information (like passwords and SSL keys) in a connection, you must first [create secrets](/sql/create-secret) to securely store each credential in Materialize's secret management system. Credentials that are generally not sensitive (like usernames and SSL certificates) can be specified as plain `text` or stored as secrets. > **Note:** Connections using AWS PrivateLink are available in Materialize Cloud only. ## Source and sink connections ### AWS An Amazon Web Services (AWS) connection provides Materialize with access to an Identity and Access Management (IAM) user or role in your AWS account. You can use AWS connections to perform [bulk exports to Amazon S3](/serve-results/s3/), perform [authentication with an Amazon MSK cluster](#kafka-aws-connection), or perform [authentication with an Amazon RDS MySQL database](#mysql-aws-connection). ```mzsql CREATE CONNECTION TO AWS ( ENDPOINT = '', REGION = '', ACCESS KEY ID = { '' | SECRET }, SECRET ACCESS KEY = SECRET , SESSION TOKEN = { '' | SECRET }, ASSUME ROLE ARN = '', ASSUME ROLE SESSION NAME = '' ) [WITH ()]; ``` | Syntax element | Description | | --- | --- | | `` | A name for the connection. | | `ENDPOINT` | *Value:* `text` *Advanced.* Override the default AWS endpoint URL. Allows targeting S3-compatible services like MinIO. | | `REGION` | *Value:* `text` *For Materialize Cloud only* The AWS region to connect to. Defaults to the current Materialize region. | | `ACCESS KEY ID` | *Value:* secret or `text` The access key ID to connect with. Triggers credentials-based authentication. **Warning!** Use of credentials-based authentication is deprecated. AWS strongly encourages the use of role assumption-based authentication instead. 
| | `SECRET ACCESS KEY` | *Value:* secret The secret access key corresponding to the specified access key ID. Required and only valid when `ACCESS KEY ID` is specified. | | `SESSION TOKEN` | *Value:* secret or `text` The session token corresponding to the specified access key ID. Only valid when `ACCESS KEY ID` is specified. | | `ASSUME ROLE ARN` | *Value:* `text` The Amazon Resource Name (ARN) of the IAM role to assume. Triggers role assumption-based authentication. | | `ASSUME ROLE SESSION NAME` | *Value:* `text` The session name to use when assuming the role. Only valid when `ASSUME ROLE ARN` is specified. | | `WITH ()` | The following `` are supported: \| Field \| Value \| Description \| \|-------\|-------\|-------------\| \| `VALIDATE` \| `boolean` \| Whether [connection validation](#connection-validation) should be performed on connection creation. Default: `false`. \| | #### Permissions {#aws-permissions} > **Warning:** Failing to constrain the external ID in your role trust policy will allow > other Materialize customers to assume your role and use AWS privileges you > have granted the role! When using role assumption-based authentication, you must configure a [trust policy] on the IAM role that permits Materialize to assume the role. Materialize always uses the following IAM principal to assume the role: ``` arn:aws:iam::664411391173:role/MaterializeConnection ``` Materialize additionally generates an [external ID] which uniquely identifies your AWS connection across all Materialize regions. 
To ensure that other Materialize customers cannot assume your role, your IAM trust policy **must** constrain access to only the external ID that Materialize generates for the connection: ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::664411391173:role/MaterializeConnection" }, "Action": "sts:AssumeRole", "Condition": { "StringEquals": { "sts:ExternalId": "" } } } ] } ``` You can retrieve the external ID for the connection, as well as an example trust policy, by querying the [`mz_internal.mz_aws_connections`](/reference/system-catalog/mz_internal/#mz_aws_connections) table: ```mzsql SELECT id, external_id, example_trust_policy FROM mz_internal.mz_aws_connections; ``` #### Examples {#aws-examples} **Role assumption:** In this example, we have created the following IAM role for Materialize to assume:

| Name | AWS account ID |
| --- | --- |
| `WarehouseExport` | `000000000000` |

Trust policy for the `WarehouseExport` role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::664411391173:role/MaterializeConnection"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "mz_00000000-0000-0000-0000-000000000000_u0"
        }
      }
    }
  ]
}
```

To create an AWS connection that will assume the `WarehouseExport` role: ```mzsql CREATE CONNECTION aws_role_assumption TO AWS ( ASSUME ROLE ARN = 'arn:aws:iam::000000000000:role/WarehouseExport', REGION = 'us-east-1' ); ``` **Credentials:** > **Warning:** Use of credentials-based authentication is deprecated. AWS strongly encourages > the use of role assumption-based authentication instead. To create an AWS connection that uses static access key credentials: ```mzsql CREATE SECRET aws_secret_access_key AS '...'; CREATE CONNECTION aws_credentials TO AWS ( ACCESS KEY ID = 'ASIAV2KIV5LPTG6HGXG6', SECRET ACCESS KEY = SECRET aws_secret_access_key ); ``` ### S3 compatible object storage You can use an AWS connection to perform bulk exports and bulk imports with any S3-compatible object storage service, such as Google Cloud Storage, Cloudflare R2, or MinIO. When connecting to S3-compatible object storage, you must provide static access key credentials and specify both the endpoint and the region. To create a connection that uses static access key credentials: ```mzsql CREATE SECRET secret_access_key AS '...'; CREATE CONNECTION gcs_connection TO AWS ( ACCESS KEY ID = 'ASIAV2KIV5LPTG6HGXG6', SECRET ACCESS KEY = SECRET secret_access_key, ENDPOINT = 'https://storage.googleapis.com', REGION = 'us' ); ``` ### Kafka A Kafka connection establishes a link to a [Kafka] cluster. You can use Kafka connections to create [sources](/sql/create-source/kafka) and [sinks](/sql/create-sink/kafka/). 
#### Syntax {#kafka-syntax} ```mzsql CREATE CONNECTION TO KAFKA ( BROKER '' | BROKERS ('', '', ...), SECURITY PROTOCOL = { 'PLAINTEXT' | 'SSL' | 'SASL_PLAINTEXT' | 'SASL_SSL' }, SASL MECHANISMS = { 'PLAIN' | 'SCRAM-SHA-256' | 'SCRAM-SHA-512' }, SASL USERNAME = { '' | SECRET }, SASL PASSWORD = SECRET , SSL CERTIFICATE AUTHORITY = { '' | SECRET }, SSL CERTIFICATE = { '' | SECRET }, SSL KEY = SECRET , SSH TUNNEL = , AWS CONNECTION = , AWS PRIVATELINK (PORT ), PROGRESS TOPIC = '', PROGRESS TOPIC REPLICATION FACTOR = ) [WITH ()]; ``` | Syntax element | Description | | --- | --- | | `` | A name for the connection. | | `BROKER` / `BROKERS` | *Value:* `text` / `text[]` The Kafka bootstrap server(s). Exactly one of `BROKER`, `BROKERS`, or `AWS PRIVATELINK` must be specified. | | `SECURITY PROTOCOL` | *Value:* `text` The security protocol to use: `PLAINTEXT`, `SSL`, `SASL_PLAINTEXT`, or `SASL_SSL`. Defaults to `SASL_SSL` if any `SASL ...` options are specified or if the `AWS CONNECTION` option is specified, otherwise defaults to `SSL`. | | `SASL MECHANISMS` | *Value:* `text` The SASL mechanism to use for authentication: `PLAIN`, `SCRAM-SHA-256`, or `SCRAM-SHA-512`. Despite the name, this option only allows a single mechanism to be specified. Required if the security protocol is `SASL_PLAINTEXT` or `SASL_SSL`. Cannot be specified if `AWS CONNECTION` is specified. | | `SASL USERNAME` / `SASL PASSWORD` | *Value:* secret or `text` / secret Your SASL credentials. Required and only valid when the security protocol is `SASL_PLAINTEXT` or `SASL_SSL`. | | `SSL CERTIFICATE AUTHORITY` | *Value:* secret or `text` The certificate authority (CA) certificate in PEM format. Used to validate the brokers' TLS certificates. If unspecified, uses the system's default CA certificates. Only valid when the security protocol is `SSL` or `SASL_SSL`. | | `SSL CERTIFICATE` / `SSL KEY` | *Value:* secret or `text` / secret Your TLS certificate and key in PEM format for SSL client authentication. 
If unspecified, no client authentication is performed. Only valid when the security protocol is `SSL` or `SASL_SSL`. | | `SSH TUNNEL` | *Value:* object name The name of an [SSH tunnel connection](#ssh-tunnel) to route network traffic through by default. | | `AWS CONNECTION` | *Value:* object name The name of an [AWS connection](#aws) to use when performing IAM authentication with an Amazon MSK cluster. Only valid if the security protocol is `SASL_PLAINTEXT` or `SASL_SSL`. | | `AWS PRIVATELINK` | *Value:* object name The name of an [AWS PrivateLink connection](#aws-privatelink) to route network traffic through. Exactly one of `BROKER`, `BROKERS`, or `AWS PRIVATELINK` must be specified. | | `PROGRESS TOPIC` | *Value:* `text` The name of a topic that Kafka sinks can use to track internal consistency metadata. Default: `_materialize-progress-{REGION ID}-{CONNECTION ID}`. | | `PROGRESS TOPIC REPLICATION FACTOR` | *Value:* `int` The replication factor to use when creating the progress topic (if the Kafka topic does not already exist). Default: the broker's default. | | `WITH ()` | The following `` are supported: \| Field \| Value \| Description \| \|-------\|-------\|-------------\| \| `VALIDATE` \| `boolean` \| Whether [connection validation](#connection-validation) should be performed on connection creation. Default: `true`. \| | To connect to a Kafka cluster with multiple bootstrap servers, use the `BROKERS` option: ```mzsql CREATE CONNECTION kafka_connection TO KAFKA ( BROKERS ('broker1:9092', 'broker2:9092') ); ``` #### Security protocol examples {#kafka-auth} **PLAINTEXT:** > **Warning:** It is insecure to use the `PLAINTEXT` security protocol unless > you are using a [network security connection](#network-security-connections) > to tunnel into a private network, as shown below. 
```mzsql CREATE CONNECTION kafka_connection TO KAFKA ( BROKER 'unique-jellyfish-0000.prd.cloud.redpanda.com:9092', SECURITY PROTOCOL = 'PLAINTEXT', SSH TUNNEL ssh_connection ); ``` **SSL:** With both TLS encryption and TLS client authentication: ```mzsql CREATE SECRET kafka_ssl_cert AS '-----BEGIN CERTIFICATE----- ...'; CREATE SECRET kafka_ssl_key AS '-----BEGIN PRIVATE KEY----- ...'; CREATE SECRET ca_cert AS '-----BEGIN CERTIFICATE----- ...'; CREATE CONNECTION kafka_connection TO KAFKA ( BROKER 'rp-f00000bar.cloud.redpanda.com:30365', SECURITY PROTOCOL = 'SSL', SSL CERTIFICATE = SECRET kafka_ssl_cert, SSL KEY = SECRET kafka_ssl_key, -- Specifying a certificate authority is only required if your cluster's -- certificates are not issued by a CA trusted by the Mozilla root store. SSL CERTIFICATE AUTHORITY = SECRET ca_cert ); ``` With only TLS encryption: > **Warning:** It is insecure to use TLS encryption with no authentication unless > you are using a [network security connection](#network-security-connections) > to tunnel into a private network as shown below. ```mzsql CREATE SECRET ca_cert AS '-----BEGIN CERTIFICATE----- ...'; CREATE CONNECTION kafka_connection TO KAFKA ( BROKER = 'rp-f00000bar.cloud.redpanda.com:30365', SECURITY PROTOCOL = 'SSL', SSH TUNNEL ssh_connection, -- Specifying a certificate authority is only required if your cluster's -- certificates are not issued by a CA trusted by the Mozilla root store. SSL CERTIFICATE AUTHORITY = SECRET ca_cert ); ``` **SASL_PLAINTEXT:** > **Warning:** It is insecure to use the `SASL_PLAINTEXT` security protocol unless > you are using a [network security connection](#network-security-connections) > to tunnel into a private network, as shown below. 
```mzsql CREATE SECRET kafka_password AS '...'; CREATE CONNECTION kafka_connection TO KAFKA ( BROKER 'unique-jellyfish-0000.us-east-1.aws.confluent.cloud:9092', SECURITY PROTOCOL = 'SASL_PLAINTEXT', SASL MECHANISMS = 'SCRAM-SHA-256', -- or `PLAIN` or `SCRAM-SHA-512` SASL USERNAME = 'foo', SASL PASSWORD = SECRET kafka_password, SSH TUNNEL ssh_connection ); ``` **SASL_SSL:** ```mzsql CREATE SECRET kafka_password AS '...'; CREATE SECRET ca_cert AS '-----BEGIN CERTIFICATE----- ...'; CREATE CONNECTION kafka_connection TO KAFKA ( BROKER 'unique-jellyfish-0000.us-east-1.aws.confluent.cloud:9092', SECURITY PROTOCOL = 'SASL_SSL', SASL MECHANISMS = 'SCRAM-SHA-256', -- or `PLAIN` or `SCRAM-SHA-512` SASL USERNAME = 'foo', SASL PASSWORD = SECRET kafka_password, -- Specifying a certificate authority is only required if your cluster's -- certificates are not issued by a CA trusted by the Mozilla root store. SSL CERTIFICATE AUTHORITY = SECRET ca_cert ); ``` **AWS IAM:** ```mzsql CREATE CONNECTION aws_msk TO AWS ( ASSUME ROLE ARN = 'arn:aws:iam::000000000000:role/MaterializeMSK', REGION = 'us-east-1' ); CREATE CONNECTION kafka_msk TO KAFKA ( BROKER 'msk.mycorp.com:9092', SECURITY PROTOCOL = 'SASL_SSL', AWS CONNECTION = aws_msk ); ``` #### Network security {#kafka-network-security} If your Kafka brokers are not exposed to the public internet, you can tunnel the connection through an AWS PrivateLink service (Materialize Cloud) or an SSH bastion host. **AWS PrivateLink (Materialize Cloud):** > **Note:** Connections using AWS PrivateLink are available in Materialize Cloud only. Depending on the hosted service you are connecting to, you might need to specify a PrivateLink connection [per advertised broker](#kafka-privatelink-syntax) (e.g. Amazon MSK), or a single [default PrivateLink connection](#kafka-privatelink-default) (e.g. Redpanda Cloud). 
##### Broker connection syntax {#kafka-privatelink-syntax} > **Warning:** If your Kafka cluster advertises brokers that are not specified > in the `BROKERS` clause, Materialize will attempt to connect to > those brokers without any tunneling. ```mzsql CREATE CONNECTION TO KAFKA ( BROKERS ( ':' USING , ':' USING ), ... ); ``` | Syntax element | Description | | --- | --- | | `:` | The hostname and port of each Kafka broker. | | `USING ` | Specifies how to connect to each broker (e.g., via AWS PrivateLink or SSH tunnel). | ##### `kafka_broker` ```mzsql ':' USING AWS PRIVATELINK ( AVAILABILITY ZONE = '', PORT = ) ``` | Syntax element | Description | | --- | --- | | `AWS PRIVATELINK ` | The name of an AWS PrivateLink connection through which network traffic for this broker should be routed. | | `AVAILABILITY ZONE` | The ID of the availability zone of the AWS PrivateLink service in which the broker is accessible. | | `PORT` | The port of the AWS PrivateLink service to connect to. | The `USING` clause specifies that Materialize Cloud should connect to the designated broker via an AWS PrivateLink service. Brokers do not need to be configured the same way, but the clause must be individually attached to each broker that you want to connect to via the tunnel. ##### Broker connection options {#kafka-privatelink-options} Field | Value | Required | Description ----------------------------------------|------------------|:--------:|------------------------------- `AWS PRIVATELINK` | object name | ✓ | The name of an [AWS PrivateLink connection](#aws-privatelink) through which network traffic for this broker should be routed. `AVAILABILITY ZONE` | `text` | | The ID of the availability zone of the AWS PrivateLink service in which the broker is accessible. If unspecified, traffic will be routed to each availability zone declared in the [AWS PrivateLink connection](#aws-privatelink) in sequence until the correct availability zone for the broker is discovered. 
If specified, Materialize will always route connections via the specified availability zone. `PORT` | `integer` | | The port of the AWS PrivateLink service to connect to. Defaults to the broker's port. ##### Example {#kafka-privatelink-example} Suppose you have the following infrastructure: * A Kafka cluster consisting of two brokers named `broker1` and `broker2`, both listening on port 9092. * A Network Load Balancer that forwards port 9092 to `broker1:9092` and port 9093 to `broker2:9092`. * A PrivateLink endpoint service attached to the load balancer. You can create a connection to this Kafka cluster in Materialize like so: ```mzsql CREATE CONNECTION privatelink_svc TO AWS PRIVATELINK ( SERVICE NAME 'com.amazonaws.vpce.us-east-1.vpce-svc-0e123abc123198abc', AVAILABILITY ZONES ('use1-az1', 'use1-az4') ); CREATE CONNECTION kafka_connection TO KAFKA ( BROKERS ( 'broker1:9092' USING AWS PRIVATELINK privatelink_svc, 'broker2:9092' USING AWS PRIVATELINK privatelink_svc (PORT 9093) ) ); ``` ##### Default connections {#kafka-privatelink-default} Some hosted services, like [Redpanda Cloud](/ingest-data/redpanda/redpanda-cloud/), do not require listing every broker individually. For these services, specify a single PrivateLink connection and the port of the bootstrap server instead. ##### Default connection syntax {#kafka-privatelink-default-syntax} ```mzsql CREATE CONNECTION TO KAFKA ( AWS PRIVATELINK (PORT ), ... ); ``` | Syntax element | Description | | --- | --- | | `AWS PRIVATELINK ` | The name of an AWS PrivateLink connection through which network traffic should be routed. | | `PORT` | The port of the AWS PrivateLink service to connect to. 
| ##### Default connection options {#kafka-privatelink-default-options} Field | Value | Required | Description ----------------------------------------|------------------|:--------:|------------------------------- `AWS PRIVATELINK` | object name | ✓ | The name of an [AWS PrivateLink connection](#aws-privatelink) through which network traffic should be routed. `PORT` | `integer` | | The port of the AWS PrivateLink service to connect to. Defaults to the broker's port. ##### Example {#kafka-privatelink-default-example} ```mzsql CREATE SECRET red_panda_password AS '...'; CREATE CONNECTION privatelink_svc TO AWS PRIVATELINK ( SERVICE NAME 'com.amazonaws.vpce.us-east-1.vpce-svc-0e123abc123198abc', AVAILABILITY ZONES ('use1-az1') ); CREATE CONNECTION kafka_connection TO KAFKA ( AWS PRIVATELINK privatelink_svc (PORT 30292), SECURITY PROTOCOL = 'SASL_PLAINTEXT', SASL MECHANISMS = 'SCRAM-SHA-256', SASL USERNAME = 'foo', SASL PASSWORD = SECRET red_panda_password ); ``` For step-by-step instructions on creating AWS PrivateLink connections and configuring an AWS PrivateLink service to accept connections from Materialize, check [this guide](/ops/network-security/privatelink/). **SSH tunnel:** ##### Syntax {#kafka-ssh-syntax} > **Warning:** If you do not specify a default `SSH TUNNEL` and your Kafka > cluster advertises brokers that are not listed in the `BROKERS` clause, > Materialize will attempt to connect to those brokers without any tunneling. ```mzsql CREATE CONNECTION TO KAFKA ( BROKERS ( ':' USING , ':' USING ), ... ); ``` | Syntax element | Description | | --- | --- | | `:` | The hostname and port of each Kafka broker. | | `USING ` | Specifies how to connect to each broker (e.g., via AWS PrivateLink or SSH tunnel). | ##### `kafka_broker` ```mzsql ':' USING SSH TUNNEL ``` | Syntax element | Description | | --- | --- | | `SSH TUNNEL ` | The name of an SSH tunnel connection through which network traffic for this broker should be routed. 
| The `USING` clause specifies that Materialize should connect to the designated broker via an SSH bastion server. Brokers do not need to be configured the same way, but the clause must be individually attached to each broker that you want to connect to via the tunnel. ##### Example {#kafka-ssh-example} Using a default SSH tunnel: ```mzsql CREATE CONNECTION ssh_connection TO SSH TUNNEL ( HOST '', USER '', PORT ); CREATE CONNECTION kafka_connection TO KAFKA ( BROKER 'broker1:9092', SSH TUNNEL ssh_connection ); ``` Using different SSH tunnels for each broker, with a default for brokers that are not listed: ```mzsql CREATE CONNECTION ssh1 TO SSH TUNNEL (HOST 'ssh1', ...); CREATE CONNECTION ssh2 TO SSH TUNNEL (HOST 'ssh2', ...); CREATE CONNECTION kafka_connection TO KAFKA ( BROKERS ( 'broker1:9092' USING SSH TUNNEL ssh1, 'broker2:9092' USING SSH TUNNEL ssh2 ), SSH TUNNEL ssh1 ); ``` For step-by-step instructions on creating SSH tunnel connections and configuring an SSH bastion server to accept connections from Materialize, check [this guide](/ops/network-security/ssh-tunnel/). ### Confluent Schema Registry A Confluent Schema Registry connection establishes a link to a [Confluent Schema Registry] server. You can use Confluent Schema Registry connections in the `FORMAT` clause of [`CREATE SOURCE`] and [`CREATE SINK`] statements. #### Syntax {#csr-syntax} ```mzsql CREATE CONNECTION TO CONFLUENT SCHEMA REGISTRY ( URL '', USERNAME = { '' | SECRET }, PASSWORD = SECRET , SSL CERTIFICATE = { '' | SECRET }, SSL KEY = SECRET , SSL CERTIFICATE AUTHORITY = { '' | SECRET }, AWS PRIVATELINK , SSH TUNNEL ) [WITH ()]; ``` | Syntax element | Description | | --- | --- | | `` | A name for the connection. | | `URL` | *Value:* `text`. Required. The schema registry URL. | | `USERNAME` / `PASSWORD` | *Value:* secret or `text` / secret Credentials for basic HTTP authentication. `PASSWORD` is required and only valid if `USERNAME` is specified. 
| | `SSL CERTIFICATE` / `SSL KEY` | *Value:* secret or `text` / secret Your TLS certificate and key in PEM format for TLS client authentication. If unspecified, no TLS client authentication is performed. Only respected if the URL uses the `https` protocol. | | `SSL CERTIFICATE AUTHORITY` | *Value:* secret or `text` The certificate authority (CA) certificate in PEM format. Used to validate the server's TLS certificate. If unspecified, uses the system's default CA certificates. Only respected if the URL uses the `https` protocol. | | `AWS PRIVATELINK` | *Value:* object name The name of an [AWS PrivateLink connection](#aws-privatelink) to route network traffic through. | | `SSH TUNNEL` | *Value:* object name The name of an [SSH tunnel connection](#ssh-tunnel) to route network traffic through. | | `WITH ()` | The following `` are supported: \| Field \| Value \| Description \| \|-------\|-------\|-------------\| \| `VALIDATE` \| `boolean` \| Whether [connection validation](#connection-validation) should be performed on connection creation. Default: `true`. \| | #### Examples {#csr-example} Using username and password authentication with TLS encryption: ```mzsql CREATE SECRET csr_password AS '...'; CREATE SECRET ca_cert AS '-----BEGIN CERTIFICATE----- ...'; CREATE CONNECTION csr_basic TO CONFLUENT SCHEMA REGISTRY ( URL 'https://rp-f00000bar.cloud.redpanda.com:30993', USERNAME = 'foo', PASSWORD = SECRET csr_password, -- Specifying a certificate authority is only required if your cluster's -- certificates are not issued by a CA trusted by the Mozilla root store. 
SSL CERTIFICATE AUTHORITY = SECRET ca_cert ); ``` Using TLS for encryption and authentication: ```mzsql CREATE SECRET csr_ssl_cert AS '-----BEGIN CERTIFICATE----- ...'; CREATE SECRET csr_ssl_key AS '-----BEGIN PRIVATE KEY----- ...'; CREATE SECRET ca_cert AS '-----BEGIN CERTIFICATE----- ...'; CREATE CONNECTION csr_ssl TO CONFLUENT SCHEMA REGISTRY ( URL 'https://rp-f00000bar.cloud.redpanda.com:30993', SSL CERTIFICATE = SECRET csr_ssl_cert, SSL KEY = SECRET csr_ssl_key, -- Specifying a certificate authority is only required if your cluster's -- certificates are not issued by a CA trusted by the Mozilla root store. SSL CERTIFICATE AUTHORITY = SECRET ca_cert ); ``` #### Network security {#csr-network-security} If your Confluent Schema Registry server is not exposed to the public internet, you can tunnel the connection through an AWS PrivateLink service (Materialize Cloud) or an SSH bastion host. **AWS PrivateLink (Materialize Cloud):** > **Note:** Connections using AWS PrivateLink is for Materialize Cloud only. ##### Example {#csr-privatelink-example} ```mzsql CREATE CONNECTION privatelink_svc TO AWS PRIVATELINK ( SERVICE NAME 'com.amazonaws.vpce.us-east-1.vpce-svc-0e123abc123198abc', AVAILABILITY ZONES ('use1-az1', 'use1-az4') ); CREATE CONNECTION csr_privatelink TO CONFLUENT SCHEMA REGISTRY ( URL 'http://my-confluent-schema-registry:8081', AWS PRIVATELINK privatelink_svc ); ``` **SSH tunnel:** ##### Example {#csr-ssh-example} ```mzsql CREATE CONNECTION ssh_connection TO SSH TUNNEL ( HOST '', USER '', PORT ); CREATE CONNECTION csr_ssh TO CONFLUENT SCHEMA REGISTRY ( URL 'http://my-confluent-schema-registry:8081', SSH TUNNEL ssh_connection ); ``` ### MySQL A MySQL connection establishes a link to a [MySQL] server. You can use MySQL connections to create [sources](/sql/create-source/mysql). 
#### Syntax {#mysql-syntax}

```mzsql
CREATE CONNECTION TO MYSQL (
    HOST '',
    PORT ,
    USER '',
    PASSWORD SECRET ,
    SSL MODE = { 'disabled' | 'required' | 'verify_ca' | 'verify_identity' },
    SSL CERTIFICATE AUTHORITY = { '' | SECRET },
    SSL CERTIFICATE = { '' | SECRET },
    SSL KEY = SECRET ,
    AWS CONNECTION ,
    AWS PRIVATELINK ,
    SSH TUNNEL
) [WITH ()];
```

| Syntax element | Description |
| --- | --- |
| `` | A name for the connection. |
| `HOST` | *Value:* `text`. Required. Database hostname. |
| `PORT` | *Value:* `integer`. Port number to connect to at the server host. Default: `3306`. |
| `USER` | *Value:* `text`. Required. Database username. |
| `PASSWORD` | *Value:* secret. Password for the connection. |
| `SSL MODE` | *Value:* `text`. Enables SSL connections if set to `required`, `verify_ca`, or `verify_identity`. See the [MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/using-encrypted-connections.html) for more details. Default: `disabled`. |
| `SSL CERTIFICATE AUTHORITY` | *Value:* secret or `text`. The certificate authority (CA) certificate in PEM format. Used for both SSL client and server authentication. If unspecified, uses the system's default CA certificates. |
| `SSL CERTIFICATE` / `SSL KEY` | *Value:* secret or `text` / secret. Client SSL certificate and key in PEM format. |
| `AWS CONNECTION` | *Value:* object name. The name of an [AWS connection](#aws) to use when performing IAM authentication with an Amazon RDS MySQL cluster. Only valid if `SSL MODE` is set to `required`, `verify_ca`, or `verify_identity`. Cannot be combined with `PASSWORD`. |
| `AWS PRIVATELINK` | *Value:* object name. The name of an [AWS PrivateLink connection](#aws-privatelink) to route network traffic through. |
| `SSH TUNNEL` | *Value:* object name. The name of an [SSH tunnel connection](#ssh-tunnel) to route network traffic through.
| `WITH ()` | The following `` are supported: \| Field \| Value \| Description \| \|-------\|-------\|-------------\| \| `VALIDATE` \| `boolean` \| Whether [connection validation](#connection-validation) should be performed on connection creation. Default: `true`. \| |

#### Example {#mysql-example}

```mzsql
CREATE SECRET mysqlpass AS '';

CREATE CONNECTION mysql_connection TO MYSQL (
    HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
    PORT 3306,
    USER 'root',
    PASSWORD SECRET mysqlpass
);
```

#### Network security {#mysql-network-security}

If your MySQL server is not exposed to the public internet, you can tunnel the connection through an AWS PrivateLink service (Materialize Cloud) or an SSH bastion host.

**AWS PrivateLink (Materialize Cloud):**

> **Note:** Connections using AWS PrivateLink are for Materialize Cloud only.

##### Example {#mysql-privatelink-example}

```mzsql
CREATE CONNECTION privatelink_svc TO AWS PRIVATELINK (
    SERVICE NAME 'com.amazonaws.vpce.us-east-1.vpce-svc-0e123abc123198abc',
    AVAILABILITY ZONES ('use1-az1', 'use1-az4')
);

CREATE CONNECTION mysql_connection TO MYSQL (
    HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
    PORT 3306,
    USER 'root',
    PASSWORD SECRET mysqlpass,
    AWS PRIVATELINK privatelink_svc
);
```

For step-by-step instructions on creating AWS PrivateLink connections and configuring an AWS PrivateLink service to accept connections from Materialize, check [this guide](/ops/network-security/privatelink/).

**SSH tunnel:**

##### Example {#mysql-ssh-example}

```mzsql
CREATE CONNECTION tunnel TO SSH TUNNEL (
    HOST 'bastion-host',
    PORT 22,
    USER 'materialize'
);

CREATE CONNECTION mysql_connection TO MYSQL (
    HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
    PORT 3306,
    USER 'root',
    PASSWORD SECRET mysqlpass,
    SSH TUNNEL tunnel
);
```

For step-by-step instructions on creating SSH tunnel connections and configuring an SSH bastion server to accept connections from Materialize, check [this guide](/ops/network-security/ssh-tunnel/).
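TLS settings are independent of the network path chosen above. As a minimal sketch, assuming hypothetical secret names (`mysql_ca_cert`, `mysql_client_cert`, `mysql_client_key`), a hypothetical connection name (`mysql_tls`), and the `mysqlpass` secret from the example above, a connection that requires TLS, verifies the server against a custom CA, and presents a client certificate could be declared with the options from the syntax table:

```mzsql
-- Hypothetical secrets: replace the PEM placeholders with your own values.
CREATE SECRET mysql_ca_cert AS '-----BEGIN CERTIFICATE----- ...';
CREATE SECRET mysql_client_cert AS '-----BEGIN CERTIFICATE----- ...';
CREATE SECRET mysql_client_key AS '-----BEGIN PRIVATE KEY----- ...';

CREATE CONNECTION mysql_tls TO MYSQL (
    HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
    PORT 3306,
    USER 'root',
    PASSWORD SECRET mysqlpass,
    -- Require TLS and verify both the CA chain and the server hostname.
    SSL MODE 'verify_identity',
    SSL CERTIFICATE AUTHORITY = SECRET mysql_ca_cert,
    -- Optional: present a client certificate for TLS client authentication.
    SSL CERTIFICATE = SECRET mysql_client_cert,
    SSL KEY = SECRET mysql_client_key
);
```

Per the MySQL documentation linked in the syntax table, `verify_identity` checks the server hostname in addition to the CA chain; `verify_ca` checks only the CA chain.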
**AWS IAM:**

##### Example {#mysql-aws-connection-example}

```mzsql
CREATE CONNECTION aws_rds_mysql TO AWS (
    ASSUME ROLE ARN = 'arn:aws:iam::000000000000:role/MaterializeRDS',
    REGION = 'us-west-1'
);

CREATE CONNECTION mysql_connection TO MYSQL (
    HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
    PORT 3306,
    USER 'root',
    AWS CONNECTION aws_rds_mysql,
    SSL MODE 'verify_identity'
);
```

### PostgreSQL

A Postgres connection establishes a link to a single database of a [PostgreSQL] server. You can use Postgres connections to create [sources](/sql/create-source/postgres).

#### Syntax {#postgres-syntax}

```mzsql
CREATE CONNECTION TO POSTGRES (
    HOST '',
    PORT ,
    DATABASE '',
    USER '',
    PASSWORD SECRET ,
    SSL MODE = { 'disable' | 'require' | 'verify_ca' | 'verify_full' },
    SSL CERTIFICATE AUTHORITY = { '' | SECRET },
    SSL CERTIFICATE = { '' | SECRET },
    SSL KEY = SECRET ,
    AWS PRIVATELINK ,
    SSH TUNNEL
) [WITH ()];
```

| Syntax element | Description |
| --- | --- |
| `` | A name for the connection. |
| `HOST` | *Value:* `text`. Required. Database hostname. |
| `PORT` | *Value:* `integer`. Port number to connect to at the server host. Default: `5432`. |
| `DATABASE` | *Value:* `text`. Required. Target database. |
| `USER` | *Value:* `text`. Required. Database username. |
| `PASSWORD` | *Value:* secret. Password for the connection. |
| `SSL MODE` | *Value:* `text`. Enables SSL connections if set to `require`, `verify_ca`, or `verify_full`. Default: `disable`. |
| `SSL CERTIFICATE AUTHORITY` | *Value:* secret or `text`. The certificate authority (CA) certificate in PEM format. Used for both SSL client and server authentication. If unspecified, uses the system's default CA certificates. |
| `SSL CERTIFICATE` / `SSL KEY` | *Value:* secret or `text` / secret. Client SSL certificate and key in PEM format. |
| `AWS PRIVATELINK` | *Value:* object name. The name of an [AWS PrivateLink connection](#aws-privatelink) to route network traffic through.
| `SSH TUNNEL` | *Value:* object name. The name of an [SSH tunnel connection](#ssh-tunnel) to route network traffic through. |
| `WITH ()` | The following `` are supported: \| Field \| Value \| Description \| \|-------\|-------\|-------------\| \| `VALIDATE` \| `boolean` \| Whether [connection validation](#connection-validation) should be performed on connection creation. Default: `true`. \| |

#### Example {#postgres-example}

```mzsql
CREATE SECRET pgpass AS '';

CREATE CONNECTION pg_connection TO POSTGRES (
    HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
    PORT 5432,
    USER 'postgres',
    PASSWORD SECRET pgpass,
    SSL MODE 'require',
    DATABASE 'postgres'
);
```

#### Network security {#postgres-network-security}

If your PostgreSQL server is not exposed to the public internet, you can tunnel the connection through an AWS PrivateLink service (Materialize Cloud) or an SSH bastion host.

**AWS PrivateLink (Materialize Cloud):**

> **Note:** Connections using AWS PrivateLink are for Materialize Cloud only.

##### Example {#postgres-privatelink-example}

```mzsql
CREATE CONNECTION privatelink_svc TO AWS PRIVATELINK (
    SERVICE NAME 'com.amazonaws.vpce.us-east-1.vpce-svc-0e123abc123198abc',
    AVAILABILITY ZONES ('use1-az1', 'use1-az4')
);

CREATE CONNECTION pg_connection TO POSTGRES (
    HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
    PORT 5432,
    DATABASE 'postgres',
    USER 'postgres',
    PASSWORD SECRET pgpass,
    AWS PRIVATELINK privatelink_svc
);
```

For step-by-step instructions on creating AWS PrivateLink connections and configuring an AWS PrivateLink service to accept connections from Materialize, check [this guide](/ops/network-security/privatelink/).
**SSH tunnel:**

##### Example {#postgres-ssh-example}

```mzsql
CREATE CONNECTION tunnel TO SSH TUNNEL (
    HOST 'bastion-host',
    PORT 22,
    USER 'materialize'
);

CREATE CONNECTION pg_connection TO POSTGRES (
    HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
    PORT 5432,
    USER 'postgres',
    PASSWORD SECRET pgpass,
    SSH TUNNEL tunnel,
    DATABASE 'postgres'
);
```

For step-by-step instructions on creating SSH tunnel connections and configuring an SSH bastion server to accept connections from Materialize, check [this guide](/ops/network-security/ssh-tunnel/).

### SQL Server

A SQL Server connection establishes a link to a single database of a [SQL Server] instance. You can use SQL Server connections to create [sources](/sql/create-source/sql-server).

#### Syntax {#sql-server-syntax}

```mzsql
CREATE CONNECTION TO SQL SERVER (
    HOST '',
    PORT ,
    DATABASE '',
    USER '',
    PASSWORD SECRET ,
    SSL MODE = { 'disabled' | 'required' | 'verify_ca' | 'verify' },
    SSL CERTIFICATE AUTHORITY = { '' | SECRET }
) [WITH ()];
```

| Syntax element | Description |
| --- | --- |
| `` | A name for the connection. |
| `HOST` | *Value:* `text`. Required. Database hostname. |
| `PORT` | *Value:* `integer`. Port number to connect to at the server host. Default: `1433`. |
| `DATABASE` | *Value:* `text`. Required. Target database. |
| `USER` | *Value:* `text`. Required. Database username. |
| `PASSWORD` | *Value:* secret. Required. Password for the connection. |
| `SSL MODE` | *Value:* `text`. Enables SSL connections if set to `required`, `verify_ca`, or `verify`. See the [SQL Server documentation](https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-sql-server-encryption) for more details. `disabled`: no encryption. `required`: encryption required, no certificate validation. `verify`: encryption required; the server certificate is validated using the CA certificates configured in the OS. `verify_ca`: encryption required; the server certificate is validated using the provided CA certificates (requires `SSL CERTIFICATE AUTHORITY`). Default: `disabled`.
| `SSL CERTIFICATE AUTHORITY` | *Value:* secret or `text`. One or more certificate authority (CA) certificates in PEM format, used to validate the server certificate when `SSL MODE` is set to `verify_ca`. |
| `WITH ()` | The following `` are supported: \| Field \| Value \| Description \| \|-------\|-------\|-------------\| \| `VALIDATE` \| `boolean` \| Whether [connection validation](#connection-validation) should be performed on connection creation. Default: `true`. \| |

#### Example {#sql-server-example}

```mzsql
CREATE SECRET sqlserver_pass AS '';

CREATE CONNECTION sqlserver_connection TO SQL SERVER (
    HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
    PORT 1433,
    USER 'SA',
    PASSWORD SECRET sqlserver_pass,
    DATABASE 'my_db'
);
```

### Iceberg Catalog

> **Public Preview:** This feature is in public preview.

An Iceberg catalog connection establishes a link to an [Apache Iceberg](https://iceberg.apache.org/) catalog. You can use Iceberg catalog connections to create [Iceberg sinks](/sql/create-sink/iceberg).

#### Syntax {#iceberg-catalog-syntax}

```mzsql
CREATE CONNECTION TO ICEBERG CATALOG (
    CATALOG TYPE = 's3tablesrest',
    URL = '',
    WAREHOUSE = '',
    AWS CONNECTION =
);
```

| Syntax element | Description |
| --- | --- |
| `` | A name for the connection. |
| `CATALOG TYPE` | *Value:* `text`. Required. The type of Iceberg catalog. Currently only `'s3tablesrest'` (AWS S3 Tables) is supported. |
| `URL` | *Value:* `text`. Required. The URL of the Iceberg catalog endpoint. For AWS S3 Tables, use `https://s3tables..amazonaws.com/iceberg`. |
| `WAREHOUSE` | *Value:* `text`. Required. The ARN of the S3 Tables bucket: `arn:aws:s3tables:::bucket/`. |
| `AWS CONNECTION` | *Value:* object name. Required. The name of an [AWS connection](#aws) to use for authentication.
#### Example {#iceberg-catalog-example}

```mzsql
-- First, create an AWS connection for authentication
CREATE CONNECTION aws_connection TO AWS (ASSUME ROLE ARN = 'arn:aws:iam::123456789012:role/MaterializeIceberg');

-- Create the Iceberg catalog connection
CREATE CONNECTION iceberg_catalog TO ICEBERG CATALOG (
    CATALOG TYPE = 's3tablesrest',
    URL = 'https://s3tables.us-east-1.amazonaws.com/iceberg',
    WAREHOUSE = 'arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket',
    AWS CONNECTION = aws_connection
);
```

For more information about using Iceberg sinks, see the [Iceberg sink documentation](/serve-results/sink/iceberg/).

## Network security connections

### AWS PrivateLink (Materialize Cloud) {#aws-privatelink}

> **Note:** Connections using AWS PrivateLink are for Materialize Cloud only.

An AWS PrivateLink connection establishes a link to an [AWS PrivateLink] service. You can use AWS PrivateLink connections in [Confluent Schema Registry connections](#confluent-schema-registry), [Kafka connections](#kafka), [MySQL connections](#mysql), and [Postgres connections](#postgresql).

#### Syntax {#aws-privatelink-syntax}

```mzsql
CREATE CONNECTION TO AWS PRIVATELINK (
    SERVICE NAME '',
    AVAILABILITY ZONES ('', '', ...)
);
```

| Syntax element | Description |
| --- | --- |
| `` | A name for the connection. |
| `SERVICE NAME` | *Value:* `text`. Required. The name of the AWS PrivateLink service. |
| `AVAILABILITY ZONES` | *Value:* `text[]`. Required. The IDs of the AWS availability zones in which the service is accessible. |

#### Permissions {#aws-privatelink-permissions}

Materialize assigns a unique principal to each AWS PrivateLink connection in your region using an Amazon Resource Name (ARN) of the following form:

```
arn:aws:iam::664411391173:role/mz__
```

After creating the connection, you must configure the AWS PrivateLink service to accept connections from the AWS principal Materialize will connect as.
The principals for AWS PrivateLink connections in your region are stored in the [`mz_aws_privatelink_connections`](/reference/system-catalog/mz_catalog/#mz_aws_privatelink_connections) system table.

```mzsql
SELECT * FROM mz_aws_privatelink_connections;
```

```
 id     | principal
--------+---------------------------------------------------------------------------
 u1     | arn:aws:iam::664411391173:role/mz_20273b7c-2bbe-42b8-8c36-8cc179e9bbc3_u1
 u7     | arn:aws:iam::664411391173:role/mz_20273b7c-2bbe-42b8-8c36-8cc179e9bbc3_u7
```

For more details on configuring a trusted principal for your AWS PrivateLink service, see the [AWS PrivateLink documentation](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#add-remove-permissions).

> **Warning:** Do **not** grant access to the root principal for the Materialize AWS account.
> Doing so will allow any Materialize customer to create a connection to your
> AWS PrivateLink service.

#### Accepting connection requests {#aws-privatelink-requests}

If your AWS PrivateLink service is configured to require acceptance of connection requests, you must additionally approve the connection request from Materialize after creating the connection. For more details on manually accepting connection requests, see the [AWS PrivateLink documentation](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#accept-reject-connection-requests).

#### Example {#aws-privatelink-example}

```mzsql
CREATE CONNECTION privatelink_svc TO AWS PRIVATELINK (
    SERVICE NAME 'com.amazonaws.vpce.us-east-1.vpce-svc-0e123abc123198abc',
    AVAILABILITY ZONES ('use1-az1', 'use1-az4')
);
```

### SSH tunnel

An SSH tunnel connection establishes a link to an SSH bastion server. You can use SSH tunnel connections in [Confluent Schema Registry connections](#confluent-schema-registry), [Kafka connections](#kafka), [MySQL connections](#mysql), and [Postgres connections](#postgresql).
#### Syntax {#ssh-tunnel-syntax}

```mzsql
CREATE CONNECTION TO SSH TUNNEL (
    HOST '',
    PORT ,
    USER ''
);
```

| Syntax element | Description |
| --- | --- |
| `` | A name for the connection. |
| `HOST` | *Value:* `text`. Required. The hostname of the SSH bastion server. |
| `PORT` | *Value:* `integer`. Required. The port to connect to. |
| `USER` | *Value:* `text`. Required. The name of the user to connect as. |

#### Key pairs {#ssh-tunnel-keypairs}

Materialize automatically manages the key pairs for an SSH tunnel connection. Each connection is associated with two key pairs. The private key for each key pair is stored securely within your region and cannot be retrieved. The public key for each key pair is stored in the [`mz_ssh_tunnel_connections`] system table.

When Materialize connects to the SSH bastion server, it presents both keys for authentication. To allow key pair rotation without downtime, you should configure your SSH bastion server to accept both key pairs. You can then **rotate the key pairs** using [`ALTER CONNECTION`].

Materialize currently generates SSH key pairs using the [Ed25519 algorithm], which is fast, secure, and [recommended by security professionals][latacora-crypto]. Some legacy SSH servers do not support the Ed25519 algorithm; these servers cannot be used with Materialize's SSH tunnel connections.

We routinely evaluate the security of the cryptographic algorithms in use in Materialize. Future versions of Materialize may use a different SSH key generation algorithm as security best practices evolve.
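The zero-downtime rotation workflow described above can be sketched as follows. This assumes the `ssh_connection` created in the examples below and the `ROTATE KEYS` action of [`ALTER CONNECTION`]; check that page for the exact semantics of which key pair is replaced on each rotation.

```mzsql
-- 1. Confirm that the bastion's authorized_keys file lists both public keys.
SELECT public_key_1, public_key_2
FROM mz_ssh_tunnel_connections
JOIN mz_connections USING (id)
WHERE mz_connections.name = 'ssh_connection';

-- 2. Rotate the key pairs managed by Materialize.
ALTER CONNECTION ssh_connection ROTATE KEYS;

-- 3. Re-run the query from step 1 and update authorized_keys on the bastion
--    so that it lists the public keys currently reported for the connection.
```

Because the bastion accepts both keys before the rotation, at least one key that Materialize presents remains authorized while you update the server.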
#### Examples {#ssh-tunnel-example}

Create an SSH tunnel connection:

```mzsql
CREATE CONNECTION ssh_connection TO SSH TUNNEL (
    HOST 'bastion-host',
    PORT 22,
    USER 'materialize'
);
```

Retrieve the public keys for the SSH tunnel connection you just created:

```mzsql
SELECT mz_ssh_tunnel_connections.*
FROM mz_connections
JOIN mz_ssh_tunnel_connections USING (id)
WHERE mz_connections.name = 'ssh_connection';
```

```
 id    | public_key_1                          | public_key_2
-------+---------------------------------------+---------------------------------------
 ...   | ssh-ed25519 AAAA...76RH materialize   | ssh-ed25519 AAAA...hLYV materialize
```

## Connection validation {#connection-validation}

Materialize automatically validates the connection and authentication parameters for most connection types on connection creation:

| Connection type           | Validated by default |
| ------------------------- | -------------------- |
| AWS                       |                      |
| Kafka                     | ✓                    |
| Confluent Schema Registry | ✓                    |
| MySQL                     | ✓                    |
| PostgreSQL                | ✓                    |
| SSH Tunnel                |                      |
| AWS PrivateLink           |                      |

For connection types that are validated by default, if the validation step fails, the connection is not created and a validation error is returned. You can disable connection validation by setting the `VALIDATE` option to `false`. This is useful, for example, when the parameters are known to be correct but the external system is unavailable at the time of creation.

Connection types that require additional setup steps after creation, like AWS and SSH tunnel connections, can be **manually validated** using the [`VALIDATE CONNECTION`](/sql/validate-connection) syntax once all setup steps are completed.

## Privileges

The privileges required to execute this statement are:

- `CREATE` privileges on the containing schema.
- `USAGE` privileges on all connections and secrets used in the connection definition.
- `USAGE` privileges on the schemas that all connections and secrets in the statement are contained in.
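Both options described under [Connection validation](#connection-validation) can be sketched as follows, assuming the `pgpass` secret from the [PostgreSQL example](#postgres-example), a hypothetical connection name `pg_unvalidated`, and an upstream server that is temporarily unreachable:

```mzsql
-- Create the connection without validating it; the parameters are stored
-- as given, so a typo would only surface later.
CREATE CONNECTION pg_unvalidated TO POSTGRES (
    HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
    PORT 5432,
    USER 'postgres',
    PASSWORD SECRET pgpass,
    SSL MODE 'require',
    DATABASE 'postgres'
) WITH (VALIDATE = false);

-- Later, once the server is reachable, validate the stored parameters.
-- The same statement applies to AWS and SSH tunnel connections after
-- their post-creation setup steps are complete.
VALIDATE CONNECTION pg_unvalidated;
```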
## Related pages

- [`CREATE SECRET`](/sql/create-secret)
- [`CREATE SOURCE`](/sql/create-source)
- [`CREATE SINK`](/sql/create-sink)

[AWS PrivateLink]: https://aws.amazon.com/privatelink/
[Confluent Schema Registry]: https://docs.confluent.io/platform/current/schema-registry/index.html#sr-overview
[Kafka]: https://kafka.apache.org
[MySQL]: https://www.mysql.com/
[PostgreSQL]: https://www.postgresql.org
[SQL Server]: https://www.microsoft.com/en-us/sql-server
[`ALTER CONNECTION`]: /sql/alter-connection
[`CREATE SOURCE`]: /sql/create-source
[`CREATE SINK`]: /sql/create-sink
[`mz_aws_privatelink_connections`]: /reference/system-catalog/mz_catalog/#mz_aws_privatelink_connections
[`mz_connections`]: /reference/system-catalog/mz_catalog/#mz_connections
[`mz_ssh_tunnel_connections`]: /reference/system-catalog/mz_catalog/#mz_ssh_tunnel_connections
[Ed25519 algorithm]: https://ed25519.cr.yp.to
[latacora-crypto]: https://latacora.micro.blog/2018/04/03/cryptographic-right-answers.html
[trust policy]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_terms-and-concepts.html#term_trust-policy
[external ID]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html

---

## CREATE DATABASE

Use `CREATE DATABASE` to create a new database.

## Syntax

```mzsql
CREATE DATABASE [IF NOT EXISTS] ;
```

| Syntax element | Description |
| --- | --- |
| `IF NOT EXISTS` | If specified, do not generate an error if a database of the same name already exists. If not specified, throw an error if a database of the same name already exists. |
| `` | A name for the database. |

## Details

Databases can contain schemas. By default, each database has a schema called `public`. For more information about databases, see [Namespaces](/sql/namespaces).
## Examples

```mzsql
CREATE DATABASE IF NOT EXISTS my_db;
```

```mzsql
SHOW DATABASES;
```

```nofmt
materialize
my_db
```

## Privileges

The privileges required to execute this statement are:

- `CREATEDB` privileges on the system.

## Related pages

- [`DROP DATABASE`](../drop-database)
- [`SHOW DATABASES`](../show-databases)

---

## CREATE INDEX

`CREATE INDEX` creates an in-memory [index](/concepts/indexes/) on a source, view, or materialized view.

In Materialize, indexes store query results in memory within a specific [cluster](/concepts/clusters/), and keep these results **incrementally updated** as new data arrives. This ensures that indexed data remains [fresh](/concepts/reaction-time), reflecting the latest changes with minimal latency.

The primary use case for indexes is to accelerate direct queries issued via [`SELECT`](/sql/select/) statements. By maintaining fresh, up-to-date results in memory, indexes can significantly [optimize query performance](/transform-data/optimization/), reducing both response time and compute load, especially for resource-intensive operations such as joins, aggregations, and repeated subqueries.

Because indexes are scoped to a single cluster, they are most useful for accelerating queries within that cluster. For results that must be shared across clusters or persisted to durable storage, consider using a [materialized view](/sql/create-materialized-view), which also maintains fresh results but is accessible system-wide.

## Syntax

**CREATE INDEX:**

### Create index

Create an index using the specified columns as the index key.

```mzsql
CREATE INDEX [] [IN CLUSTER ] ON [USING ] (, ...) [WITH ()];
```

| Syntax element | Description |
| --- | --- |
| `` | A name for the index. |
| `IN CLUSTER ` | The [cluster](/sql/create-cluster) to maintain this index. If not specified, defaults to the active cluster. |
| `` | The name of the source, view, or materialized view on which you want to create an index.
| `USING ` | The name of the index method to use. The only supported method is [`arrangement`](/overview/arrangements). |
| `(, ...)` | The expressions to use as the key for the index. |
| `WITH ([,...])` | The following `` is supported: \| Option \| Description \| \|----------------------------\|-------------\| \| `RETAIN HISTORY FOR` \| ***Private preview.** This option has known performance or stability issues and is under active development.* Duration for which Materialize retains historical data, which is useful to implement [durable subscriptions](/transform-data/patterns/durable-subscriptions/#history-retention-period). **Note:** Configuring indexes to retain history is not recommended. Instead, consider creating a materialized view for your subscription query and configuring the history retention period on the view. See [durable subscriptions](/transform-data/patterns/durable-subscriptions/#history-retention-period). Accepts positive [interval](/sql/types/interval/) values (e.g. `'1hr'`). Default: `1s`. \| |

**CREATE DEFAULT INDEX:**

### Create default index

Create a default index using a set of columns that uniquely identify each row. If this set of columns cannot be inferred, all columns are used.

```mzsql
CREATE DEFAULT INDEX [IN CLUSTER ] ON [USING ] [WITH ()];
```

| Syntax element | Description |
| --- | --- |
| `IN CLUSTER ` | The [cluster](/sql/create-cluster) to maintain this index. If not specified, defaults to the active cluster. |
| `` | The name of the source, view, or materialized view on which you want to create an index. |
| `USING ` | The name of the index method to use. The only supported method is [`arrangement`](/overview/arrangements).
| `WITH ([,...])` | The following `` is supported: \| Option \| Description \| \|----------------------------\|-------------\| \| `RETAIN HISTORY FOR` \| ***Private preview.** This option has known performance or stability issues and is under active development.* Duration for which Materialize retains historical data, which is useful to implement [durable subscriptions](/transform-data/patterns/durable-subscriptions/#history-retention-period). **Note:** Configuring indexes to retain history is not recommended. Instead, consider creating a materialized view for your subscription query and configuring the history retention period on the view. See [durable subscriptions](/transform-data/patterns/durable-subscriptions/#history-retention-period). Accepts positive [interval](/sql/types/interval/) values (e.g. `'1hr'`). Default: `1s`. \| |

## Details

### Restrictions

- You can only reference the columns available in the `SELECT` list of the query that defines the view. For example, if your view was defined as `SELECT a, b FROM src`, you can only reference columns `a` and `b`, even if `src` contains additional columns.
- You cannot exclude any columns from being in the index's "value" set. For example, if your view is defined as `SELECT a, b FROM ...`, all indexes will contain `{a, b}` as their values. If you want to create an index that only stores a subset of these columns, consider creating another materialized view that uses `SELECT some_subset FROM this_view...`.

### Structure

Indexes in Materialize have the following structure for each unique row:

```nofmt
((tuple of indexed expressions), (tuple of the row, i.e. stored columns))
```

#### Indexed expressions vs. stored columns

Automatically created indexes use all columns as key expressions for the index, unless a unique key for the source or view is either provided to Materialize or inferred by it. For instance, unique keys can be...

- **Provided** by the schema provided for the source, e.g. through the Confluent Schema Registry.
- **Inferred** when the query...
  - Concludes with a `GROUP BY`.
  - Uses sources or views that have a unique key without damaging this property. For example, joining a view that has a unique key against a second view, where the join condition uses foreign keys.

When creating your own indexes, you can choose the indexed expressions.

### Memory footprint

The in-memory sizes of indexes are proportional to the current size of the source or view they represent. The actual amount of memory required depends on several details related to the rate of compaction and the representation of the types of data in the source or view.

Creating an index may also force the first materialization of a view, which may cause Materialize to install a dataflow to determine and maintain the results of the view. This dataflow may have a memory footprint itself, in addition to that of the index.

#### Best practices

Before creating an index, consider the following:

- If you create stacked views (i.e., views that depend on other views) to reduce SQL complexity, we recommend that you create an index only on the view that will serve results, taking into account the expected data access patterns.
- Materialize can reuse indexes across queries that concurrently access the same data in memory, which reduces redundancy and resource utilization per query. In particular, this means that joins do not need to store data in memory multiple times.
- For queries that have no supporting indexes, Materialize uses the same mechanics used by indexes to optimize computations. However, because this underlying work is discarded after each query run, take into account the expected data access patterns to decide whether an index is worthwhile.
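For example, with two stacked views only the view that serves results gets an index. This is a sketch assuming hypothetical `orders` and `customers` relations with the columns shown:

```mzsql
-- Building block: never queried directly, so it is left unindexed.
CREATE VIEW orders_enriched AS
SELECT o.id, o.customer_id, o.amount, c.region
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id;

-- Serving view: the one applications query.
CREATE VIEW revenue_by_region AS
SELECT region, sum(amount) AS revenue
FROM orders_enriched
GROUP BY region;

-- Index only the serving view, keyed on the column used for lookups.
CREATE INDEX revenue_by_region_idx ON revenue_by_region (region);

SELECT revenue FROM revenue_by_region WHERE region = 'EU';
```

The index still maintains the whole stack incrementally, because the dataflow behind it computes `orders_enriched` as an intermediate step.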

### Usage patterns

#### Indexes on views vs. materialized views

In Materialize, both indexes on views and materialized views incrementally update the view results when Materialize ingests new data. Whereas materialized views persist the view results in durable storage and can be accessed across clusters, indexes on views compute and store view results in memory within a single cluster.

Some general guidelines for usage patterns include:

| Usage pattern | General guideline |
| --- | --- |
| View results are accessed from a single cluster only, such as in a 1-cluster or a 2-cluster architecture. | View with an index |
| View used as a building block for stacked views, i.e., views not used to serve results. | View |
| View results are accessed across clusters, such as in a 3-cluster architecture. | Materialized view (in the transform cluster) and an index on the materialized view (in the serving cluster) |
| Use with a sink or a `SUBSCRIBE` operation | Materialized view |
| Use with temporal filters | Materialized view |
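As a sketch of the 3-cluster pattern in the table, assuming hypothetical clusters named `transform` and `serving` and a hypothetical `orders` table with `customer_id` and `amount` columns:

```mzsql
-- Maintain and persist the results in the transform cluster.
CREATE MATERIALIZED VIEW customer_totals
IN CLUSTER transform AS
SELECT customer_id, sum(amount) AS total_spent
FROM orders
GROUP BY customer_id;

-- Keep an in-memory copy, keyed by customer, in the serving cluster.
CREATE INDEX customer_totals_idx
IN CLUSTER serving
ON customer_totals (customer_id);

-- Point lookups issued from the serving cluster can use the index.
SET CLUSTER = serving;
SELECT total_spent FROM customer_totals WHERE customer_id = 42;
```

The materialized view remains queryable from any cluster; only queries running in `serving` benefit from the index.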

#### Indexes and query optimizations

You might want to create indexes when...

- You want to use non-primary keys (e.g. foreign keys) as a join condition. In this case, you could create an index on the columns in the join condition.
- You want to speed up searches filtering by literal values or expressions.

Specific instances where indexes can be useful to improve performance include:

- When used in ad-hoc queries.
- When used by multiple queries within the same cluster.
- When used to enable delta joins.

For more information, see [Optimization](/transform-data/optimization/).
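To see how indexes affect a multi-way join, you can inspect the query plan with `EXPLAIN`. The sketch below assumes hypothetical `orders`, `customers`, and `products` relations. Indexing every input on its join key may let the optimizer plan a delta join, but the choice is the optimizer's, so treat the plan output as the thing to verify rather than a guarantee.

```mzsql
-- Index each input on the key it is joined on.
CREATE INDEX orders_customer_idx ON orders (customer_id);
CREATE INDEX orders_product_idx ON orders (product_id);
CREATE INDEX customers_id_idx ON customers (id);
CREATE INDEX products_id_idx ON products (id);

-- Inspect which join strategy the optimizer picks for this query.
EXPLAIN
SELECT c.region, p.category, o.amount
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
JOIN products AS p ON o.product_id = p.id;
```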

## Examples

### Optimizing joins with indexes

You can optimize the performance of `JOIN` on two relations by ensuring their join keys are the key columns in an index.

```mzsql
CREATE MATERIALIZED VIEW active_customers AS
    SELECT guid, geo_id, last_active_on
    FROM customer_source
    -- Temporal filter: keep customers active in the last 30 days.
    WHERE mz_now() <= last_active_on + INTERVAL '30 days';

CREATE INDEX active_customers_geo_idx ON active_customers (geo_id);

CREATE MATERIALIZED VIEW active_customer_per_geo AS
    SELECT geo.name, count(*)
    FROM geo_regions AS geo
    JOIN active_customers ON active_customers.geo_id = geo.id
    GROUP BY geo.name;
```

In the above example, the index `active_customers_geo_idx`...

- Helps us because it contains a key that the view `active_customer_per_geo` can use to look up values for the join condition (`active_customers.geo_id`). Because this index is exactly what the query requires, the Materialize optimizer will choose to use `active_customers_geo_idx` rather than build and maintain a private copy of the index just for this query.
- Obeys the [restrictions](#restrictions) above, because its key (`geo_id`) references only columns in the view's result set.

### Speed up filtering with indexes

If you commonly filter by a certain column being equal to a literal value, you can set up an index over that column to speed up your queries:

```mzsql
CREATE MATERIALIZED VIEW active_customers AS
    SELECT guid, geo_id, last_active_on
    FROM customer_source;

CREATE INDEX active_customers_idx ON active_customers (guid);

-- This should now be very fast!
SELECT * FROM active_customers WHERE guid = 'd868a5bf-2430-461d-a665-40418b1125e7';

-- Using indexed expressions:
CREATE INDEX active_customers_exp_idx ON active_customers (upper(guid));

SELECT * FROM active_customers WHERE upper(guid) = 'D868A5BF-2430-461D-A665-40418B1125E7';

-- Filter using an expression in one field and a literal in another field:
CREATE INDEX active_customers_exp_field_idx ON active_customers (upper(guid), geo_id);

SELECT * FROM active_customers WHERE upper(guid) = 'D868A5BF-2430-461D-A665-40418B1125E7' AND geo_id = 'ID_8482';
```

Create an index on an expression to speed up queries that frequently filter on that expression, such as `upper()` in the example above, instead of building a downstream view just to apply the function. Note that aggregations like `count()` cannot be used as indexed expressions.

For more details on using indexes to optimize queries, see [Optimization](/transform-data/optimization/).

## Privileges

The privileges required to execute this statement are:

- Ownership of the object on which to create the index.
- `CREATE` privileges on the containing schema.
- `CREATE` privileges on the containing cluster.
- `USAGE` privileges on all types used in the index definition.
- `USAGE` privileges on the schemas that all types in the statement are contained in.

## Related pages

- [`SHOW INDEXES`](../show-indexes)
- [`DROP INDEX`](../drop-index)

---

## CREATE MATERIALIZED VIEW

Use `CREATE MATERIALIZED VIEW` to:

- Create a materialized view that maintains [fresh results](/concepts/reaction-time) by persisting them in durable storage and incrementally updating them as new data arrives.
- Create a replacement for an existing materialized view that can be applied in place with [`ALTER MATERIALIZED VIEW ... APPLY REPLACEMENT`](/sql/alter-materialized-view/).

Materialized views are particularly useful when you need **cross-cluster access** to results or want to sink data to external systems like [Kafka](/sql/create-sink).
When you create a materialized view, it is associated with a [cluster](/concepts/clusters/) that is responsible for maintaining it, but its results can be **queried from any cluster**. This allows you to separate the compute resources used for view maintenance from those used for serving queries.

If you do not need durability or cross-cluster sharing, and you are primarily interested in fast query performance within a single cluster, you may prefer to [create a view and index it](/concepts/views/#views). In Materialize, [indexes on views](/concepts/indexes/) also maintain results incrementally, but store them in memory, scoped to the cluster where the index was created. This approach offers lower latency for direct querying within that cluster.

## Syntax

**CREATE MATERIALIZED VIEW:**

### Create materialized view

```mzsql
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [(, ...)]
[IN CLUSTER ]
[WITH ()]
AS ;
```

| Syntax element | Description |
| --- | --- |
| `IF NOT EXISTS` | If specified, do not generate an error if a materialized view of the same name already exists. |
| `` | A name for the materialized view. |
| `(, ...)` | Rename the `SELECT` statement's columns to the list of identifiers. Both must be the same length. Note that this is required for statements that return multiple columns with the same identifier. |
| `IN CLUSTER ` | The cluster to maintain this materialized view. If not specified, defaults to the active cluster. |
| `WITH ()` | The following `` are supported: \| Field \| Value \| Description \| \|-------\|-------\|-------------\| \| `ASSERT NOT NULL` *col_ident* \| `text` \| The column identifier for which to create a [non-null assertion](#non-null-assertions). To specify multiple columns, use the option multiple times. \| \| `PARTITION BY` *columns* \| `(ident [, ident]*)` \| The key by which Materialize should internally partition this durable collection. See the [partitioning guide](/transform-data/patterns/partition-by/) for restrictions on valid values and other details. \| \| `RETAIN HISTORY FOR` *retention_period* \| `interval` \| ***Private preview.*** Duration for which Materialize retains historical data, which is useful to implement [durable subscriptions](/transform-data/patterns/durable-subscriptions/#history-retention-period). Accepts positive [interval](/sql/types/interval/) values (e.g. `'1hr'`). Default: `1s`. \| \| `REFRESH` *refresh_strategy* \| \| ***Private preview.*** The refresh strategy for the materialized view. See [Refresh strategies](#refresh-strategies) for syntax options. Default: `ON COMMIT`. \| |
| `` | The [`SELECT` statement](/sql/select) whose results you want to keep incrementally updated. |

**CREATE REPLACEMENT MATERIALIZED VIEW:**

### Create replacement materialized view

> **Public Preview:** This feature is in public preview.

Create a replacement materialized view for an existing materialized view.

```mzsql
CREATE REPLACEMENT MATERIALIZED VIEW FOR
[IN CLUSTER ]
[WITH ()]
AS ;
```

| Syntax element | Description |
| --- | --- |
| `` | A name for the replacement materialized view. |
| `` | The name of the existing materialized view to be replaced. The replacement materialized view can only be applied to this materialized view. |
| `IN CLUSTER ` | The cluster to maintain this replacement materialized view. If not specified, defaults to the active cluster. |
| `WITH ()` | Same options as `CREATE MATERIALIZED VIEW`. |
| `` | The [`SELECT` statement](/sql/select) for the replacement view. The statement must produce the same output schema as the target materialized view; i.e., column names, column types, column order, nullability, and keys must all match. |

The created replacement materialized view starts hydrating immediately and can later be applied to replace the specified materialized view.
For more information, see [Creating replacement materialized views](#creating-replacement-materialized-views).

## Details

### Usage pattern

In Materialize, both indexes on views and materialized views incrementally update the view results when Materialize ingests new data. Whereas materialized views persist the view results in durable storage and can be accessed across clusters, indexes on views compute and store view results in memory within a single cluster.

Some general guidelines for usage patterns include:

| Usage pattern | General guideline |
| --- | --- |
| View results are accessed from a single cluster only, such as in a 1-cluster or a 2-cluster architecture. | View with an index |
| View used as a building block for stacked views; i.e., views not used to serve results. | View |
| View results are accessed across clusters, such as in a 3-cluster architecture. | Materialized view (in the transform cluster) and an index on the materialized view (in the serving cluster) |
| Use with a sink or a `SUBSCRIBE` operation | Materialized view |
| Use with temporal filters | Materialized view |
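The two most common rows of the table can be sketched as follows. The clusters `quickstart`, `transform` and `serving` and the relation `orders` with its `customer_id` and `amount` columns are hypothetical and assumed to exist already. The first pattern keeps results in memory in a single cluster. The second persists results through the transform cluster and indexes them in the serving cluster.

```mzsql
-- Pattern 1: results accessed from a single cluster -> view + index.
CREATE VIEW order_totals_v AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id;
CREATE INDEX order_totals_v_idx IN CLUSTER quickstart ON order_totals_v (customer_id);

-- Pattern 2: results accessed across clusters -> materialized view maintained
-- in the transform cluster, indexed in the cluster that serves queries.
CREATE MATERIALIZED VIEW order_totals_mv IN CLUSTER transform AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id;
CREATE INDEX order_totals_mv_idx IN CLUSTER serving ON order_totals_mv (customer_id);
```

Queries run from `serving` can use `order_totals_mv_idx`. Queries run from any other cluster read `order_totals_mv` from storage unless that cluster has its own index on it.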

### Indexing materialized views

Although you can query a materialized view directly, these queries will be issued against Materialize's storage layer. This is expected to be fast, but still slower than reading from memory. To improve the speed of queries on materialized views, we recommend creating [indexes](../create-index) based on common query patterns.

It's important to keep in mind that indexes are **local** to a cluster, and maintained in memory. As an example, if you create a materialized view and build an index on it in the `quickstart` cluster, querying the view from a different cluster will _not_ use the index; you should create the appropriate indexes in each cluster from which you query the materialized view.

[//]: # "TODO(morsapaes) Point to relevant operational guide on indexes once this exists+add detail about using indexes to optimize materialized view stacking."

### Non-null assertions

Because materialized views may be created on arbitrary queries, Materialize cannot always automatically infer that a column can never be null, even when that is in fact the case. In such cases, you can use `ASSERT NOT NULL` clauses as described in the syntax section above. Specifying `ASSERT NOT NULL` for a column forces that column's type in the materialized view to include `NOT NULL`. If this clause is used erroneously, and a `NULL` value is in fact produced in a column for which `ASSERT NOT NULL` was specified, querying the materialized view will produce an error until the offending row is deleted.

### Refresh strategies

Materialized views in Materialize are incrementally maintained by default, meaning their results are automatically updated as soon as new data arrives. This guarantees that queries return the most up-to-date information available with minimal delay and that results are always as [fresh](/concepts/reaction-time) as the input data itself. In most cases, this default behavior is ideal.
However, in some very specific scenarios, like reporting over slowly changing historical data, it may be acceptable to relax freshness in order to reduce compute usage. For these cases, Materialize supports refresh strategies, which allow you to configure a materialized view to recompute itself on a fixed schedule rather than maintaining it incrementally.

> **Note:** The use of refresh strategies is discouraged unless you have a clear and measurable need to reduce maintenance costs on stale or archival data. For most use cases, the default incremental maintenance model provides a better experience.

[//]: # "TODO(morsapaes) We should add a SQL pattern that walks through a full-blown example of how to implement the cold, warm, hot path with refresh strategies."

#### Refresh on commit

**Syntax:** `REFRESH ON COMMIT`
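Because `REFRESH ON COMMIT` is the default, the two statements below should behave the same, and spelling the option out is optional. The relation `orders` and its `customer_id` column are hypothetical.

```mzsql
-- Implicit default: incrementally maintained (REFRESH ON COMMIT).
CREATE MATERIALIZED VIEW order_counts AS
    SELECT customer_id, count(*) AS order_count
    FROM orders
    GROUP BY customer_id;

-- Equivalent, with the default refresh strategy spelled out explicitly.
CREATE MATERIALIZED VIEW order_counts_explicit
WITH (REFRESH ON COMMIT) AS
    SELECT customer_id, count(*) AS order_count
    FROM orders
    GROUP BY customer_id;
```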

Materialized views in Materialize are incrementally updated by default. This means that as soon as new data arrives in the system, any dependent materialized views are automatically and continuously updated. This behavior, known as **refresh on commit**, ensures that the view's contents are always as fresh as the underlying data.

**`REFRESH ON COMMIT` is:**

* **Generally available**
* The **default behavior** for all materialized views
* **Implicit** and does not need to be manually specified
* **Strongly recommended** for the vast majority of use cases

With `REFRESH ON COMMIT`, Materialize provides low-latency, up-to-date results without requiring user-defined schedules or manual refreshes. This model is ideal for most workloads, including streaming analytics, live dashboards, customer-facing queries, and applications that rely on timely, accurate results.

Only in rare cases, such as batch-oriented processing or reporting over slowly changing historical datasets, might it make sense to trade off freshness for potential cost savings. In such cases, consider defining an explicit [refresh strategy](#refresh-strategies) to control when recomputation occurs.

#### Refresh at

**Syntax:** `REFRESH AT { CREATION | timestamp }`

This strategy allows configuring a materialized view to **refresh at a specific time**. The refresh time can be specified as a timestamp, or using the `AT CREATION` clause, which triggers a first refresh when the materialized view is created.

**Example**

To create a materialized view that is refreshed at creation, and then at the specified times:

```mzsql
CREATE MATERIALIZED VIEW mv_refresh_at
IN CLUSTER my_scheduled_cluster
WITH (
  -- Refresh at creation, so the view is populated ahead of
  -- the first user-specified refresh time
  REFRESH AT CREATION,
  -- Refresh at a user-specified (future) time
  REFRESH AT '2024-06-06 12:00:00',
  -- Refresh at another user-specified (future) time
  REFRESH AT '2024-06-08 22:00:00'
)
AS SELECT ... FROM ...;
```

You can specify multiple `REFRESH AT` strategies in the same `CREATE` statement, and combine them with the [`REFRESH EVERY` strategy](#refresh-every).

#### Refresh every

**Syntax:** `REFRESH EVERY interval [ ALIGNED TO timestamp ]`

This strategy allows configuring a materialized view to **refresh at regular intervals**. The `ALIGNED TO` clause additionally allows specifying the _phase_ of the scheduled refreshes: for daily refreshes, it specifies the time of the day when the refresh will happen; for weekly refreshes, it specifies the day of the week and the time of the day when the refresh will happen. If `ALIGNED TO` is not specified, it defaults to the time when the materialized view is created.

**Example**

To create a materialized view that is refreshed at creation, and then once a day at 10PM UTC:

```mzsql
CREATE MATERIALIZED VIEW mv_refresh_every
IN CLUSTER my_scheduled_cluster
WITH (
  -- Refresh at creation, so the view is populated ahead of
  -- the first user-specified refresh time
  REFRESH AT CREATION,
  -- Refresh every day at 10PM UTC
  REFRESH EVERY '1 day' ALIGNED TO '2024-06-06 22:00:00'
)
AS SELECT ...;
```

You can specify multiple `REFRESH EVERY` strategies in the same `CREATE` statement, and combine them with the [`REFRESH AT` strategy](#refresh-at). When using this strategy, we recommend **always** using the [`REFRESH AT CREATION`](#refresh-at) clause, so the materialized view is available for querying ahead of the first user-specified refresh time.

#### Querying materialized views with refresh strategies

Materialized views configured with [`REFRESH EVERY` strategies](#refresh-every) have a period of unavailability around the scheduled refresh times; during this period, the view **will not return any results**. To minimize unavailability during the refresh operation, you must host these views in [**scheduled clusters**](/sql/create-cluster/#scheduling), which can be configured to automatically [turn on ahead of the scheduled refresh time](/sql/create-cluster/#hydration-time-estimate).
**Example**

To create a scheduled cluster that turns on 1 hour ahead of any scheduled refresh times:

```mzsql
CREATE CLUSTER my_scheduled_cluster (
  SIZE = '3200cc',
  SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '1 hour')
);
```

You can then create a materialized view in this cluster, configured to refresh at creation, then once a day at midnight (12AM) UTC:

```mzsql
CREATE MATERIALIZED VIEW mv_refresh_every
IN CLUSTER my_scheduled_cluster
WITH (
  -- Refresh at creation, so the view is populated ahead of
  -- the first user-specified refresh time
  REFRESH AT CREATION,
  -- Refresh every day at midnight (12AM) UTC
  REFRESH EVERY '1 day' ALIGNED TO '2024-06-18 00:00:00'
)
AS SELECT ...;
```

Because the materialized view is hosted on a scheduled cluster that is configured to **turn on ahead of any scheduled refreshes**, you can expect `my_scheduled_cluster` to be provisioned at 11PM UTC, that is, 1 hour ahead of the scheduled refresh time for `mv_refresh_every`. This means that the cluster can backfill the view with pre-existing data, a process known as [_hydration_](/transform-data/troubleshooting/#hydrating-upstream-objects), ahead of the refresh operation, which **reduces the total unavailability window of the view** to just the duration of the refresh.

If the cluster is **not** configured to turn on ahead of scheduled refreshes (which is done with the `HYDRATION TIME ESTIMATE` option), the total unavailability window of the view will be a combination of the hydration time for all objects in the cluster (typically long) and the duration of the refresh for the materialized view (typically short).
Depending on the actual time it takes to hydrate the view or set of views in the cluster, you can later adjust the hydration time estimate value for the cluster using [`ALTER CLUSTER`](../alter-cluster/#schedule):

```mzsql
ALTER CLUSTER my_scheduled_cluster
SET (SCHEDULE = ON REFRESH (HYDRATION TIME ESTIMATE = '30 minutes'));
```

#### Introspection

To check details about the (non-default) refresh strategies associated with any materialized view in the system, you can query the [`mz_internal.mz_materialized_view_refresh_strategies`](/reference/system-catalog/mz_internal/#mz_materialized_view_refresh_strategies) and [`mz_internal.mz_materialized_view_refreshes`](/reference/system-catalog/mz_internal/#mz_materialized_view_refreshes) system catalog tables:

```mzsql
SELECT mv.id AS materialized_view_id,
       mv.name AS materialized_view_name,
       rs.type AS refresh_strategy,
       rs.interval AS refresh_interval,
       rs.aligned_to AS refresh_interval_phase,
       rs.at AS refresh_time,
       r.last_completed_refresh,
       r.next_refresh
FROM mz_internal.mz_materialized_view_refresh_strategies rs
JOIN mz_internal.mz_materialized_view_refreshes r
  ON r.materialized_view_id = rs.materialized_view_id
JOIN mz_materialized_views mv
  ON rs.materialized_view_id = mv.id;
```

### Creating replacement materialized views

> **Public Preview:** This feature is in public preview.

You can use [`CREATE REPLACEMENT MATERIALIZED VIEW`](/sql/create-materialized-view/) with [`ALTER MATERIALIZED VIEW ... APPLY REPLACEMENT`](/sql/alter-materialized-view) to replace materialized views in place without recreating dependent objects or incurring downtime.

To create a replacement materialized view, you must:

- Specify the target materialized view.
- Specify a `SELECT` statement for the replacement view that produces the same output schema (including column order and keys) as the target view.

Upon creation, the replacement view starts hydrating in the background.

Before applying the replacement view, verify that it is hydrated to avoid downtime. Note that the replacement view is dropped when you apply it. For more information on applying the replacement view, including recommendations and CPU/memory considerations, see [`ALTER MATERIALIZED VIEW ... APPLY REPLACEMENT...`](/sql/alter-materialized-view/#replacing-a-materialized-view).

See also:

- [Replace materialized views](/transform-data/updating-materialized-views/replace-materialized-view/) guide for a step-by-step tutorial.

#### Query performance of replacement views

You can query a replacement materialized view to validate its results before applying it. However, when queried, replacement materialized views are treated like a [view](/sql/create-view), and the query results are recomputed as part of the query execution. As such, queries against replacement materialized views are slower and more computationally expensive than queries against regular materialized views.

#### Restrictions and limitations

A replacement materialized view can only be applied to the target materialized view specified in the `FOR` clause of the [`CREATE REPLACEMENT MATERIALIZED VIEW`](/sql/create-materialized-view/) statement.

You cannot create dependent objects on [replacement materialized views](/sql/create-materialized-view/#creating-replacement-materialized-views); for example, you cannot create an index on a replacement materialized view or create other views on a replacement materialized view.
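To validate a replacement before applying it, you can compare its results with those of the target view. The sketch below assumes a target view `winning_bids` and a replacement `winning_bids_replacement` with identical output schemas, as in the replacement example in the Examples section. Because the replacement is recomputed at query time, these checks are expensive and best run sparingly.

```mzsql
-- Rows produced by the replacement but not by the current view.
SELECT * FROM winning_bids_replacement
EXCEPT ALL
SELECT * FROM winning_bids;

-- Rows produced by the current view but not by the replacement.
SELECT * FROM winning_bids
EXCEPT ALL
SELECT * FROM winning_bids_replacement;
```

If both queries return no rows, the two views currently produce the same results.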
## Examples

### Creating a materialized view

The following example creates a `winning_bids` materialized view:

```mzsql
CREATE MATERIALIZED VIEW winning_bids AS
SELECT DISTINCT ON (a.id) b.*, a.item, a.seller
FROM auctions AS a
JOIN bids AS b
  ON a.id = b.auction_id
WHERE b.bid_time < a.end_time
  AND mz_now() >= a.end_time
ORDER BY a.id, b.amount DESC, b.bid_time, b.buyer;
```

### Using non-null assertions

```mzsql
CREATE MATERIALIZED VIEW users_and_orders WITH (
  -- The semantics of a FULL OUTER JOIN guarantee that user_id is not null,
  -- because one of `users.id` or `orders.user_id` must be not null, but
  -- Materialize cannot yet automatically infer that fact.
  ASSERT NOT NULL user_id
)
AS SELECT coalesce(users.id, orders.user_id) AS user_id, ...
FROM users FULL OUTER JOIN orders ON users.id = orders.user_id;
```

### Using refresh strategies

```mzsql
CREATE MATERIALIZED VIEW mv
IN CLUSTER my_refresh_cluster
WITH (
  -- Refresh every Tuesday at 12PM UTC
  REFRESH EVERY '7 days' ALIGNED TO '2024-06-04 12:00:00',
  -- Refresh every Thursday at 12PM UTC
  REFRESH EVERY '7 days' ALIGNED TO '2024-06-06 12:00:00',
  -- Refresh on creation, so the view is populated ahead of
  -- the first user-specified refresh time
  REFRESH AT CREATION
)
AS SELECT ... FROM ...;
```

[//]: # "TODO(morsapaes) Add more elaborate examples with \timing that show things like querying materialized views from different clusters, indexed vs. non-indexed, and so on."

### Creating a replacement materialized view

> **Public Preview:** This feature is in public preview.

The following example creates a replacement materialized view `winning_bids_replacement` for the `winning_bids` materialized view. The replacement view specifies a different filter (`mz_now() > a.end_time`) than the existing view (`mz_now() >= a.end_time`).
```mzsql
CREATE REPLACEMENT MATERIALIZED VIEW winning_bids_replacement
FOR winning_bids AS
SELECT DISTINCT ON (a.id) b.*, a.item, a.seller
FROM auctions AS a
JOIN bids AS b
  ON a.id = b.auction_id
WHERE b.bid_time < a.end_time
  AND mz_now() > a.end_time
ORDER BY a.id, b.amount DESC, b.bid_time, b.buyer;
```

To replace the existing view with its replacement, see [`ALTER MATERIALIZED VIEW`](../alter-materialized-view).

See also:

- [Replace materialized views guide](/transform-data/updating-materialized-views/replace-materialized-view/)

## Privileges

The privileges required to execute this statement are:

- `CREATE` privileges on the containing schema.
- `CREATE` privileges on the containing cluster.
- `USAGE` privileges on all types used in the materialized view definition.
- `USAGE` privileges on the schemas for the types used in the statement.

## Additional information

- Materialized views are not monotonic; that is, materialized views cannot be recognized as append-only.

## Related pages

- [`ALTER MATERIALIZED VIEW`](../alter-materialized-view)
- [`SHOW MATERIALIZED VIEWS`](../show-materialized-views)
- [`SHOW CREATE MATERIALIZED VIEW`](../show-create-materialized-view)
- [`DROP MATERIALIZED VIEW`](../drop-materialized-view)

---

## CREATE NETWORK POLICY (Cloud)

*Available for Materialize Cloud only*

`CREATE NETWORK POLICY` creates a network policy that restricts access to a Materialize region using IP-based rules. Network policies are part of Materialize's framework for [access control](/security/cloud/).

## Syntax

```mzsql
CREATE NETWORK POLICY (
  RULES (
    (action='allow', direction='ingress', address=) [, ...]
  )
)
;
```

| Syntax element | Description |
| --- | --- |
| `` | The name of the network policy to create. |
| `` | The name for the network policy rule. Must be unique within the network policy. |
| `` | The Classless Inter-Domain Routing (CIDR) block to which the rule applies. |

## Details

### Pre-installed network policy

When you enable a Materialize region, a default network policy named `default` will be pre-installed. This policy has a wide open ingress rule `allow 0.0.0.0/0`. You can modify or drop this network policy at any time.

> **Note:** The default value for the `network_policy` session parameter is `default`.
> Before dropping the `default` network policy, a _superuser_ (i.e. `Organization Admin`)
> must run [`ALTER SYSTEM SET network_policy`](/sql/alter-system-set) to
> change the default value.

## Privileges

The privileges required to execute this statement are:

- `CREATENETWORKPOLICY` privileges on the system.

## Examples

```mzsql
CREATE NETWORK POLICY office_access_policy (
  RULES (
    new_york (action='allow', direction='ingress', address='1.2.3.4/28'),
    minnesota (action='allow', direction='ingress', address='2.3.4.5/32')
  )
);
```

```mzsql
ALTER SYSTEM SET network_policy = office_access_policy;
```

## Related pages

- [`ALTER NETWORK POLICY`](../alter-network-policy)
- [`DROP NETWORK POLICY`](../drop-network-policy)
- [`GRANT ROLE`](../grant-role)

---

## CREATE ROLE

Use `CREATE ROLE` [^1] to:

- Create functional roles (*Both Cloud and Self-Managed*).
- Create roles with login/password/superuser privileges (*Self-Managed only*).

When you connect to Materialize, you must specify the name of a valid role in the system.

[^1]: Materialize does not support the `CREATE USER` command.

## Syntax

**Cloud:**

### Cloud

The following syntax is used to create a role in Materialize Cloud.

```mzsql
CREATE ROLE [[WITH] INHERIT];
```

| Syntax element | Description |
| --- | --- |
| `INHERIT` | *Optional.* If specified, grants the role the ability to inherit privileges of other roles. *Default.* |

**Note:**

- Materialize Cloud does not support the `NOINHERIT` option for `CREATE ROLE`.
- Materialize Cloud does not support the `LOGIN` and `SUPERUSER` attributes for `CREATE ROLE`. See [Organization roles](/security/cloud/users-service-accounts/#organization-roles) instead.
- Materialize Cloud does not use role attributes to determine a role's ability to create top level objects such as databases and other roles. Instead, Materialize uses system level privileges. See [GRANT PRIVILEGE](../grant-privilege) for more details.

**Self-Managed:**

### Self-Managed

The following syntax is used to create a role in Materialize Self-Managed.

```mzsql
CREATE ROLE [WITH]
  [ SUPERUSER | NOSUPERUSER ]
  [ LOGIN | NOLOGIN ]
  [ INHERIT ]
  [ PASSWORD ]
;
```

| Syntax element | Description |
| --- | --- |
| `INHERIT` | *Optional.* If specified, grants the role the ability to inherit privileges of other roles. *Default.* |
| `LOGIN` | *Optional.* If specified, allows a role to log in via the PostgreSQL or web endpoints. |
| `NOLOGIN` | *Optional.* If specified, prevents a role from logging in. This is the default behavior if `LOGIN` is not specified. |
| `SUPERUSER` | *Optional.* If specified, grants the role superuser privileges. |
| `NOSUPERUSER` | *Optional.* If specified, prevents the role from having superuser privileges. This is the default behavior if `SUPERUSER` is not specified. |
| `PASSWORD` | ***Public Preview*** *Optional.* This feature may have minor stability issues. If specified, allows you to set a password for the role. |

**Note:**

- Self-Managed Materialize does not support the `NOINHERIT` option for `CREATE ROLE`.
- With the exception of the `SUPERUSER` attribute, Self-Managed Materialize does not use role attributes to determine a role's ability to create top level objects such as databases and other roles. Instead, Self-Managed Materialize uses system level privileges. See [GRANT PRIVILEGE](../grant-privilege) for more details.

## Restrictions

You may not specify redundant or conflicting sets of options.
For example, Materialize will reject the statement `CREATE ROLE ... INHERIT INHERIT`.

## Privileges

The privileges required to execute this statement are:

- `CREATEROLE` privileges on the system.

## Examples

### Create a functional role

In Materialize Cloud and Self-Managed, you can create a functional role:

```mzsql
CREATE ROLE db_reader;
```

### Create a role with login and password (Self-Managed)

```mzsql
CREATE ROLE db_reader WITH LOGIN PASSWORD 'password';
```

You can verify that the role was created by querying the `mz_roles` system catalog:

```mzsql
SELECT name FROM mz_roles;
```

```nofmt
db_reader
mz_system
mz_support
```

### Create a superuser role (Self-Managed)

Unlike regular roles, superusers have unrestricted access to all objects in the system and can perform any action on them.

```mzsql
CREATE ROLE super_user WITH SUPERUSER LOGIN PASSWORD 'password';
```

You can verify that the superuser role was created by querying the `mz_roles` system catalog:

```mzsql
SELECT name FROM mz_roles;
```

```nofmt
db_reader
mz_system
mz_support
super_user
```

You can also verify that the role has superuser privileges by checking the `pg_authid` system catalog:

```mzsql
SELECT rolsuper FROM pg_authid WHERE rolname = 'super_user';
```

```nofmt
true
```

## Related pages

- [`ALTER ROLE`](../alter-role)
- [`DROP ROLE`](../drop-role)
- [`DROP USER`](../drop-user)
- [`GRANT ROLE`](../grant-role)
- [`REVOKE ROLE`](../revoke-role)
- [`ALTER OWNER`](/sql/#rbac)
- [`GRANT PRIVILEGE`](../grant-privilege)
- [`REVOKE PRIVILEGE`](../revoke-privilege)

---

## CREATE SCHEMA

`CREATE SCHEMA` creates a new schema.

## Syntax

```mzsql
CREATE SCHEMA [IF NOT EXISTS] ;
```

| Syntax element | Description |
| --- | --- |
| `IF NOT EXISTS` | If specified, do not generate an error if a schema of the same name already exists. If not specified, throw an error if a schema of the same name already exists. |
| `` | A name for the schema. You can specify the database for the schema with a preceding `database_name.schema_name`, e.g. `my_db.my_schema`, otherwise the schema is created in the current database. |

## Details

By default, each database has a schema called `public`.

For more information, see [Namespaces](../namespaces).

## Examples

```mzsql
CREATE SCHEMA my_db.my_schema;
```

```mzsql
SHOW SCHEMAS FROM my_db;
```

```nofmt
public
my_schema
```

## Privileges

The privileges required to execute this statement are:

- `CREATE` privileges on the containing database.

## Related pages

- [`DROP DATABASE`](../drop-database)
- [`SHOW DATABASES`](../show-databases)

---

## CREATE SECRET

A secret securely stores sensitive credentials (like passwords and SSL keys) in Materialize's secret management system. Optionally, a secret can also be used to store credentials that are generally not sensitive (like usernames and SSL certificates), so that all your credentials are managed uniformly.

## Syntax

```mzsql
CREATE SECRET [IF NOT EXISTS] AS ;
```

| Syntax element | Description |
| --- | --- |
| `IF NOT EXISTS` | If specified, do not generate an error if a secret of the same name already exists. |
| `` | The identifier for the secret. |
| `` | The value for the secret. The value expression may not reference any relations, and must be implicitly castable to `bytea`. |

## Examples

```mzsql
CREATE SECRET kafka_ca_cert AS decode('c2VjcmV0Cg==', 'base64');
```

## Privileges

The privileges required to execute this statement are:

- `CREATE` privileges on the containing schema.

## Related pages

- [`CREATE CONNECTION`](../create-connection)
- [`ALTER SECRET`](../alter-secret)
- [`DROP SECRET`](../drop-secret)
- [`SHOW SECRETS`](../show-secrets)

---

## CREATE SINK

A [sink](/concepts/sinks/) describes an external system you want Materialize to write data to, and provides details about how to encode that data. You can define a sink over a materialized view, source, or table.
## Syntax summary

**Kafka/Redpanda:**

**Format Avro:**

```mzsql
CREATE SINK [IF NOT EXISTS]
[IN CLUSTER ]
FROM
INTO KAFKA CONNECTION (
  TOPIC ''
  [, COMPRESSION TYPE ]
  [, TRANSACTIONAL ID PREFIX '']
  [, PARTITION BY = ]
  [, PROGRESS GROUP ID PREFIX '']
  [, TOPIC REPLICATION FACTOR ]
  [, TOPIC PARTITION COUNT ]
  [, TOPIC CONFIG ]
)
[KEY ( [, ...] ) [NOT ENFORCED]]
[HEADERS ]
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION
[
  (
    [AVRO KEY FULLNAME '']
    [, AVRO VALUE FULLNAME '']
    [, NULL DEFAULTS ]
    [, DOC ON [, ...]]
    [, KEY COMPATIBILITY LEVEL '']
    [, VALUE COMPATIBILITY LEVEL '']
  )
]
[ENVELOPE DEBEZIUM | UPSERT]
[WITH (SNAPSHOT = )]
```

**Format JSON:**

```mzsql
CREATE SINK [IF NOT EXISTS]
[IN CLUSTER ]
FROM
INTO KAFKA CONNECTION (
  TOPIC ''
  [, COMPRESSION TYPE ]
  [, TRANSACTIONAL ID PREFIX '']
  [, PARTITION BY = ]
  [, PROGRESS GROUP ID PREFIX '']
  [, TOPIC REPLICATION FACTOR ]
  [, TOPIC PARTITION COUNT ]
  [, TOPIC CONFIG ]
)
[KEY ( [, ...] ) [NOT ENFORCED]]
[HEADERS ]
FORMAT JSON
[ENVELOPE DEBEZIUM | UPSERT]
[WITH (SNAPSHOT = )]
```

**Format TEXT/BYTES:**

```mzsql
CREATE SINK [IF NOT EXISTS]
[IN CLUSTER ]
FROM
INTO KAFKA CONNECTION (
  TOPIC ''
  [, COMPRESSION TYPE ]
  [, TRANSACTIONAL ID PREFIX '']
  [, PARTITION BY = ]
  [, PROGRESS GROUP ID PREFIX '']
  [, TOPIC REPLICATION FACTOR ]
  [, TOPIC PARTITION COUNT ]
  [, TOPIC CONFIG ]
)
FORMAT TEXT | BYTES
[ENVELOPE DEBEZIUM | UPSERT]
[WITH (SNAPSHOT = )]
```

**KEY FORMAT VALUE FORMAT:**

By default, the message key is encoded using the same format as the message value. However, you can set the key and value encodings explicitly using the `KEY FORMAT ... VALUE FORMAT` clause.

```mzsql
CREATE SINK [IF NOT EXISTS]
[IN CLUSTER ]
FROM
INTO KAFKA CONNECTION (
  TOPIC ''
  [, COMPRESSION TYPE ]
  [, TRANSACTIONAL ID PREFIX '']
  [, PARTITION BY = ]
  [, PROGRESS GROUP ID PREFIX '']
  [, TOPIC REPLICATION FACTOR ]
  [, TOPIC PARTITION COUNT ]
  [, TOPIC CONFIG ]
)
[KEY ( [, ...] ) [NOT ENFORCED]]
[HEADERS ]
KEY FORMAT VALUE FORMAT
-- and can be:
-- AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION [
--   (
--     [AVRO KEY FULLNAME '']
--     [, AVRO VALUE FULLNAME '']
--     [, NULL DEFAULTS ]
--     [, DOC ON [, ...]]
--     [, KEY COMPATIBILITY LEVEL '']
--     [, VALUE COMPATIBILITY LEVEL '']
--   )
-- ]
-- | JSON | TEXT | BYTES
[ENVELOPE DEBEZIUM | UPSERT]
[WITH (SNAPSHOT = )]
```

For details, see [CREATE SINK: Kafka/Redpanda](/sql/create-sink/kafka/).

**Iceberg:**

> **Public Preview:** This feature is in public preview.

```mzsql
CREATE SINK [IF NOT EXISTS]
[IN CLUSTER ]
FROM
INTO ICEBERG CATALOG CONNECTION (
  NAMESPACE = '',
  TABLE = ''
)
USING AWS CONNECTION
KEY ( [, ...] ) [NOT ENFORCED]
MODE UPSERT
WITH (COMMIT INTERVAL = '')
```

For details, see [CREATE SINK: Iceberg](/sql/create-sink/iceberg/).

## Best practices

### Sizing a sink

Some sinks require relatively few resources to handle the data they write out, while others are high traffic and require hefty resource allocations. The cluster in which you place a sink determines the amount of CPU and memory available to the sink. Sinks share the resource allocation of their cluster with all other objects in the cluster. Colocating multiple sinks onto the same cluster can be more resource efficient when you have many low-traffic sinks that occasionally need some burst capacity.

## Details

A sink cannot be created directly on a catalog object. As a workaround, you can create a materialized view on a catalog object and create a sink on the materialized view.

[//]: # "TODO(morsapaes) Add best practices for sizing sinks."

## Privileges

The privileges required to execute this statement are:

- `CREATE` privileges on the containing schema.
- `SELECT` privileges on the item being written out to an external system.
  - NOTE: if the item is a materialized view, then the view owner must also have the necessary privileges to execute the view definition.
- `CREATE` privileges on the containing cluster if the sink is created in an existing cluster.
- `CREATECLUSTER` privileges on the system if the sink is not created in an existing cluster.
- `USAGE` privileges on all connections and secrets used in the sink definition.
- `USAGE` privileges on the schemas that contain the connections and secrets used in the statement.

## Related pages

- [Sinks](/concepts/sinks/)
- [`SHOW SINKS`](/sql/show-sinks/)
- [`SHOW COLUMNS`](/sql/show-columns/)
- [`SHOW CREATE SINK`](/sql/show-create-sink/)

---

## CREATE SOURCE

A [source](/concepts/sources/) describes an external system you want Materialize to read data from, and provides details about how to decode and interpret that data.

## Syntax summary

**PostgreSQL (New):**

To create a source from an external PostgreSQL database:

```mzsql
CREATE SOURCE [IF NOT EXISTS] <source_name>
[IN CLUSTER <cluster_name>]
FROM POSTGRES CONNECTION <connection_name> (PUBLICATION '<publication_name>');
```

For details, see [CREATE SOURCE: PostgreSQL (New Syntax)](/sql/create-source/postgres-v2/).

**PostgreSQL (Legacy):**

```mzsql
CREATE SOURCE [IF NOT EXISTS] <source_name>
[IN CLUSTER <cluster_name>]
FROM POSTGRES CONNECTION <connection_name> (
  PUBLICATION '<publication_name>'
  [, TEXT COLUMNS ( <col> [, ...] ) ]
  [, EXCLUDE COLUMNS ( <col> [, ...] ) ]
)
<FOR ALL TABLES | FOR SCHEMAS ( <schema> [, ...] ) | FOR TABLES ( <table> [AS <subsource_name>] [, ...] )>
[EXPOSE PROGRESS AS <progress_subsource_name>]
[WITH (RETAIN HISTORY FOR <retention_period>)]
```

For details, see [CREATE SOURCE: PostgreSQL (Legacy)](/sql/create-source/postgres/).

**MySQL:**

```mzsql
CREATE SOURCE [IF NOT EXISTS] <source_name>
[IN CLUSTER <cluster_name>]
FROM MYSQL CONNECTION <connection_name> [
  (
    [TEXT COLUMNS ( <col> [, ...] ) ]
    [, EXCLUDE COLUMNS ( <col> [, ...] ) ]
  )
]
<FOR ALL TABLES | FOR SCHEMAS ( <schema> [, ...] ) | FOR TABLES ( <table> [AS <subsource_name>] [, ...] )>
[EXPOSE PROGRESS AS <progress_subsource_name>]
[WITH (RETAIN HISTORY FOR <retention_period>)]
```

For details, see [CREATE SOURCE: MySQL](/sql/create-source/mysql/).

**SQL Server (New):**

```mzsql
CREATE SOURCE [IF NOT EXISTS] <source_name>
[IN CLUSTER <cluster_name>]
FROM SQL SERVER CONNECTION <connection_name>
```

For details, see [CREATE SOURCE: SQL Server (New Syntax)](/sql/create-source/sql-server-v2/).
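For illustration, here are the two new-syntax forms above with their placeholders filled in. This is a minimal sketch: every name in it is hypothetical (`pg_source`, `mssql_source`, `ingest_cluster`, `pg_connection`, `mssql_connection`, and the publication `mz_source`). The connections are assumed to exist already, created with [`CREATE CONNECTION`](/sql/create-connection).

```mzsql
-- PostgreSQL (New): pg_connection and the upstream publication 'mz_source'
-- are hypothetical and assumed to exist already.
CREATE SOURCE IF NOT EXISTS pg_source
  IN CLUSTER ingest_cluster
  FROM POSTGRES CONNECTION pg_connection (PUBLICATION 'mz_source');

-- SQL Server (New): mssql_connection is a hypothetical, pre-existing
-- SQL Server connection; IN CLUSTER is optional.
CREATE SOURCE IF NOT EXISTS mssql_source
  IN CLUSTER ingest_cluster
  FROM SQL SERVER CONNECTION mssql_connection;
```

The linked pages describe how the upstream tables of a source are then exposed in Materialize.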
**SQL Server (Legacy):**

```mzsql
CREATE SOURCE [IF NOT EXISTS] <source_name>
[IN CLUSTER <cluster_name>]
FROM SQL SERVER CONNECTION <connection_name>
[ ( EXCLUDE COLUMNS ( <col> [, ...]) ) ]
[ ( TEXT COLUMNS ( <col> [, ...]) ) ]
<FOR ALL TABLES | FOR TABLES ( <table> [AS <subsource_name>] [, ...] )>
[WITH (RETAIN HISTORY FOR <retention_period>)]
```

For details, see [CREATE SOURCE: SQL Server (Legacy)](/sql/create-source/sql-server/).

**Kafka/Redpanda:**

**Format Avro:**

```mzsql
CREATE SOURCE [IF NOT EXISTS] <source_name>
[IN CLUSTER <cluster_name>]
FROM KAFKA CONNECTION <connection_name> (
  TOPIC '<topic>'
  [, GROUP ID PREFIX '<group_id_prefix>']
  [, START OFFSET ( <partition_offset> [, ...] ) ]
  [, START TIMESTAMP <timestamp>]
)
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION <csr_connection_name>
  [KEY STRATEGY <key_strategy>]
  [VALUE STRATEGY <value_strategy>]
[INCLUDE
    KEY [AS <name>]
  | PARTITION [AS <name>]
  | OFFSET [AS <name>]
  | TIMESTAMP [AS <name>]
  | HEADERS [AS <name>]
  | HEADER '<key>' AS <name> [BYTES]
  [, ...]
]
[ENVELOPE
    NONE
  | DEBEZIUM
  | UPSERT [ ( VALUE DECODING ERRORS = INLINE [AS <name>] ) ]
]
[EXPOSE PROGRESS AS <progress_subsource_name>]
[WITH (RETAIN HISTORY FOR <retention_period>)]
```

**Format JSON:**

```mzsql
CREATE SOURCE [IF NOT EXISTS] <source_name>
[IN CLUSTER <cluster_name>]
FROM KAFKA CONNECTION <connection_name> (
  TOPIC '<topic>'
  [, GROUP ID PREFIX '<group_id_prefix>']
  [, START OFFSET (

| Column | Type | Represents |
| --- | --- | --- |
| `mz_timestamp` | `numeric` | Materialize's internal logical timestamp. This will never be less than any timestamp previously emitted by the same `SUBSCRIBE` operation. |
| `mz_progressed` | `boolean` | This column is only present if the `PROGRESS` option is specified. If `true`, indicates that the `SUBSCRIBE` will not emit additional records at times strictly less than `mz_timestamp`. See `PROGRESS` below. |
| `mz_diff` | `bigint` | The change in frequency of the row. A positive number indicates that `mz_diff` copies of the row were inserted, while a negative number indicates that `\|mz_diff\|` copies of the row were deleted. |
| Column 1 | Varies | The columns from the subscribed relation, each as its own column, representing the data that was inserted into or deleted from the relation. |
| ... | | |
| Column N | Varies | |

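As a rough sketch of reading these columns without blocking forever, the statements below wrap `SUBSCRIBE` in a cursor and fetch from it in a batch (see [`DECLARE`](/sql/declare) and [`FETCH`](/sql/fetch)). The view `my_view` is hypothetical. `PROGRESS` is enabled only so that the `mz_progressed` column appears in the output.

```mzsql
-- Cursors only live inside a transaction.
BEGIN;

-- my_view is a hypothetical view; PROGRESS adds the mz_progressed column.
DECLARE c CURSOR FOR SUBSCRIBE TO my_view WITH (PROGRESS = true);

-- Fetch up to 10 rows, waiting at most 1 second for them to arrive.
-- Per the table above, each row starts with mz_timestamp, mz_progressed,
-- and mz_diff, followed by the columns of my_view.
FETCH 10 c WITH (timeout = '1s');

-- End the transaction, which also closes the cursor.
COMMIT;
```

Repeating the `FETCH` keeps draining updates in batches. A positive `mz_diff` adds that many copies of a row, and a negative one removes that many.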
### `AS OF`

When a [history retention period](/transform-data/patterns/durable-subscriptions/#history-retention-period) is configured for the object(s) powering the subscription, the `AS OF` clause allows specifying a timestamp at which the `SUBSCRIBE` command should begin returning results. If `AS OF` is specified, no rows whose timestamp is earlier than the specified timestamp will be returned. If the specified timestamp is earlier than the earliest historical state retained by the underlying objects, an error is thrown.

To configure the history retention period for objects used in a subscription, see [Durable subscriptions](/transform-data/patterns/durable-subscriptions/#history-retention-period).

If `AS OF` is unspecified, the system automatically chooses an `AS OF` timestamp.

The value in the `AS OF` clause is automatically [cast to `mz_timestamp`](../../sql/types/mz_timestamp/#valid-casts) with an assignment or implicit cast.

### `UP TO`

The `UP TO` clause allows specifying a timestamp at which the `SUBSCRIBE` will cease running. If `UP TO` is specified, no rows whose timestamp is greater than or equal to the specified timestamp will be returned.

The value in the `UP TO` clause is automatically [cast to `mz_timestamp`](../../sql/types/mz_timestamp/#valid-casts) with an assignment or implicit cast.

### Interaction of `AS OF` and `UP TO`

The lower timestamp bound specified by `AS OF` is inclusive, whereas the upper bound specified by `UP TO` is exclusive. Thus, a `SUBSCRIBE` query whose `AS OF` is equal to its `UP TO` will terminate after returning zero rows.

A `SUBSCRIBE` whose `UP TO` is less than its `AS OF` timestamp (whether that timestamp was specified in an `AS OF` clause or chosen by the system) will signal an error.

### Duration

`SUBSCRIBE` will continue to run until it is canceled, the session ends, the `UP TO` timestamp is reached, or all updates have been presented. The last case typically occurs when subscribing to constant views (e.g. `CREATE VIEW v AS SELECT 1`).

> **Warning:** Many PostgreSQL drivers wait for a query to complete before returning its
> results. Since `SUBSCRIBE` can run forever, naively executing a `SUBSCRIBE` using your
> driver's standard query API may never return.
> Either use an API in your driver that does not buffer rows or use the
> [`FETCH`](/sql/fetch) statement or `AS OF` and `UP TO` bounds
> to fetch rows from `SUBSCRIBE` in batches.
> See the [examples](#examples) for details.

### `SNAPSHOT`

By default, `SUBSCRIBE` begins by emitting a snapshot of the subscribed relation, which consists of a series of updates at its [`AS OF`](#as-of) timestamp describing the contents of the relation. After the snapshot, `SUBSCRIBE` emits further updates as they occur.

For updates in the snapshot, the `mz_timestamp` field will be fast-forwarded to the `AS OF` timestamp. For example, an insert that occurred before the `SUBSCRIBE` began would appear in the snapshot.

To see only updates after the initial timestamp, specify `WITH (SNAPSHOT = false)`.

> **Note:** While `WITH (SNAPSHOT = false)` guarantees that the snapshot will not be sent to
> the client, Materialize may still need to fetch and process the snapshot data to
> compute the correct result.
> For example, consider:
> ```mzsql
> SUBSCRIBE TO (SELECT SUM(column) FROM table) WITH (SNAPSHOT = false)
> ```
> The latest update for the query depends on _all_ rows in `table`, not just the
> rows that have changed recently.
> However, when subscribing directly to a collection, e.g.,
> ```mzsql
> SUBSCRIBE TO