Guide: Handle upstream schema changes with zero downtime
To enable this feature in your Materialize region, contact our team.
Materialize allows you to handle certain types of upstream table schema changes seamlessly, specifically:
- Adding a column in the upstream database.
- Dropping a column in the upstream database.
This guide walks you through how to handle these changes without any downtime in Materialize.
Prerequisites
Some familiarity with Materialize. If you’ve never used Materialize before, start with our guide to getting started to learn how to connect a database to Materialize.
Set up a SQL Server database
For this guide, setup a SQL Server 2016+ database. In your SQL Server, create a
table T and populate:
CREATE TABLE T (
A INT
);
INSERT INTO T (A) VALUES
(10);
GO -- The GO terminator may be unsupported or unnecessary for your client.
Configure your SQL Server Database
Configure your SQL Server database using the configuration instructions for self hosted SQL Server.
Connect your source database to Materialize
Create a connection to your SQL Server database using the CREATE CONNECTION syntax.
Create a source using the new syntax
In Materialize, create a source using the CREATE SOURCE
syntax.
CREATE SOURCE my_source
FROM SQL SERVER CONNECTION sqlserver_connection;
Create a table from the source
To start ingesting specific tables from your source database, you can create a table in Materialize. We’ll add it into the v1 schema in Materialize.
CREATE SCHEMA v1;
CREATE TABLE v1.T
FROM SOURCE my_source(REFERENCE dbo.T);
Once you’ve created a table from source, the initial
snapshot of table v1.T will begin.
Create a view on top of the table.
For this guide, add a materialized view matview (also in schema v1) that
sums column A from table T.
CREATE MATERIALIZED VIEW v1.matview AS
SELECT SUM(A) from v1.T;
Handle upstream column addition
A. Add a column in your upstream SQL Server database
In your upstream SQL Server database, add a new column B to the table T:
ALTER TABLE T
ADD B BIT NULL;
INSERT INTO T (A, B) VALUES
(20, 0);
This operation will have no immediate effect in Materialize. In Materialize,
v1.T will continue to ingest only column A. The materialized view
v1.matview will continue to have access to column A as well.
B. Incorporate the new column in Materialize
To incorporate the new column into Materialize, create a new v2 schema and
recreate the table in the new schema:
CREATE SCHEMA v2;
CREATE TABLE v2.T
FROM SOURCE my_source(REFERENCE dbo.T);
The snapshotting of table v2.T will begin.
v2.T will include columns A and B.
When the new v2.T table has finished snapshotting, create a new materialized
view matview in the new schema. Since the new v2.matview is referencing the
new v2.T, it can reference column B:
CREATE MATERIALIZED VIEW v2.matview AS
SELECT SUM(A)
FROM v2.T
WHERE B = true;
Handle upstream column drop
A. Exclude the column in Materialize
To drop a column safely, in Materialize, first, create a new v3 schema, and
recreate table T in the new schema but exclude the column to drop. In this
example, we’ll drop the column B.
CREATE SCHEMA v3;
CREATE TABLE v3.T
FROM SOURCE my_source(REFERENCE dbo.T) WITH (EXCLUDE COLUMNS (B));
B. Drop a column in your upstream SQL Server database
In your upstream SQL Server database, drop the column B from the table T:
ALTER TABLE T DROP COLUMN B;
Dropping the column B will have no effect on v3.T. However, the drop affects
v2.T and v2.matview from our earlier examples. When the user attempts to
read from either, Materialize will report an error that the source table schema
has been altered.
Optional: Swap schemas
When you’re ready to fully cut over to the new source version, you can optionally swap the schemas and drop the old objects.
ALTER SCHEMA v1 SWAP WITH v3;
DROP SCHEMA v3 CASCADE;