MySQL

Change Data Capture (CDC)

Materialize supports MySQL as a real-time data source. The MySQL source uses MySQL’s binlog replication protocol to continually ingest changes resulting from CRUD operations in the upstream database. The native support for MySQL Change Data Capture (CDC) in Materialize gives you the following benefits:

  • No additional infrastructure: Ingest MySQL change data into Materialize in real-time with no architectural changes or additional operational overhead. In particular, you do not need to deploy Kafka and Debezium for MySQL CDC.

  • Transactional consistency: The MySQL source ensures that transactions in the upstream MySQL database are respected downstream. Materialize will never show partial results based on partially replicated transactions.

  • Incrementally updated materialized views: MySQL does not natively support materialized views, so you can use Materialize as a read replica to build views on top of your MySQL data that are efficiently maintained and always up-to-date.
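As a sketch of what this looks like in practice, the following creates a MySQL source in Materialize. The secret name, host, user, and source name are placeholders you would replace with your own values:

```sql
-- Hypothetical names and credentials; substitute your own.
CREATE SECRET mysql_password AS '<password>';

CREATE CONNECTION mysql_connection TO MYSQL (
    HOST '<mysql-host>',
    PORT 3306,
    USER 'materialize',
    PASSWORD SECRET mysql_password
);

-- Replicate every table in the upstream database into Materialize.
CREATE SOURCE mz_source
  FROM MYSQL CONNECTION mysql_connection
  FOR ALL TABLES;
```

Once the source is created, Materialize snapshots the upstream tables and then switches to streaming changes from the binlog.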

Supported versions and services

NOTE: MySQL-compatible database systems are not guaranteed to work with the MySQL source out-of-the-box. MariaDB, Vitess, and PlanetScale are currently not supported.

The MySQL source requires MySQL 5.7+ and is compatible with most common MySQL hosted services.
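The MySQL source relies on GTID-based, row-format binlog replication in the upstream database. As a sketch, the commonly required server settings look like the following; the exact mechanism for changing them varies by MySQL version and hosting provider, so treat this as an assumption to verify against your provider's documentation:

```sql
-- Commonly required settings for GTID-based binlog replication (sketch).
-- Note: on a running server, gtid_mode must be changed in stages
-- (OFF -> OFF_PERMISSIVE -> ON_PERMISSIVE -> ON), not in one step.
SET GLOBAL binlog_format = 'ROW';        -- row-based replication events
SET GLOBAL binlog_row_image = 'FULL';    -- full before/after row images
SET GLOBAL enforce_gtid_consistency = ON;
SET GLOBAL gtid_mode = ON;               -- GTID-based positioning
```

Managed services such as Amazon RDS expose these settings through parameter groups rather than `SET GLOBAL`.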

Integration guides

If you would like to use Materialize with a hosted service or MySQL distribution that is not yet supported, please submit a feature request or reach out in the Materialize Community Slack.

Considerations

Schema changes

NOTE: Work to more smoothly support DDL changes to upstream tables is currently in progress. This work introduces the ability to re-ingest the same upstream table under a new schema and switch over without downtime.

Materialize supports schema changes in the upstream database as follows:

Compatible schema changes

  • Adding columns to tables. Materialize will not ingest new columns added upstream unless you first drop the affected subsource using DROP SOURCE, and then add the table back to the source using ALTER SOURCE...ADD SUBSOURCE.

  • Dropping columns that were added after the source was created. These columns are never ingested, so you can drop them without issue.

  • Adding or removing NOT NULL constraints on columns that were nullable when the source was created.

Incompatible schema changes

All other schema changes to upstream tables will put the corresponding subsource into an error state, which prevents you from reading from it.

To handle incompatible schema changes, first drop the affected subsource using DROP SOURCE, and then add it back to the source using ALTER SOURCE...ADD SUBSOURCE. When you add the subsource back, it will have the updated schema from the corresponding upstream table.
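A sketch of this recovery, assuming a source named mz_source and an errored subsource t tracking the upstream table db.t:

```sql
-- Drop the errored subsource.
DROP SOURCE t;

-- Re-add the upstream table; the new subsource picks up the
-- table's current schema and is re-snapshotted from scratch.
ALTER SOURCE mz_source ADD SUBSOURCE db.t AS t;
```

Note that any objects depending on the subsource (views, indexes) must be recreated after it is re-added.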

Supported types

Materialize natively supports the following MySQL types:

  • bigint
  • binary
  • bit
  • blob
  • boolean
  • char
  • date
  • datetime
  • decimal
  • double
  • float
  • int
  • json
  • longblob
  • longtext
  • mediumblob
  • mediumint
  • mediumtext
  • numeric
  • real
  • smallint
  • text
  • time
  • timestamp
  • tinyblob
  • tinyint
  • tinytext
  • varbinary
  • varchar

When replicating tables that contain unsupported data types, you can:

  • Use the TEXT COLUMNS option for the following unsupported MySQL types:

    • enum
    • year

    The specified columns will be treated as text and will not offer the expected MySQL type features.

  • Use the EXCLUDE COLUMNS option to exclude any columns that contain unsupported data types.
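As a sketch of both options together, assume a hypothetical upstream table db.orders with an enum column status and an unsupported column location that you want to skip:

```sql
-- Hypothetical table and column names for illustration.
CREATE SOURCE mz_source
  FROM MYSQL CONNECTION mysql_connection (
    TEXT COLUMNS (db.orders.status),      -- ingest the enum as text
    EXCLUDE COLUMNS (db.orders.location)  -- skip the unsupported column
  )
  FOR ALL TABLES;
```

Columns listed under TEXT COLUMNS arrive in Materialize as text, so enum ordering semantics are not preserved; excluded columns simply do not appear in the subsource.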

Truncation

Avoid truncating upstream tables that are being replicated into Materialize. If a replicated upstream table is truncated, the corresponding subsource in Materialize becomes inaccessible and will not produce any data until it is recreated.

Instead of truncating, use an unqualified DELETE to remove all rows from the upstream table:

DELETE FROM t;

Modifying an existing source

When you add a new subsource to an existing source (ALTER SOURCE ... ADD SUBSOURCE ...), Materialize starts the snapshotting process for the new subsource. During snapshotting, data ingestion for the existing subsources of the same source is temporarily blocked. If possible, resize the cluster to speed up snapshotting, and once the process finishes, resize it back for steady-state operation.
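For instance, assuming the source runs on a dedicated cluster named ingest_cluster (the cluster name, sizes, and table name below are placeholders), the resize-then-restore pattern might look like:

```sql
-- Temporarily scale up to speed up snapshotting of the new subsource.
ALTER CLUSTER ingest_cluster SET (SIZE = '400cc');

ALTER SOURCE mz_source ADD SUBSOURCE db.new_table;

-- After snapshotting completes, return to the steady-state size.
ALTER CLUSTER ingest_cluster SET (SIZE = '100cc');
```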