Skip to main content

Command Palette

Search for a command to run...

Change Data Capture & Change Tracking with Azure SQL + Azure Functions: The Hard-Earned Do's, Don'ts, and Workarounds

It looked simple on the docs. It never is.

Updated
12 min read
Change Data Capture & Change Tracking with Azure SQL + Azure Functions: The Hard-Earned Do's, Don'ts, and Workarounds
J
Developer with a passion for well-crafted software and a habit of going deep on whatever I'm building. Interested in cloud, AI, and the intersection of tech and real-world impact.

If you've ever tried to wire up Change Data Capture (CDC) or Change Tracking (CT) on Azure SQL with Azure Functions, you know the exact moment I'm talking about — that deceptively calm afternoon when you think, "This should take a few hours," and then three days later you're questioning every life decision that led you here.

I've been there. More than once. This post is the one I wish existed before I started.


A Bit of Context

I was building an event-driven sync pipeline for a school management system — enrollment data needed to flow downstream to a reporting service and a third-party DepEd integration. The source of truth was Azure SQL. The obvious question: how do we detect what changed?

Two options surfaced immediately: Change Tracking (lightweight, row-level) and Change Data Capture (heavier, log-based, full before/after). I went CDC first because I needed the full delta. I paid for it.

Here's what I learned — organized into the patterns that will save (or cost) you hours.


First: Know Which One You Actually Need

Before anything else, get this decision right.

Change Tracking (CT) Change Data Capture (CDC)
Tracks what changed ✅ Row-level (INSERT/UPDATE/DELETE) ✅ Row-level with column data
Tracks old values ❌ No ✅ Yes (before and after)
Overhead Very low Moderate (reads transaction log)
Retention Manual (version-based) Configurable (days)
Available on Azure SQL ✅ Yes ✅ Yes (but with caveats — keep reading)
Best for Sync/polling, lightweight audit Full audit trail, ETL, replication

The honest rule of thumb: if you don't need what the value was before the change, use Change Tracking. CDC is powerful but it adds real complexity, especially in managed environments like Azure SQL.


The Do's ✅

1. Always Enable CDC/CT at Both the Database AND Table Level

This sounds obvious, but the order matters and missing one silently fails.

-- Enable CDC on the database
EXEC sys.sp_cdc_enable_db;

-- Enable CDC on a specific table
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Enrollment',
    @role_name     = NULL,
    @supports_net_changes = 1;

For Change Tracking:

-- Database level
ALTER DATABASE [SchoolMIS]
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 3 DAYS, AUTO_CLEANUP = ON);

-- Table level
ALTER TABLE dbo.Enrollment
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON);

Miss the table-level step and your Azure Function will query forever and return nothing. No error. Just silence.

2. Use CHANGETABLE with Proper Version Anchoring (Change Tracking)

The version-based API is the right way to poll from Azure Functions.

DECLARE @lastVersion BIGINT = @storedVersion; -- retrieved from durable state or storage

SELECT
    CT.SYS_CHANGE_VERSION,
    CT.SYS_CHANGE_OPERATION,
    E.*
FROM CHANGETABLE(CHANGES dbo.Enrollment, @lastVersion) AS CT
LEFT JOIN dbo.Enrollment E ON E.EnrollmentId = CT.EnrollmentId
ORDER BY CT.SYS_CHANGE_VERSION;

Always persist the SYS_CHANGE_VERSION after processing. I stored mine in Azure Table Storage — cheap, fast, and survives function restarts.

3. Store Your Sync Checkpoint Outside the Database

Don't store the last-processed version or LSN in the same database you're tracking. If the DB has an issue, you lose your checkpoint. Use:

  • Azure Table Storage (my go-to — dead simple, no overhead)

  • Azure Blob Storage (fine for JSON checkpoints)

  • Azure Cosmos DB (if you're already using it for other state)

// Simple checkpoint read/write pattern using Azure Table Storage
public async Task<long> GetLastSyncVersionAsync(string tableName)
{
    var entity = await _tableClient.GetEntityAsync<SyncCheckpoint>("cdc", tableName);
    return entity?.Value?.LastVersion ?? 0;
}

public async Task SaveLastSyncVersionAsync(string tableName, long version)
{
    var entity = new SyncCheckpoint
    {
        PartitionKey = "cdc",
        RowKey = tableName,
        LastVersion = version
    };
    await _tableClient.UpsertEntityAsync(entity);
}

4. Validate CHANGE_TRACKING_MIN_VALID_VERSION Before Every Query

This is the one that bites people. If your cleanup window is 3 days and your Function hasn't run in 4 days (maybe due to an outage or a forgotten timer), the minimum valid version will be higher than your checkpoint. You'll miss changes silently or get an error.

DECLARE @minVersion BIGINT = CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.Enrollment'));
DECLARE @myVersion BIGINT = @storedVersion;

IF @myVersion < @minVersion
BEGIN
    -- Do a full resync, don't try to diff
    RAISERROR('Checkpoint version is behind minimum valid version. Full resync required.', 16, 1);
END

I made this check mandatory at the top of every sync function. It saved me from corrupted diffs more than once.

5. Use a Timer Trigger with Idempotent Processing

A TimerTrigger is the right trigger type for polling CDC/CT. Don't try to use a queue or HTTP trigger for this unless you have a specific reason.

[Function("EnrollmentSyncFunction")]
public async Task Run(
    [TimerTrigger("0 */2 * * * *")] TimerInfo timerInfo, // every 2 minutes
    FunctionContext context)
{
    var logger = context.GetLogger<EnrollmentSyncFunction>();

    var lastVersion = await _checkpointService.GetLastSyncVersionAsync("Enrollment");
    var changes = await _changeTrackingRepo.GetChangesAsync(lastVersion);

    if (!changes.Any())
    {
        logger.LogInformation("No changes detected.");
        return;
    }

    foreach (var change in changes)
    {
        await _eventPublisher.PublishAsync(change); // idempotent downstream
    }

    var maxVersion = changes.Max(c => c.SysChangeVersion);
    await _checkpointService.SaveLastSyncVersionAsync("Enrollment", maxVersion);
}

Key principle: make your event handlers idempotent. If the Function crashes after publishing but before saving the checkpoint, it will reprocess. That's fine — if your consumers can handle duplicates.


The Don'ts ❌

1. Don't Assume CDC Is Always Available Out of the Box on Azure SQL

Here's the first wall I hit. CDC on Azure SQL Database (the fully managed PaaS offering) works differently than on SQL Server or Azure SQL Managed Instance.

On Azure SQL Database:

  • CDC is available, but SQL Agent is not. CDC relies on SQL Agent jobs for log scanning and cleanup.

  • Microsoft auto-manages the CDC scan and cleanup processes via background tasks — but this means you don't control the schedule.

  • The cdc.fn_cdc_get_all_changes_* and cdc.fn_cdc_get_net_changes_* functions still work fine for querying.

Don't write docs-first code assuming SQL Agent exists. If you try to manually create CDC jobs, they'll fail silently or error out in Azure SQL Database.

2. Don't Poll with SELECT * Against CDC Tables Directly

Querying cdc.dbo_Enrollment_CT directly is tempting and it works — until you're processing millions of rows.

Always use the CDC query functions with LSN boundaries:

DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_Enrollment');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Enrollment(
    @from_lsn, @to_lsn, N'all'
);

And always advance your @from_lsn checkpoint after each successful run.

3. Don't Run Long-Polling Loops Inside a Single Function Invocation

Azure Functions have execution time limits (default 5 minutes for Consumption plan, configurable up to 10 minutes for some plans). I once wrote a naive loop that kept checking for changes in a while loop inside one invocation.

This is a bad idea for multiple reasons:

  • Timeout risk

  • No backpressure — you'll overwhelm downstream services

  • Impossible to debug when things go wrong

Use Durable Functions if you need orchestration, or just let the timer trigger fire repeatedly and handle one batch per invocation.

4. Don't Ignore NULL Rows on LEFT JOIN with CT

When a row is deleted, the main table no longer has that record. Your LEFT JOIN will return NULL for all columns except the change metadata. This is expected — but ignoring it will cause NullReferenceExceptions in your C# mapping layer.

// Wrong — will crash on deleted rows
var mapped = changes.Select(c => new EnrollmentChangedEvent
{
    StudentName = c.StudentName, // NULL on DELETE
    ...
});

// Right — check the operation type first
var mapped = changes.Select(c => c.Operation == "D"
    ? EnrollmentChangedEvent.Deleted(c.EnrollmentId)
    : EnrollmentChangedEvent.FromRow(c));

5. Don't Use CDC/CT as a Substitute for Proper Domain Events

This is an architectural one. CDC and CT are great for infrastructure-level change detection, but don't mistake them for a replacement for domain events. If your business logic requires knowing why something changed — e.g., "enrollment was cancelled by admin" vs. "enrollment was auto-expired" — you won't get that from CDC. You'll get a generic UPDATE.

Use CDC for integration and sync. Use domain events (via MediatR, Dapr, or Service Bus) for business processes.


Workarounds for Real-World Pain Points 🔧

Workaround 1: CDC Capture Instance Limit

Azure SQL has a limit of 2 capture instances per table. If you try to enable CDC on the same table twice (e.g., you changed the column list and re-enabled), you'll hit this limit.

-- Check existing instances
SELECT * FROM cdc.change_tables;

-- Drop an old instance before re-enabling
EXEC sys.sp_cdc_disable_table
    @source_schema = N'dbo',
    @source_name   = N'Enrollment',
    @capture_instance = N'dbo_Enrollment_v1';

Make it a habit to clean up old instances after schema changes.

Workaround 2: Handling the "Version Gap" After Azure SQL Maintenance

Azure SQL performs periodic maintenance (failovers, patches) and this can occasionally cause gaps in the CT version sequence. Your code shouldn't assume version numbers are perfectly sequential.

-- Don't assume version + 1 is valid
-- Always use CHANGETABLE and let SQL Server handle the version traversal

And always validate against CHANGE_TRACKING_MIN_VALID_VERSION as mentioned above.

Workaround 3: CDC Latency in Azure SQL Database

Because you don't control the CDC scanner schedule in Azure SQL Database, there can be latency between a DML operation and it appearing in the CDC change tables. In my testing, this was typically under 30 seconds, but it's not guaranteed.

Workaround: Add a small delay buffer or poll based on wall clock time, not immediately after writes.

// Don't poll for changes in the last 30 seconds to account for CDC latency
var toTime = DateTime.UtcNow.AddSeconds(-30);
var toLsn = await _repo.GetLsnForTimeAsync(toTime);

Workaround 4: Schema Changes Break CDC

CDC is notoriously brittle around schema changes. Adding or dropping a column on a CDC-enabled table will break the capture instance.

The pattern that worked for me:

  1. Disable CDC on the table

  2. Make the schema change

  3. Re-enable CDC with the updated column list

  4. Trigger a full resync for the affected table (wipe your checkpoint)

-- Step 1: Disable
EXEC sys.sp_cdc_disable_table
    @source_schema = N'dbo',
    @source_name   = N'Enrollment',
    @capture_instance = N'all';

-- Step 2: Your ALTER TABLE goes here

-- Step 3: Re-enable
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Enrollment',
    @role_name     = NULL,
    @supports_net_changes = 1;

Don't try to live-alter a CDC-enabled table without this dance. I've seen it leave capture tables in an inconsistent state that required a full DB-level CDC disable/re-enable to fix.

Workaround 5: Avoid Cold Start Delays with Pre-warmed Functions

If your timer runs every 2 minutes on a Consumption plan, cold starts can eat into your execution window. For CDC polling specifically, consider Premium plan or keeping the Function warm — a delayed poll means a bigger batch next run, which can compound.

Alternatively, use ROWS_PER_BATCH limits in your query:

SELECT TOP 500
    CT.SYS_CHANGE_VERSION,
    CT.SYS_CHANGE_OPERATION,
    E.*
FROM CHANGETABLE(CHANGES dbo.Enrollment, @lastVersion) AS CT
LEFT JOIN dbo.Enrollment E ON E.EnrollmentId = CT.EnrollmentId
ORDER BY CT.SYS_CHANGE_VERSION;

Process in bounded batches. Update the checkpoint to the max version in the batch. Let the next invocation pick up the rest.


The Architecture That Actually Worked

After all the trial and error, here's the pattern I settled on for the school management system:

[Azure SQL - Source Tables]
        │
        │ Change Tracking (CT)
        ▼
[Azure Function - Timer Trigger (every 2 min)]
        │
        ├── Read CT changes with version anchor
        ├── Validate min valid version
        ├── Map to domain events
        ├── Publish to Azure Service Bus (topic per entity)
        │
        ▼
[Checkpoint: Azure Table Storage]

[Downstream consumers: Reporting API, DepEd Integration, Notification Service]

Why CT over CDC in the end? The downstream systems only needed to know that a record changed and what the current state is — not what it was before. CT + a fresh SELECT on change was cleaner, had less overhead, and avoided all the CDC schema-change fragility.

CDC lives in the integration layer now, specifically for the audit trail module where I need full before/after data for compliance.


Quick Reference: Do's and Don'ts Cheat Sheet

Category ✅ Do ❌ Don't
Setup Enable at DB and table level Assume table-level is auto-enabled
Querying CT Use CHANGETABLE with version anchors Query internal CT tables directly
Querying CDC Use fn_cdc_get_all_changes_* with LSN bounds SELECT * from change tables directly
Checkpoint Store version/LSN in external storage Store checkpoint in the same SQL DB
Version gaps Validate MIN_VALID_VERSION before every poll Trust that your checkpoint is always valid
Schema changes Disable CDC → change schema → re-enable Alter table while CDC is active
Function design One batch per invocation, idempotent processing Long polling loops inside one invocation
Architecture Use CT/CDC for sync; domain events for business logic Replace domain events with CDC
NULL handling Check operation type before mapping row data Assume all columns are populated on DELETE
Azure SQL PaaS Understand SQL Agent is managed, not user-controlled Try to manually create CDC Agent jobs

Final Thoughts

CDC and Change Tracking are genuinely powerful features — when you understand what they're designed for and where they break. The Azure SQL managed environment adds another layer of "it works differently here" that the official docs sometimes underplay.

The biggest lesson from this whole journey: start with Change Tracking unless you have a concrete need for pre-change values. It's simpler, more resilient to schema changes, and plenty powerful for most sync scenarios. Reach for CDC when you genuinely need the audit trail or full delta — and budget time for its maintenance overhead.

And always, always validate that minimum valid version.

Happy tracking. 🚀

Azure Integration

Part 3 of 3

Real-world Azure integration patterns from the trenches — Service Bus, Logic Apps, Durable Functions, Azure Functions, and everything in between. No fluff, just battle-tested approaches from someone who's shipped these to production.

Start from the beginning

Testing Durable Functions Without Losing Your Mind (and Your Sprint)

Real workarounds from the trenches — integration testing Azure Durable Functions when the framework fights back.