VDiff
Compare the source and target in a workflow to ensure integrity
Description #
VDiff does a row by row comparison of all tables associated with the workflow, diffing the source keyspace and the target keyspace and reporting counts of missing/extra/unmatched rows.
It is highly recommended that you do this before you finalize a workflow with SwitchTraffic
and complete
.
Command #
VDiff takes different sub-commands or actions similar to how the MoveTables
/Reshard
commands work. Please see the command's reference docs for additional info. The following sub-commands or actions are supported:
Start a New VDiff #
The create
action schedules a VDiff to run on the primary tablet of each target shard to verify the subset of data that will live on the given shard. If you do not pass a specific UUID then one will be generated.
Each scheduled VDiff has an associated UUID which is returned by the create
action. You can use it
to monitor progress. Example:
$ vtctldclient --server=localhost:15999 VDiff --target-keyspace customer --workflow commerce2customer create
VDiff a35b0006-e6d9-416e-bea9-917795dc5bf3 scheduled on target shards, use show to view progress
Resume a Previous VDiff #
The resume
action allows you to resume a previously completed VDiff, picking up where it left off and comparing the records where the Primary Key column(s) are greater than the last record processed — with the progress and other status information saved when the run ends. This allows you to do approximate rolling or differential VDiffs (e.g. done after MoveTables
finishes the initial copy phase and then again just before SwitchTraffic
).
Example:
$ vtctldclient --server=localhost:15999 VDiff --target-keyspace customer --workflow commerce2customer resume 4c664dc2-eba9-11ec-9ef7-920702940ee0
VDiff 4c664dc2-eba9-11ec-9ef7-920702940ee0 resumed on target shards, use show to view progress
resume
when different collations are used for a table between the source and target keyspaces (more details can be seen here).Show Progress/Status of a VDiff #
Using the show
action you can either show
a specific UUID or use the last
convenience shorthand to look at the most recently created VDiff. Example:
$ vtctldclient --server=localhost:15999 VDiff --target-keyspace customer --workflow commerce2customer show last
VDiff Summary for customer.commerce2customer (4c664dc2-eba9-11ec-9ef7-920702940ee0)
State: completed
RowsCompared: 196
HasMismatch: false
StartedAt: 2022-06-26 22:44:29
CompletedAt: 2022-06-26 22:44:31
Use "--format=json" for more detailed output.
$ vtctldclient --server=localhost:15999 VDiff --format=json --target-keyspace customer --workflow commerce2customer show last
{
"Workflow": "commerce2customer",
"Keyspace": "customer",
"State": "completed",
"UUID": "4c664dc2-eba9-11ec-9ef7-920702940ee0",
"RowsCompared": 196,
"HasMismatch": false,
"Shards": "0",
"StartedAt": "2022-06-26 22:44:29",
"CompletedAt": "2022-06-26 22:44:31"
}
$ vtctldclient --server=localhost:15999 VDiff --format=json --target-keyspace customer --workflow p1c2 show daf1f03a-03ed-11ed-9ab8-920702940ee0
{
"Workflow": "p1c2",
"Keyspace": "customer",
"State": "started",
"UUID": "daf1f03a-03ed-11ed-9ab8-920702940ee0",
"RowsCompared": 51,
"HasMismatch": false,
"Shards": "-80,80-",
"StartedAt": "2022-07-15 03:26:03",
"Progress": {
"Percentage": 48.57,
"ETA": "2022-07-15 03:26:10"
}
}
show all
lists all VDiffs created for the specified keyspace and workflow.
SELECT COUNT(*)
.
So we instead use the statistics available in the
information_schema
to approximate the number of rows in each table when initializing a VDiff on the target
primary tablet(s). This data is then used in the progress report and it can be significantly
off (up to 50-60+%) depending on the utilization of the underlying MySQL server resources and
the age of the tables. You can specify the create
--update-table-stats
flag so that VDiff
will run ANALYZE TABLE
to update the
statistics for the tables involved on the target primary tablet(s) before executing the
VDiff in order to improve the accuracy of the progress report.Stopping a VDiff #
The stop
action allows you to stop a running VDiff for any reason — for example, the load on the system(s) may be too high at the moment and you want to postpone the work until off hours. You can then later use the resume
action to start the VDiff again from where it left off. Example:
$ vtctldclient --server=localhost:15999 VDiff --format=json --target-keyspace customer --workflow commerce2customer stop ad9bd40e-0c92-11ed-b568-920702940ee0
{
"UUID": "ad9bd40e-0c92-11ed-b568-920702940ee0",
"Action": "stop",
"Status": "completed"
}
stop
a VDiff that is already completed is a no-op.Delete VDiff Results #
You use the delete
action to either delete
a specific UUID or use the all
shorthand to delete all VDiffs created for the specified keyspace and workflow. Example:
$ vtctldclient --server=localhost:15999 VDiff --target-keyspace customer --workflow commerce2customer delete all
VDiff delete completed
$ vtctldclient --server=localhost:15999 VDiff --format=json --target-keyspace customer --workflow commerce2customer delete all
{
"Action": "delete",
"Status": "completed"
}
Deletes are idempotent, so attempting to delete
VDiff data that does not exist is a no-op.
All VDiff data associated with a VReplication workflow is deleted when the workflow is deleted.
Metrics #
VDiff exports several metrics using the expvars interface. These are available at the /debug/vars
endpoint of vttablet's http status pages. More details can be found here.
All of these are exported on the target primary vttablet
processes where the VDiff controllers run.
VDiffCount #
The number of vdiffs that exist on the vttablet
— meaning ones that were created and have not
yet been deleted so they could still be stopped, resumed, or deleted. This count would match the
number of vdiffs you would see in the vdiff show all
client command output.
VDiffErrorCountTotal #
The number of errors encountered across all vdiff actions on the vttablet
.
VDiffErrors #
The count of specific errors seen during the lifetime of a vdiff.
VDiffPhaseTimings #
This metric reflects the amount of time spent (in nanoseconds) in the main phases that are run during the lifetime of a vdiff.
VDiffRowsComparedTotal #
The number of rows compared across all vdiffs on the vttablet
.
VDiffRowsCompared #
The live number of rows compared per vdiff by table.
VDiffRestartedTableDiffsCount #
The numnber of time that table diffs were restarted due to --max-diff-duration
by table.