Conversation

@dracarys09

When a Cassandra node fails to start due to Transactional Cluster Metadata (TCM/CEP-21) corruption or issues, operators need a way to inspect the cluster metadata state offline without starting the node. The existing tools (nodetool, cqlsh) require a running node, leaving operators blind when debugging startup failures.

With CEP-21 (Transactional Cluster Metadata), cluster metadata is stored in system tables:

  • system.local_metadata_log - Contains transformation entries (epoch -> transformation)
  • system.metadata_snapshots - Contains periodic snapshots of ClusterMetadata

When a node fails to start due to TCM corruption or inconsistencies, operators have no way to inspect the metadata state without a running node. This tool fills that gap by reading directly from SSTables.
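
For orientation, here is a minimal, self-contained sketch (plain JDK, no Cassandra classes) of locating the SSTables that back those two tables on disk; the /var/lib/cassandra/data/system default path and the directory-name matching are assumptions about a stock install, not part of this patch:

    import java.io.*;
    import java.nio.file.*;
    import java.util.stream.Stream;

    public class FindTcmSSTables
    {
        public static void main(String[] args) throws IOException
        {
            // system-keyspace data directory; pass a different path as the first argument if needed
            Path systemDir = Paths.get(args.length > 0 ? args[0] : "/var/lib/cassandra/data/system");
            // table directories are named <table>-<tableId>, so match on the table-name prefix
            try (Stream<Path> dirs = Files.list(systemDir))
            {
                dirs.filter(d -> d.getFileName().toString().startsWith("local_metadata_log-")
                              || d.getFileName().toString().startsWith("metadata_snapshots-"))
                    .forEach(FindTcmSSTables::printDataFiles);
            }
        }

        // print the -Data.db component of every SSTable in the given table directory
        static void printDataFiles(Path tableDir)
        {
            try (Stream<Path> files = Files.list(tableDir))
            {
                files.filter(f -> f.getFileName().toString().endsWith("-Data.db"))
                     .forEach(System.out::println);
            }
            catch (IOException e)
            {
                throw new UncheckedIOException(e);
            }
        }
    }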


@krummas left a comment

So this is an emergency recovery tool, hopefully extremely rarely used by an operator. I think we can slim it down a lot; these are the features I think we need here (a rough sketch follows the list):

  • dump metadata at the current (or user-provided) epoch
    • serialized binary format
    • metadata.toString, to avoid locking us into any format
  • dump the log (with start/end epochs), just toString each entry
  • maybe add an option to dump system_clustermetadata.distributed_metadata_log if this is run on a CMS node
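
As a rough illustration only of the two output modes in the first bullet - ClusterMetadataLike below is a stand-in stub, not the real org.apache.cassandra.tcm.ClusterMetadata API - something shaped like:

    import java.io.*;

    public class DumpModesSketch
    {
        // stand-in for the real ClusterMetadata; all the tool needs from it here is
        // a text rendering (toString) and its serialized binary form
        interface ClusterMetadataLike
        {
            byte[] toSerializedBytes();
        }

        static void dump(ClusterMetadataLike metadata, OutputStream binaryOut, PrintStream textOut, boolean binary) throws IOException
        {
            if (binary)
                binaryOut.write(metadata.toSerializedBytes());   // serialized binary format
            else
                textOut.println(metadata);                       // metadata.toString, format-agnostic
        }
    }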

issues:

  • the shell script should live in the tools/bin/ directory
  • tool name - this does not dump sstable metadata, it dumps cluster metadata from sstables; sstable metadata is something different (see tools/bin/sstablemetadata)
  • it copies the sstables to $CASSANDRA_HOME/data (or, if that is unset, into the current directory) - we should create a temporary directory for import and clean that directory up after dumping the metadata (a cleanup sketch follows this list); we need something like
                Path p = Files.createTempDirectory("dumptcmlog");
                DatabaseDescriptor.getRawConfig().data_file_directories = new String[] {p.resolve("data").toString()};
                DatabaseDescriptor.getRawConfig().commitlog_directory = p.resolve("commitlog").toString();
                DatabaseDescriptor.getRawConfig().accord.journal_directory = p.resolve("accord_journal").toString();
                DatabaseDescriptor.getRawConfig().hints_directory = p.resolve("hints").toString();
                DatabaseDescriptor.getRawConfig().saved_caches_directory = p.resolve("saved_caches").toString();

to make sure we only touch the tmp directory
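
A minimal sketch of that cleanup step, assuming plain java.nio; the actual patch may well prefer Cassandra's own file utilities instead:

    import java.io.IOException;
    import java.nio.file.*;
    import java.util.Comparator;
    import java.util.stream.Stream;

    public class TmpDirCleanup
    {
        // delete the temporary import directory, deepest entries first, once the dump is done
        static void deleteRecursively(Path p) throws IOException
        {
            try (Stream<Path> walk = Files.walk(p))
            {
                walk.sorted(Comparator.reverseOrder()).forEach(TmpDirCleanup::deleteQuietly);
            }
        }

        static void deleteQuietly(Path path)
        {
            try
            {
                Files.delete(path);
            }
            catch (IOException e)
            {
                // best effort; a leftover tmp dir is tolerable for an emergency tool
            }
        }
    }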
