@pasanw pasanw commented Jan 23, 2026

Description

Context

tigera-operator configures the eck-operator to deploy a StatefulSet by specifying a NodeSet in the Elasticsearch CR. The name of the resulting StatefulSet is the name of the NodeSet.

tigera-operator sets the NodeSet name by hashing the PersistentVolumeClaim template of the NodeSet.

This ensures that if the PVC template changes as a result of LogStorage changes, a new name for the NodeSet is generated, which creates a new StatefulSet with the new PVC template (rather than updating the existing StatefulSet).

A new StatefulSet must be created, since a StatefulSet's PVC template cannot be updated after the StatefulSet has been created.
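For illustration, hash-based naming of this kind can be sketched as follows. This is a minimal sketch, not the operator's actual code: the `pvcTemplate` type, the `hashedNodeSetName` function, the `nodeset-` prefix, and the choice of SHA-256 with a 12-character suffix are all assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// pvcTemplate is a hypothetical, trimmed-down stand-in for the
// PersistentVolumeClaim template carried by a NodeSet.
type pvcTemplate struct {
	StorageClassName string `json:"storageClassName"`
	StorageRequest   string `json:"storageRequest"`
}

// hashedNodeSetName derives a name from the serialized template, so any
// change to the template (or to how it serializes) yields a different name.
func hashedNodeSetName(t pvcTemplate) (string, error) {
	raw, err := json.Marshal(t)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	// Keep a short prefix of the digest; the length is arbitrary here.
	return "nodeset-" + hex.EncodeToString(sum[:])[:12], nil
}

func main() {
	small, _ := hashedNodeSetName(pvcTemplate{StorageClassName: "fast", StorageRequest: "10Gi"})
	large, _ := hashedNodeSetName(pvcTemplate{StorageClassName: "fast", StorageRequest: "20Gi"})
	// A changed storage request produces a different NodeSet name.
	fmt.Println(small, large)
}
```

Because the name depends on the serialized bytes rather than on the template's meaning, it is only stable for as long as the serialization is.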

Issue

The upgrade of k8s.io/apimachinery to v0.34.3 changed how Kubernetes objects serialize. Specifically, creationTimestamp now has an omitzero JSON tag. As a result, the same PersistentVolumeClaim produces different hashes depending on the version of tigera-operator (and the version of apimachinery it uses).

The primary impact is that when users upgrade Calico Enterprise to a newer version whose operator uses apimachinery >= v0.34.3, tigera-operator computes a different name for the NodeSet (due to the serialization change). This triggers the creation of a new StatefulSet and PVC, even though no LogStorage properties changed.
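The effect can be reproduced with the standard library alone. The sketch below is a simulation under stated assumptions, not apimachinery code: `timestamp`, `oldMeta`, `newMeta`, and `digest` are hypothetical names, and a nil pointer with omitempty stands in for the omitzero behaviour so the example does not depend on a recent Go toolchain.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// timestamp mimics a wrapper time type that marshals its zero value as null.
type timestamp struct{ time.Time }

func (t timestamp) MarshalJSON() ([]byte, error) {
	if t.IsZero() {
		return []byte("null"), nil
	}
	return json.Marshal(t.Time)
}

// oldMeta models the earlier behaviour: omitempty never omits a struct
// value, so a zero timestamp is still emitted (as null).
type oldMeta struct {
	Name              string    `json:"name"`
	CreationTimestamp timestamp `json:"creationTimestamp,omitempty"`
}

// newMeta models the newer behaviour: the zero timestamp is dropped.
// Assumption: a nil pointer with omitempty is used here as a stand-in
// for the omitzero tag.
type newMeta struct {
	Name              string     `json:"name"`
	CreationTimestamp *timestamp `json:"creationTimestamp,omitempty"`
}

// digest returns the serialized form of v and the hex SHA-256 of it.
func digest(v interface{}) (string, string) {
	raw, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(raw)
	return string(raw), hex.EncodeToString(sum[:])
}

func main() {
	oldJSON, oldHash := digest(oldMeta{Name: "data"})
	newJSON, newHash := digest(newMeta{Name: "data"})
	fmt.Println(oldJSON)
	fmt.Println(newJSON)
	// The objects are logically identical, yet the digests differ.
	fmt.Println(oldHash == newHash)
}
```

Any name derived from such a digest changes with the library version, even though the object it describes has not changed.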

This recreation could lead to one of two things:

  • A slow upgrade on clusters with dynamic provisioning, as new PVs are provisioned and filled with the data from the old PVs
  • A stalled upgrade on clusters without dynamic provisioning/available PVs, as the new ES pods wait for PVs to be provisioned

Proposal

This PR changes the behaviour of NodeSet name generation as follows:

  • We change the name of a NodeSet only when we know that the existing PVC template of the NodeSet does not match what it should be based on the LogStorage configuration. Otherwise, we keep the existing name.
  • When we need to change (or create) a name for a NodeSet, we choose a random string.

This has the key property of retaining the existing NodeSet name, regardless of how it was generated, if the existing PVC in the NodeSet matches the configuration found in LogStorage.
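The decision described above can be sketched roughly as follows. This is a hedged illustration rather than the code in this PR: the `pvcTemplate` and `nodeSet` types, the `chooseNodeSetName` and `randomName` helpers, the `nodeset-` prefix, and the use of `reflect.DeepEqual` for the comparison are all assumptions; the description does not say how the operator compares templates or formats the random string.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"reflect"
)

// pvcTemplate is a hypothetical, trimmed-down PVC template.
type pvcTemplate struct {
	StorageClassName string
	StorageRequest   string
}

// nodeSet is a hypothetical view of an existing NodeSet.
type nodeSet struct {
	Name string
	PVC  pvcTemplate
}

// randomName returns a fresh name; the prefix and length are arbitrary.
func randomName() (string, error) {
	buf := make([]byte, 4)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "nodeset-" + hex.EncodeToString(buf), nil
}

// chooseNodeSetName keeps the existing name when the existing PVC template
// already matches the desired one, and generates a new name otherwise
// (including when there is no existing NodeSet).
func chooseNodeSetName(existing *nodeSet, desired pvcTemplate) (string, error) {
	if existing != nil && reflect.DeepEqual(existing.PVC, desired) {
		return existing.Name, nil
	}
	return randomName()
}

func main() {
	desired := pvcTemplate{StorageClassName: "fast", StorageRequest: "10Gi"}
	existing := &nodeSet{Name: "legacy-hash-abc123", PVC: desired}

	kept, _ := chooseNodeSetName(existing, desired)
	fmt.Println(kept) // the legacy name is retained

	desired.StorageRequest = "20Gi"
	renamed, _ := chooseNodeSetName(existing, desired)
	fmt.Println(renamed) // a fresh random name
}
```

Since the comparison looks at template fields rather than serialized bytes, a name produced by any earlier hashing scheme survives as long as the template itself is unchanged.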

Alternative approaches

We attempted to augment the serialization to retain the creationTimestamp: null that previous versions of apimachinery emitted. Unfortunately, every approach we tried changed the field order, which also produced a different hash.

Release Note

Updated Elasticsearch NodeSet name generation to prevent unnecessary recreations of the Elasticsearch StatefulSet.

For PR author

  • Tests for change.
  • If changing pkg/apis/, run make gen-files
  • If changing versions, run make gen-versions

For PR reviewers

A note for code reviewers - all pull requests must have the following:

  • Milestone set according to targeted release.
  • Appropriate labels:
    • kind/bug if this is a bugfix.
    • kind/enhancement if this is a new feature.
    • enterprise if this PR applies to Calico Enterprise only.

@pasanw pasanw marked this pull request as ready for review January 23, 2026 23:55
@pasanw pasanw requested a review from a team as a code owner January 23, 2026 23:55
@pasanw pasanw merged commit c707e42 into tigera:master Jan 27, 2026
8 of 11 checks passed
@pasanw pasanw deleted the ev-6345 branch January 27, 2026 20:01
pasanw added a commit to pasanw/operator that referenced this pull request Jan 27, 2026
The upgrade of k8s.io/apimachinery to v0.34.3 changed how k8s objects serialize, causing the same PVC template to produce different hashes across tigera-operator versions. Since NodeSet names are derived from PVC template hashes, this triggered unnecessary StatefulSet recreation during Calico Enterprise upgrades, leading to slow upgrades (dynamic provisioning) or stalled upgrades (static provisioning with no available PVs). This change replaces hash-based NodeSet naming with logic that retains the existing name when the PVC template matches LogStorage configuration, and uses a random string when a new name is needed.
@pasanw pasanw mentioned this pull request Jan 27, 2026
5 tasks