-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Cleanup snapshot files in datastores for Error-ed snapshots, and some code improvements #12347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 4.20
Are you sure you want to change the base?
Cleanup snapshot files in datastores for Error-ed snapshots, and some code improvements #12347
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## 4.20 #12347 +/- ##
============================================
- Coverage 16.23% 16.23% -0.01%
+ Complexity 13378 13377 -1
============================================
Files 5657 5657
Lines 498866 498898 +32
Branches 60545 60545
============================================
Hits 81011 81011
- Misses 408821 408854 +33
+ Partials 9034 9033 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR addresses the cleanup of orphaned snapshot files in datastores when snapshots fail and enter Error state. Previously, when snapshots failed during backup to secondary storage, the removed date was set for snapshot records, causing the storage garbage collector to skip cleanup since getSnapshot() always returned null for removed snapshots. This left orphaned files in primary storage and error records in the database tables.
Key changes include:
- Added cleanup logic for snapshots in Error state including those marked as removed
- Introduced new DAO methods to handle removed snapshots and filter by state
- Added state transitions to handle Destroyed state more gracefully
- Refactored variable names for clarity and extracted methods to reduce duplication
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| server/src/main/java/com/cloud/storage/StorageManagerImpl.java | Refactored snapshot cleanup logic into separate methods, added support for cleaning up error-ed snapshots including removed ones, and now uses getSnapshotIncludingRemoved for cleanup operations |
| engine/storage/snapshot/src/main/java/org/apache/cloudstack/storage/snapshot/SnapshotDataFactoryImpl.java | Added getSnapshotIncludingRemoved method and extracted common logic into getSnapshotOnStore helper method |
| engine/schema/src/main/java/org/apache/cloudstack/storage/datastore/db/SnapshotDataStoreDaoImpl.java | Modified findBySnapshotId to return all references regardless of state, added findBySnapshotIdWithNonDestroyedState to filter out destroyed state references |
| engine/schema/src/main/java/org/apache/cloudstack/storage/datastore/db/SnapshotDataStoreDao.java | Added interface method findBySnapshotIdWithNonDestroyedState |
| engine/schema/src/main/java/com/cloud/storage/dao/SnapshotDaoImpl.java | Added listAllByStatusIncludingRemoved method to query snapshots including removed ones |
| engine/schema/src/main/java/com/cloud/storage/dao/SnapshotDao.java | Added interface method listAllByStatusIncludingRemoved |
| engine/storage/src/main/java/org/apache/cloudstack/storage/datastore/ObjectInDataStoreManagerImpl.java | Added state transitions for Destroyed state to handle repeated destroy operations gracefully |
| server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java | Renamed variable from backupedSnapshot to backedUpSnapshot and from snapshot to snapshotOnPrimary for clarity, updated calls to use findBySnapshotIdWithNonDestroyedState |
| engine/storage/snapshot/src/main/java/org/apache/cloudstack/storage/snapshot/SnapshotServiceImpl.java | Removed redundant variable assignment, removed unnecessary blank lines |
| engine/storage/snapshot/src/main/java/org/apache/cloudstack/storage/snapshot/DefaultSnapshotStrategy.java | Updated to use findBySnapshotIdWithNonDestroyedState |
| engine/storage/snapshot/src/main/java/org/apache/cloudstack/storage/snapshot/StorageSystemSnapshotStrategy.java | Removed unnecessary blank line |
| plugins/storage/volume/storpool/src/main/java/org/apache/cloudstack/storage/snapshot/StorPoolSnapshotStrategy.java | Updated to use findBySnapshotIdWithNonDestroyedState |
| plugins/storage/volume/storpool/src/main/java/org/apache/cloudstack/storage/datastore/util/StorPoolHelper.java | Updated to use findBySnapshotIdWithNonDestroyedState |
| engine/api/src/main/java/org/apache/cloudstack/engine/subsystem/api/storage/SnapshotDataFactory.java | Added interface method getSnapshotIncludingRemoved |
| server/src/test/java/com/cloud/storage/snapshot/SnapshotManagerImplTest.java | Updated test mocks to use findBySnapshotIdWithNonDestroyedState |
Comments suppressed due to low confidence (1)
engine/schema/src/main/java/org/apache/cloudstack/storage/datastore/db/SnapshotDataStoreDaoImpl.java:354
- The new method
findBySnapshotIdWithNonDestroyedState()and the modified implementation offindBySnapshotId()lack test coverage. Consider adding tests to verify that:
- findBySnapshotId returns all snapshot store references regardless of state
- findBySnapshotIdWithNonDestroyedState filters out Destroyed state references
- Both methods correctly handle cases where no snapshots exist
public List<SnapshotDataStoreVO> findBySnapshotId(long snapshotId) {
SearchCriteria<SnapshotDataStoreVO> sc = searchFilteringStoreIdEqStateEqStoreRoleEqIdEqUpdateCountEqSnapshotIdEqVolumeIdEq.create();
sc.setParameters(SNAPSHOT_ID, snapshotId);
return listBy(sc);
}
@Override
public List<SnapshotDataStoreVO> findBySnapshotIdWithNonDestroyedState(long snapshotId) {
SearchCriteria<SnapshotDataStoreVO> sc = idStateNeqSearch.create();
sc.setParameters(SNAPSHOT_ID, snapshotId);
sc.setParameters(STATE, State.Destroyed);
return listBy(sc);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...rage/src/main/java/org/apache/cloudstack/storage/datastore/ObjectInDataStoreManagerImpl.java
Show resolved
Hide resolved
...e/snapshot/src/main/java/org/apache/cloudstack/storage/snapshot/SnapshotDataFactoryImpl.java
Show resolved
Hide resolved
|
@blueorangutan package |
|
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16195 |
DaanHoogland
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clgtm, some remarks about logging
| if (logger.isDebugEnabled()) { | ||
| logger.debug("Snapshot [{}] in {} state found on {} data store [{}], it will be deleted.", snapshotUuid, snapshotDataStoreVO.getState(), storeRole, snapshotInfo.getDataStore().getUuid()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the condition is not needed for the simple getters. Else lambdas can work as well.
| if (logger.isDebugEnabled()) { | |
| logger.debug("Snapshot [{}] in {} state found on {} data store [{}], it will be deleted.", snapshotUuid, snapshotDataStoreVO.getState(), storeRole, snapshotInfo.getDataStore().getUuid()); | |
| } | |
| logger.debug("Snapshot [{}] in {} state found on {} data store [{}], it will be deleted.", snapshotUuid, snapshotDataStoreVO.getState(), storeRole, snapshotInfo.getDataStore().getUuid()); |
| } else if (logger.isDebugEnabled()) { | ||
| logger.debug("Did not find snapshot [{}] in {} state on {} data store ID: {}.", snapshotUuid, snapshotDataStoreVO.getState(), storeRole, snapshotDataStoreVO.getDataStoreId()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } else if (logger.isDebugEnabled()) { | |
| logger.debug("Did not find snapshot [{}] in {} state on {} data store ID: {}.", snapshotUuid, snapshotDataStoreVO.getState(), storeRole, snapshotDataStoreVO.getDataStoreId()); | |
| } | |
| } else { | |
| logger.debug("Did not find snapshot [{}] in {} state on {} data store ID: {}.", snapshotUuid, snapshotDataStoreVO.getState(), storeRole, snapshotDataStoreVO.getDataStoreId()); | |
| } |
| if (logger.isDebugEnabled()) { | ||
| logger.debug("Failed to delete snapshot [{}] from storage.", snapshot, e); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (logger.isDebugEnabled()) { | |
| logger.debug("Failed to delete snapshot [{}] from storage.", snapshot, e); | |
| } | |
| logger.debug("Failed to delete snapshot [{}] from storage.", snapshot, e); |
| if (logger.isDebugEnabled()) { | ||
| logger.debug("Snapshot [{}] is in {} state on {} data store ID: {}.", snapshotUuid, snapshotDataStoreVO.getState(), storeRole, snapshotDataStoreVO.getDataStoreId()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (logger.isDebugEnabled()) { | |
| logger.debug("Snapshot [{}] is in {} state on {} data store ID: {}.", snapshotUuid, snapshotDataStoreVO.getState(), storeRole, snapshotDataStoreVO.getDataStoreId()); | |
| } | |
| logger.debug("Snapshot [{}] is in {} state on {} data store ID: {}.", snapshotUuid, snapshotDataStoreVO.getState(), storeRole, snapshotDataStoreVO.getDataStoreId()); |
Description
This PR addresses cleanup of the snapshot files in datastores for Error-ed snapshots, and has some code improvements.
The removed date is set for the snapshot records after backup failed [1], and the storage garbage collector ignores cleaning up the files as getSnapshot [2] always returns null, leaving the orphaned snapshot files in the primary storage and error records in snapshots and snapshot_store_ref tables.
[1] https://github.com/apache/cloudstack/blob/4.20/engine/storage/snapshot/src/main/java/org/apache/cloudstack/storage/snapshot/SnapshotServiceImpl.java#L408
https://github.com/apache/cloudstack/blob/4.20/engine/storage/snapshot/src/main/java/org/apache/cloudstack/storage/snapshot/SnapshotServiceImpl.java#L719
[2] https://github.com/apache/cloudstack/blob/4.20/server/src/main/java/com/cloud/storage/StorageManagerImpl.java#L1889
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
Failed snapshots after created in primary (before copy to secondary storage) by adding the below code, and verified the cleanup before and after changes.
BEFORE CHANGES:
Logs:
DB:
Files in Primary Storage:
AFTER CHANGES:
Logs:
DB:
Files in Primary Storage:
How did you try to break this feature and the system with this change?