fix: prevent duplicate features in bulk creation#148
exbald wants to merge 1 commit into AutoForgeAI:master from
Conversation
When `feature_create_bulk` is called multiple times (e.g., from expand sessions), it now checks for existing `(category, name)` pairs and skips duplicates instead of creating them again.

Changes:
- Query existing features before creation
- Filter duplicates from input batch
- Report `skipped_duplicates` count in response
⚠️ Outside diff range comments (1)
mcp_server/feature_mcp.py (1)
549-591: ⚠️ Potential issue | 🔴 Critical: Dependency indices break when features are skipped.

The `depends_on_indices` values refer to positions in the original `features` list, but after filtering into `unique_features`, the index mapping is invalidated. This causes:

- IndexError if `idx >= len(unique_features)` after filtering
- Wrong dependencies if the dependency target was skipped (exists in DB): the code will reference the wrong feature from `created_features` instead of looking up the existing feature's ID
- Self-dependency in edge cases where a feature incorrectly depends on itself

Example: If features are `[A, B]` where B has `depends_on_indices: [0]`, and A is skipped (already exists with `id=5`), then `unique_features = [B]`, `created_features = [B]`, and B's dependency resolves to `created_features[0].id` (B's own ID) instead of 5.

🔧 Proposed fix: maintain index-to-ID mapping for both new and existing features
```diff
     # Second pass: check for existing features to avoid duplicates
-    existing = set()
-    for f in session.query(Feature.category, Feature.name).all():
-        existing.add((f.category, f.name))
+    existing_map = {}  # (category, name) -> feature_id
+    for f in session.query(Feature.id, Feature.category, Feature.name).all():
+        existing_map[(f.category, f.name)] = f.id

     # Filter out duplicates from input batch
     unique_features = []
+    original_idx_to_id = {}  # Maps original index -> feature ID (existing or to-be-created)
     skipped = 0
-    for feature_data in features:
+    for orig_idx, feature_data in enumerate(features):
         key = (feature_data["category"], feature_data["name"])
-        if key in existing:
+        if key in existing_map:
+            # Track existing feature's ID for dependency resolution
+            original_idx_to_id[orig_idx] = existing_map[key]
             skipped += 1
         else:
-            unique_features.append(feature_data)
-            existing.add(key)  # Mark as will-exist for batch dedup
+            unique_features.append((orig_idx, feature_data))
+            existing_map[key] = None  # Placeholder; ID assigned after flush

     # Third pass: create unique features only
     created_features: list[Feature] = []
-    for i, feature_data in enumerate(unique_features):
+    for orig_idx, feature_data in unique_features:
         db_feature = Feature(
-            priority=start_priority + i,
+            priority=start_priority + len(created_features),
             category=feature_data["category"],
             name=feature_data["name"],
             description=feature_data["description"],
             steps=feature_data["steps"],
             passes=False,
             in_progress=False,
         )
         session.add(db_feature)
         created_features.append(db_feature)

     # Flush to get IDs assigned
     session.flush()

+    # Build complete index-to-ID mapping after flush
+    for i, (orig_idx, _) in enumerate(unique_features):
+        original_idx_to_id[orig_idx] = created_features[i].id
+
     # Fourth pass: resolve index-based dependencies to actual IDs
     deps_count = 0
-    for i, feature_data in enumerate(unique_features):
+    for i, (orig_idx, feature_data) in enumerate(unique_features):
         indices = feature_data.get("depends_on_indices", [])
         if indices:
-            # Convert indices to actual feature IDs
-            dep_ids = [created_features[idx].id for idx in indices]
+            # Convert original indices to actual feature IDs (new or existing)
+            dep_ids = [original_idx_to_id[idx] for idx in indices]
             created_features[i].dependencies = sorted(dep_ids)
             deps_count += 1
```
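The remapping in the proposed fix can be illustrated standalone. This sketch uses hypothetical feature data and simulates the `flush()` ID assignment with a counter, so there is no database or `Feature` model involved; the variable names mirror the diff above:

```python
# Standalone sketch of the index-to-ID remapping from the proposed fix.
# Data is hypothetical; a counter stands in for flush() assigning IDs.
features = [
    {"category": "auth", "name": "A", "depends_on_indices": []},
    {"category": "auth", "name": "B", "depends_on_indices": [0]},  # B depends on A
]

# Pretend A already exists in the DB with id=5
existing_map = {("auth", "A"): 5}

original_idx_to_id = {}
unique_features = []
skipped = 0
for orig_idx, feature_data in enumerate(features):
    key = (feature_data["category"], feature_data["name"])
    if key in existing_map:
        original_idx_to_id[orig_idx] = existing_map[key]  # reuse existing ID
        skipped += 1
    else:
        unique_features.append((orig_idx, feature_data))

# Simulate flush() assigning new IDs sequentially starting at 6
for i, (orig_idx, _) in enumerate(unique_features):
    original_idx_to_id[orig_idx] = 6 + i

# Resolve B's dependency: with the mapping it points at A's real ID (5),
# not at the wrong entry of created_features after filtering.
b_deps = [original_idx_to_id[idx] for idx in features[1]["depends_on_indices"]]
print(skipped, original_idx_to_id, b_deps)  # 1 {0: 5, 1: 6} [5]
```

Without the `original_idx_to_id` mapping, the buggy code would resolve B's dependency to the first element of the filtered list (B itself) instead of A's existing ID.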
🧹 Nitpick comments (1)
mcp_server/feature_mcp.py (1)
549-552: Consider selecting only necessary columns for the existence check.

The query now retrieves all `(category, name)` pairs into memory. This is fine for moderate datasets, but if the feature table grows large, consider adding a database index on `(category, name)` or using an `EXISTS` subquery per feature. For now, this approach is acceptable given the described scale (~600 features).
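As a complement to the in-app check, uniqueness can also be enforced at the database level. The sketch below uses stdlib `sqlite3` and a hypothetical `features` table (the real project goes through SQLAlchemy, and the actual schema is assumed here) to show how a unique index on `(category, name)` makes duplicate inserts fail outright:

```python
# Sketch: a DB-level guard against duplicate (category, name) pairs.
# Table and column names are hypothetical; the project itself uses SQLAlchemy.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, category TEXT, name TEXT)")
# The unique index rejects any second row with the same (category, name)
conn.execute("CREATE UNIQUE INDEX ix_features_cat_name ON features (category, name)")

conn.execute("INSERT INTO features (category, name) VALUES ('auth', 'login')")
try:
    conn.execute("INSERT INTO features (category, name) VALUES ('auth', 'login')")
    inserted_twice = True
except sqlite3.IntegrityError:
    inserted_twice = False

row_count = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
print(inserted_twice, row_count)  # False 1
```

A constraint like this would catch duplicates even if some future caller bypasses the bulk-creation path, at the cost of needing to handle the `IntegrityError` wherever inserts happen.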
…AutoForgeAI#148)

- Updated display_derivation.py icon values to match rest of codebase (coding->code, testing->test-tube, refactoring->wrench)
- feature_compiler.py now imports TASK_TYPE_ICONS and DEFAULT_ICON from display_derivation instead of defining its own duplicate constants
- spec_builder.py already delegates to display_derivation (prior session)
- Updated all test assertions to match consolidated icon values
- Added 34 tests in test_feature_148_display_consolidation.py verifying:
  - Single source of truth (no duplicate definitions)
  - Consistent icon values across all code paths
  - Import chain correctness (feature_compiler -> display_derivation)
  - AgentSpecs created via any path get correct display values

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Problem
When `feature_create_bulk` is called multiple times (e.g., from expand sessions or if the initializer runs again), it creates duplicate features with the same `(category, name)` pairs. This can lead to hundreds of duplicate features in the database.

Solution
Added deduplication logic to `feature_create_bulk`:

- Queries `(category, name)` pairs already in the database
- Filters matching entries out of the input batch
- Reports `skipped_duplicates` in the response so callers know how many were filtered

Changes

- mcp_server/feature_mcp.py: Added deduplication before feature creation

Testing
Tested on a project that had 607 features (it should have had ~98). After tracing the duplicates to repeated bulk-creation calls, this fix prevents the issue from recurring.
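The dedup step described above can be sketched in plain Python. The helper name `dedupe_batch` and the feature dicts are hypothetical, and a plain set stands in for the database query, but the skip-and-mark logic matches the PR description (including guarding against duplicates within a single batch):

```python
# Minimal sketch of the dedup logic in feature_create_bulk:
# skip any (category, name) pair already present, and also
# dedupe within the incoming batch itself.
def dedupe_batch(features, existing_pairs):
    existing = set(existing_pairs)  # stand-in for the DB query
    unique, skipped = [], 0
    for f in features:
        key = (f["category"], f["name"])
        if key in existing:
            skipped += 1          # already in DB or earlier in this batch
        else:
            unique.append(f)
            existing.add(key)     # mark as will-exist for in-batch dedup
    return unique, skipped

batch = [
    {"category": "ui", "name": "navbar"},
    {"category": "ui", "name": "navbar"},   # in-batch duplicate
    {"category": "api", "name": "login"},   # already in DB
]
unique, skipped = dedupe_batch(batch, {("api", "login")})
print(len(unique), skipped)  # 1 2
```

Returning `skipped` alongside the filtered list is what lets the response carry the `skipped_duplicates` count mentioned in the Solution.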