
fix: prevent duplicate features in bulk creation #148

Open
exbald wants to merge 1 commit into AutoForgeAI:master from exbald:fix/dedup-bulk-features

Conversation

@exbald commented Jan 31, 2026

Problem

When feature_create_bulk is called multiple times (e.g., from expand sessions or if the initializer runs again), it creates duplicate features with the same (category, name) pairs. This can lead to hundreds of duplicate features in the database.

Solution

Added deduplication logic to feature_create_bulk:

  1. Query existing features - Check (category, name) pairs already in the database
  2. Filter input batch - Skip features that already exist or appear multiple times in the same batch
  3. Report skipped count - Return skipped_duplicates in the response so callers know how many were filtered
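The three steps above can be sketched in a few lines. This is a simplified illustration, not the actual implementation: the real code in mcp_server/feature_mcp.py reads existing (category, name) pairs via SQLAlchemy, and the function and variable names here are hypothetical.

```python
# Hedged sketch of the deduplication pass; names are illustrative, and the
# real code queries existing pairs from the database via SQLAlchemy.
def dedupe_batch(features, existing_pairs):
    """Drop features whose (category, name) already exists in the database
    or appeared earlier in the same batch; return survivors and skip count."""
    seen = set(existing_pairs)
    unique, skipped = [], 0
    for f in features:
        key = (f["category"], f["name"])
        if key in seen:
            skipped += 1          # already in DB or duplicated within the batch
        else:
            seen.add(key)         # mark as will-exist for in-batch dedup
            unique.append(f)
    return unique, skipped


batch = [
    {"category": "ui", "name": "login"},
    {"category": "ui", "name": "login"},   # in-batch duplicate
    {"category": "api", "name": "auth"},   # already in the database
]
unique, skipped = dedupe_batch(batch, {("api", "auth")})
# unique keeps only the first ("ui", "login"); skipped == 2
```

The skipped count flows back to the caller as skipped_duplicates, so repeated bulk calls become visible instead of silent.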

Changes

  • mcp_server/feature_mcp.py: Added deduplication before feature creation

Testing

Tested on a project that had 607 features where roughly 98 were expected. The duplicates were traced to repeated bulk-creation calls, and this fix prevents the issue from recurring.

Summary by CodeRabbit

Release Notes

  • New Features
    • Bulk feature creation now automatically detects and filters duplicate entries based on category and name.
    • Response includes a count of skipped duplicates for transparency.
    • Improved data integrity by preventing duplicate features within a single bulk creation request.


When feature_create_bulk is called multiple times (e.g., from expand sessions),
it now checks for existing (category, name) pairs and skips duplicates instead
of creating them again.

Changes:
- Query existing features before creation
- Filter duplicates from input batch
- Report skipped_duplicates count in response
coderabbitai bot commented Jan 31, 2026

📝 Walkthrough

The feature_create_bulk function now includes a pre-creation deduplication pass to identify and filter out duplicate (category, name) pairs from the input batch. The function now follows a four-pass process: validation, duplicate filtering, creation, and dependency resolution. The response includes a new skipped_duplicates field reporting filtered duplicates.

Changes

Cohort / File(s) | Summary
Deduplication Logic (mcp_server/feature_mcp.py) | Added a pre-creation pass to collect existing (category, name) pairs and filter duplicates from the input batch. The second and third passes now operate on the deduplicated list. The response includes a new skipped_duplicates field. Control-flow comments were updated to reflect the four-pass process.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Whiskers twitching with glee,
Duplicate features fade away,
One by one, clean and neat,
Deduplication saves the day!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'fix: prevent duplicate features in bulk creation' directly and clearly summarizes the main change: adding deduplication logic to the bulk feature creation function to prevent duplicate (category, name) pairs.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.



coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mcp_server/feature_mcp.py (1)

549-591: ⚠️ Potential issue | 🔴 Critical

Critical bug: Dependency indices break when features are skipped.

The depends_on_indices values refer to positions in the original features list, but after filtering into unique_features, the index mapping is invalidated. This causes:

  1. IndexError if idx >= len(unique_features) after filtering
  2. Wrong dependencies if the dependency target was skipped (exists in DB) — the code will reference the wrong feature from created_features instead of looking up the existing feature's ID
  3. Self-dependency in edge cases where a feature incorrectly depends on itself

Example: If features [A, B] where B has depends_on_indices: [0], and A is skipped (already exists with id=5), then unique_features = [B], created_features = [B], and B's dependency resolves to created_features[0].id (B's own ID) instead of 5.
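The stale-index behavior in that example can be reproduced in a few lines. The data here is hypothetical and independent of the real module; it only demonstrates why naive filtering invalidates depends_on_indices.

```python
# Minimal reproduction of the stale-index hazard (hypothetical data).
features = [
    {"category": "auth", "name": "A"},                             # already in DB (id=5)
    {"category": "auth", "name": "B", "depends_on_indices": [0]},  # depends on A
]
existing = {("auth", "A")}

# Naive filtering drops A but leaves B's stored index untouched...
unique_features = [
    f for f in features if (f["category"], f["name"]) not in existing
]
# ...so index 0 now points at B itself in the filtered list, not at the
# skipped feature A, which is exactly the self-dependency case above.
```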

🔧 Proposed fix: maintain index-to-ID mapping for both new and existing features
             # Second pass: check for existing features to avoid duplicates
-            existing = set()
-            for f in session.query(Feature.category, Feature.name).all():
-                existing.add((f.category, f.name))
+            existing_map = {}  # (category, name) -> feature_id
+            for f in session.query(Feature.id, Feature.category, Feature.name).all():
+                existing_map[(f.category, f.name)] = f.id

             # Filter out duplicates from input batch
             unique_features = []
+            original_idx_to_id = {}  # Maps original index -> feature ID (existing or to-be-created)
             skipped = 0
-            for feature_data in features:
+            for orig_idx, feature_data in enumerate(features):
                 key = (feature_data["category"], feature_data["name"])
-                if key in existing:
+                if key in existing_map:
+                    # Track existing feature's ID for dependency resolution
+                    original_idx_to_id[orig_idx] = existing_map[key]
                     skipped += 1
                 else:
-                    unique_features.append(feature_data)
-                    existing.add(key)  # Mark as will-exist for batch dedup
+                    unique_features.append((orig_idx, feature_data))
+                    existing_map[key] = None  # Placeholder; ID assigned after flush

             # Third pass: create unique features only
             created_features: list[Feature] = []
-            for i, feature_data in enumerate(unique_features):
+            for orig_idx, feature_data in unique_features:
                 db_feature = Feature(
-                    priority=start_priority + i,
+                    priority=start_priority + len(created_features),
                     category=feature_data["category"],
                     name=feature_data["name"],
                     description=feature_data["description"],
                     steps=feature_data["steps"],
                     passes=False,
                     in_progress=False,
                 )
                 session.add(db_feature)
                 created_features.append(db_feature)

             # Flush to get IDs assigned
             session.flush()

+            # Build complete index-to-ID mapping after flush
+            for i, (orig_idx, _) in enumerate(unique_features):
+                original_idx_to_id[orig_idx] = created_features[i].id
+
             # Fourth pass: resolve index-based dependencies to actual IDs
             deps_count = 0
-            for i, feature_data in enumerate(unique_features):
+            for i, (orig_idx, feature_data) in enumerate(unique_features):
                 indices = feature_data.get("depends_on_indices", [])
                 if indices:
-                    # Convert indices to actual feature IDs
-                    dep_ids = [created_features[idx].id for idx in indices]
+                    # Convert original indices to actual feature IDs (new or existing)
+                    dep_ids = [original_idx_to_id[idx] for idx in indices]
                     created_features[i].dependencies = sorted(dep_ids)
                     deps_count += 1
🧹 Nitpick comments (1)
mcp_server/feature_mcp.py (1)

549-552: Consider selecting only necessary columns for the existence check.

The query now retrieves all (category, name) pairs into memory. This is fine for moderate datasets, but if the feature table grows large, consider adding a database index on (category, name) or using an EXISTS subquery per feature. For now, this approach is acceptable given the described scale (~600 features).
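Taking the reviewer's index suggestion one step further, a UNIQUE constraint on (category, name) gives the database both an index for fast existence checks and a hard guarantee against duplicates even if the application-side check is bypassed. A minimal sqlite3 sketch follows; this is an assumption on my part, since the project accesses the database through SQLAlchemy, and the table and column names here are illustrative.

```python
import sqlite3

# Database-level guard: UNIQUE (category, name) creates an index and rejects
# duplicates outright. Illustrative schema, not the project's real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE features (
        id INTEGER PRIMARY KEY,
        category TEXT NOT NULL,
        name TEXT NOT NULL,
        UNIQUE (category, name)
    )
    """
)
conn.execute("INSERT INTO features (category, name) VALUES ('ui', 'login')")
try:
    # Second insert of the same (category, name) pair violates the constraint.
    conn.execute("INSERT INTO features (category, name) VALUES ('ui', 'login')")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
row_count = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
# blocked is True and row_count is 1: the duplicate never reached the table.
```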

rudiheydra added a commit to rudiheydra/AutoBuildr that referenced this pull request Feb 2, 2026
…AutoForgeAI#148)

- Updated display_derivation.py icon values to match rest of codebase
  (coding->code, testing->test-tube, refactoring->wrench)
- feature_compiler.py now imports TASK_TYPE_ICONS and DEFAULT_ICON from
  display_derivation instead of defining its own duplicate constants
- spec_builder.py already delegates to display_derivation (prior session)
- Updated all test assertions to match consolidated icon values
- Added 34 tests in test_feature_148_display_consolidation.py verifying:
  - Single source of truth (no duplicate definitions)
  - Consistent icon values across all code paths
  - Import chain correctness (feature_compiler -> display_derivation)
  - AgentSpecs created via any path get correct display values

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
rudiheydra added a commit to rudiheydra/AutoBuildr that referenced this pull request Feb 2, 2026
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>