feat: add package build from source (#101) #482
Conversation
Implements the ability to build and install packages from source code when binaries are unavailable. This feature includes automatic build dependency detection, build system detection, build caching, and comprehensive error handling.
Important: Review skipped (draft detected). Please check the settings in the CodeRabbit UI. You can disable this status message in the CodeRabbit settings.

📝 Walkthrough

Adds end-to-end "build from source" support: a new `cortex/source_builder.py` module, CLI integration in `cortex/cli.py`, tests in `tests/test_source_builder.py`, and documentation in `docs/SOURCE_BUILD.md`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant User as User / CLI
participant CLI as cortex/cli.py
participant Builder as SourceBuilder
participant Cache as Build Cache
participant Coord as InstallationCoordinator
User->>CLI: cortex install pkg[@ver] --from-source [--source-url URL]
CLI->>CLI: parse args -> from_source = true
CLI->>Builder: build_from_source(pkg, version, source_url)
Builder->>Cache: check_cache(cache_key)
alt cache hit
Cache-->>Builder: cached build + install cmds
else cache miss
Builder->>Builder: fetch_source / extract
Builder->>Builder: detect_build_system
Builder->>Builder: detect_build_dependencies
Builder->>Builder: configure_build
Builder->>Builder: build
Builder->>Builder: install_build -> install cmds
Builder->>Cache: save_cache(cache_key, build_dir, cmds)
end
Builder-->>CLI: BuildResult(success, install_commands, cached?)
CLI->>Coord: execute install_commands (or dry-run)
Coord-->>CLI: install status / exit code
CLI-->>User: report success/failure and details
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cortex/cli.py (1)
1-2050: Address Black formatting failure.

The CI pipeline indicates that the Black formatting check failed with "1 file would be reformatted." Run `black cortex/cli.py` to fix the formatting issues before merging.
🧹 Nitpick comments (6)
docs/SOURCE_BUILD.md (1)
78-82: Consider noting `setup.py` deprecation in documentation.

The documentation mentions `python3 setup.py build` and `sudo python3 setup.py install` for Python builds. While this still works for legacy projects, modern Python packaging prefers `pip install .` or `python -m build` with `pyproject.toml`. Consider adding a note about this or supporting the modern approach as well.

tests/test_source_builder.py (2)
184-209: Consider adding stronger assertions for fetch test.

The test mocks the tarfile extraction but only verifies `result is not None`. Consider asserting on the returned path structure to ensure the extraction logic correctly identifies the source directory.

🔎 Suggested improvement

```diff
 result = self.builder._fetch_from_url(
     "https://example.com/test.tar.gz", "test", "1.0"
 )
 assert result is not None
+# Verify returned path is a valid directory
+assert isinstance(result, Path)
```
235-253: Strengthen assertion for success test.

The test comment states "Should succeed" but only asserts `result is not None`. Add an assertion for `result.success` to properly verify the expected outcome.

🔎 Proposed fix

```diff
 result = self.builder.build_from_source("test-package", use_cache=False)
-# Should succeed (or at least not fail on dependency check)
-assert result is not None
+# Should succeed since dependencies are satisfied and commands succeed
+assert result is not None
+assert result.success is True
```

cortex/source_builder.py (3)
36-37: Module-level side effect on import.

`CACHE_DIR.mkdir(parents=True, exist_ok=True)` executes at import time, which could cause issues in testing or when the module is imported but not used. Consider moving this to the `SourceBuilder.__init__` method (which already has a similar call at line 96).

🔎 Proposed fix

```diff
 # Build cache directory
 CACHE_DIR = Path.home() / ".cortex" / "build_cache"
-CACHE_DIR.mkdir(parents=True, exist_ok=True)
```

The directory creation in `__init__` at line 96 already handles this.
194-200: GitHub tag URL assumes 'v' prefix for versions.

Not all GitHub projects use the `v` prefix for version tags (e.g., some use `1.0.0` instead of `v1.0.0`). Consider making this configurable or attempting both patterns, as sketched below.
185-225: Temporary directories not cleaned up after use.

The temp directory created at line 187 is never explicitly cleaned up. While the OS may eventually clean `/tmp`, builds could accumulate significant disk space. Consider adding cleanup after the install commands are executed, or using a context manager pattern, as sketched below.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- cortex/cli.py
- cortex/source_builder.py
- docs/SOURCE_BUILD.md
- pyproject.toml
- tests/test_source_builder.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:
- tests/test_source_builder.py
- cortex/source_builder.py
- cortex/cli.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Maintain >80% test coverage for pull requests
Files:
tests/test_source_builder.py
{setup.py,setup.cfg,pyproject.toml,**/__init__.py}
📄 CodeRabbit inference engine (AGENTS.md)
Use Python 3.10 or higher as the minimum supported version
Files:
pyproject.toml
🧬 Code graph analysis (2)
cortex/source_builder.py (3)
- cortex/branding.py (1): `cx_print` (49-69)
- cortex/dependency_resolver.py (2): `DependencyResolver` (40-386), `is_package_installed` (101-104)
- cortex/utils/commands.py (1): `CommandResult` (118-125)
cortex/cli.py (2)
- cortex/coordinator.py (4): `execute` (230-279), `InstallationCoordinator` (51-324), `InstallationStep` (26-39), `StepStatus` (17-22)
- cortex/source_builder.py (2): `SourceBuilder` (81-525), `build_from_source` (382-525)
🪛 GitHub Actions: CI
cortex/cli.py
[error] 1-1: Black formatting check failed. 1 file would be reformatted by Black. Run 'black .' to format.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test (3.11)
- GitHub Check: test (3.10)
- GitHub Check: test (3.12)
🔇 Additional comments (6)
pyproject.toml (1)
137-137: LGTM!

Adding `docs` to the Ruff exclude list is appropriate since the documentation directory contains Markdown files, not Python code.

cortex/cli.py (1)

1549-1598: Well-structured source build integration.

The `_install_from_source` method properly integrates with `SourceBuilder`, handles caching, and supports both dry-run and execute modes. The progress callback follows existing patterns in the codebase.

tests/test_source_builder.py (2)

1-21: LGTM - Well-structured test setup.

Imports are properly organized and the test file follows pytest conventions. The import of `BUILD_DEPENDENCIES`, `BuildConfig`, `BuildResult`, and `SourceBuilder` aligns with the public API of the source_builder module.

23-253: Good test coverage for SourceBuilder.

The `TestSourceBuilder` class provides comprehensive coverage of key functionality, including initialization, cache key generation, dependency detection for multiple build systems, build system detection, and the build-from-source workflow. The use of `tempfile.TemporaryDirectory` for filesystem tests is appropriate.

cortex/source_builder.py (2)

53-78: LGTM - Well-defined dataclasses.

`BuildConfig` and `BuildResult` are properly typed with sensible defaults. The use of `@dataclass` keeps the code clean and provides automatic `__init__`, `__repr__`, etc.

382-525: Well-implemented build orchestration.

The `build_from_source` method properly orchestrates the entire build workflow: cache check, source fetch, dependency detection, configure, build, and install command generation. Error handling is comprehensive with informative error messages in `BuildResult`. The integration with `DependencyResolver` and the `run_command` utility follows existing patterns.
Actionable comments posted: 0
♻️ Duplicate comments (1)
docs/SOURCE_BUILD.md (1)
237-242: Verify that referenced documentation files exist.

The "See Also" section references documentation files that may not exist in the repository. A previous review flagged that `Getting-Started.md` and `README_DEPENDENCIES.md` were missing. Please verify these files exist or update the references to point to existing documentation.
🧹 Nitpick comments (5)
cortex/cli.py (2)
554-563: Update docstring to document new parameters.

The method signature was extended with `from_source`, `source_url`, and `version` parameters, but the docstring was not updated to describe them. Adding parameter documentation would improve maintainability.

🔎 Proposed enhancement

Add to the method's docstring:

```python
"""Install software using natural language or from source.

Args:
    software: Software description or package name
    execute: Whether to execute installation commands
    dry_run: Show commands without executing
    parallel: Enable parallel execution for multi-step installs
    from_source: Build and install from source code when True
    source_url: URL to source code (for from_source builds)
    version: Version to build (for from_source builds)

Returns:
    Exit code (0 for success, 1 for failure)
"""
```
1547-1624: Consider adding audit logging for source builds.

Based on learnings, all package operations should implement audit logging to `~/.cortex/history.db`. The `_install_from_source` method doesn't record installation attempts to the history database, which means source builds won't appear in `cortex history` or be available for rollback.

Consider integrating with `InstallationHistory` similar to the regular `install()` flow (lines 592-627 and 754-757); a generic sketch follows.

Based on learnings, audit logging should be implemented for install operations.
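Since the `InstallationHistory` API itself is not shown in this review, here is a generic sqlite3 sketch of the kind of record being asked for; the table name, columns, and helper are all assumptions, not the project's actual schema:

```python
import sqlite3
import time
from pathlib import Path

DB_PATH = Path.home() / ".cortex" / "history.db"


def record_source_install(
    package: str,
    version: str | None,
    source_url: str | None,
    success: bool,
    error_message: str | None = None,
) -> None:
    """Append one audit row for a from-source install attempt."""
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS source_installs (
                   package TEXT, version TEXT, source_url TEXT,
                   success INTEGER, error TEXT, timestamp REAL)"""
        )
        conn.execute(
            "INSERT INTO source_installs VALUES (?, ?, ?, ?, ?, ?)",
            (package, version, source_url, int(success), error_message, time.time()),
        )
```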
cortex/source_builder.py (3)
110-125: Consider error handling for build_dir stat.

Line 119 calls `Path(build_dir).stat().st_mtime`, which could raise `FileNotFoundError` if the build directory was removed between build and cache save. While unlikely in normal operation, adding a try-except or verification would improve robustness.

🔎 Optional enhancement

```diff
 # Save metadata
+try:
+    timestamp = str(Path(build_dir).stat().st_mtime)
+except (OSError, FileNotFoundError):
+    timestamp = str(time.time())
 metadata = {
     "build_dir": str(build_dir),
     "install_commands": install_commands,
-    "timestamp": str(Path(build_dir).stat().st_mtime),
+    "timestamp": timestamp,
 }
```
209-220: Enhance zipfile path traversal protection.

The current check for `".."` and a `"/"` prefix in zip members provides basic protection, but could be bypassed by certain path constructions. Consider using `Path.resolve()` to verify that extracted paths remain within the target directory.

🔎 Enhanced security check

```diff
 elif archive_path.suffix == ".zip":
     with zipfile.ZipFile(archive_path, "r") as zip_ref:
         # Filter out path traversal components for security
         for member in zip_ref.namelist():
-            # Skip files with path traversal or absolute paths
-            if ".." in member or member.startswith("/"):
-                continue
-            zip_ref.extract(member, extract_dir)
+            # Verify extraction path is within target directory
+            target_path = (extract_dir / member).resolve()
+            if not target_path.is_relative_to(extract_dir.resolve()):
+                logger.warning(f"Skipping unsafe path in archive: {member}")
+                continue
+            zip_ref.extract(member, extract_dir)
```

Note: `is_relative_to()` requires Python 3.9+, which is supported per `pyproject.toml`.
335-357: Consider extracting parallel job count logic.

The code for determining the parallel job count using `multiprocessing.cpu_count()` is duplicated at lines 341-343 and 353-355. Extracting this to a helper or computing it once would reduce duplication.

🔎 Optional refactor

```diff
 def build(self, source_dir: Path, config: BuildConfig) -> list[str]:
     """Build the package."""
     commands = []
+
+    # Determine parallel jobs
+    import multiprocessing
+    jobs = multiprocessing.cpu_count()

     if config.build_system == "autotools" or config.build_system == "make":
         make_cmd = "make"
         if config.make_args:
             make_cmd += " " + " ".join(config.make_args)
         else:
-            import multiprocessing
-            jobs = multiprocessing.cpu_count()
             make_cmd += f" -j{jobs}"
         commands.append(make_cmd)

     elif config.build_system == "cmake":
         build_dir = source_dir / "build"
         make_cmd = "make"
         if config.make_args:
             make_cmd += " " + " ".join(config.make_args)
         else:
-            import multiprocessing
-            jobs = multiprocessing.cpu_count()
             make_cmd += f" -j{jobs}"
         commands.append(f"cd {build_dir} && {make_cmd}")
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- cortex/cli.py
- cortex/source_builder.py
- docs/SOURCE_BUILD.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:
- cortex/cli.py
- cortex/source_builder.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*install*.py : Implement audit logging to ~/.cortex/history.db for all package operations
Applied to files:
cortex/cli.py
🧬 Code graph analysis (2)
cortex/cli.py (1)
- cortex/source_builder.py (1): `build_from_source` (388-531)
cortex/source_builder.py (3)
- cortex/branding.py (1): `cx_print` (49-69)
- cortex/dependency_resolver.py (2): `DependencyResolver` (40-386), `is_package_installed` (101-104)
- cortex/utils/commands.py (1): `CommandResult` (118-125)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.12)
- GitHub Check: Test (Python 3.10)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
- GitHub Check: test (3.10)
🔇 Additional comments (6)
cortex/cli.py (2)
1745-1759: CLI arguments properly defined.

The new command-line arguments for source builds are well-structured with appropriate types and clear help text. The `--pkg-version` naming avoids the previously identified conflict with the global `--version` flag.

1995-2004: Proper wiring of source build parameters.

The new parameters are correctly passed from command-line arguments to the `install()` method using defensive `getattr` calls with appropriate defaults.

cortex/source_builder.py (4)

53-78: Well-designed data structures.

The `BuildConfig` and `BuildResult` dataclasses are cleanly defined with appropriate type hints and sensible defaults. They provide a clear interface for build operations.

127-160: Dependency detection logic is sound.

The method correctly aggregates build dependencies based on the build system and package patterns, then filters to return only missing packages. The heuristic for Python-related packages is reasonable.

255-282: Build system detection is well-structured.

The detection logic checks for build system indicators in a sensible order and provides a reasonable default. The approach covers the most common build systems.

388-531: Well-orchestrated build workflow.

The `build_from_source` method provides a comprehensive build pipeline with:

- Proper caching to avoid redundant builds
- Automatic dependency detection and installation
- Build system auto-detection
- Progressive user feedback via `cx_print`
- Appropriate timeouts for different build stages
- Comprehensive error handling with descriptive messages

The structure is clean and maintainable. A hypothetical usage sketch follows.
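A usage sketch of that pipeline, assembled only from names visible in this review (`SourceBuilder`, `build_from_source`, `use_cache`, `BuildResult.success` / `.install_commands` / `.error_message`); the exact signature and keyword names are inferred, so treat this as illustrative:

```python
from cortex.source_builder import SourceBuilder

builder = SourceBuilder()
result = builder.build_from_source(
    "nginx",
    version="1.24.0",
    source_url="https://nginx.org/download/nginx-1.24.0.tar.gz",
    use_cache=False,  # force a fresh build, as the tests do
)
if result.success:
    # Commands such as "sudo make install", to be run by the coordinator
    for cmd in result.install_commands:
        print(cmd)
else:
    print(f"Build failed: {result.error_message}")
```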
Commits addressing the CI failures:

- Replace filter='data' parameter (Python 3.12+) with manual member filtering for Python 3.10/3.11 compatibility. Still prevents path traversal attacks. Fixes test failures on Python 3.10 and 3.11 in CI.
- Update test_fetch_from_url_tarball to mock getmembers() and handle the members parameter in the extractall() call for Python 3.10+ compatibility. Fixes test failure: mock_extractall() got an unexpected keyword argument 'members'.
- Mock urlretrieve to create a dummy archive file; ensure mock_extractall creates the structure in the correct extract_dir; add a Path type assertion to the test; remove the unnecessary tempdir setup that doesn't match the implementation. Fixes a test failure where the mock wasn't creating the structure in the extract_dir that _fetch_from_url actually uses.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_cli.py (1)
207-253: Add integration tests for new source-build CLI flags to meet the >80% coverage requirement.

The PR introduces new CLI flags (`--from-source`, `--source-url`, `--pkg-version`) that are correctly implemented in the argument parser and install method signature, but the integration tests at lines 207-253 do not exercise them. The existing tests only verify the base case and the `--execute` and `--dry-run` flags.

Add the following test cases to cover the new functionality:
Suggested test cases
```python
@patch("sys.argv", ["cortex", "install", "docker", "--from-source"])
@patch("cortex.cli.CortexCLI.install")
def test_main_install_with_from_source(self, mock_install):
    mock_install.return_value = 0
    result = main()
    self.assertEqual(result, 0)
    mock_install.assert_called_once_with(
        "docker",
        execute=False,
        dry_run=False,
        parallel=False,
        from_source=True,
        source_url=None,
        version=None,
    )


@patch(
    "sys.argv",
    ["cortex", "install", "docker", "--from-source",
     "--source-url", "https://example.com/package.tar.gz"],
)
@patch("cortex.cli.CortexCLI.install")
def test_main_install_with_source_url(self, mock_install):
    mock_install.return_value = 0
    result = main()
    self.assertEqual(result, 0)
    mock_install.assert_called_once_with(
        "docker",
        execute=False,
        dry_run=False,
        parallel=False,
        from_source=True,
        source_url="https://example.com/package.tar.gz",
        version=None,
    )


@patch("sys.argv", ["cortex", "install", "docker", "--pkg-version", "1.0.0"])
@patch("cortex.cli.CortexCLI.install")
def test_main_install_with_pkg_version(self, mock_install):
    mock_install.return_value = 0
    result = main()
    self.assertEqual(result, 0)
    mock_install.assert_called_once_with(
        "docker",
        execute=False,
        dry_run=False,
        parallel=False,
        from_source=False,
        source_url=None,
        version="1.0.0",
    )
```

The coding guidelines require >80% test coverage for `tests/**/*.py` files. Verify coverage with `python -m pytest tests/test_cli.py --cov=cortex.cli --cov-report=term-missing` after adding these tests.
🧹 Nitpick comments (1)
cortex/source_builder.py (1)
356-395: Move `multiprocessing` import to module level.

The `multiprocessing` module is imported inline twice within the same method (lines 374 and 386). Per PEP 8, imports should be at the top of the file unless there's a specific reason for lazy loading.

🔎 Suggested refactor

At the top of the file (after line 27):

```diff
 import zipfile
+import multiprocessing
 from dataclasses import dataclass
```

Then remove the inline imports and use `multiprocessing.cpu_count()` directly:

```diff
 if config.build_system == "autotools" or config.build_system == "make":
     make_cmd = "make"
     if config.make_args:
         make_cmd += " " + " ".join(config.make_args)
     else:
         # Use parallel builds by default
-        import multiprocessing
-        jobs = multiprocessing.cpu_count()
         make_cmd += f" -j{jobs}"
```

Apply the same change at line 386.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- cortex/source_builder.py
- tests/test_cli.py
- tests/test_source_builder.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:
- tests/test_source_builder.py
- tests/test_cli.py
- cortex/source_builder.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Maintain >80% test coverage for pull requests
Files:
- tests/test_source_builder.py
- tests/test_cli.py
🧬 Code graph analysis (3)
tests/test_source_builder.py (1)
- cortex/source_builder.py (11): `BuildConfig` (54-65), `BuildResult` (69-78), `SourceBuilder` (81-564), `_get_cache_key` (98-101), `detect_build_dependencies` (127-160), `detect_build_system` (288-315), `configure_build` (317-354), `build` (356-395), `install_build` (397-419), `_fetch_from_url` (185-264), `build_from_source` (421-564)
tests/test_cli.py (1)
- cortex/cli.py (1): `main` (1680-2041)
cortex/source_builder.py (3)
- cortex/branding.py (1): `cx_print` (49-69)
- cortex/dependency_resolver.py (1): `is_package_installed` (101-104)
- cortex/utils/commands.py (1): `CommandResult` (118-125)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test (Python 3.12)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: test (3.10)
🔇 Additional comments (5)
tests/test_cli.py (3)
213-221: LGTM! Assertion correctly updated for extended signature.

The test assertion properly includes the three new keyword arguments (`from_source`, `source_url`, `version`) with their expected default values for a basic install command.

229-237: LGTM! Assertion correctly updated for extended signature.

The assertion properly verifies that the `--execute` flag is handled independently of the new source-build parameters, with all new arguments correctly set to their defaults.

245-253: LGTM! Assertion correctly updated for extended signature.

The assertion properly verifies that the `--dry-run` flag is handled independently of the new source-build parameters, with all new arguments correctly set to their defaults.

cortex/source_builder.py (1)

209-253: LGTM: Robust path traversal protection implemented.

The manual filtering for both tarfile and zipfile extraction properly prevents CVE-2007-4559 and similar zip slip attacks. The implementation correctly:

- Rejects absolute paths and `..` components
- Resolves and validates each member path stays within `extract_dir`
- Handles Python 3.10/3.11 compatibility (where `filter='data'` isn't available)

tests/test_source_builder.py (1)

184-228: LGTM: Thorough mocking for tarball extraction test.

The test properly mocks the complex tarfile extraction flow, including:

- URL retrieval simulation
- Member filtering (aligning with the security checks in the implementation)
- Creating an actual directory structure to match extraction behavior
- Handling Path vs. string arguments

This validates the security-enhanced extraction logic while maintaining test isolation.
Force-pushed from c267eb5 to a471a6a.
CLA Verification Passed: All contributors have signed the CLA.
Anshgrover23 left a comment:

@aybanda CI check is failing.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @cortex/cli.py:
- Around lines 1625-1701: Add audit logging to the _install_from_source flow, mirroring the install() method's history writes: open the same history DB used in install() (e.g., the History/HistoryDB helper used at lines ~670-705) and record an entry for this source-based install with package_name, version (if any), source_url (mark source='source' or include the URL), timestamp, a success boolean, and any error_message. After a successful install (install_result.success), write a success entry; on failures, write a failed entry (including install_result.error_message or result.error_message). Ensure the code path around _install_from_source / InstallationCoordinator / install_result uses the same history API and file (~/.cortex/history.db) as install().
In @cortex/source_builder.py:
- Around line 36-38: Remove the module-level directory creation (the
CACHE_DIR.mkdir call) to avoid side effects at import; keep the CACHE_DIR =
Path.home() / ".cortex" / "build_cache" constant but defer creating it to
runtime in SourceBuilder.__init__. In SourceBuilder.__init__ (the constructor
that already attempts to create the cache), wrap the mkdir logic in a
try/except, handle exceptions by falling back to a safe alternative (e.g., set
self.cache_dir to None or a temp directory) and emit a logged warning/error so
the rest of the class can operate without failing the import.
- Around line 184-206: In _fetch_from_url, validate the incoming url string
before calling urllib.request.urlretrieve: parse the URL (urllib.parse.urlparse)
and ensure the scheme is either http or https (reject file:, data:, ftp:, etc.),
then resolve the hostname to IPs and use the ipaddress module to reject
loopback, link‑local, private RFC1918 ranges, and known metadata addresses
(e.g., 169.254.169.254); if any check fails, raise a ValueError or custom
exception. Apply this validation at the start of _fetch_from_url (before
creating archive_path) and use the same url variable when modifying GitHub paths
so that only validated, safe URLs reach urllib.request.urlretrieve.
🧹 Nitpick comments (4)
tests/test_source_builder.py (1)
196-239: Consider simplifying the fetch test mocking.

The test uses complex mocking with a nested `mock_extractall` function and nonlocal variables. While functional, this could be more maintainable.

🔎 Alternative approach using a simpler mock structure

Consider using `side_effect` directly on the tar object methods:

```python
def test_fetch_from_url_tarball(self, mock_tarfile, mock_urlretrieve, mock_run_command):
    """Test fetching source from URL (tarball)."""

    def mock_urlretrieve_impl(url, filepath):
        Path(filepath).parent.mkdir(parents=True, exist_ok=True)
        Path(filepath).touch()

    mock_urlretrieve.side_effect = mock_urlretrieve_impl

    # Create a simple mock that simulates extraction
    mock_tar = MagicMock()
    mock_member = MagicMock()
    mock_member.name = "source-1.0"
    mock_tar.getmembers.return_value = [mock_member]

    # Mock extractall to create the directory structure
    def create_extracted_dir(path, members=None):
        extract_path = Path(path)
        (extract_path / "source-1.0").mkdir(parents=True, exist_ok=True)
        (extract_path / "source-1.0" / "README").touch()

    mock_tar.extractall.side_effect = create_extracted_dir
    mock_tarfile.return_value.__enter__.return_value = mock_tar

    result = self.builder._fetch_from_url("https://example.com/test.tar.gz", "test", "1.0")
    assert result.name == "source-1.0"
    assert result.is_dir()
```

cortex/source_builder.py (3)
94-97: Redundant cache directory creation.

The cache directory is already created at module level (line 38), making the `mkdir` call at line 97 redundant. However, if you address the module-level side-effect issue (see previous comment), this line would become necessary.

99-102: Consider longer cache keys to reduce collision risk.

The cache key is truncated to 16 hex characters (64 bits), which provides adequate collision resistance for typical use cases but may be insufficient for very large cache populations. Consider using at least 32 characters (128 bits) for better collision resistance in production environments.

🔎 Proposed fix

```diff
 def _get_cache_key(self, package_name: str, version: str | None, source_url: str) -> str:
     """Generate a cache key for a build."""
     key_data = f"{package_name}:{version}:{source_url}"
-    return hashlib.sha256(key_data.encode()).hexdigest()[:16]
+    return hashlib.sha256(key_data.encode()).hexdigest()[:32]
```

406-409: Import `multiprocessing` at module level for efficiency.

The `multiprocessing` module is imported inside the `build` method at two different locations (lines 406-409 and 420-423), which is inefficient. Module imports should generally be done at the top of the file for better performance and clarity.

🔎 Proposed fix

Add to the imports at the top of the file (after line 20):

```diff
 import shutil
 import subprocess
 import tarfile
 import tempfile
 import urllib.request
 import zipfile
+import multiprocessing
 from dataclasses import dataclass
```

Then remove the local imports and use `multiprocessing.cpu_count()` directly at lines 408 and 422.

Also applies to: 420-423
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- .github/workflows/codeql.yml
- cortex/cli.py
- cortex/source_builder.py
- cortex/utils/commands.py
- tests/test_cli_extended.py
- tests/test_source_builder.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:
- tests/test_source_builder.py
- cortex/utils/commands.py
- tests/test_cli_extended.py
- cortex/source_builder.py
- cortex/cli.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Maintain >80% test coverage for pull requests
Files:
- tests/test_source_builder.py
- tests/test_cli_extended.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*install*.py : Implement audit logging to ~/.cortex/history.db for all package operations
Applied to files:
cortex/cli.py
🧬 Code graph analysis (3)
tests/test_source_builder.py (1)
- cortex/source_builder.py (10): `BuildConfig` (55-66), `BuildResult` (70-79), `_get_cache_key` (99-102), `detect_build_dependencies` (128-161), `detect_build_system` (296-323), `configure_build` (325-382), `build` (384-431), `install_build` (433-455), `_fetch_from_url` (184-274), `build_from_source` (457-614)
tests/test_cli_extended.py (2)
- tests/test_cli.py (2): `test_main_install_with_execute` (225-237), `test_main_install_with_dry_run` (241-253)
- cortex/cli.py (1): `main` (1765-2151)
cortex/source_builder.py (7)
- cortex/branding.py (1): `cx_print` (49-69)
- cortex/dependency_resolver.py (2): `DependencyResolver` (40-386), `is_package_installed` (101-104)
- cortex/utils/commands.py (3): `CommandResult` (120-127), `run_command` (223-305), `validate_command` (136-198)
- cortex/sandbox/sandbox_executor.py (1): `success` (74-76)
- cortex/kernel_features/llm_device.py (1): `open` (158-159)
- tests/test_env_manager.py (1): `temp_dir` (44-47)
- cortex/env_manager.py (1): `load` (571-595)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test (3.10)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
🔇 Additional comments (23)
.github/workflows/codeql.yml (1)
28-31: LGTM!

The CodeQL workflow updates are appropriate. Using default queries and adding Autobuild aligns with GitHub's recommended CodeQL configuration and should help avoid conflicts with default setups.
tests/test_cli_extended.py (1)
306-314: LGTM!

The test assertions are correctly updated to reflect the new `install()` method signature with `from_source`, `source_url`, and `version` parameters. The default values (False, None, None) appropriately represent the behavior when these flags are not provided.

Also applies to: 322-330, 338-346
tests/test_source_builder.py (8)
26-33: LGTM!

The test initialization properly sets up the SourceBuilder instance and verifies that the dependency resolver and cache directory are correctly configured.

35-46: LGTM!

The cache key generation test thoroughly validates:
- Deterministic key generation (same inputs → same key)
- Differentiation (different inputs → different key)
- Key format (16 characters as specified)
48-82: LGTM!

The dependency detection tests provide excellent coverage across build systems (autotools, cmake, python) and include an important edge case test for already-satisfied dependencies. The mocking strategy properly isolates the package manager checks.

84-117: LGTM!

The build system detection tests cover all supported systems (autotools, cmake, make, python) plus the default fallback case. Using actual file creation in temporary directories provides better confidence than pure mocking.

119-152: LGTM!

The configure build tests validate:
- Command structure (bash prefix for autotools, cmake for cmake)
- Default arguments (--prefix for autotools, CMAKE_INSTALL_PREFIX for cmake)
- Working directory handling (important for cmake builds)
The tuple return format `(command, working_dir)` is properly tested.
241-286: LGTM!

The integration tests cover both failure and success paths for the complete build-from-source workflow:
- Dependency installation failure handling
- Successful end-to-end build with all steps coordinated
- Proper mocking of the entire pipeline
The success test properly disables caching (`use_cache=False`) to ensure deterministic behavior.

289-316: LGTM!

The BuildConfig dataclass tests validate both default values and custom configurations, ensuring the configuration object behaves as expected.

319-341: LGTM!

The BUILD_DEPENDENCIES constant tests ensure the dependency structure is correctly defined and contains expected packages for each build system category.
cortex/cli.py (5)
638-640: LGTM!

The new parameters for source-based installation are well-designed:

- `from_source`: Boolean flag with appropriate default (False)
- `source_url`: Optional URL for custom source locations
- `version`: Optional version specification

The type hints are correct and the parameter names are clear.
675-677: LGTM!

The early return for the `--from-source` flow is a clean way to separate the source-building logic from the normal installation path. This avoids complex conditional logic and keeps the main install method focused.
1625-1642: Good edge case handling for version parsing.

The version parsing correctly handles the edge case where a user provides `package@` (with no version after the `@`) by checking both `len(parts) > 1` and `parts[1]` before assigning. This addresses the concern from previous reviews; see the sketch below.
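A self-contained sketch of that guard; the helper name is an assumption for illustration:

```python
def split_package_spec(spec: str) -> tuple[str, str | None]:
    """Split 'pkg@ver' into (name, version), tolerating a trailing '@'."""
    parts = spec.split("@", 1)
    # Guard both conditions so "pkg@" (empty version) yields None.
    version = parts[1] if len(parts) > 1 and parts[1] else None
    return parts[0], version


assert split_package_spec("nginx@1.24.0") == ("nginx", "1.24.0")
assert split_package_spec("nginx@") == ("nginx", None)
assert split_package_spec("nginx") == ("nginx", None)
```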
1846-1860: LGTM!

The CLI argument definitions are well-structured:

- `--from-source`: Clear boolean flag for enabling source builds
- `--source-url`: Optional URL parameter for custom source locations
- `--pkg-version`: Renamed to avoid conflict with the global `--version` flag (addresses previous review feedback)

The help text is descriptive and guides users appropriately.
2111-2113: LGTM!

The parameter passing uses `getattr()` with appropriate defaults to safely handle the new CLI arguments and correctly maps `args.pkg_version` to the `version` parameter.

cortex/utils/commands.py (1)
114-115: The bash/sh allowlist addition is properly constrained and safe.

The security concern is well-mitigated by the implementation:

- All bash/sh invocations use `shlex.quote()` for paths and arguments (source_builder.py lines 353-372)
- Commands are executed via `shlex.split()` (commands.py line 272), which parses into safe argument arrays rather than shell interpretation
- The `validate_command()` function blocks dangerous patterns and shell metacharacters before execution
- Even allowed characters like `&&` and `||` won't execute as operators, because they're treated as literal string arguments by `shlex.split()`

No changes needed; the implementation properly defends against injection. The snippet after this list illustrates the last point.
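A quick standalone demonstration of that last point:

```python
import shlex

cmd = "bash configure --prefix=/usr/local && rm -rf /tmp/x"
print(shlex.split(cmd))
# ['bash', 'configure', '--prefix=/usr/local', '&&', 'rm', '-rf', '/tmp/x']
# Run as an argument list (no shell=True), '&&' reaches configure as a
# literal argument; it is never interpreted as a shell operator.
```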
cortex/source_builder.py (7)
54-80: LGTM! Well-structured dataclasses.

The `BuildConfig` and `BuildResult` dataclasses are well-designed with proper type hints (using Python 3.10+ syntax) and clear docstrings. The fields appropriately capture build configuration and results.
213-257: LGTM! Comprehensive path traversal protection.

The manual filtering for both tarfile and zipfile extraction properly prevents path traversal attacks (CVE-2007-4559 and similar). The implementation correctly:

- Rejects paths with `..` components or absolute paths
- Resolves paths and validates they remain within the extraction directory
- Handles exceptions gracefully

This addresses the security concerns from previous reviews; a filter sketch follows.
296-323: LGTM! Build system detection logic is sound.

The detection order (autotools → cmake → make → python) is appropriate, and the fallback to "autotools" is documented in the docstring. The file checks are straightforward and correct.
325-382: LGTM! Command generation now properly handles validation requirements.

The refactored `configure_build` method addresses previous command validation concerns by:

- Using `bash` (an allowed command prefix) to execute configure scripts instead of `./configure`
- Properly sanitizing arguments with `shlex.quote`
- Returning tuples of `(command, working_directory)` to avoid shell operators like `cd ... &&`
- Using CMake's `-B` style with relative paths and working directory context

This resolves the validation failures mentioned in past reviews.
457-614: LGTM! Well-orchestrated build workflow with comprehensive error handling.

The `build_from_source` method properly orchestrates the complete build process with:

- Cache checking to avoid redundant builds
- Source fetching with error handling
- Build system detection
- Dependency resolution and installation
- Phased execution (configure → build → install command generation)
- Proper timeout values (5 min for configure, 1 hr for build)
- Comprehensive error handling that returns `BuildResult` with detailed error messages
- User-friendly progress messages via `cx_print`

The dependency installation at lines 532-533 is safe because `missing_deps` comes from the controlled `BUILD_DEPENDENCIES` dictionary.
433-455: No action needed: install commands are executed through `InstallationCoordinator`, which properly uses `shell=True` to handle shell operators like `&&` and `cd`. The commands will execute correctly as written.

Likely an incorrect or invalid review comment.
206-206: Missing timeout on URL download can cause indefinite hangs.

`urllib.request.urlretrieve` does not support a timeout parameter and can hang indefinitely if the remote server is unresponsive or the connection stalls. This can cause the build process to freeze without any indication of progress.

🔎 Proposed fix using urllib with timeout

```diff
-urllib.request.urlretrieve(url, archive_path)
+# Use urlopen with timeout for better control
+with urllib.request.urlopen(url, timeout=300) as response:
+    with open(archive_path, "wb") as out_file:
+        shutil.copyfileobj(response, out_file)
```

Likely an incorrect or invalid review comment.
Actionable comments posted: 1
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- cortex/cli.py
- cortex/source_builder.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:
- cortex/source_builder.py
- cortex/cli.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*install*.py : Implement audit logging to ~/.cortex/history.db for all package operations
Applied to files:
cortex/cli.py
🧬 Code graph analysis (1)
cortex/source_builder.py (4)
- cortex/branding.py (1): `cx_print` (49-69)
- cortex/dependency_resolver.py (2): `DependencyResolver` (40-386), `is_package_installed` (101-104)
- cortex/utils/commands.py (1): `run_command` (223-305)
- cortex/kernel_features/llm_device.py (1): `open` (158-159)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build Package
- GitHub Check: test (3.11)
- GitHub Check: test (3.10)
- GitHub Check: test (3.12)
🔇 Additional comments (6)
cortex/cli.py (1)
1627-1772: Excellent audit logging implementation!

The `_install_from_source` method implements comprehensive audit logging as requested in past review comments. The implementation properly:

- Records installation start with package name, version, source URL, and commands
- Updates history on success/failure for all execution modes (dry-run, non-execute, execute)
- Wraps all history operations in try-except blocks with appropriate logging
- Provides rollback information on successful installations

This follows the same pattern as the main `install()` method and fully addresses the learnings requirement for audit logging to `~/.cortex/history.db`.

Based on learnings, the implementation correctly adds audit logging for source-based installations.
cortex/source_builder.py (5)
92-111: Excellent cache directory initialization with robust error handling!

The implementation properly addresses the past review comment about module-level side effects. The three-tier fallback strategy is well-designed:
- Try to create the preferred cache directory
- Fall back to temp directory if that fails
- Disable caching entirely if both fail
This ensures the module can be imported even in restrictive environments without failing.
201-270: Comprehensive SSRF protection implemented!

The URL validation thoroughly prevents Server-Side Request Forgery attacks by:
- Restricting schemes to http/https only
- Resolving hostnames to both IPv4 and IPv6 addresses
- Rejecting loopback (127.0.0.1, ::1)
- Rejecting link-local addresses
- Rejecting private RFC1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
- Rejecting cloud metadata service addresses (169.254.169.254)
This fully addresses the past review comment about validating source URLs to prevent SSRF attacks. A standalone sketch of these checks follows.
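A standard-library sketch of the validation described above; the function name is illustrative and the module's actual implementation may differ in structure:

```python
import ipaddress
import socket
from urllib.parse import urlparse


def validate_source_url(url: str) -> None:
    """Reject non-HTTP(S) schemes and hosts resolving to restricted IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")
    # getaddrinfo returns both IPv4 and IPv6 results; vet every address.
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(parsed.hostname, None):
        ip = ipaddress.ip_address(sockaddr[0])
        # is_link_local also covers the 169.254.169.254 metadata address;
        # is_private covers the RFC1918 ranges.
        if ip.is_loopback or ip.is_link_local or ip.is_private:
            raise ValueError(f"Refusing to fetch from restricted address {ip}")
```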
310-354: Excellent path traversal protection for archive extraction!

The implementation properly prevents CVE-2007-4559 (tarfile) and zip slip attacks by:
- Manually filtering all archive members before extraction
- Rejecting paths with ".." components or absolute paths
- Resolving each member path and verifying it stays within the extract directory
- Applying consistent security checks to both tar.gz and zip formats
The manual filtering approach ensures compatibility across Python versions while maintaining security. The comment at line 314 properly explains why the `filter='data'` parameter isn't used (Python 3.12+ only).
422-479: Smart command generation that works with validation constraints!

The implementation cleverly addresses command validation constraints by:

- Using `bash configure` instead of `./configure` to satisfy ALLOWED_COMMAND_PREFIXES (line 450)
- Returning tuples with working directories instead of using `cd` commands (line 473)
- Properly quoting all arguments with `shlex.quote` to prevent injection

This design successfully generates validated commands while maintaining full functionality. Great work addressing the past review comments about command validation failures!
1-711: Well-architected source building implementation with strong security.

The overall design of this module is excellent:
Strengths:
- Clean separation of concerns (fetch, configure, build, install)
- Comprehensive security measures (SSRF protection, path traversal prevention)
- Robust error handling with graceful degradation (cache directory fallbacks)
- Smart workarounds for command validation constraints
- Proper use of type hints and dataclasses
- Good documentation throughout
Architecture highlights:
- Build caching reduces redundant work
- Automatic build system detection supports multiple ecosystems
- Dependency detection and installation automation
- Extensible design allows adding new build systems easily
The module successfully addresses all past review comments regarding security (SSRF, path traversal, cache initialization) and integrates well with the existing Cortex infrastructure.
```python
def install_build(self, source_dir: Path, config: BuildConfig) -> list[str]:
    """Generate install commands for built package.

    Args:
        source_dir: Path to source code directory.
        config: Build configuration with options and settings.

    Returns:
        List of install commands to execute (requires sudo).
    """
    commands = []

    if config.build_system == "autotools" or config.build_system == "make":
        commands.append("sudo make install")

    elif config.build_system == "cmake":
        build_dir = source_dir / "build"
        commands.append(f"cd {build_dir} && sudo make install")

    elif config.build_system == "python":
        commands.append("sudo python3 setup.py install")

    return commands
```
Inconsistent command generation pattern may cause validation failures.

The `install_build` method uses shell operators at line 547 that differ from the pattern established in `configure_build` and `build`:

```python
commands.append(f"cd {build_dir} && sudo make install")  # Line 547
```

Issues:

- Uses the `cd` command, which may not be in ALLOWED_COMMAND_PREFIXES (per past review comments)
- Uses the `&&` shell operator, requiring shell execution
- `build_dir` is not shell-escaped, creating a potential injection risk if the path contains special characters
- Return type is `list[str]` instead of `list[tuple[str, Path | None]]` like `configure_build`, preventing working directory specification

This inconsistency means cmake builds may fail validation when these commands are executed via `run_command` in cli.py.
♻️ Proposed fix to match configure_build pattern
```diff
-def install_build(self, source_dir: Path, config: BuildConfig) -> list[str]:
+def install_build(self, source_dir: Path, config: BuildConfig) -> list[tuple[str, Path | None]]:
     """Generate install commands for built package.

     Args:
         source_dir: Path to source code directory.
         config: Build configuration with options and settings.

     Returns:
-        List of install commands to execute (requires sudo).
+        List of tuples (command, working_directory) to execute (requires sudo).
     """
     commands = []

     if config.build_system == "autotools" or config.build_system == "make":
-        commands.append("sudo make install")
+        commands.append(("sudo make install", source_dir))

     elif config.build_system == "cmake":
         build_dir = source_dir / "build"
-        commands.append(f"cd {build_dir} && sudo make install")
+        commands.append(("sudo make install", build_dir))

     elif config.build_system == "python":
-        commands.append("sudo python3 setup.py install")
+        commands.append(("sudo python3 setup.py install", source_dir))

     return commands
```

Then update `build_from_source` around line 688 to handle tuples, and update cli.py to extract commands and working directories from the tuples when passing to `InstallationCoordinator`.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @cortex/source_builder.py:
- Around line 277-466: The _fetch_from_url function currently leaves temporary
directories around on success; update build_from_source to remove the temporary
source dir after a successful build (use a finally block or post-build cleanup)
by detecting the temp pattern created by _fetch_from_url (e.g., path containing
"cortex-build-") and calling shutil.rmtree(..., ignore_errors=True) safely
inside build_from_source (ensure source_dir is not None and exists), or
alternatively add a cleanup method on BuildResult (e.g., BuildResult.cleanup)
that removes the temp build/source directory and call it at the end of
build_from_source to ensure no orphaned temp directories remain.
- Around lines 132-149: Wrap the filesystem operations in _save_to_cache
(creating cache_path, writing metadata.json, touching installed) in a try/except
so IO errors don’t bubble out and break build_from_source; catch broad
OSError/IOError/Exception around the mkdir/json.dump/touch sequence, and on
exception log a warning including the cache_key and error details (use the class
logger or Python logging if available) and then return silently so the
successful build is not treated as failed due to cache save problems. Ensure you
reference cache_path, metadata.json and installed in the log so it’s clear which
cache artifact failed.
🧹 Nitpick comments (4)
cortex/source_builder.py (4)
174-177: Consider refining the Python package name heuristic.

The check `if "python" in package_name.lower()` is a simple substring match that could produce false positives (e.g., "mypython-tool" or "python-like"). While extra dependencies won't break builds, a more precise check would be cleaner.

♻️ Suggested refinement

```diff
 # Package-specific dependencies (common patterns)
-if "python" in package_name.lower():
+# Match common Python package patterns: python, python3, cpython, pypy
+pkg_lower = package_name.lower()
+if pkg_lower in ("python", "python3", "cpython", "pypy") or \
+        pkg_lower.startswith(("python-", "python3-", "cpython-")):
     required_deps.update(BUILD_DEPENDENCIES["python"])
```
468-486: Consider making source URL mappings configurable.

The `_detect_source_location` method hardcodes URLs for only 3 packages. While this provides a reasonable starting point, making this configurable (e.g., loading from a JSON file or package database) would improve extensibility.

💡 Suggested approach

Consider moving `common_urls` to a class attribute or an external configuration file that can be extended without modifying code:

```python
# Could be loaded from ~/.cortex/source_urls.json
DEFAULT_SOURCE_URLS = {
    "python": "https://www.python.org/ftp/python/{version}/Python-{version}.tgz",
    "nginx": "https://nginx.org/download/nginx-{version}.tar.gz",
    "redis": "https://download.redis.io/releases/redis-{version}.tar.gz",
}
```
576-623: Consider returning a consistent type for better API clarity.

The method returns `list[tuple[str, Path | None] | str]`, requiring callers to handle both tuples and strings with `isinstance` checks. While functional, this is less clean than the consistent tuple return from `configure_build`.

♻️ Proposed fix for type consistency

```diff
-def build(self, source_dir: Path, config: BuildConfig) -> list[tuple[str, Path | None] | str]:
+def build(self, source_dir: Path, config: BuildConfig) -> list[tuple[str, Path | None]]:
     """Build the package.

     Args:
         source_dir: Path to source code directory.
         config: Build configuration with options and settings.

     Returns:
-        List of build commands to execute. Can be:
-        - Tuples of (command, working_directory) for commands needing specific cwd
-        - Strings for commands that use source_dir as cwd
+        List of tuples (command, working_directory) where working_directory is
+        None to use source_dir as the default working directory.
     """
     commands = []

     if config.build_system == "autotools" or config.build_system == "make":
         make_cmd = "make"
         if config.make_args:
             # Sanitize make args to prevent injection
             safe_args = [shlex.quote(arg) for arg in config.make_args]
             make_cmd += " " + " ".join(safe_args)
         else:
             # Use parallel builds by default
             import multiprocessing
             jobs = multiprocessing.cpu_count()
             make_cmd += f" -j{jobs}"
-        commands.append(make_cmd)
+        commands.append((make_cmd, None))  # None means use source_dir

     elif config.build_system == "cmake":
         build_dir = source_dir / "build"
         make_cmd = "make"
         if config.make_args:
             # Sanitize make args
             safe_args = [shlex.quote(arg) for arg in config.make_args]
             make_cmd += " " + " ".join(safe_args)
         else:
             import multiprocessing
             jobs = multiprocessing.cpu_count()
             make_cmd += f" -j{jobs}"
-        # Use build_dir as working directory instead of cd command
-        # Return as tuple for consistency with configure_build
         commands.append((make_cmd, build_dir))

     elif config.build_system == "python":
-        commands.append("python3 setup.py build")
+        commands.append(("python3 setup.py build", None))

     return commands
```

Then simplify the caller in `build_from_source`:

```diff
 build_commands = self.build(source_dir, config)
 for cmd_tuple in build_commands:
-    if isinstance(cmd_tuple, tuple):
-        cmd, work_dir = cmd_tuple
-        cwd = str(work_dir) if work_dir else str(source_dir)
-    else:
-        # Backward compatibility: if it's just a string, use source_dir
-        cmd = cmd_tuple
-        cwd = str(source_dir)
+    cmd, work_dir = cmd_tuple
+    cwd = str(work_dir) if work_dir else str(source_dir)
     result = run_command(cmd, cwd=cwd, timeout=3600)
```
719-736: Consider using shlex.quote for the dependency installation command.

While `missing_deps` contains only hardcoded package names from `BUILD_DEPENDENCIES` (not user input), using `shlex.quote` would follow defensive coding best practices.

♻️ Suggested improvement

```diff
 if missing_deps:
     cx_print(f"  Installing: {', '.join(missing_deps)}", "info")
-    install_cmd = f"sudo apt-get install -y {' '.join(missing_deps)}"
+    safe_deps = [shlex.quote(dep) for dep in missing_deps]
+    install_cmd = f"sudo apt-get install -y {' '.join(safe_deps)}"
     result = run_command(install_cmd, timeout=600)
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .github/workflows/spam-protection.yml
- cortex/source_builder.py
✅ Files skipped from review due to trivial changes (1)
- .github/workflows/spam-protection.yml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:
cortex/source_builder.py
🧬 Code graph analysis (1)
cortex/source_builder.py (3)
- cortex/branding.py (1): `cx_print` (49-69)
- cortex/dependency_resolver.py (2): `DependencyResolver` (40-386), `is_package_installed` (101-104)
- cortex/utils/commands.py (1): `run_command` (223-305)
🪛 GitHub Actions: CI
cortex/source_builder.py
[error] 342-342: TypeError in _fetch_from_url: '>' not supported between instances of 'MagicMock' and 'int' while comparing member.size to MAX_FILE_SIZE during tarball fetch.
[error] 466-466: RuntimeError: Failed to fetch source: '>' not supported between instances of 'MagicMock' and 'int' (raised while handling fetch failure).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
- GitHub Check: test (3.10)
🔇 Additional comments (11)
cortex/source_builder.py (11)
1-55: LGTM! Module structure and constants are well-organized.

The module header, imports, and constants follow PEP 8 guidelines. The security constants (MAX_EXTRACT_FILES, MAX_EXTRACT_SIZE, etc.) provide good protection against zip bomb attacks, and BUILD_DEPENDENCIES provides sensible defaults for common build systems.
58-84: LGTM! Dataclasses are well-structured.

Both `BuildConfig` and `BuildResult` dataclasses have appropriate type hints and clear field names. The use of the modern Python `| None` syntax for optional fields is consistent with the project's Python 3.10+ requirement.
98-116: LGTM! Excellent error handling for cache directory creation.

The constructor properly defers cache directory creation to runtime with comprehensive error handling. The multi-level fallback (main cache → temp directory → disable caching) ensures the module remains usable even in restrictive environments.
186-205: LGTM! Clean delegation with comprehensive documentation.

The `fetch_source` method provides a clear public API with proper type hints and a complete docstring.
207-276: LGTM! Comprehensive SSRF protection.

The URL validation method provides robust protection against SSRF attacks by:
- Restricting to http/https schemes
- Resolving hostnames to IPs
- Rejecting loopback, link-local, private, and metadata service addresses
- Handling both IPv4 and IPv6
316-385: LGTM! Comprehensive security protections for tarfile extraction.

The tarfile extraction implements multiple layers of security:
- Path traversal prevention (rejecting ".." and absolute paths)
- Resource limits (file count, file size, total size)
- Compression ratio checks (zip bomb protection)
- Path resolution validation
These protections address CVE-2007-4559 and SonarCloud S5042. A sketch of the resource-limit portion follows.
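The `MAX_*` names below mirror constants this review says the module defines, but the values, the helper, and the choice to show the zip case (where per-member compressed sizes make the ratio check possible) are all assumptions:

```python
import zipfile

MAX_EXTRACT_FILES = 10_000             # assumed value
MAX_FILE_SIZE = 100 * 1024 * 1024      # assumed per-file cap (100 MB)
MAX_EXTRACT_SIZE = 1024 * 1024 * 1024  # assumed total cap (1 GB)
MAX_COMPRESSION_RATIO = 100            # assumed zip-bomb threshold


def check_zip_limits(zf: zipfile.ZipFile) -> None:
    """Reject archives exceeding count, size, or compression-ratio limits."""
    infos = zf.infolist()
    if len(infos) > MAX_EXTRACT_FILES:
        raise ValueError("Archive contains too many files")
    total = 0
    for info in infos:
        if info.file_size > MAX_FILE_SIZE:
            raise ValueError(f"Member too large: {info.filename}")
        if info.compress_size and info.file_size / info.compress_size > MAX_COMPRESSION_RATIO:
            raise ValueError(f"Suspicious compression ratio: {info.filename}")
        total += info.file_size
        if total > MAX_EXTRACT_SIZE:
            raise ValueError("Archive expands beyond the total size limit")
```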
386-449: LGTM! Zipfile extraction has equivalent security protections.

The zipfile extraction mirrors the tarfile security measures with appropriate checks for the zip format, including per-file compression ratio validation.
342-342: Pipeline failure is a test setup issue, not a production code issue.

The TypeError `'>' not supported between instances of 'MagicMock' and 'int'` occurs because the test mocks don't properly configure `member.size` as an integer. The production code is correct; the test setup needs to be fixed.

In `tests/test_source_builder.py`, ensure that when mocking tarfile members, the `size` attribute is set to an integer value:

```python
# Example fix in the test
mock_member = MagicMock()
mock_member.size = 1024  # Set as int, not left as a MagicMock
mock_member.name = "file.txt"
```
488-515: LGTM! Build system detection logic is sound. The detection order (autotools → cmake → make → python) is appropriate, with a sensible default to autotools for unrecognized projects.
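That ordering maps naturally onto marker files; a plausible reading of the logic (a sketch, not the method's actual body):

```python
from pathlib import Path

def detect_build_system(source_dir: Path) -> str:
    """Pick a build system from marker files, in priority order."""
    if (source_dir / "configure").exists() or (source_dir / "configure.ac").exists():
        return "autotools"
    if (source_dir / "CMakeLists.txt").exists():
        return "cmake"
    if (source_dir / "Makefile").exists():
        return "make"
    if (source_dir / "setup.py").exists() or (source_dir / "pyproject.toml").exists():
        return "python"
    return "autotools"  # sensible default for unrecognized projects
```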
517-574: LGTM! Configure command generation is secure and consistent. The method properly:

- Uses `bash configure` instead of `./configure` to comply with ALLOWED_COMMAND_PREFIXES
- Quotes all arguments with `shlex.quote` to prevent injection
- Returns tuples `(command, working_directory)` instead of using shell operators like `cd && ...`
- Handles different build systems appropriately (see the sketch after this list)
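Roughly, the tuple-returning shape being praised (simplified; the real method covers more build systems and options):

```python
import shlex
from pathlib import Path

def configure_command(
    build_system: str, source_dir: Path, prefix: str = "/usr/local"
) -> tuple[str, str]:
    """Return (command, working_directory) rather than a 'cd && ...' shell chain."""
    quoted = shlex.quote(prefix)
    if build_system == "autotools":
        # 'bash configure' keeps the command within the allowed prefixes
        return (f"bash configure --prefix={quoted}", str(source_dir))
    if build_system == "cmake":
        return (f"cmake -DCMAKE_INSTALL_PREFIX={quoted} .", str(source_dir))
    return ("true", str(source_dir))  # plain make / python: nothing to configure
```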
649-806: Note: This method is affected by issues in install_build. The `build_from_source` method generates install commands via `install_build` (line 783). Since `install_build` currently returns commands with shell operators that will fail validation, cmake builds will fail at installation time. The fix for `install_build` (see previous comment) will require corresponding updates here to handle the new tuple return type.
```python
def _save_to_cache(self, cache_key: str, build_dir: Path, install_commands: list[str]) -> None:
    """Save build artifacts to cache."""
    if self.cache_dir is None:
        return  # Caching disabled
    cache_path = self.cache_dir / cache_key
    cache_path.mkdir(parents=True, exist_ok=True)

    # Save metadata
    metadata = {
        "build_dir": str(build_dir),
        "install_commands": install_commands,
        "timestamp": str(Path(build_dir).stat().st_mtime),
    }
    with open(cache_path / "metadata.json", "w") as f:
        json.dump(metadata, f, indent=2)

    # Mark as installed
    (cache_path / "installed").touch()
```
Add error handling for cache save operations.
The _save_to_cache method doesn't catch exceptions from file operations (mkdir, json.dump, touch). If these fail due to filesystem issues, the entire build_from_source operation will fail even though the actual build succeeded.
♻️ Proposed fix
def _save_to_cache(self, cache_key: str, build_dir: Path, install_commands: list[str]) -> None:
"""Save build artifacts to cache."""
if self.cache_dir is None:
return # Caching disabled
- cache_path = self.cache_dir / cache_key
- cache_path.mkdir(parents=True, exist_ok=True)
-
- # Save metadata
- metadata = {
- "build_dir": str(build_dir),
- "install_commands": install_commands,
- "timestamp": str(Path(build_dir).stat().st_mtime),
- }
- with open(cache_path / "metadata.json", "w") as f:
- json.dump(metadata, f, indent=2)
-
- # Mark as installed
- (cache_path / "installed").touch()
+ try:
+ cache_path = self.cache_dir / cache_key
+ cache_path.mkdir(parents=True, exist_ok=True)
+
+ # Save metadata
+ metadata = {
+ "build_dir": str(build_dir),
+ "install_commands": install_commands,
+ "timestamp": str(Path(build_dir).stat().st_mtime),
+ }
+ with open(cache_path / "metadata.json", "w") as f:
+ json.dump(metadata, f, indent=2)
+
+ # Mark as installed
+ (cache_path / "installed").touch()
+ except (OSError, IOError, PermissionError) as e:
+ logger.warning(f"Failed to save build to cache: {e}")
+ # Continue - caching failure shouldn't fail the build🤖 Prompt for AI Agents
In @cortex/source_builder.py around lines 132 - 149, wrap the filesystem
operations in _save_to_cache (creating cache_path, writing metadata.json,
touching installed) in a try/except so IO errors don’t bubble out and break
build_from_source; catch broad OSError/IOError/Exception around the
mkdir/json.dump/touch sequence, and on exception log a warning including the
cache_key and error details (use the class logger or Python logging if
available) and then return silently so the successful build is not treated as
failed due to cache save problems. Ensure you reference cache_path,
metadata.json and installed in the log so it’s clear which cache artifact
failed.
```python
def _fetch_from_url(self, url: str, package_name: str, version: str | None) -> Path:
    """Fetch source from a URL.

    Note: The temporary directory is intentionally not cleaned up immediately
    as the returned source directory may be used for building. Cleanup should
    be handled by the caller or system temp file cleanup.

    Raises:
        ValueError: If URL validation fails (SSRF protection).
        RuntimeError: If download or extraction fails.
    """
    # Validate URL before any processing (SSRF protection)
    self._validate_url(url)

    temp_dir = Path(tempfile.mkdtemp(prefix=f"cortex-build-{package_name}-"))

    try:
        # Download
        cx_print(f"📥 Downloading {package_name} source...", "info")
        archive_path = temp_dir / "source.tar.gz"

        # Modify GitHub URLs if needed (only after validation)
        if url.startswith("https://github.com/"):
            # GitHub release or archive
            if not url.endswith((".tar.gz", ".zip")):
                if version:
                    url = f"{url}/archive/refs/tags/v{version}.tar.gz"
                else:
                    url = f"{url}/archive/refs/heads/main.tar.gz"
                # Re-validate modified URL
                self._validate_url(url)

        urllib.request.urlretrieve(url, archive_path)

        # Extract
        cx_print("📦 Extracting source...", "info")
        extract_dir = temp_dir / "extracted"
        extract_dir.mkdir()

        if archive_path.suffix == ".gz" or archive_path.suffixes[-2:] == [".tar", ".gz"]:
            with tarfile.open(archive_path, "r:gz") as tar:
                # Prevent path traversal attacks (CVE-2007-4559)
                # Filter members manually for Python < 3.12 compatibility
                # Note: filter='data' parameter is only available in Python 3.12+
                # This manual filtering provides equivalent security
                # Also enforce resource consumption limits to prevent zip bombs (SonarCloud S5042)
                safe_members = []
                extract_dir_resolved = extract_dir.resolve()
                total_size = 0
                file_count = 0
                archive_size = archive_path.stat().st_size

                for member in tar.getmembers():
                    # Skip files with path traversal components
                    if ".." in member.name or member.name.startswith("/"):
                        continue

                    # Enforce resource consumption limits
                    if file_count >= MAX_EXTRACT_FILES:
                        raise RuntimeError(
                            f"Archive contains too many files (>{MAX_EXTRACT_FILES}). "
                            "Extraction aborted to prevent resource exhaustion."
                        )

                    # Check individual file size
                    if member.size > MAX_FILE_SIZE:
                        raise RuntimeError(
                            f"File '{member.name}' is too large ({member.size} bytes > "
                            f"{MAX_FILE_SIZE} bytes). Extraction aborted."
                        )

                    # Check compression ratio (zip bomb protection)
                    # For tarfiles, we compare uncompressed size to archive size
                    # A small compressed file that expands to huge size is suspicious
                    if archive_size > 0 and member.size > 0:
                        # Estimate compressed size per file (rough approximation)
                        estimated_compressed = archive_size / max(len(tar.getmembers()), 1)
                        if estimated_compressed > 0:
                            compression_ratio = member.size / estimated_compressed
                            if compression_ratio > MAX_COMPRESSION_RATIO:
                                raise RuntimeError(
                                    f"Suspicious compression ratio detected for '{member.name}'. "
                                    "Extraction aborted to prevent zip bomb attack."
                                )

                    # Check total size limit
                    total_size += member.size
                    if total_size > MAX_EXTRACT_SIZE:
                        raise RuntimeError(
                            f"Total uncompressed size exceeds limit ({total_size} bytes > "
                            f"{MAX_EXTRACT_SIZE} bytes). Extraction aborted."
                        )

                    # Normalize and resolve path to prevent traversal
                    # This ensures the extracted path stays within extract_dir
                    try:
                        member_path = (extract_dir / member.name).resolve()
                        # Verify resolved path is within extract_dir
                        if not str(member_path).startswith(str(extract_dir_resolved)):
                            continue
                        safe_members.append(member)
                        file_count += 1
                    except (ValueError, OSError):
                        # Skip invalid paths
                        continue

                # Only extract pre-filtered safe members
                if safe_members:
                    tar.extractall(extract_dir, members=safe_members)
        elif archive_path.suffix == ".zip":
            with zipfile.ZipFile(archive_path, "r") as zip_ref:
                # Filter out path traversal components for security
                # Prevents zip slip attacks (similar to CVE-2007-4559)
                # Also enforce resource consumption limits to prevent zip bombs (SonarCloud S5042)
                extract_dir_resolved = extract_dir.resolve()
                total_size = 0
                file_count = 0
                archive_size = archive_path.stat().st_size

                for member in zip_ref.namelist():
                    # Skip files with path traversal or absolute paths
                    if ".." in member or member.startswith("/"):
                        continue

                    # Get file info for size checking
                    try:
                        member_info = zip_ref.getinfo(member)
                    except KeyError:
                        continue

                    # Enforce resource consumption limits
                    if file_count >= MAX_EXTRACT_FILES:
                        raise RuntimeError(
                            f"Archive contains too many files (>{MAX_EXTRACT_FILES}). "
                            "Extraction aborted to prevent resource exhaustion."
                        )

                    # Check individual file size
                    if member_info.file_size > MAX_FILE_SIZE:
                        raise RuntimeError(
                            f"File '{member}' is too large ({member_info.file_size} bytes > "
                            f"{MAX_FILE_SIZE} bytes). Extraction aborted."
                        )

                    # Check compression ratio (zip bomb protection)
                    # Compare uncompressed size to compressed size
                    if member_info.compress_size > 0 and member_info.file_size > 0:
                        compression_ratio = member_info.file_size / member_info.compress_size
                        if compression_ratio > MAX_COMPRESSION_RATIO:
                            raise RuntimeError(
                                f"Suspicious compression ratio detected for '{member}' "
                                f"({compression_ratio:.1f}:1). Extraction aborted to prevent zip bomb attack."
                            )

                    # Check total size limit
                    total_size += member_info.file_size
                    if total_size > MAX_EXTRACT_SIZE:
                        raise RuntimeError(
                            f"Total uncompressed size exceeds limit ({total_size} bytes > "
                            f"{MAX_EXTRACT_SIZE} bytes). Extraction aborted."
                        )

                    # Normalize and resolve path to prevent traversal
                    try:
                        member_path = (extract_dir / member).resolve()
                        # Verify resolved path is within extract_dir
                        if not str(member_path).startswith(str(extract_dir_resolved)):
                            continue
                        zip_ref.extract(member, extract_dir)
                        file_count += 1
                    except (ValueError, OSError, zipfile.BadZipFile):
                        # Skip invalid paths or corrupted zip entries
                        continue

        # Find the actual source directory (usually one level deep)
        extracted_items = list(extract_dir.iterdir())
        if len(extracted_items) == 1 and extracted_items[0].is_dir():
            # Return the source directory (temp_dir will be cleaned up by system)
            return extracted_items[0]
        else:
            return extract_dir

    except Exception as e:
        # Clean up temp directory on error
        try:
            shutil.rmtree(temp_dir, ignore_errors=True)
        except Exception:
            pass  # Best effort cleanup
        logger.exception(f"Failed to fetch source from {url}")
        raise RuntimeError(f"Failed to fetch source: {e}")
```
Resource leak: temporary directories are not cleaned up after successful builds.
The method creates temporary directories (line 291) but only cleans them up on error (lines 461-464). The comment at lines 280-282 states this is intentional, but build_from_source doesn't clean up the returned source_dir either, leading to accumulation of temporary directories over time.
🔧 Proposed solution
Option 1: Clean up in build_from_source (recommended)

In build_from_source, add cleanup after the build completes:

```python
# In build_from_source method, around line 795
finally:
    # Clean up temporary source directory
    if source_dir and source_dir.exists() and "cortex-build-" in str(source_dir):
        try:
            shutil.rmtree(source_dir, ignore_errors=True)
        except Exception as e:
            logger.debug(f"Failed to clean up temp directory {source_dir}: {e}")
```

Option 2: Return cleanup responsibility to caller

Add a cleanup method to BuildResult and document that callers should clean up:

```python
@dataclass
class BuildResult:
    # ... existing fields ...

    def cleanup(self) -> None:
        """Clean up temporary build directory."""
        if self.build_dir and Path(self.build_dir).exists():
            try:
                shutil.rmtree(self.build_dir, ignore_errors=True)
            except Exception:
                pass
```

🧰 Tools
🪛 GitHub Actions: CI
[error] 342-342: TypeError in _fetch_from_url: '>' not supported between instances of 'MagicMock' and 'int' while comparing member.size to MAX_FILE_SIZE during tarball fetch.
[error] 466-466: RuntimeError: Failed to fetch source: '>' not supported between instances of 'MagicMock' and 'int' (raised while handling fetch failure).
🤖 Prompt for AI Agents
In @cortex/source_builder.py around lines 277 - 466, the _fetch_from_url
function currently leaves temporary directories around on success; update
build_from_source to remove the temporary source dir after a successful build
(use a finally block or post-build cleanup) by detecting the temp pattern
created by _fetch_from_url (e.g., path containing "cortex-build-") and calling
shutil.rmtree(..., ignore_errors=True) safely inside build_from_source (ensure
source_dir is not None and exists), or alternatively add a cleanup method on
BuildResult (e.g., BuildResult.cleanup) that removes the temp build/source
directory and call it at the end of build_from_source to ensure no orphaned temp
directories remain.
Force-pushed from 6365908 to 8b23ce8.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/spam-protection.yml:
- Around line 65-77: You introduced an unrelated refactor to the spam-protection
workflow by changing how commentBody is built before calling
github.rest.issues.createComment (variables: commentBody, flags, pr.number);
either remove this change from the current PR so the feature PR stays focused,
or extract the workflow change into its own separate PR; if you decide to keep
it here, simplify the string assembly by using a template literal that
interpolates flags and preserves the checklist and context, and ensure the call
to github.rest.issues.createComment still uses issue_number: pr.number and the
same body variable.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/spam-protection.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build Package
- GitHub Check: test (3.11)
- GitHub Check: test (3.10)
- GitHub Check: test (3.12)
```diff
+          const commentBody = '🔍 **Auto-Review Notice**\n\n' +
+            'This PR was flagged for: ' + flags.join(', ') + '\n\n' +
+            'Please ensure:\n' +
+            '- [ ] PR description explains the changes\n' +
+            '- [ ] CLA is signed\n' +
+            '- [ ] Changes are tested\n\n' +
+            'A maintainer will review shortly.';
           await github.rest.issues.createComment({
             owner: context.repo.owner,
             repo: context.repo.repo,
             issue_number: pr.number,
-            body: '🔍 **Auto-Review Notice**
-
-            This PR was flagged for: ' + flags.join(', ') + '
-
-            Please ensure:
-            - [ ] PR description explains the changes
-            - [ ] CLA is signed
-            - [ ] Changes are tested
-
-            A maintainer will review shortly.'
+            body: commentBody
```
Unrelated change in a feature PR.
This refactoring of the spam protection workflow appears unrelated to the PR's stated objectives (adding package build from source functionality). While the code change itself is harmless and slightly improves readability, including unrelated changes makes PRs harder to review and understand.
Consider keeping PRs focused on a single concern. This workflow refactor could either be:
- Removed from this PR, or
- Submitted separately if the refactoring is desired
Optional: Consider template literals for cleaner string construction
If you do keep this change, you could use ES6 template literals for better readability:
- const commentBody = '🔍 **Auto-Review Notice**\n\n' +
- 'This PR was flagged for: ' + flags.join(', ') + '\n\n' +
- 'Please ensure:\n' +
- '- [ ] PR description explains the changes\n' +
- '- [ ] CLA is signed\n' +
- '- [ ] Changes are tested\n\n' +
- 'A maintainer will review shortly.';
+ const commentBody = `🔍 **Auto-Review Notice**
+
+This PR was flagged for: ${flags.join(', ')}
+
+Please ensure:
+- [ ] PR description explains the changes
+- [ ] CLA is signed
+- [ ] Changes are tested
+
+A maintainer will review shortly.`;

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```javascript
const commentBody = `🔍 **Auto-Review Notice**

This PR was flagged for: ${flags.join(', ')}

Please ensure:
- [ ] PR description explains the changes
- [ ] CLA is signed
- [ ] Changes are tested

A maintainer will review shortly.`;
await github.rest.issues.createComment({
  owner: context.repo.owner,
  repo: context.repo.repo,
  issue_number: pr.number,
  body: commentBody
```
🤖 Prompt for AI Agents
In @.github/workflows/spam-protection.yml around lines 65 - 77, you introduced
an unrelated refactor to the spam-protection workflow by changing how
commentBody is built before calling github.rest.issues.createComment (variables:
commentBody, flags, pr.number); either remove this change from the current PR so
the feature PR stays focused, or extract the workflow change into its own
separate PR; if you decide to keep it here, simplify the string assembly by
using a template literal that interpolates flags and preserves the checklist and
context, and ensure the call to github.rest.issues.createComment still uses
issue_number: pr.number and the same body variable.
Anshgrover23
left a comment
@aybanda Kindly resolve conflicts.
Force-pushed from 7c57b9d to cdd5e0f.
hey @Anshgrover23 can you check this
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cortex/cli.py (1)
638-647: Document the new parameters in the method docstring. The `install()` method signature now includes three new parameters (`from_source`, `source_url`, `version`), but the method lacks a docstring documenting these parameters and their usage.

As per coding guidelines, docstrings are required for all public APIs.
📝 Suggested docstring addition
```diff
 def install(
     self,
     software: str,
     execute: bool = False,
     dry_run: bool = False,
     parallel: bool = False,
     from_source: bool = False,
     source_url: str | None = None,
     version: str | None = None,
 ):
+    """Install software packages.
+
+    Args:
+        software: Software package(s) to install
+        execute: Execute the installation commands
+        dry_run: Show commands without executing
+        parallel: Enable parallel execution for multi-step installs
+        from_source: Build and install from source code
+        source_url: URL to source code (used with from_source)
+        version: Version to build (used with from_source)
+
+    Returns:
+        int: Exit code (0 for success, 1 for failure)
+    """
```
🤖 Fix all issues with AI agents
In @cortex/cli.py:
- Around line 2016-2023: Add a comprehensive docstring to the
_install_from_source method: describe its purpose (installing a package from a
source URL), document all parameters (package_name: str, execute: bool, dry_run:
bool, source_url: str | None, version: str | None), explain return value (int
status/exit code), note side effects (invokes build/install, may execute
commands), and list raised exceptions or error cases and any special behavior
for dry_run vs execute; place the docstring immediately below the def
_install_from_source(...) signature in cortex/cli.py using the project’s
docstring style.
🧹 Nitpick comments (3)
cortex/cli.py (3)
2034-2038: Strengthen version parsing validation. The version parsing from `package@version` syntax is simplistic and doesn't validate the format or handle edge cases.

♻️ More robust version parsing

```diff
-        # Parse version from package name if specified (e.g., python@3.12)
-        if "@" in package_name and not version:
-            parts = package_name.split("@")
-            package_name = parts[0]
-            version = parts[1] if len(parts) > 1 and parts[1] else None
+        # Parse version from package name if specified (e.g., python@3.12)
+        if "@" in package_name and not version:
+            parts = package_name.split("@", 1)  # Split only on first @
+            if len(parts) == 2 and parts[0] and parts[1]:
+                package_name = parts[0].strip()
+                version = parts[1].strip()
+            else:
+                self._print_error(f"Invalid package@version format: {package_name}")
+                return 1
```
2305-2319: Consider clarifying flag relationships in help text. The `--source-url` and `--pkg-version` flags are only meaningful when used with `--from-source`, but this relationship isn't explicit in the help text.

♻️ Clearer help text

```diff
 install_parser.add_argument(
     "--source-url",
     type=str,
-    help="URL to source code (for --from-source)",
+    help="URL to source code (requires --from-source)",
 )
 install_parser.add_argument(
     "--pkg-version",
     type=str,
-    help="Version to build (for --from-source)",
+    help="Version to build (requires --from-source, or use package@version)",
 )
```
2044-2051: Consider using a separate metadata field instead of including non-command strings in the commands list. The commands list includes metadata strings like `"Source URL: {source_url}"` and `"Version: {version}"` alongside actual installation command names. While the history system correctly handles these through snapshot-based rollback and ignores them during package extraction, this design is semantically unclear: the field is named "commands" but contains metadata. A separate metadata field would improve code clarity and make the audit trail more maintainable.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/codeql.yml
cortex/cli.py
tests/test_cli.py
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/codeql.yml
- tests/test_cli.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:
cortex/cli.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*install*.py : Implement audit logging to ~/.cortex/history.db for all package operations
Applied to files:
cortex/cli.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test (Python 3.11)
- GitHub Check: test (3.12)
- GitHub Check: test (3.10)
- GitHub Check: test (3.11)
🔇 Additional comments (5)
cortex/cli.py (5)
10-11: LGTM: Logger initialization follows best practices. The module-level logger is properly initialized and used appropriately for non-critical warnings in the source build flow.
681-683: LGTM: Source build routing is well-placed. The early routing to `_install_from_source()` appropriately bypasses the LLM-based flow and API key requirement when building from source, while still benefiting from input validation.
2027-2030: LGTM: Audit logging properly implemented. The method correctly initializes `InstallationHistory` and tracks installation operations throughout the source build flow, aligning with the requirement to audit log all package operations to ~/.cortex/history.db.

Based on learnings, this implements the required audit logging pattern.
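The underlying audit pattern is simple to sketch; the actual schema in ~/.cortex/history.db is not shown in this PR, so this table layout is an assumption:

```python
import sqlite3
from pathlib import Path

DB_PATH = Path.home() / ".cortex" / "history.db"

def record_operation(package: str, commands: list[str]) -> None:
    """Append one package operation to the audit log (illustrative schema)."""
    DB_PATH.parent.mkdir(parents=True, exist_ok=True)
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS operations "
            "(package TEXT, commands TEXT, ts DATETIME DEFAULT CURRENT_TIMESTAMP)"
        )
        conn.execute(
            "INSERT INTO operations (package, commands) VALUES (?, ?)",
            (package, "\n".join(commands)),
        )
```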
2062-2084: LGTM: Comprehensive error handling and history tracking. The error handling properly:
- Catches build failures
- Updates history records with error messages
- Uses logger.warning for non-critical failures
- Returns appropriate exit codes
2686-2688: LGTM: Safe parameter forwarding. The use of `getattr` with appropriate defaults provides defensive programming against missing attributes while cleanly forwarding the new parameters to the `install()` method.
```python
def _install_from_source(
    self,
    package_name: str,
    execute: bool,
    dry_run: bool,
    source_url: str | None,
    version: str | None,
) -> int:
```
🛠️ Refactor suggestion | 🟠 Major
Add comprehensive docstring for this method.
The _install_from_source() method lacks a docstring. Per coding guidelines, docstrings are required for all public APIs, and this method is a significant entry point for source-build functionality.
📝 Suggested docstring
```diff
 def _install_from_source(
     self,
     package_name: str,
     execute: bool,
     dry_run: bool,
     source_url: str | None,
     version: str | None,
 ) -> int:
+    """Build and install a package from source.
+
+    Handles the complete source build workflow including dependency detection,
+    build system detection, compilation, and installation. Integrates with
+    InstallationHistory for audit logging.
+
+    Args:
+        package_name: Name of the package to build (supports package@version syntax)
+        execute: Execute installation commands after building
+        dry_run: Show commands without executing
+        source_url: Optional URL to source code (tarball, GitHub, etc.)
+        version: Optional version to build (can also be specified via package@version)
+
+    Returns:
+        int: Exit code (0 for success, 1 for failure)
+    """
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def _install_from_source(
    self,
    package_name: str,
    execute: bool,
    dry_run: bool,
    source_url: str | None,
    version: str | None,
) -> int:
    """Build and install a package from source.

    Handles the complete source build workflow including dependency detection,
    build system detection, compilation, and installation. Integrates with
    InstallationHistory for audit logging.

    Args:
        package_name: Name of the package to build (supports package@version syntax)
        execute: Execute installation commands after building
        dry_run: Show commands without executing
        source_url: Optional URL to source code (tarball, GitHub, etc.)
        version: Optional version to build (can also be specified via package@version)

    Returns:
        int: Exit code (0 for success, 1 for failure)
    """
```
🤖 Prompt for AI Agents
In @cortex/cli.py around lines 2016 - 2023, Add a comprehensive docstring to the
_install_from_source method: describe its purpose (installing a package from a
source URL), document all parameters (package_name: str, execute: bool, dry_run:
bool, source_url: str | None, version: str | None), explain return value (int
status/exit code), note side effects (invokes build/install, may execute
commands), and list raised exceptions or error cases and any special behavior
for dry_run vs execute; place the docstring immediately below the def
_install_from_source(...) signature in cortex/cli.py using the project’s
docstring style.
✅ Addressed in commit bd4027b
@aybanda Address this one.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Actionable comments posted: 2
🧹 Nitpick comments (3)
cortex/cli.py (3)
2024-2072: Condense the docstring significantly. This 48-line docstring is excessively verbose and violates typical Python/Google style guidelines. While comprehensive documentation is valuable, docstrings should be concise. Consider:
- Reducing the Args/Returns/Side Effects sections to essential information
- Moving detailed usage examples and special behaviors to module-level docs or SOURCE_BUILD.md
- Keeping the docstring under 20-25 lines
Example condensed version
```diff
-    def _install_from_source(
-        self,
-        package_name: str,
-        execute: bool,
-        dry_run: bool,
-        source_url: str | None,
-        version: str | None,
-    ) -> int:
-        """Install a package from a source URL by building and optionally installing it.
-
-        This method handles the complete workflow for installing packages from source code:
-        parsing version information, building the package, and optionally executing
-        installation commands. It supports dry-run mode for previewing operations and
-        records all activities in the installation history for audit purposes.
-
-        Args:
-            package_name: Name of the package to install. If version is specified
-                using "@" syntax (e.g., "python@3.12"), it will be parsed automatically
-                if version parameter is None.
-            execute: If True, executes the installation commands after building.
-                If False, only builds the package and displays commands without executing.
-            dry_run: If True, performs a dry run showing what commands would be executed
-                without actually building or installing. Takes precedence over execute.
-            source_url: Optional URL to the source code repository or tarball.
-                If None, the SourceBuilder will attempt to locate the source automatically.
-            version: Optional version string to build. If None and package_name contains
-                "@", the version will be extracted from package_name.
-
-        Returns:
-            int: Exit status code. Returns 0 on success (build/install completed or
-                dry-run completed), 1 on failure (build failed or installation failed).
-
-        Side Effects:
-            - Invokes SourceBuilder.build_from_source() to build the package
-            - May execute installation commands via InstallationCoordinator if execute=True
-            - Records installation start, progress, and completion in InstallationHistory
-            - Prints status messages and progress to console
-            - May use cached builds if available
-
-        Raises:
-            No exceptions are raised directly, but underlying operations may fail:
-            - SourceBuilder.build_from_source() failures are caught and returned as status 1
-            - InstallationCoordinator.execute() failures are caught and returned as status 1
-            - InstallationHistory exceptions are caught and logged as warnings
-
-        Special Behavior:
-            - dry_run=True: Shows build/install commands without executing any operations.
-              Returns 0 after displaying commands. Installation history is still recorded.
-            - execute=False, dry_run=False: Builds the package and displays install commands
-              but does not execute them. Returns 0. User is prompted to run with --execute.
-            - execute=True, dry_run=False: Builds the package and executes all installation
-              commands. Returns 0 on success, 1 on failure.
-            - Version parsing: If package_name contains "@" (e.g., "python@3.12") and version
-              is None, the version is automatically extracted and package_name is updated.
-            - Caching: Uses cached builds when available, printing a notification if cache
-              is used.
-        """
+    def _install_from_source(
+        self,
+        package_name: str,
+        execute: bool,
+        dry_run: bool,
+        source_url: str | None,
+        version: str | None,
+    ) -> int:
+        """Build and install a package from source code.
+
+        Supports version pinning (package@version syntax), source URLs, dry-run mode,
+        and build caching. Records all operations to installation history for audit/rollback.
+
+        Args:
+            package_name: Package name, optionally with "@version" suffix.
+            execute: Execute install commands after building.
+            dry_run: Preview commands without building or installing.
+            source_url: Optional source repository or tarball URL.
+            version: Optional version to build (overridden by package_name@version).
+
+        Returns:
+            0 on success, 1 on failure.
+        """
```
2166-2180: Extract duplicated coordinator setup to reduce code duplication. The progress callback (lines 2166-2172) and InstallationCoordinator setup (lines 2174-2180) are duplicated from the main `install()` method (lines 725-732, 823-829). This violates the DRY principle and makes maintenance harder.

Consider extracting this into a helper method like `_execute_with_coordinator(commands, descriptions, package_name)` that both `install()` and `_install_from_source()` can call.

Proposed helper method

```python
def _execute_with_coordinator(
    self, commands: list[str], descriptions: list[str], timeout: int = 600
) -> tuple[bool, str | None]:
    """Execute commands using InstallationCoordinator with progress reporting.

    Returns:
        Tuple of (success: bool, error_message: str | None)
    """

    def progress_callback(current: int, total: int, step: InstallationStep) -> None:
        status_emoji = "⏳"
        if step.status == StepStatus.SUCCESS:
            status_emoji = "✅"
        elif step.status == StepStatus.FAILED:
            status_emoji = "❌"
        console.print(f"[{current}/{total}] {status_emoji} {step.description}")

    coordinator = InstallationCoordinator(
        commands=commands,
        descriptions=descriptions,
        timeout=timeout,
        stop_on_error=True,
        progress_callback=progress_callback,
    )
    result = coordinator.execute()
    return result.success, result.error_message
```

Then use it in both methods.
2176-2176: Provide distinct descriptions for each install command. All install commands receive the same description `f"Install {package_name}"`, making progress output non-descriptive when multiple commands are executed. Consider using more specific descriptions like:

- `f"Install dependencies for {package_name}"`
- `f"Install {package_name} binary"`
- Or derive descriptions from the command itself (see the sketch below)
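Deriving the description from the command could be as small as this hypothetical helper:

```python
def describe_step(command: str, package_name: str) -> str:
    """Turn an install command into a human-readable progress description."""
    if command.startswith(("apt-get", "apt ")):
        return f"Install dependencies for {package_name}"
    if "make install" in command or "cmake --install" in command:
        return f"Install {package_name} binary"
    first = command.split()[0] if command.strip() else command
    return f"Run: {first}"
```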
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cortex/cli.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:
cortex/cli.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*install*.py : Implement audit logging to ~/.cortex/history.db for all package operations
Applied to files:
cortex/cli.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test (Python 3.11)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
- GitHub Check: test (3.10)
🔇 Additional comments (4)
cortex/cli.py (4)
638-647: LGTM: Clean signature extension. The new parameters (`from_source`, `source_url`, `version`) are properly typed with sensible defaults, maintaining backward compatibility.
681-683: LGTM: Clean routing logic. The early return when `from_source=True` properly delegates to the new source-build path and prevents mixing with the LLM-based installation flow.
2353-2367: LGTM: CLI arguments are well-defined. The new `--from-source`, `--source-url`, and `--pkg-version` arguments are properly configured with clear help text and appropriate types.
2734-2736: LGTM: Safe argument forwarding. Using `getattr` with default values provides defensive handling and ensures the code won't fail if arguments are missing, though argparse should always set these attributes.
```python
# Include source URL in the commands list to track it
commands = []
if source_url:
    commands.append(f"Source URL: {source_url}")
commands.append(f"Build from source: {package_name}")
if version:
    commands.append(f"Version: {version}")
```
Fix commands list to contain only executable commands.

The commands list includes metadata strings like "Source URL: ...", "Build from source: ...", and "Version: ...". These are not executable shell commands and will cause issues:

- `InstallationHistory._extract_packages_from_commands()` expects actual commands
- Rollback functionality may fail or produce incorrect results
- History display will show non-commands mixed with real commands

Store metadata separately or in a structured format rather than mixing it with executable commands.

Suggested fix

```diff
 # Prepare commands list for history recording
-# Include source URL in the commands list to track it
 commands = []
-if source_url:
-    commands.append(f"Source URL: {source_url}")
-commands.append(f"Build from source: {package_name}")
-if version:
-    commands.append(f"Version: {version}")
+# Commands will be populated after build with actual install commands
+# Metadata (source_url, version) could be stored in a separate field if InstallationHistory supports it
```

Then after line 2138:

```diff
-# Add install commands to the commands list for history
-commands.extend(result.install_commands)
+# Use actual install commands from build result
+commands = result.install_commands
```

Committable suggestion skipped: line range outside the PR's diff.
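One possible shape for that separation (a sketch; `InstallationHistory` would need explicit support for it, and `InstallRecord` is a hypothetical name):

```python
from dataclasses import dataclass, field

@dataclass
class InstallRecord:
    """Keep executable commands separate from descriptive metadata."""
    commands: list[str] = field(default_factory=list)       # real shell commands only
    metadata: dict[str, str] = field(default_factory=dict)  # e.g. {"source_url": ..., "version": ...}
```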
```python
except Exception as e:
    logger.warning(f"Failed to record installation start: {e}")
```
Consider surfacing history recording failures to the user.
History recording failures are caught with broad Exception handling and only logged as warnings (also at lines 2148-2149, 2161-2162, 2193, 2206-2207). Users won't see these failures in CLI output, leading to silent audit logging failures.
While failing the entire operation on history errors may be too strict, consider:
- Displaying a warning to the user via
cx_print(..., "warning") - Tracking whether history logging succeeded and mentioning it in the final success/failure message
Based on learnings, audit logging to ~/.cortex/history.db is a required feature for all package operations.
Anshgrover23
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aybanda Kindly address coderabbtAI comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to change this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this one changed?
```python
def _install_from_source(
    self,
    package_name: str,
    execute: bool,
    dry_run: bool,
    source_url: str | None,
    version: str | None,
) -> int:
```
@aybanda Address this one.


Summary
Implements the ability to build and install packages from source code when binaries are unavailable. This feature includes automatic build dependency detection, build system detection, build caching, and comprehensive error handling.
Related Issue
Closes #101
Type of Change
AI Disclosure
AI Tools Used:
Changes Made
Core Implementation
cortex/source_builder.py - Core source building module
cortex/cli.py - CLI integration:
- --from-source flag to install command
- --source-url option for custom source locations
- --version option for version pinning
- _install_from_source() method
tests/test_source_builder.py - Comprehensive test suite
docs/SOURCE_BUILD.md - Complete documentation
Features
Testing
Test Coverage
tests/test_source_builder.pyManual Testing
Code Quality
Documentation
docs/SOURCE_BUILD.mdwith:Example Usage
Security Considerations
run_commandutility with validationSummary by CodeRabbit
New Features
Documentation
Tests
Chores
Bug Fix / Style
✏️ Tip: You can customize this high-level summary in your review settings.