codetoprompt is a powerful command-line tool that transforms local codebases, GitHub repositories, web pages, and online documents into a single, context-rich prompt optimized for Large Language Models (LLMs).
It streamlines the process of providing comprehensive context to LLMs by intelligently selecting, compressing, and formatting project files and remote content.
- Universal Context Sources: Ingests code from local directories, GitHub repos, web pages, YouTube transcripts, ArXiv papers, and PDFs.
- Intelligent Code Compression: Uses `tree-sitter` to parse code into a structural summary, drastically reducing token count while preserving the high-level architecture.
- Interactive TUI Mode: Launch a fast, lazy-loaded terminal UI to visually select the exact files and directories you need.
- Flexible Output Formats: Generate prompts in a simple default format, as a single Markdown file, or in Claude-friendly XML.
- Automatic File Handling: Natively processes Jupyter Notebooks, samples large data files (like `.csv` or `.json`), and respects your `.gitignore` rules.
- Powerful Filtering: Fine-tune your context with `--include` and `--exclude` glob patterns.
- In-Depth Analysis: Run the `analyse` command to get a full breakdown of your project's languages, token counts, and file sizes before generating a prompt.
- Snapshots and Diffs: Save a JSON snapshot of a project and generate a unified diff against it. The diff is copied to the clipboard by default (only a summary is shown in the terminal), or written to a file with `--output`.

Install from PyPI:

    pip install codetoprompt

For clipboard functionality on Linux, you may need to install xclip or wl-clipboard:

    # Debian/Ubuntu
    sudo apt-get install xclip
    # Arch Linux
    sudo pacman -S xclip

The two core commands are codetoprompt (or ctp) for generating prompts and analyse for inspecting your project.

Scan your current project, respect .gitignore, and copy a context-rich prompt to your clipboard.

    # Long version
    codetoprompt .
    # Short version
    ctp .

Pass a supported URL to fetch and process remote content automatically.

    # From a GitHub Repository
    ctp https://github.com/yash9439/codetoprompt
    # From a documentation page
    ctp https://python-poetry.org/docs/
    # From a YouTube video transcript
    ctp https://www.youtube.com/watch?v=cAkMcPfY_Ns

    # Save a JSON snapshot of the current project
    codetoprompt snapshot . --output snap.json

    # Copies the full diff to the clipboard; terminal shows only a summary
    codetoprompt diff . --snapshot snap.json
    # Save the full diff to a file instead of copying to clipboard
    codetoprompt diff . --snapshot snap.json --output diff.txt

codetoprompt can pull in context from almost anywhere.

| Source Type | Example Command | Description |
|---|---|---|
| Local Directory | `ctp path/to/your/project` | Scans a local codebase, respecting `.gitignore` and applying filters. |
| GitHub Repo | `ctp https://github.com/user/repo` | Fetches all text-based files and builds a complete project prompt. |
| Web Page | `ctp https://en.wikipedia.org/wiki/API` | Strips boilerplate and extracts the core text content. |
| YouTube Video | `ctp <youtube_url>` | Automatically extracts the full video transcript. |
| ArXiv Paper | `ctp https://arxiv.org/abs/2203.02155` | Downloads the full PDF from an abstract page and extracts its text. |
| PDF Document | `ctp <url_to_pdf>` | Directly downloads and extracts text from any public PDF link. |
| Jupyter Notebook | (automatic) | `.ipynb` files in a local project are automatically converted to Python code. |
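
The notebook conversion in the last row can be pictured as flattening the notebook's JSON cells into one Python file. The following is a minimal sketch of that idea, not codetoprompt's actual converter: the function name `notebook_to_python` is hypothetical, and turning Markdown cells into comments is an assumption made for illustration.

```python
import json


def notebook_to_python(notebook_json: str) -> str:
    """Flatten a Jupyter notebook (nbformat 4 JSON) into Python-style text."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        # nbformat allows "source" to be one string or a list of line strings.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if not source.strip():
            continue  # skip empty cells
        if cell.get("cell_type") == "code":
            chunks.append(source.rstrip())
        elif cell.get("cell_type") == "markdown":
            # Assumption: keep prose as comments so it does not read as code.
            commented = "\n".join(
                "# " + line if line else "#" for line in source.splitlines()
            )
            chunks.append(commented)
    return "\n\n".join(chunks) + "\n"


demo = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["Load the data"]},
            {"cell_type": "code", "source": ["import csv\n", "rows = []"]},
        ]
    }
)
print(notebook_to_python(demo))
```

Cell outputs and metadata are dropped here, which is usually what you want when the goal is a compact prompt.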
For ultimate control, the --interactive flag launches a Terminal User Interface (TUI) allowing you to manually select which files and directories to include. It's perfect for cherry-picking specific features or excluding noisy test files.

    ctp . --interactive

Optimized for Large Projects: The interactive file tree uses lazy loading, meaning it only loads a directory's contents when you expand it. This keeps the interface fast and responsive, even in massive codebases.

    ──────────────────────────────────────────────────────────────
    FileSelectorApp
    Navigate: ↑/↓/w/s | Expand/Collapse: ←/→/a/d
    Toggle Select: Space | Confirm: Enter
    ✓ = All selected | - = Some selected | ◦ = None selected
    ──────────────────────────────────────────────────────────────

      ▶ 📁 .github
      ▼ - 📁 codetoprompt
    >   ▼ ✓ 📁 compressor
            ✓ 📁 analysers
            ✓ 📁 formatters
            ✓ 📄 __init__.py
            ✓ 📄 compressor.py
          ◦ 📄 __init__.py
          ◦ 📄 analysis.py
          ◦ 📄 arg_parser.py
          ◦ 📄 cli.py
          ◦ 📄 config.py
          ◦ 📄 core.py
          ◦ 📄 interactive.py
          ◦ 📄 utils.py
          ◦ 📄 version.py
      ▶ 📁 codetoprompt.egg-info
      ▶ 📁 tests
      ◦ 📄 .gitignore
      ◦ 📄 CHANGELOG.md
      ◦ 📄 CONTRIBUTING.md
      ◦ 📄 LICENSE
      ◦ 📄 MANIFEST.in
      ◦ 📄 pyproject.toml
      ◦ 📄 pytest.ini
      ◦ 📄 README.md

      q Quit
    ──────────────────────────────────────────────────────────────

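
The lazy loading described above can be sketched as a tree node that reads its directory only on first expansion and caches the result. This is an illustrative sketch, not the code behind the TUI: the class name `LazyNode` and its methods are hypothetical.

```python
from pathlib import Path


class LazyNode:
    """A file-tree node that lists its children only when first expanded."""

    def __init__(self, path: Path):
        self.path = path
        self.expanded = False
        self._children = None  # None means "directory not read yet"

    @property
    def loaded(self) -> bool:
        return self._children is not None

    def expand(self) -> list:
        """Read the directory on first expansion, then reuse the cached list."""
        if self._children is None:
            if self.path.is_dir():
                entries = sorted(
                    self.path.iterdir(),
                    # Folders first, then case-insensitive by name.
                    key=lambda p: (p.is_file(), p.name.lower()),
                )
                self._children = [LazyNode(p) for p in entries]
            else:
                self._children = []  # files have no children
        self.expanded = True
        return self._children

    def collapse(self) -> None:
        # Keep the cached children so re-expanding does not hit the disk again.
        self.expanded = False


root = LazyNode(Path("."))
for child in root.expand():  # only the top level is read here
    print(("[dir]  " if child.path.is_dir() else "[file] ") + child.path.name)
```

Because each subdirectory is untouched until `expand()` is called on it, the startup cost depends on the size of the top-level directory rather than the whole project.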
For large codebases, the --compress flag is essential. It analyzes supported code files and generates a high-level summary instead of including the full code, drastically reducing the final token count.

    ctp . --compress

Supported Languages: Python, JavaScript, TypeScript, Java, C, C++, and Rust. Other files (like README.md) are included in full.
Example Compressed Output for a Python File:

    # File: codetoprompt/core.py
    # Language: python
    ## Imports:
    - import platform
    - from pathlib import Path
    - ...
    ## Classes:
    ### class CodeToPrompt:
        """Convert code files to prompt format."""
        def __init__(self, root_dir, ...): ...
        def generate_prompt(self, progress): ...
        def analyse(self, progress, top_n): ...

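
To make the idea of a structural summary concrete, here is a minimal sketch that produces similar output for Python files only. It deliberately swaps in Python's built-in `ast` module instead of tree-sitter, so it is not how codetoprompt itself works; the function name `summarize_python` and the exact output layout are assumptions modelled loosely on the example above.

```python
import ast


def summarize_python(source: str, path: str = "<memory>") -> str:
    """Return imports, function signatures, and class outlines for one file."""
    tree = ast.parse(source)
    lines = [f"# File: {path}", "# Language: python"]

    imports = [
        ast.unparse(node)
        for node in tree.body
        if isinstance(node, (ast.Import, ast.ImportFrom))
    ]
    if imports:
        lines.append("## Imports:")
        lines.extend(f"- {imp}" for imp in imports)

    def signature(fn) -> str:
        # Keep only the signature; the body is replaced by "...".
        prefix = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
        return f"{prefix} {fn.name}({ast.unparse(fn.args)}): ..."

    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    functions = [node for node in tree.body if isinstance(node, func_types)]
    if functions:
        lines.append("## Functions:")
        lines.extend(signature(fn) for fn in functions)

    classes = [node for node in tree.body if isinstance(node, ast.ClassDef)]
    if classes:
        lines.append("## Classes:")
    for cls in classes:
        lines.append(f"### class {cls.name}:")
        doc = ast.get_docstring(cls)
        if doc:
            lines.append(f'    """{doc.splitlines()[0]}"""')
        lines.extend(
            "    " + signature(item)
            for item in cls.body
            if isinstance(item, func_types)
        )
    return "\n".join(lines)


example = '''
from pathlib import Path

class Loader:
    """Load files from disk."""
    def read(self, path: Path) -> str:
        return path.read_text()
'''
print(summarize_python(example, "loader.py"))
```

The token savings come from dropping function bodies while keeping names, parameters, and the first docstring line, which is usually enough for a model to understand how a module is organised.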
Automatic Data File Handling: To further manage token count, `codetoprompt` automatically detects common data files (like `.csv`, `.json`) and includes only the first 5 lines.
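
A minimal sketch of this kind of sampling is shown below. The suffix set, the helper name `read_for_prompt`, and the trailing truncation marker are assumptions for illustration; only the five-line limit comes from the description above.

```python
from itertools import islice
from pathlib import Path

DATA_SUFFIXES = {".csv", ".json"}  # assumed set of "data file" extensions
SAMPLE_LINES = 5  # the first 5 lines, as described above


def read_for_prompt(path: Path) -> str:
    """Return full text for normal files, or a short head for data files."""
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        if path.suffix.lower() not in DATA_SUFFIXES:
            return handle.read()
        head = list(islice(handle, SAMPLE_LINES))
        # If another line exists, the file was longer than the sample.
        truncated = next(handle, None) is not None
    text = "".join(head)
    if truncated:
        text += "... (truncated)\n"  # hypothetical marker, not the tool's output
    return text
```

Reading with `islice` stops after the sample, so a multi-gigabyte CSV costs no more than a small one.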
Tailor the output for different LLMs or use cases using format flags.
| Format | Flag | Description |
|---|---|---|
| Default | (none) | Clean, human-readable format with file paths and fenced code blocks. |
| Markdown | `--markdown` or `-m` | A single Markdown document, great for viewing or sharing. |
| Claude XML | `--cxml` or `-c` | Wraps each file in `<document>` tags, a format Claude models handle exceptionally well. |
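
The Claude XML layout is simple enough to assemble by hand. The sketch below is illustrative rather than codetoprompt's formatter: the function name `to_cxml` is hypothetical, the tag names follow the `-c` output format, and file content is inserted verbatim without XML escaping, which is an assumption.

```python
def to_cxml(files: dict[str, str]) -> str:
    """Wrap each (path, content) pair in numbered <document> tags."""
    parts = ["<documents>"]
    for index, (source, content) in enumerate(files.items(), start=1):
        parts.append(f'<document index="{index}">')
        parts.append(f"<source>{source}</source>")
        parts.append("<document_content>")
        parts.append(content.rstrip("\n"))  # content is not escaped here
        parts.append("</document_content>")
        parts.append("</document>")
    parts.append("</documents>")
    return "\n".join(parts)


print(to_cxml({"main.py": 'def main():\n    print("Hello, World!")\n'}))
```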
Example CXML Output (-c):

    <documents>
    <document index="1">
    <source>main.py</source>
    <document_content>
    def main():
        print("Hello, World!")
    </document_content>
    </document>
    </documents>

Before generating a prompt, get a high-level overview of your local project's composition and token count. This helps you decide which filters or compression strategies to apply.

    ctp analyse .

Example Analysis:

    ╭───────────────── Codebase Analysis ──────────────────╮
    │ Configuration for this run:                          │
    │ Root Directory: .                                    │
    │ Include Patterns: ['*']                              │
    │ Exclude Patterns: []                                 │
    │ Respect .gitignore: True                             │
    ╰──────────────────────────────────────────────────────╯
    ╭─ Overall Project Summary ─╮
    │ Total Files: 47           │
    │ Total Lines: 6,033        │
    │ Total Tokens: 49,834      │
    ╰───────────────────────────╯
                 Analysis by File Type (Top 10)
    ┏━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━┓
    ┃ Extension ┃ Files ┃ Tokens ┃ Lines ┃ Avg Tokens/File ┃
    ┡━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━┩
    │ .py       │    32 │ 37,901 │ 4,650 │           1,184 │
    │ .md       │     3 │  5,069 │   544 │           1,690 │
    │ .<no_ext> │     3 │  4,807 │   559 │           1,602 │
    │ .toml     │     1 │    827 │   117 │             827 │
    │ .txt      │     4 │    582 │    68 │             146 │
    │ .yml      │     1 │    361 │    56 │             361 │
    │ .yaml     │     1 │    229 │    30 │             229 │
    │ .ini      │     1 │     45 │     6 │              45 │
    │ .in       │     1 │     13 │     3 │              13 │
    └───────────┴───────┴────────┴───────┴─────────────────┘
                   Largest Files by Tokens (Top 10)
    ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
    ┃ File Path                                 ┃ Tokens ┃ Lines ┃
    ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
    │ codetoprompt/core.py                      │  5,037 │   538 │
    │ codetoprompt.egg-info/PKG-INFO            │  4,453 │   493 │
    │ README.md                                 │  2,910 │   285 │
    │ codetoprompt/compressor/analysers/cpp.py  │  2,276 │   272 │
    │ codetoprompt/compressor/analysers/rust.py │  2,214 │   271 │
    │ codetoprompt/interactive.py               │  2,125 │   270 │
    │ codetoprompt/compressor/analysers/java.py │  2,090 │   245 │
    │ codetoprompt/cli.py                       │  1,877 │   207 │
    │ tests/test_core.py                        │  1,743 │   179 │
    │ CHANGELOG.md                              │  1,701 │   183 │
    └───────────────────────────────────────────┴────────┴───────┘

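
The per-extension breakdown above boils down to a grouped count over the project's text files. The sketch below shows one way to compute it, under two loudly stated assumptions: tokens are approximated by whitespace-separated words (a real tokenizer will give different numbers), and `.gitignore` rules are not applied. The names `count_tokens` and `analyse_by_extension` are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def count_tokens(text: str) -> int:
    # Stand-in tokenizer: whitespace-separated words, not real LLM tokens.
    return len(text.split())


def analyse_by_extension(root: Path) -> dict:
    """Aggregate file, token, and line counts per file extension."""
    stats = defaultdict(lambda: {"files": 0, "tokens": 0, "lines": 0})
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        ext = path.suffix.lower() or ".<no_ext>"
        entry = stats[ext]
        entry["files"] += 1
        entry["tokens"] += count_tokens(text)
        entry["lines"] += len(text.splitlines())
    # Largest token totals first, like the report above.
    return dict(sorted(stats.items(), key=lambda kv: kv[1]["tokens"], reverse=True))
```

Calling `analyse_by_extension(Path("."))` returns a dictionary keyed by extension, ordered so the heaviest file types come first.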
Here is the full list of options for the main codetoprompt command.
| Option | Alias | Description | Scope |
|---|---|---|---|
| `--output <file>` | | Save the prompt to a file instead of the clipboard. | All |
| `--markdown` | `-m` | Format output as a single Markdown document. | All |
| `--cxml` | `-c` | Format output using Claude-friendly XML tags. | All |
| `--max-tokens <num>` | | Warn if token count exceeds this limit. Does not truncate. | All |
| `--include <pats>` | | Comma-separated glob patterns for files to include (e.g., `"*.py,*.js"`). | Local |
| `--exclude <pats>` | | Comma-separated glob patterns for files to exclude (e.g., `"*.pyc,dist/*"`). | Local |
| `--interactive` | `-i` | Launch an interactive TUI to select files. | Local |
| `--compress` | | Use smart code compression to summarize files. | Local |
| `--show-line-numbers` | | Prepend line numbers to code. | Local |
| `--respect-gitignore` | | Respect `.gitignore` rules (default). Use `--no-respect-gitignore` to disable. | Local |
| `--tree-depth <num>` | | Set the maximum depth for the project structure tree. | Local |
| `--version` | `-v` | Display the installed version number. | N/A |
| `--help` | `-h` | Show the help message and exit. | N/A |
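
To illustrate how comma-separated include and exclude patterns might interact, here is a minimal sketch using the standard library's `fnmatch`. It assumes shell-style patterns such as `*.py` and assumes that exclusions win over inclusions; codetoprompt's actual matching rules may differ, and the helper names `parse_patterns` and `is_selected` are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_patterns(raw: str) -> list:
    """Split a comma-separated option value into clean glob patterns."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def is_selected(rel_path: str, include: list, exclude: list) -> bool:
    """True if the path matches an include pattern and no exclude pattern."""
    name = rel_path.rsplit("/", 1)[-1]

    def matches(pattern: str) -> bool:
        # Try the whole relative path first, then just the file name.
        return fnmatchcase(rel_path, pattern) or fnmatchcase(name, pattern)

    if any(matches(pattern) for pattern in exclude):
        return False  # assumption: exclusions take priority
    return any(matches(pattern) for pattern in include)


include = parse_patterns("*.py,*.js")
exclude = parse_patterns("*.pyc,dist/*")
for candidate in ["src/app.py", "dist/app.js", "README.md"]:
    print(candidate, is_selected(candidate, include, exclude))
```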

- Analyse: `codetoprompt analyse <PATH> [--include ...] [--exclude ...]`
- Snapshot: `codetoprompt snapshot <PATH> --output <snapshot.json> [--include ...] [--exclude ...] [--respect-gitignore|--no-respect-gitignore]`
- Diff: `codetoprompt diff <PATH> --snapshot <snapshot.json> [--use-snapshot-filters] [--include ...] [--exclude ...] [--output <file>]`

Set your preferred defaults once using the config command. Settings are saved in ~/.config/codetoprompt/config.toml.
- Interactive Wizard: `ctp config`
- Show Current Config: `ctp config --show`
- Reset to Defaults: `ctp config --reset`

Additional snapshot-related settings:
- Snapshot Max Bytes: `snapshot_max_bytes` (default: 3 MB). If a text file exceeds this size, its content is not inlined into the snapshot.
- Snapshot Max Lines: `snapshot_max_lines` (default: 20,000). If a text file exceeds this line count, its content is not inlined into the snapshot.

Snapshot always requires `--output`. Diff copies to clipboard by default (summary only printed to terminal). Provide `--output <file>` to write the diff to a file instead of copying.

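
The snapshot-and-diff workflow can be sketched with `json` and `difflib` from the standard library. This is an assumption-laden illustration, not codetoprompt's snapshot format: the snapshot here is a plain path-to-text mapping, the 3 MB limit is taken as 3 × 1024 × 1024 bytes, the line limit is omitted, and the function names are hypothetical.

```python
import difflib
import json
from pathlib import Path

SNAPSHOT_MAX_BYTES = 3 * 1024 * 1024  # assumed byte value of the 3 MB default


def take_snapshot(root: Path) -> dict:
    """Map each relative file path to its text, or None if it is not inlined."""
    snapshot = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        try:
            if path.stat().st_size > SNAPSHOT_MAX_BYTES:
                snapshot[rel] = None  # too large to inline
            else:
                snapshot[rel] = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            snapshot[rel] = None  # binary or unreadable
    return snapshot


def diff_against(snapshot: dict, root: Path) -> str:
    """Build a unified diff between a saved snapshot and the current files."""
    current = take_snapshot(root)
    chunks = []
    for rel in sorted(set(snapshot) | set(current)):
        old = snapshot.get(rel) or ""  # missing or non-inlined counts as empty
        new = current.get(rel) or ""
        if old == new:
            continue
        chunks.extend(
            difflib.unified_diff(
                old.splitlines(keepends=True),
                new.splitlines(keepends=True),
                fromfile=f"a/{rel}",
                tofile=f"b/{rel}",
            )
        )
    return "".join(chunks)
```

A snapshot produced this way round-trips through `json.dumps` and `json.loads`, so it can be written to a file such as `snap.json` and compared against the project later.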
Use codetoprompt programmatically in your own Python scripts for maximum flexibility.

    from codetoprompt import CodeToPrompt
    # conceptual imports for a full use-case
    # from some_llm_library import LlmClient

    # 1. Process a local directory with compression and XML format
    ctp_local = CodeToPrompt(
        target="path/to/your/project",
        compress=True,
        output_format="cxml",
        exclude_patterns=["tests/*", "docs/*"]
    )
    prompt = ctp_local.generate_prompt()
    analysis = ctp_local.analyse()
    print(f"Project Analysis: {analysis['overall']}")
    print(f"Generated a prompt with {ctp_local.get_token_count()} tokens.")

    # 2. Conceptually, you'd then use this with an LLM client
    # client = LlmClient(api_key="...")
    # response = client.completions.create(
    #     model="claude-3-opus-20240229",  # CXML is great for Claude
    #     messages=[
    #         {"role": "user", "content": f"Here is a codebase:\n{prompt}\nPlease explain the main purpose of the `core.py` file."},
    #     ]
    # )
    # print(response)

    # 3. Process a remote URL
    ctp_remote = CodeToPrompt(target="https://github.com/yash9439/codetoprompt")
    remote_prompt = ctp_remote.generate_prompt()
    print(f"Generated a prompt from GitHub with {ctp_remote.get_token_count()} tokens.")

We welcome contributions! Please see CONTRIBUTING.md for development setup and guidelines. Feel free to open a PR or issue to get started.

This project is licensed under the MIT License. See the LICENSE file for full details.