Transform any codebase, web page, or document into an optimized LLM prompt. CodeToPrompt intelligently compresses code and filters content to overcome context window limits.

yash9439/codetoprompt

CodeToPrompt

CI PyPI version PyPI Downloads PyPI - Python Version License: MIT

codetoprompt is a powerful command-line tool that transforms local codebases, GitHub repositories, web pages, and online documents into a single, context-rich prompt optimized for Large Language Models (LLMs).

It streamlines the process of providing comprehensive context to LLMs by intelligently selecting, compressing, and formatting project files and remote content.


✨ Key Features

  • Universal Context Sources: Ingests code from local directories, GitHub repos, web pages, YouTube transcripts, ArXiv papers, and PDFs.
  • Intelligent Code Compression: Uses tree-sitter to parse code into a structural summary, drastically reducing token count while preserving the high-level architecture.
  • Interactive TUI Mode: Launch a fast, lazy-loaded terminal UI to visually select the exact files and directories you need.
  • Flexible Output Formats: Generate prompts in a simple default format, as a single Markdown file, or in Claude-friendly XML.
  • Automatic File Handling: Natively processes Jupyter Notebooks, samples large data files (like .csv or .json), and respects your .gitignore rules.
  • Powerful Filtering: Fine-tune your context with --include and --exclude glob patterns.
  • In-Depth Analysis: Run the analyse command to get a full breakdown of your project's languages, token counts, and file sizes before generating a prompt.
  • Snapshots and Diffs: Save a JSON snapshot of a project and generate a unified diff against it. By default the full diff is copied to the clipboard and only a summary is shown in the terminal; use --output to write it to a file instead.

🔧 Installation

Install from PyPI:

pip install codetoprompt

For clipboard functionality on Linux, you may need to install xclip or wl-clipboard:

# Debian/Ubuntu
sudo apt-get install xclip

# Arch Linux
sudo pacman -S xclip

🚀 Quick Start

The two core commands are codetoprompt (or ctp) for generating prompts and analyse for inspecting your project.

1. Generate a Prompt from a Local Codebase

Scan your current project, respect .gitignore, and copy a context-rich prompt to your clipboard.

# Long version
codetoprompt .

# Short version
ctp .

2. Generate a Prompt from any URL

Pass a supported URL to fetch and process remote content automatically.

# From a GitHub Repository
ctp https://github.com/yash9439/codetoprompt

# From a documentation page
ctp https://python-poetry.org/docs/

# From a YouTube video transcript
ctp https://www.youtube.com/watch?v=cAkMcPfY_Ns

3. Create a Snapshot (local only)

# Save a JSON snapshot of the current project
codetoprompt snapshot . --output snap.json

4. Diff Against a Snapshot (local only)

# Copies the full diff to the clipboard; terminal shows only a summary
codetoprompt diff . --snapshot snap.json

# Save the full diff to a file instead of copying to clipboard
codetoprompt diff . --snapshot snap.json --output diff.txt
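
The snapshot-then-diff workflow can be pictured with a few lines of standard-library Python. This is only a minimal sketch of the idea, not codetoprompt's implementation: the snapshot layout (a "files" mapping) and the helper names make_snapshot and diff_against_snapshot are assumptions made for illustration.

```python
import difflib
import json


def make_snapshot(files: dict) -> str:
    """Serialize a {path: text} mapping as a JSON snapshot string (hypothetical layout)."""
    return json.dumps({"files": files}, indent=2, sort_keys=True)


def diff_against_snapshot(snapshot_json: str, current: dict) -> str:
    """Return a unified diff between the snapshot contents and the current files."""
    old = json.loads(snapshot_json)["files"]
    chunks = []
    for path in sorted(set(old) | set(current)):
        before = old.get(path, "").splitlines(keepends=True)
        after = current.get(path, "").splitlines(keepends=True)
        # unified_diff yields nothing when the two versions are identical
        chunks.extend(
            difflib.unified_diff(before, after, fromfile=f"a/{path}", tofile=f"b/{path}")
        )
    return "".join(chunks)


snapshot = make_snapshot({"app.py": "print('hi')\n"})
diff_text = diff_against_snapshot(snapshot, {"app.py": "print('hello')\n"})
print(diff_text)
```

Files that are unchanged contribute nothing to the result, which is why a diff is usually much smaller than a full prompt of the same project.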

🧠 Features in Detail

1. Universal Context Gathering

codetoprompt can pull in context from almost anywhere.

| Source Type | Example Command | Description |
|---|---|---|
| Local Directory | ctp path/to/your/project | Scans a local codebase, respecting .gitignore and applying filters. |
| GitHub Repo | ctp https://github.com/user/repo | Fetches all text-based files and builds a complete project prompt. |
| Web Page | ctp https://en.wikipedia.org/wiki/API | Strips boilerplate and extracts the core text content. |
| YouTube Video | ctp <youtube_url> | Automatically extracts the full video transcript. |
| ArXiv Paper | ctp https://arxiv.org/abs/2203.02155 | Downloads the full PDF from an abstract page and extracts its text. |
| PDF Document | ctp <url_to_pdf> | Directly downloads and extracts text from any public PDF link. |
| Jupyter Notebook | (automatic) | .ipynb files in a local project are automatically converted to Python code. |
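
One way to picture how a single argument can map to these source types is a small dispatcher over the parsed URL. The sketch below is an assumption about the general approach, not the tool's actual routing logic; classify_target and its return labels are hypothetical names.

```python
from urllib.parse import urlparse


def classify_target(target: str) -> str:
    """Guess the source type of a CLI argument (illustrative rules only)."""
    parsed = urlparse(target)
    if parsed.scheme not in ("http", "https"):
        return "local"  # anything that is not a web URL is treated as a path
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "github.com":
        return "github"
    if host in ("youtube.com", "youtu.be"):
        return "youtube"
    if host == "arxiv.org":
        return "arxiv"
    if parsed.path.lower().endswith(".pdf"):
        return "pdf"
    return "web"


for example in (
    "path/to/your/project",
    "https://github.com/user/repo",
    "https://en.wikipedia.org/wiki/API",
    "https://arxiv.org/abs/2203.02155",
):
    print(example, "->", classify_target(example))
```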

2. Interactive File Selection (--interactive or -i)

For ultimate control, the --interactive flag launches a Terminal User Interface (TUI) allowing you to manually select which files and directories to include. It's perfect for cherry-picking specific features or excluding noisy test files.

ctp . --interactive

Optimized for Large Projects: The interactive file tree uses lazy loading, meaning it only loads a directory's contents when you expand it. This keeps the interface fast and responsive, even in massive codebases.

┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│                        FileSelectorApp                              │
│           Navigate: ↑/↓/w/s  | Expand/Collapse: ←/a/d               │
│               Toggle Select: Space | Confirm: Enter                 │
│       ✓ = All selected | - = Some selected | ◦ = None selected      │
│ ────────────────────────────────────────────────────────────────────│
│                                                                     │
│ ▶ 📁 .github                                                        │
│ ▼ - 📁 codetoprompt                                                 │
│ >   ▼ ✓ 📁 compressor                                               │
│         ✓ 📁 analysers                                              │
│         ✓ 📁 formatters                                             │
│         ✓ 📄 __init__.py                                            │
│         ✓ 📄 compressor.py                                          │
│     ◦ 📄 __init__.py                                                │
│     ◦ 📄 analysis.py                                                │
│     ◦ 📄 arg_parser.py                                              │
│     ◦ 📄 cli.py                                                     │
│     ◦ 📄 config.py                                                  │
│     ◦ 📄 core.py                                                    │
│     ◦ 📄 interactive.py                                             │
│     ◦ 📄 utils.py                                                   │
│     ◦ 📄 version.py                                                 │
│ ▶ 📁 codetoprompt.egg-info                                          │
│ ▶ 📁 tests                                                          │
│ ◦ 📄 .gitignore                                                     │
│ ◦ 📄 CHANGELOG.md                                                   │
│ ◦ 📄 CONTRIBUTING.md                                                │
│ ◦ 📄 LICENSE                                                        │
│ ◦ 📄 MANIFEST.in                                                    │
│ ◦ 📄 pyproject.toml                                                 │
│ ◦ 📄 pytest.ini                                                     │
│ ◦ 📄 README.md                                                      │
│                                                                     │
│ q Quit                                                              │
└─────────────────────────────────────────────────────────────────────┘
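
Lazy loading of a file tree, as described above, amounts to reading a directory only the first time it is expanded. The following is a rough sketch of that pattern using os.scandir; the LazyNode class is a hypothetical name and is not taken from codetoprompt's interactive module.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LazyNode:
    """One tree entry; children are read from disk only on the first expand()."""
    path: str
    is_dir: bool
    _children: Optional[List["LazyNode"]] = field(default=None, repr=False)

    @property
    def loaded(self) -> bool:
        return self._children is not None

    def expand(self) -> List["LazyNode"]:
        if not self.is_dir:
            return []
        if self._children is None:  # first expansion: touch the filesystem once
            with os.scandir(self.path) as entries:
                self._children = sorted(
                    (LazyNode(e.path, e.is_dir()) for e in entries),
                    key=lambda n: (not n.is_dir, os.path.basename(n.path)),
                )
        return self._children


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "pkg", "deep"))
    open(os.path.join(root, "README.md"), "w").close()

    tree = LazyNode(root, is_dir=True)
    loaded_before = tree.loaded           # nothing has been read yet
    children = tree.expand()              # reads only the top level
    names = [os.path.basename(c.path) for c in children]
    pkg_loaded = children[0].loaded       # "pkg" itself is still unexpanded

print(loaded_before, names, pkg_loaded)
```

Because each directory is scanned at most once and only on demand, the cost of opening the selector does not depend on how deep the project is.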

3. Smart Code Compression (--compress)

For large codebases, the --compress flag is essential. It analyzes supported code files and generates a high-level summary instead of including the full code, drastically reducing the final token count.

ctp . --compress

Supported Languages: Python, JavaScript, TypeScript, Java, C, C++, and Rust. Other files (like README.md) are included in full.

Example Compressed Output for a Python File:

# File: codetoprompt/core.py
# Language: python

## Imports:
- import platform
- from pathlib import Path
- ...

## Classes:
### class CodeToPrompt:
    """Convert code files to prompt format."""
    def __init__(self, root_dir, ...): ...
    def generate_prompt(self, progress): ...
    def analyse(self, progress, top_n): ...
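
To give a feel for how a structural summary like the one above can be derived, here is a small sketch that uses Python's built-in ast module instead of tree-sitter. It handles Python source only and is not the project's compressor; summarize_python and the SAMPLE text are illustrative assumptions.

```python
import ast


def summarize_python(source: str) -> str:
    """Reduce Python source to imports plus class and function signatures."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            lines.append(f"- {ast.unparse(node)}")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    args = ", ".join(a.arg for a in item.args.args)
                    lines.append(f"    def {item.name}({args}): ...")
        elif isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args}): ...")
    return "\n".join(lines)


SAMPLE = '''
import platform
from pathlib import Path

class CodeToPrompt:
    """Convert code files to prompt format."""
    def generate_prompt(self, progress):
        return "..."
'''

summary = summarize_python(SAMPLE)
print(summary)
```

Function bodies are dropped entirely, so the summary grows with the number of definitions rather than with the amount of code inside them.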

Automatic Data File Handling: To further manage token count, codetoprompt automatically detects common data files (like .csv, .json) and includes only the first 5 lines.
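
Sampling the head of a data file can be done without reading the whole file. The snippet below is a minimal sketch of that idea, assuming a plain line-based cut; sample_lines is a hypothetical helper, and the tool's handling of formats such as multi-line JSON may differ.

```python
import io
from itertools import islice


def sample_lines(stream, limit: int = 5) -> str:
    """Return at most `limit` lines from a text stream without consuming the rest."""
    return "".join(islice(stream, limit))


# A 101-line CSV held in memory stands in for a large file on disk.
csv_text = "id,name\n" + "".join(f"{i},row{i}\n" for i in range(100))
preview = sample_lines(io.StringIO(csv_text))
print(preview)
```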

4. Adaptable Output Formats

Tailor the output for different LLMs or use cases using format flags.

| Format | Flag | Description |
|---|---|---|
| Default | (none) | Clean, human-readable format with file paths and fenced code blocks. |
| Markdown | --markdown or -m | A single Markdown document, great for viewing or sharing. |
| Claude XML | --cxml or -c | Wraps each file in <document> tags, a format Claude models handle exceptionally well. |

Example CXML Output (-c):

<documents>
  <document index="1">
    <source>main.py</source>
    <document_content>
def main():
    print("Hello, World!")
    </document_content>
  </document>
</documents>
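
The layout above is simple enough to reproduce by hand. The sketch below builds the same structure from a path-to-content mapping; to_cxml is a hypothetical helper, and inserting file content verbatim (without XML escaping) is an assumption based on the example, not a statement about how the tool behaves.

```python
def to_cxml(files: dict) -> str:
    """Wrap each file in <document> tags, numbering documents from 1."""
    parts = ["<documents>"]
    for index, (path, content) in enumerate(files.items(), start=1):
        parts.append(f'  <document index="{index}">')
        parts.append(f"    <source>{path}</source>")
        parts.append("    <document_content>")
        parts.append(content.rstrip("\n"))  # content is kept as-is, unindented
        parts.append("    </document_content>")
        parts.append("  </document>")
    parts.append("</documents>")
    return "\n".join(parts)


cxml = to_cxml({"main.py": 'def main():\n    print("Hello, World!")\n'})
print(cxml)
```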

5. In-Depth Project Analysis (analyse)

Before generating a prompt, get a high-level overview of your local project's composition and token count. This helps you decide which filters or compression strategies to apply.

ctp analyse .

Example Analysis:

╭────────────────── Codebase Analysis ─────────────────╮
│ Configuration for this run:                          │
│ Root Directory: .                                    │
│ Include Patterns: ['*']                              │
│ Exclude Patterns: []                                 │
│ Respect .gitignore: True                             │
╰──────────────────────────────────────────────────────╯
╭─ Overall Project Summary ─╮
│ Total Files: 47           │
│ Total Lines: 6,033        │
│ Total Tokens: 49,834      │
╰───────────────────────────╯
             Analysis by File Type (Top 10)
┏━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Extension ┃ Files ┃ Tokens ┃ Lines ┃ Avg Tokens/File ┃
┡━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ .py       │    32 │ 37,901 │ 4,650 │           1,184 │
│ .md       │     3 │  5,069 │   544 │           1,690 │
│ .<no_ext> │     3 │  4,807 │   559 │           1,602 │
│ .toml     │     1 │    827 │   117 │             827 │
│ .txt      │     4 │    582 │    68 │             146 │
│ .yml      │     1 │    361 │    56 │             361 │
│ .yaml     │     1 │    229 │    30 │             229 │
│ .ini      │     1 │     45 │     6 │              45 │
│ .in       │     1 │     13 │     3 │              13 │
└───────────┴───────┴────────┴───────┴─────────────────┘
               Largest Files by Tokens (Top 10)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ File Path                                 ┃ Tokens ┃ Lines ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ codetoprompt/core.py                      │  5,037 │   538 │
│ codetoprompt.egg-info/PKG-INFO            │  4,453 │   493 │
│ README.md                                 │  2,910 │   285 │
│ codetoprompt/compressor/analysers/cpp.py  │  2,276 │   272 │
│ codetoprompt/compressor/analysers/rust.py │  2,214 │   271 │
│ codetoprompt/interactive.py               │  2,125 │   270 │
│ codetoprompt/compressor/analysers/java.py │  2,090 │   245 │
│ codetoprompt/cli.py                       │  1,877 │   207 │
│ tests/test_core.py                        │  1,743 │   179 │
│ CHANGELOG.md                              │  1,701 │   183 │
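
A per-extension breakdown like the first table can be approximated by grouping files on their suffix. This is a hedged sketch only: analyse_by_extension is a hypothetical function, and counting whitespace-separated words is a crude stand-in for a real tokenizer, so its token numbers will not match the tool's.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def analyse_by_extension(files: dict) -> dict:
    """Aggregate file, line, and rough token counts per file extension."""
    stats = defaultdict(lambda: {"files": 0, "lines": 0, "tokens": 0})
    for path, text in files.items():
        ext = PurePosixPath(path).suffix or ".<no_ext>"
        entry = stats[ext]
        entry["files"] += 1
        entry["lines"] += len(text.splitlines())
        entry["tokens"] += len(text.split())  # whitespace proxy, not a real tokenizer
    return dict(stats)


report = analyse_by_extension(
    {
        "a.py": "x = 1\ny = 2\n",
        "pkg/b.py": "print(x)\n",
        "LICENSE": "MIT License\n",
    }
)
for ext, row in sorted(report.items()):
    print(ext, row)
```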

๐ŸŽ›๏ธ Command-Line Reference

Here is the full list of options for the main codetoprompt command.

| Option | Alias | Description | Scope |
|---|---|---|---|
| --output <file> | | Save the prompt to a file instead of the clipboard. | All |
| --markdown | -m | Format output as a single Markdown document. | All |
| --cxml | -c | Format output using Claude-friendly XML tags. | All |
| --max-tokens <num> | | Warn if token count exceeds this limit. Does not truncate. | All |
| --include <pats> | | Comma-separated glob patterns for files to include (e.g., "*.py,*.js"). | Local |
| --exclude <pats> | | Comma-separated glob patterns for files to exclude (e.g., "*.pyc,dist/*"). | Local |
| --interactive | -i | Launch an interactive TUI to select files. | Local |
| --compress | | Use smart code compression to summarize files. | Local |
| --show-line-numbers | | Prepend line numbers to code. | Local |
| --respect-gitignore | | Respect .gitignore rules (default). Use --no-respect-gitignore to disable. | Local |
| --tree-depth <num> | | Set the maximum depth for the project structure tree. | Local |
| --version | -v | Display the installed version number. | N/A |
| --help | -h | Show the help message and exit. | N/A |
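
How comma-separated include and exclude globs interact can be illustrated with the standard fnmatch module. The sketch assumes that exclusion wins over inclusion and that * may cross directory separators (fnmatch semantics); the tool's exact matching rules are not documented here, and parse_patterns and is_selected are hypothetical helpers.

```python
from fnmatch import fnmatchcase


def parse_patterns(raw: str) -> list:
    """Split a comma-separated option value into clean glob patterns."""
    return [p.strip() for p in raw.split(",") if p.strip()]


def is_selected(path: str, include: list, exclude: list) -> bool:
    """Keep a path if it matches an include glob and no exclude glob."""
    if any(fnmatchcase(path, pat) for pat in exclude):
        return False
    return any(fnmatchcase(path, pat) for pat in include)


include = parse_patterns("*.py, *.js")
exclude = parse_patterns("*.pyc,dist/*")

for candidate in ("src/app.py", "dist/bundle.js", "README.md"):
    print(candidate, is_selected(candidate, include, exclude))
```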

Subcommands

  • Analyse: codetoprompt analyse <PATH> [--include ...] [--exclude ...]
  • Snapshot: codetoprompt snapshot <PATH> --output <snapshot.json> [--include ...] [--exclude ...] [--respect-gitignore|--no-respect-gitignore]
  • Diff: codetoprompt diff <PATH> --snapshot <snapshot.json> [--use-snapshot-filters] [--include ...] [--exclude ...] [--output <file>]

โš™๏ธ Configuration

Set your preferred defaults once using the config command. Settings are saved in ~/.config/codetoprompt/config.toml.

  • Interactive Wizard: ctp config
  • Show Current Config: ctp config --show
  • Reset to Defaults: ctp config --reset

Additional snapshot-related settings:

  • Snapshot Max Bytes: snapshot_max_bytes (default: 3 MB). If a text file exceeds this size, its content is not inlined into the snapshot.
  • Snapshot Max Lines: snapshot_max_lines (default: 20,000). If a text file exceeds this line count, its content is not inlined into the snapshot.
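
The two limits can be read as a simple guard applied to each text file before its content is stored. The sketch below is an interpretation of that rule, not the project's code: should_inline is a hypothetical name, and treating 3 MB as 3 * 1024 * 1024 bytes is an assumption.

```python
SNAPSHOT_MAX_BYTES = 3 * 1024 * 1024  # "3 MB" default; exact byte value is assumed
SNAPSHOT_MAX_LINES = 20_000


def should_inline(
    text: str,
    max_bytes: int = SNAPSHOT_MAX_BYTES,
    max_lines: int = SNAPSHOT_MAX_LINES,
) -> bool:
    """Return True if the file content is small enough to store in the snapshot."""
    size = len(text.encode("utf-8"))
    line_count = len(text.splitlines())
    return size <= max_bytes and line_count <= max_lines


print(should_inline("a\n" * 20_000))   # at the line limit, still inlined
print(should_inline("a\n" * 20_001))   # one line over, skipped
```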

The snapshot subcommand always requires --output. The diff subcommand copies the full diff to the clipboard by default and prints only a summary to the terminal; pass --output <file> to write the diff to a file instead.


๐Ÿ Python API

Use codetoprompt programmatically in your own Python scripts for maximum flexibility.

from codetoprompt import CodeToPrompt
# conceptual imports for a full use-case
# from some_llm_library import LlmClient 

# 1. Process a local directory with compression and XML format
ctp_local = CodeToPrompt(
    target="path/to/your/project",
    compress=True,
    output_format="cxml",
    exclude_patterns=["tests/*", "docs/*"]
)
prompt = ctp_local.generate_prompt()
analysis = ctp_local.analyse()

print(f"Project Analysis: {analysis['overall']}")
print(f"Generated a prompt with {ctp_local.get_token_count()} tokens.")

# 2. Conceptually, you'd then use this with an LLM client
# client = LlmClient(api_key="...")
# response = client.completions.create(
#     model="claude-3-opus-20240229", # CXML is great for Claude
#     messages=[
#         {"role": "user", "content": f"Here is a codebase:\n{prompt}\nPlease explain the main purpose of the `core.py` file."},
#     ]
# )
# print(response)

# 3. Process a remote URL
ctp_remote = CodeToPrompt(target="https://github.com/yash9439/codetoprompt")
remote_prompt = ctp_remote.generate_prompt()
print(f"Generated a prompt from GitHub with {ctp_remote.get_token_count()} tokens.")

๐Ÿค Contributing

We welcome contributions! Please see CONTRIBUTING.md for development setup and guidelines. Feel free to open a PR or issue to get started.

📄 License

This project is licensed under the MIT License. See the LICENSE file for full details.
