feat(ui): add llms.txt generation for npm packages #1382
base: main
Add AgentFile and LlmsTxtResult interfaces for llms.txt generation and export from the shared types barrel.
Add discoverAgentFiles, fetchAgentFiles, generateLlmsTxt, and handleLlmsTxt orchestrator for llms.txt generation from npm packages.
Serve llms.txt at /api/registry/llms-txt/[...pkg] following existing registry API patterns with cached event handler and SWR.
Cover discoverAgentFiles, fetchAgentFiles, and generateLlmsTxt including root files, directory scanning, graceful failures, scoped packages, and full/minimal output generation.
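The discovery step named in these commits can be pictured roughly as follows. This is only a sketch under stated assumptions: the `FileNode` shape, the `ROOT_AGENT_FILES` and `AGENT_DIRECTORIES` lists, and the function body are hypothetical and not taken from the PR. Only the function name `discoverAgentFiles` and the example file names CLAUDE.md and .cursorrules appear in this thread.

```typescript
// Hypothetical file-tree node; the real shape used by the PR is not shown in this thread.
interface FileNode {
  name: string
  type: 'file' | 'directory'
  children?: FileNode[]
}

// Assumed lists, for illustration only.
const ROOT_AGENT_FILES = ['CLAUDE.md', '.cursorrules']
const AGENT_DIRECTORIES = ['.cursor/rules']

// Returns root-level agent files plus every file found under an agent directory.
function discoverAgentFiles(files: FileNode[]): string[] {
  const found: string[] = []

  const walk = (nodes: FileNode[], prefix: string, insideAgentDir: boolean): void => {
    for (const node of nodes) {
      const path = prefix ? `${prefix}/${node.name}` : node.name
      if (node.type === 'directory') {
        const isAgentDir = insideAgentDir || AGENT_DIRECTORIES.indexOf(path) !== -1
        // Only descend where an agent directory can still be reached.
        const mayContainAgentDir = AGENT_DIRECTORIES.some(dir => dir.startsWith(`${path}/`))
        if (isAgentDir || mayContainAgentDir) {
          walk(node.children ?? [], path, isAgentDir)
        }
      } else if (insideAgentDir || (prefix === '' && ROOT_AGENT_FILES.indexOf(node.name) !== -1)) {
        found.push(path)
      }
    }
  }

  walk(files, '', false)
  return found
}
```

Keeping discovery as a pure function over an already-fetched file tree makes it easy to unit test without network access, which matches the kind of coverage the test commit describes.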
Codecov Report: ✅ All modified and coverable lines are covered by tests.
📝 Walkthrough
Adds end-to-end LLM documentation support: new shared types (AgentFile, LlmsTxtResult) and a re-export in shared/types/index.ts; server utilities to discover and fetch agent files, assemble llms.txt/llms_full.txt and package README content; a new middleware (server/middleware/llm-docs.ts) to serve package, org and root llms routes and raw README .md; vitest test alias for …
Actionable comments posted: 2
Use Nitro server routes at /package/.../llms.txt instead of an API route with middleware rewriting. Single handler re-exported across four route files for unscoped, scoped, and versioned URL patterns.
…txt content Add createPackageLlmsTxtHandler factory for DRY route creation. handleLlmsTxt now accepts includeAgentFiles option to control whether agent instruction files are included (llms_full.txt) or omitted (llms.txt). Add handleOrgLlmsTxt for org-level package listings and generateRootLlmsTxt for the root /llms.txt discovery page. Simplify route handlers to single-line factory calls.
Add server middleware to handle llms.txt routes that Nitro's radix3 file-based router cannot resolve (parameterized intermediate segments don't match literal children). Handles versioned package paths, org-level package listings, and root /llms.txt discovery page. Remove broken versioned route files and add llms_full.txt routes.
Extend canonical redirect regexes with optional /llms.txt and /llms_full.txt suffix capture groups so shorthand URLs like /nuxt/llms.txt redirect to /package/nuxt/llms.txt. Add explicit /llms.txt root path skip to prevent it matching as a package name.
Add ISR rules for llms_full.txt and root /llms.txt routes in nuxt.config.ts. Add #server alias to vitest config for resolving server utility imports in unit tests.
Test route pattern inclusion, example links, base URL substitution, and trailing newline for the root /llms.txt discovery page output.
Vercel ISR glob rules (/package/**/llms.txt) create catch-all serverless functions that intercept requests before Nitro's file-based routes can resolve them, breaking scoped packages and versioned paths. Move all llms.txt/llms_full.txt handling into the middleware, remove ISR route rules, and delete file-based route files.
Fix strict TypeScript errors: add fallback for split()[0] possibly undefined, narrow regex match group types, use non-null assertions for Record lookups after in-check, and use Nuxt's auto-generated Packument type instead of @npm/types import.
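The three strict-mode patterns mentioned in the last commit message can be illustrated with a short sketch. The helper names (`firstSegment`, `parseNameAtVersion`, `lookupVersionTime`) and the data they operate on are hypothetical and chosen only for illustration; they are not functions from the PR.

```typescript
// Fallback for split()[0]: indexed access may be typed as `string | undefined`
// (for example under noUncheckedIndexedAccess), so supply a default.
function firstSegment(path: string): string {
  return path.split('/')[0] ?? ''
}

// Narrow regex capture groups before use instead of assuming they are strings.
function parseNameAtVersion(input: string): { name: string; version: string } | null {
  const match = input.match(/^(.+)@(\d+\.\d+\.\d+)$/)
  const name = match?.[1]
  const version = match?.[2]
  if (!name || !version) return null
  return { name, version }
}

// Record lookup after an `in` check: the check establishes that the key exists,
// so the non-null assertion only satisfies the compiler.
function lookupVersionTime(times: Record<string, string>, version: string): string | null {
  if (version in times) {
    return times[version]!
  }
  return null
}
```

The non-null assertion is the only one of the three that relies on the programmer rather than the compiler, so it is worth keeping it directly next to the `in` check that justifies it.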
Actionable comments posted: 1
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Versioned .md paths (e.g. /package/nuxt/v/3.16.2.md) conflict with Vercel's ISR route rules which match /package/:name/v/:version and intercept the request before middleware can handle it. Keep .md for latest-only (unscoped and scoped).
@danielroe when merging please add @BYK as a co-author from #151
Actionable comments posted: 3
🧹 Nitpick comments (1)
server/middleware/llm-docs.ts (1)
35-135: Split the middleware into smaller helpers.
This handler covers multiple routes and behaviours in one block and is well over the “~50 lines” guideline. Consider extracting .md handling, root handling, org handling, and package parsing into dedicated functions.
As per coding guidelines: Keep functions focused and manageable (generally under 50 lines).
set -euo pipefail

BASE="${1:?Usage: $0 <base-url>}"
BASE="${BASE%/}" # strip trailing slash

PASS=0
FAIL=0

check() {
  local label="$1"
  local url="$2"
  local expect_status="${3:-200}"

  status=$(curl -s -o /dev/null -w "%{http_code}" -L "$url")

  if [ "$status" = "$expect_status" ]; then
    echo " PASS GET $url $status $label"
    PASS=$((PASS + 1))
  else
    echo " FAIL GET $url $status $label (expected $expect_status)"
    FAIL=$((FAIL + 1))
  fi
Avoid aborting the whole script on a single curl failure.
With set -e, any non-zero curl exit (DNS, timeout, TLS) stops the script before FAIL is counted, hiding later failures. Capture the exit and keep running.
Proposed fix
  check() {
    local label="$1"
    local url="$2"
    local expect_status="${3:-200}"
-   status=$(curl -s -o /dev/null -w "%{http_code}" -L "$url")
+   local status
+   status=$(curl -s -o /dev/null -w "%{http_code}" -L "$url" || true)
+ // /llms.txt at root is handled by the llm-docs middleware
+ if (path === '/llms.txt') {
+   return
+ }
+
  // /@org/pkg or /pkg → /package/org/pkg or /package/pkg
- let pkgMatch = path.match(/^\/(?:(?<org>@[^/]+)\/)?(?<name>[^/@]+)$/)
+ // Also handles trailing /llms.txt or /llms_full.txt suffixes
+ let pkgMatch = path.match(
+   /^\/(?:(?<org>@[^/]+)\/)?(?<name>[^/@]+?)(?<suffix>\.md|\/(?:llms\.txt|llms_full\.txt))?$/,
+ )
  if (pkgMatch?.groups) {
    const args = [pkgMatch.groups.org, pkgMatch.groups.name].filter(Boolean).join('/')
+   const suffix = pkgMatch.groups.suffix ?? ''
    setHeader(event, 'cache-control', cacheControl)
-   return sendRedirect(event, `/package/${args}` + (query ? '?' + query : ''), 301)
+   return sendRedirect(event, `/package/${args}${suffix}` + (query ? '?' + query : ''), 301)
  }
Prevent root /llms_full.txt from being mis-redirected.
/llms_full.txt currently matches the package redirect regex and becomes /package/llms_full.txt, which is not a valid route. Add the same early-return as /llms.txt.
Proposed fix
- if (path === '/llms.txt') {
+ if (path === '/llms.txt' || path === '/llms_full.txt') {
    return
  }
const packageData = await fetchNpmPackage(packageName)
const resolvedVersion = requestedVersion ?? packageData['dist-tags']?.latest

if (!resolvedVersion) {
  throw createError({ statusCode: 404, message: 'Could not resolve package version.' })
}

// Extract README from packument (sync)
const readmeFromPackument = getReadmeFromPackument(packageData, requestedVersion)

let agentFiles: AgentFile[] = []
let cdnReadme: string | null = null

if (includeAgentFiles) {
  // Full mode: fetch file tree for agent discovery + README fallback in parallel
  const [fileTreeData, readme] = await Promise.all([
    fetchFileTree(packageName, resolvedVersion),
    readmeFromPackument ? null : fetchReadmeFromCdn(packageName, resolvedVersion),
  ])
  cdnReadme = readme
  const agentFilePaths = discoverAgentFiles(fileTreeData.files)
  agentFiles = await fetchAgentFiles(packageName, resolvedVersion, agentFilePaths)
} else if (!readmeFromPackument) {
  // Standard mode: only fetch README from CDN if packument lacks it
  cdnReadme = await fetchReadmeFromCdn(packageName, resolvedVersion)
}

const readme = readmeFromPackument ?? cdnReadme ?? undefined
Validate that a requested version actually exists.
If a non-existent version is requested, the code still generates output using that version string and package metadata, which is misleading and can trigger unnecessary CDN fetches. Guard against unknown versions before proceeding.
Proposed fix
  const packageData = await fetchNpmPackage(packageName)
  const resolvedVersion = requestedVersion ?? packageData['dist-tags']?.latest
  if (!resolvedVersion) {
    throw createError({ statusCode: 404, message: 'Could not resolve package version.' })
  }
+ if (requestedVersion && !packageData.versions?.[requestedVersion]) {
+   throw createError({
+     statusCode: 404,
+     message: `Version ${requestedVersion} not found for ${packageName}.`,
+   })
+ }
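The same guard can be sketched as a standalone function that is testable without the server runtime. Everything here is an illustrative assumption: `MinimalPackument` is a cut-down stand-in for the real packument type, `NotFoundError` stands in for the `createError({ statusCode: 404 })` call, and `resolveVersion` is a hypothetical helper name.

```typescript
// Minimal stand-in for the packument fields used here (hypothetical shape).
interface MinimalPackument {
  'dist-tags'?: { latest?: string }
  versions?: Record<string, unknown>
}

// Hypothetical error type standing in for createError({ statusCode: 404, ... }).
class NotFoundError extends Error {
  readonly statusCode = 404
}

// Resolves the version to use, rejecting explicitly requested versions that do not exist.
function resolveVersion(
  pkg: MinimalPackument,
  packageName: string,
  requestedVersion?: string,
): string {
  const resolved = requestedVersion ?? pkg['dist-tags']?.latest
  if (!resolved) {
    throw new NotFoundError('Could not resolve package version.')
  }

  const versions = pkg.versions ?? {}
  // Own-property check so inherited keys such as "constructor" never count as versions.
  if (requestedVersion && !Object.prototype.hasOwnProperty.call(versions, requestedVersion)) {
    throw new NotFoundError(`Version ${requestedVersion} not found for ${packageName}.`)
  }
  return resolved
}
```

Failing before any file-tree or CDN request is made is the point of the review comment: an unknown version never reaches the fetch helpers.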
Summary
Adds llms.txt and llms_full.txt support across all package URL patterns.

Supported llms.txt routes
- /llms.txt
- /package/nuxt/llms.txt
- /package/nuxt/v/3.15.0/llms.txt
- /package/@deepgram/sdk/llms.txt
- /package/@deepgram/sdk/v/4.0.0/llms.txt
- /package/@deepgram/llms.txt
- /package/nuxt/llms_full.txt
- /nuxt/llms.txt → /package/nuxt/llms.txt
- /nuxt/v/3.15.0/llms.txt → /package/nuxt/v/3.15.0/llms.txt

llms.txt: README + metadata only
llms_full.txt: README + metadata + agent instruction files (CLAUDE.md, .cursorrules, etc.)

Supported .md routes
- /package/nuxt.md
- /package/@deepgram/sdk.md
- /nuxt.md → /package/nuxt.md

.md: README only

Test plan
- /llms.txt returns 200 with text/markdown content type
- llms_full.txt includes agent instruction files, llms.txt does not
- Shorthand URLs (/nuxt/llms.txt) redirect 301 to canonical paths