4 changes: 4 additions & 0 deletions pkgs/website/astro.config.mjs
@@ -336,6 +336,10 @@ export default defineConfig({
label: 'Troubleshooting connections',
link: '/deploy/troubleshooting-connections/',
},
{
label: 'Troubleshooting stalled tasks',
link: '/deploy/troubleshooting-stalled-tasks/',
},
{ label: 'Prune records', link: '/deploy/prune-records/' },
{
label: 'Tune deployed flows',
60 changes: 42 additions & 18 deletions pkgs/website/src/content/docs/build/retrying-steps.mdx
@@ -5,12 +5,13 @@ sidebar:
order: 25
---

import { Aside, CardGrid, LinkCard } from "@astrojs/starlight/components";
import { Aside, CardGrid, LinkCard } from '@astrojs/starlight/components';

Configure retry behavior based on step reliability characteristics. Set conservative flow-level defaults and override per-step as needed.

<Aside type="tip">
For scheduling delays between steps, see [Delaying Steps](/build/delaying-steps/).
For scheduling delays between steps, see [Delaying
Steps](/build/delaying-steps/).
</Aside>

For detailed information about each configuration option, see the [Step Execution Options](/reference/configuration/step-execution/) reference.
@@ -22,12 +23,14 @@ Not all failures should retry. Understanding the difference helps you configure
### Transient Failures

Temporary problems that might succeed on retry:

- Network timeouts
- Rate limiting (429 responses)
- Temporary service unavailability (503 responses)
- Database connection issues

Configure with retries:

```typescript
.step({
slug: 'fetchExternalData',
@@ -39,12 +42,14 @@ Configure with retries:
### Permanent Failures

Problems that will never succeed on retry:

- Invalid input format (malformed email, negative numbers)
- Missing required fields
- Business rule violations
- Schema validation errors

Configure without retries:

```typescript
.step({
slug: 'validInput',
@@ -56,11 +61,15 @@ Configure without retries:
```

<Aside type="note">
`maxAttempts: 1` means "run once, do not retry". If the step fails, it fails immediately without retry attempts.
`maxAttempts: 1` means "run once, do not retry". If the step fails, it fails
immediately without retry attempts.
</Aside>

<Aside type="caution" title="Current Limitation">
pgflow does not distinguish between transient and permanent failures automatically. All exceptions trigger retry logic based on `maxAttempts`. Use `maxAttempts: 1` for steps that perform validation or other operations that should fail fast.
pgflow does not distinguish between transient and permanent failures
automatically. All exceptions trigger retry logic based on `maxAttempts`. Use
`maxAttempts: 1` for steps that perform validation or other operations that
should fail fast.
</Aside>
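
The limitation above can be illustrated with a small simulation. This is a minimal sketch, not pgflow's implementation: `simulateRetries`, `RetryOptions`, and `Outcome` are hypothetical names, and the doubling delay schedule is an assumption made for illustration. It only shows that a loop which retries on every thrown error cannot tell a validation error from a timeout.

```typescript
// Hypothetical sketch: every thrown error is retried until maxAttempts is
// exhausted, regardless of whether the failure is transient or permanent.
interface RetryOptions {
  maxAttempts: number; // total attempts, so 1 means "run once, do not retry"
  baseDelay: number; // seconds; the doubling schedule below is an assumption
}

interface Outcome<T> {
  status: 'completed' | 'failed';
  attempts: number;
  delays: number[]; // delays that would be waited between attempts
  result?: T;
  error?: unknown;
}

function simulateRetries<T>(
  handler: (attempt: number) => T,
  options: RetryOptions
): Outcome<T> {
  const delays: number[] = [];
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      const result = handler(attempt);
      return { status: 'completed', attempts: attempt, delays, result };
    } catch (error) {
      // No inspection of the error: all failures are treated the same way.
      lastError = error;
      if (attempt < options.maxAttempts) {
        delays.push(options.baseDelay * 2 ** (attempt - 1));
      }
    }
  }
  return {
    status: 'failed',
    attempts: options.maxAttempts,
    delays,
    error: lastError,
  };
}

// A permanent failure: retrying can never help.
const validate = (): never => {
  throw new Error('Invalid email');
};

// With retries enabled, the validation error is attempted three times.
const wasted = simulateRetries(validate, { maxAttempts: 3, baseDelay: 1 });

// With maxAttempts: 1, the same error fails on the first attempt.
const failFast = simulateRetries(validate, { maxAttempts: 1, baseDelay: 1 });
```

In this sketch `wasted.attempts` is 3 and `failFast.attempts` is 1, which is why validation-style steps are better configured with `maxAttempts: 1`.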

For detailed guidance on validation patterns, see [Validation Steps](/build/validation-steps/).
@@ -78,25 +87,35 @@ When different steps have different reliability requirements:
```typescript
new Flow({
slug: 'dataPipeline',
maxAttempts: 3, // Sensible defaults
maxAttempts: 3, // Sensible defaults
baseDelay: 1,
})
.step({
slug: 'validateInput',
maxAttempts: 1, // No retries - validation should not fail
}, validateHandler)
.step({
slug: 'fetchExternal',
maxAttempts: 5, // External API might be flaky
baseDelay: 10, // Longer delays for external service
}, fetchHandler)
.step({
slug: 'saveResults',
// Use flow defaults
}, saveHandler)
.step(
{
slug: 'validateInput',
maxAttempts: 1, // No retries - validation should not fail
},
validateHandler
)
.step(
{
slug: 'fetchExternal',
maxAttempts: 5, // External API might be flaky
baseDelay: 10, // Longer delays for external service
},
fetchHandler
)
.step(
{
slug: 'saveResults',
// Use flow defaults
},
saveHandler
);
```

**Why this approach:**

- Set reasonable flow-level defaults
- Override only where needed
- Validation steps need no retries (fail immediately on bad input)
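
The override behavior described above can be sketched as a simple merge of flow-level defaults with per-step options. This is an illustration only: `StepRetryConfig` and `resolveStepOptions` are hypothetical names, and how pgflow actually resolves options is not shown in this document.

```typescript
// Hypothetical sketch: a step uses its own value when one is set and
// falls back to the flow-level default otherwise.
interface StepRetryConfig {
  maxAttempts: number;
  baseDelay: number;
}

function resolveStepOptions(
  flowDefaults: StepRetryConfig,
  stepOverrides: Partial<StepRetryConfig> = {}
): StepRetryConfig {
  return {
    maxAttempts: stepOverrides.maxAttempts ?? flowDefaults.maxAttempts,
    baseDelay: stepOverrides.baseDelay ?? flowDefaults.baseDelay,
  };
}

// Values taken from the dataPipeline example above.
const flowDefaults: StepRetryConfig = { maxAttempts: 3, baseDelay: 1 };

const validateInput = resolveStepOptions(flowDefaults, { maxAttempts: 1 });
const fetchExternal = resolveStepOptions(flowDefaults, {
  maxAttempts: 5,
  baseDelay: 10,
});
const saveResults = resolveStepOptions(flowDefaults); // flow defaults only
```

Under these assumptions `validateInput` resolves to one attempt with the flow's `baseDelay`, `fetchExternal` uses both of its overrides, and `saveResults` inherits the flow defaults unchanged.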
@@ -131,4 +150,9 @@ new Flow({
href="/deploy/tune-flow-config/"
description="Adjust configuration for production flows without redeploying"
/>
<LinkCard
title="Troubleshooting Stalled Tasks"
href="/deploy/troubleshooting-stalled-tasks/"
description="Recover tasks stuck when workers crash mid-processing"
/>
</CardGrid>
22 changes: 14 additions & 8 deletions pkgs/website/src/content/docs/concepts/worker-lifecycle.mdx
@@ -5,7 +5,7 @@ sidebar:
order: 30
---

import { Aside, CardGrid, LinkCard } from "@astrojs/starlight/components";
import { Aside, CardGrid, LinkCard } from '@astrojs/starlight/components';

pgflow workers are designed to be resilient. They poll for tasks, send heartbeats, and automatically restart when they stop.

@@ -55,6 +55,7 @@ restart -> startup: "starts new\nworker"
## Why Workers Stop

Edge Functions have execution time limits:

- Free tier: 150 seconds
- Paid tier: 400 seconds

@@ -97,13 +98,13 @@ You don't need to manually curl or restart anything after the initial setup.

## Local vs Production Behavior

| Aspect | Local | Production |
|---------------------|--------------------------------|-----------------------------------------|
| Cron interval | 1 second | 1 second |
| HTTP request | Always for enabled functions | Only if not enough active workers |
| Debounce | Bypassed | Applied (prevents too frequent pings) |
| Active worker check | Bypassed | Required (enabled, non-deprecated) |
| Detection | Automatic (is_local) | Automatic |
| Aspect | Local | Production |
| ------------------- | ---------------------------- | ------------------------------------- |
| Cron interval | 1 second | 1 second |
| HTTP request | Always for enabled functions | Only if not enough active workers |
| Debounce | Bypassed | Applied (prevents too frequent pings) |
| Active worker check | Bypassed | Required (enabled, non-deprecated) |
| Detection | Automatic (is_local) | Automatic |
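
The table can be read as a decision rule for whether the cron job sends an HTTP request to a function. The sketch below is one possible reading under stated assumptions: `PingDecisionInput` and `shouldSendHttpRequest` are hypothetical names, the field names are invented for illustration, and treating a disabled function as never pinged in production is an assumption rather than something the table states.

```typescript
// Hypothetical sketch of the Local vs Production rules in the table above.
interface PingDecisionInput {
  isLocal: boolean; // detected automatically per the table
  functionEnabled: boolean;
  activeWorkers: number; // assumed to count enabled, non-deprecated workers
  requiredWorkers: number;
  secondsSinceLastPing: number;
  debounceSeconds: number;
}

function shouldSendHttpRequest(input: PingDecisionInput): boolean {
  // Assumption: disabled functions are never pinged in either environment.
  if (!input.functionEnabled) return false;

  // Local: debounce and the active worker check are both bypassed.
  if (input.isLocal) return true;

  // Production: debounce prevents too-frequent pings.
  if (input.secondsSinceLastPing < input.debounceSeconds) return false;

  // Production: ping only when there are not enough active workers.
  return input.activeWorkers < input.requiredWorkers;
}

const base: PingDecisionInput = {
  isLocal: false,
  functionEnabled: true,
  activeWorkers: 1,
  requiredWorkers: 1,
  secondsSinceLastPing: 30,
  debounceSeconds: 5,
};

const localAlwaysPings = shouldSendHttpRequest({ ...base, isLocal: true });
const prodEnoughWorkers = shouldSendHttpRequest(base);
const prodMissingWorker = shouldSendHttpRequest({ ...base, activeWorkers: 0 });
const prodDebounced = shouldSendHttpRequest({
  ...base,
  activeWorkers: 0,
  secondsSinceLastPing: 1,
});
```

Here only `localAlwaysPings` and `prodMissingWorker` are `true`: the local environment always pings an enabled function, while production pings only when a worker is missing and the debounce window has passed.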

## Related

@@ -123,4 +124,9 @@ You don't need to manually curl or restart anything after the initial setup.
href="/build/local-development/"
description="Tips for local development"
/>
<LinkCard
title="Troubleshooting Stalled Tasks"
href="/deploy/troubleshooting-stalled-tasks/"
description="Recover tasks left behind when workers terminate unexpectedly"
/>
</CardGrid>
15 changes: 15 additions & 0 deletions pkgs/website/src/content/docs/deploy/index.mdx
@@ -52,6 +52,21 @@ Learn how to deploy pgflow to production, monitor workflow execution, and mainta
/>
</CardGrid>

## Troubleshoot

<CardGrid>
<LinkCard
title="Connection issues"
href="/deploy/troubleshooting-connections/"
description="Diagnose and fix common database connection problems"
/>
<LinkCard
title="Stalled tasks"
href="/deploy/troubleshooting-stalled-tasks/"
description="Diagnose and recover tasks stuck in 'started' status"
/>
</CardGrid>

## Maintain

<CardGrid>
44 changes: 39 additions & 5 deletions pkgs/website/src/content/docs/deploy/monitor-execution.mdx
@@ -5,7 +5,14 @@ sidebar:
order: 10
---

import { Aside, Steps, Tabs, CardGrid, LinkCard, FileTree } from "@astrojs/starlight/components";
import {
Aside,
Steps,
Tabs,
CardGrid,
LinkCard,
FileTree,
} from '@astrojs/starlight/components';

This guide explains how to monitor your pgflow flows during and after execution using SQL queries.

@@ -31,6 +38,7 @@ run_id | flow_slug | status | input | output
```

Run statuses include:

- `started`: The run has been created and is executing steps
- `completed`: All steps have completed successfully
- `failed`: One or more steps have failed after max retries
@@ -61,6 +69,7 @@ final_step | created | 2 | 1 | null
```

Step statuses include:

- `created`: The step has been created but may be waiting for dependencies
- `started`: The step has started execution (all dependencies are complete)
- `completed`: The step has completed successfully
@@ -95,6 +104,7 @@ run_id | step_slug | status | attempts_count | message_id | queued_at
```

Active task statuses:

- `queued`: Task is ready to run, waiting for a worker to claim it
- `started`: Task is currently being processed by a worker (with `started_at` timestamp and `worker_id`)

@@ -139,18 +149,21 @@ AND ss.status = 'failed';
To start flows from TypeScript applications and stream real-time progress updates, see [Start Flows from TypeScript Client](/build/starting-flows/typescript-client/).

This page focuses on SQL-based monitoring for debugging and operations.

</Aside>

<Aside type="tip" title="Flow Visualization">
Applications typically create dashboards to visualize flows and their execution status.

pgflow stores all the information needed to build rich visualizations of your flow execution, including:

- Step dependencies
- Execution times
- Retry attempts
- Inputs and outputs

This data is available through SQL queries to the pgflow schema tables.

</Aside>

## View step dependencies
@@ -172,8 +185,29 @@ GROUP BY steps.step_slug;
## Next steps

<CardGrid>
<LinkCard title="Start Flows from TypeScript Client" href="/build/starting-flows/typescript-client/" description="Start flows from TypeScript apps and stream real-time progress updates"/>
<LinkCard title="Organize Flow code" href="/build/organize-flow-code/" description="Learn how to structure your pgflow code for maintainability and reusability"/>
<LinkCard title="Tune deployed flows" href="/deploy/tune-flow-config/" description="Adjust retry behavior and timeouts for production flows"/>
<LinkCard title="Version your Flows" href="/build/version-flows/" description="Learn how to safely update your flows without breaking existing runs"/>
<LinkCard
title="Start Flows from TypeScript Client"
href="/build/starting-flows/typescript-client/"
description="Start flows from TypeScript apps and stream real-time progress updates"
/>
<LinkCard
title="Organize Flow code"
href="/build/organize-flow-code/"
description="Learn how to structure your pgflow code for maintainability and reusability"
/>
<LinkCard
title="Tune deployed flows"
href="/deploy/tune-flow-config/"
description="Adjust retry behavior and timeouts for production flows"
/>
<LinkCard
title="Version your Flows"
href="/build/version-flows/"
description="Learn how to safely update your flows without breaking existing runs"
/>
<LinkCard
title="Troubleshoot stalled tasks"
href="/deploy/troubleshooting-stalled-tasks/"
description="Diagnose and recover tasks stuck in 'started' status"
/>
</CardGrid>