Add Databricks Serverless Compute Support #3392

rohitrsh wants to merge 1 commit into flyteorg:master from rohitrsh:feat/databricks-serverless-support

Conversation

rohitrsh commented Feb 17, 2026

Tracking issue

flyteorg/flyte#6911

Why are the changes needed?

Databricks Serverless Compute offers faster startup times (seconds vs. minutes), automatic scaling, and zero infrastructure management. However, the existing flytekit-spark connector only supports classic compute (clusters). This PR enables teams to use serverless without changing their task code.

Users switch between classic and serverless by changing only the databricks_conf; the task code stays identical:

import flytekit
from flytekit import task
from flytekitplugins.spark import DatabricksV2

# Classic compute
@task(task_config=DatabricksV2(
    databricks_conf={
        "new_cluster": {
            "spark_version": "15.4.x-scala2.12",
            "node_type_id": "m5.xlarge",
            "num_workers": 2,
        },
    },
    databricks_instance="my-workspace.cloud.databricks.com",
))
def classic_task() -> float:
    spark = flytekit.current_context().spark_session
    return spark.range(100).count()

# Serverless compute: same task code, different config
@task(task_config=DatabricksV2(
    databricks_conf={
        "environment_key": "default",
        "environments": [{
            "environment_key": "default",
            "spec": {"client": "1"},
        }],
    },
    databricks_instance="my-workspace.cloud.databricks.com",
    databricks_service_credential_provider="my-s3-credential",
))
def serverless_task() -> float:
    spark = flytekit.current_context().spark_session  # same API
    return spark.range(100).count()

What changes were proposed in this pull request?

Adds first-class support for running Flyte Spark tasks on Databricks Serverless Compute, alongside the existing classic compute (clusters) support.

  • Auto-detect serverless vs. classic based on databricks_conf contents (no new task type needed)
  • Generate correct Databricks Jobs API payload for serverless (multi-task format with environments array)
  • SparkSession available via flytekit.current_context().spark_session for both compute modes
  • AWS credential forwarding via Databricks Service Credentials for S3 access in serverless
  • Notebook task support for both classic and serverless compute
  • Default entrypoint from flytetools (same pattern as classic); no user configuration needed

Files modified

  • flytekitplugins/spark/connector.py: Serverless detection, multi-task job format, env injection, credential forwarding, entrypoint resolution, notebook tasks
  • flytekitplugins/spark/task.py: Serverless SparkSession retrieval in pre_execute(), DatabricksV2 config additions (credential provider, notebook support), docstring updates
  • tests/test_connector.py: 11 new tests for serverless detection, configuration, job spec generation, entrypoint defaults
  • tests/test_spark_task.py: Tests for serverless detection, SparkSession retrieval, credential provider, notebook config

No new files are added to the plugin. The serverless entrypoint (entrypoint_serverless.py) lives in the flytetools repository, following the same pattern as the classic entrypoint.

Technical details

1. Auto-detection of compute mode

New function _is_serverless_config() detects serverless based on databricks_conf keys:

  • existing_cluster_id: Classic (existing cluster)
  • new_cluster: Classic (new cluster)
  • environment_key or environments (no cluster keys): Serverless
  • None of the above: Error
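
The detection logic can be sketched roughly as follows (illustrative only; the actual implementation in connector.py may structure the checks and error handling differently):

def _is_serverless_config(databricks_conf: dict) -> bool:
    # Cluster keys always mean classic compute, even if environment keys are also present.
    if "existing_cluster_id" in databricks_conf or "new_cluster" in databricks_conf:
        return False
    # Serverless is signalled by an environment key or an inline environments list.
    if "environment_key" in databricks_conf or "environments" in databricks_conf:
        return True
    # Neither cluster nor environment keys: the connector reports an error.
    raise ValueError(
        "databricks_conf must contain either cluster settings "
        "(new_cluster / existing_cluster_id) or serverless settings "
        "(environment_key / environments)"
    )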

2. Serverless job spec format

Databricks Serverless requires a different Jobs API payload (multi-task format with tasks array and environments array). New function _configure_serverless() handles the environments array creation and env var injection.
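
For illustration, the serverless payload has roughly the following shape (a sketch based on the Databricks Jobs API multi-task format; the exact fields the connector emits may differ):

# Illustrative shape of a serverless job spec; values are placeholders.
serverless_job_spec = {
    "run_name": "flyte-serverless-example",
    "tasks": [
        {
            "task_key": "flyte-task",
            # Links this task to an entry in the environments array below.
            "environment_key": "default",
            "spark_python_task": {
                "python_file": "flytekitplugins/databricks/entrypoint_serverless.py",
                "parameters": [],  # Flyte task command, elided here
            },
            # Note: no new_cluster / existing_cluster_id for serverless.
        }
    ],
    "environments": [
        {"environment_key": "default", "spec": {"client": "1"}},
    ],
}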

3. Entrypoint resolution

Both classic and serverless default to the same flytetools repository. Only the python_file path differs:

default_classic_python_file = "flytekitplugins/databricks/entrypoint.py"
default_serverless_python_file = "flytekitplugins/databricks/entrypoint_serverless.py"

Users can override both git_source and python_file via databricks_conf for custom entrypoints.
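
For example, a custom entrypoint might be configured roughly like this (a sketch using the imports from the earlier example; the git_source and python_file key shapes mirror the Databricks Jobs API and are assumptions here):

@task(task_config=DatabricksV2(
    databricks_conf={
        "environment_key": "default",
        # Hypothetical override: use your own repository and entrypoint file.
        "git_source": {
            "git_url": "https://github.com/my-org/my-entrypoints",
            "git_provider": "gitHub",
            "git_branch": "main",
        },
        "python_file": "path/to/custom_entrypoint.py",
    },
    databricks_instance="my-workspace.cloud.databricks.com",
))
def custom_entrypoint_task() -> float:
    spark = flytekit.current_context().spark_session
    return spark.range(100).count()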

4. SparkSession in serverless

The serverless entrypoint (in flytetools) pre-creates the SparkSession and stores it in sys.modules and builtins. A new method, _get_databricks_serverless_spark_session() in task.py, retrieves it and exposes it via flytekit.current_context().spark_session, the same API as classic compute.
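
A minimal sketch of the retrieval, assuming the entrypoint stores the session under a well-known attribute (the exact storage keys are an implementation detail of the entrypoint and are assumptions here):

import builtins
import sys

def _get_databricks_serverless_spark_session():
    # The serverless entrypoint pre-creates the SparkSession; look for it in
    # the places it is stored. The attribute names below are assumptions.
    main_module = sys.modules.get("__main__")
    spark = getattr(main_module, "spark", None) if main_module else None
    if spark is None:
        spark = getattr(builtins, "spark", None)
    if spark is None:
        raise RuntimeError(
            "Serverless SparkSession not found; was the serverless entrypoint used?"
        )
    return spark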

5. AWS credential provider

New DatabricksV2 config field: databricks_service_credential_provider. Resolution order: task config → connector env var (FLYTE_DATABRICKS_SERVICE_CREDENTIAL_PROVIDER).
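
The resolution order can be expressed roughly as follows (illustrative sketch; only the field name and env var name come from the description above):

import os
from typing import Optional

def _resolve_service_credential_provider(task_config) -> Optional[str]:
    # Task-level config takes precedence; otherwise fall back to the connector env var.
    provider = getattr(task_config, "databricks_service_credential_provider", None)
    if provider:
        return provider
    return os.environ.get("FLYTE_DATABRICKS_SERVICE_CREDENTIAL_PROVIDER")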

6. Notebook task support

New DatabricksV2 config fields: notebook_path, notebook_base_parameters. Works with both classic and serverless compute.
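
For illustration, a notebook task could be configured roughly like this (paths and parameters are placeholders; only the field names come from the description above):

@task(task_config=DatabricksV2(
    databricks_conf={"environment_key": "default"},  # or new_cluster for classic compute
    databricks_instance="my-workspace.cloud.databricks.com",
    notebook_path="/Workspace/Users/me@example.com/my_notebook",
    notebook_base_parameters={"input_table": "sales", "run_date": "2026-02-17"},
))
def notebook_task() -> None:
    ...  # body intentionally empty in this sketch; the notebook runs on Databricks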

Backward compatibility

  • Existing classic tasks: No change; detection is additive and the classic path is unchanged
  • Existing databricks_conf: No change; configs with new_cluster/existing_cluster_id work as before
  • API surface: Additive only; new optional fields on DatabricksV2
  • flytetools entrypoint: Classic entrypoint.py unchanged; the new file is added alongside

How was this patch tested?

Unit tests (14 connector tests, all passing)

Existing tests (unchanged):

  • test_databricks_agent: Classic compute full agent flow
  • test_agent_create_with_no_instance: Missing instance error
  • test_agent_create_with_default_instance: Instance from env var

New serverless tests:

  • test_is_serverless_config_detection: 7 scenarios for compute mode detection
  • test_configure_serverless_with_env_key_only: Auto-creates the environments array
  • test_configure_serverless_with_inline_env: Preserves the user's environment spec
  • test_configure_serverless_creates_default_env: Default env when none is specified
  • test_get_databricks_job_spec_serverless_with_env_key: Full spec for an env_key config
  • test_get_databricks_job_spec_serverless_with_inline_env: Full spec for an inline env config
  • test_get_databricks_job_spec_error_no_compute: Error when no compute config is given
  • test_databricks_agent_serverless: Full agent create/get flow for serverless
  • test_serverless_default_entrypoint_from_flytetools: Default flytetools entrypoint
  • test_serverless_task_git_source_overrides_default: Task-level override works
  • test_classic_and_serverless_use_same_repo: Same flytetools repo, different python_file

Task tests (test_spark_task.py):

  • Serverless environment detection
  • SparkSession retrieval from sys.modules and builtins
  • DatabricksV2 credential provider and notebook configuration

Manual testing

  • Classic compute tasks continue to work (no regression)
  • Serverless compute with pre-configured environment_key
  • Serverless compute with inline environments spec
  • AWS credentials from Databricks service credentials
  • SparkSession available via flytekit.current_context().spark_session
  • Complex Spark workloads (DataFrame operations, UDFs, aggregations)

Setup process

No additional setup needed. Run tests with:

pytest plugins/flytekit-spark/tests/test_connector.py -v
pytest plugins/flytekit-spark/tests/test_spark_task.py -v

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Signed-off-by: Rohit Sharma <rohitrsh@gmail.com>
