Feature request
Which Delta project/connector is this regarding?
- Spark
- Standalone
- Flink
- Kernel
- Other (fill in here)
Overview
Enable Delta Lake to delegate table scan planning (ServerSidePlanning) - including file discovery and credential provisioning - to external catalog implementations, allowing catalogs to inject temporary credentials and optimize file listing for large tables with fine-grained access control (FGAC).
Motivation
Current Limitations
Today, Delta Lake's Spark connector performs all table scan planning on the driver:
- Driver reads the transaction log to discover data files
- Driver lists all files matching the query predicate
- Driver distributes file paths to executors for reading
- Executors use credentials to read the table data directly from storage
This architecture works when it is acceptable to give the engine direct access to storage. However, in many scenarios it is not. For example, a table may have row-level or column-level access policies that depend on the user attempting to access it. Unless the engine is fully trusted, it should not be given access to the raw storage location of the table. In such scenarios, the catalog may want to take the query details, plan the query on the server, and provide only the specific files that need to be accessed rather than the entire table directory.
Proposed Solution
Introduce a ServerSidePlanning interface that allows catalog implementations to:
- Provide scan plans - Return list of data files to read, optionally filtered/optimized by catalog
- Inject temporary credentials - Provide short-lived, scoped credentials for accessing specific files
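To make the shape of this concrete, here is a rough Scala sketch of what the planning interface and its result could look like. All names, fields, and signatures below are illustrative assumptions for discussion; the actual interface is defined in the PRs listed under Implementation Progress.

```scala
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType

// Hypothetical shape of the scan plan returned by the catalog.
case class ServerSidePlannedFile(path: String, size: Long)
case class TemporaryCredentials(config: Map[String, String]) // short-lived, scoped storage credentials
case class ServerSideScanPlan(
    files: Seq[ServerSidePlannedFile],
    credentials: Option[TemporaryCredentials])

// Catalog implementations plug in here: discover the files to read (after
// applying access-control policies) and optionally vend temporary credentials.
trait ServerSidePlanningClient {
  def planScan(
      tableName: String,
      filters: Seq[Filter],           // pushed-down predicates (catalog-agnostic Spark Filters)
      projection: Option[StructType]  // pushed-down column projection
  ): ServerSideScanPlan
}
```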
High-Level Flow
- User Query: `SELECT * FROM catalog.schema.table WHERE col > 10`
- DeltaCatalog.loadTable()
  - Checks if the catalog supports ServerSidePlanning
  - If yes: Calls catalog.planTableScan(table, filters)
  - If no: Falls back to standard Delta scan planning
- Catalog Implementation
  - Reads Delta transaction log (or uses cached metadata)
  - Applies access control policies
  - Generates temporary credentials
  - Returns: `{ files: [...], credentials: {...} }`
- New DSv2 Table implementation
  - Receives scan plan from catalog
  - Injects temporary credentials into Hadoop configuration
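As an illustration of the final step, injecting catalog-vended temporary credentials into the Hadoop configuration used by the readers could look roughly like the sketch below. The S3A keys are the standard Hadoop ones; the credential key names, the actual injection mechanism, and multi-cloud coverage are assumptions here and are part of the credential tasks tracked below.

```scala
import org.apache.hadoop.conf.Configuration

// Hypothetical sketch: copy catalog-vended temporary credentials into a Hadoop
// configuration so executors can read the planned files without static keys.
def injectTemporaryCredentials(
    hadoopConf: Configuration,
    credentials: Map[String, String]): Configuration = {
  val conf = new Configuration(hadoopConf)
  credentials.get("accessKeyId").foreach(conf.set("fs.s3a.access.key", _))
  credentials.get("secretAccessKey").foreach(conf.set("fs.s3a.secret.key", _))
  credentials.get("sessionToken").foreach(conf.set("fs.s3a.session.token", _))
  // Use the temporary-credentials provider so the session token is honored.
  conf.set("fs.s3a.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
  conf
}
```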
Implementation Progress
Below is a breakdown of the implementation tasks:
Status Legend:
- ✅ Merged
- 👀 In Review
- ☑️ Waiting to Merge
- 🔄 In Progress
- 📝 Planned
Phase 1: Core Infrastructure & Generic Pushdown Support
| Task | PR Link | Status | Notes |
|---|---|---|---|
| ServerSidePlanningClient interface and ServerSidePlannedTable DSv2 implementation • Define ServerSidePlanningClient interface for remote scan planning • Implement ServerSidePlannedTable DSv2 table that uses server-provided scan plans • Core infrastructure for ServerSide planned file discovery | #5621 | ✅ | |
| Integrate DeltaCatalog with ServerSidePlanning • Integrate ServerSidePlanning into DeltaCatalog.loadTable() • Factory pattern with decision logic for when to use ServerSidePlanning • Tests for full query execution through DeltaCatalog | #5622 | ✅ | |
| Metadata abstraction and factory pattern • Define metadata trait for encapsulating catalog-specific information • Implement metadata for Unity Catalog, default catalogs, and test catalogs • Factory pattern for building planning clients from metadata | #5671 | 👀 | See the factory sketch below this table |
| Filter pushdown infrastructure • Add filter parameter to ServerSidePlanningClient.planScan() interface • Use Spark's Filter type as catalog-agnostic representation • Update TestServerSidePlanningClient to accept and capture filters for verification • Tests validating filters are passed through to planning client correctly | #5672 | 👀 | |
| Projection pushdown infrastructure • Add projection parameter to ServerSidePlanningClient.planScan() interface • Use Spark's StructType as catalog-agnostic representation • Update TestServerSidePlanningClient to accept and capture projection for verification • Tests validating projection is passed through to planning client correctly | | 📝 | |
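The metadata abstraction and factory described above (see #5671) could be shaped roughly as follows; the trait members and decision logic are assumptions for illustration only.

```scala
// Stub for the planning interface sketched under Proposed Solution.
trait ServerSidePlanningClient

// Hypothetical metadata trait encapsulating catalog-specific information.
trait ServerSidePlanningCatalogMetadata {
  def catalogName: String               // e.g. a Unity Catalog, default, or test catalog
  def planningEndpoint: Option[String]  // where scan-planning requests are sent, if any
}

object ServerSidePlanningClientFactory {
  // Build a planning client from catalog-specific metadata. When this returns
  // None, DeltaCatalog falls back to standard Delta scan planning.
  def buildClient(metadata: ServerSidePlanningCatalogMetadata): Option[ServerSidePlanningClient] =
    metadata.planningEndpoint.map { endpoint =>
      // e.g. an HTTP-based client for the catalog's ServerSidePlanning endpoint,
      // or a TestServerSidePlanningClient in tests
      ???
    }
}
```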
Phase 2: Catalog Integration & Advanced Features
| Task | PR Link | Status | Notes |
|---|---|---|---|
| Credential injection and test infrastructure • Inject temporary credentials into Hadoop configuration on executors • Tests for validating credential flow | | 📝 | |
| Add catalog server test infrastructure • HTTP server for testing catalog operations • Servlet with scan planning endpoint support • Adapter for integrating with test catalog | | 📝 | |
| Add reference catalog implementation • Catalog planning client making HTTP requests to ServerSidePlanning endpoint • Parse server's scan planning response • Integration tests with test server | | 📝 | |
| Server-side credential vending • Define credential structure for temporary credentials (S3, Azure, GCS) • Extend scan plan to include optional credentials • Extract credentials from ServerSidePlanning response | | 📝 | |
| Filter support with catalog-specific converters • Implement SupportsPushDownFilters in ServerSidePlannedScanBuilder with residual filter handling | | 📝 | See the pushdown sketch below this table |
| Projection pushdown support with catalog-specific converters • Implement SupportsPushDownRequiredColumns in ServerSidePlannedScanBuilder | | 📝 | |
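For the filter pushdown task above, a minimal sketch of a DSv2 ScanBuilder with conservative residual handling follows. It assumes ServerSidePlannedScanBuilder simply records the pushed filters and reports them all back as residuals so Spark re-evaluates them after the catalog-driven file pruning; the real catalog-specific converters and residual logic belong to the task itself.

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownFilters}
import org.apache.spark.sql.sources.Filter

// Hypothetical sketch only: record pushed filters and hand them to the
// planning client when the scan is built.
class ServerSidePlannedScanBuilder extends ScanBuilder with SupportsPushDownFilters {

  private var pushed: Array[Filter] = Array.empty

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    pushed = filters
    // Conservative residual handling: the catalog may use the filters only for
    // file pruning, so every filter is still evaluated by Spark after the scan.
    filters
  }

  override def pushedFilters(): Array[Filter] = pushed

  override def build(): Scan = {
    // Pass `pushed` to the planning client (e.g. planScan(..., pushed, ...))
    // and wrap the returned file list in a Scan over those files.
    ???
  }
}
```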
Follow-ups / Future Work
| Description | Issue | Status |
|---|---|---|
| Test special characters in table/catalog/schema names (hyphens, etc.) - Add test coverage for edge cases in identifier handling | TBD | 📝 |
| Support additional authentication mechanisms such as OAuth | TBD | 📝 |
| Performance analysis and improvements | TBD | 📝 |
| Metrics and observability | TBD | 📝 |
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?
- Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
- Yes. I can contribute this feature independently.
- No. I cannot contribute this feature at this time.