Formula 1 Championship ETL Pipeline

Python Apache Airflow PostgreSQL Grafana License: MIT

An automated ETL data pipeline that extracts current Formula 1 championship standings, transforms the data, and loads it into PostgreSQL for visualization in Grafana dashboards.

Overview

This project implements a production-grade ETL pipeline that automatically fetches the latest Formula 1 championship data, processes it, and stores it in a PostgreSQL database. The pipeline runs weekly on Astronomer (managed Airflow) and provides up-to-date championship standings through Grafana dashboards.

Medium article

  • I have written a Medium article documenting my experience building this project. Check it out in the link below.
  • Medium article for my project

Key Features

  • Automated Data Extraction: Fetches current F1 driver and constructor championship standings from the F1 API
  • Data Transformation: Cleans and normalizes data using pandas
  • PostgreSQL Storage: Stores processed data in Neon PostgreSQL database
  • Grafana Visualization: Dashboards showing the Top 10 drivers (bar chart), the constructors championship (bar chart), and the drivers' points distribution (pie chart)
  • Production Deployment: Runs on Astronomer Cloud with weekly scheduling
  • Error Handling: Comprehensive logging and retry mechanisms defined in the DAG
  • XCom Data Passing: Efficient data flow between Airflow tasks

Tech Stack

Technology - Purpose
Apache Airflow 3.1 - Workflow orchestration and scheduling
Python 3.12 - Data processing and transformation
Pandas - Data manipulation and cleaning
PostgreSQL (Neon) - Data storage
SQLAlchemy - Database ORM
Grafana - Data visualization
Astronomer - Managed Airflow deployment
F1 API - Data source

Architecture

Architecture Diagram

The pipeline follows a standard ETL pattern:

  1. Extract: Airflow tasks fetch data from F1 API endpoints (drivers and constructors)
  2. Transform: Pandas processes and cleans the data, normalizing JSON structures
  3. Load: SQLAlchemy loads transformed data into PostgreSQL (Neon)
  4. Visualize: Grafana queries the database to display championship standings

Data Flow:

F1 API → Airflow DAG → Pandas Transformation → PostgreSQL (Neon) → Grafana Dashboards
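
For illustration, here is a minimal sketch of the extract and transform steps. The project's actual logic lives in src/drivers/drivers.py and src/constructors/constructors.py; the top-level JSON key and the request timeout below are assumptions, not taken from the repository.

# Illustrative sketch; the real extract/transform code is in src/drivers/drivers.py.
# Assumption: the API nests the standings under a "drivers_championship" key; check the
# actual payload returned by https://f1api.dev/api/current/drivers-championship.
import pandas as pd
import requests
from airflow.models import Variable


def extract_drivers() -> dict:
    """Fetch the raw driver championship payload from the F1 API."""
    url = Variable.get("API_URL_DRIVERS")
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def transform_drivers(payload: dict) -> pd.DataFrame:
    """Flatten the nested JSON into a tidy DataFrame (yielding columns like team_teamId)."""
    records = payload.get("drivers_championship", payload)  # hypothetical key
    df = pd.json_normalize(records, sep="_")
    keep = ["position", "driver_name", "driver_surname", "team_teamId", "points"]
    return df[[col for col in keep if col in df.columns]]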

Prerequisites

Before setting up this project, ensure you have the following:

Required Accounts

  • Astronomer - free account for deploying the managed Airflow pipeline
  • Neon - free account for serverless PostgreSQL hosting
  • Grafana - a Grafana Cloud account or a local/self-hosted instance for the dashboards

Required Software

  • Python 3.12+ - Download
  • Docker Desktop - Download
  • Astronomer CLI - For local development and deployment
  • Git - For version control

Installation Commands

Install Astronomer CLI (Windows PowerShell):

winget install -e --id Astronomer.Astro

Verify installations:

python --version
docker --version
astro version

Installation & Setup

1. Clone the Repository

git clone https://github.com/ayalasher/F1-batch-pipeline.git
cd F1-batch-pipeline/formulae_1_batch_pipepline

2. Set Up Astronomer Project

The project is already initialized with Astronomer. To run locally:

astro dev start

This will start the local Airflow environment in Docker containers, including the Airflow UI/API server on http://localhost:8080, the scheduler, the triggerer, and a local Postgres metadata database.

3. Configure Airflow Variables

You need to set up the following Airflow Variables (either in Astronomer UI or via CLI):

For Astronomer Deployment:

astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key API_URL_DRIVERS --value "https://f1api.dev/api/current/drivers-championship"
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key API_URL_CONSTRUCTORS --value "https://f1api.dev/api/current/constructors-championship"
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key POSTGRES_USER --value "<YOUR_NEON_USER>"
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key POSTGRES_PASSWORD --value "<YOUR_NEON_PASSWORD>"
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key POSTGRES_HOST --value "<YOUR_NEON_HOST>"
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key POSTGRES_PORT --value "5432"
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key POSTGRES_DB --value "<YOUR_NEON_DB>"

For Local Development (via Airflow UI):

  1. Navigate to http://localhost:8080
  2. Go to Admin → Variables
  3. Add the same variables as above

4. Set Up Neon PostgreSQL

  1. Create a free account at Neon.tech
  2. Create a new project
  3. Copy your connection details (host, user, password, database)
  4. Use these details in Step 3 above
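
The load step can then build its database connection from the same Airflow Variables configured in Step 3. A minimal sketch follows; the sslmode=require flag and the SELECT 1 connectivity check are reasonable defaults for Neon, not necessarily what the repository does.

# Sketch: assemble a SQLAlchemy engine for Neon from the Airflow Variables set in Step 3.
from airflow.models import Variable
from sqlalchemy import create_engine, text

user = Variable.get("POSTGRES_USER")
password = Variable.get("POSTGRES_PASSWORD")
host = Variable.get("POSTGRES_HOST")
port = Variable.get("POSTGRES_PORT")
db = Variable.get("POSTGRES_DB")

# Neon connections are SSL-only, so sslmode=require is a safe default.
engine = create_engine(
    f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{db}?sslmode=require"
)

# Quick connectivity check before the DAG relies on the connection.
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))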

5. Deploy to Astronomer (Production)

astro deploy

Select your deployment when prompted.

Project Structure

F1-batch-pipeline/
├── README.md
├── media/
│   └── architecture-diagram.png
└── formulae_1_batch_pipepline/
    ├── Dockerfile                    # Astronomer runtime configuration
    ├── packages.txt                  # System-level packages
    ├── requirements.txt              # Python dependencies
    ├── airflow_settings.yaml         # Airflow configuration
    ├── dags/
    │   ├── etl_batch.py             # Main ETL pipeline DAG
    │   └── exampledag.py            # Example DAG from Astronomer
    ├── src/
    │   ├── drivers/
    │   │   └── drivers.py           # Driver data extraction & transformation
    │   ├── constructors/
    │   │   └── constructors.py      # Constructor data extraction & transformation
    │   └── formula_one_sql.sql      # SQL queries for data verification
    ├── include/                      # Additional Python modules
    ├── plugins/                      # Custom Airflow plugins
    └── tests/
        └── dags/
            └── test_dag_example.py  # DAG testing

Key Files

File - Description
dags/etl_batch.py - Main Airflow DAG with TaskFlow API implementation
src/drivers/drivers.py - Functions to extract and transform driver championship data
src/constructors/constructors.py - Functions to extract and transform constructor championship data
requirements.txt - Python dependencies (pandas, sqlalchemy, psycopg2-binary, etc.)
airflow_settings.yaml - Airflow connections and variables configuration

Usage

Running the Pipeline Locally

  1. Start Airflow:

    astro dev start
  2. Access Airflow UI:

    • Open http://localhost:8080 in your browser
  3. Trigger the DAG:

    • Find f1_etl_pipeline_rewrite in the DAGs list
    • Toggle it to "On"
    • Click the play button to trigger manually
  4. Monitor Execution:

    • Click on the DAG name
    • View task logs and execution status
    • Check XCom values for data passing between tasks

Pipeline Workflow

The DAG executes the following tasks in parallel streams:

Driver Pipeline:

  1. extract_drivers - Fetch driver championship data from F1 API
  2. transform_drivers - Clean and normalize the data
  3. load_drivers_to_database - Insert into PostgreSQL drivers_championship table

Constructor Pipeline:

  1. extract_constructors - Fetch constructor championship data from F1 API
  2. transform_constructors - Clean and normalize the data
  3. load_constructors_to_database - Insert into PostgreSQL constructors_championship table

Both pipelines run independently and in parallel for optimal performance.
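
For orientation, here is a simplified sketch of how these two streams can be wired with the TaskFlow API. The task and DAG names follow the ones listed above; the schedule, retry settings, and task bodies are illustrative rather than the repository's exact code.

# Simplified wiring sketch of the two parallel streams (TaskFlow API); task bodies elided.
from datetime import timedelta

import pendulum
from airflow.decorators import dag, task


@dag(
    dag_id="f1_etl_pipeline_rewrite",
    schedule="@weekly",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
)
def f1_etl_pipeline():
    @task
    def extract_drivers():
        ...  # fetch driver standings from the F1 API

    @task
    def transform_drivers(raw):
        ...  # clean/normalize with pandas; the return value is passed onward via XCom

    @task
    def load_drivers_to_database(rows):
        ...  # insert into the drivers_championship table

    @task
    def extract_constructors():
        ...  # fetch constructor standings from the F1 API

    @task
    def transform_constructors(raw):
        ...  # clean/normalize with pandas

    @task
    def load_constructors_to_database(rows):
        ...  # insert into the constructors_championship table

    # Two independent chains; the scheduler runs them in parallel.
    load_drivers_to_database(transform_drivers(extract_drivers()))
    load_constructors_to_database(transform_constructors(extract_constructors()))


f1_etl_pipeline()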

Viewing Data in Grafana

  1. Set up Grafana Data Source:

    • Add PostgreSQL data source
    • Use your Neon connection details
    • Test the connection
  2. Import Dashboards:

    • Create visualizations for:
      • Driver Championship Standings (bar chart)
      • Constructor Championship Standings (bar chart)
      • Points Distribution (pie chart)
      • Top 10 Drivers/Teams (table)
  3. Example Queries:

Driver Standings:

SELECT
    position,
    driver_name || ' ' || driver_surname AS driver,
    points
FROM drivers_championship
ORDER BY position
LIMIT 10;

Constructor Standings:

SELECT
    position,
    "team_teamName" as team,
    points,
    wins
FROM constructors_championship
ORDER BY position;

Database Schema

Tables

drivers_championship

Column (Type) - Description
position (INTEGER) - Driver's championship position
driver_name (VARCHAR) - Driver's first name
driver_surname (VARCHAR) - Driver's last name
team_teamId (VARCHAR) - Team identifier
points (FLOAT) - Championship points

constructors_championship

Column (Type) - Description
position (INTEGER) - Team's championship position
teamId (VARCHAR) - Team identifier
team_teamName (VARCHAR) - Team name
points (FLOAT) - Championship points
wins (INTEGER) - Number of race wins
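
A minimal sketch of how the load tasks can materialize these tables with pandas is shown below; the if_exists="replace" strategy (rebuild the table on each weekly run) is an assumption, not necessarily what the DAG does.

# Sketch: write a transformed standings DataFrame into Neon; the table is recreated each run.
import pandas as pd
from sqlalchemy.engine import Engine


def load_standings(df: pd.DataFrame, table_name: str, engine: Engine) -> None:
    """Load e.g. the drivers_championship or constructors_championship table."""
    df.to_sql(table_name, engine, if_exists="replace", index=False)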

Troubleshooting

Common Issues

DAG not appearing in Airflow UI

  • Check DAG parse errors in the UI
  • Verify all imports are inside @task functions
  • Ensure DAG parsing time is under 30 seconds
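
For the import point above, the pattern looks roughly like this (illustrative snippet, not the project's code):

# Keep heavy imports (pandas, requests, sqlalchemy) inside the task body so the
# scheduler parses the DAG file quickly; only lightweight imports stay at module level.
from airflow.decorators import task


@task
def transform_drivers(raw: dict):
    import pandas as pd  # imported at run time, not at parse time

    return pd.json_normalize(raw, sep="_").to_dict(orient="records")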

Environment variables not found

  • Use Variable.get() from airflow.models instead of os.getenv()
  • Verify variables are set in Astronomer deployment or local Airflow UI
  • Check variable names match exactly (case-sensitive)
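
A short illustration of the difference:

from airflow.models import Variable

api_url = Variable.get("API_URL_DRIVERS")  # resolves Airflow Variables set in the UI or via the CLI
# os.getenv("API_URL_DRIVERS") returns None unless the value was exported as a real environment variable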

Database connection errors

  • Verify Neon PostgreSQL credentials in Airflow Variables
  • Test connection using psql or database client
  • Ensure Neon project is active (not paused)

Grafana queries failing

  • Double-quote mixed-case column names such as "team_teamName" (PostgreSQL lower-cases unquoted identifiers)
  • Check actual column names in PostgreSQL using \d table_name
  • Use ORDER BY + LIMIT instead of WHERE for top N queries

Debugging Tips

  1. Check Task Logs: Click on task → View Logs for detailed error messages
  2. XCom Values: Navigate to Admin → XCom to verify data passing between tasks
  3. Local Testing: Run astro dev start to test DAG locally before deploying
  4. Database Verification: Query tables directly to confirm data loaded correctly
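
The tests/dags/ scaffold can host a simple import check along these lines; this is a sketch in the spirit of Astronomer's template test, and the actual assertions in test_dag_example.py may differ.

# Sketch of a DAG import-error test; run locally with "astro dev pytest" or plain pytest.
from airflow.models import DagBag


def test_no_import_errors():
    dag_bag = DagBag(include_examples=False)
    assert dag_bag.import_errors == {}, f"DAG import failures: {dag_bag.import_errors}"


def test_f1_dag_loaded():
    dag_bag = DagBag(include_examples=False)
    dag = dag_bag.get_dag("f1_etl_pipeline_rewrite")
    assert dag is not None
    assert len(dag.tasks) > 0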

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • F1 API - f1api.dev for providing free F1 data
  • Astronomer - For managed Airflow platform
  • Neon - For serverless PostgreSQL hosting
  • Apache Airflow Community - For excellent documentation and support

Contact

Project Author: Ayal Asher
GitHub: @ayalasher
Repository: F1-batch-pipeline


⭐ If you found this project helpful, please consider giving it a star!

About

ETL batch pipeline for getting the current driver and constructor standings in Formula 1.
