An automated ETL data pipeline that extracts current Formula 1 championship standings, transforms the data, and loads it into PostgreSQL for visualization in Grafana dashboards.
This project implements a production-grade ETL pipeline that automatically fetches the latest Formula 1 championship data, processes it, and stores it in a PostgreSQL database. The pipeline runs weekly on Astronomer (managed Airflow) and serves up-to-date championship standings through Grafana dashboards.
- I have written a Medium article documenting my experience building this project. Check it out in the link below.
- Medium article for my project
- Automated Data Extraction: Fetches the current F1 driver and constructor championship standings from the F1 API
- Data Transformation: Cleans and normalizes data using pandas
- PostgreSQL Storage: Stores processed data in Neon PostgreSQL database
- Grafana Visualization: Dashboards showing the Top 10 drivers (bar chart), the constructors' championship (bar chart), and the drivers' points distribution (pie chart)
- Production Deployment: Runs on Astronomer Cloud with weekly scheduling
- Error Handling: Comprehensive logging and retry mechanisms defined in the DAG
- XCom Data Passing: Efficient data flow between Airflow tasks
| Technology | Purpose |
|---|---|
| Apache Airflow 3.1 | Workflow orchestration and scheduling |
| Python 3.12 | Data processing and transformation |
| Pandas | Data manipulation and cleaning |
| PostgreSQL (Neon) | Data storage |
| SQLAlchemy | Database ORM |
| Grafana | Data visualization |
| Astronomer | Managed Airflow deployment |
| F1 API | Data source |
The pipeline follows a standard ETL pattern:
- Extract: Airflow tasks fetch data from F1 API endpoints (drivers and constructors)
- Transform: Pandas processes and cleans the data, normalizing JSON structures
- Load: SQLAlchemy loads transformed data into PostgreSQL (Neon)
- Visualize: Grafana queries the database to display championship standings
Data Flow:
F1 API → Airflow DAG → Pandas Transformation → PostgreSQL (Neon) → Grafana Dashboards
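At its core, each stage boils down to a small function. The sketch below is illustrative rather than a copy of the project code: the function names, the `records_key` argument, and the `if_exists="replace"` write mode are assumptions, and the exact shape of the f1api.dev payload may differ.

```python
import pandas as pd
import requests
from sqlalchemy import create_engine


def extract(api_url: str) -> dict:
    """Fetch the raw championship payload from the F1 API."""
    response = requests.get(api_url, timeout=30)
    response.raise_for_status()
    return response.json()


def transform(payload: dict, records_key: str) -> pd.DataFrame:
    """Flatten the nested JSON records into a tabular DataFrame."""
    return pd.json_normalize(payload[records_key], sep="_")


def load(df: pd.DataFrame, table: str, connection_url: str) -> None:
    """Write the transformed rows to PostgreSQL, replacing the previous snapshot."""
    engine = create_engine(connection_url)
    df.to_sql(table, engine, if_exists="replace", index=False)
```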
Before setting up this project, ensure you have the following:
- Astronomer Account - For managed Airflow deployment
- Neon PostgreSQL - Free PostgreSQL database hosting
- Grafana Cloud (optional) - For cloud-based dashboards, or run locally with Docker
- Python 3.12+ - Download
- Docker Desktop - Download
- Astronomer CLI - For local development and deployment
- Git - For version control
Install Astronomer CLI (Windows PowerShell):

```bash
winget install -e --id Astronomer.Astro
```

Verify installations:

```bash
python --version
docker --version
astro version
```

Clone the repository and move into the project folder:

```bash
git clone https://github.com/ayalasher/F1-batch-pipeline.git
cd F1-batch-pipeline/formulae_1_batch_pipepline
```

The project is already initialized with Astronomer. To run locally:

```bash
astro dev start
```

This will start:
- Airflow Webserver: http://localhost:8080
- Postgres: localhost:5432
You need to set up the following Airflow Variables (either in Astronomer UI or via CLI):
For Astronomer Deployment:
```bash
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key API_URL_DRIVERS --value "https://f1api.dev/api/current/drivers-championship"
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key API_URL_CONSTRUCTORS --value "https://f1api.dev/api/current/constructors-championship"
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key POSTGRES_USER --value "<YOUR_NEON_USER>"
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key POSTGRES_PASSWORD --value "<YOUR_NEON_PASSWORD>"
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key POSTGRES_HOST --value "<YOUR_NEON_HOST>"
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key POSTGRES_PORT --value "5432"
astro deployment variable create --deployment-id <YOUR_DEPLOYMENT_ID> --key POSTGRES_DB --value "<YOUR_NEON_DB>"
```

For Local Development (via Airflow UI):
- Navigate to http://localhost:8080
- Go to Admin → Variables
- Add the same variables as above
- Create a free account at Neon.tech
- Create a new project
- Copy your connection details (host, user, password, database)
- Use these details in Step 3 above
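To see how these pieces connect, here is a minimal sketch of how the DAG can turn those Airflow Variables into a SQLAlchemy engine for the load tasks. The helper shown is hypothetical (the real code lives in `src/`), and the `sslmode=require` parameter is an assumption based on Neon's SSL requirement.

```python
from airflow.models import Variable
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine


def get_neon_engine() -> Engine:
    """Build a SQLAlchemy engine from the Airflow Variables set in Step 3."""
    user = Variable.get("POSTGRES_USER")
    password = Variable.get("POSTGRES_PASSWORD")
    host = Variable.get("POSTGRES_HOST")
    port = Variable.get("POSTGRES_PORT")
    db = Variable.get("POSTGRES_DB")
    # Neon connections are SSL-only, hence sslmode=require (assumption).
    return create_engine(
        f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{db}?sslmode=require"
    )
```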
Deploy to Astronomer:

```bash
astro deploy
```

Select your deployment when prompted.
```
F1-batch-pipeline/
├── README.md
├── media/
│   └── architecture-diagram.png
└── formulae_1_batch_pipepline/
    ├── Dockerfile                   # Astronomer runtime configuration
    ├── packages.txt                 # System-level packages
    ├── requirements.txt             # Python dependencies
    ├── airflow_settings.yaml        # Airflow configuration
    ├── dags/
    │   ├── etl_batch.py             # Main ETL pipeline DAG
    │   └── exampledag.py            # Example DAG from Astronomer
    ├── src/
    │   ├── drivers/
    │   │   └── drivers.py           # Driver data extraction & transformation
    │   ├── constructors/
    │   │   └── constructors.py      # Constructor data extraction & transformation
    │   └── formula_one_sql.sql      # SQL queries for data verification
    ├── include/                     # Additional Python modules
    ├── plugins/                     # Custom Airflow plugins
    └── tests/
        └── dags/
            └── test_dag_example.py  # DAG testing
```
| File | Description |
|---|---|
| `dags/etl_batch.py` | Main Airflow DAG with TaskFlow API implementation |
| `src/drivers/drivers.py` | Functions to extract and transform driver championship data |
| `src/constructors/constructors.py` | Functions to extract and transform constructor championship data |
| `requirements.txt` | Python dependencies (pandas, sqlalchemy, psycopg2-binary, etc.) |
| `airflow_settings.yaml` | Airflow connections and variables configuration |
1. Start Airflow:

   ```bash
   astro dev start
   ```

2. Access Airflow UI:
   - Open http://localhost:8080
   - Default credentials: `admin` / `admin`

3. Trigger the DAG:
   - Find `f1_etl_pipeline_rewrite` in the DAGs list
   - Toggle it to "On"
   - Click the play button to trigger manually

4. Monitor Execution:
   - Click on the DAG name
   - View task logs and execution status
   - Check XCom values for data passing between tasks
The DAG executes the following tasks in parallel streams:
Driver Pipeline:
1. `extract_drivers` - Fetch driver championship data from the F1 API
2. `transform_drivers` - Clean and normalize the data
3. `load_drivers_to_database` - Insert into the PostgreSQL `drivers_championship` table
Constructor Pipeline:
1. `extract_constructors` - Fetch constructor championship data from the F1 API
2. `transform_constructors` - Clean and normalize the data
3. `load_constructors_to_database` - Insert into the PostgreSQL `constructors_championship` table
Both pipelines run independently and in parallel for optimal performance.
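A condensed sketch of how the two streams might be wired with the TaskFlow API is shown below. Task bodies, retry settings, and the schedule details here are illustrative; the real implementation is in `dags/etl_batch.py`.

```python
from datetime import timedelta

import pendulum
from airflow.decorators import dag, task


@dag(
    schedule="@weekly",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
)
def f1_etl_pipeline_sketch():
    @task
    def extract_drivers() -> dict:
        ...  # GET the drivers-championship endpoint

    @task
    def transform_drivers(raw: dict) -> list[dict]:
        ...  # normalize with pandas, return records via XCom

    @task
    def load_drivers_to_database(records: list[dict]) -> None:
        ...  # insert into the drivers_championship table

    @task
    def extract_constructors() -> dict:
        ...  # GET the constructors-championship endpoint

    @task
    def transform_constructors(raw: dict) -> list[dict]:
        ...  # normalize with pandas, return records via XCom

    @task
    def load_constructors_to_database(records: list[dict]) -> None:
        ...  # insert into the constructors_championship table

    # Each chain hands its return value to the next task through XCom.
    # The two chains share no dependencies, so the scheduler runs them in parallel.
    load_drivers_to_database(transform_drivers(extract_drivers()))
    load_constructors_to_database(transform_constructors(extract_constructors()))


f1_etl_pipeline_sketch()
```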
1. Set up Grafana Data Source:
   - Add PostgreSQL data source
   - Use your Neon connection details
   - Test the connection

2. Import Dashboards:
   - Create visualizations for:
     - Driver Championship Standings (bar chart)
     - Constructor Championship Standings (bar chart)
     - Points Distribution (pie chart)
     - Top 10 Drivers/Teams (table)

3. Example Queries:
Driver Standings:

```sql
SELECT
    position,
    driver_name || ' ' || driver_surname AS driver,
    points
FROM drivers_championship
ORDER BY position
LIMIT 10;
```

Constructor Standings:

```sql
SELECT
    position,
    "team_teamName" AS team,
    points,
    wins
FROM constructors_championship
ORDER BY position;
```

drivers_championship
| Column | Type | Description |
|---|---|---|
| `position` | INTEGER | Driver's championship position |
| `driver_name` | VARCHAR | Driver's first name |
| `driver_surname` | VARCHAR | Driver's last name |
| `team_teamId` | VARCHAR | Team identifier |
| `points` | FLOAT | Championship points |
constructors_championship
| Column | Type | Description |
|---|---|---|
| `position` | INTEGER | Team's championship position |
| `teamId` | VARCHAR | Team identifier |
| `team_teamName` | VARCHAR | Team name |
| `points` | FLOAT | Championship points |
| `wins` | INTEGER | Number of race wins |
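The prefixed column names such as `team_teamId` and `team_teamName` fall out of flattening the nested API JSON. A tiny standalone illustration of the naming (the sample record below is made up, not the real API payload):

```python
import pandas as pd

# A made-up record shaped like a nested championship entry.
record = {
    "position": 1,
    "points": 400.0,
    "team": {"teamId": "mclaren", "teamName": "McLaren Formula 1 Team"},
}

df = pd.json_normalize(record, sep="_")
print(df.columns.tolist())
# ['position', 'points', 'team_teamId', 'team_teamName']
```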
DAG not appearing in Airflow UI
- Check DAG parse errors in the UI
- Verify all imports are inside `@task` functions (see the sketch below)
- Ensure DAG parsing time is under 30 seconds
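For example, heavy imports such as pandas can live inside the task body so the scheduler only pays the import cost when the task runs, not every time it parses the DAG file (illustrative snippet, not the project code):

```python
from airflow.decorators import task


@task
def transform_drivers(raw: dict) -> list[dict]:
    # Importing pandas inside the task keeps DAG-file parsing fast;
    # the scheduler re-parses DAG files often and never needs pandas to do so.
    import pandas as pd

    return pd.json_normalize(raw, sep="_").to_dict(orient="records")
```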
Environment variables not found
- Use `Variable.get()` from `airflow.models` instead of `os.getenv()`
- Verify variables are set in Astronomer deployment or local Airflow UI
- Check variable names match exactly (case-sensitive)
Database connection errors
- Verify Neon PostgreSQL credentials in Airflow Variables
- Test connection using `psql` or a database client
- Ensure Neon project is active (not paused)
Grafana queries failing
- Use double quotes for mixed-case column names, e.g. `"team_teamName"` (unquoted identifiers fold to lowercase in PostgreSQL)
- Check actual column names in PostgreSQL using `\d table_name`
- Use `ORDER BY` + `LIMIT` instead of `WHERE` for top-N queries
- Check Task Logs: Click on task → View Logs for detailed error messages
- XCom Values: Navigate to Admin → XCom to verify data passing between tasks
- Local Testing: Run `astro dev start` to test the DAG locally before deploying
- Database Verification: Query tables directly to confirm data loaded correctly (see the sketch below)
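For that last check, a quick verification can be run from any Python shell (fill in your own Neon connection string; the query matches the `drivers_championship` schema above):

```python
import pandas as pd
from sqlalchemy import create_engine

# Replace the placeholders with your Neon connection details.
engine = create_engine(
    "postgresql+psycopg2://<user>:<password>@<host>:5432/<db>?sslmode=require"
)

standings = pd.read_sql(
    "SELECT position, driver_name, driver_surname, points "
    "FROM drivers_championship ORDER BY position LIMIT 10;",
    engine,
)
print(standings)
```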
This project is licensed under the MIT License - see the LICENSE file for details.
- F1 API - f1api.dev for providing free F1 data
- Astronomer - For managed Airflow platform
- Neon - For serverless PostgreSQL hosting
- Apache Airflow Community - For excellent documentation and support
Project Author: Ayal Asher
GitHub: @ayalasher
Repository: F1-batch-pipeline
⭐ If you found this project helpful, please consider giving it a star!
