Dhwani Patel — Data Engineer

Multi-Cloud Expertise

Three clouds.
One engineer.

Production-grade data engineering across AWS, Azure, and GCP — each chosen for the right job.

🟠

Amazon Web Services

Primary Cloud · Production

S3 · AWS Glue · Lambda · Redshift · Athena · Lake Formation · SES · CloudWatch · CloudFormation · Fargate

🔵

Microsoft Azure

Databricks · dbt · ADF

Databricks · Azure Data Factory · Synapse Analytics · ADLS Gen2 · CosmosDB · Azure DevOps · Azure Blob

🔷

Google Cloud

BigQuery · Analytics

BigQuery · Cloud Storage · Pub/Sub · Dataproc · IAM

Core Toolchain

Orchestration & Processing

Engines behind every pipeline — batch ETL to real-time streaming.

Apache Spark · PySpark · Apache Airflow · dbt · Apache Kafka · Hadoop · Terraform · Docker · Kubernetes · Jenkins · GitLab CI

Career

Where I've built.

Real production systems, real impact — across AI disaster response, banking, and enterprise SaaS data.

Data Engineer Intern

Interlinked — AI-Powered Disaster Response Platform

📍 Berkeley, California (Remote) 🏢 AI / Environmental Tech

Mar 2026 – Present

AWS · Azure · Databricks · PySpark · dbt · Apache Airflow · Real-Time · ML Pipelines

40%

Latency Reduced

60%

Data Prep Cut

50%

Fewer Incidents

Real-Time

Threat Detection

› Engineered real-time wildfire sensor data ingestion pipelines using Apache Spark on Databricks and AWS S3, reducing data availability latency by 40% and enabling earlier threat detection for emergency response teams across active wildfire zones.
› Built end-to-end dbt transformation workflows on Azure that convert raw geospatial telemetry into analytics-ready risk intelligence models, cutting data preparation time by 60% and accelerating dashboard refresh cycles for disaster coordinators.
› Developed automated data quality monitoring frameworks using Python and SQL across mission-critical environmental feeds on AWS, reducing data integrity incidents by 50% and supporting continuous situational awareness.
› Designed and orchestrated feature engineering pipelines using Apache Airflow and PySpark to deliver validated, clean inputs to predictive wildfire spread models, enabling more precise resource allocation recommendations for emergency agencies.
› Implemented complex PL/SQL and dbt transformation logic across environmental data models on Azure, producing documented pipelines that streamlined compliance and reporting for disaster response stakeholders.
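
As a rough illustration of the kind of rule an automated data quality monitor might apply to sensor feeds, here is a minimal sketch in plain Python. The `Reading` fields, the 15-minute freshness window, and the temperature range are all assumptions made for the example, not details of the production framework.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Reading:
    """Hypothetical sensor reading; field names are illustrative only."""
    sensor_id: str
    recorded_at: datetime
    temperature_c: Optional[float]


def check_reading(reading, now, max_age=timedelta(minutes=15),
                  valid_range=(-60.0, 1000.0)):
    """Return the list of quality issues found for one reading."""
    issues = []
    if reading.temperature_c is None:
        issues.append("missing_value")
    elif not (valid_range[0] <= reading.temperature_c <= valid_range[1]):
        issues.append("out_of_range")
    if now - reading.recorded_at > max_age:
        issues.append("stale")
    return issues


def summarize(readings, now):
    """Count issues by type across a batch, e.g. to feed an alert threshold."""
    counts = {}
    for reading in readings:
        for issue in check_reading(reading, now):
            counts[issue] = counts.get(issue, 0) + 1
    return counts


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    batch = [
        Reading("sensor-a", now - timedelta(minutes=1), 25.0),
        Reading("sensor-b", now - timedelta(hours=1), None),
    ]
    print(summarize(batch, now))
```

Keeping each rule as a small pure function makes the checks easy to unit test and to reuse from either a SQL-driven job or a Python task.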

Data Engineer

Addon Solutions

📍 Gujarat, India 🏢 FinTech / Banking

Nov 2023 – Nov 2024

AWS Glue · S3 · Lambda · PySpark · Lake Formation · SOC 2 · GDPR

40%

Parsing Errors Cut

4-Zone

S3 Lake Built

Zero

Audit Findings

SOC 2

Compliant

› Architected a four-zone AWS S3 data lake (Landing → Raw → Trusted → Curated) with AWS Glue crawlers and Athena tables, enabling ad-hoc analytics across all ingested datasets.
› Engineered ETL pipelines using AWS Glue and Lambda to process NACHA banking transaction files; implemented AWS SES alerts for ingestion and transformation completion events.
› Authored PySpark schema standardization scripts for multi-format source files, reducing downstream parsing errors by 40% across all active production pipelines.
› Implemented data governance using AWS Lake Formation with least-privilege RBAC, lineage tracking, and SOC 2 / GDPR compliance controls, achieving zero audit findings.
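
To give a sense of what schema standardization for multi-format source files can involve, here is a minimal sketch. It uses plain Python dictionaries as a stand-in for PySpark DataFrames so that it runs without a cluster; the alias map, column names, and types are hypothetical and not taken from the actual pipelines.

```python
from decimal import Decimal

# Hypothetical alias map: normalized source header -> canonical column name.
COLUMN_ALIASES = {
    "txn_id": "transaction_id",
    "transactionid": "transaction_id",
    "amt": "amount",
    "amount_usd": "amount",
    "acct": "account_number",
    "account_no": "account_number",
}

# Hypothetical canonical schema: every record must end up with these columns.
CANONICAL_TYPES = {
    "transaction_id": str,
    "amount": Decimal,
    "account_number": str,
}


def normalize_header(name):
    """Lower-case, trim, and unify separators, then map known aliases."""
    key = name.strip().lower().replace(" ", "_").replace("-", "_")
    return COLUMN_ALIASES.get(key, key)


def standardize_record(record):
    """Rename columns to canonical names and coerce values to canonical types."""
    out = {}
    for raw_name, value in record.items():
        column = normalize_header(raw_name)
        caster = CANONICAL_TYPES.get(column)
        if caster is not None and value is not None:
            value = caster(str(value).strip())
        out[column] = value
    missing = set(CANONICAL_TYPES) - set(out)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return out


if __name__ == "__main__":
    print(standardize_record({"Txn-ID": " 42 ", "AMT": "10.50", "Account No": 123}))
```

Failing fast on missing columns, rather than passing partial records downstream, is one way this kind of step can reduce parsing errors in later stages.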

Data Engineer

N Vision IQ

📍 Gujarat, India 🏢 Enterprise SaaS / Analytics

Nov 2022 – Oct 2023

Apache Airflow · Redshift · Terraform · S3 · Multi-Source Ingestion · Incremental Load

35%

Faster Pipelines

5

SaaS Sources

3

Envs via Terraform

Zero

Full Refreshes

› Built Apache Airflow DAGs for multi-source ingestion (Workday, HubSpot, Zendesk, Kantata, QuickBase) into a unified data warehouse with automated dependency management and retry logic.
› Supported warehouse migration to AWS Redshift + S3 + Lake Formation; provisioned all infrastructure via Terraform across dev, staging, and production environments, with infrastructure-as-code from day one.
› Designed incremental load strategies replacing full-refresh patterns, reducing pipeline execution time by 35% across all production DAGs and cutting compute costs.
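
The incremental load idea can be sketched with a high-water mark: each run pulls only rows changed since the last recorded cursor value, then upserts them by key. This is a minimal plain-Python illustration; the `updated_at` cursor field, the `id` key, and the in-memory target are assumptions for the example, not the actual warehouse logic.

```python
from datetime import datetime


def select_increment(rows, watermark, cursor_field="updated_at"):
    """Return rows changed strictly after the watermark, plus the new watermark."""
    fresh = [row for row in rows if row[cursor_field] > watermark]
    new_watermark = max((row[cursor_field] for row in fresh), default=watermark)
    return fresh, new_watermark


def merge_by_key(target, fresh, key="id"):
    """Upsert fresh rows into the target mapping, replacing older versions."""
    for row in fresh:
        target[row[key]] = row
    return target


if __name__ == "__main__":
    source = [
        {"id": 1, "updated_at": datetime(2023, 1, 1)},
        {"id": 2, "updated_at": datetime(2023, 1, 3)},
    ]
    fresh, watermark = select_increment(source, datetime(2023, 1, 2))
    print(len(fresh), watermark)
```

Note that the strict `>` comparison skips rows sharing the exact watermark timestamp; a real pipeline would need to decide how to handle such ties, for example with a small overlap window plus idempotent upserts.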

Key Projects

Impact by the numbers.

Every metric below is production-earned, not estimated.

Real-Time · AI · Disaster Response

40%

Wildfire Sensor Real-Time Ingestion Pipeline

Real-time ingestion of wildfire sensor telemetry using Apache Spark on Databricks and AWS S3. Reduced data latency by 40%, enabling earlier threat detection for emergency response teams across active wildfire zones.

Apache Spark · Databricks · AWS S3 · PySpark · Airflow

Azure · dbt · Geospatial

60%

Geospatial Telemetry dbt Pipeline

End-to-end dbt workflows on Azure converting raw geospatial telemetry into risk intelligence models — cutting data prep time by 60%.

dbt · Azure · PL/SQL

AWS · Data Quality

50%

Automated Data Quality Monitoring

Python + SQL automated quality monitoring across mission-critical environmental feeds. Reduced integrity incidents by 50% with proactive alerting.

Python · SQL · CloudWatch

AWS · Banking · ETL

4-Zone

AWS NACHA Banking Transaction Lake

Four-zone AWS S3 data lake (Landing→Raw→Trusted→Curated) for NACHA banking transaction files. Glue crawlers, Athena tables, SES alerting, PySpark schema standardization — cutting parsing errors by 40%.

AWS Glue · S3 · Lambda · SES · PySpark

AWS · Governance

Lake Formation SOC 2 Compliance

Lake Formation with RBAC, lineage tracking, SOC 2 / GDPR controls. Zero audit findings across all production datasets.

Lake Formation · SOC 2 · GDPR

Kafka · Real-Time · AWS

30%

Real-Time SaaS Pipeline on AWS

Kafka + Lambda + Redshift real-time ingestion. Reduced mean issue resolution time by 30% via CloudWatch monitoring and proactive runbooks.

Kafka · Lambda · Redshift

ML · Feature Engineering · Airflow

↑ Precision

Wildfire Predictive Feature Engineering Pipeline

Orchestrated feature engineering pipelines using Apache Airflow and PySpark to deliver validated, clean inputs to predictive wildfire spread models — enabling more precise resource allocation recommendations for emergency agencies.

Airflow · PySpark · Feature Engineering · AWS S3

Airflow · Multi-Source

35%

Multi-Source Airflow DAG Orchestration

Airflow DAGs for Workday, HubSpot, Zendesk, Kantata, QuickBase. Incremental loads replaced full-refresh — 35% faster across all production DAGs.

Airflow · Redshift · Terraform

Let's Talk

Book a 30-min intro call.

I'd love to chat about data engineering challenges, open roles, or how I can bring production-grade pipeline expertise to your team. Pick a slot and let's connect.

Walk through my technical background & projects
Discuss your data engineering challenges
Explore how I can add value to your team
Available for full-time, contract & consulting

📅 Book on Calendly 📞 Call Directly

