Data & Machine Learning Engineer

Building Resilient
Data Architectures & AI Solutions

Scalable Systems Engineer specializing in distributed computing, cloud-native infrastructure (AWS/GCP), and AI integration. Proven experience designing reusable software frameworks and internal tooling that automate complex workflows.

About Me

My journey in Tech

Federico Fiorio

I am a Data & Machine Learning Engineer based in Milan, Italy. I specialize in designing and optimizing high-performance data architectures at scale. My experience ranges from building enterprise-scale data pipelines using Databricks and Medallion Architecture to developing RAG-powered tools and optimizing orchestration with Airflow.

I hold an MSc in Computer Science from the University of Milan (grade: 108/110) and completed an Erasmus+ exchange at the University of Copenhagen, where I researched attacks on text-to-image diffusion models. I am passionate about staying at the forefront of technology, constantly learning and applying new skills in Data Engineering, MLOps, and Agentic AI.

Experience

My professional track record

Data & Machine Learning Engineer

May 2024 – Present

Data Reply | Milan, Italy

  • LLM Fine-Tuning: Fine-tuned Llama 3 for fashion intent classification using LoRA and quantization, achieving high accuracy while optimizing GPU compute resources on Databricks.
  • RAG-powered Tooling: Developed a UI-driven engine that generates AWS CloudFormation templates from natural language. Optimized prompts to produce well-formed YAML output and lower Time To First Token (TTFT).
  • Data Pipelines: Designed high-volume ingestion pipelines using Databricks and Medallion Architecture. Optimized historical tracking with SCD Type 2, reducing query time by more than 80%.
  • Sanctioned Shop Blocker: Optimized the customer onboarding sanctions pipeline by refactoring the matching logic from a permutation-based approach (O(n!)) to token sorting (O(n log n)), reducing data storage by 75% and significantly decreasing execution time.
  • Orchestration: Migrated Airflow DAGs from GCP Composer 1.x to Composer 2.x, increasing scheduling efficiency by 10x.
  • CI/CD: Engineered Azure DevOps pipelines for Databricks, using Terraform for Unity Catalog, Databricks Asset Bundles (DABs) for deployment and integration testing, DQx for data quality, and Pytest for unit testing.
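
To illustrate the Sanctioned Shop Blocker refactoring mentioned above, here is a minimal sketch of the general idea, not the production code. The function names and sample shop names are hypothetical, and the sketch assumes matching is done on whitespace-separated, case-insensitive tokens. The permutation-based variant stores one key per token ordering, while token sorting stores a single canonical key per name.

```python
from itertools import permutations


def permutation_keys(name: str) -> set[str]:
    """Permutation-based approach: one key per token ordering (up to n! keys)."""
    tokens = name.lower().split()
    return {" ".join(p) for p in permutations(tokens)}


def token_sort_key(name: str) -> str:
    """Token-sorting approach: one canonical key, O(n log n) in the token count."""
    return " ".join(sorted(name.lower().split()))


def is_sanctioned(shop_name: str, sanctioned_keys: set[str]) -> bool:
    """Order-insensitive lookup against precomputed canonical keys."""
    return token_sort_key(shop_name) in sanctioned_keys


# Hypothetical sanctions list, stored as one canonical key per entry.
sanctioned = {token_sort_key(n) for n in ["Acme Trading Ltd", "Globex Import Export"]}

if __name__ == "__main__":
    print(is_sanctioned("ltd acme TRADING", sanctioned))
    print(len(permutation_keys("a b c d")))
```

Because every ordering of the same tokens collapses to one key, the lookup table needs a single row per sanctioned name instead of one row per permutation.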

Data Engineer

Oct 2023 – Apr 2024

Management Solutions | Milan, Italy

  • IBM DataStage Refactoring: Optimized IBM DataStage ETL workflows and SQL for banking data warehouses.
  • Data Governance: Enforced data governance policies in cross-functional teams.

Software Engineer Intern

Feb 2021 – May 2021

Dilium | Milan, Italy

  • GANs Research: Conducted research on GANs for deepfake generation, emphasizing ethical considerations.
  • Training & Awareness: Led training sessions on emerging risks of AI technologies.

Projects

Selected personal work and achievements

NASA Space Apps Challenge 2025 Winner

Zurich — Led a team to develop an AI-driven traffic optimization platform using Reinforcement Learning and SUMO to reduce urban congestion.

View Project

Real-Time Edge Detection System

Engineered and deployed a computer vision safety system, built with TensorFlow Lite (tflite), that controls operations at two commercial car wash facilities.

Personalized BoardGames RAG

RAG application using OpenAI embeddings and Qdrant for vector search to retrieve precise game mechanics from rulebooks.

View Repo

Patient Journey Analysis

LLM-powered pipeline to extract clinical insights from unstructured patient symptom descriptions, transforming narratives into structured data.

View Repo

Skills

Technical proficiency

AI & ML

Prompt Authoring
Fine-tuning (PEFT/LoRA)
DPO
LangChain
PyTorch
Vector Databases (Qdrant)
Scikit-learn
MLflow

Data Engineering

Databricks
Apache Spark
SQL
NoSQL (MongoDB)
Airflow
GCP Composer
BigQuery

Languages & Core Tech

Python
Java
SQL
PySpark
Docker
Terraform
K8s
ArgoCD
GitHub Actions
Azure DevOps

Agent Automation

n8n

Computer Science & System Design

DSA Mastery

Passionate about data structures and algorithms, with 660+ LeetCode problems solved. I continuously sharpen my problem-solving skills to write efficient code.

View DSA Repo

System Design

Deepening expertise in scalable architecture through advanced resources:

  • Designing Data-Intensive Applications
  • Software Architecture: The Hard Parts

Certifications

Databricks Certified Data Engineer Professional

Databricks Certified Associate Developer for Apache Spark

Databricks Certified Data Engineer Associate

AWS Certified Machine Learning – Specialty

Google Cloud Professional Machine Learning Engineer

Neo4j Certified Professional

Get In Touch

Let's build something amazing together

Email Me

fioriofederico99@gmail.com

Call Me

+39 342 596 4621