Nirdosh Gandhi
Data Engineer • Azure Databricks • PySpark • ML Pipelines
Bengaluru, India • Databricks Certified • M.Tech in AI
Data Engineer at Axtria with 2+ years of experience designing and optimising large-scale data pipelines on Azure Databricks. M.Tech in Artificial Intelligence from Delhi Technological University. Passionate about turning complex, raw data into actionable insights and building resilient, production-grade pipelines with PySpark, Python, and Azure.
What I Bring
Data Engineering
Scalable, production-grade ETL pipelines on Azure Databricks using Medallion Architecture (Bronze → Silver → Gold), PySpark, and SQL — serving multi-market pharma analytics.
ML & AI Integration
M.Tech in Artificial Intelligence from DTU with hands-on experience delivering governed, audit-ready datasets that power ML model deployment in live production systems.
DevOps & Automation
End-to-end CI/CD pipelines using GitHub Actions and Databricks Asset Bundles — automating artifact packaging, deployment, and environment configuration across dev, staging, and prod.
Experience
- Configured and optimised Spark-based ETL pipelines for a global pharma client across 30+ markets at brand-indication level, reducing execution time by 40%. Powered an ML-driven HCP-recommendation system generating 18,000+ monthly suggestions for 300+ sales reps via Veeva CRM.
- Built the MyInsights Python pipeline — gathering requirements directly from multilingual market contacts (Italy, Germany, France, Spain) and implementing complex business rules for HCP suggestion eligibility, rep-initiated dismissals, and priority-based filtering.
- Collaborated with the MLOps team on end-to-end pipeline validation and ML model deployment. Maintained a Bronze→Silver→Gold Lakehouse on Databricks with Unity Catalog governance, CI/CD via GitHub Actions + DAB, and SQL-driven data quality monitoring across DEV/UAT/PROD — reducing manual testing by 30%.
- Stack: PySpark · Delta Lake · Databricks · Python · SQL · GitHub Actions · DAB · Unity Catalog · Veeva CRM · Azure
- Led Unity Catalog implementation for data governance and access control — delivering the organisation's first UC-enabled product and contributing directly to a 5/5 client CSAT score.
- Deployed Python wheel packages and notebooks via Azure DevOps CI/CD to onboard 40+ pharma markets, transforming omnichannel engagement survey data (webinars, portals, email, Veeva Event Management, WhatsApp) into Power BI-ready datasets. Built and maintained Azure Data Factory pipelines with schedule-based triggers for market-brand refresh cycles, managing secret rotation and environment health across DEV/PROD.
- Developed SQL-driven data quality validation queries and maintained infrastructure parity by synchronising ARM templates from PROD to DEV.
- Stack: Unity Catalog · Azure DevOps · Azure Data Factory · Python · SQL · Power BI · Databricks · ARM Templates
- Consolidated pharmaceutical engagement data from different source systems across multiple input files using Python and SQL, producing clean analytical datasets that fed downstream KPI reporting.
- Gained hands-on exposure to Databricks and production ETL tooling — contributing to live client deliverables.
- Full-cycle EDA on an Indian road accident dataset using NumPy, Pandas, Matplotlib, Seaborn & Tableau, identifying severity patterns for stakeholder review.
- Trained ML classifiers to predict accident severity: 83.3% accuracy with Logistic Regression. Deployed on Heroku. GitHub ↗
- Built a News Summarizer web app using Django and NLP (TF-IDF) — fetches, summarises & categorises news in 12 languages based on user country preference. Deployed on Heroku. GitHub ↗
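As an illustration of the TF-IDF idea behind the News Summarizer, here is a minimal sketch of extractive summarisation in plain Python. It is an assumption-laden example, not the app's actual code: the function names `tokenize` and `summarise`, the sentence-splitting regex, and the smoothed IDF formula are all chosen here for illustration. Each sentence is treated as a "document" and scored by the average TF-IDF weight of its terms.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarise(text, max_sentences=1):
    """Return the top-scoring sentences, kept in their original order.

    Hypothetical helper: sentences are scored by the sum of
    (term frequency * smoothed inverse document frequency).
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    tokenised = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: in how many sentences does each term appear?
    df = Counter(term for tokens in tokenised for term in set(tokens))

    def score(tokens):
        if not tokens:
            return 0.0
        tf = Counter(tokens)
        return sum(
            (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        )

    # Rank sentence indices by score, then restore reading order.
    ranked = sorted(range(n), key=lambda i: score(tokenised[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]
```

Calling `summarise(article_text, max_sentences=3)` would return up to three sentences from the article; a production summariser would likely add stop-word removal and language-specific tokenisation.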
Education
· Gajera Vidyabhavan (10th)
Tech Stack
Certifications
Projects
Databricks End-to-End Data Product
End-to-end Databricks data product platform, from CSV ingestion through medallion layers to ML segmentation and an AI dashboard. Implemented DLT ETL with quality checks, deployed via MLflow, and served via FastAPI with GitHub Actions CI/CD.
- Stack: Databricks, DLT, MLflow, FastAPI
Skin Cancer Detection using CNN
Fine-tuned ResNet-50 (TensorFlow & Keras) to classify malignant vs. benign dermoscopy images. Achieved 92.4% accuracy — demonstrating CNNs in medical imaging diagnostics.
- Stack: TensorFlow, Keras, Python
Gandhi Art Drawing
Full-stack Django web app promoting a family art business, with role-based access (admin & customer). Drives ~400 visits/month and delivered a 40% increase in sales. Deployed on PythonAnywhere.
- Stack: Django, Bootstrap, PythonAnywhere
LRU / LFU Cache Visualizer
Cache algorithm visualizer using hash maps & linked lists to optimise memory management. Graphics rendered with the C++ SFML library illustrate get/put operations in real time.
- Stack: C++, DSA, OS, SFML
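The LRU half of the visualizer can be sketched in a few lines. This is a hedged illustration in Python rather than the project's C++ code: it leans on `collections.OrderedDict` (itself a hash map backed by a linked list) instead of a hand-written list, and the class name `LRUCache` and its `keys` helper are assumptions made for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache with O(1) get and put (illustrative sketch)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # The end of the OrderedDict is treated as "most recently used".
        self._items = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a miss; a hit refreshes recency."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        """Insert or update a key, evicting the least recently used entry when full."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry

    def keys(self):
        """Keys ordered from least to most recently used."""
        return list(self._items)
```

With a capacity of 2, putting `a` and `b`, reading `a`, then putting `c` evicts `b`, because `a` was touched more recently. An LFU variant would additionally track an access count per key and evict the smallest count.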
AI Email Classifier
Classifies unread Gmail messages fetched over the IMAP4 protocol, using TF-IDF feature extraction and an SGDClassifier tuned with GridSearchCV. Sorts mail into 20 categories with 76.6% accuracy.
- Stack: SGDClassifier, Flask, IMAP4
G-Home Assistant
NodeMCU-based IoT model that controls home appliances through Google Assistant voice commands or a Blynk app button, integrating the IFTTT and Adafruit services.
- Stack: IFTTT, Blynk, NodeMCU
Achievements & Publications
Google Scholar ↗
AIP Conference
IEEE Xplore
Contact
Lotus feet of H.D.H. Prabodhjivan swamiji