Portfolio
A collection of end-to-end projects in data engineering, MLOps, and natural language processing, featuring scalable architectures, published code, and open datasets.
Major Projects
LLM-Ghostwriter
MLOps-driven LLM assistant with an FTI pipeline (Features → Training → Inference) that learns writing style from user content.
- Integrates RAG for context grounding, ZenML for pipeline orchestration, MongoDB as the data warehouse, and Qdrant as the vector store for retrieval (see the retrieval sketch after this list).
- Curated instruction and preference datasets for fine-tuning.
- Fine-tuned LLMs with TRL and Unsloth using PEFT (LoRA), applying SFT and DPO to produce a preference-aligned model (a fine-tuning sketch also follows this list).
- Implemented custom LLM-as-a-judge evaluation framework and a REST inference service.
- Used Comet ML for experiment tracking and Opik for prompt monitoring.
- Built with Domain-Driven Design for modularity and maintainability.
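The retrieval step referenced above can be illustrated with a minimal sketch. It assumes the `qdrant-client` and `sentence-transformers` packages; the collection name, sample documents, and embedding model are placeholders, not the project's actual configuration.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedder (384-dim)
client = QdrantClient(":memory:")                   # in-memory instance for the sketch

client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Index a couple of placeholder documents with their embeddings.
docs = ["A post about MLOps pipelines.", "Notes on vector databases."]
client.upsert(
    collection_name="articles",
    points=[
        PointStruct(id=i, vector=embedder.encode(doc).tolist(), payload={"text": doc})
        for i, doc in enumerate(docs)
    ],
)

# Retrieve the nearest documents and assemble grounding context for the prompt.
hits = client.search(
    collection_name="articles",
    query_vector=embedder.encode("How are ML pipelines orchestrated?").tolist(),
    limit=2,
)
context = "\n".join(hit.payload["text"] for hit in hits)
```

In the FTI layout, a step like this sits in the inference pipeline, querying vectors produced by the feature pipeline.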
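The fine-tuning stage can be sketched with TRL and PEFT along these lines; the base model, dataset file, and hyperparameters are illustrative, and the DPO stage that follows SFT would use TRL's `DPOTrainer` analogously.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder instruction dataset with a "text" column of formatted examples.
dataset = load_dataset("json", data_files="instruction_dataset.json", split="train")

peft_config = LoraConfig(
    r=16,                             # illustrative LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B",  # placeholder base checkpoint
    train_dataset=dataset,
    peft_config=peft_config,          # trains LoRA adapters, not the full weights
    args=SFTConfig(output_dir="sft-output", per_device_train_batch_size=2),
)
trainer.train()
```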
Restaurant Menu Pricing
Built a production-grade Azure MLOps pipeline for menu price prediction using 5M+ crawled menu items.
- Automated ETL (web crawling) with Scrapy.
- Extracted menu entities with a RoBERTa-based NER (token-classification) model (see the extraction sketch after this list).
- Trained and tuned models using scikit-learn, TensorFlow, Optuna, and MLflow on Azure Machine Learning (a tuning sketch also follows).
- Deployed on Azure ML with IaC (ARM templates), canary rollouts, and full observability via Azure Monitor/Application Insights.
- Includes a lightweight Poetry CLI for ETL, training, and inference.
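The entity-extraction step can be sketched with the `transformers` pipeline API. The public checkpoint below is only a stand-in for the project's fine-tuned RoBERTa menu model, and the example string is illustrative.

```python
from transformers import pipeline

# Public stand-in checkpoint; the real pipeline loads the fine-tuned menu model.
ner = pipeline(
    "token-classification",
    model="Jean-Baptiste/roberta-large-ner-english",
    aggregation_strategy="simple",  # merge sub-word tokens into whole entities
)

for entity in ner("Grilled salmon with lemon butter, served at Luigi's Trattoria"):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```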
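The training-and-tuning loop can be sketched as an Optuna study with nested MLflow runs; the synthetic data, model family, and search space are placeholders, not the project's actual setup.

```python
import mlflow
import optuna
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the engineered menu-item features.
X, y = make_regression(n_samples=500, n_features=10, random_state=0)

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 3, 15),
    }
    with mlflow.start_run(nested=True):  # one tracked run per trial
        score = cross_val_score(
            RandomForestRegressor(**params, random_state=0),
            X, y, cv=3, scoring="neg_mean_absolute_error",
        ).mean()
        mlflow.log_params(params)
        mlflow.log_metric("neg_mae", score)
    return score

with mlflow.start_run(run_name="menu-price-tuning"):
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)
```

On Azure Machine Learning, the MLflow tracking URI points at the workspace, so the same logging code lands in Azure ML experiment tracking.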
BDNewsToday
A scalable news aggregator built with Django, Scrapy, Firebase, MySQL, and AWS that crawls 20+ news sources every hour.
- Designed and developed an end-to-end ETL pipeline to crawl news from local newspapers and YouTube (a minimal spider sketch follows this list).
- Automated crawling with cron jobs and integrated Firebase for real-time push notifications.
- Included offline browsing (cache), category-based filtering, and a user-friendly search interface.
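A minimal spider in the spirit of the crawler above; the domain, CSS selectors, and field names are hypothetical.

```python
import scrapy

class NewsSpider(scrapy.Spider):
    name = "news"
    start_urls = ["https://example-newspaper.com/latest"]  # hypothetical source

    def parse(self, response):
        # Yield one item per headline; selectors depend on the target site's markup.
        for article in response.css("article"):
            yield {
                "title": article.css("h2::text").get(),
                "url": response.urljoin(article.css("a::attr(href)").get()),
            }
```

Run it with `scrapy runspider news_spider.py -o items.json`; in the deployed pipeline, cron triggers the crawls and the stored items feed the Firebase notification step.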
PulsePoint Incidents Monitoring System
A real-time emergency monitoring tool powered by an AWS-based crawling pipeline and geospatial analysis.
- Crawled live emergency data from PulsePoint web logs using Selenium and third-party APIs.
- Built a concurrent ETL pipeline using AWS Batch, S3, and Lambda to automate and scale the crawling process (see the trigger sketch after this list).
- Developed a near-real-time dashboard with Django REST Framework (DRF) and JavaScript to monitor emergency events.
- Performed geospatial analysis with Pandas, Folium, and clustering algorithms such as K-means and hierarchical clustering (HCA); a clustering sketch also follows.
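The Lambda-to-Batch hand-off can be sketched as follows; the job queue, job definition, and event shape are placeholders for the project's actual resources.

```python
import boto3

batch = boto3.client("batch")

def handler(event, context):
    """Fan out one crawl job per target agency listed in the event payload."""
    agencies = event.get("agencies", [])
    for agency in agencies:
        batch.submit_job(
            jobName=f"pulsepoint-crawl-{agency}",
            jobQueue="crawl-queue",              # placeholder job queue
            jobDefinition="pulsepoint-crawler",  # placeholder job definition
            containerOverrides={
                "environment": [{"name": "AGENCY_ID", "value": agency}]
            },
        )
    return {"submitted": len(agencies)}
```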
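And a small sketch of the geospatial step: cluster incident coordinates with K-means and render them on a Folium map. The coordinates and cluster count are illustrative.

```python
import folium
import pandas as pd
from sklearn.cluster import KMeans

# Placeholder incident coordinates; the real data comes from the crawl output.
incidents = pd.DataFrame(
    {"lat": [40.71, 40.73, 40.61, 40.62], "lon": [-74.00, -73.99, -73.95, -73.96]}
)
incidents["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    incidents[["lat", "lon"]]
)

# Color each incident by its cluster and save an interactive map.
m = folium.Map(location=[incidents.lat.mean(), incidents.lon.mean()], zoom_start=11)
palette = ["red", "blue"]
for _, row in incidents.iterrows():
    folium.CircleMarker(
        location=[row.lat, row.lon],
        radius=6,
        color=palette[int(row.cluster)],
        fill=True,
    ).add_to(m)
m.save("incident_clusters.html")
```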
BD Medicine Scraper
An API-powered system for crawling and managing medicine data from Bangladesh.
- Built an ETL pipeline and REST APIs using Scrapy, DRF, and PostgreSQL.
- Extended Scrapy to run via Django management commands and added configurable proxy IP rotation (see the command sketch below).
- Customized the Django admin panel with autocomplete lookups, custom filters (e.g., alphabetical, by model property), and bulk actions such as CSV export (an export-action sketch also follows).
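The Scrapy-from-Django integration can be sketched as a management command; the spider name and the file placement (`management/commands/crawl_medicine.py`) are assumptions about the layout.

```python
from django.core.management.base import BaseCommand
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class Command(BaseCommand):
    help = "Run the medicine spider from Django (hypothetical spider name)"

    def handle(self, *args, **options):
        # Loads the Scrapy project's settings (via SCRAPY_SETTINGS_MODULE),
        # including any proxy-rotation middleware configured there.
        process = CrawlerProcess(get_project_settings())
        process.crawl("medicine")  # spider name registered in the Scrapy project
        process.start()            # blocks until the crawl finishes
```

Invoked as `python manage.py crawl_medicine`, this lets cron or CI reuse Django's configuration and ORM for the crawl.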
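A bulk CSV-export action like the one mentioned above, in sketch form; the `MedicineAdmin` class and its fields are placeholders for the actual schema.

```python
import csv

from django.contrib import admin
from django.http import HttpResponse

@admin.action(description="Export selected rows as CSV")
def export_as_csv(modeladmin, request, queryset):
    # Write the selected rows, one CSV column per model field.
    field_names = [f.name for f in queryset.model._meta.fields]
    response = HttpResponse(content_type="text/csv")
    response["Content-Disposition"] = "attachment; filename=export.csv"
    writer = csv.writer(response)
    writer.writerow(field_names)
    for obj in queryset:
        writer.writerow([getattr(obj, name) for name in field_names])
    return response

class MedicineAdmin(admin.ModelAdmin):  # placeholder ModelAdmin
    actions = [export_as_csv]
    search_fields = ["name"]            # illustrative autocomplete hook
```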
Published the cleaned dataset and analytics notebook on Kaggle: