Portfolio

A selection of full-stack projects in data engineering, NLP, and real-time analytics, with published code and datasets.

Major Projects

BDNewsToday

A scalable news aggregator built with Django, Scrapy, Firebase, and MySQL. It crawls 15+ news sources every hour and serves 1K+ daily users.

  • Designed and developed a full-stack ETL pipeline to scrape news from local newspapers and YouTube.
  • Automated scraping with cron jobs and integrated Firebase for real-time push notifications.
  • Included offline browsing (cache), category-based filtering, and a user-friendly search interface.

Twitter Celebrity Matcher

Built a pipeline that finds the most similar Twitter celebrities by applying Semantic Textual Similarity to 2.5 million+ tweets scraped from 900+ users, using Transformers, Streamlit, and FastAPI.
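
The matching step can be illustrated with a minimal sketch. It assumes each account has already been reduced to a single embedding vector (for example, by averaging sentence embeddings of its tweets) and that matches are ranked by cosine similarity; the function names, account names, and toy vectors are hypothetical and are not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_celebrities(user_vec, celebrity_vecs, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(user_vec, vec))
        for name, vec in celebrity_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" (hypothetical); real sentence embeddings
# usually have hundreds of dimensions.
celebrity_vecs = {
    "celebrity_a": [0.9, 0.1, 0.0],
    "celebrity_b": [0.1, 0.9, 0.2],
    "celebrity_c": [0.5, 0.5, 0.5],
}
matches = rank_celebrities([1.0, 0.0, 0.1], celebrity_vecs, top_k=2)
for name, score in matches:
    print(f"{name}: {score:.3f}")
```

In a setup like this, the celebrity vectors would typically be computed once and cached, so a query only needs one new embedding plus the ranking pass.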

PulsePoint Incidents Monitoring System

A real-time emergency monitoring tool powered by an AWS-based scraping pipeline and geospatial analysis.

  • Scraped live emergency data from PulsePoint web logs using Selenium and third-party APIs.
  • Built a concurrent ETL pipeline using AWS Batch, S3, and Lambda to automate and scale the scraping process.
  • Developed a near real-time dashboard with DRF and JavaScript to monitor emergency events.
  • Performed geospatial analysis using Pandas, Folium, and clustering algorithms (e.g., HCA).
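
As a rough illustration of the clustering step, the sketch below groups incident coordinates with single-linkage agglomerative clustering over great-circle (haversine) distance. It assumes HCA here means hierarchical clustering; the linkage choice, the 5 km cutoff, and the sample coordinates are assumptions made for the example, not details from the project.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def single_linkage_clusters(points, max_km):
    """Merge the two closest clusters until the closest pair exceeds max_km."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest cluster pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the closest member pair.
                d = min(haversine_km(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > max_km:
            break
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical incident coordinates: two near Los Angeles, two near Seattle.
incident_points = [
    (34.05, -118.25), (34.06, -118.24),
    (47.61, -122.33), (47.62, -122.35),
]
clusters = single_linkage_clusters(incident_points, max_km=5.0)
print(clusters)
```

This brute-force version recomputes pairwise distances on every merge, which is fine for a handful of points; for larger incident sets, a library implementation of hierarchical clustering would be the more practical choice.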

BD Medicine Scraper

An API-powered system for scraping and managing medicine data from Bangladesh.

  • Built an ETL pipeline and REST APIs using Scrapy, DRF, and PostgreSQL.
  • Extended Scrapy to run via Django commands and added proxy IP rotation configuration.
  • Customized the Django admin panel with features like autocomplete lookups, custom filters (e.g., alphabetical, model property), and bulk actions (e.g., CSV export).
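
Proxy rotation of the kind mentioned above is often done in Scrapy with a downloader middleware that sets `request.meta["proxy"]` before a request is sent. The following is a minimal sketch of that idea under assumed names: `RotatingProxyMiddleware`, `FakeRequest`, and the proxy URLs are hypothetical, and the project's actual configuration may differ.

```python
import itertools


class RotatingProxyMiddleware:
    """Scrapy-style downloader middleware that assigns proxies round-robin."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self._pool = itertools.cycle(proxies)

    def process_request(self, request, spider=None):
        # Scrapy's built-in HttpProxyMiddleware reads this meta key.
        request.meta["proxy"] = next(self._pool)
        return None  # None lets the request continue through the chain


class FakeRequest:
    """Stand-in for scrapy.Request so the sketch runs without Scrapy."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


middleware = RotatingProxyMiddleware([
    "http://proxy-a.example:8000",
    "http://proxy-b.example:8000",
])
requests = [FakeRequest(f"https://example.com/page/{i}") for i in range(3)]
for request in requests:
    middleware.process_request(request)

assigned = [request.meta["proxy"] for request in requests]
print(assigned)
```

In a real Scrapy project, a class like this would be enabled through the `DOWNLOADER_MIDDLEWARES` setting, with the proxy list read from the project settings rather than hard-coded.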

Published the cleaned dataset and analytics notebook on Kaggle: