Portfolio

Major Projects

BDNewsToday

Designed and developed a news aggregator app that crawls news from local newspapers (15+ sites/hour) and YouTube, built with Django, Scrapy, Firebase, and MySQL (16k+ downloads, 1k+ daily users). The ETL pipeline automates scraping with a cron job and sends push notifications via Firebase. Users can search and filter newspapers, browse news by category, and use the save and offline-browsing (cache) features.

Twitter Celebrity Matcher

Built a pipeline that finds the most similar Twitter celebrity accounts by applying Semantic Textual Similarity to 2.5 million+ scraped tweets from 900+ users, using Transformers, Streamlit, and FastAPI.
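
A minimal sketch of the ranking step behind a similarity match, under stated assumptions: each account is represented by the mean of its tweet embeddings, and accounts are ranked by cosine similarity to a query vector. The vectors below are toy values; in the project they would come from a Transformer model. The function names, the mean-pooling choice, and the account names are illustrative and not taken from the project code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Average a list of equal-length vectors into one vector (assumed pooling)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def top_matches(query_vec, celebrity_vecs, k=3):
    """Return the k (name, score) pairs with the highest cosine similarity."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in celebrity_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "tweet embeddings" per hypothetical celebrity account.
celebrity_tweets = {
    "celeb_a": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "celeb_b": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
    "celeb_c": [[0.0, 0.0, 1.0]],
}
celebrity_vecs = {name: mean_vector(v) for name, v in celebrity_tweets.items()}

# Toy embeddings for the account being matched.
user_vec = mean_vector([[0.9, 0.1, 0.0], [1.0, 0.0, 0.0]])
matches = top_matches(user_vec, celebrity_vecs, k=2)
```

Here `matches` lists `celeb_a` first, since its averaged vector points in nearly the same direction as the query vector.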

The scraped dataset and my analytics notebook are also available on Kaggle:

PulsePoint Incidents Monitoring System

Developed an ETL pipeline that scrapes data from PulsePoint web logs using Selenium, Django, GCP, and third-party APIs, and automated the process, using concurrency to optimize and scale the scraping. Built a real-time dashboard with Django and Bootstrap for monitoring incident data.
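
A rough sketch of how scraping many sources concurrently might look, assuming a thread pool is the concurrency mechanism (the project description does not say which one was used). `fetch_incidents` is a hypothetical stand-in for the real Selenium or API call, and the agency IDs are made up; it returns canned data so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_incidents(agency_id):
    """Hypothetical fetcher: stands in for a Selenium or API request."""
    return {
        "agency": agency_id,
        "incidents": [f"{agency_id}-incident-{i}" for i in range(2)],
    }


def scrape_all(agency_ids, fetch=fetch_incidents, max_workers=4):
    """Run `fetch` for every agency in parallel threads.

    Returns (results, errors): successful payloads and error messages,
    both keyed by agency ID, so one failing source does not stop the rest.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, agency): agency for agency in agency_ids}
        for future in as_completed(futures):
            agency_id = futures[future]
            try:
                results[agency_id] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[agency_id] = str(exc)
    return results, errors


results, errors = scrape_all(["A1", "B2", "C3"])
```

Threads suit this kind of I/O-bound work because most of the time is spent waiting on responses rather than computing.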

An extensive geospatial analysis of the dataset - Kaggle notebook

BD Medicine Scraper

Scraped medicine data using Scrapy and integrated it with DRF and PostgreSQL. Designed the pipeline to maintain relationships between data models. Customized the Django admin panel with features like autocomplete lookups, custom filters (e.g., alphabetical, model property), and bulk actions (e.g., CSV export). Also extended Scrapy to run spiders via Django management commands and configured proxy settings.
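
A small sketch of the serialization step that a CSV-export bulk action could rely on, written with the standard library only so it runs without a Django project. The `Medicine` class, its fields, and the sample rows are hypothetical placeholders, not the project's actual models or data; in an admin action, the selected queryset would take the place of the `medicines` list.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Medicine:
    """Hypothetical stand-in for a Django model instance."""
    name: str
    generic: str
    manufacturer: str


def rows_to_csv(records, field_names):
    """Serialize objects to CSV text: one header row, then one row per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(field_names)
    for record in records:
        # Missing attributes become empty cells instead of raising.
        writer.writerow([getattr(record, name, "") for name in field_names])
    return buffer.getvalue()


# Placeholder rows; the second field set includes a comma to show quoting.
medicines = [
    Medicine("Medicine A", "Generic X", "Maker One, Ltd."),
    Medicine("Medicine B", "Generic Y", "Maker Two"),
]
csv_text = rows_to_csv(medicines, ["name", "generic", "manufacturer"])
```

Using `csv.writer` rather than joining strings by hand means values that contain commas or quotes are escaped correctly.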

The scraped dataset and my analytics notebook (cleaning, wrangling, visualization) are available on Kaggle: