NTUOSS TGIFHacks #115 – Data Cleaning & Data Scraping



Event Date 25 Sep 2020 (Fri), 06:30 PM - 08:30 PM
Venue Microsoft Teams
Organiser NTU Open Source Society (Email: ntuoss@gmail.com)


Event Info


In today's data-driven world, scraping and crawling web content to create datasets is a crucial skill to have in your portfolio. This workshop gives attendees a brief introduction to the full workflow, from scraping data off web pages to cleaning the scraped data. By the end of the workshop, attendees should have gained hands-on experience with the topic by creating their very own datasets.

Attendees will learn about the entire data collection and preparation stage of a machine learning pipeline. We will use Scrapy (a Python web-crawling framework) to scrape content from web pages and various Python libraries to preprocess the data. By the end of the workshop, attendees will have created a news headlines dataset (text data) for a sentiment analysis task and a binary image classification dataset (paintings vs. photographs).
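
To give a flavour of the cleaning step for the news headlines dataset, here is a minimal sketch. It is not the workshop's own material: it uses only the Python standard library rather than Scrapy or Pandas, and the function names (`clean_headline`, `build_dataset`) and the sample headlines are hypothetical, invented purely for illustration.

```python
import html
import re


def clean_headline(raw: str) -> str:
    """Normalise one scraped headline string."""
    text = html.unescape(raw)            # decode entities, e.g. "&amp;" -> "&"
    text = re.sub(r"<[^>]+>", "", text)  # drop leftover HTML tags
    text = re.sub(r"\s+", " ", text)     # collapse runs of whitespace
    return text.strip()


def build_dataset(raw_headlines):
    """Clean headlines, dropping empties and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for raw in raw_headlines:
        text = clean_headline(raw)
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


# Hypothetical scraped input, made up for this example.
sample = [
    "  Markets rally &amp; tech stocks climb\n",
    "<b>Markets rally &amp; tech stocks climb</b>",
    "",
    "Local team   wins final",
]
print(build_dataset(sample))
```

In practice the raw strings would come from a scraper's output, and a library such as Pandas could handle the de-duplication and export steps.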

 

About the Speaker:

Siddesh Sambasivam Suseela is a Year 3 EEE undergraduate student specialising in data intelligence and processing, and is currently serving as the chairperson of NTUOSS's TGIFHacks Events committee. He is deeply passionate about industrial research in deep learning and is interested in the application and deployment of large-scale machine learning models. He has contributed to several open-source projects and loves to participate in Kaggle competitions. He is currently working as a part-time Data Science intern at Shopee in the language services team.

 

Technical Prerequisites:

  • Python

Dependencies:

  • Python 3
  • Scrapy
  • Pandas
  • Matplotlib


Registration for this event has closed.