Celebrity Classifier

December 8, 2024 - 2 mins read


This machine learning project classifies images of five actors: Emily Blunt, John David Washington, Michelle Yeoh, Saoirse Ronan, and Tony Leung. It covers the main stages of a data science workflow: data collection, data cleaning, feature engineering, model building, model fine-tuning, and model deployment.

Data collection used Fatkun, a Google Chrome extension for bulk-downloading images from web pages, to gather images of the celebrities from sources such as Google Images and Bing Images. Fatkun lets users select and save multiple images in a few clicks. It was chosen for its ease of use and time efficiency: it requires no coding and keeps the data collection workflow simple.

Data cleaning involved two tasks: face detection and image resizing. Face detection used OpenCV to locate the celebrities' faces and crop them out of the images. The cropped faces were then resized to a standard 32 x 32 pixels. Feature engineering combined wavelet transforms and color histograms. Wavelet transforms, computed with PyWavelets, extracted edge and texture features from grayscale face images. Color histograms, computed with OpenCV, summarized the color distribution of the color face images.

Three models were built: an SVM, logistic regression, and a random forest. Each was trained on the extracted features, with the celebrity names as labels. Model fine-tuning optimized the hyperparameters of each model using grid search with cross-validation: grid search enumerated candidate hyperparameter combinations, and cross-validation scored each combination using accuracy as the metric.
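A minimal sketch of this tuning loop with scikit-learn is shown below. The post does not list the hyperparameter grids, the number of folds, or any preprocessing, so the grids, `cv=5`, and the scaling step are assumptions, and synthetic data from `make_classification` stands in for the real face features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the extracted features: 5 classes, one per actor.
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=10, n_classes=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search spaces; the project's actual grids are not given.
candidates = {
    "svm": (
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [1, 10], "svc__kernel": ["linear", "rbf"]},
    ),
    "logistic_regression": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1, 10]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Grid search tries every combination; 5-fold CV scores each by accuracy.
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "test_accuracy": search.score(X_test, y_test),
    }

for name, summary in results.items():
    print(name, summary)
```

Keeping a held-out test split separate from the cross-validation folds gives a less optimistic accuracy estimate than the best cross-validation score alone.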

The project resulted in trained models that can classify new images of the five actors. Overall, it showcases the complete pipeline of a data science and machine learning project, from data collection and cleaning to feature engineering, model building, and fine-tuning.

The project can be found on GitHub.