I am a Data Scientist with expertise in SQL, Python, and data visualization, experienced in collaborating within multidisciplinary teams.

I am proficient at gathering requirements, automating workflows, and delivering actionable insights.

Passionate about data, I also work on analytics and machine learning projects in my spare time and regularly take online courses to learn new subjects.

Projects

(from newest to oldest)

Price Model deployed on GCP using Docker

Code

Created a Gradio web app, containerized it with Docker, registered the image in Google Artifact Registry, and deployed it on Google Cloud Run.

Solving Classic Childhood Games with OpenCV and scikit-image

Explored image processing with Python by solving childhood puzzles.

Using OpenCV and scikit-image, spotted differences in images by comparing their structural similarity, solved mazes through basic image transformations, and detected objects regardless of scale via template matching.

Spot the differences (report)

Solve the maze (report)

Find the object (report)

Code (on the right panel)

Gender Classification with BiLSTM and Attention Mechanism (PyTorch)

Kaggle Notebook

Classified names by gender with a bidirectional LSTM combined with a simple attention mechanism, implemented in PyTorch.

The dataset consists of first names, which are encoded into integer sequences. The model processes these sequences to predict gender, with the output being a probability of male or female.

Performance is evaluated using metrics like accuracy, F1 score, and ROC AUC.
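As a minimal illustration of the encoding step described above (not the notebook's code), the sketch below maps characters to integers and pads each name to a fixed length. The reserved indices, the function names, and the example names are assumptions made for this example.

```python
from typing import Dict, List

PAD = 0  # index reserved for padding (assumption for this sketch)
UNK = 1  # index reserved for characters missing from the vocabulary (assumption)


def build_vocab(names: List[str]) -> Dict[str, int]:
    """Map each distinct lowercase character to an integer, starting at 2."""
    chars = sorted({ch for name in names for ch in name.lower()})
    return {ch: idx for idx, ch in enumerate(chars, start=2)}


def encode(name: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode a name as a fixed-length integer list, truncating or right-padding."""
    ids = [vocab.get(ch, UNK) for ch in name.lower()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Made-up example names, not taken from the dataset.
names = ["Ana", "Bob"]
vocab = build_vocab(names)
encoded = [encode(n, vocab, max_len=5) for n in names]
```

Fixed-length integer sequences like these can then be fed to an embedding layer followed by the recurrent layers.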

Feedback Submission Platform with Streamlit, EC2, and S3 Integration

Code

Developed a Streamlit web application hosted on an AWS EC2 instance. The app presents users with a table and prompts them to enter their name and opinion for each row. Once users are satisfied with their inputs, they can submit the data using a submit button.

Upon submission, the data is saved to a pre-configured Amazon S3 bucket when running on EC2. If S3 is inaccessible, the data is stored locally as a fallback.
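The fallback behaviour can be sketched as follows. This is an illustration under stated assumptions, not the app's code: `save_feedback`, its parameters, and the file name are hypothetical, and the upload step is passed in as a callable so the example runs without AWS credentials. In a real deployment that callable could wrap a boto3 S3 client call.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def save_feedback(
    rows: List[Dict[str, str]],
    upload: Callable[[str, bytes], None],
    key: str,
    fallback_dir: Path,
) -> str:
    """Try the remote upload first; on any failure, write the payload locally."""
    payload = json.dumps(rows).encode("utf-8")
    try:
        upload(key, payload)  # hypothetical hook, e.g. a wrapper around an S3 upload
        return "s3"
    except Exception:
        # Broad catch is deliberate in this sketch: any upload error triggers the fallback.
        fallback_dir.mkdir(parents=True, exist_ok=True)
        (fallback_dir / key).write_bytes(payload)
        return "local"


def unreachable_upload(key: str, body: bytes) -> None:
    """Stand-in uploader that simulates S3 being inaccessible."""
    raise ConnectionError("S3 is not reachable")


with tempfile.TemporaryDirectory() as tmp:
    rows = [{"name": "Ana", "opinion": "ok"}]  # made-up submission
    where = save_feedback(rows, unreachable_upload, "feedback.json", Path(tmp))
    saved = json.loads((Path(tmp) / "feedback.json").read_text(encoding="utf-8"))
```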

Understanding Diabetes Incidence: Feature Selection, Hyperparameter Tuning and Model Evaluation

Report

Code

Integrated a medical dataset to analyze diabetes incidence, performed feature selection, and tuned hyperparameters of Decision Tree, Logistic Regression, and XGBoost models using coarse-to-fine and Bayesian optimization approaches to enhance recall scores.

Optimizing Traffic Flow in New York City: Predictive Modeling and Strategic Insights from Telemetry Data

Report

Code

Integrated a dataset containing telemetry data from two taxi drivers in New York with geographical data, developed Random Forest, Stochastic Gradient Boosting, and XGBoost models to predict trip duration, and provided recommendations for New York City and the taxi company.

Post-Pruned Trees and Gradient Boosting to Predict Cardiovascular Disease

Kaggle Notebook

Used post-pruned trees and gradient boosting to predict cardiovascular disease in a small, well-balanced dataset with few features.

If the competition were still ongoing, this solution would have placed 3rd, with an accuracy of 0.72666.

Customer Transactions - Data Bank (RFM Analysis)

Kaggle Notebook

Brief analysis of a customer transactions dataset, including RFM (Recency, Frequency, Monetary) analysis to estimate churn likelihood.
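For readers unfamiliar with RFM, the sketch below computes the three values per customer from a list of transactions. It is only an illustration: the transactions, the reference date, and the function name `rfm` are made up and do not come from the notebook.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Made-up transactions: (customer_id, transaction_date, amount).
transactions: List[Tuple[str, date, float]] = [
    ("c1", date(2024, 1, 5), 50.0),
    ("c1", date(2024, 3, 1), 20.0),
    ("c2", date(2024, 1, 20), 200.0),
]


def rfm(rows: List[Tuple[str, date, float]], as_of: date) -> Dict[str, Dict[str, float]]:
    """Return recency (days since last transaction), frequency, and monetary total per customer."""
    grouped = defaultdict(list)
    for customer_id, day, amount in rows:
        grouped[customer_id].append((day, amount))

    table = {}
    for customer_id, items in grouped.items():
        last_day = max(day for day, _ in items)
        table[customer_id] = {
            "recency_days": (as_of - last_day).days,
            "frequency": len(items),
            "monetary": sum(amount for _, amount in items),
        }
    return table


table = rfm(transactions, as_of=date(2024, 3, 31))
```

Customers with high recency and low frequency are the usual candidates for churn follow-up, although the exact thresholds depend on the business.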

Data Competition - Was a Website Redesign Successful?

Code

My submission to the competition “Was a Website Redesign Successful?” evaluates four website designs using A/B testing, recommending the most effective one with 95% confidence and including probabilities for Type I and Type II errors.

It also serves as a tutorial on A/B testing using z-scores and power analysis.
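A two-proportion z-test of the kind used in A/B testing can be sketched with the standard library alone. This is a minimal example with made-up conversion counts, not the analysis from the submission, and it covers only the z-test, not the power analysis. The function name and parameters are assumptions.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return the z-score and two-sided p-value for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both designs convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: 100 of 1000 visitors convert on design A, 130 of 1000 on design B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
significant = p_value < 0.05  # 95% confidence level
```

With more than two designs, pairwise tests like this one usually need a correction for multiple comparisons.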

Data Cleaning and Translation Project - Retail Turnover Index in Portugal

Code

Cleaned a dataset from Instituto Nacional de Estatística by imputing missing data and splitting string columns into multiple fields for clarity; translated Portuguese values to ensure standardization, dropped unnecessary columns, renamed fields, and sorted the data.

Tableau Viz - Happiness Around the World

Tableau Viz

My first Tableau Viz, showcasing an analysis of 2021 happiness scores across various countries. Data sourced from Kaggle.

DataCamp Data Analyst Capstone - Boat Listings Newsletter

Code and Final Report

Capstone of the DataCamp Data Analyst Certification, aimed at improving boat listing views through a newsletter, based on the features that popular listings have in common.

I was given 24 hours to prepare an 8-slide presentation for the Head of Marketing (non-technical), which I then presented during a 10-minute video call.

Lisbon Vacation Rentals: Data Scraping and Visualization

Code

Tableau Viz

Scraped vacation rental listings from Imovirtual using Python and BeautifulSoup. Extracted and cleaned the data with Pandas and Tableau Prep. Built the visualization in Tableau using sunburst, butterfly, and circle grid charts.

IBM Data Science Capstone Project - Opening a Japanese restaurant in Madrid

Code and Final Report

Final assignments from the IBM Data Science course, including the capstone project.

The capstone project involved identifying prime locations for a new Japanese restaurant in Madrid using Foursquare data, with sites clustered and evaluated via K-means.

Files are named by course and include additional details as needed (e.g., report).

Certificates

I have earned over 35 certificates from online courses, accumulating more than 400 hours of self-study.

My training spans Data Engineering, Data Analysis, Data Science, and Python Programming.

Data Engineer Associate Certification

Certificate (1)

Timed assessments and coding challenges in SQL and Python, focusing on data management and data engineering.

Data Analyst Certification

Certificate (1)

Timed assessments and coding challenges in PostgreSQL and Python, focused on exploratory analysis, data reporting, and core statistical techniques like A/B testing.

IBM Data Science Track

Certificate (1)

9-course track ranging from defining business problems to deploying machine learning solutions, totaling 240 hours.

Analyst Learning Path Certification

Certificates (5)

Series of Tableau courses offered through Tableau eLearning, focused on data preparation, analysis, and reporting using Tableau and Tableau Prep Builder.

Python Programming Certificates

Certificates (10)

Certified courses in Python programming, totaling 43 hours and focusing on best practices and testing frameworks.

Data Engineering Certificates

Certificates (5)

Certified courses in Data Engineering, totaling 22 hours and focusing on query performance, database design, and cloud computing principles.

Tableau Certificates

Certificates (5)

Certified Tableau dashboard creation courses, totaling 23 hours, covering topics from data source connections to a capstone project.

General Data Topics Certificates

Certificates (8)

Certified Data Science courses, totaling 19 hours, covering general topics, including business applications and programming paradigm theory.

Qualitative Methodology in Scientific Research

Certificate (1)

2-ECTS course focusing on classification and analysis of qualitative data.