🌟 Project 4: A/B Testing & Statistics on Olist Data

🧩 Context and Objectives

This one-day project explored a subset of the public Olist e-commerce dataset (Brazil) to formulate business hypotheses, extract descriptive insights, and test those hypotheses with statistical and A/B testing methods. The analysis was done in Python (pandas, numpy, plotly express, scipy.stats), and the results were delivered in an oral presentation.

📊 Data Used

Olist dataset (reduced version: orders, customers, products, payments, reviews)
Data processed in a DataFrame after exploration and cleaning
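
As an illustration of the exploration and cleaning pass, here is a minimal sketch on a toy table. The column names (order_id, price, review_score, order_purchase_timestamp) and all values are assumptions made for the example; the reduced dataset used in the project may be laid out differently.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the working table; names and values are assumptions.
orders = pd.DataFrame({
    "order_id": ["a1", "a2", "a2", "a3", "a4"],
    "price": ["59.9", "120.0", "120.0", None, "35.5"],
    "review_score": [5, 3, 3, 4, np.nan],
    "order_purchase_timestamp": [
        "2018-01-02 10:00:00", "2018-01-05 14:30:00", "2018-01-05 14:30:00",
        "2018-02-11 09:15:00", "2018-03-20 18:45:00",
    ],
})

# Inspect missing values before changing anything.
missing_per_column = orders.isna().sum()

# Fix data types: prices as numbers, timestamps as datetimes.
orders["price"] = pd.to_numeric(orders["price"], errors="coerce")
orders["order_purchase_timestamp"] = pd.to_datetime(orders["order_purchase_timestamp"])

# Drop exact duplicate rows, then rows missing fields needed for the tests.
clean = (
    orders.drop_duplicates()
    .dropna(subset=["price", "review_score"])
    .reset_index(drop=True)
)

print(missing_per_column)
print(clean)
```

Dropping rows with missing values is only one option; whether to drop or impute depends on how many rows are affected.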

🔍 Methodology

Exploration & cleaning: data types, missing values, duplicates
Descriptive statistics:

Average price / standard deviation / box plots (orders)
Distribution by product category, payment method, customer satisfaction
Analysis of delivery times (mean, median, standard deviation)
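
The descriptive statistics above can be sketched as follows. The data is made up, and the column names (price, order_purchase_timestamp, order_delivered_customer_date) are assumptions; delivery time is taken here as delivery date minus purchase date, which may not match the definition used in the notebook.

```python
import pandas as pd

# Toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "price": [20.0, 35.0, 50.0, 80.0, 115.0],
    "order_purchase_timestamp": pd.to_datetime(
        ["2018-01-01", "2018-01-03", "2018-01-10", "2018-02-01", "2018-02-15"]
    ),
    "order_delivered_customer_date": pd.to_datetime(
        ["2018-01-06", "2018-01-13", "2018-01-18", "2018-02-13", "2018-03-07"]
    ),
})

# Delivery time in whole days: delivery date minus purchase date.
df["delivery_days"] = (
    df["order_delivered_customer_date"] - df["order_purchase_timestamp"]
).dt.days

# Mean, median, and sample standard deviation (pandas uses ddof=1 by default).
summary = df[["price", "delivery_days"]].agg(["mean", "median", "std"])
print(summary)
```

With plotly express, a call such as px.box(df, y="price") would draw the matching box plot.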

Exploratory correlations:

Price vs satisfaction
Delivery time vs satisfaction
Product category vs rating
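
A possible way to compute these exploratory relationships is shown below. The source does not say which correlation coefficient was used; Spearman is chosen here as an assumption, because review scores are ordinal (1 to 5). Column names and values are invented for the example.

```python
import pandas as pd

# Toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "price": [20.0, 35.0, 50.0, 80.0, 115.0, 60.0],
    "delivery_days": [3, 5, 8, 12, 20, 25],
    "review_score": [5, 5, 4, 3, 2, 1],
    "product_category_name": ["toys", "toys", "books", "books", "garden", "garden"],
})

# Rank-based correlation between the numeric variables and the rating.
corr = df[["price", "delivery_days", "review_score"]].corr(method="spearman")

# A category is not numeric, so compare the mean rating per category instead.
rating_by_category = (
    df.groupby("product_category_name")["review_score"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)

print(corr.round(2))
print(rating_by_category)
```

A correlation found this way describes an association only; it does not show that one variable causes the other.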

A/B Testing (examples):

On-time delivery vs average satisfaction (t-test)
Payment method vs order completion rate (chi²)
Product categories vs order value (ANOVA)
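
The three tests listed above map onto scipy.stats functions roughly as in this sketch. All data is synthetic and does not reproduce the project's numbers. The group sizes, means, the "completed" flag, and the use of Welch's variant of the t-test (equal_var=False) are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)  # synthetic data, not Olist results

# 1) t-test: satisfaction for on-time vs late deliveries.
#    Welch's variant does not assume equal variances in the two groups.
on_time_scores = rng.normal(4.3, 0.8, 200)
late_scores = rng.normal(3.0, 1.2, 200)
t_stat, t_p = stats.ttest_ind(on_time_scores, late_scores, equal_var=False)

# 2) chi-square test of independence: payment method vs order completion.
payments = pd.DataFrame({
    "payment_type": ["credit_card"] * 100 + ["boleto"] * 100,
    "completed": [True] * 90 + [False] * 10 + [True] * 88 + [False] * 12,
})
table = pd.crosstab(payments["payment_type"], payments["completed"])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

# 3) one-way ANOVA: order value across three hypothetical categories.
order_values = [rng.normal(mu, 15, 80) for mu in (60, 75, 110)]
f_stat, f_p = stats.f_oneway(*order_values)

alpha = 0.05
for name, p in [("t-test", t_p), ("chi-square", chi_p), ("ANOVA", f_p)]:
    verdict = "significant" if p < alpha else "not significant"
    print(f"{name}: p = {p:.4g} ({verdict} at alpha = {alpha})")
```

Each test compares a p-value against the 0.05 threshold. A significant ANOVA only says that at least one category differs, so a post-hoc comparison would be needed to tell which one.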

📈 Results

Delivery time is significantly associated with customer satisfaction (t-test, p-value < 0.05)
Average customer rating differs across some product categories
No significant difference in order completion rate was found between payment methods (chi² test)

📌 Tools

Python (pandas, numpy, plotly, scipy.stats), Jupyter

🗣️ Presentation

Concise summary slides with visualizations and an explanation of the tests

View the notebook on Google Colab