Luxury

Fragrance

Intelligence

An end-to-end data pipeline reverse-engineering Fragrantica community reviews to uncover the hidden mathematical patterns of olfactory preferences. From raw web scrapes to executive-ready market intelligence.

60,000+

Validated Perfumes

1,000+

Global Brands

100%

Rating Bias Corrected

Luxury perfume bottle with golden cap on dark background
Project Links

Tech Stack

PythonSQLPower BIPandasRegex
Case Study - 2026
Scroll

Where Scent Meets Data Science

The global fragrance market is saturated, but what truly drives consumer loyalty? This project moves beyond aggressive marketing to dissect why a scent sells. The mission is to uncover correlations between scent profiles (accords) and true consumer consensus, identifying high-performing product structures by stripping away the noise of sheer marketing volume.

Fragrantica.com

Data Source

60,000+

Perfumes Validated

1,000+

Global Brands

2018 - 2025

Time Period

Project Objectives

01

Identify which fragrance notes drive the highest consumer ratings across luxury houses

02

Build a weighted rating model that corrects for review volume bias in raw averages

03

Segment consumer preferences by fragrance family (Fresh, Warm, Oriental, Woody)

04

Surface actionable product positioning recommendations via Power BI dashboard

Python Pipeline
Architecture

The raw dataset contained over 70,000 loosely structured records with nested arrays and heavily concatenated text.

raw
clean

Regex Alchemy

Structured top-heart-base notes from raw text using regex, while cleaning product identities by removing URL noise and extracting brand signals.

pattern = r'(?:top notes?|heart notes?|base notes?):\s*([^\n]+)'
df['notes_structured'] = df['raw_notes'].str.extract(pattern, expand=False)

Array Un-nesting

Flattened nested note arrays into row-level format, enabling granular analysis of individual accords driving preferences.

df_exploded = df.explode('note_array').reset_index(drop=True)
df_clean = df_exploded.dropna(subset=['note_array'])

Type Casting & Normalization

Standardized mixed data types (ratings, dates, brands) to ensure consistency for downstream analysis.

df['rating'] = pd.to_numeric(df['rating'], errors='coerce')
df['review_date'] = pd.to_datetime(df['review_date'])
df['brand'] = df['brand'].astype('category')

Strategic Imputation

Handled missing values with rule-based imputation, preserving dataset scale and analytical integrity.

df[text_cols] = df[text_cols].fillna('Unknown')
df['rating'] = df['rating'].fillna(0.0)

Uncovering True Consensus & Bias Correction

Naive averages distort rankings: low-vote niche perfumes can outperform globally validated fragrances. To address this, I implemented a Bayesian-style Weighted Rating model in SQL Server. The minimum vote threshold (m) was dynamically derived using PERCENTILE_CONT(0.9), ensuring fair ranking across both niche and high-volume products.

Exploratory Data Analysis

4.12

Mean raw rating across all reviews

±0.31

Standard deviation - tight clustering

847

Reviews with <10 votes flagged as biased

3.4x

Chanel reviewed more than any other house

Weighted Rating Formula

WR = (v / v+m) x R + (m / v+m) x C
v = vote count
m = min votes (250)
R = raw average
C = global mean (4.12)
Weighted Rating Model
SQL
-- Top 20 Fragrances (Weighted Rating)
DECLARE @C FLOAT = (
SELECTSELECT AVG(CAST(rating_value ASAS FLOAT))
FROMFROM perfumes
WHEREWHERE rating_value IS NOTNOT NULL
);
DECLARE @m FLOAT = (
SELECTSELECT PERCENTILE_CONT(0.9)
WITHIN GROUP (ORDER BYORDER BY rating_count) OVER ()
FROMFROM perfumes
WHEREWHERE rating_count IS NOTNOT NULL
);
SELECTSELECT TOP 20
perfume_id, perfume, brand,
rating_value ASAS R, rating_count ASAS v,
ROUND(
(rating_count * 1.0 / (rating_count + @m)) * rating_value +
(@m / (rating_count + @m)) * @C
, 2) ASAS weighted_rating
FROMFROM perfumes
WHEREWHERE rating_count IS NOTNOT NULL
ORDER BYORDER BY weighted_rating DESC;
SQL query results showing weighted fragrance ratings table with brand names and scores
Query Result Preview

Intelligence Visualized

The final weighted logic was encapsulated in a SQL VIEW (dbo.perfume_weighted) to feed a live analytical layer in Power BI. The dashboard was designed with a top-down analytical flow: Macro-Level for global market structure and brand dominance, and Micro-Level for granular performance of specific Olfactory Notes with fully dynamic slicers.

Luxury Fragrance Intelligence - Power BI
Live
Power BI dashboard showing luxury fragrance market analytics with charts, ratings, and brand performance metrics

Interactive Dashboard

Open Live Report →

4 Report Pages

Overview - Notes - Brands - Trends

Drill-Through

Click any brand to deep-dive

The “Aha!” Moments & True Consensus

The visualization stripped away market noise and revealed highly actionable insights about what truly drives consumer approval in the fragrance market.

The Fresh Illusion

Fresh & Citrus Dominate Volume - But Not Ratings

While Fresh and Citrus accords dominate sheer market production volume, deeper analysis revealed that fragrances built around warmer, complex accords (Vanilla, Amber, Tobacco, Oud, Warm Spices) consistently achieved the highest weighted ratings.

Fresh

Top volume accord

Warm

Top rated accord

60K+

Perfumes analyzed

The Warm Reality

Warm & Complex Accords Win True Consumer Consensus

The refined rankings surfaced fragrances like Le Male Le Parfum, Stronger With You Intensely, and Spicebomb Extreme. The common denominator is a refined balance of sweet, spicy, and warm woody bases - consistently earning the highest weighted scores.

Whiskey

Top Rated Note

Warm Spicy

Highest Engagement

100%

Bias corrected

Consumer Consensus Leaderboard

By Weighted Rating
RankBrandFragranceWeighted ScoreVotesvs Mean
01Jean Paul GaultierLe Male Le Parfum4.5813,406+0.46
02Giorgio ArmaniStronger With You Intensely4.549,587+0.42
03Jean Paul GaultierLe Male Elixir4.5410,388+0.42
04AzzaroThe Most Wanted Parfum4.533,842+0.41
05Jean Paul GaultierLe Beau Le Parfum4.537,651+0.41

Explore the full analysis

Full source code, SQL scripts, and the interactive Power BI report are available via the project links.