Bellabeat Case Study Report

Scope of Work

Data Analyst:

Shuang Li

Client:

Bellabeat

Purpose:

The goal of this project is to analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices. The project will then examine how these trends apply to Bellabeat customers and how they can inform Bellabeat marketing strategy.

Scope / Major Project Activities:

Activity	Description
Collect data	Collect public Fitbit fitness tracker data from Kaggle
Identify trends	Identify trends in smart device usage
Visualize findings	Visualize key findings in trends
Create marketing recommendations	Create marketing strategy recommendations based on these trends
Deliver final report	Deliver final report and recommendations to key stakeholders

Deliverables:

A clear summary of the business task
A description of all data sources used
Documentation of any cleaning or manipulation of data
A summary of the analysis
Supporting visualizations and key findings
Top high-level content recommendations based on the analysis

This project does not include:

Collecting and/or analyzing user demographics data;
Collecting and/or analyzing workout type and session duration data;
Projecting seasonal changes;
Implementing any solutions or recommendations.

Schedule Overview / Major Milestones:

Milestone	Expected Completion Date	Description/Details
Data Review	2025-03-25	Review of all data
Data Cleaning and Analysis	2025-03-27	Initial data analysis completed
Identify Trends	2025-03-28	Top trends identified
Create Visualization	2025-03-29	Visualization created
Make Tailored Recommendation	2025-03-30	List of marketing strategy recommendations
Final Report	2025-03-31	Final report detailing all work

Estimated date for completion:

2025-03-31

Prepare and Understand the Dataset

Download the FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius).
Import the dataset into RStudio. Because the two monthly exports use identical file names, prefix each data frame with the month in which the data was collected.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

march_dailyActivity_merged <- read.csv("mturkfitbit_export_march/dailyActivity_merged.csv")
march_heartrate_seconds_merged <- read.csv("mturkfitbit_export_march/heartrate_seconds_merged.csv")
march_minuteIntensitiesNarrow_merged <- read.csv("mturkfitbit_export_march/minuteIntensitiesNarrow_merged.csv")
march_minuteSleep_merged <- read.csv("mturkfitbit_export_march/minuteSleep_merged.csv")
march_weightLogInfo_merged <- read.csv("mturkfitbit_export_march/weightLogInfo_merged.csv")
april_dailyActivity_merged <- read.csv("mturkfitbit_export_april/dailyActivity_merged.csv")
april_heartrate_seconds_merged <- read.csv("mturkfitbit_export_april/heartrate_seconds_merged.csv")
april_minuteIntensitiesNarrow_merged <- read.csv("mturkfitbit_export_april/minuteIntensitiesNarrow_merged.csv")
april_minuteSleep_merged <- read.csv("mturkfitbit_export_april/minuteSleep_merged.csv")
april_weightLogInfo_merged <- read.csv("mturkfitbit_export_april/weightLogInfo_merged.csv")

Review the metrics of the data collected, identify its components and limitations in order to determine the approaches to answering the business questions.

Data is categorized into daily, hourly, and minute-level records;
Metrics presented: Distance, Steps, Calories, Active Minutes, Intensity Level, Heart Rate, Sleep, Weight.

Determine the credibility of the data.
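
One way to begin the credibility check is to look at sample size, missing values, duplicate rows, and date coverage. The sketch below is only an illustration: `toy_activity` is a made-up table shaped like the daily activity export, and its values are not taken from the dataset.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for a daily activity table (values are made up)
toy_activity <- tibble(
  Id           = c(1, 1, 1, 2, 2),
  ActivityDate = c("3/12/2016", "3/13/2016", "3/13/2016", "3/12/2016", NA),
  TotalSteps   = c(5000, 7200, 7200, 300, 0)
)

# Sample size, missing dates, and the date range covered
quality_check <- toy_activity %>%
  summarise(
    n_users        = n_distinct(Id),
    n_rows         = n(),
    n_missing_date = sum(is.na(ActivityDate)),
    first_day      = min(as.Date(ActivityDate, format = "%m/%d/%Y"), na.rm = TRUE),
    last_day       = max(as.Date(ActivityDate, format = "%m/%d/%Y"), na.rm = TRUE)
  )

# Rows that repeat an earlier row exactly
n_duplicate_rows <- sum(duplicated(toy_activity))

print(quality_check)
print(n_duplicate_rows)
```

A small number of users or a short date range limits how far any trend can be generalized.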

Process and Analyze the Data

➤ Find frequency of device usage

Identify the master sheet from each month and ‘count’ the number of days each user used the fitness tracker per month.

march_activity_record <- march_dailyActivity_merged %>%
group_by(Id) %>%
summarise(non_null_count = sum(!is.na(ActivityDate)))
glimpse(march_activity_record)

## Rows: 35
## Columns: 2
## $ Id             <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1927972…
## $ non_null_count <int> 19, 19, 10, 12, 12, 12, 12, 12, 15, 12, 8, 10, 12, 32, …

april_activity_record <- april_dailyActivity_merged %>%
group_by(Id) %>%
summarise(non_null_count = sum(!is.na(ActivityDate)))
glimpse(april_activity_record)

## Rows: 33
## Columns: 2
## $ Id             <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1927972…
## $ non_null_count <int> 31, 31, 30, 31, 31, 31, 31, 31, 18, 31, 20, 30, 31, 4, …

Crosscheck the user IDs in both tables to verify whether the sample groups from both months are the same.

matching_count <- march_activity_record %>%
inner_join(april_activity_record, by = "Id") %>%
nrow()
print(matching_count)

## [1] 33
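
The count of 33 equals the number of April users, so every April user also appears in March, while two March users have no April records. A minimal sketch of how those unmatched IDs could be listed with `anti_join()` follows; the tables `toy_march` and `toy_april` and their values are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical per-user day counts (IDs and counts are made up)
toy_march <- tibble(Id = c(101, 102, 103, 104), days = c(19, 10, 8, 32))
toy_april <- tibble(Id = c(101, 102, 104),      days = c(31, 30, 4))

# Users present in March but absent from April
march_only <- toy_march %>% anti_join(toy_april, by = "Id")

# Users present in April but absent from March
april_only <- toy_april %>% anti_join(toy_march, by = "Id")

print(march_only$Id)
print(nrow(april_only))
```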

Inner join the two tables to get a dataset of frequency of device usage for future analysis.

joined_activity_days <- march_activity_record %>% inner_join(april_activity_record, by = "Id")
glimpse(joined_activity_days)

## Rows: 33
## Columns: 3
## $ Id               <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 19279…
## $ non_null_count.x <int> 19, 19, 10, 12, 12, 12, 12, 12, 15, 12, 10, 12, 32, 3…
## $ non_null_count.y <int> 31, 31, 30, 31, 31, 31, 31, 31, 18, 31, 20, 30, 31, 4…

Find average usage per month.

average_usage <- joined_activity_days %>% rowwise() %>% mutate(usage_per_month = mean(c(non_null_count.x,non_null_count.y)))
glimpse(average_usage)

## Rows: 33
## Columns: 4
## Rowwise: 
## $ Id               <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 19279…
## $ non_null_count.x <int> 19, 19, 10, 12, 12, 12, 12, 12, 15, 12, 10, 12, 32, 3…
## $ non_null_count.y <int> 31, 31, 30, 31, 31, 31, 31, 31, 18, 31, 20, 30, 31, 4…
## $ usage_per_month  <dbl> 25.0, 25.0, 20.0, 21.5, 21.5, 21.5, 21.5, 21.5, 16.5,…

➤ Identify core user groups by fitness level

In the same two master sheets, group users by ID and add FairlyActiveMinutes and 2 × VeryActiveMinutes to find the total active minutes per user per month.

# Weighted active minutes: each very active minute counts double
march_active_minutes <- march_dailyActivity_merged %>%
  group_by(Id) %>%
  summarise(total_active_minutes = sum(2 * VeryActiveMinutes + FairlyActiveMinutes))
glimpse(march_active_minutes)

## Rows: 35
## Columns: 2
## $ Id                   <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1…
## $ total_active_minutes <dbl> 1663, 39, 731, 27, 20, 1232, 0, 35, 701, 194, 660…

# Weighted active minutes: each very active minute counts double
april_active_minutes <- april_dailyActivity_merged %>%
  group_by(Id) %>%
  summarise(total_active_minutes = sum(2 * VeryActiveMinutes + FairlyActiveMinutes))
glimpse(april_active_minutes)

## Rows: 33
## Columns: 2
## $ Id                   <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1…
## $ total_active_minutes <dbl> 2994, 718, 1215, 48, 106, 2850, 14, 164, 856, 106…

Find out the weekly average active minutes per user.

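# Divide the two-month total by 62 days and multiply by 7 for a weekly figure;
# as.integer() drops the fractional part
weekly_active_minutes <- march_active_minutes %>%
  inner_join(april_active_minutes, by = "Id") %>%
  mutate(weekly_active_minutes = as.integer((total_active_minutes.x + total_active_minutes.y) / 62 * 7))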
glimpse(weekly_active_minutes)

## Rows: 33
## Columns: 4
## $ Id                     <dbl> 1503960366, 1624580081, 1644430081, 1844505072,…
## $ total_active_minutes.x <dbl> 1663, 39, 731, 27, 20, 1232, 0, 35, 701, 194, 2…
## $ total_active_minutes.y <dbl> 2994, 718, 1215, 48, 106, 2850, 14, 164, 856, 1…
## $ weekly_active_minutes  <int> 525, 85, 219, 8, 14, 460, 1, 22, 175, 142, 81, …
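
As a check on the arithmetic, the first user's weekly figure can be reproduced by hand from the two monthly totals shown above. The sketch below only restates that calculation; the variable names are illustrative. Because `as.integer()` truncates, 525.79 is reported as 525 rather than 526.

```r
# Totals for the first user, copied from the output above
march_total <- 1663
april_total <- 2994

# Spread the two-month total over 62 days, then scale to a 7-day week
weekly_exact     <- (march_total + april_total) / 62 * 7
weekly_truncated <- as.integer(weekly_exact)          # drops the fraction
weekly_rounded   <- as.integer(round(weekly_exact))   # rounds to nearest

print(c(weekly_exact, weekly_truncated, weekly_rounded))
```

Dividing by a fixed 62 days means that days without a record contribute zero active minutes to the average.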

Categorize user fitness levels as Beginner (fewer than 150 weekly active minutes), Intermediate (150 to 299), or Advanced (300 or more).

weekly_active_minutes <- weekly_active_minutes %>% mutate(fitness_level = case_when(
      weekly_active_minutes < 150 ~ "Beginner",
      weekly_active_minutes >= 150 & weekly_active_minutes < 300 ~ "Intermediate",
      weekly_active_minutes >= 300 ~ "Advanced"
    ))
glimpse(weekly_active_minutes)

## Rows: 33
## Columns: 5
## $ Id                     <dbl> 1503960366, 1624580081, 1644430081, 1844505072,…
## $ total_active_minutes.x <dbl> 1663, 39, 731, 27, 20, 1232, 0, 35, 701, 194, 2…
## $ total_active_minutes.y <dbl> 2994, 718, 1215, 48, 106, 2850, 14, 164, 856, 1…
## $ weekly_active_minutes  <int> 525, 85, 219, 8, 14, 460, 1, 22, 175, 142, 81, …
## $ fitness_level          <chr> "Advanced", "Beginner", "Intermediate", "Beginn…
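
The same thresholds could also be expressed with base R's `cut()`, which may be easier to maintain if more levels are added later. This is an alternative sketch, not the method used in the report; the vector `weekly` holds sample values, with 150 and 300 included to show how the boundaries are handled.

```r
# Sample weekly active minutes; 150 and 300 sit exactly on the boundaries
weekly <- c(525, 85, 219, 8, 150, 300)

fitness_level <- cut(
  weekly,
  breaks = c(-Inf, 150, 300, Inf),
  labels = c("Beginner", "Intermediate", "Advanced"),
  right  = FALSE   # intervals are [lower, upper), matching the >= and < rules
)

print(as.character(fitness_level))
```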

Count number of users in each category

user_fitness_levels <- weekly_active_minutes %>% count(fitness_level)
glimpse(user_fitness_levels)

## Rows: 3
## Columns: 2
## $ fitness_level <chr> "Advanced", "Beginner", "Intermediate"
## $ n             <int> 10, 17, 6

➤ Key features usage

Review the data collected from different features and note that core features such as Distance, Steps, Intensity Minutes, and Calories are tracked automatically. The other key features in the dataset are Heart Rate, Sleep, and Weight Log.
Count the number of users who used each feature in each month.

n_distinct(march_heartrate_seconds_merged$Id)

## [1] 14

n_distinct(march_minuteSleep_merged$Id)

## [1] 23

n_distinct(march_weightLogInfo_merged$Id)

## [1] 11

n_distinct(april_heartrate_seconds_merged$Id)

## [1] 14

n_distinct(april_minuteSleep_merged$Id)

## [1] 24

n_distinct(april_weightLogInfo_merged$Id)

## [1] 8

feature_usage <- tibble(
# Labels follow the order of the counts below: heart rate, sleep, weight
key_feature = c("Heart Rate Monitor","Sleep Tracking","Weight Log"),
March = c(n_distinct(march_heartrate_seconds_merged$Id), n_distinct(march_minuteSleep_merged$Id), n_distinct(march_weightLogInfo_merged$Id)),
April = c(n_distinct(april_heartrate_seconds_merged$Id), n_distinct(april_minuteSleep_merged$Id), n_distinct(april_weightLogInfo_merged$Id))
) %>% 
  mutate(
    march_percentage = round(March/ 35 * 100, 1),
    april_percentage = round(April/ 33 * 100, 1)
      )
glimpse(feature_usage)

## Rows: 3
## Columns: 5
## $ key_feature      <chr> "Heart Rate Monitor", "Sleep Tracking", "Weight Log"
## $ March            <int> 14, 23, 11
## $ April            <int> 14, 24, 8
## $ march_percentage <dbl> 40.0, 65.7, 31.4
## $ april_percentage <dbl> 42.4, 72.7, 24.2
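
The denominators 35 and 33 are typed in by hand. A sketch of deriving the denominator from the activity table instead is shown below, assuming the daily activity table lists every participant for the month; `toy_daily_activity` and `toy_sleep` are hypothetical stand-ins with made-up IDs.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins: one row per record, Id identifies the user
toy_daily_activity <- tibble(Id = c(1, 1, 2, 3, 4, 5))   # 5 distinct users
toy_sleep          <- tibble(Id = c(1, 1, 2, 2, 4))      # 3 distinct users

# Derive the denominator from the data rather than hard-coding it
total_users <- n_distinct(toy_daily_activity$Id)
sleep_users <- n_distinct(toy_sleep$Id)

sleep_percentage <- round(sleep_users / total_users * 100, 1)
print(sleep_percentage)
```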

➤ Find time of day users are more active

Figure out how the intensity is classified.

n_distinct(march_minuteIntensitiesNarrow_merged$Intensity)

## [1] 4

Intensity is rated from 0 to 3, where 0 is sedentary and 3 is vigorous. Therefore, identify the times of day at which an intensity of 3 was recorded.

march_high_intensity_time <- march_minuteIntensitiesNarrow_merged %>% filter(Intensity == 3)
glimpse(march_high_intensity_time)

## Rows: 19,098
## Columns: 3
## $ Id             <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ ActivityMinute <chr> "3/12/2016 10:59:00 AM", "3/12/2016 11:00:00 AM", "3/12…
## $ Intensity      <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…

april_high_intensity_time <- april_minuteIntensitiesNarrow_merged %>% filter(Intensity == 3)
glimpse(april_high_intensity_time)

## Rows: 19,838
## Columns: 3
## $ Id             <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ ActivityMinute <chr> "4/12/2016 2:51:00 PM", "4/12/2016 2:52:00 PM", "4/12/2…
## $ Intensity      <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…

Parse the datetime and keep only the date and the hour.

march_high_intensity_time <- march_high_intensity_time %>% 
  mutate(
    datetime_parsed = mdy_hms(ActivityMinute),    # Parse character to POSIXct
    date_only = as.Date(datetime_parsed),         # Extract Date
    hour_only = hour(datetime_parsed)             # Extract Hour (24-hour format)
   ) %>%
select(-datetime_parsed)
glimpse(march_high_intensity_time)

## Rows: 19,098
## Columns: 5
## $ Id             <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ ActivityMinute <chr> "3/12/2016 10:59:00 AM", "3/12/2016 11:00:00 AM", "3/12…
## $ Intensity      <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ date_only      <date> 2016-03-12, 2016-03-12, 2016-03-12, 2016-03-12, 2016-0…
## $ hour_only      <int> 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12,…

april_high_intensity_time <- april_high_intensity_time %>% 
  mutate(
    datetime_parsed = mdy_hms(ActivityMinute),    # Parse character to POSIXct
    date_only = as.Date(datetime_parsed),         # Extract Date
    hour_only = hour(datetime_parsed)             # Extract Hour (24-hour format)
   ) %>%
select(-datetime_parsed)
glimpse(april_high_intensity_time)

## Rows: 19,838
## Columns: 5
## $ Id             <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ ActivityMinute <chr> "4/12/2016 2:51:00 PM", "4/12/2016 2:52:00 PM", "4/12/2…
## $ Intensity      <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ date_only      <date> 2016-04-12, 2016-04-12, 2016-04-12, 2016-04-12, 2016-0…
## $ hour_only      <int> 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15,…
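
To make the parsing step concrete, the sketch below applies the same lubridate calls to the two timestamp strings visible in the output above. It only illustrates how the 12-hour AM/PM strings map to 24-hour values; `stamps` is a name chosen for this example.

```r
library(lubridate)

# First timestamps from the March and April outputs above
stamps <- c("3/12/2016 10:59:00 AM", "4/12/2016 2:51:00 PM")

parsed <- mdy_hms(stamps)   # month/day/year with AM/PM; time zone defaults to UTC
dates  <- as.Date(parsed)   # calendar date only
hours  <- hour(parsed)      # hour in 24-hour format

print(dates)
print(hours)
```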

Remove duplicate records (same user, same date, and same hour) so that each hour with vigorous activity is counted once per user per day.

march_high_intensity_time <- march_high_intensity_time %>% select(-ActivityMinute, -Intensity) %>% group_by(Id, date_only, hour_only) %>% distinct()
glimpse(march_high_intensity_time)

## Rows: 1,348
## Columns: 3
## Groups: Id, date_only, hour_only [1,348]
## $ Id        <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, …
## $ date_only <date> 2016-03-12, 2016-03-12, 2016-03-12, 2016-03-12, 2016-03-12,…
## $ hour_only <int> 10, 11, 12, 14, 15, 16, 10, 11, 18, 19, 23, 9, 23, 9, 12, 20…

april_high_intensity_time <- april_high_intensity_time %>% select(-ActivityMinute, -Intensity) %>% group_by(Id, date_only, hour_only) %>% distinct()
glimpse(april_high_intensity_time)

## Rows: 1,374
## Columns: 3
## Groups: Id, date_only, hour_only [1,374]
## $ Id        <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, …
## $ date_only <date> 2016-04-12, 2016-04-12, 2016-04-12, 2016-04-13, 2016-04-13,…
## $ hour_only <int> 14, 15, 20, 14, 17, 18, 23, 13, 20, 21, 17, 22, 23, 12, 13, …

Count the number of times each hour appears.

march_time_count <- march_high_intensity_time %>% ungroup() %>% count(hour_only)
glimpse(march_time_count)

## Rows: 23
## Columns: 2
## $ hour_only <int> 0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ n         <int> 7, 3, 1, 1, 21, 28, 50, 66, 82, 86, 94, 94, 89, 93, 55, 73, …

april_time_count <- april_high_intensity_time %>% ungroup() %>% count(hour_only)
glimpse(april_time_count)

## Rows: 22
## Columns: 2
## $ hour_only <int> 0, 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, …
## $ n         <int> 7, 4, 2, 31, 41, 50, 83, 70, 85, 78, 105, 94, 101, 55, 63, 1…

Full join the two tables and add counts together

total_time_count <- march_time_count %>% full_join(april_time_count, by = "hour_only") %>% mutate(across(everything(), ~replace_na(., 0))) %>% rowwise() %>% mutate(total_count = sum(n.x, n.y))
glimpse(total_time_count)

## Rows: 24
## Columns: 4
## Rowwise: 
## $ hour_only   <int> 0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ n.x         <int> 7, 3, 1, 1, 21, 28, 50, 66, 82, 86, 94, 94, 89, 93, 55, 73…
## $ n.y         <int> 7, 4, 0, 0, 31, 41, 50, 83, 70, 85, 78, 105, 94, 101, 55, …
## $ total_count <int> 14, 7, 1, 1, 52, 69, 100, 149, 152, 171, 172, 199, 183, 19…
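
Note that a full join appends hours found only in the second table at the end, so the result is not necessarily ordered by hour. The sketch below repeats the join on small made-up tables (`toy_march`, `toy_april`) and sorts the result; it is an illustration under those assumptions rather than a replacement for the step above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical hourly counts; hour 2 is missing in March, hour 3 in April
toy_march <- tibble(hour_only = c(0L, 1L, 3L), n = c(7L, 3L, 1L))
toy_april <- tibble(hour_only = c(0L, 1L, 2L), n = c(7L, 4L, 2L))

toy_total <- toy_march %>%
  full_join(toy_april, by = "hour_only", suffix = c("_march", "_april")) %>%
  # An hour absent from one month has a count of zero for that month
  mutate(across(c(n_march, n_april), ~ replace_na(.x, 0L))) %>%
  mutate(total_count = n_march + n_april) %>%
  arrange(hour_only)   # restore hour order after the join

print(toy_total)
```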

Visualize and Share Findings

Trend 1: User Segmentation

🔍 Discover user fitness levels and identify core user groups.

The active minutes requirements for different fitness levels (Beginner, Intermediate, and Advanced) are based on guidelines from WHO, CDC, and ACSM (American College of Sports Medicine), as well as fitness tracker classifications.

# Add percentage and label
user_fitness_levels <- user_fitness_levels %>%
  mutate(
    Category = factor(fitness_level, levels = c("Advanced", "Intermediate", "Beginner")), 
    Percent = n / sum(n) * 100,
    Label = paste0(fitness_level, "\n", round(Percent, 1), "%")
  ) %>%
  arrange(Category)

# Custom color mapping
custom_colors <- c(
  "Beginner" = "#00A5E3",
  "Intermediate" = "#8DD7BF",
  "Advanced" = "#FF828B"
)

# Pie chart with proper color assignment
ggplot(user_fitness_levels, aes(x = "", y = n, fill = Category)) +
  geom_bar(stat = "identity", width = 1, color = "grey40") +  
  coord_polar(theta = "y") +
  geom_text(aes(label = Label), fontface = "bold", position = position_stack(vjust = 0.5)) +
  scale_fill_manual(values = custom_colors) +  
  labs(title = "User Fitness Levels", fill = "Fitness Level") +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16))

Trend 2: Device Usage Patterns

🔍 Discover frequency of device usage per week

# Categorize into usage buckets
average_usage <- average_usage %>%
  mutate(usage_category = case_when(
    usage_per_month >= 0  & usage_per_month < 5  ~ "0-1",
    usage_per_month >= 5  & usage_per_month < 9  ~ "1–2",
    usage_per_month >= 9  & usage_per_month < 13 ~ "2–3",
    usage_per_month >= 13 & usage_per_month < 17 ~ "3–4",
    usage_per_month >= 17 & usage_per_month < 21 ~ "4–5",
    usage_per_month >= 21 & usage_per_month < 27 ~ "5–6",
    usage_per_month >= 27 & usage_per_month < 32 ~ "6–7",
    TRUE ~ NA_character_  # fallback for unexpected values
  ))

# Factor for correct order in plotting
average_usage$usage_category <- factor(average_usage$usage_category, levels = c(
  "0-1",
  "1–2",
  "2–3",
  "3–4",
  "4–5",
  "5–6",
  "6–7"
))
usage_summary <- average_usage %>% 
  count(usage_category) %>%
  complete(usage_category, fill = list(n = 0)) %>% 
  mutate(percent = n / sum(n) * 100)

# Create Histogram-Style Bar Chart for Categorized Usage
ggplot(usage_summary, aes(x = usage_category, y = n)) +
  geom_col(fill = "#FF828B", color = "white", width = 0.5) +
  geom_text(aes(label = paste0(round(percent, 1), "%")), 
            vjust = -0.5) +
  labs(
    title = "Frequency of Device Usage",
    x = "Average Days Used per Week",
    y = "Number of Users"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", margin = margin(b = 15)),
    axis.text.x = element_text(hjust = 1, face = "bold",margin = margin(b = 15)),
    axis.text.y = element_text(hjust = 1, face = "bold", margin = margin(b = 15))
  )

Trend 3: Key Features Usage

🔍 Discover percentage of key features used

# Actual user totals per month (manually defined)
user_totals <- tibble(
  Month = c("March", "April"),
  total_users = c(35, 33)
)
# Reshape for plotting
feature_usage_long <- feature_usage %>%
  pivot_longer(cols = c(March, April), names_to = "Month", values_to = "count") %>%
  left_join(user_totals, by = "Month") %>%
  mutate(
    Month = factor(Month, levels = c("March", "April")),
    percent = count / total_users * 100,
    label = paste0(round(percent, 1), "%")
  )
feature_usage_long <- feature_usage_long %>%
  group_by(key_feature) %>%
  mutate(total = sum(count)) %>%
  ungroup() %>%
  mutate(key_feature = fct_reorder(key_feature, total))

# Custom color mapping
custom_colors <- c(
  "March" = "#FF828B",
  "April" = "#00A5E3"
)
bar_width <- 0.25  # make bars thinner for spacing
dodge_width <- 0.6  # slightly more than bar width

ggplot(feature_usage_long, aes(x = count, y = key_feature, fill = Month)) +
  geom_col(
    position = position_dodge2(width = dodge_width, reverse = TRUE),
    width = bar_width
  ) +
  geom_text(
    aes(label = label),
    position = position_dodge2(width = dodge_width, reverse = TRUE),
    hjust = -0.1
  ) +
  labs(
    title = "Feature Usage by Month",
    x = "User Count",
    y = "Key Feature",
    fill = "Month"
  ) +
  scale_fill_manual(values = custom_colors) +  
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", margin = margin(b = 15)),
    axis.text.x = element_text(hjust = 1, face = "bold"),
    axis.text.y = element_text(hjust = 1, face = "bold")
  ) +
  xlim(0, max(feature_usage_long$count) * 1.3)

Trend 4: Device Engagement Patterns

🔍 Discover time of day users engage the most

# Define tiers as horizontal bands (ymin to ymax)
tier_bands <- tibble(
  tier = factor(c("Peak", "Great", "Good", "Average", "Low"),
                levels = c("Peak", "Great", "Good", "Average", "Low")),
  ymin = c(200, 150, 100, 50, 0),
  ymax = c(Inf, 199.9, 149.9, 99.9, 49.9)
)

# Assign colors to each tier
tier_colors <- c(
  "Peak"    = "#2C7FB8",  # Deep blue
  "Great"   = "#41B6C4",  # Teal
  "Good"    = "#7FCDBB",  # Soft mint
  "Average" = "#C7E9B4",  # Pale green
  "Low"     = "#EDF8B1"   # Very light green-yellow
)

peak_point <- total_time_count %>% ungroup() %>% filter(total_count == max(total_count))

# Plot
ggplot() +
  # Tier background bands
  geom_rect(
    data = tier_bands,
    aes(ymin = ymin, ymax = ymax, xmin = -Inf, xmax = Inf, fill = tier),
    alpha = 0.3
  ) +
  # Area + line on top
  geom_area(data = total_time_count, aes(x = hour_only, y = total_count), fill = "#FF828B", alpha = 0.4) +
  geom_line(data = total_time_count, aes(x = hour_only, y = total_count), color = "#FF828B", linewidth = 1.2) +
  
  scale_x_continuous(breaks = 0:23, expand = c(0, 0)) +
  scale_fill_manual(values = tier_colors, name = "Rating") +
  labs(
    title = "User Engagement by Hour of Day",
    x = "Hour of Day",
    y = "Engagement Count"
  ) +
  geom_text(
    data = peak_point,
    mapping = aes(
      x = hour_only + 0.3,  # shifts label slightly to the right
      y = total_count,
      label = paste0("Peak: ", hour_only, ":00")
    ),
    hjust = 0,  # left-align the text
    vjust = 0.5,
    fontface = "bold",
    color = "#2C7FB8",
    size = 3,
    inherit.aes = FALSE
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", margin = margin(b = 15)),
    legend.position = "right",
    axis.text.x = element_text(hjust = 1, margin = margin(b = 10)),
    axis.text.y = element_text(hjust = 1, margin = margin(b = 10))
  )

Marketing Strategy Recommendations for Bellabeat Ivy+

1. Segment by Fitness Level — With a Wellness Twist

Fitness Group	Bellabeat Approach
Beginner (51.5%)	Focus on gentle wellness, cycle-based movement, and habit building.
Intermediate (18.2%)	Introduce guided workouts + cycle syncing and light goal tracking.
Advanced (30.3%)	Promote stress tracking, recovery insights, and performance during high-energy cycle phases.

💡 Campaign Ideas:

“Your Wellness, Synced to Your Cycle” — educate beginners on aligning habits with menstrual phases.
“3 Easy Moves for Luteal Days” — content driven by Ivy+’s cycle and hormone insights.

2. Lean Into Health Features — Heart Rate, Sleep, Stress, Cycle

Sleep Tracking (72.7%) → Already a strong engagement area

Tie sleep into hormonal balance and stress recovery

“Quality sleep during the luteal phase reduces PMS symptoms. Let’s track it together.”

Heart Rate (42.4%) → Big opportunity

“Your Calm vs. Stress Timeline” → Weekly digest with HR + stress interpretation

“You’re most recovered during your follicular phase. Let’s build on that!”

Weight Log (24.2%) → Reframe it to body awareness rather than weight loss

“Track how your body changes through your cycle. It’s not just about the scale.”

3. Time-of-Day Engagement: Pair Wellness with Routine

Time Slot	Wellness Strategy
07:00–08:00	“Gentle Morning Routines” → Push mindfulness content, hydration reminders
08:00–14:00	Wellness tracking prompts: “Log your mood + symptoms”
17:00–19:00	Energy is high → promote active minutes, walking challenges, yoga
19:00–21:00	Push evening rituals: guided meditation, sleep prep, breathing exercises
23:00–05:00	Silence pushes, activate “Wind Down” mode messaging

4. Feature-Focused Campaigns

Convert key features into daily rituals:

💤 Sleep Tracking (72.7%)

“3 nights of quality sleep = 1 full day of hormonal balance”

Offer guided bedtime audio + insight summaries

⚖️ Weight/Body Awareness (24.2%)

Replace “weight log” messaging with “body state”: less focus on pounds, more on hydration, inflammation, and bloat through the cycle.

💗 Heart Rate & Stress (42.4%)

“See how your breathing changed during today’s meeting” → Real-time stress coaching

Daily “wellness readiness” based on HR + sleep

5. Personalized Wellness Routines Based on Engagement

Device Usage (days/week)	Suggested Strategy
3–4 Days	Re-engagement nudges: “You’re 2 days from a self-care streak!” + cycle tips
4–5 Days	“Wellness Builder” weekly summary + next week’s focus (based on cycle phase)
5–6 Days	Push deeper features: “Let’s add meditation to your strong routine”
6–7 Days	VIP messages: “You’re part of our 3% elite 🌟 Here’s early access to…”

Appendix & References

Appendix Notes

Variable Definitions

Active Minutes: Calculated as fairly active minutes + 2 × very active minutes, based on fitness guidelines that equate 1 minute of vigorous activity to 2 minutes of moderate activity.

Data Cleaning & Preprocessing

Sample Size: The March dataset had 35 users, but only 33 remained for April. Analyses (except key feature usage) excluded the 2 users with missing April data.
Key Feature Usage: Based on the full March and April datasets.

Model Assumptions

Missing Days: Days with no data were assumed to be inactive (0 active minutes), based on the likelihood that users wear the tracker when they work out.
Engagement Time: Assumed to align with physical activity, as users typically interact with the device before or after workouts.
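
The missing-days assumption can be made explicit by filling in absent user-day combinations with zero before aggregating. The following is a minimal sketch using `tidyr::complete()`; the table `toy_minutes`, the three-day window, and all values are hypothetical, and the report itself applies the assumption by dividing totals by a fixed number of days.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical daily active minutes; each user is missing some days
toy_minutes <- tibble(
  Id             = c(1, 1, 2),
  day            = as.Date(c("2016-03-12", "2016-03-14", "2016-03-13")),
  active_minutes = c(30, 45, 20)
)

# Every day in the (assumed) observation window
all_days <- seq(as.Date("2016-03-12"), as.Date("2016-03-14"), by = "day")

# Add a row for each missing user-day and treat it as inactive
filled <- toy_minutes %>%
  complete(Id, day = all_days, fill = list(active_minutes = 0))

print(filled)
```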

Limitations

No demographic data (age, gender, location, etc.).
Limited time range (March and April only).
No data on workout type or session duration.

References

Arash, N. (2018). FitBit Fitness Tracker Data. Kaggle. https://www.kaggle.com/datasets/arashnic/fitbit/data (Usability score: 10.0; License: CC0: Public Domain)
World Health Organization. (2020). Physical activity guidelines. https://www.who.int/publications/i/item/9789240015128
OpenAI. (2024). ChatGPT (Mar 2024 version) [Large language model]. https://chat.openai.com/