Shuang Li
Bellabeat
The goal of this project is to analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices. The project will discover how these trends apply to Bellabeat customers and how these trends help influence Bellabeat marketing strategy.
Activity | Description |
---|---|
Collect data | Collect public Fitbit fitness tracker data from Kaggle |
Identify trends | Identify trends in smart device usage |
Visualize findings | visualize key findings in trends |
Create marketing recommendations | Create marketing strategy recommendations based on these trends |
Deliver final report | Deliver final report and recommendations to key stakeholders |
Milestone | Expected Completion Date | Description/Details |
---|---|---|
Data Review | 2025-03-25 | Review of all data |
Data Cleaning and Analysis | 2025-03-27 | Initial data analysis completed |
Identify Trends | 2025-03-28 | Top trends identified |
Create Visualization | 2025-03-29 | Visualization created |
Make Tailored Recommendation | 2025-03-30 | List of marketing strategy recommendations |
Final Report | 2025-03-31 | Final report detailing all work |
2025-03-31
Download the FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius).
Import the dataset into RStudio and rename identical dataset names by adding the month in which the data was collected.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
march_dailyActivity_merged <- read.csv("mturkfitbit_export_march/dailyActivity_merged.csv")
march_heartrate_seconds_merged <- read.csv("mturkfitbit_export_march/heartrate_seconds_merged.csv")
march_minuteIntensitiesNarrow_merged <- read.csv("mturkfitbit_export_march/minuteIntensitiesNarrow_merged.csv")
march_minuteSleep_merged <- read.csv("mturkfitbit_export_march/minuteSleep_merged.csv")
march_weightLogInfo_merged <- read.csv("mturkfitbit_export_march/weightLogInfo_merged.csv")
april_dailyActivity_merged <- read.csv("mturkfitbit_export_april/dailyActivity_merged.csv")
april_heartrate_seconds_merged <- read.csv("mturkfitbit_export_april/heartrate_seconds_merged.csv")
april_minuteIntensitiesNarrow_merged <- read.csv("mturkfitbit_export_april/minuteIntensitiesNarrow_merged.csv")
april_minuteSleep_merged <- read.csv("mturkfitbit_export_april/minuteSleep_merged.csv")
april_weightLogInfo_merged <- read.csv("mturkfitbit_export_april/weightLogInfo_merged.csv")
march_activity_record <- march_dailyActivity_merged %>%
group_by(Id) %>%
summarise(non_null_count = sum(!is.na(ActivityDate)))
glimpse(march_activity_record)
## Rows: 35
## Columns: 2
## $ Id <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1927972…
## $ non_null_count <int> 19, 19, 10, 12, 12, 12, 12, 12, 15, 12, 8, 10, 12, 32, …
april_activity_record <- april_dailyActivity_merged %>%
group_by(Id) %>%
summarise(non_null_count = sum(!is.na(ActivityDate)))
glimpse(april_activity_record)
## Rows: 33
## Columns: 2
## $ Id <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1927972…
## $ non_null_count <int> 31, 31, 30, 31, 31, 31, 31, 31, 18, 31, 20, 30, 31, 4, …
matching_count <- march_activity_record %>%
inner_join(april_activity_record, by = "Id") %>%
nrow()
print(matching_count)
## [1] 33
joined_activity_days <- march_activity_record %>% inner_join(april_activity_record, by = "Id")
glimpse(joined_activity_days)
## Rows: 33
## Columns: 3
## $ Id <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 19279…
## $ non_null_count.x <int> 19, 19, 10, 12, 12, 12, 12, 12, 15, 12, 10, 12, 32, 3…
## $ non_null_count.y <int> 31, 31, 30, 31, 31, 31, 31, 31, 18, 31, 20, 30, 31, 4…
average_usage <- joined_activity_days %>% rowwise() %>% mutate(usage_per_month = mean(c(non_null_count.x,non_null_count.y)))
glimpse(average_usage)
## Rows: 33
## Columns: 4
## Rowwise:
## $ Id <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 19279…
## $ non_null_count.x <int> 19, 19, 10, 12, 12, 12, 12, 12, 15, 12, 10, 12, 32, 3…
## $ non_null_count.y <int> 31, 31, 30, 31, 31, 31, 31, 31, 18, 31, 20, 30, 31, 4…
## $ usage_per_month <dbl> 25.0, 25.0, 20.0, 21.5, 21.5, 21.5, 21.5, 21.5, 16.5,…
march_active_minutes <- march_dailyActivity_merged %>% group_by(Id) %>% summarise(total_active_minutes = sum(very_active_minutes_2 = (2*VeryActiveMinutes), FairlyActiveMinutes))
glimpse(march_active_minutes)
## Rows: 35
## Columns: 2
## $ Id <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1…
## $ total_active_minutes <dbl> 1663, 39, 731, 27, 20, 1232, 0, 35, 701, 194, 660…
april_active_minutes <- april_dailyActivity_merged %>% group_by(Id) %>% summarise(total_active_minutes = sum(very_active_minutes_2 = (2*VeryActiveMinutes), FairlyActiveMinutes))
glimpse(april_active_minutes)
## Rows: 33
## Columns: 2
## $ Id <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1…
## $ total_active_minutes <dbl> 2994, 718, 1215, 48, 106, 2850, 14, 164, 856, 106…
weekly_active_minutes <- march_active_minutes %>% inner_join(april_active_minutes, by = "Id") %>% mutate(weekly_active_minutes = as.integer(round(total_active_minutes.x + total_active_minutes.y)/62*7))
glimpse(weekly_active_minutes)
## Rows: 33
## Columns: 4
## $ Id <dbl> 1503960366, 1624580081, 1644430081, 1844505072,…
## $ total_active_minutes.x <dbl> 1663, 39, 731, 27, 20, 1232, 0, 35, 701, 194, 2…
## $ total_active_minutes.y <dbl> 2994, 718, 1215, 48, 106, 2850, 14, 164, 856, 1…
## $ weekly_active_minutes <int> 525, 85, 219, 8, 14, 460, 1, 22, 175, 142, 81, …
weekly_active_minutes <- weekly_active_minutes %>% mutate(fitness_level = case_when(
weekly_active_minutes < 150 ~ "Beginner",
weekly_active_minutes >= 150 & weekly_active_minutes < 300 ~ "Intermediate",
weekly_active_minutes >= 300 ~ "Advanced"
))
glimpse(weekly_active_minutes)
## Rows: 33
## Columns: 5
## $ Id <dbl> 1503960366, 1624580081, 1644430081, 1844505072,…
## $ total_active_minutes.x <dbl> 1663, 39, 731, 27, 20, 1232, 0, 35, 701, 194, 2…
## $ total_active_minutes.y <dbl> 2994, 718, 1215, 48, 106, 2850, 14, 164, 856, 1…
## $ weekly_active_minutes <int> 525, 85, 219, 8, 14, 460, 1, 22, 175, 142, 81, …
## $ fitness_level <chr> "Advanced", "Beginner", "Intermediate", "Beginn…
user_fitness_levels <- weekly_active_minutes %>% count(fitness_level)
glimpse(user_fitness_levels)
## Rows: 3
## Columns: 2
## $ fitness_level <chr> "Advanced", "Beginner", "Intermediate"
## $ n <int> 10, 17, 6
Review data collected from different features and make note that core features like Distance, Steps, Intensity Minutes, and Calories are automatically being tracked. Identify other key features that are being frequently used are Heart Rate, Sleep, and Weight Log.
Count number of users used each feature in each month.
n_distinct(march_heartrate_seconds_merged$Id)
## [1] 14
n_distinct(march_minuteSleep_merged$Id)
## [1] 23
n_distinct(march_weightLogInfo_merged$Id)
## [1] 11
n_distinct(april_heartrate_seconds_merged$Id)
## [1] 14
n_distinct(april_minuteSleep_merged$Id)
## [1] 24
n_distinct(april_weightLogInfo_merged$Id)
## [1] 8
feature_usage <- tibble(
key_feature = c("Sleep Tracking","Heart Rate Monitor","Weight Log"),
March = c(n_distinct(march_heartrate_seconds_merged$Id), n_distinct(march_minuteSleep_merged$Id), n_distinct(march_weightLogInfo_merged$Id)),
April = c(n_distinct(april_heartrate_seconds_merged$Id), n_distinct(april_minuteSleep_merged$Id), n_distinct(april_weightLogInfo_merged$Id))
) %>%
mutate(
march_percentage = round(March/ 35 * 100, 1),
april_percentage = round(April/ 33 * 100, 1)
)
glimpse(feature_usage)
## Rows: 3
## Columns: 5
## $ key_feature <chr> "Sleep Tracking", "Heart Rate Monitor", "Weight Log"
## $ March <int> 14, 23, 11
## $ April <int> 14, 24, 8
## $ march_percentage <dbl> 40.0, 65.7, 31.4
## $ april_percentage <dbl> 42.4, 72.7, 24.2
n_distinct(march_minuteIntensitiesNarrow_merged$Intensity)
## [1] 4
march_high_intensity_time <- march_minuteIntensitiesNarrow_merged %>% filter(Intensity == 3)
glimpse(march_high_intensity_time)
## Rows: 19,098
## Columns: 3
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ ActivityMinute <chr> "3/12/2016 10:59:00 AM", "3/12/2016 11:00:00 AM", "3/12…
## $ Intensity <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
april_high_intensity_time <- april_minuteIntensitiesNarrow_merged %>% filter(Intensity == 3)
glimpse(april_high_intensity_time)
## Rows: 19,838
## Columns: 3
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ ActivityMinute <chr> "4/12/2016 2:51:00 PM", "4/12/2016 2:52:00 PM", "4/12/2…
## $ Intensity <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
march_high_intensity_time <- march_high_intensity_time %>%
mutate(
datetime_parsed = mdy_hms(ActivityMinute), # Parse character to POSIXct
date_only = as.Date(datetime_parsed), # Extract Date
hour_only = hour(datetime_parsed) # Extract Hour (24-hour format)
) %>%
select(-datetime_parsed)
glimpse(march_high_intensity_time)
## Rows: 19,098
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ ActivityMinute <chr> "3/12/2016 10:59:00 AM", "3/12/2016 11:00:00 AM", "3/12…
## $ Intensity <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ date_only <date> 2016-03-12, 2016-03-12, 2016-03-12, 2016-03-12, 2016-0…
## $ hour_only <int> 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12,…
april_high_intensity_time <- april_high_intensity_time %>%
mutate(
datetime_parsed = mdy_hms(ActivityMinute), # Parse character to POSIXct
date_only = as.Date(datetime_parsed), # Extract Date
hour_only = hour(datetime_parsed) # Extract Hour (24-hour format)
) %>%
select(-datetime_parsed)
glimpse(april_high_intensity_time)
## Rows: 19,838
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ ActivityMinute <chr> "4/12/2016 2:51:00 PM", "4/12/2016 2:52:00 PM", "4/12/2…
## $ Intensity <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ date_only <date> 2016-04-12, 2016-04-12, 2016-04-12, 2016-04-12, 2016-0…
## $ hour_only <int> 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15,…
march_high_intensity_time <- march_high_intensity_time %>% select(-ActivityMinute, -Intensity) %>% group_by(Id, date_only, hour_only) %>% distinct()
glimpse(march_high_intensity_time)
## Rows: 1,348
## Columns: 3
## Groups: Id, date_only, hour_only [1,348]
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, …
## $ date_only <date> 2016-03-12, 2016-03-12, 2016-03-12, 2016-03-12, 2016-03-12,…
## $ hour_only <int> 10, 11, 12, 14, 15, 16, 10, 11, 18, 19, 23, 9, 23, 9, 12, 20…
april_high_intensity_time <- april_high_intensity_time %>% select(-ActivityMinute, -Intensity) %>% group_by(Id, date_only, hour_only) %>% distinct()
glimpse(april_high_intensity_time)
## Rows: 1,374
## Columns: 3
## Groups: Id, date_only, hour_only [1,374]
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, …
## $ date_only <date> 2016-04-12, 2016-04-12, 2016-04-12, 2016-04-13, 2016-04-13,…
## $ hour_only <int> 14, 15, 20, 14, 17, 18, 23, 13, 20, 21, 17, 22, 23, 12, 13, …
march_time_count <- march_high_intensity_time %>% ungroup() %>% count(hour_only)
glimpse(march_time_count)
## Rows: 23
## Columns: 2
## $ hour_only <int> 0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ n <int> 7, 3, 1, 1, 21, 28, 50, 66, 82, 86, 94, 94, 89, 93, 55, 73, …
april_time_count <- april_high_intensity_time %>% ungroup() %>% count(hour_only)
glimpse(april_time_count)
## Rows: 22
## Columns: 2
## $ hour_only <int> 0, 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, …
## $ n <int> 7, 4, 2, 31, 41, 50, 83, 70, 85, 78, 105, 94, 101, 55, 63, 1…
total_time_count <- march_time_count %>% full_join(april_time_count, by = "hour_only") %>% mutate(across(everything(), ~replace_na(., 0))) %>% rowwise() %>% mutate(total_count = sum(n.x, n.y))
glimpse(total_time_count)
## Rows: 24
## Columns: 4
## Rowwise:
## $ hour_only <int> 0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ n.x <int> 7, 3, 1, 1, 21, 28, 50, 66, 82, 86, 94, 94, 89, 93, 55, 73…
## $ n.y <int> 7, 4, 0, 0, 31, 41, 50, 83, 70, 85, 78, 105, 94, 101, 55, …
## $ total_count <int> 14, 7, 1, 1, 52, 69, 100, 149, 152, 171, 172, 199, 183, 19…
Fitness Group | Bellabeat Approach |
---|---|
Beginner (51.5%) | Focus on gentle wellness, cycle-based movement, and habit building. |
Intermediate (18.2%) | Introduce guided workouts + cycle syncing and light goal tracking. |
Advanced (30.3%) | Promote stress tracking, recovery insights, and performance during high-energy cycle phases. |
💡 Campaign Ideas:
“Your Wellness, Synced to Your Cycle” — educate beginners on aligning habits with menstrual phases.
“3 Easy Moves for Luteal Days” — content driven by Ivy+’s cycle and hormone insights.
Heart Rate (72.7%) → Already a strong engagement area
“Your Calm vs. Stress Timeline” → Weekly digest with HR + stress interpretation
“You’re most recovered during your follicular phase — let’s build on that!”
Sleep Tracking (42.4%) → Big opportunity
Tie sleep into hormonal balance and stress recovery
“Quality sleep during the luteal phase reduces PMS symptoms — let’s track it together.”
Weight Log (24.2%) → Reframe it to body awareness rather than weight loss
“Track how your body changes through your cycle — it’s not just about the scale.”
Time Slot | Wellness Strategy |
---|---|
07:00–08:00 | “Gentle Morning Routines” → Push mindfulness content, hydration reminders |
08:00–14:00 | Wellness tracking prompts: “Log your mood + symptoms” |
17:00–19:00 | Energy is high → promote active minutes, walking challenges, yoga |
19:00–21:00 | Push evening rituals: guided meditation, sleep prep, breathing exercises |
23:00–05:00 | Silence pushes, activate “Wind Down” mode messaging |
**Convert underused features into daily rituals:**
💤 Sleep Tracking (42.4%)
“3 nights of quality sleep = 1 full day of hormonal balance”
Offer guided bedtime audio + insight summaries
⚖️ Weight/Body Awareness (24.2%)
Replace “weight log” messaging with “body state” — less focus on pounds, more on hydration, inflammation, and bloat through the cycle.
💗 Heart Rate & Stress
“See how your breathing changed during today’s meeting” → Real-time stress coaching
Daily “wellness readiness” based on HR + sleep
Device Usage (days/week) | Suggested Strategy |
---|---|
3–4 Days | Re-engagement nudges: “You’re 2 days from a self-care streak!” + cycle tips |
4–5 Days | “Wellness Builder” weekly summary + next week’s focus (based on cycle phase) |
5–6 Days | Push deeper features: “Let’s add meditation to your strong routine” |
6–7 Days | VIP messages: “You’re part of our 3% elite 🌟 Here’s early access to…” |
Arash, N. (2018). FitBit Fitness Tracker Data. Kaggle. https://www.kaggle.com/datasets/arashnic/fitbit/data Usability score: 10.0 | License: CC0: Public Domain
World Health Organization. (2020). Physical activity guidelines. https://www.who.int/publications/i/item/9789240015128
OpenAI. (2024). ChatGPT (Mar 2024 version) [Large language model]. https://chat.openai.com/