BellabeatCaseStudy

Bellabeat case study

This case study analyzes smart device usage data for Bellabeat.

About company

Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

Questions

What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?

Business task

Identify strategies and opportunities for Bellabeat marketing based on usage trends in smart device data.

Tools

RStudio

Data source

● FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle data set contains personal fitness tracker data from thirty Fitbit users. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.

Setup

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.0 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(tidyr)
library(dplyr)
library(ggplot2)
library(lubridate)

## Loading required package: timechange
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

Import data

Import data from the CSV files dailyActivity_merged.csv, sleepDay_merged.csv, dailyCalories_merged.csv, and heartrate_seconds_merged.csv.

activity <- read.csv("dailyActivity_merged.csv")
calories <- read.csv("dailyCalories_merged.csv")
sleep <- read.csv("sleepDay_merged.csv")
heartRate <- read.csv("heartrate_seconds_merged.csv")
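The import step can also be written so that a missing file is skipped rather than stopping the script. The following is only a sketch under assumptions: the `bellabeat_demo` folder and the tiny CSV written into it are hypothetical stand-ins so the example runs without the Kaggle files.

```r
# Hypothetical demo folder with one tiny CSV (made-up rows, not the Kaggle data).
data_dir <- file.path(tempdir(), "bellabeat_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(
  data.frame(Id = c(1, 2), Calories = c(1985, 1797)),
  file.path(data_dir, "dailyCalories_merged.csv"),
  row.names = FALSE
)

# File names as used in this case study; only one of them exists in the demo folder.
files <- c(calories = "dailyCalories_merged.csv", sleep = "sleepDay_merged.csv")
paths <- file.path(data_dir, files)
names(paths) <- names(files)

# Read only the files that are actually present, keeping the table names.
found <- paths[file.exists(paths)]
tables <- lapply(found, read.csv)
names(tables)
```

Each table is then available by name, for example `tables$calories`.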

Explore and clean data

Overview

head(activity)

##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366    4/12/2016      13162          8.50            8.50
## 2 1503960366    4/13/2016      10735          6.97            6.97
## 3 1503960366    4/14/2016      10460          6.74            6.74
## 4 1503960366    4/15/2016       9762          6.28            6.28
## 5 1503960366    4/16/2016      12669          8.16            8.16
## 6 1503960366    4/17/2016       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  11                  181             1218     1776
## 4                  34                  209              726     1745
## 5                  10                  221              773     1863
## 6                  20                  164              539     1728

head(calories)

##           Id ActivityDay Calories
## 1 1503960366   4/12/2016     1985
## 2 1503960366   4/13/2016     1797
## 3 1503960366   4/14/2016     1776
## 4 1503960366   4/15/2016     1745
## 5 1503960366   4/16/2016     1863
## 6 1503960366   4/17/2016     1728

head(sleep)

##           Id              SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM                 1                327
## 2 1503960366 4/13/2016 12:00:00 AM                 2                384
## 3 1503960366 4/15/2016 12:00:00 AM                 1                412
## 4 1503960366 4/16/2016 12:00:00 AM                 2                340
## 5 1503960366 4/17/2016 12:00:00 AM                 1                700
## 6 1503960366 4/19/2016 12:00:00 AM                 1                304
##   TotalTimeInBed
## 1            346
## 2            407
## 3            442
## 4            367
## 5            712
## 6            320

head(heartRate)

##           Id                 Time Value
## 1 2022484408 4/12/2016 7:21:00 AM    97
## 2 2022484408 4/12/2016 7:21:05 AM   102
## 3 2022484408 4/12/2016 7:21:10 AM   105
## 4 2022484408 4/12/2016 7:21:20 AM   103
## 5 2022484408 4/12/2016 7:21:25 AM   101
## 6 2022484408 4/12/2016 7:22:05 AM    95

Check sample

n_distinct(activity$Id)

## [1] 33

n_distinct(calories$Id)

## [1] 33

n_distinct(sleep$Id)

## [1] 24

n_distinct(heartRate$Id)

## [1] 14

The sleep data comes from 24 users, which is still enough to analyze. The heart rate data comes from only 14 users, which is too few to analyze.
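The user counts can also be put side by side as a share of the users in the activity table. This is a small sketch on toy data; the `*_toy` data frames and their Ids are made up for illustration and are not the Fitbit data.

```r
library(dplyr)

# Toy stand-ins for the real tables (made-up Ids).
activity_toy  <- data.frame(Id = c(1, 1, 2, 3, 4))
sleep_toy     <- data.frame(Id = c(1, 2, 2, 3))
heartRate_toy <- data.frame(Id = c(1, 1, 4))

# Distinct users per table, and their share relative to the activity table.
coverage <- data.frame(
  table   = c("activity", "sleep", "heartRate"),
  n_users = c(n_distinct(activity_toy$Id),
              n_distinct(sleep_toy$Id),
              n_distinct(heartRate_toy$Id))
)
coverage$share <- coverage$n_users / n_distinct(activity_toy$Id)
coverage
```

A low share for one table is a sign that conclusions drawn from it may not hold for the whole sample.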

Find duplicate data

sum(duplicated(activity))

## [1] 0

sum(duplicated(calories))

## [1] 0

sum(duplicated(sleep))

## [1] 3

Remove duplicates and NA values, then recheck sleep

activity <- activity %>%
  distinct() %>%
  drop_na()
calories <- calories %>%
  distinct() %>%
  drop_na()
sleep <- sleep %>%
  distinct() %>%
  drop_na()

sum(duplicated(sleep))

## [1] 0
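To see what the cleaning step does, here is a minimal sketch on a toy table. The `sleep_toy` rows are made up: one row is an exact duplicate and one has a missing value.

```r
library(dplyr)
library(tidyr)

# Toy sleep table: row 2 duplicates row 1, row 4 has an NA.
sleep_toy <- data.frame(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/12/2016 12:00:00 AM",
               "4/13/2016 12:00:00 AM", "4/12/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 327, 384, NA)
)

# Count fully identical rows before cleaning.
n_dupes_before <- sum(duplicated(sleep_toy))

# distinct() keeps one copy of each row, drop_na() removes rows with any NA.
sleep_clean <- sleep_toy %>%
  distinct() %>%
  drop_na()

n_dupes_after <- sum(duplicated(sleep_clean))
```

Note that `duplicated()` only flags rows where every column matches, so two records for the same user and day with different values would not be counted.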

There are no duplicates left and the data is now clean.

Date formatting

activity <- activity %>%
  rename(date = ActivityDate) %>%
  mutate(date = as_date(date, format = "%m/%d/%Y"))

sleep <- sleep %>%
  rename(date = SleepDay) %>%
  # as_date() ignores `tz`, so the argument is dropped to avoid the warning
  mutate(date = as_date(date, format = "%m/%d/%Y %I:%M:%S %p"))
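The two date columns use different text formats, so it helps to confirm that both end up as the same `Date` values before joining on them. The sketch below uses the lubridate helpers `mdy()` and `mdy_hms()` as an alternative to a `format` string; the input strings are copied from the first rows shown in the overview.

```r
library(lubridate)

# Date strings in the two formats seen in the activity and sleep tables.
activity_dates <- c("4/12/2016", "4/13/2016")
sleep_dates    <- c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM")

# mdy() parses month/day/year; mdy_hms() also parses the time part.
activity_parsed <- mdy(activity_dates)
sleep_parsed    <- as_date(mdy_hms(sleep_dates))

# Both columns should now hold the same calendar days.
all(activity_parsed == sleep_parsed)
```

If this check fails on the real data, the join on `date` would silently drop rows.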

Summarise data

Average steps per day

activity %>%
  select(TotalSteps) %>%
  summary()

##    TotalSteps   
##  Min.   :    0  
##  1st Qu.: 3790  
##  Median : 7406  
##  Mean   : 7638  
##  3rd Qu.:10727  
##  Max.   :36019

calories %>%
  select(Calories) %>%
  summary()

##     Calories   
##  Min.   :   0  
##  1st Qu.:1828  
##  Median :2134  
##  Mean   :2304  
##  3rd Qu.:2793  
##  Max.   :4900

sleep %>%
  select(TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()

##  TotalMinutesAsleep TotalTimeInBed 
##  Min.   : 58.0      Min.   : 61.0  
##  1st Qu.:361.0      1st Qu.:403.8  
##  Median :432.5      Median :463.0  
##  Mean   :419.2      Mean   :458.5  
##  3rd Qu.:490.0      3rd Qu.:526.0  
##  Max.   :796.0      Max.   :961.0

From summary

According to the summary above:

* The average is 7638 steps per day, which falls into ‘Fairly active’ (source: https://www.10000steps.org.au/articles/counting-steps/).
* The sample group ranges from lightly active to fairly active.
* The average is 2304 calories per day, which is a bit higher than the standard for women, so the sample may include both genders.

Analyze and Share

Merge the activity and sleep data by Id and date into step_sleep for the analysis.

step_sleep <- merge(activity,sleep, by = c("Id","date"))
glimpse(step_sleep)

## Rows: 410
## Columns: 18
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ date                     <date> 2016-04-12, 2016-04-13, 2016-04-15, 2016-04-…
## $ TotalSteps               <int> 13162, 10735, 9762, 12669, 9705, 15506, 10544…
## $ TotalDistance            <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ TrackerDistance          <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance       <dbl> 1.88, 1.57, 2.14, 2.71, 3.19, 3.53, 1.96, 1.3…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 1.26, 0.41, 0.78, 1.32, 0.48, 0.3…
## $ LightActiveDistance      <dbl> 6.06, 4.71, 2.83, 5.04, 2.51, 5.03, 4.24, 4.6…
## $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes        <int> 25, 21, 29, 36, 38, 50, 28, 19, 41, 39, 73, 3…
## $ FairlyActiveMinutes      <int> 13, 19, 34, 10, 20, 31, 12, 8, 21, 5, 14, 23,…
## $ LightlyActiveMinutes     <int> 328, 217, 209, 221, 164, 264, 205, 211, 262, …
## $ SedentaryMinutes         <int> 728, 776, 726, 773, 539, 775, 818, 838, 732, …
## $ Calories                 <int> 1985, 1797, 1745, 1863, 1728, 2035, 1786, 177…
## $ TotalSleepRecords        <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep       <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, …
## $ TotalTimeInBed           <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, …
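By default, `merge()` keeps only the rows that have a match in both tables, so activity days without a sleep record are dropped. A short sketch on toy data shows how to count what is lost; the `*_toy` tables are made up for illustration.

```r
library(dplyr)

# Toy tables: only two of the four activity days have a sleep record.
activity_toy <- data.frame(
  Id = c(1, 1, 2, 3),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12", "2016-04-12")),
  TotalSteps = c(13162, 10735, 5000, 8000)
)
sleep_toy <- data.frame(
  Id = c(1, 2),
  date = as.Date(c("2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(327, 400)
)

# Same call shape as in the case study: an inner join on Id and date.
merged <- merge(activity_toy, sleep_toy, by = c("Id", "date"))

# anti_join() returns the activity rows that found no sleep record.
unmatched <- anti_join(activity_toy, sleep_toy, by = c("Id", "date"))

nrow(merged)
nrow(unmatched)
```

Checking the unmatched rows shows how much of the activity data the later plots leave out.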

Average steps and sleep

step_sleep %>%
  group_by(Id) %>%
  summarise(avg_step = mean(TotalSteps)) %>%
  ggplot() +
  # factor(Id) treats each user as a category, giving one evenly spaced bar per user
  geom_col(mapping = aes(factor(Id), avg_step, fill = avg_step))

step_sleep %>%
  group_by(Id) %>%
  # Convert minutes to hours
  summarise(avg_sleep = mean(TotalMinutesAsleep) / 60) %>%
  ggplot() +
  geom_col(mapping = aes(factor(Id), avg_sleep, fill = avg_sleep))

Relationships between steps, calories, and sleep

step_sleep %>%
  # Grouping has no effect on the plot, so the aesthetics are mapped once here
  ggplot(mapping = aes(x = TotalSteps, y = Calories)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Steps vs Calories")

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

step_sleep %>%
  ggplot(mapping = aes(x = Calories, y = TotalMinutesAsleep)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Calories vs Sleep")

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

step_sleep %>%
  ggplot(mapping = aes(x = TotalSteps, y = TotalMinutesAsleep)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Steps vs Sleep")

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

According to the graphs above:

* There is a clear positive relationship between steps and calories. ‘Tracking and reporting activity to users is a must-have feature of the Bellabeat app.’
* The relationship between calories and sleep time is unclear.
* There is a negative relationship between steps and sleep time. ‘If the Bellabeat app wants to help users manage their sleep time, steps are one factor to consider.’
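The direction of each relationship can also be checked with a correlation coefficient instead of reading it from the plot alone. This is a sketch on toy vectors: the numbers are made up to mimic the patterns described, and `cor()` computes Pearson correlation by default.

```r
# Toy daily values (made up, not the Fitbit data).
toy <- data.frame(
  TotalSteps         = c(2000, 4000, 6000, 8000, 10000, 12000),
  Calories           = c(1500, 1700, 1800, 2100, 2200, 2600),
  TotalMinutesAsleep = c(480, 470, 450, 430, 440, 400)
)

# Pairwise Pearson correlations; values near 1 or -1 mean a strong linear relationship.
cor_matrix <- cor(toy)
round(cor_matrix, 2)

steps_vs_calories <- cor_matrix["TotalSteps", "Calories"]
steps_vs_sleep    <- cor_matrix["TotalSteps", "TotalMinutesAsleep"]
```

On the real data the same call would be `cor(step_sleep$TotalSteps, step_sleep$Calories)`; a value close to zero would support calling a relationship unclear.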

User group summary

avg_user_group <- step_sleep %>%
  group_by(Id) %>%
  summarise(
    avg_totalSteps = mean(TotalSteps),
    avg_cal = mean(Calories),
    avg_totalSleepTime = mean(TotalMinutesAsleep)
  ) %>%
  mutate(user_activeType = case_when(
    avg_totalSteps < 7500 ~ "Lightly active",
    avg_totalSteps >= 7500 & avg_totalSteps < 10000 ~ "Fairly active",
    avg_totalSteps >= 10000 ~ "Very active"
  ))

avg_user_group %>%
  # Leave y unmapped so geom_bar() counts the users in each activity type
  ggplot(aes(x = "", fill = user_activeType)) +
  geom_bar(width = 1) +
  coord_polar("y", start = 0) +
  theme_void()

The majority of users tend to be lightly active. ‘Bellabeat should focus on this user type, from product design to user experience.’ For example, design the product to be more casual so it blends into everyday use.
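The size of each group can also be reported as a count and a share, which is easier to compare than slices of a pie chart. The sketch below reuses the step thresholds from the grouping above on a toy table; the `avg_toy` values are made up for illustration.

```r
library(dplyr)

# Toy per-user averages (made up).
avg_toy <- data.frame(
  Id = 1:5,
  avg_totalSteps = c(3000, 6000, 7500, 9999, 12000)
)

# Same thresholds as in the user group summary.
type_share <- avg_toy %>%
  mutate(user_activeType = case_when(
    avg_totalSteps < 7500 ~ "Lightly active",
    avg_totalSteps >= 7500 & avg_totalSteps < 10000 ~ "Fairly active",
    avg_totalSteps >= 10000 ~ "Very active"
  )) %>%
  count(user_activeType) %>%        # users per type
  mutate(share = n / sum(n))        # proportion of all users

type_share
```

Applied to `avg_user_group`, the same `count()` and `mutate()` steps would show how large the lightly active majority actually is.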

Summary

According to the data analysis, lightly active users are the largest group. The company should focus on this target group first. Step and sleep data are important information to provide to users. Heart rate data is not considered in this study because it covers only 14 users (fewer than half of the 33 users in the sample). Hence, the Bellabeat app and products should be designed with step and sleep tracking as a minimum requirement.


Witcha

2022-12-29