This case study is about data analysis for Bellabeat company.
Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.0
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(tidyr)
library(dplyr)
library(ggplot2)
library(lubridate)
## Loading required package: timechange
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
activity <- read.csv("dailyActivity_merged.csv")
calories <- read.csv("dailyCalories_merged.csv")
sleep <- read.csv("sleepDay_merged.csv")
heartRate<- read.csv("heartrate_seconds_merged.csv")
head(activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
head(calories)
## Id ActivityDay Calories
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
head(sleep)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
head(heartRate)
## Id Time Value
## 1 2022484408 4/12/2016 7:21:00 AM 97
## 2 2022484408 4/12/2016 7:21:05 AM 102
## 3 2022484408 4/12/2016 7:21:10 AM 105
## 4 2022484408 4/12/2016 7:21:20 AM 103
## 5 2022484408 4/12/2016 7:21:25 AM 101
## 6 2022484408 4/12/2016 7:22:05 AM 95
n_distinct(activity$Id)
## [1] 33
n_distinct(calories$Id)
## [1] 33
n_distinct(sleep$Id)
## [1] 24
n_distinct(heartRate$Id)
## [1] 14
sleep data came from 24 users, however it is enough to analyze. However, heart rate data is not enough to analyze due to 14 users.
sum(duplicated(activity))
## [1] 0
sum(duplicated(calories))
## [1] 0
sum(duplicated(sleep))
## [1] 3
activity <- activity %>%
distinct() %>%
drop_na()
calories <- calories %>%
distinct() %>%
drop_na()
sleep <- sleep %>%
distinct() %>%
drop_na()
sum(duplicated(sleep))
## [1] 0
There are no duplicate and the data is clean now. * date formating
activity <- activity %>%
rename(date = ActivityDate) %>%
mutate(date = as_date(date, format = "%m/%d/%Y"))
sleep <- sleep %>%
rename(date = SleepDay) %>%
mutate(date = as_date(date,format ="%m/%d/%Y %I:%M:%S %p" , tz=Sys.timezone()))
## Warning: `tz` argument is ignored by `as_date()`
activity %>%
select(TotalSteps) %>%
summary()
## TotalSteps
## Min. : 0
## 1st Qu.: 3790
## Median : 7406
## Mean : 7638
## 3rd Qu.:10727
## Max. :36019
calories %>%
select(Calories) %>%
summary()
## Calories
## Min. : 0
## 1st Qu.:1828
## Median :2134
## Mean :2304
## 3rd Qu.:2793
## Max. :4900
sleep %>%
select(TotalMinutesAsleep, TotalTimeInBed) %>%
summary()
## TotalMinutesAsleep TotalTimeInBed
## Min. : 58.0 Min. : 61.0
## 1st Qu.:361.0 1st Qu.:403.8
## Median :432.5 Median :463.0
## Mean :419.2 Mean :458.5
## 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :796.0 Max. :961.0
According to summary above, average steps is 7638 that fall into ‘Fairy active’. from https://www.10000steps.org.au/articles/counting-steps/ * the sample group is about Lightly active - Fairly active * The average calories is 2304 that a bit higher than woman standard. The demographic of this sample can be both gender.
According to the data analysis, it shown that lightly activity is the major group of users. The company should focus on this target group first. The steps and sleep data are important to provide the information to the users. Hear rate data is not be used to consider in this study because there is only 14 samples (half of the entire samples). Hence, the Bellabeat app and product should be designed to satisfy user with step and sleep tracker feature as a minimum requirement.
Note that the echo = FALSE
parameter was added to the
code chunk to prevent printing of the R code that generated the
plot.