ETC1010 - ETC5510 - Introduction to data analysis - S1 2025


This question is about visualising temporal data.

The example data is on pedestrian counts in the city of Melbourne. The plot below shows the distribution of pedestrian counts on weekdays in March across 24 hours, comparing 2019 with 2020.

ped %>% 
  ggplot(aes(x=Time, y=Count, group=Date, colour=as.factor(year))) +
    geom_boxplot() +
    facet_wrap(~ year, ncol= 1, scales = "free") + 
    scale_colour_brewer("", palette="Dark2") + 
    theme(legend.position="bottom", legend.title = element_blank())

[Plot not shown: boxplots of pedestrian counts by time of day, one panel per year]

By looking at the above plots, select all statements that are TRUE.

It would be easier to compare the plots if the y-scales for both years were the same.

✅

The median values of pedestrian counts over weekdays at 3am are similar between 2019 and 2020

✅

The variability in the pedestrian counts for different hours in 2019 and 2020 is similar

❌

The breaks in the x-axis are adequate

✅
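The first statement turns on the `scales = "free"` argument in the code above. As a rough sketch, assuming simulated counts in place of the `ped` data (the `toy_ped` data frame and its values are hypothetical, and the grouping is simplified to one box per hour), leaving out `scales` makes `facet_wrap()` fall back to its default of fixed scales, so both panels share one y-axis:

```r
library(ggplot2)

# Hypothetical stand-in for `ped`: simulated hourly counts for two years.
set.seed(1)
toy_ped <- expand.grid(Time = 0:23, day = 1:20, year = c(2019, 2020))
toy_ped$Count <- rpois(nrow(toy_ped),
                       lambda = ifelse(toy_ped$year == 2019, 800, 300))

# No `scales` argument: facet_wrap() defaults to scales = "fixed",
# so the two panels share one y-axis and can be compared directly.
p <- ggplot(toy_ped, aes(x = factor(Time), y = Count,
                         colour = as.factor(year))) +
  geom_boxplot() +
  facet_wrap(~ year, ncol = 1) +
  scale_colour_brewer("", palette = "Dark2") +
  theme(legend.position = "bottom", legend.title = element_blank())
```

Printing `p` draws the plot. Using `scales = "free_x"` instead would free only the x-axis while keeping the y-axis shared.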


The following question is about wrangling data. Here is the table flights from the nycflights13 package that we have wrangled previously in class.

glimpse(flights)

## Rows: 336,776
## Columns: 19
## $ year           <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,…
## $ month          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ day            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ dep_time       <int> 517, 533, 542, 544, 554, 554, 555, 557, 557, 558, 558…
## $ sched_dep_time <int> 515, 529, 540, 545, 600, 558, 600, 600, 600, 600, 600…
## $ dep_delay      <dbl> 2, 4, 2, -1, -6, -4, -5, -3, -3, -2, -2, -2, -2, -2, …
## $ arr_time       <int> 830, 850, 923, 1004, 812, 740, 913, 709, 838, 753, 84…
## $ sched_arr_time <int> 819, 830, 850, 1022, 837, 728, 854, 723, 846, 745, 85…
## $ arr_delay      <dbl> 11, 20, 33, -18, -25, 12, 19, -14, -8, 8, -2, -3, 7, …
## $ carrier        <chr> "UA", "UA", "AA", "B6", "DL", "UA", "B6", "EV", "B6",…
## $ flight         <int> 1545, 1714, 1141, 725, 461, 1696, 507, 5708, 79, 301,…
## $ tailnum        <chr> "N14228", "N24211", "N619AA", "N804JB", "N668DN", "N3…
## $ origin         <chr> "EWR", "LGA", "JFK", "JFK", "LGA", "EWR", "EWR", "LGA…
## $ dest           <chr> "IAH", "IAH", "MIA", "BQN", "ATL", "ORD", "FLL", "IAD…
## $ air_time       <dbl> 227, 227, 160, 183, 116, 150, 158, 53, 140, 138, 149,…
## $ distance       <dbl> 1400, 1416, 1089, 1576, 762, 719, 1065, 229, 944, 733…
## $ hour           <dbl> 5, 5, 5, 5, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 6, 6,…
## $ minute         <dbl> 15, 29, 40, 45, 0, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 59,…
## $ time_hour      <dttm> 2013-01-01 05:00:00, 2013-01-01 05:00:00, 2013-01-01…

For the following question, select the verbs and columns you would need to answer it from the flights table. We provide the code structure and a list of possible verbs to choose from:

What hour of day should you plan to fly if you want to avoid arrival delays as much as possible?

flights %>% 
  # step 1
  ___(hour) %>%
  # step 2
  ___(avg_delay = mean(___, na.rm = TRUE)) %>% 
  # step 3
  ___(avg_delay)

The verb for the first step is group_by()
The verb for the first step is mutate()
The verb for the first step is count()

The verb for the second step is mutate()
The verb for the second step is summarise()

The appropriate column to compute the mean from is arr_delay
The appropriate column to compute the mean from is dep_delay

The verb for the third step is sort()
The verb for the third step is arrange()
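One way to fill in the blanks, offered as a sketch rather than the official solution: group the flights by `hour`, collapse each group to a single mean of `arr_delay` (the question asks about arrival delays), and order the result. It assumes the `dplyr` and `nycflights13` packages are installed; the name `delay_by_hour` is introduced here for illustration.

```r
library(dplyr)
library(nycflights13)

delay_by_hour <- flights %>%
  # step 1: one group per scheduled departure hour
  group_by(hour) %>%
  # step 2: collapse each group to its mean arrival delay,
  # ignoring flights with a missing arr_delay
  summarise(avg_delay = mean(arr_delay, na.rm = TRUE)) %>%
  # step 3: order the hours from smallest to largest average delay
  arrange(avg_delay)
```

`summarise()` returns one row per hour, whereas `mutate()` would keep every flight. `arrange()` orders the rows of a data frame, while base `sort()` works on vectors.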


This question is about working with temporal data. The example data is on pedestrian counts in the city of Melbourne. What time periods of Melbourne pedestrian traffic are extracted by the code below?

Select all answers that you think are correct. Incorrect answers are penalised.

library(lubridate)
library(rwalkr)
ped_2020 <- melb_walk(from=Sys.Date() - 7L)
ped_2019 <- melb_walk(from=Sys.Date() - 30L - years(1), to=Sys.Date() - years(1))

(today - 1 year and 30 days)

❌

(one year ago) through to (today - 30 days)

❌

(today - 30 days)

❌

(1 year ago) through to (today - 1 year and 30 days)

✅

(today - 7 days)

❌

(today) through (today - 7 days)

✅
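The date arithmetic in the code can be checked without downloading any data. The sketch below does not call `melb_walk()`; it only evaluates the `from` and `to` expressions, with a hypothetical fixed date standing in for `Sys.Date()` so the result is reproducible.

```r
library(lubridate)

# Hypothetical fixed date in place of Sys.Date().
today <- as.Date("2020-04-15")

from_recent <- today - 7L               # start of the first request
from_2019   <- today - 30L - years(1)   # start of the second request
to_2019     <- today - years(1)         # end of the second request
```

With this date, `from_recent` is 2020-04-08, and the second request runs from 2019-03-16 to 2019-04-15, that is, the 30 days leading up to the same day one year earlier.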


This question is about data visualisation. Below are two plots of the Melbourne Central pedestrian traffic, for 2019.

[Plots A and B not shown]

Answer the following questions:

1. In plot A, which variable is mapped to the x axis? (Put the answer in lower case).

2. Is this the same information used in plot B?

3. Which plot did this code produce?

ggplot(walk_tidy, aes(x = Time, y = Count, group = Date)) + 
  geom_line(alpha=0.3)

TRUE
FALSE


The following question is about visualisation.

The data shows the calories in a selection of chocolate bars, per 100 g equivalent. Which of the following statements are true?

[Plot not shown]

The median calories of the “Dark” and “Milk” chocolates are similar for the samples in this data set.

The “Milk” chocolates in this data seem to have more calories

The “Dark” chocolates in this data seem to have higher calories

100%

In this data set there are more chocolate samples in the “Dark” group


The following question is about visualisation.

The data shows the calories in a selection of chocolate bars, per 100 g equivalent. Calories are mapped to the vertical axis. If you want the reader to compare the interquartile range of calories of milk and dark chocolates, which part of the plot do you need to observe?

[Plot not shown]

Compare the size of the boxplots on a common scale on the y axis

100%

Compare the size of the boxplot whiskers on a common scale on the y axis

Compare the size of the boxplots on a common scale on the x axis

Compare the size of the boxplot whiskers and whether there are any unusual observations
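The box in a boxplot spans roughly the first to the third quartile, so its height on the y axis reflects the interquartile range. A minimal sketch with made-up calorie values (the `choc` data frame is hypothetical, not the data behind the plot) shows the quantity being compared:

```r
# Hypothetical calorie values per 100 g; not the data from the question.
choc <- data.frame(
  type     = rep(c("Dark", "Milk"), each = 5),
  calories = c(500, 520, 540, 560, 580,
               530, 535, 540, 545, 550)
)

# Interquartile range (Q3 - Q1) within each chocolate type.
iqr_by_type <- tapply(choc$calories, choc$type, IQR)
```

In this toy example the interquartile range is 40 for "Dark" and 10 for "Milk", so the "Dark" box would be the taller one when both are drawn against the same y axis.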


This question is about tidy temporal data. Below are the data on military expenditure for different countries:

head(expenditure)

## # A tibble: 6 × 4
##   Entity      Code   Year military_expenditure
##   <chr>       <chr> <dbl>                <dbl>
## 1 Afghanistan AFG    1970              5373185
## 2 Afghanistan AFG    1973              6230685
## 3 Afghanistan AFG    1974              6056124
## 4 Afghanistan AFG    1975              6357396
## 5 Afghanistan AFG    1976              8108200
## 6 Afghanistan AFG    1977              8553767

To compare the expenditure of Australia and Germany between 2015 and 2020, we are going to select the observations for those two countries and that time range. Then we want to change the data format so that the expenditure for Australia and the expenditure for Germany across different years appear in two different columns.

Fill in the blanks for the following code to get our desired output:

exp2 <- expenditure %>% 
  # (a) which pivot function
  # (b) which id_cols
  # (c) which column forms names_from
  # (d) which column forms values_from
  dplyr::---(Entity %in% c("Australia", "Germany"),
                Year >= 2015) %>%
  pivot_---(id_cols = ---, 
              names_from = ---,
              values_from = ---)

select
filter
military_expenditure
Germany

longer()
wider()
year

Military_expenditure
Time
Year

Country
Entity
Australia
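One plausible completion, sketched on made-up rows with the same columns as `expenditure` (the values below are hypothetical): keep the rows with `filter()`, then spread the `Entity` values into columns with `pivot_wider()`, using `Year` to identify each output row.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows with the same columns as `expenditure`; values are made up.
expenditure <- tibble(
  Entity = c("Australia", "Australia", "Germany", "Germany", "France", "Australia"),
  Code   = c("AUS", "AUS", "DEU", "DEU", "FRA", "AUS"),
  Year   = c(2015, 2016, 2015, 2016, 2015, 2010),
  military_expenditure = c(10, 11, 20, 21, 30, 9)
)

exp2 <- expenditure %>%
  # keep only the two countries and the years from 2015 onwards
  dplyr::filter(Entity %in% c("Australia", "Germany"),
                Year >= 2015) %>%
  # one row per Year, one column per country
  pivot_wider(id_cols = Year,
              names_from = Entity,
              values_from = military_expenditure)
```

Setting `id_cols = Year` drops `Code`. Without it, `Code` would also be treated as an identifier and each country would end up on its own row, with `NA` in the other country's column.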


The following question is about tidy data. The table below looks at crime occurrence in different locations across Victoria:

entry_point	lga	crime_type	count
FRONT DOOR	Monash	arson	70
FRONT DOOR	Alpine	arson	70
WINDOW	Monash	burglary	30
WINDOW	Alpine	burglary	45
ROOF	Monash	burglary	15
ROOF	Alpine	burglary	10

What proportion of the crimes were recorded in the Monash LGA?

Incorrect answers are penalised.
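A sketch of the arithmetic, assuming "proportion of the crimes" means the share of the total `count` rather than the share of rows:

```r
crime <- data.frame(
  entry_point = c("FRONT DOOR", "FRONT DOOR", "WINDOW", "WINDOW", "ROOF", "ROOF"),
  lga         = c("Monash", "Alpine", "Monash", "Alpine", "Monash", "Alpine"),
  crime_type  = c("arson", "arson", "burglary", "burglary", "burglary", "burglary"),
  count       = c(70, 70, 30, 45, 15, 10)
)

# Monash counts (70 + 30 + 15 = 115) over the total count (240).
prop_monash <- sum(crime$count[crime$lga == "Monash"]) / sum(crime$count)
```

Under that assumption the proportion is 115 / 240, or about 0.48.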


The following question is about tidy data. The table below looks at crime occurrence in different locations across Victoria:

entry_point	X1	crime_type	count
FRONT DOOR	Monash	NA	70
FRONT DOOR	NA	NA	70
WINDOW	Monash	burglary	30
WINDOW	NA	burglary	NA
NA	Monash	NA	NA
NA	NA	NA	NA

What is the proportion of crimes that entered through the front door?

Hint: If a data set contains "NA" values, it means the entries are missing. You can ignore these missing values here.
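Following the hint, a sketch that drops the missing counts, again assuming the proportion is taken over the `count` column (the name `crime_na` is introduced here for illustration):

```r
crime_na <- data.frame(
  entry_point = c("FRONT DOOR", "FRONT DOOR", "WINDOW", "WINDOW", NA, NA),
  X1          = c("Monash", NA, "Monash", NA, "Monash", NA),
  crime_type  = c(NA, NA, "burglary", "burglary", NA, NA),
  count       = c(70, 70, 30, NA, NA, NA)
)

# which() discards the NA comparisons, so only the two FRONT DOOR rows are used.
front_door <- sum(crime_na$count[which(crime_na$entry_point == "FRONT DOOR")],
                  na.rm = TRUE)

# Front-door counts (140) over all non-missing counts (170).
prop_front_door <- front_door / sum(crime_na$count, na.rm = TRUE)
```

Under that assumption the proportion is 140 / 170, or about 0.82.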


The following question is about tidy data. The table below looks at crime data in different locations across Victoria:

entry_point	X1	crime_type	count
FRONT DOOR	Monash	NA	70
FRONT DOOR	NA	NA	70
WINDOW	Monash	burglary	30
WINDOW	NA	burglary	NA
NA	Monash	NA	NA
NA	NA	burglary	NA

As usual, we first need to inspect the variables and observations in this data set. What is the dimension of the data set?

This data set has 3 variables and 6 observations. The dimension is 4 x 6

❌

This data set has 4 variables and 6 observations. The dimension is 4 x 6

❌

This data set has 3 variables and 5 observations. The dimension is 5 x 3

❌

This data set has 4 variables and 6 observations. The dimension is 6 x 4

✅
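The dimension can be confirmed by entering the table as a data frame (the name `crime_na` is introduced here for illustration). `dim()` reports rows first and columns second, and rows that are mostly `NA` still count as observations:

```r
crime_na <- data.frame(
  entry_point = c("FRONT DOOR", "FRONT DOOR", "WINDOW", "WINDOW", NA, NA),
  X1          = c("Monash", NA, "Monash", NA, "Monash", NA),
  crime_type  = c(NA, NA, "burglary", "burglary", NA, "burglary"),
  count       = c(70, 70, 30, NA, NA, NA)
)

# Rows (observations) first, then columns (variables).
crime_dim <- dim(crime_na)
```

Here `crime_dim` is `6 4`: six observations and four variables.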

