All work to be completed

PART 1
PART 2
1. To investigate the effect of an outlier on the correlation coefficient, r.
In order to see if the drink sales from a kiosk might be predicted from daily temperatures, a
survey recorded how many cold drinks were sold over 13 days in summer.
Temp (°C):              22  27  31  36  25  34  39  38  36  28  33  24  29
Number of cold drinks:  31  41  42  51  39  50  40  57   7  44  49  36  45
(a) Using your calculator, construct a scatterplot for the data.
Print or copy-out a good sketch of the plot and label the axes. (Make sure you put
the explanatory and response variables on the right axes!) Can you see any
(b) Using your own visual judgement (‘by inspection’), describe the association
between the variables in terms of form, strength and direction.
(c) (i) Now use your calculator to find the value of Pearson’s correlation
coefficient, r, then
(ii) interpret the value of r in terms of what it can tell us about any association
between the variables temperature and number of cold drinks.
(iii) Does the calculated correlation coefficient value from (c)(i) support your visual
judgement from (b) regarding the strength of the relationship between the
variables?
(iv) Calculate the coefficient of determination, r², and interpret its value: what can
the value of r² tell us about the association here?
What does the interpretation imply regarding the predictive power of the
association?
(d) (i) Use your calculator to find the upper and lower quartiles for the temperature
data, considered as a univariate set. Now calculate upper and lower fences and
do the test for outliers, showing all working.
(ii) Use your calculator to find the upper and lower quartiles for the number of
cold drinks sold. Then do the test for outliers, showing your working.
(e) Remove any data pairs for the outlier values you have found and recalculate
Pearson’s correlation coefficient for the new set of pairs with outliers removed.
(f) Compare the new r (without outliers) with your result for r in (c)(i).
Does this new value for r support or contradict your visual judgement in (b)?
(g) What conclusions can you draw from this activity regarding the effect of
outlier(s) on the value of Pearson’s correlation coefficient?
(h) (i) Calculate the new value for the coefficient of determination, r², for the data set
with outliers removed.
(ii) Give an interpretation of this new coefficient of determination in terms of the
variables, then comment on the implications for the predictive power of the
association.
(i) Use your CAS to produce a scatterplot for the new set of data with outliers
removed. Perform a linear regression analysis of the plot. Write out the equation
for the regression line in terms of the variables in this model, and comment on the
reliability of this linear regression model for making predictions.
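The steps in question 1 can also be checked without a calculator. The following is only a sketch, assuming the 26 values in the table pair up in the order listed (first temperature with first drinks count, and so on) and that quartiles are found by the "median of each half" method that most school calculators use. The function names `pearson_r` and `fences` are illustrative, not part of the task.

```python
from math import sqrt
from statistics import median

# Assumed pairing: the i-th temperature goes with the i-th drinks count.
temps = [22, 27, 31, 36, 25, 34, 39, 38, 36, 28, 33, 24, 29]
drinks = [31, 41, 42, 51, 39, 50, 40, 57, 7, 44, 49, 36, 45]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from the sums of squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fences(values):
    """Lower and upper fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR.

    Quartiles are the medians of the lower and upper halves of the
    sorted data (the middle value is left out when n is odd).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, fence):
    lower, upper = fence
    return value < lower or value > upper


temp_fence = fences(temps)
drink_fence = fences(drinks)

# Keep only the pairs where neither value is flagged as an outlier.
kept = [
    (t, d)
    for t, d in zip(temps, drinks)
    if not (is_outlier(t, temp_fence) or is_outlier(d, drink_fence))
]

r_all = pearson_r(temps, drinks)
r_kept = pearson_r([t for t, _ in kept], [d for _, d in kept])

print("fences (temp, drinks):", temp_fence, drink_fence)
print("r with all pairs:", round(r_all, 4), " r squared:", round(r_all ** 2, 4))
print("r without outliers:", round(r_kept, 4), " r squared:", round(r_kept ** 2, 4))
```

Comparing the two printed values of r gives the comparison asked for in parts (f) and (g).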
2. Researchers found a correlation of 0.86 between the number of churchgoers and the
number of burglaries committed in 50 towns. This is a strong positive correlation.
Does the result mean that attending church makes people become burglars? Or, does it
mean that burglars deliberately target the houses of churchgoers on Sunday mornings? Is
there a hidden factor that might explain the results? Discuss.
3. As part of her studies, a university student visited a school holiday childcare centre and
gathered information about the sleeping patterns of the children in the centre. The scatter
plot for the data she gathered is graphed below:
[Scatter plot not reproduced: hours of sleep (vertical axis) against age in years (horizontal axis)]
(a) Copy the scatter plot to your working and draw a line of best fit through the points
‘by eye’—i.e. just by drawing a line which you think looks like it captures the
general trend of the data.
(b) Use your line of best fit to predict
(i) the expected number of sleeping hours of a child of age 5½ years.
(ii) the estimated age if a child sleeps for 12.5 hours.
(c) Here is the data set the student collected and put into the plot above:
{(1, 20), (2, 15), (3, 18), (4, 12), (5, 13), (6, 15), (7, 11), (8, 14), (9, 5)}
Put these values into your CAS and use them to calculate the least squares
regression line for the data. Write your equation out, using the variable names.
(d) Use the regression equation obtained in (c) to predict
(i) the expected number of sleeping hours of a child of age 5½ years
(ii) the estimated age if a child sleeps for 12.5 hours
(e) Are your answers to parts (b) and (d) the same (or at least very similar)?
Should they be the same? Why might they differ? Give reasons for your answer.
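Parts (c) and (d) can be sketched in code as follows. This is an illustration only, using the nine (age, hours of sleep) pairs given in part (c). The helper name `least_squares` is made up for this example, and the age estimate simply rearranges the fitted line, which is what part (d)(ii) asks for.

```python
# Data from part (c): (age in years, hours of sleep)
data = [(1, 20), (2, 15), (3, 18), (4, 12), (5, 13),
        (6, 15), (7, 11), (8, 14), (9, 5)]
ages = [age for age, _ in data]
sleep = [hours for _, hours in data]


def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


a, b = least_squares(ages, sleep)

# (d)(i): predicted hours of sleep for a child aged 5.5 years
hours_at_5_5 = a + b * 5.5

# (d)(ii): age at which the line predicts 12.5 hours of sleep,
# found by solving 12.5 = a + b * age for age
age_for_12_5 = (12.5 - a) / b

print(f"hours of sleep = {a:.3f} + ({b:.3f}) x age")
print(f"predicted sleep at age 5.5: {hours_at_5_5:.2f} hours")
print(f"estimated age for 12.5 hours of sleep: {age_for_12_5:.2f} years")
```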
Exam Practice
In Further Mathematics there are two end-of-year examinations. Examination 1 consists of
multiple-choice questions covering the two CORE modules (Data Analysis & Financial
Modelling) and the two chosen modules (Geometry & Trigonometry and Matrices).
The work you have just completed is a part of the Data Analysis module.
In VCE Exam 1, there are a total of 40 questions to be completed in 90 minutes. So, on
average, you should plan to take around 2 minutes per question. One mark is given for each
In order to practice working under exam conditions, we suggest you attempt the five
multiple-choice questions below at this rate: i.e. you should time yourself and take no more
than 10min for them all.
Restrict your time to 10 minutes only.
You should then submit your answers to these questions electronically using the week 4
online quiz. That way you will get immediate feedback on your answers.
If you get less than 4/5, this indicates that you need to spend more time reviewing the work
for the week.
It is not necessary to show your working or include these questions in your SEND
document as credit is given for correct answers only.
Circle the letter beside the correct answer.
1 The following scatter plot shows the relationship between age and the number of alcoholic
drinks consumed on the weekend by a group of people.
[Scatter plot not reproduced]
The value of the correlation coefficient is closest to:
A −0.8
B −0.4
C −0.1
D 0.4
E 0.8
2 The length (in metres) and wingspan (in metres) of eight commercial aeroplanes are
displayed in the table below:
Length (m):    70.7  70.7  63.7  58.4  54.9  39.4  36.4  33.4
Wingspan (m):  64.4  59.6  60.3  60.3  47.6  35.8  28.9  28.9
Correct to four decimal places, the value of Pearson’s product moment correlation
coefficient for this data is
A 0.9371
B 0.9583
C 0.9681
D 0.9793
E 0.9839
3 Data was collected from a group of students concerning the number of hours they spend
reading for recreation each week, and their score on an English examination. The plot for
the data shows a linear association with the correlation coefficient r = 0.72.
From this we can say that:
A 72% of the students read for leisure.
B Most of the students who read for leisure scored 72 on the English examination.
C Those students who spent more time reading for leisure tended to score lower on the
English examination.
D Those students who spent more time reading for leisure tended to score higher on the
English examination.
E There is only a very weak relationship between score on the English examination and
number of hours spent reading for leisure.
4 Researchers conduct a study in order to predict weight from height. If the correlation
between height and weight for a group of people is determined to be 0.75, then we can say
that approximately:
A 75% of the variation in weight is explained by the variation in height
B 75% of the variation in height is explained by the variation in weight
C 56% of the variation in weight is explained by the variation in height
D 87% of the variation in height is explained by the variation in weight
E height and weight are negatively correlated.
5 For which one of the following plots would it be appropriate to calculate the value of the
correlation coefficient, r?
[Plots A to E not reproduced]
Work out the solutions to the following questions then check them by putting your answers into
the week 5 online quiz. That way you will get immediate feedback on your answers.
If you get less than 80% this indicates that you need to spend more time reviewing the work for
the last two weeks.
Once you have finished putting your answers into the online quiz, you just need to go to the Work for
submission [W05] link and type in a brief reflection on your learning progress.
TRUE OR FALSE?
Are the following statements true (T) or false (F)? Type T or F for each statement.
1. Pearson’s correlation coefficient is insensitive to outliers.
2. A prediction within the original range of the data is called an interpretation.
3. To calculate a residual value we subtract the observed value from the predicted value.
MULTIPLE CHOICE QUESTIONS
4. The product moment correlation coefficient (Pearson’s r) was found to be –0.3951 for
the data displayed. If the point (7, 25) was replaced with (7, 5) and Pearson’s r
recalculated, the new value of r would be:
A unchanged
B positive but closer to 1
C negative but closer to zero
D positive but closer to zero
E negative but closer to –1
[Scatterplot not reproduced: horizontal axis 0 to 10, vertical axis 0 to 25]
5. The scatterplot shows the game scores achieved by a group of players in two games.
Which equation is closest to the least squares regression equation of G2 on G1?
[Scatterplot not reproduced: score second game (G2), 0 to 45, against score first game (G1), 0 to 50]
A G2 = G1 + 15
B G2 = G1/3
C G1 + G2 = 15
D, E [garbled in the source; the legible fragments are "G2 = 3G1 ... 15" and "G2 = 3G1/5 ... 15", with the operators lost]
6. Given that r = –0.873, s_x = 5.832 and s_y = 6.001, the slope, b, of the regression line
y = a + bx is closest to:
A –0.90
B –0.87
C –0.85
D 0.87
E 0.90
7. The following least squares regression line relates the value of sales made per month, in
\$’000s, at a large car yard to the number of salespersons on the staff:
Car sales (thousands of dollars) = 13 + 36 × Number of salespeople
Thus we can say:
A on average, sales are increasing by \$36 000 per month.
B on average, sales are decreasing by \$36 000 per month.
C for each additional salesperson employed we predict an increase in sales of \$36 000.
D on average, sales are increasing by \$13 000 per month.
E for each additional salesperson employed we predict an increase in sales of \$13 000.
8. The plot below shows a least squares regression line together with the data used in the
calculation of that line.
The corresponding residual plot for this least squares regression line is closest to:
[Residual plots A to E not reproduced]
9. A person’s weight is known to be positively associated with their height. To investigate this
association for 12 men, a scatterplot is constructed as shown. When a least squares regression
line is used to model this data, the coefficient of determination is found to be 0.3146.
The scatterplot shows a very clear outlier. If the outlier is removed from the data, and a least
squares regression line refitted to the data of the remaining 11 men, the value of the
coefficient of determination will
A remain the same
B increase
C decrease
D be halved
E not be able to be determined
10. The table below shows the number of years employed and annual income (in
thousands of dollars) for 12 graduates employed by a large accounting firm.
Years of service:        5   0   8   1   5   4   9   6   7   9   2   6
Annual salary (\$’000):  52  23  68  25  45  38  75  50  62  64  32  88
The least squares regression line which would enable annual salary to be predicted from
years of service is closest to:
A Salary = 21.715 – 5.829 × years
B Salary = –21.715 + 5.829 × years
C Salary = 5.829 + 21.715 × years
D Salary = 5.829 – 21.715 × years
E Salary = 21.715 + 5.829 × years
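As a way of checking a calculator result for question 10, and of seeing how residuals are formed, the fit can be sketched as below. It assumes the 12 years-of-service values pair with the 12 salaries in the order listed. The function name `fit_line` is illustrative, and residuals use the standard definition of observed value minus predicted value.

```python
# Assumed pairing: i-th years-of-service value with i-th salary value.
years = [5, 0, 8, 1, 5, 4, 9, 6, 7, 9, 2, 6]
salary = [52, 23, 68, 25, 45, 38, 75, 50, 62, 64, 32, 88]  # in $'000


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(years, salary)

# Residual = observed value - value predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(years, salary)]

print(f"Salary = {intercept:.3f} + {slope:.3f} x years")
for x, y, e in zip(years, salary, residuals):
    print(f"years={x:2d}  salary={y:3d}  residual={e:7.2f}")
```

Plotting the residuals against years of service gives the residual plot; for a least squares line the residuals always sum to zero.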
11. A student fits a least squares line to a set of bivariate data, as shown in the scatterplot
below.
The residual plot for this least squares line would look like:
12. When the correlation coefficient, r, was calculated for the data displayed in the
scatterplot below, it was found to be − 0.3951. If the point (7, 25) was replaced with the
point (7, 5) and the correlation coefficient, r, recalculated, then the value of r would be:
A unchanged
B positive but closer to 1
C negative but closer to 0
D positive but closer to 0
E negative but closer to −1
13. A least squares regression line has been fitted to a scatterplot, as shown below.
The equation of this line is closest to:
A y = 0.8 − 10x
B y = 110 + 0.8x
C y = −1.25 + 110x
D y = 110 − 1.25x
E y = 110 − 0.8x
PROBLEM SOLVING I : Finding Equation of the LSR line using Formula method
14. We wish to find the equation of the least squares regression line that will enable height (in
cm) to be predicted from femur length (in cm). The femur is the thigh bone.
Complete the following sentences by filling in the missing information.
Which is the explanatory variable (EV) and which is the response variable (RV)?
(i) Explanatory variable is: ______ (height or femur length)
(ii) Response variable is: ______ (height or femur length)
15. Use the summary statistics given below to determine the gradient (2 d.p.) and
y-intercept (2 d.p.) of the equation of the least squares regression line that will enable
height (y) to be predicted from femur length (x).
Summary statistics
r = 0.9939
x̄ = 24.246
s_x = 1.873
ȳ = 166.092
s_y = 10.086
Write the least squares equation in terms of height and femur length. Fill in the boxes
below.
[Warning: if you do not round off the gradient and y-intercept to the nearest 2 decimal places
______ = ______ + ______ × ______
16. Interpret the slope of the regression equation in terms of height and femur length
(fill in the blanks below with the correct words and numbers to two decimal places):
For every 1 cm increase in ______, ______ increases by ______ cm.
17. Determine the value of the coefficient of determination (2 d.p.) and interpret this in terms
of height and femur length.
______ % of the ______ in ______ can be explained by the ______ in ______
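The formula method used in questions 15 to 17 (slope b = r × s_y / s_x, intercept a = ȳ − b × x̄) can be sketched as follows. Two assumptions are made here: that the values 24.246 and 166.092 in the summary statistics are the means of x and y, and that the intercept is calculated from the slope after it has been rounded to 2 decimal places, as the rounding warning seems to intend. The variable names are illustrative only.

```python
# Summary statistics from question 15
# (x = femur length in cm, y = height in cm).
r = 0.9939
x_mean, s_x = 24.246, 1.873    # assumed to be the mean and st. dev. of x
y_mean, s_y = 166.092, 10.086  # assumed to be the mean and st. dev. of y

# Slope: b = r * s_y / s_x, rounded to 2 d.p.
slope = round(r * s_y / s_x, 2)

# Intercept: a = y_mean - b * x_mean, using the rounded slope (assumption).
intercept = round(y_mean - slope * x_mean, 2)

# Coefficient of determination, r squared, to 2 d.p.
r_squared = round(r ** 2, 2)

print(f"height = {intercept} + {slope} x femur length")
print(f"coefficient of determination = {r_squared}")
```

Using the unrounded slope to find the intercept gives a slightly different second decimal place, which is presumably why the worksheet warns about rounding.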
PROBLEM SOLVING II – A Full Regression Analysis
The cost of preparing meals in a school canteen each day is assumed to be linearly associated with
the number of meals prepared. To help the caterers predict the costs, data were collected on the
cost of preparing meals over 11 days. The data are shown below.
Complete the following sentences by filling in the missing information.
18. In this situation, the explanatory variable is ______ (cost or number of meals?)
19. Use your calculator to find the equation of the least squares regression line relating the cost
of preparing meals to the number of meals produced (giving all values to one decimal
place).
cost = ______ + ______ × number of meals
20. Use the equation to predict the cost of producing:
(i) 48 meals. ______ (to nearest cent)
(ii) In making this prediction are you interpolating or extrapolating?
21. Use the equation to predict the cost of producing:
(i) 21 meals. ______ (to nearest cent)
(ii) In making this prediction are you interpolating or extrapolating?
22. Interpret the y-intercept and gradient/slope of the regression line:
The y-intercept of the regression line predicts that the fixed costs of running the canteen
(even if no meals are prepared) are \$______ (to nearest cent).
The slope of the regression line predicts that, for each additional meal produced, meal
preparation costs increase by \$______ (to nearest cent).
23. Use your calculator to find the correlation coefficient, r, and the coefficient of
determination, r². Use these values to complete the following interpretation sentences.
(i) The correlation coefficient, r, equals ______ (4 d.p.).
This suggests that there is a ______ (strength), ______ (direction)
association between the cost of preparing meals and the number of meals produced.
(ii) The coefficient of determination, r², equals ______ (4 d.p.).
This indicates that ______ % of the variation in the cost of preparing meals can be
______ by the variation in the number of meals produced.
24. Use your calculator to plot the data, fit a regression line and plot the residuals.
Does the residual plot confirm or contradict the assumption that there is a linear
association present here?
SEND: Work for Submission – Exam Practice
In Further Mathematics there are two end-of-year examinations. Examination 1 consists of
multiple-choice questions covering the two CORE modules (Data Analysis & Financial
Modelling) and the two chosen modules (Geometry & Trigonometry and Matrices).
The work you have just completed is a part of the Data Analysis module.
In VCE Exam 1, there are a total of 40 questions to be completed in 90 minutes. So, on average,
you should plan to take around 2 minutes per question. One mark is given for each correct
In order to practice working under exam conditions, we suggest you attempt the five multiple-choice questions below at this rate: i.e. you should time yourself and take no more than 10 min
for them all.
Restrict your time to 10 minutes only.
You should then submit your answers to these questions electronically using the week 6 online
If you get less than 4/5, this indicates that you need to spend more time reviewing the work for
the week.
It is not necessary to show your working or include these questions in your SEND
document as credit is given for correct answers only.
Circle the letter beside the correct answer.
1 The relationship between the two variables x and y as shown in the scatterplot is non-linear.
Which of the following transformations could possibly linearize the scatterplot?
A log x
B log y
C 1/x
D 1/y
E y²
[Scatterplot not reproduced: y (0 to 40) against x (0 to 50)]
2 A student uses the following data to construct the scatterplot shown
To linearize the scatterplot she applies an
x-squared transformation. She then fits a
least squares regression line to the
transformed data with y as the dependent
variable. The equation of this least squares
regression line is closest to
A y = 7.1 + 2.9x²
B y = 29.5 + 26.8x²
C y = 26.8 − 29.5x²
D y = 1.3 + 0.04x²
E y = 2.2 + 0.3x²
3 The plot on the right shows a least
squares regression line together with the
data used in the calculation of that line.
The plot is clearly non-linear. The plot
can be linearized by applying a log x
transformation.
When this is done, a least squares
regression line is fitted to the
transformed data giving the equation:
Life expectancy = 14.3 + 14.5 × log(GNP).
Using this equation, the average life expectancy of a country with a GNP of \$7000 is
predicted to be closest to
A 63
B 65
C 67
D 70
E 73
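Substituting into a log-transformed regression equation like the one in question 3 can be sketched as below. This assumes that "log" means the base-10 logarithm, which is the usual convention for this kind of transformation, and that the coefficient 14.5 multiplies log(GNP). The function name `predicted_life_expectancy` is made up for this example.

```python
from math import log10


def predicted_life_expectancy(gnp):
    """Life expectancy = 14.3 + 14.5 * log10(GNP), with GNP in dollars.

    Assumes a base-10 logarithm.
    """
    return 14.3 + 14.5 * log10(gnp)


prediction = predicted_life_expectancy(7000)
print(f"Predicted life expectancy for GNP $7000: {prediction:.1f} years")
```

The transformation is applied to the explanatory variable before substituting, so the GNP value goes through the logarithm first and the result is then multiplied by the slope.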
The following information relates to questions 4 and 5
Cars depreciate in value over time. The table gives the average value of a car at different ages.
Table 1
When a scatterplot is drawn for this data it indicates that a logarithmic transformation of the
horizontal axis may linearize the data.
The original data has been reproduced in Table 2 with an extra row for the transformed variable
being added. Table 2 is incomplete.
Table 2
4 What value, correct to two decimal places, is needed to complete the table?
A 1.0
B 0.94
C 0.95
D 0.954
E 0.96
5 The equation of the least squares regression line fitted to the transformed data is
A value = 10800 − 18300 × log(age)
B value = 1200 + 17500 × log(age)
C value = 10800 + 18300 × log(age)
D value = 18300 − 10800 × log(age)
E value = 17500 − 1200 × log(age)
