
Health Research Methods
Computing assessment guideline (Semester 2 2016)
PLEASE READ THE INSTRUCTIONS CAREFULLY BEFORE YOU COMMENCE.
INSTRUCTIONS:
• You can refer to pages 2 to 12 of this assessment guideline and work on the questions first, then log onto Blackboard to enter your answers via the “computing assessment” link. Note that Question 1 is Question 1 on Blackboard, Question 2 is Question 2 on Blackboard, Question 3a is Question 3 on Blackboard, Question 3b becomes Question 4 on Blackboard, and so on. Alternatively, you can work on the questions directly on Blackboard via the “computing assessment” link.
• You must complete this assessment by 9am Monday 10th October 2016 AWST via Blackboard.
• The assignment MUST be completed by yourself using STATA (or SPSS) software.
• You need to submit the relevant outputs generated by STATA (or SPSS) in a Word document as evidence to support your answers. You can upload the Word document containing screenshots of your outputs, or the outputs copied and pasted directly from the STATA (or SPSS) software, when you complete the last question. Marks will be deducted if you do not provide the outputs.
• You need to save your assignment as a Word document in the following format: SURNAME_StudentID_CA.docx. For example, John Smith will save his assignment as SMITH_12345678_CA.docx and submit the Word document to Blackboard.
• You also need to submit the same Word document to Turnitin for plagiarism detection.
• Do not submit your assignment in pdf format or include any irrelevant statistical outputs.
• The assignment will not be accepted without the Declaration statement below. You may insert your electronic signature or just type your name below.
Declaration
As I type my name below, I admit that the work I am submitting in this assignment is my own work. I have generated the analyses myself and the interpretations and answers given are my own. I understand that all forms of plagiarism, cheating and unauthorised collusion are regarded seriously by the University and could result in penalties including failure and possible exclusion from the University.

______________________________ ___________________________ __________________
Name of student Student ID Date
• You are required to keep a copy of the completed assignment for your own record.
• Once your computing assessment is marked, you will be able to see your mark for the multiple-choice questions by clicking on the ‘Computing Assessment Submission Link’, then the score of your computing assessment. You can download the marked assignment with feedback under the My Grades section on Blackboard within 15 working days after the due date.
• Late Assignments: Students will have 10% of the total assessment mark deducted for each day (including both Saturday and Sunday) the assessment is late without prior negotiation with the relevant Unit Coordinator. Assessments will NOT be marked if they are submitted more than 10 days (including both Saturday and Sunday) after the due date (or revised due date if an extension has been granted). Work not submitted by this time (due date plus 10 days) may result in an F – IN (Fail – Incomplete) grade being awarded.

Scenario:
A researcher randomly recruited a group of 63 children in southwest Western Australia from an elementary school and followed them for a period of 12 years. At the beginning of the study, each of the children was given an identification number, and their gender, the area where they lived, their daily energy intake, their fibre intake and the time they spent on physical activities were recorded by a research assistant, who compiled all the information into the dataset ‘Semester 2’. After 12 years, the amount of time these teenagers (the then children) spent playing sports was measured and recorded in the same dataset. The body mass index of the children and of the teenagers was also measured. All of the data were measured independently of the children. The variables of the dataset are listed in Table 1.

Table 1: Variables and their descriptions as collected in the study
Variable   Label
ID         Identification number of the children
GENDER     Gender of the children (1 = Girl; 2 = Boy)
AREA       Area where the children live (1 = Country; 2 = City)
ENERGY     Daily energy intake of the children (in kJ)
FIBRE      Daily fibre intake of the children (in g)
TIME1      Time children spent playing sports per day (in mins)
TIME2      Time teenagers spent playing sports per day (in mins)
BMIC       Body mass index of the children (in kg/m2)
BMIT       Body mass index of the teenagers (in kg/m2)

Open the Semester 2 data set from Blackboard under the ‘Assessments’ tab. Use the following questions to guide you through the process as you run some descriptive statistics and also the inferential statistics and prepare your interpretations or conclusions for the researcher.

It is recommended that you first assign the variable ‘labels’ and ‘values’ according to the Table above. This will enable you to read the outputs easily.


1. (Q1 on Blackboard) Which of the following would be appropriate to describe the frequency distribution for Area where the children live? (0.5 mark)
a. Frequency and percentage
b. Mean and standard deviation
c. Median and interquartile range
d. Variance and standard error

2. (Q2 on Blackboard) The appropriate statistics to describe the body mass index (BMI) of the children would be: _______ (0.5 mark)
a. The percentage of BMI of the children is 20.37 kg/m2.
b. The range of BMI of the children population is 17.26 kg/m2.
c. The BMI of the children is much lower than the BMI of the teenagers in the population.
d. In this sample, the average BMI of the children is 20.27 kg/m2, with a standard deviation of 3.67 kg/m2.

3. The researcher is interested to know if variable BMIT has a Normal distribution. Use the following table as a guide.
Measures                  Criteria/Cut-off points
• Histogram               Symmetrical, bell-shaped curve
• Boxplot                 Median in the centre of the box, with whiskers of equal length at both ends of the box and no outliers
• Normal Q-Q plot         Most observations appear on the straight line
• Skewness coefficient    Between -1 and 1
• Kurtosis coefficient    Between -1 and 1
[STATA users need to subtract 3 from the given kurtosis coefficient]
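
The two numeric criteria in the guide above can be illustrated with a short sketch. It assumes the simple moment-based definitions of skewness and kurtosis, under which a Normal distribution has a kurtosis of 3 (hence the subtraction of 3 to obtain an "excess" kurtosis that is comparable with the -1 to 1 cut-off). Statistical packages may apply small-sample corrections, so their reported values can differ slightly. The function names, the inclusive treatment of the cut-offs and the data values are hypothetical and are not taken from the Semester 2 dataset.

```python
def moment_skew_kurt(values):
    """Return (skewness, kurtosis) from central moments divided by n.

    With this definition a Normal distribution has kurtosis 3.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; coefficients are undefined")
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def within_guideline(values):
    """Check the skewness and excess-kurtosis cut-offs (-1 to 1).

    Assumption: the cut-offs are treated as inclusive here.
    """
    skew, kurt = moment_skew_kurt(values)
    excess_kurt = kurt - 3  # subtract 3 so that a Normal distribution scores 0
    return -1 <= skew <= 1 and -1 <= excess_kurt <= 1


# Hypothetical BMI-like values, for illustration only.
example = [18.0, 19.5, 20.0, 21.0, 22.0, 22.5, 23.0, 24.5, 26.0]
skew, kurt = moment_skew_kurt(example)
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt - 3:.2f}")
print("within cut-offs:", within_guideline(example))
```

The coefficients are only two of the five measures listed; the histogram, boxplot and Q-Q plot still need to be inspected before deciding whether a variable is approximately Normal.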

a. (Q3 on Blackboard) Do you think transformation is required for variable BMIT? (1 mark)
i. Yes, the natural logarithm of the variable should be taken, and the assessment needs to be carried out in full to assess the Normality of the newly transformed variable.
ii. Yes, variable BMIT has a Normal distribution and should be transformed to ensure the distribution remains Normal.
iii. No, variable BMIT already has a Normal distribution.
iv. No, skewed variables (including variable BMIT) should never be transformed.

b. (Q4 on Blackboard) As you know the distribution of variable BMIT, what should be the most appropriate measures of centrality and variability to report for variable BMIT? *Hint: Different measures of centrality and variability need to be reported for data that display a Normal or a skewed distribution.* (1 mark)
i. Mean and standard deviation. The reason is that variable BMIT has a normal (symmetric) distribution.
ii. Median and interquartile range. The reason is that variable BMIT does not have a normal distribution but a skewed distribution.

c. (Q5 on Blackboard) A practical interpretation for the dietician about the measure of variability of variable BMIT within this sample, as referred to by the 68-95-99.7% rule, would be: _______ (1 mark)
i. The range of BMI of approximately 95% of the teenagers in this sample lies between 15.06 and 33.37 kg/m2.
ii. The range of BMI of approximately 95% of the teenagers in this sample lies between 23.06 and 25.37 kg/m2.
iii. The range of BMI of approximately 99% of the teenagers in this sample is 20.41 kg/m2.
iv. The average BMI of the teenager should be between 15.06 and 33.37 kg/m2 in this population, as estimated with 95% confidence.
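
As a reminder of the rule referred to in part (c): for an approximately Normal variable, about 68%, 95% and 99.7% of observations lie within 1, 2 and 3 standard deviations of the mean respectively. The sketch below only shows the arithmetic. The function name and the mean and standard deviation used are hypothetical round numbers, not values from the Semester 2 dataset.

```python
def empirical_rule_ranges(mean, sd):
    """Return the approximate 68%, 95% and 99.7% ranges for a Normal variable."""
    return {
        "68%": (mean - 1 * sd, mean + 1 * sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }


# Hypothetical summary statistics, for illustration only.
ranges = empirical_rule_ranges(mean=50.0, sd=10.0)
for coverage, (low, high) in ranges.items():
    print(f"about {coverage} of observations lie between {low} and {high}")
```

These ranges describe where individual observations in the sample fall. They are not confidence intervals for the population mean, which depend on the standard error rather than the standard deviation.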

4. The researcher now wants to investigate the levels of energy intake of the boys and girls who spent various amounts of time playing sports per day. You need to first recode the variables ENERGY and TIME1 as follows: *Hint: Give the recoded variables new names and remember to assign value labels to the new recoded variables.*
ENERGY: Values for ENERGY to be recoded into the following levels      Code
Less than 4500 kJ (< 4500 kJ)                                            1
Equal to or more than 4500 kJ but less than 5000 kJ (4500 - 5000 kJ)     2
More than 5000 kJ (> 5000 kJ)                                            3
TIME1: Values for TIME1 to be recoded into the following levels         Code
Less than 45 minutes (< 45 min)                                          1
Equal to or more than 45 minutes (>= 45 min)                             2
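
The recoding rules above can be written out as explicit cut-offs. The sketch below is only an illustration of the logic in Python; the assessment itself must be carried out in STATA (or SPSS). The function names, label dictionaries and sample values are hypothetical. Note that the recoding table does not say which level an intake of exactly 5000 kJ belongs to; this sketch assumes it is grouped with level 3.

```python
def recode_energy(kj):
    """Return the level code (1, 2 or 3) for a daily energy intake in kJ."""
    if kj < 4500:
        return 1
    if kj < 5000:
        return 2
    # Assumption: exactly 5000 kJ is grouped with "more than 5000 kJ".
    return 3


def recode_time(minutes):
    """Return the level code (1 or 2) for minutes of sport per day."""
    return 1 if minutes < 45 else 2


# Value labels for the new recoded variables (names are hypothetical).
ENERGY_LABELS = {1: "< 4500 kJ", 2: "4500 - 5000 kJ", 3: "> 5000 kJ"}
TIME_LABELS = {1: "< 45 min", 2: ">= 45 min"}

# Hypothetical observations, chosen to sit on or near the cut-offs.
sample_energy = [4200.0, 4500.0, 4999.9, 5200.0]
sample_time = [30, 45, 44, 90]

energy_levels = [recode_energy(v) for v in sample_energy]
time_levels = [recode_time(v) for v in sample_time]

for kj, code in zip(sample_energy, energy_levels):
    print(f"{kj} kJ -> level {code} ({ENERGY_LABELS[code]})")
for mins, code in zip(sample_time, time_levels):
    print(f"{mins} min -> level {code} ({TIME_LABELS[code]})")
```

Keeping the original variables and storing the codes under new names, as the hint suggests, means the continuous values remain available for later questions.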

a. (Q6 on Blackboard) Obtain a cross-tabulation consisting of the appropriate statistics for energy intake and time the children played sports. *Hint: cross-tabulation is for categorical variables.* Which of the following statement(s) is/are appropriate to describe the levels of energy intake between the children who spent less than 45 minutes and those who spent equal to or more than 45 minutes playing sports? (1 mark)
i. Most of the girls (53% of them) spent less than 45 minutes playing sports while fewer of the boys (45% of them) spent less than 45 minutes playing sports.
ii. 32% of the children who consumed between 4500 and 5000 kJ per day spent less than 45 minutes playing sports.
iii. There is not much difference between the percentage of children who spent less than 45 minutes playing sports (54%) and the percentage who spent more than 45 minutes playing sports (46%).
iv. Of the children who consumed more than 5000 kJ, more of them also tend to spend more than 45 minutes playing sports (60%).

b. (Q7 on Blackboard) Assuming the assumptions are met, how would you test if there is any association between the levels of energy intake of children and the levels of time they spent playing sports? (1 mark)
i. Use Pearson Correlation Coefficient, with significance level set at 5% level.
ii. Use Chi-square test, with significance level set at 5% level.
iii. Use an independent samples t-test, with significance level set at 5% level.
iv. None of the above is suitable for this research hypothesis.

c. (Q8 on Blackboard) What can you conclude about the relationship between levels of energy intake and levels of time the children played sports? (1 mark)
i. The chi-square statistic is 12.12 with a p-value of less than 0.05. Assuming the assumptions are met, it can be concluded that there is an association between levels of energy intake and levels of time the children played sports.
ii. The p-value from the test is 0.02. Assuming the assumptions are met, it can be concluded that there is no association between levels of energy intake and levels of time the children played sports when the significance level is set at 5%.
iii. The p-value of the t-test was found to be 0.88. Assuming the assumptions (including the Levene’s test) are met, it can be concluded that the energy intake between those who played less than 45 minutes of sports is not significantly different from those who played more than 45 minutes of sports.
iv. None of the above.


5. Confidence intervals (CI) are used to estimate the population parameters as it is impossible to reach everyone in the population.
a. (Q9 on Blackboard) Which of the following statements is correct about the estimation of the average time the population of teenagers spent playing sports? (1 mark)
i. The average time the population of teenagers spent playing sports is estimated to be between 19.7 and 24.8 minutes.
ii. We are 95% confident that the mean time the children spent playing sports lies between 19.7 and 24.8 minutes in this population.
iii. The higher the confidence levels (e.g. from 90% to 95% to 99%), the more confident we are about capturing the actual population parameter and therefore the corresponding lengths of the CIs tend to be shorter.
iv. None of the above is correct.

b. (Q10 on Blackboard) If the sample size of this study increased from 63 to 630, we will expect: ______ (1 mark)
i. The range of values that are captured within the 90% CI, 95% CI and 99% CI to become shorter as we can be more confident about our estimation now with larger sample size.
ii. The length of the 95% CI to remain the same but the 95% CI is now a more reliable estimation than the 99% CI as the larger sample size warrants a higher level of precision.
iii. The length of the 99% CI to be shorter and be more precise than when the sample size was 63.
iv. Statements ‘i’ and ‘iii’ are both correct.


6. The researcher wants to test a research hypothesis that the mean body mass index (BMI) of the teenagers in this population is 22 kg/m2.
a. (Q11 on Blackboard) The correct hypotheses statement(s) for this research objective would be: ___ (1 mark)
i. Ho: μ =22 years old.
ii. Ho: μ =22 kg/m2, H1(or Ha): μ ≠ 22 kg/m2
iii. Null hypothesis: the mean BMI of the teenagers is 22 kg/m2; Alternative hypothesis: the mean BMI of the teenager is not 22 kg/m2.
iv. Null hypothesis: the population mean BMI of the teenagers is 22 kg/m2; Alternative hypothesis: the population mean BMI of the teenagers is not 22 kg/m2.
v. Statements ‘ii’ and ‘iii’ are both correct.
vi. Statements ‘ii’ and ‘iv’ are both correct.

b. (Q12 on Blackboard) The appropriate statistical test to test that hypothesis would be ____________ ; and the results are found to be: ___(1 mark)
i. One sample t-test with 5% level of significance; t-value = 3.84, p-value < 0.001.
ii. Two samples (independent samples) t-test with 5% level of significance; t-value = 3.84, p-value < 0.001.
iii. Paired-sample t-test with the ‘alpha’ set at 5%; t-value = 0.64, p-value = 0.527.
iv. Pearson correlation coefficient with 5% level of significance; r = 0.08, p-value = 0.520.
v. One-way ANOVA with 5% level of significance; t-value = -5.57, p-value < 0.001.
vi. Chi-square test with 5% level of significance; χ2 = 3906, p-value = 0.239.

c. (Q13 on Blackboard) An appropriate conclusion about the research hypothesis would therefore be: ____ (1 mark)
i. There is no significant mean difference between the population BMI of the teenagers and the test value, 22 kg/m2, as the p-value is close to zero.
ii. In this population, it is estimated that the mean BMI of the teenagers is 2.21571 kg/m2 less than the hypothesized 22 kg/m2, and therefore the null hypothesis has to be rejected (p<0.05).
iii. The p-value is not much different from the set level of significance. In addition, the 95% confidence interval of the difference does not include the hypothetical value ‘22’, therefore supporting the decision to accept the null hypothesis and conclude that the population mean BMI of the teenagers is 22 kg/m2.
iv. In this population, it is estimated that the mean BMI of the teenagers is significantly 2.21571 kg/m2 higher than the hypothesized 22 kg/m2. In addition, the estimated 95% confidence interval does not include the hypothetical value ‘22’ and hence the null hypothesis has to be rejected (p<0.05).


7. The researcher now wishes to test the hypothesis that the population mean fibre intake is the same for the boys and girls. You will test the hypothesis by following the steps of hypothesis testing.
a. (Q14 on Blackboard) State the hypotheses. (2 marks)

b. (Q15 on Blackboard) State which statistical test, and the level of significance (α) you plan to use to test the hypotheses stated above. (1 mark)

c. (Q16 on Blackboard) In addition to ‘random sampling’ and ‘independent observations’, state the other two assumptions for the statistical test you nominated earlier. You need to test if the stated assumptions are met and provide evidence. You need to note some biostatistical remedy, where applicable, if the assumptions were not met. Complete the table below with the information required. (2 marks)
Assumptions    Evidence of assumptions being met    Biostatistical remedy if assumptions are not met (where applicable)

d. (Q17 on Blackboard) After you run the statistical analyses for the research hypothesis stated above, what can you conclude about it? (1 mark)
i. The test statistic is 2.10, the p-value is 0.04, the 95% CI of the difference is (0.02, 0.93) and does not include ‘0’, suggesting that we have to reject the null hypothesis and conclude that the population mean fibre intake is different between the boys and the girls.
ii. The test statistic is 2.54, the p-value is 0.01, the 95% CI of the difference is (0.90, 7.58) and does not include ‘0’, suggesting that we have to reject the null hypothesis and conclude that the population mean fibre intake is different between the boys and the girls.
iii. The test statistic is 2.44, the p-value is 0.02, the 95% CI of the difference is (0.77, 7.71) and does not include ‘0’, suggesting that we have to reject the null hypothesis and conclude that the population mean fibre intake is different between the boys and the girls.
iv. The test statistic is 2.10, the p-value is 0.98, the 95% CI of the difference is (0.02, 0.93) and does not include ‘0’, suggesting that we have to accept the null hypothesis and conclude that the population mean fibre intake is the same between the boys and the girls.
v. None of the above is correct.


8. The researcher wants to know if the average time the children played sports (in minutes) is the same as the average time they spent playing sports (in minutes) when they became teenagers.
a. (Q18 on Blackboard) State the hypotheses. (2 marks)

b. (Q19 on Blackboard) State which statistical test, and the level of significance (α) you plan to use. (1 mark)

c. (Q20 on Blackboard) Assuming the assumptions for the statistical test you chose to do are met, what can you conclude about the research hypothesis after you run the statistical analyses? (1 mark)
i. It is found that the r-value is -0.27, the p-value is 0.03, suggesting that the time the children spent playing sports is only mildly related to the time the teenagers spent playing sports.
ii. The mean difference between the time the children played sports and the time the teenagers played sports is 22.16 minutes. The t-value is 14.07, p-value is <0.001, 95% CI of the difference is (19.0, 25.3) minutes and does not include ‘0’, suggesting that, in this population, there is a significant difference between the time the children spent playing sports and the time they played sports when they became teenagers on average.
iii. The mean difference between the time the teenagers played sports and the time they played sports while they were children is -22.16 minutes. The t-value is -14.07, p-value is <0.001, 95% CI of the difference is (-25.3, -19.0) minutes and does not include ‘0’, suggesting that, in this population, there is a significant difference between the time the children spent playing sports and the time they played sports when they became teenagers.
iv. Only statement ‘i’ is incorrect.


9. The researcher wishes to know if BMI of the teenagers is related to the mean energy intake amongst the children in the population.
a. (Q21 on Blackboard) Assuming all the assumptions are met, the appropriate statistical analysis would be ____________ (1 mark)
i. One sample t-test with 5% level of significance.
ii. Two samples (independent samples) t-test with 5% level of significance.
iii. Paired-sample t-test with the ‘alpha’ set at 5%.
iv. Pearson’s correlation coefficient with 5% level of significance.
v. One-way ANOVA with 5% level of significance.
vi. Chi-square test with 5% level of significance.

b. (Q22 on Blackboard) Based on the analyses you conducted, is there any relationship between BMI of the teenagers and the mean energy intake amongst the children in the population? (1 mark)
i. Yes, the p-value is larger than 0.05 from the one sample t-test so we can conclude that there is a relationship between energy intake and BMI of the teenagers at the population level.
ii. The p-value of the independent samples t-test is in agreement with the 95% CI of population mean difference (‘0’ is included in the 95% CI), suggesting that there is no relationship between energy intake and BMI of the teenagers at the population level.
iii. Yes, the p-value from the paired-sample t-test is p<0.001, suggesting that there is a significant relationship between energy intake and the BMI of the children and the teenagers at the population level.
iv. The correlation coefficient (-0.09) indicates that there is a weak negative linear relationship between the children’s energy intake and BMI of the teenagers, suggesting that there is no significant linear relationship between energy intake and the BMI of the teenagers in this population (p = 0.506).
v. The p-value of the one-way ANOVA test is 0.94, suggesting that there is no significant relationship between children’s energy intake and BMI of the teenagers in this population.
vi. The chi-square statistic is 126 with a p-value of 0.43, suggesting that there is no relationship between energy intake and BMI of the teenagers in this population.


10. The researcher wishes to test if BMI of the children varies significantly across the three energy levels they consumed at the population level.
a. (Q23 on Blackboard) The appropriate test to use would be ___________ (1 mark)
i. One sample t-test with 5% level of significance.
ii. Two samples (independent samples) t-test with 5% level of significance.
iii. Paired-sample t-test with the ‘alpha’ set at 5%.
iv. Pearson correlation coefficient with 5% level of significance.
v. One-way ANOVA with 5% level of significance.
vi. Chi-square test with 5% level of significance.

b. (Q24 on Blackboard) What can you conclude about this research hypothesis? (1 mark)
i. The t-statistic is 109.03, p-value is <0.001, the 95% CI is (4841.67, 5022.52) kJ and does not include ‘0’, suggesting that in this population, there is a significant difference between the BMI of the children and their daily energy intake.
ii. The p-value of the multiple comparison groups is larger than 0.05, suggesting that the null hypothesis should be accepted.
iii. The t-statistic is 108.64, p-value is <0.001, the 95% CI is (4821.45, 5002.20) kJ and does not include ‘0’, suggesting that in this population, there is a significant difference between the BMI of the children and their daily energy intake.
iv. The r-value is 0.06, p-value is 0.640, suggesting that there is no strong variation between the BMI of the children in this population and their daily energy intake.
v. The F test-statistic is 1.08, p-value is 0.347, suggesting that there are no significant population mean BMI differences across the groups of children who had different levels of energy intake.
vi. The χ2 is 124, p-value = 0.433, suggesting that there are no significant differences between the levels of energy intake and the BMI of the children in this population.


11. The researcher wants to know, at population level, if the time the children spent playing sports is related to the area where they lived when they were children. Note that there are two different statistical tests that you can use to test this hypothesis. You therefore need to decide if you wish to use the outcome variable as a continuous variable or as the recoded categorical variable. You can assume that the assumptions for the test you chose to run are met.
a. (Q25 on Blackboard) As you state the hypotheses, be specific about the variables you wish to test (1 mark).

b. (Q26 on Blackboard) State one statistical test you plan to carry out to test your stated hypothesis (0.5 mark) and the significance level (0.5 mark).

c. (Q27 on Blackboard) Justify your choice (1 mark) of statistical test and significance level you planned to use.

d. (Q28 on Blackboard) State the conclusion based on your output (1 mark).


12. Lastly, the researcher wants to test if, at population level, the BMI (kg/m2) of the teenagers can be predicted by the time these teenagers spent playing sports (in minutes), the time they spent playing sports as children (in minutes), and their BMI (kg/m2) when they were children. *Hint: you only need to consider one of the independent variables that is significant.* You can assume that the assumptions for the statistical approach you chose to undertake are met.
a. (Q29 on Blackboard) You will need to state the statistical approach you plan to carry out to answer the stated research question, including the significance level (1 mark).

b. (Q30 on Blackboard) Justify your choice (1 mark) of statistical approach and significance level you planned to use.

c. (Q31 on Blackboard) State the regression equation that the researcher can use to predict mean values of BMI of the teenagers (1 mark) and interpret the estimated regression coefficient (1 mark).

d. (Q32 on Blackboard) Comment on the fitness of the regression model (1 mark).

You need to provide outputs you generated from STATA (or SPSS) as evidence to support your given answers. The outputs for Questions 2, 3a, 4a, 4c, 5, 6b, 6c, 7c, 7d, 8c, 9b, 10b, 11d, 12c and 12d (Questions 2, 3, 6, 8, 9, 12, 13, 16, 17, 20, 22, 24, 28, 31 and 32 on Blackboard) should be provided in a Word document and uploaded to Blackboard via the submission link as you answer the last question (Question 33 on Blackboard). Marks will be deducted from your total score (0.5 mark for each missing output) if you do not provide the outputs.
