Assignment Task

1. Snakes and Ladders

Produce a simulation of a game of snakes and ladders based on the board configuration shown below.

2. Commenting Code

Study the following code example, which simulates movement around the board, and add comments to describe what each line does.

Extend the code above to incorporate the effect of the snakes and ladders. Fix any potential issues with the code and simulate the game for 1000 turns. Plot the position of the player at each turn. Compute the position at which the player exceeds position 60 on the board.
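As a rough starting point, the extension described above might be sketched in R as follows. The snake and ladder squares in `jumps` are hypothetical placeholders (the actual board is not reproduced here), the helper name `move` is invented for illustration, and the rule that an overshoot past square 100 forfeits the move is an assumption, not something the brief specifies.

```r
# Hypothetical jumps: names are start squares, values are destination squares.
# These are NOT the assignment board; substitute the squares from your own board.
jumps <- c("4" = 14, "9" = 31, "28" = 84,   # ladders (move up)
           "17" = 7, "62" = 19, "87" = 24)  # snakes (move down)

# Advance one turn: add the roll, stay put on an overshoot, then apply any jump.
move <- function(pos, roll, jumps, last_square = 100) {
  new_pos <- pos + roll
  if (new_pos > last_square) {
    return(pos)  # assumed rule: overshooting the last square forfeits the move
  }
  key <- as.character(new_pos)
  if (key %in% names(jumps)) {
    new_pos <- jumps[[key]]  # landed on the foot of a ladder or head of a snake
  }
  new_pos
}

set.seed(1)                  # fixed seed so the run is reproducible
n_turns <- 1000
position <- numeric(n_turns) # position recorded after each turn
pos <- 0                     # start off the board
for (turn in seq_len(n_turns)) {
  roll <- sample(1:6, 1)     # fair six-sided die
  pos <- move(pos, roll, jumps)
  position[turn] <- pos
}

# Position after each turn, and the first turn on which the player is past 60.
plot(position, type = "l", xlab = "Turn", ylab = "Position")
first_over_60 <- which(position > 60)[1]
```

Under these assumed rules the player stays on square 100 once it is reached, because every later roll overshoots. A different end-of-board rule (for example, wrapping around) would change the shape of the plot.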

Using your code from Q2 or otherwise, repeat the game for a single player and work out how many turns it takes to reach the 100th square. Repeat this simulation 1000 times and plot the distribution of the number of turns taken to reach the 100th square. Briefly explain what you observe in the distribution.
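One possible shape for the repeated simulation is sketched below. It is self-contained and again relies on a hypothetical set of snakes and ladders in `jumps`, an invented helper name `turns_to_finish`, and the assumption that the player must land exactly on square 100 to finish.

```r
# Hypothetical board (placeholder squares, not the assignment board).
jumps <- c("4" = 14, "9" = 31, "28" = 84,   # ladders
           "17" = 7, "62" = 19, "87" = 24)  # snakes

# Play one game and return the number of turns needed to reach the last square.
turns_to_finish <- function(jumps, last_square = 100, max_turns = 10000) {
  pos <- 0
  for (turn in seq_len(max_turns)) {
    new_pos <- pos + sample(1:6, 1)
    if (new_pos <= last_square) {            # assumed: an overshoot forfeits the move
      key <- as.character(new_pos)
      pos <- if (key %in% names(jumps)) jumps[[key]] else new_pos
    }
    if (pos == last_square) {
      return(turn)
    }
  }
  NA_integer_  # game did not finish within max_turns
}

set.seed(42)
turns <- replicate(1000, turns_to_finish(jumps))

# Distribution of game lengths across the 1000 simulated games.
hist(turns, breaks = 30, xlab = "Turns to reach square 100",
     main = "Game length over 1000 simulated games")
summary(turns)
```

Comparing the mean and median reported by `summary(turns)` is one simple way to start describing the skewness of the distribution.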

Data Analysis

For this part of the coursework you will need the data file (assess_data_resit-24.Rdata); instructions to download the file can be found on the course Canvas page. This file contains the following objects:

  • Gene expression matrix with 3,465 genes (along the rows) and 30 samples (along the columns) from 15 tumour-normal tissue pairs.

You will perform the analysis in two steps. First, you will inspect the data visually to identify any problems. You will then carry out a regression-based differential expression analysis to identify genes that are differentially expressed between tumour and normal tissue.

Study the following code example and add comments to describe what it does:

Data Exploration

There is one problematic sample in the data. Identify the problematic sample and explain why it is problematic, providing a plot to support your answer. You could consider using summary statistics of the data to identify the problematic sample.
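One statistic that could be tried is the average correlation of each sample with all the others. The sketch below illustrates the idea on simulated counts only. The matrix `counts`, the sample names, and the deliberately corrupted sample are all made up for illustration and say nothing about which sample is problematic in the coursework data.

```r
set.seed(7)
n_genes <- 500
n_samples <- 30

# Simulated expression: each gene has its own mean, shared by all samples.
gene_means <- rgamma(n_genes, shape = 2, scale = 50)
counts <- matrix(rpois(n_genes * n_samples, lambda = rep(gene_means, n_samples)),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", seq_len(n_samples))))

# Corrupt one sample (illustration only) by shuffling the gene means.
counts[, 12] <- rpois(n_genes, lambda = sample(gene_means))

# Mean correlation of each sample with every other sample, on a log scale.
log_counts <- log2(counts + 1)
cors <- cor(log_counts)
diag(cors) <- NA                      # ignore each sample's correlation with itself
mean_cor <- rowMeans(cors, na.rm = TRUE)

# The sample with the lowest mean correlation is the candidate outlier.
suspect <- names(which.min(mean_cor))
barplot(mean_cor, las = 2, ylab = "Mean correlation with other samples")
```

Other summaries, such as per-sample totals, boxplots of log expression, or a principal component plot, may reveal different kinds of problems, so it is worth checking more than one.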

Expression Analysis

Using the remaining sample pairs, we will perform a regression-based differential expression analysis. We will use the glm function to perform the regression (glm is part of R's stats package; the MASS package provides the related glm.nb function for negative binomial models).

Using the code from Q4, perform a regression-based differential expression analysis between all normal and tumour samples using Poisson regression. Use the tissue type as the only covariate. Plot the appropriate p-value from your analysis.
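A gene-by-gene Poisson regression with tissue type as the only covariate might look roughly like the sketch below. It runs on simulated counts, not the coursework data. The objects `counts` and `tissue`, the alternating normal/tumour sample order, and the five genes given a two-fold change are all assumptions made for illustration.

```r
set.seed(3)
n_genes <- 50

# Assumed design: 15 normal and 15 tumour samples, alternating by column.
tissue <- factor(rep(c("normal", "tumour"), times = 15),
                 levels = c("normal", "tumour"))

# Simulated counts: the first five genes are doubled in tumour samples.
base <- runif(n_genes, 20, 200)
fold <- rep(1, n_genes)
fold[1:5] <- 2
counts <- t(sapply(seq_len(n_genes), function(g) {
  rpois(length(tissue), lambda = base[g] * ifelse(tissue == "tumour", fold[g], 1))
}))

# Fit one Poisson GLM per gene and keep the p-value of the tissue coefficient.
p_values <- apply(counts, 1, function(y) {
  fit <- glm(y ~ tissue, family = poisson)
  coef(summary(fit))["tissuetumour", "Pr(>|z|)"]
})

# Plot on a -log10 scale so that small p-values stand out.
plot(-log10(p_values), xlab = "Gene index", ylab = "-log10(p-value)")
```

With several thousand genes tested, a multiple-testing adjustment such as `p.adjust(p_values, method = "BH")` is commonly applied before deciding which genes to report.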

Perform a regression-based analysis to identify genes differentially expressed between normal and tumour samples, including the tissue variable that indicates whether each sample is tumour or normal. Plot the appropriate log value from your analysis. Compare the p-values with and without inclusion of the tissue type as a covariate. What do you observe? Which of the covariates has the biggest effect? Explain your answer with supporting plots, tables and further analysis if required.