R_intro_9

Karel Fišer
2017

analysis with R

problem –> { paper/brain/console } –> script –> …

  • Write your code into scripts!
  • Scripts are plain text files ending with .R
  • It is usually better to have a script even for small tasks.

scripts

  • Scripts are plain text files storing your code.

# comments

  • comment why, (what), what is not working, TODO
  • comment to organise code into logical blocks (chunks)

  • RStudio: a comment line ending with at least four dashes (-), equal signs (=), or hashes (#) creates a code section.

  • RStudio: Ctrl + Shift + C comments or uncomments the selected lines

# ---- random numbers ---------------
x <- rnorm(10)
y <- rnorm(10) # inline comment

coding considerations

  • Try to avoid long one-liners.
  • Try to avoid “obscure” functions, tricks and workarounds.
  • Use base R where you can*.
  • Do not reinvent the wheel, but do not search for a solution longer than it would take to write it yourself.
  • Write with (consistent) style.




* However consider e.g. tidyverse for its consistency
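
A small illustration of the first point, as a sketch only: the same result computed once as a nested one-liner and once in named steps. The variable names are made up for this example.

```r
set.seed(1)                      # make the random numbers reproducible
x <- rnorm(10)

# Hard to read: everything nested in one line
res_oneliner <- round(mean(abs(x[x > median(x)])), 2)

# Easier to read and debug: one step per line
above_median <- x[x > median(x)]
abs_values   <- abs(above_median)
res_steps    <- round(mean(abs_values), 2)

# Both versions give the same number
identical(res_oneliner, res_steps)
```

The stepwise version lets you inspect each intermediate object in the console when something goes wrong.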

analysis with R

  • input –> script –> output


  • input is data (tables, binary, url)
  • output is tables, images, whole analysis reports


  • You want to be able to “run” the analysis the same way at any time with any data on input
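
The input –> script –> output idea could look roughly like the sketch below. The file names are assumptions (temporary files are used so that the example runs anywhere), and the built-in iris data stands in for real input.

```r
# ---- input ---------------------------------
# Hypothetical input: a temporary CSV written from the built-in iris data
input_file <- tempfile(fileext = ".csv")
write.csv(iris, input_file, row.names = FALSE)

# ---- analysis ------------------------------
dat <- read.csv(input_file)
species_means <- aggregate(Sepal.Length ~ Species, data = dat, FUN = mean)

# ---- output --------------------------------
output_file <- tempfile(fileext = ".tsv")
write.table(species_means, output_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Saved as a script (say analysis.R, a hypothetical name), this can be rerun at any time with source("analysis.R") or from the shell with Rscript analysis.R.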

Your turn

  • make a small, structured, commented script which takes input and generates output
  • “source” the script

outputs, reports

  • individual images (png(), svg()) and text/tables (write.table())
    vs
  • full report (html, pdf)
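
A minimal sketch of the "individual files" route, assuming the output goes to a temporary directory and that your R build has PNG support. The file names are invented for the example.

```r
out_dir <- tempdir()   # assumption: write to a temporary directory

# ---- image output --------------------------
png_file <- file.path(out_dir, "sepal_scatter.png")
png(png_file, width = 600, height = 400)
plot(iris$Sepal.Length, iris$Sepal.Width,
     xlab = "Sepal length", ylab = "Sepal width")
dev.off()              # close the device so the file is written

# ---- table output --------------------------
table_file <- file.path(out_dir, "iris_head.txt")
write.table(head(iris), table_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

svg() and pdf() are used the same way as png(): open the device, plot, then close it with dev.off().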


  • R + LaTeX –> pdf (Sweave/knitr)
  • R + markdown –> html (knitr)
  • Markdown was designed for HTML, and LaTeX was for PDF
  • pandoc converts between them all

## Markdown

Markdown is plain text with simple formatting syntax.

For example:
- - for lists
- * for **emphasis**
- # for headers

[Links](goo.gl/5zczVk) go into parentheses.

Rmarkdown

  • Rmarkdown = R + markdown
  • requires: rmarkdown (–> knitr + pandoc)
  • *.Rmd –> *.md –> *.html
  • backtick (grave accent): `
  • code in chunks or inline

## R Markdown

R Markdown also includes code and its output!

```{r}
plot(iris$Sepal.Length, iris$Sepal.Width)
```

R Markdown (rendered)

[figure: plot of chunk unnamed-chunk-2, output of plot(iris$Sepal.Length, iris$Sepal.Width)]
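
To try the *.Rmd –> *.html route from the console, something like the following sketch should work. It writes a tiny document with one chunk (the same plot as above) to a temporary file and renders it only if rmarkdown and pandoc are found. The document content and the chunk name are made up.

```r
# Write a tiny R Markdown document to a temporary file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Minimal example\"",
  "output: html_document",
  "---",
  "",
  "Some *Markdown* text with inline code: `r nrow(iris)` rows.",
  "",
  "```{r scatter, echo=FALSE}",
  "plot(iris$Sepal.Length, iris$Sepal.Width)",
  "```"
), rmd_file)

# Knit and convert only if rmarkdown and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
}
```

In RStudio the Knit button does the same job for the file open in the editor.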

Rmarkdown

Chunk header: ```{r chunk_name, option1=TRUE}

~60 chunk options, e.g.:

option          code run   code shown   output shown
eval=FALSE      no         YES          no
include=FALSE   YES        no           no
echo=FALSE      YES        no           YES

  • global chunk options are set with knitr::opts_chunk$set(echo=FALSE, message=FALSE, warning=FALSE)

yaml

  • YAML Ain't Markup Language
  • yaml front matter for document global options

---
title: "Best R analysis ever"
date: "24.12.2200"
author: "Me & Co."
output: html_document
---


markdown 3, markdown 4

Your turn

  • turn your script into Rmarkdown

beyond full report

  • shiny turns analysis into interactive web application.
  • interactive plotting (e.g. googleVis)

  • presentations (e.g. slidify)
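
To give an idea of what "interactive web application" means for shiny, here is a minimal sketch of an app with one slider and one histogram. The input and output names ("bins", "hist") are arbitrary, and the app is only built if shiny is installed.

```r
# UI and server are only built if shiny is installed (assumption: it may not be)
if (requireNamespace("shiny", quietly = TRUE)) {
  # user interface: one slider, one plot
  ui <- shiny::fluidPage(
    shiny::sliderInput("bins", "Approximate number of bins:",
                       min = 5, max = 50, value = 20),
    shiny::plotOutput("hist")
  )

  # server: redraws the histogram whenever the slider changes
  server <- function(input, output) {
    output$hist <- shiny::renderPlot({
      hist(iris$Sepal.Length, breaks = input$bins,
           main = "Sepal length", xlab = "cm")
    })
  }

  app <- shiny::shinyApp(ui, server)
  # shiny::runApp(app)  # uncomment to start the app in a browser
}
```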




motivation for interactive reporting

project

  • type and structure (e.g. data analysis project vs software development project)
  • “implementation” (custom folder structure, RStudio project, R package (CRAN or BioC?))
  • documentation
  • version control
  • sharing

data analysis project

  • data_analysis_project
    • data/
    • code/
      • load.R
      • functions.R
      • do.R
    • bench/
    • results/
    • description.txt
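
The folder layout above can be created from R. The sketch below builds it under a temporary directory (an assumption, so nothing is written into your working directory) with empty placeholder files.

```r
# Assumption: the project is created under a temporary directory
project_dir <- file.path(tempdir(), "data_analysis_project")

# Sub-folders from the layout above
sub_dirs <- c("data", "code", "bench", "results")
for (d in sub_dirs) {
  dir.create(file.path(project_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# Empty placeholder files for the scripts and the description
files <- c(file.path("code", c("load.R", "functions.R", "do.R")),
           "description.txt")
file.create(file.path(project_dir, files))

# Show what was created
list.files(project_dir, recursive = TRUE)
```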

software development project

  • software_development_project
    • R/
    • DESCRIPTION





note: software development might be part of a data analysis project. How to combine them?

R package

  • package vs library
  • R/ + DESCRIPTION = essentials
  • + man/, README, …
  • R packages … Hadley Wickham again
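
A sketch of the two essentials, R/ and DESCRIPTION, written from R. The package name "leukotools" and all field values are hypothetical, and the skeleton lives in a temporary directory.

```r
# Assumption: a throwaway package called "leukotools" in a temporary directory
pkg_dir <- file.path(tempdir(), "leukotools")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION: package metadata in "Field: value" (DCF) format
desc <- data.frame(
  Package     = "leukotools",
  Title       = "Hypothetical Helper Functions",
  Version     = "0.0.1",
  Description = "Illustrative package skeleton.",
  License     = "GPL-3",
  stringsAsFactors = FALSE
)
write.dcf(desc, file.path(pkg_dir, "DESCRIPTION"))

# R/: one file with one function
writeLines(
  "countLeuko <- function(x) sum(!is.na(x))",
  file.path(pkg_dir, "R", "countLeuko.R")
)
```

A real package needs more care (documentation in man/, a README, checks), but this is the minimum that the build tools recognise as a package.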



RStudio project

  • options
  • build tools –> package
  • version control system (e.g. Git, hosted e.g. on GitHub)

Your turn

  • start a project

version control

  • travel back in time
  • collaborations
  • multiple simultaneous versions

Atlassian

e.g. sigannotateboxplots

github, bitbucket

  • Git (svn, Mercurial?)
  • Bitbucket has private repositories
  • GitHub has gists (Pastebin with Git) (e.g. kinase consensus)

Your turn

  • start an RStudio project with Git
  • commit a change

functions

  • a function is a name + arguments + body code
  • convert code chunks into functions … When?
  • Why a function:
    • easier to repeat code execution
    • fewer mistakes caused by running code chunks out of order or in a changed state
    • sanity checks
    • no need to change variable names (if they are parameters)
    • DRY (don't repeat yourself)
  • comments (may turn into documentation)
  • code to function example
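
One possible code-to-function conversion, as a sketch: a chunk that standardises a vector is wrapped into a function with a sanity check. The function name scaleValues and the data are invented for the example.

```r
# Before: a code chunk tied to one specific variable
x <- c(4.1, 5.3, NA, 6.0)
x_scaled <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# After: the same logic as a function
# scaleValues: centre a numeric vector to mean 0 and scale it to sd 1, ignoring NAs
scaleValues <- function(values) {
  # sanity checks: numeric input with at least two non-missing values
  stopifnot(is.numeric(values), sum(!is.na(values)) >= 2)
  (values - mean(values, na.rm = TRUE)) / sd(values, na.rm = TRUE)
}

# The function works on any vector without renaming variables
y_scaled <- scaleValues(c(10, 20, 30))
```

Calling scaleValues("a") stops with an error instead of silently producing nonsense, which is the point of the sanity check.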

Miscs

  • TODO: knitcitations

naming:

  • files: R_intro_9.R
  • variables (nouns): leuko_pat_1
  • functions (verbs): countLeuko

    • pdf (pdf() or e.g. from knitted html via pandoc: pandoc -s R_intro_9.html -o R_intro_9.pdf)