Class 6: R Functions
Background
All functions in R have 3 things:
- A name that we use to call the function.
- Input arguments (usually one or more).
- The body: the lines of R code that do the work.
Our first function
Let’s write a silly wee little function called add() to add some numbers (the input arguments):
add <- function(x, y) {
  x + y
}
Now we can use this function:
add(100, 1)
[1] 101
add(c(100, 1, 100), 1)
[1] 101 2 101
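The second call works because R arithmetic is vectorized. `+` adds element by element, and when one vector is shorter, R recycles it to match the longer one. The sketch below shows this behavior. It repeats add() so the block runs on its own.

```r
# add() as defined above, repeated so this block runs on its own
add <- function(x, y) {
  x + y
}

# y = 1 is recycled to c(1, 1, 1)
add(c(100, 1, 100), 1)      # 101 2 101

# c(10, 20) is recycled to c(10, 20, 10, 20)
add(1:4, c(10, 20))         # 11 22 13 24

# Lengths 3 and 2 do not divide evenly: R still recycles,
# but warns that the longer length is not a multiple of the shorter one
res <- add(1:3, c(10, 20))  # 11 22 13, with a warning
```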
Q. What if I give a multiple-element vector to both x and y?
add(x = c(100, 1), y = c(100, 1))
[1] 200 2
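Q. What if I give three inputs to the function?
# Commented out because it fails: add() has no z argument, so R stops with an "unused argument" error
# add(x = c(100, 1), y = 1, z = 1)
Q. What if I give only one input to the add function?
Calling add(100) fails with an error because y is missing and has no default value. One fix is to give y a default: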
addnew <- function(x, y = 1) {
  x + y
}
addnew(x = 100)
[1] 101
addnew(c(100, 1), 100)
[1] 200 101
If we write a function whose input arguments have no default values, then the user must supply them every time they call the function. We can give our input arguments “default” values by setting them equal to some sensible value, e.g. y=1 in the addnew() function.
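Strictly speaking, R evaluates arguments lazily. A missing argument only causes an error when the body actually uses it. Base R's missing() lets a function check whether an argument was supplied. This gives another way to make an argument optional. The sketch below uses a hypothetical add_maybe() function.

```r
# Hypothetical example: y is optional without having a default value
add_maybe <- function(x, y) {
  # missing(y) is TRUE when the caller did not supply y
  if (missing(y)) {
    return(x)
  }
  x + y
}

add_maybe(100)     # 100
add_maybe(100, 1)  # 101
```

For most cases a default such as y=1 is simpler to read. missing() is mainly useful when no sensible default value exists.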
A second function
Let’s try something more interesting: make a sequence-generating tool.
The sample() function can be a useful starting point here:
sample(1:10, size = 4)
[1] 10 7 5 2
Q. Generate 9 random numbers taken from the input vector x=1:10
sample(1:10, size = 9)
[1] 5 9 8 1 3 4 7 2 10
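Q. Generate 12 random numbers taken from the input vector x=1:10
Because we want more numbers (12) than x has elements (10), we must sample with replacement (replace = TRUE). Without it, sample() stops with an error.
sample(1:10, size = 12, replace = TRUE)
 [1] 6 9 10 3 5 7 7 10 7 4 10 7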
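sample() draws at random, so the numbers above will differ each time the code is run. Calling set.seed() first makes the draws reproducible. The seed value 42 below is an arbitrary choice.

```r
# Same seed -> same random draws
set.seed(42)
first <- sample(1:10, size = 12, replace = TRUE)

set.seed(42)
second <- sample(1:10, size = 12, replace = TRUE)

identical(first, second)  # TRUE
```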
Q. Write code using the sample() function that generates a nucleotide sequence of length 6.
sample(x = c("a", "t", "g", "c"), size = 6, replace = TRUE)
[1] "c" "t" "g" "t" "t" "a"
Q. Write a first function generate_dna() that returns a DNA sequence of user-specified length:
generate_dna <- function(length = 6) {
  # Draw `length` bases, with replacement, from the four nucleotides
  sample(c("A", "T", "G", "C"), length, replace = TRUE)
}
Key-Points
Every function in R has fundamentally the same structure, built from 3 things: a name, input(s), and a body.
name <- function(input) {
body
}
Functions can have multiple inputs. These can be required arguments or optional arguments. Optional arguments have a default value set in the function definition.
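Base R can show these parts directly. formals() returns a function's input arguments, including any defaults, and body() returns its code. The check below uses addnew() from above, redefined so the block is self-contained.

```r
# addnew() from above, repeated so this block runs on its own
addnew <- function(x, y = 1) {
  x + y
}

names(formals(addnew))  # "x" "y"  -> the input arguments
formals(addnew)$y       # 1        -> y's default value
body(addnew)            # the body: the code that does the work
```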
Q. Modify and improve our generate_dna() function to return its generated sequence in a more standard format like "AGTAGTA" rather than the vector c("A", "C", "G", "A").
generate_dna <- function(length = 6, fasta = TRUE) {
  # Draw `length` random bases
  ans <- sample(c("A", "T", "G", "C"),
                length,
                replace = TRUE)
  if (fasta) {
    # Join the bases into a single string, e.g. "ACTGGC"
    cat("Single-element vector output\n")
    ans <- paste(ans, collapse = "")
  } else {
    # Keep one base per vector element
    cat("Multi-element vector output\n")
  }
  return(ans)
}
generate_dna(fasta = FALSE)
Multi-element vector output
[1] "G" "G" "A" "T" "A" "G"
generate_dna(fasta = TRUE)
Single-element vector output
[1] "ACTGGC"
The paste() function joins up (a.k.a. pastes) its input strings:
paste("alice", "loves R", sep = " ")
[1] "alice loves R"
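paste() has two separators that are easy to mix up. sep goes between separate arguments. collapse joins the elements of a single vector into one string, which is what generate_dna() relies on. paste0() is shorthand for paste() with sep = "". strsplit() goes the other way, splitting a string back into single characters.

```r
# sep: placed between the separate arguments
paste("alice", "loves R", sep = " ")         # "alice loves R"

# collapse: joins the elements of one vector into a single string
paste(c("A", "C", "T", "G"), collapse = "")  # "ACTG"

# paste0() is paste() with sep = ""
paste0("seq", 1:3)                           # "seq1" "seq2" "seq3"

# strsplit() reverses the collapse; it returns a list, hence [[1]]
strsplit("ACTG", split = "")[[1]]            # "A" "C" "T" "G"
```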
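Flow control decides where the R brain goes in your code, i.e. which lines run and in what order. Here an if/else statement runs one of its two blocks depending on whether good_mood is TRUE or FALSE:
good_mood <- FALSE
if (good_mood) {
  cat("Great!")
} else {
  cat("Bummer!")
}
Bummer!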
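if() expects a single TRUE or FALSE. Since R 4.2.0, giving it a condition longer than one element is an error. Base R's ifelse() is the vectorized counterpart: it makes the same choice for every element of a logical vector. The moods vector below is made-up example input.

```r
# Made-up example input: one mood per day
moods <- c(TRUE, FALSE, TRUE)

# ifelse() picks "Great!" or "Bummer!" element by element
ifelse(moods, "Great!", "Bummer!")  # "Great!" "Bummer!" "Great!"
```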
A protein-generating function
Q. Write a function that generates a user specified length protein sequence.
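generate_protein <- function(length = 6, fasta = TRUE) {
  # A length of 0 means "pick a random length between 6 and 12"
  if (length == 0) {
    length <- sample(6:12, size = 1)
  }
  # One-letter codes for the 20 standard amino acids
  amino_acids <- c("G", "A", "P", "V", "L", "I", "M", "F", "Y", "W",
                   "S", "T", "C", "N", "Q", "K", "R", "H", "D", "E")
  ans <- sample(amino_acids, length, replace = TRUE)
  if (fasta) {
    # Join the residues into a single string, e.g. "IFKPTF"
    ans <- paste(ans, collapse = "")
  }
  return(ans)
}
generate_protein(fasta = TRUE)
[1] "IFKPTF"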
Q. Use that function to generate random protein sequences of length 6 to 12.
generate_protein()
[1] "TSIGSI"
for (i in 6:12) {
  # FASTA ID line, e.g. ">6"
  cat(">", i, "\n", sep = "")
  # Protein sequence line
  cat(generate_protein(i), "\n", sep = "")
}
>6
QTTLMS
>7
NRCGKWQ
>8
CAQDFQDD
>9
MPMGSDKEA
>10
ADIVTVRYAP
>11
GMKQGTTPPCI
>12
EWYISPKCMQCQ
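The loop above prints the FASTA records to the console. The sketch below instead collects them in a character vector and writes them to a file with writeLines(). random_protein() is a hypothetical stand-in for generate_protein() so the block runs on its own. The output goes to a temporary file rather than a real path.

```r
# Hypothetical stand-in for generate_protein(length, fasta = TRUE)
random_protein <- function(length) {
  amino_acids <- c("G", "A", "P", "V", "L", "I", "M", "F", "Y", "W",
                   "S", "T", "C", "N", "Q", "K", "R", "H", "D", "E")
  paste(sample(amino_acids, length, replace = TRUE), collapse = "")
}

seq_lengths <- 6:12
seqs <- vapply(seq_lengths, random_protein, character(1))

# Interleave ID lines (">6", ">7", ...) with their sequences
fasta_lines <- as.vector(rbind(paste0(">", seq_lengths), seqs))

out_file <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, out_file)

# Each record is two lines: an ID line and a sequence line
readLines(out_file)
```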
Q. Are any of your sequences unique, i.e. not found anywhere in nature?
Yes: the sequences of length 9, 10, 11, and 12.
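A rough count suggests why the longer sequences are the ones without matches. With 20 amino acids there are 20^n possible sequences of length n, so each extra residue multiplies the possibilities by 20. This count ignores the actual size and composition of sequence databases, so treat it as intuition rather than a test.

```r
# Number of possible protein sequences of length n (20 amino acids)
n <- 6:12
possible <- 20^n
data.frame(length = n, possible_sequences = possible)
# length 6 gives 6.4e+07 possibilities; length 12 gives about 4.1e+15
```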