Data Structures and Data Types in R

Vectors
A vector is an ordered collection of objects of the same
type. The function c(...) concatenates its arguments to form a vector. To create
a patterned vector, use one of:
:       Sequence of integers
seq()   General sequence
rep()   Vector of replicated elements
> v1 <- c(2.5, 4, 7.3, 0.1)
> v1
[1] 2.5 4.0 7.3 0.1
> v2 <- c("A", "B", "C", "D")
> v2
[1] "A" "B" "C" "D"
> v3 <- -3:3
> v3
[1] -3 -2 -1  0  1  2  3
> seq(0, 2, by=0.5)
[1] 0.0 0.5 1.0 1.5 2.0
> seq(0, 2, length.out=6)
[1] 0.0 0.4 0.8 1.2 1.6 2.0
> rep(1:5, each=2)
 [1] 1 1 2 2 3 3 4 4 5 5
> rep(1:5, times=2)
 [1] 1 2 3 4 5 1 2 3 4 5
Reference Elements of a Vector
Use [ ] with a vector/scalar of positions to reference
elements of a vector. Include a minus sign before the vector/scalar to exclude
those elements. A logical vector inside [ ] selects the elements where the
condition is TRUE.
> x <- c(4, 7, 2, 10, 1, 0)
> x[4]
[1] 10
> x[1:3]
[1] 4 7 2
> x[c(2,5,6)]
[1] 7 1 0
> x[-3]
[1]  4  7 10  1  0
> x[-c(4,5)]
[1] 4 7 2 0
> x[x>4]
[1]  7 10
> x[3] <- 99
> x
[1]  4  7 99 10  1  0
which() and match()
Additional functions that return the indices of a vector:
which()       Indices of a logical vector where the condition is TRUE
which.max()   Location of the (first) maximum element of a numeric vector
which.min()   Location of the (first) minimum element of a numeric vector
match()       First position of an element in a vector
> x <- c(4, 7, 2, 10, 1, 0)
> x>=4
[1]  TRUE  TRUE FALSE  TRUE FALSE FALSE
> which(x>=4)
[1] 1 2 4
> which.max(x)
[1] 4
> x[which.max(x)]
[1] 10
> max(x)
[1] 10
> y <- rep(1:5, times=5:1)
> y
 [1] 1 1 1 1 1 2 2 2 2 3 3 3 4 4 5
> match(1:5, y)
[1]  1  6 10 13 15
> match(unique(y), y)
[1]  1  6 10 13 15
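Two edge cases are worth a quick sketch: which.min() and which.max() report only the first position when there are ties, and match() returns NA for a value that does not occur. The vector scores below is made up for illustration and is not part of the examples above.

```r
# Hypothetical data, chosen so that the minimum occurs twice
scores <- c(3, 9, 1, 9, 1)

first_min <- which.min(scores)               # position of the first minimum only
all_min   <- which(scores == min(scores))    # positions of every minimum
pos       <- match(c(9, 7), scores)          # 7 does not occur in scores

print(first_min)
print(all_min)
print(pos)
```

Combining which() with min() or max() is one way to recover every tied position.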
Vector Operations
When vectors are used in math expressions, the operations
are performed element by element.
> x <- c(4,7,2,10,1,0)
> y <- x^2 + 1
> y
[1]  17  50   5 101   2   1
> x*y
[1]   68  350   10 1010    2    0
Useful Vector Functions
sum(x), prod(x)              Sum/product of the elements of x
cumsum(x), cumprod(x)        Cumulative sum/product of the elements of x
min(x), max(x)               Minimum/maximum element of x
mean(x), median(x)           Mean/median of x
var(x), sd(x)                Variance/standard deviation of x
cov(x,y), cor(x,y)           Covariance/correlation of x and y
range(x)                     Range of x
quantile(x)                  Quantiles of x for the given probabilities
fivenum(x)                   Five number summary of x
length(x)                    Number of elements in x
unique(x)                    Unique elements of x
rev(x)                       Reverse the elements of x
sort(x)                      Sort the elements of x
which()                      Indices of TRUEs in a logical vector
which.max(x), which.min(x)   Index of the max/min element of x
match()                      First position of an element in a vector
union(x, y)                  Union of x and y
intersect(x, y)              Intersection of x and y
setdiff(x, y)                Elements of x that are not in y
setequal(x, y)               Do x and y contain the same elements?
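A short sketch of a few of these functions in use. The vector x is the one from the earlier examples; the second vector y is a made-up assumption used only to illustrate the set functions.

```r
x <- c(4, 7, 2, 10, 1, 0)   # same x as in the earlier examples
y <- c(2, 4, 6, 8)          # hypothetical second vector

cumsum(x)            # running total: 4 11 13 23 24 24
range(x)             # smallest and largest value: 0 10
sort(x)              # 0 1 2 4 7 10
rev(sort(x))         # 10 7 4 2 1 0
union(x, y)          # 4 7 2 10 1 0 6 8
intersect(x, y)      # 4 2
setdiff(x, y)        # 7 10 1 0
setequal(x, rev(x))  # TRUE: same elements, order ignored
```

The set functions treat their arguments as sets, so duplicates are dropped from the result.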
Matrices
A matrix is a two-dimensional generalization of a
vector. To create a matrix, use
matrix(data=NA, nrow=1, ncol=1, byrow=FALSE, dimnames=NULL)
data       A vector that gives data to fill the matrix; if data does not have
           enough elements to fill the matrix, then the elements are recycled
nrow       Desired number of rows
ncol       Desired number of columns
byrow      If FALSE (default) the matrix is filled by columns, otherwise by rows
dimnames   (Optional) list of length 2 giving the row and column names
           respectively; list names will be used as names for the dimensions
> x <- matrix(c(5,0,6,1,3,5,9,5,7,1,5,3), nrow=3, ncol=4, byrow=TRUE,
+             dimnames=list(rows=c("r.1", "r.2", "r.3"),
+                           cols=c("c.1", "c.2", "c.3", "c.4")))
> x
     cols
rows  c.1 c.2 c.3 c.4
  r.1   5   0   6   1
  r.2   3   5   9   5
  r.3   7   1   5   3
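The effect of byrow and of recycling is easiest to see side by side. The following is a minimal sketch; the names m_col, m_row, and m_rec are made up for illustration.

```r
# Default byrow = FALSE: values run down the columns
m_col <- matrix(1:6, nrow = 2, ncol = 3)

# byrow = TRUE: values run across the rows
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

# Recycling: the three values 1:3 are reused to fill all six cells
m_rec <- matrix(1:3, nrow = 2, ncol = 3)

print(m_col)
print(m_row)
print(m_rec)
```

Here m_col has first row 1 3 5, m_row has first row 1 2 3, and m_rec has rows 1 3 2 and 2 1 3.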
Reference Elements of a Matrix
Reference matrix elements using [ ] just like with
vectors, but now with two dimensions.
> x <- matrix(c(5,0,6,1,3,5,9,5,7,1,5,3), nrow=3, ncol=4, byrow=TRUE)
> x
     [,1] [,2] [,3] [,4]
[1,]    5    0    6    1
[2,]    3    5    9    5
[3,]    7    1    5    3
> x[2,3]       # Row 2, Column 3
[1] 9
> x[1,]        # Row 1
[1] 5 0 6 1
> x[,2]        # Column 2
[1] 0 5 1
> x[c(1,3),]   # Rows 1 and 3, all Columns
     [,1] [,2] [,3] [,4]
[1,]    5    0    6    1
[2,]    7    1    5    3
We can also reference parts of a matrix by using the row or
column names. Sometimes it is better to reference a row/column by its name
rather than by its numeric index. For example, if a program adds or permutes the
columns of a matrix, then the numeric index of a column may change. As a
result you might reference the wrong column of the new matrix if you use the old
index number. The name of each column, however, will not change.
Reference matrix elements using [ ], but now use the
column or row name, in quotation marks, in place of the index number. You don't
have to specify the names when you create a matrix. To get or set the column,
row, or both dimension names of A:
colnames(A)
rownames(A)
dimnames(A)
You can also name the elements of a vector, for example
c("name.1"=1, "name.2"=2).
Use the function names() to get or set the names of vector
elements.
> N <- matrix(c(5,8,3,0,4,1), nrow=2, ncol=3, byrow=TRUE)
> colnames(N) <- c("c.1", "c.2", "c.3")
> N
     c.1 c.2 c.3
[1,]   5   8   3
[2,]   0   4   1
> N[,"c.2"]   # Column named "c.2"
[1] 8 4
> colnames(N)
[1] "c.1" "c.2" "c.3"
> M <- diag(2)
> (MN <- cbind(M, N))   # Placing the expression in parentheses will print the result
         c.1 c.2 c.3
[1,] 1 0   5   8   3
[2,] 0 1   0   4   1
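The points about names() and about name-based references surviving a column permutation can be sketched as follows. The vector v and the permuted matrix P are made-up names for illustration; N is rebuilt so the block runs on its own.

```r
# Named vector: get and set the element names with names()
v <- c("name.1" = 1, "name.2" = 2)
names(v)                   # "name.1" "name.2"
names(v) <- c("a", "b")    # replace the names
v["b"]                     # element named "b"

# Name-based references survive a column permutation
N <- matrix(c(5, 8, 3, 0, 4, 1), nrow = 2, ncol = 3, byrow = TRUE)
colnames(N) <- c("c.1", "c.2", "c.3")
P <- N[, c(3, 1, 2)]       # permute the columns
P[, 2]                     # position 2 now holds the old column "c.1": 5 0
P[, "c.2"]                 # still the column named "c.2": 8 4
```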
Matrix Operations
When matrices are used in math expressions, the operations
are performed element by element. For matrix multiplication use the %*%
operator. If a vector is used in matrix multiplication, it will be coerced to
either a row or a column matrix to make the arguments conformable. Using %*% on
two vectors of the same length returns the inner product as a 1 x 1 matrix, not
a scalar (use %o% for the outer product). Use either c() or as.vector() to
convert the result to a scalar.
> A <- matrix(1:4, nrow=2)
> B <- matrix(1, nrow=2, ncol=2)
> A*B
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> A%*%B
     [,1] [,2]
[1,]    4    4
[2,]    6    6
> y <- 1:3
> y%*%y
     [,1]
[1,]   14
> A/(y%*%y)
Error in A/(y %*% y) : non-conformable arrays
> A/c(y%*%y)
           [,1]      [,2]
[1,] 0.07142857 0.2142857
[2,] 0.14285714 0.2857143
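The outer product and the coercion of a vector to a column matrix are not shown above, so here is a small sketch. The 2 x 3 matrix A2 is a made-up assumption, named differently to avoid clashing with A in the example above.

```r
y <- 1:3
inner      <- y %*% y     # 1 x 1 matrix holding 14
outer_prod <- y %o% y     # 3 x 3 matrix with entries y[i] * y[j]

dim(inner)                # 1 1
dim(outer_prod)           # 3 3
as.vector(inner)          # plain number 14

A2 <- matrix(1:6, nrow = 2)   # hypothetical 2 x 3 matrix
Ay <- A2 %*% y                # y is treated as a 3 x 1 column
dim(Ay)                       # 2 1
```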
Useful Matrix Functions
t(A)                         Transpose of A
det(A)                       Determinant of A
solve(A, b)                  Solves the equation Ax=b for x
solve(A)                     Matrix inverse of A
MASS::ginv(A)                Generalized inverse of A (MASS package)
eigen(A)                     Eigenvalues and eigenvectors of A
chol(A)                      Choleski factorization of A
diag(n)                      Create an n x n identity matrix
diag(A)                      Returns the diagonal elements of a matrix A
diag(x)                      Create a diagonal matrix from a vector x
apply()                      Apply a function to the margins of a matrix
rbind(...)                   Combines arguments by rows
cbind(...)                   Combines arguments by columns
dim(A)                       Dimensions of A
nrow(A), ncol(A)             Number of rows/columns of A
dimnames(A)                  Get or set the dimension names of A
lower.tri(A), upper.tri(A)   Matrix of logicals indicating lower/upper
                             triangular matrix
colnames(A), rownames(A)     Get or set the column/row names of A
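A minimal sketch of several of these functions on a small system. The matrix A and right-hand side b are made-up values chosen so the arithmetic is easy to check by hand.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # hypothetical symmetric 2 x 2 matrix
b <- c(1, 2)

t(A)                  # transpose (equals A here, since A is symmetric)
det(A)                # 2*3 - 1*1 = 5
x <- solve(A, b)      # solves A x = b, giving 0.2 and 0.6
A_inv <- solve(A)     # inverse of A
A %*% x               # reproduces b, up to rounding
A %*% A_inv           # identity matrix, up to rounding
diag(A)               # diagonal elements: 2 3
rbind(A, b)           # append b as a third row
```

Calling solve(A, b) directly is generally preferable to computing solve(A) %*% b when only the solution is needed.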
apply()
The apply() function applies a function to the
margins of a matrix, array, or data frame.
apply(X, MARGIN, FUN, ...)
X : A matrix, array, or data frame
MARGIN : Vector of subscripts indicating which margins to apply
the function to: 1=rows, 2=columns, c(1,2)=rows and columns
FUN : Function to be applied
... : Optional arguments for FUN
You can also use your own function (more on this later).
> (x <- matrix(1:12, nrow=3, ncol=4))
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> apply(x, 1, sum)    # Row totals
[1] 22 26 30
> apply(x, 2, mean)   # Column means
[1]  2  5  8 11
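As a sketch of the two remaining pieces of the apply() signature, the example below passes a user-written function as FUN and an extra argument through "...". The names row_spread, col_q90, and col_ranges are made up for illustration.

```r
x <- matrix(1:12, nrow = 3, ncol = 4)

# Own function: spread (max minus min) of each row
row_spread <- apply(x, 1, function(r) max(r) - min(r))

# Arguments after FUN are passed on to it; here probs goes to quantile()
col_q90 <- apply(x, 2, quantile, probs = 0.9)

# When FUN returns a vector, the results are stacked as columns
col_ranges <- apply(x, 2, range)

print(row_spread)
print(col_q90)
print(col_ranges)
```

Note that col_ranges is a 2 x 4 matrix: one column of (min, max) per column of x.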
Arrays
An array is a multi-dimensional generalization of a vector.
To create an array, use
array(data = NA, dim = length(data), dimnames = NULL)
data : A vector that gives data to fill the array; if data does not have
enough elements to fill the array, then the elements are recycled.
dim : Dimensions of the array, a vector of length one or more giving the
maximum index in each dimension.
dimnames : Names of the dimensions, a list with one component for each
dimension, either NULL or a character vector of the length given by dim for
that dimension. The list can be named, and the list names will be used as
names for the dimensions.
Values are entered by columns. As with vectors and
matrices, when arrays are used in math expressions the operations are performed
element by element. Also like vectors and matrices, the elements of an array
must all be of the same type (numeric, character, logical, etc.).
Sample 2 x 3 x 2 array:
> w <- array(1:12, dim=c(2,3,2),
+            dimnames=list(c("A","B"), c("X","Y","Z"), c("N","M")))
Reference Elements of an Array
Reference array elements using [ ] just like with
vectors and matrices, but now with more dimensions.
> w <- array(1:12, dim=c(2,3,2),
+            dimnames=list(c("A","B"), c("X","Y","Z"), c("N","M")))
> w[2,3,1]      # Row 2, Column 3, Matrix 1
[1] 6
> w[,"Y",]      # Column named "Y"
> w[1,,]        # Row 1
> w[1:2,,"M"]   # Rows 1 and 2, Matrix "M"
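To see what shape each of these references returns, a quick sketch using dim() may help. The use of drop = FALSE is an addition not covered in the notes above; it is a standard argument of [ ] that keeps dimensions of length one.

```r
w <- array(1:12, dim = c(2, 3, 2),
           dimnames = list(c("A", "B"), c("X", "Y", "Z"), c("N", "M")))

dim(w)                    # 2 3 2
w["B", "Z", "N"]          # same element as w[2,3,1]: 6
dim(w[, "Y", ])           # 2 2: rows A, B by matrices N, M
dim(w[1, , ])             # 3 2: columns X, Y, Z by matrices N, M
dim(w[1, , , drop = FALSE])   # 1 3 2: the row dimension is kept
```

Fixing one index drops that dimension by default, so the result of w[1,,] is an ordinary matrix.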
Useful Array Functions
apply() : Apply a function to the margins of an array
aperm() : Transpose an array by permuting its dimensions
dim(x) : Dimensions of x
dimnames(x) : Get or set the dimension names of x
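Since aperm() has no example here, the following sketch swaps the first two dimensions of the sample array. The name wt is made up for illustration.

```r
w <- array(1:12, dim = c(2, 3, 2),
           dimnames = list(c("A", "B"), c("X", "Y", "Z"), c("N", "M")))

wt <- aperm(w, c(2, 1, 3))   # new dim 1 = old dim 2, new dim 2 = old dim 1
dim(wt)                      # 3 2 2
dimnames(wt)[[1]]            # "X" "Y" "Z"
wt["Z", "B", "N"]            # same element as w["B", "Z", "N"]
```

With this permutation each slice wt[,,k] is the transpose of w[,,k].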
apply()
We can use the apply() function for more than one dimension.
For a 3-dimensional array there are now three margins to
apply the function
to: 1=rows, 2=columns, and 3=matrices.
> apply(w, 2, sum)        # Column sums
 X  Y  Z 
18 26 34 
> apply(w, c(1,3), sum)   # Row and matrix sums
   N  M
A  9 27
B 12 30