Statistics, Data Analysis, R, SPSS, Minitab, Quality Control, C, C++

Zahoor Ahmad

Data Structures and Data Types in R

Vectors

A vector is an ordered collection of objects of the same type. The function c(...) concatenates its arguments to form a vector. To create a patterned vector, use one of the following:

: Sequence of integers

seq() General sequence

rep() Vector of replicated elements

> v1 <- c(2.5, 4, 7.3, 0.1)

> v1 output [1] 2.5 4.0 7.3 0.1

> v2 <- c("A", "B", "C", "D")

> v2 output [1] "A" "B" "C" "D"

> v3 <- -3:3

> v3 output [1] -3 -2 -1 0 1 2 3

> seq(0, 2, by=0.5) output [1] 0.0 0.5 1.0 1.5 2.0

> seq(0, 2, length.out=6) output [1] 0.0 0.4 0.8 1.2 1.6 2.0

> rep(1:5, each=2) output [1] 1 1 2 2 3 3 4 4 5 5

> rep(1:5, times=2) output [1] 1 2 3 4 5 1 2 3 4 5
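Because all elements of a vector must share one type, c() silently converts mixed arguments to the most general type among them. The following is a small sketch of that behaviour; the variable names mixed_chr and mixed_num are made up for illustration.

```r
# Mixing a number, a string and a logical: everything becomes character.
mixed_chr <- c(1, "a", TRUE)
class(mixed_chr)    # "character"
mixed_chr           # "1" "a" "TRUE"

# Mixing numbers and logicals: TRUE/FALSE become 1/0.
mixed_num <- c(1.5, TRUE, FALSE)
class(mixed_num)    # "numeric"
mixed_num           # 1.5 1.0 0.0
```

If a vector unexpectedly prints with quotation marks, a stray character value in c() is a likely cause.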

Reference Elements of a Vector

Use [ ] with a vector/scalar of positions to reference elements of a vector. Include a minus sign before the vector/scalar to exclude those elements from the result.

> x <- c(4, 7, 2, 10, 1, 0)

> x[4] output [1] 10

> x[1:3] output [1] 4 7 2

> x[c(2,5,6)] output [1] 7 1 0

> x[-3] output [1] 4 7 10 1 0

> x[-c(4,5)] output [1] 4 7 2 0

> x[x>4] output [1] 7 10

> x[3] <- 99

> x output [1] 4 7 99 10 1 0
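As a short illustration of the indexing rules above, the sketch below shows that negative indexing returns a shortened copy and leaves the original alone, and that a logical condition can also be used on the left-hand side of an assignment. The names without_third and keep are hypothetical.

```r
x <- c(4, 7, 2, 10, 1, 0)

# Negative indexing returns a copy without position 3; x itself is unchanged.
without_third <- x[-3]
length(x)             # still 6

# A logical vector of the same length selects the TRUE positions.
keep <- x > 4
x[keep]               # 7 10

# Indexing on the left-hand side replaces only the selected elements.
x[x < 2] <- NA
x                     # 4 7 2 10 NA NA
```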

which() and match()

Additional functions that return indices (positions) within a vector:

which() Indices of a logical vector where the condition is TRUE

which.max() Location of the (First) maximum element of a numeric vector

which.min() Location of the (First) minimum element of a numeric vector

match() First position of an element in a vector

> x <- c(4, 7, 2, 10, 1, 0)

> x>=4 output [1] TRUE TRUE FALSE TRUE FALSE FALSE

> which(x>=4) output [1] 1 2 4

> which.max(x) output [1] 4

> x[which.max(x)] output [1] 10

> max(x) output [1] 10

> y <- rep(1:5, times=5:1)

> y output [1] 1 1 1 1 1 2 2 2 2 3 3 3 4 4 5

> match(1:5, y) output [1] 1 6 10 13 15

> match(unique(y), y) output [1] 1 6 10 13 15
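The examples above only look up values that are present. The sketch below, using the same y, shows what match() returns for a value that does not occur, and how which() differs by reporting every matching position rather than the first. The names pos, present and all_threes are illustrative only.

```r
y <- rep(1:5, times = 5:1)

# match() returns NA for values that do not occur in y.
pos <- match(c(3, 6), y)
pos                          # 10 NA

# %in% gives the same lookup as a logical answer.
present <- c(3, 6) %in% y
present                      # TRUE FALSE

# which() on a comparison gives every matching position, not just the first.
all_threes <- which(y == 3)
all_threes                   # 10 11 12
```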

Vector Operations

When vectors are used in math expressions, the operations are performed element by element.

> x <- c(4,7,2,10,1,0)

> y <- x^2 + 1

> y output [1] 17 50 5 101 2 1

> x*y output [1] 68 350 10 1010 2 0
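Element-by-element arithmetic also works when the two operands differ in length: R reuses (recycles) the shorter one. The following sketch uses the same x as above; the result names are hypothetical.

```r
x <- c(4, 7, 2, 10, 1, 0)

# A single number is recycled against every element.
plus_one <- x + 1
plus_one          # 5 8 3 11 2 1

# A shorter vector is reused in order: (10, 20) is applied three times.
scaled <- x * c(10, 20)
scaled            # 40 140 20 200 10 0

# Comparisons are element-wise too and return a logical vector.
big <- x > 3
big               # TRUE TRUE FALSE TRUE FALSE FALSE
```

R issues a warning when the longer length is not a multiple of the shorter one, which is usually a sign of a mistake.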

Useful Vector Functions

sum(x) prod(x) Sum/product of the elements of x

cumsum(x) cumprod(x) Cumulative sum/product of the elements of x

min(x) max(x) Minimum/Maximum element of x

mean(x) median(x) Mean/median of x

var(x) sd(x) Variance/standard deviation of x

cov(x,y) cor(x,y) Covariance/correlation of x and y

range(x) Minimum and maximum of x

quantile(x) Quantiles of x for the given probabilities

fivenum(x) Five number summary of x

length(x) Number of elements in x

unique(x) Unique elements of x

rev(x) Reverse the elements of x

sort(x) Sort the elements of x

which() Indices of TRUEs in a logical vector

which.max(x) which.min(x) Index of the max/min element of x

match() First position of an element in a vector

union(x, y) Union of x and y

intersect(x, y) Intersection of x and y

setdiff(x, y) Elements of x that are not in y

setequal(x, y) Do x and y contain the same elements?
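A few of the functions listed above, tried on a small example. The vector x is the one used earlier; the second vector y is made up here purely to demonstrate the set functions.

```r
x <- c(4, 7, 2, 10, 1, 0)
y <- c(2, 4, 6, 8)            # hypothetical second vector

running <- cumsum(x)          # 4 11 13 23 24 24
extremes <- range(x)          # 0 10
ordered <- sort(x)            # 0 1 2 4 7 10
middle <- median(x)           # 3

common <- intersect(x, y)     # 4 2
only_x <- setdiff(x, y)       # 7 10 1 0
same_set <- setequal(x, rev(x))   # TRUE: order does not matter
```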

Matrices

A matrix is just a two-dimensional generalization of a vector. To create a matrix,

> matrix(data=NA, nrow=1, ncol=1, byrow=FALSE, dimnames=NULL)

data a vector that gives data to fill the matrix; if data does not have enough elements to fill the matrix, then the elements are recycled.

nrow desired number of rows, ncol desired number of columns

byrow if FALSE (default) matrix is filled by columns, otherwise by rows

dimnames (optional) list of length 2 giving the row and column names respectively. If the list itself is named, the list names are used as names for the dimensions.

> x <- matrix(c(5,0,6,1,3,5,9,5,7,1,5,3), nrow=3, ncol=4, byrow=TRUE,

+ dimnames=list(rows=c("r.1", "r.2", "r.3"),

+ cols=c("c.1", "c.2", "c.3", "c.4")))

> x

     cols

rows  c.1 c.2 c.3 c.4

  r.1   5   0   6   1

  r.2   3   5   9   5

  r.3   7   1   5   3
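To see the effect of byrow and of recycling in isolation, here is a minimal sketch with a smaller matrix. The names by_col, by_row and recycled are illustrative.

```r
# Default: the data fill the matrix column by column.
by_col <- matrix(1:6, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE: the same data fill the matrix row by row.
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# Too few elements: the data (1, 2) are recycled to fill all six cells.
recycled <- matrix(c(1, 2), nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    2    2    2
```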

Reference Elements of a Matrix

Reference matrix elements using [ ] just like with vectors, but now with two dimensions

> x <- matrix(c(5,0,6,1,3,5,9,5,7,1,5,3), nrow=3, ncol=4, byrow=TRUE)

> x

[,1] [,2] [,3] [,4]

[1,] 5 0 6 1

[2,] 3 5 9 5

[3,] 7 1 5 3

> x[2,3] # Row 2, Column 3 output [1] 9

> x[1,] # Row 1 output [1] 5 0 6 1

> x[,2] # Column 2 output [1] 0 5 1

> x[c(1,3),] # Rows 1 and 3, all Columns

[,1] [,2] [,3] [,4]

[1,] 5 0 6 1

[2,] 7 1 5 3

We can also reference parts of a matrix by using the row or column names. Sometimes it is better to reference a row/column by its name rather than by the numeric index. For example, if a program adds or permutes the columns of a matrix, then the numeric index of the columns may change. As a result, you might reference the wrong column of the new matrix if you use the old index number. However, the name of each column will not change.

Reference matrix elements using [ ], but now use the column or row name, in quotation marks, in place of the index number. You don't have to specify the names when you create a matrix. To get or set the column, row, or both dimension names of A:

colnames(A)

rownames(A)

dimnames(A)

You can also name the elements of a vector, for example c("name.1"=1, "name.2"=2).

Use the function names() to get or set the names of vector elements.

> N <- matrix(c(5,8,3,0,4,1), nrow=2, ncol=3, byrow=TRUE)

> colnames(N) <- c("c.1", "c.2", "c.3")

> N

     c.1 c.2 c.3

[1,]   5   8   3

[2,]   0   4   1

> N[,"c.2"] # Column named "c.2"

[1] 8 4

> colnames(N)

[1] "c.1" "c.2" "c.3"

> M <- diag(2)

> (MN <- cbind(M, N)) # Placing the expression in parentheses will print the result

         c.1 c.2 c.3

[1,] 1 0   5   8   3

[2,] 0 1   0   4   1
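The point about permuted columns can be checked directly. In the sketch below the matrix N is the one defined above; the reordered matrix P and the named vector v are introduced only for illustration.

```r
N <- matrix(c(5, 8, 3, 0, 4, 1), nrow = 2, ncol = 3, byrow = TRUE)
colnames(N) <- c("c.1", "c.2", "c.3")

# Reorder the columns; position 2 now holds a different column.
P <- N[, c("c.3", "c.1", "c.2")]

P[, 2]         # 5 0  (this is column c.1, not c.2)
P[, "c.2"]     # 8 4  (same values as N[, "c.2"])

# Named vector elements can be referenced the same way.
v <- c("name.1" = 1, "name.2" = 2)
names(v)       # "name.1" "name.2"
v["name.2"]    # element named "name.2", value 2
```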

Matrix Operations

When matrices are used in math expressions the operations are performed element by element. For matrix multiplication use the %*% operator. If a vector is used in matrix multiplication, it will be coerced to either a row or column matrix to make the arguments conformable. Using %*% on two vectors will return the inner product (%o% for outer product) as a matrix and not a scalar. Use either c() or as.vector() to convert to a scalar.

> A <- matrix(1:4, nrow=2)

> B <- matrix(1, nrow=2, ncol=2)

> A*B

[,1] [,2]

[1,] 1 3

[2,] 2 4

> A%*%B

[,1] [,2]

[1,] 4 4

[2,] 6 6

> y <- 1:3

> y%*%y

[,1]

[1,] 14

> A/(y%*%y)

Error in A/(y %*% y) : non-conformable arrays

> A/c(y%*%y)

[,1] [,2]

[1,] 0.07142857 0.2142857

[2,] 0.14285714 0.2857143
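The coercion rules described above can be sketched as follows. A and y are as defined earlier; the vector v and the result names are assumptions made for this example.

```r
y <- 1:3

# Inner product: a 1 x 1 matrix, not a plain number.
inner <- y %*% y
dim(inner)                    # 1 1

# Dropping the dimensions gives an ordinary scalar.
scalar <- as.vector(inner)    # 14

# Outer product: element [i, j] is y[i] * y[j].
outer_prod <- y %o% y
dim(outer_prod)               # 3 3

# A vector is treated as a column or a row, whichever makes the product conformable.
A <- matrix(1:4, nrow = 2)
v <- c(1, 1)                  # hypothetical vector
col_prod <- A %*% v           # 2 x 1 matrix: 4, 6
row_prod <- v %*% A           # 1 x 2 matrix: 3, 7
```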

Useful Matrix Functions

t(A) Transpose of A

det(A) Determinant of A

solve(A, b) Solves the equation Ax=b for x

solve(A) Matrix inverse of A

MASS::ginv(A) Generalized inverse of A (MASS package)

eigen(A) Eigenvalues and eigenvectors of A

chol(A) Cholesky factorization of A

diag(n) Create an n x n identity matrix

diag(A) Returns the diagonal elements of a matrix A

diag(x) Create a diagonal matrix from a vector x

apply() Apply a function to the margins of a matrix

rbind(...) Combines arguments by rows

cbind(...) Combines arguments by columns

dim(A) Dimensions of A

nrow(A), ncol(A) Number of rows/columns of A

dimnames(A) Get or set the dimension names of A

lower.tri(A),upper.tri(A) Matrix of logicals indicating lower/upper triangular matrix

colnames(A), rownames(A) Get or set the column/row names of A
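A brief sketch of a few of these functions working together. The matrix A and right-hand side b below are made-up values chosen so the results are easy to check by hand.

```r
# Hypothetical 2 x 2 system: 2*x1 + x2 = 3 and x1 + 3*x2 = 5
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(3, 5)

x <- solve(A, b)        # solution of A x = b: 0.8 1.4
A %*% x                 # reproduces b (as a 2 x 1 matrix)

A_inv <- solve(A)       # inverse of A
A %*% A_inv             # identity matrix, up to rounding error

det(A)                  # 5
diag(A)                 # 2 3
t(A)                    # equal to A here, because this A is symmetric
```

Results of solve() and det() are floating-point values, so compare them with all.equal() rather than == when checking.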

apply()

The apply() function is used for applying functions to the margins of a matrix, array, or data frame.

apply(X, MARGIN, FUN, ...)

X : A matrix, array or data frame

MARGIN : Vector of subscripts indicating which margins to apply the function to: 1=rows, 2=columns, c(1,2)=rows and columns

FUN : Function to be applied

... : Optional arguments for FUN

You can also use your own function (more on this later)

> (x <- matrix(1:12, nrow=3, ncol=4))

[,1] [,2] [,3] [,4]

[1,] 1 4 7 10

[2,] 2 5 8 11

[3,] 3 6 9 12

> apply(x, 1, sum) # Row totals output [1] 22 26 30

> apply(x, 2, mean) # Column means output [1] 2 5 8 11
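As a sketch of the remaining arguments, the example below passes a user-defined function as FUN, forwards an optional argument through ..., and uses a function that returns more than one value. The names row_spread, x_na, filled and col_range are hypothetical.

```r
x <- matrix(1:12, nrow = 3, ncol = 4)

# Your own function: spread (max minus min) of each row.
row_spread <- apply(x, 1, function(r) max(r) - min(r))
row_spread                    # 9 9 9

# Extra arguments after FUN are passed on to it, here na.rm = TRUE for sum().
x_na <- x
x_na[1, 1] <- NA
apply(x_na, 1, sum)           # NA 26 30
filled <- apply(x_na, 1, sum, na.rm = TRUE)
filled                        # 21 26 30

# If FUN returns a vector, the results are stacked as columns.
col_range <- apply(x, 2, range)
dim(col_range)                # 2 4 (row 1 = column minima, row 2 = column maxima)
```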

Arrays

An array is a multi-dimensional generalization of a vector. To create an array,

array(data = NA, dim = length(data), dimnames = NULL)

data : A vector that gives data to fill the array; if data does not have enough elements to fill the array, then the elements are recycled.

dim : Dimension of the array, a vector of length one or more giving the maximum indices in each dimension

dimnames : Name of the dimensions, list with one component for each dimension, either NULL or a character vector of the length given by dim for that dimension.

The list can be named, and the list names will be used as names for the dimensions.

Values are entered by columns. Like with vectors and matrices, when arrays are used in math expressions the operations are performed element by element. Also like vectors and matrices, the elements of an array must all be of the same type (numeric, character, logical, etc.)

Sample 2 x 3 x 2 array,

> w <- array(1:12, dim=c(2,3,2), dimnames=list(c("A","B"), c("X","Y","Z"), c("N","M")))

Reference Elements of an Array

Reference array elements using the [ ] just like with vectors and matrices, but now with more dimensions

> w <- array(1:12, dim=c(2,3,2), dimnames=list(c("A","B"), c("X","Y","Z"), c("N","M")))

> w[2,3,1] # Row 2, Column 3, Matrix 1

[1] 6

> w[,"Y",] # Column named "Y"

> w[1,,] # Row 1

> w[1:2,,"M"] # Rows 1 and 2, Matrix "M"
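The slices above lose a dimension whenever a single index is given. The sketch below shows the shape of each result for the same array w; the variable names are illustrative, and drop = FALSE is the standard [ ] argument for keeping all dimensions.

```r
w <- array(1:12, dim = c(2, 3, 2),
           dimnames = list(c("A", "B"), c("X", "Y", "Z"), c("N", "M")))

# Column "Y" across both matrices: a 2 x 2 matrix (rows A, B; columns N, M).
col_y <- w[, "Y", ]
#    N  M
# A  3  9
# B  4 10

# Row 1 across both matrices: a 3 x 2 matrix (rows X, Y, Z; columns N, M).
row_1 <- w[1, , ]
dim(row_1)                 # 3 2

# Rows 1 and 2 of matrix "M": a 2 x 3 matrix.
slab_m <- w[1:2, , "M"]
dim(slab_m)                # 2 3

# drop = FALSE keeps the result three-dimensional.
kept <- w[1, , , drop = FALSE]
dim(kept)                  # 1 3 2
```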

Useful Array Functions

apply() : Apply a function to the margins of an array

aperm() : Transpose an array by permuting its dimensions

dim(x) : Dimensions of x

dimnames(x) : Get or set the dimension names of x

apply()

We can use the apply() function for more than one dimension. For a 3-dimensional array there are now three margins to apply the function to: 1=rows, 2=columns, and 3=matrices.

> apply(w, 2, sum) # Column sums

 X  Y  Z

18 26 34

> apply(w, c(1,3), sum) # Row and matrix sums

   N  M

A  9 27

B 12 30
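For completeness, a sketch of the third margin and of aperm() from the function list, using the same array w. The result names are made up for this example.

```r
w <- array(1:12, dim = c(2, 3, 2),
           dimnames = list(c("A", "B"), c("X", "Y", "Z"), c("N", "M")))

# Margin 3: one total per matrix.
matrix_sums <- apply(w, 3, sum)
matrix_sums                 # N = 21, M = 57

# Margins 1 and 2: sum over the third dimension for each row/column cell.
cell_sums <- apply(w, c(1, 2), sum)
#    X  Y  Z
# A  8 12 16
# B 10 14 18

# aperm() permutes dimensions; here rows and columns are swapped in each matrix.
wp <- aperm(w, c(2, 1, 3))
dim(wp)                     # 3 2 2
wp["Z", "B", "N"]           # 6, the same element as w["B", "Z", "N"]
```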
