Thursday, August 3, 2017

Learn R : Statistical Programming Language Part 2



   In this post I am going to cover matrices and factors in R. Check out my first post, which contains the basics of R. That post covered basic one-dimensional arrays (vectors); here we are going to focus on structures that can hold many rows and columns. The first option we have in R is the matrix. A matrix is a 2-dimensional array, and it is our starting point for structured data in R.

The following statement creates a matrix with 3 rows and 4 columns and fills every cell with the value 9.

matrix(9, 3, 4)

      Rather than giving every cell the value 9, we can pass a vector of values. Since a matrix with 3 rows and 4 columns has 12 cells in total, I am going to pass the numbers from 1 to 12.

matrix(1:12, 3, 4)
You can also create the vector first and pass it in.
a <- 1:12
matrix(a, 3, 4)
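
One detail worth knowing here: matrix() fills the cells column by column unless told otherwise. Here is a minimal sketch using the byrow argument of base R's matrix(); the names by_col and by_row are just for illustration.

```r
# Default behaviour: values run down each column first
by_col <- matrix(1:12, 3, 4)
print(by_col)

# With byrow = TRUE, values run across each row first
by_row <- matrix(1:12, 3, 4, byrow = TRUE)
print(by_row)
```

Both matrices hold the same 12 numbers; only the order in which the cells are filled differs.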

We can also convert a vector to a matrix by assigning its dimensions with the dim function.
a <- 1:8
print(a)
dim(a) <- c(2, 4)
print(a)
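
To confirm that the vector really became a matrix, we can ask R for its shape. This is a small sketch using the base functions dim(), nrow(), ncol() and is.matrix().

```r
a <- 1:8
dim(a) <- c(2, 4)   # reshape the vector into 2 rows and 4 columns

dim(a)        # number of rows, then number of columns
nrow(a)       # just the row count
ncol(a)       # just the column count
is.matrix(a)  # TRUE once the dimensions are set
```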

      As you can see, we created a vector and then converted it to a matrix. To access an individual cell in a matrix, we give the coordinates of the cell we need: the row first, then the column. Here is an example.

m <- matrix(1:10, 2, 5)
m
m[2, 2]

We can select all the values in a row or in a column by leaving the other coordinate empty.

m <- matrix(1:10, 2, 5)
m
m[2, ]
m[, 1]

We can select multiple rows or multiple columns too. In the following example I am selecting the first and second columns.

m <- matrix(1:15, 3, 5)
m
m[, 1:2]
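
The selected rows or columns do not have to be next to each other. As a rough sketch, any vector of positions can be used as an index; the names odd_cols and corner are only for illustration.

```r
m <- matrix(1:15, 3, 5)

# Columns 1, 3 and 5, picked with c()
odd_cols <- m[, c(1, 3, 5)]
print(odd_cols)

# Rows 2 to 3 and columns 4 to 5 at the same time
corner <- m[2:3, 4:5]
print(corner)
```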

      I haven't talked about R's visualization capabilities yet, and I think a matrix is a great data source to demo them. In the following example, I am creating a matrix of 10 rows and 10 columns filled with the value 0. Then I update one cell to 1. Let's see what the data looks like when we pass it to persp, one of the visualization functions, which draws a 3D perspective plot.

mylevels <- matrix(0,10,10)
mylevels
mylevels[5,5] <- 1
mylevels
persp(mylevels, expand = 0.5)
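
persp is not the only option. As a sketch, the same matrix can be passed to two other base graphics functions, contour() and image(), which show the data from above instead of in 3D.

```r
mylevels <- matrix(0, 10, 10)
mylevels[5, 5] <- 1

# Contour lines around the raised cell
contour(mylevels)

# A colored grid where the raised cell stands out
image(mylevels)
```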

    When we are working with data, we usually need to categorize it. In this section I am going to introduce factors, which let us group the data. Let's say we have a bunch of states and we need a distinct list of them. We can use the factor function to do this.

states <- c('OH','AL','MI','NY', 'CA','NY','MI')
distinctlist <- factor(states)
print(distinctlist)

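      Each state is a string, but notice that the factor prints its values and its Levels without quotes. That's because a factor does not store the strings themselves for each element; it stores an integer code that refers to an entry in its list of levels. To see the integer values we can use the as.integer() function.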

states <- c('OH','AL','MI','NY', 'CA','NY','MI')
distinctlist <- factor(states)
print(distinctlist)
as.integer(distinctlist)
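
To make the integer references concrete, here is a small sketch. It relies on the fact that factor() by default uses the sorted unique values as its levels; the names sorted_unique, codes and rebuilt are only for illustration.

```r
states <- c('OH','AL','MI','NY', 'CA','NY','MI')

# The distinct states in sorted order: AL, CA, MI, NY, OH
sorted_unique <- sort(unique(states))

# The integer codes stored inside the factor
codes <- as.integer(factor(states))
print(codes)   # 5 1 3 4 2 4 3

# Each code is a position in the sorted list, so this rebuilds the original vector
rebuilt <- sorted_unique[codes]
print(rebuilt)
```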


We can use the levels() function to get the levels as a vector of strings.

states <- c('OH','AL','MI','NY', 'CA','NY','MI')
distinctlist <- factor(states)
print(distinctlist)
levels(distinctlist)
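
Since the point of a factor is grouping, a natural next step is counting how many times each state appears. This is a minimal sketch using base R's table() and nlevels(); the name counts is only for illustration.

```r
states <- c('OH','AL','MI','NY', 'CA','NY','MI')
distinctlist <- factor(states)

# How many distinct states there are
nlevels(distinctlist)

# How many times each state occurs
counts <- table(distinctlist)
print(counts)
```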


     I introduced one of R's visualization functions in this post, and there is a lot more to cover. I am going to write about R's data frames in my next post.
