When you need to analyze data, your first step is usually cleaning it to improve its quality. After cleaning comes the data modelling step, where you may want to represent some of the string values in your data set as numbers, either for performance or because the analysis method requires it. R has a function named factor() for this operation.
factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x))
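To make the "number rather than string" idea concrete, here is a minimal sketch. The names `sizes` and `f` are only illustrative and are not part of the examples that follow. A factor stores each element as an integer code that points into its vector of levels:

```r
# Illustrative names; not part of the original examples
sizes <- c('Large', 'Medium', 'Small', 'Medium')
f <- factor(sizes)

levels(f)                  # the distinct values, sorted by default
as.integer(f)              # the integer code stored for each element
levels(f)[as.integer(f)]   # codes mapped back to the original strings
```

The last line shows that nothing is lost: the codes can always be translated back through `levels()`.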
First, get the distinct values of the t-shirt size column
shirtSizes <- c('Large','Medium','Small','Medium','Small','Large','Medium','Small',
'Select one')
#This will give us distinct values of shirtSizes
factor(shirtSizes)
[1] Large Medium Small Medium Small Large Medium
[8] Small Select one
Levels: Large Medium Select one Small
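Once the values are in a factor, counting them is straightforward. The following is a small sketch using base R's `nlevels()` and `table()`; the variable name `sizes` is an assumption made for illustration:

```r
shirtSizes <- c('Large','Medium','Small','Medium','Small','Large','Medium','Small',
                'Select one')
sizes <- factor(shirtSizes)   # 'sizes' is an illustrative name

nlevels(sizes)   # number of distinct values
table(sizes)     # how many times each value occurs
```

A count like this is a quick way to spot placeholder entries such as 'Select one' before deciding what to exclude.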
'Select one' is not a valid size. Remove it
shirtSizes <- c('Large','Medium','Small','Medium','Small','Large','Medium','Small'
,'Select one')
#"Select one" should not be counted
factor(shirtSizes, exclude = c('Select one'))
[1] Large Medium Small Medium Small Large Medium Small <NA>
Levels: Large Medium Small
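Note that exclude removes the value from the levels, but the element itself stays in the vector as a missing value. A minimal sketch of how you might check for it and drop it (the names `sizes` and `cleaned` are assumptions for illustration):

```r
shirtSizes <- c('Large','Medium','Small','Medium','Small','Large','Medium','Small',
                'Select one')
sizes <- factor(shirtSizes, exclude = c('Select one'))

is.na(sizes)        # TRUE only for the excluded entry
sum(is.na(sizes))   # number of excluded entries

# Drop the excluded entry entirely (illustrative follow-up step)
cleaned <- sizes[!is.na(sizes)]
length(cleaned)
```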
What is the order (priority) of the t-shirt sizes?
shirtSizes <- c('Large','Medium','Small','Medium','Small','Large','Medium','Small'
,'Select one')
#Turn the order on
factor(shirtSizes, ordered = TRUE, exclude = c('Select one'))
[1] Large Medium Small Medium Small Large Medium Small <NA>
Levels: Large < Medium < Small
This order doesn't work for me. How can I customize it?
shirtSizes <- c('Large','Medium','Small','Medium','Small','Large','Medium',
'Small','Select one')
#Customize the order
temp <- factor(shirtSizes, ordered=TRUE, levels=c('Small','Medium','Large'),
exclude=c('Select one'))
temp
[1] Large Medium Small Medium Small Large Medium Small <NA>
Levels: Small < Medium < Large
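With a meaningful order in place, comparison operators start to make sense on the sizes. The following sketch reuses the same data and shows a few things an ordered factor allows; it is one possible illustration, not the only way to query the order:

```r
shirtSizes <- c('Large','Medium','Small','Medium','Small','Large','Medium','Small',
                'Select one')
temp <- factor(shirtSizes, ordered = TRUE, levels = c('Small','Medium','Large'),
               exclude = c('Select one'))

temp >= 'Medium'          # element-wise comparison against a level; NA for the excluded entry
max(temp, na.rm = TRUE)   # the largest size present
table(temp)               # counts reported in level order
```

Comparisons like these only work on ordered factors; on an unordered factor R returns NA with a warning.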
Looks good. Now I want to call the values S, M, L
shirtSizes <- c('Large','Medium','Small','Medium','Small','Large','Medium','Small',
'Select one')
#Customize the order
temp <- factor(shirtSizes, ordered=TRUE, levels=c('Small','Medium','Large'),
exclude=c('Select one'))
temp
#I want to call S,M,L. Order is important!
levels(temp) <- c('S','M','L')
temp
[1] Large Medium Small Medium Small Large Medium Small <NA>
Levels: Small < Medium < Large
[1] L M S M S L M S <NA>
Levels: S < M < L
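As an alternative sketch, the renaming can also be done in the same call through the labels argument shown in the signature at the top. The name `temp2` is an assumption, chosen so it does not clash with `temp`. Each label is matched to the level in the same position, so the order of the two vectors must agree:

```r
shirtSizes <- c('Large','Medium','Small','Medium','Small','Large','Medium','Small',
                'Select one')
# levels and labels are matched by position: Small -> S, Medium -> M, Large -> L
temp2 <- factor(shirtSizes, ordered = TRUE,
                levels = c('Small','Medium','Large'),
                labels = c('S','M','L'))
temp2
```

Here 'Select one' becomes a missing value simply because it is not listed in levels, so exclude is not needed in this variant.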
Now let's sort the sizes, first by their original names and then by their levels
shirtSizes <- c('Large','Medium','Small','Medium','Small','Large','Medium','Small'
,'Select one')
#Customize the order
temp <- factor(shirtSizes, ordered=TRUE, levels=c('Small','Medium','Large'),
exclude=c('Select one'))
#I want to call S,M,L. Order is important!
levels(temp) <- c('S','M','L')
#Order shirtSizes by size
shirtSizes[order(temp)]
temp[order(temp)]
[1] "Small" "Small" "Small" "Medium" "Medium"
[6] "Medium" "Large" "Large" "Select one"
[1] S S S M M M L L <NA>
Levels: S < M < L
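The excluded entry ends up last because order() puts missing values at the end by default. If you would rather leave it out of the result, here is a minimal sketch using sort() and the na.last argument of order(); treat it as one option rather than the required approach:

```r
shirtSizes <- c('Large','Medium','Small','Medium','Small','Large','Medium','Small',
                'Select one')
temp <- factor(shirtSizes, ordered = TRUE, levels = c('Small','Medium','Large'),
               exclude = c('Select one'))
levels(temp) <- c('S','M','L')

sort(temp)                              # sort() drops missing values by default
sort(temp, decreasing = TRUE)           # largest sizes first
shirtSizes[order(temp, na.last = NA)]   # na.last = NA leaves 'Select one' out
```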
