A data scientist loves numbers, but most human-generated data is a combination of characters. Here is a small example:
Price of Camera1= $5000
Price of Camera2= 5,000
Price of Camera3= 5000
For you and me, all three values are the same, and their average is 5000. For a computer, however, the first two values are strings, while the third is an integer. The computer cannot calculate the average without first cleaning the strings and converting them to numbers.
This is where string manipulation comes in handy.
A few examples of string functions in R are:
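To make the problem concrete, here is a minimal sketch in R. It assumes the three prices arrive as text (for example, read from a file), so the vector name `prices` and its contents are illustrative. It uses `gsub()`, one of R's replacement functions, to strip the "$" and "," characters; this is only one possible way to do the cleanup.

```r
# Prices as they might arrive from a text source: all three are character strings
# (illustrative values taken from the example above)
prices <- c("$5000", "5,000", "5000")

# Direct conversion fails for the first two values and yields NA for them
direct <- suppressWarnings(as.numeric(prices))

# Strip the "$" and "," characters, then convert to numbers
cleaned <- as.numeric(gsub("[$,]", "", prices))

# Now the average can be computed
average <- mean(cleaned)

print(direct)   # NA NA 5000
print(average)  # 5000
```

Only after the cleanup step do all three values become the number 5000, which is what lets `mean()` return the expected average.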
1. String length - nchar(string_name)
2. Conversion to lower or upper case - tolower(string_name), toupper(string_name)
3. Breaking a string at a pivot - strsplit(string_name, split_char), which returns a list
4. Concatenating strings - paste(string1, string2, …, stringn), which joins the strings with a space by default
One can also use a more C-friendly form of string concatenation - sprintf("%s%s%d", string1, string2, number)
5. Extracting a substring - substr(string_name, start, stop), where positions are counted from 1 and both start and stop are included
6. Converting a string to an integer and vice versa - as.<datatype>(variable), for example as.integer(variable) or as.character(variable)
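The functions in the list can be tried out with a short sketch like the one below. The sample strings and variable names are made up for illustration and are not taken from the code the post links to; all of the functions are part of base R.

```r
s <- "Price of Camera1"   # illustrative sample string

# 1. String length
len <- nchar(s)                                # 16

# 2. Case conversion
lower <- tolower(s)                            # "price of camera1"
upper <- toupper(s)                            # "PRICE OF CAMERA1"

# 3. Split at a pivot; strsplit() returns a list, so take its first element
parts <- strsplit("Camera1=5000", "=")[[1]]    # "Camera1" "5000"

# 4. Concatenation: paste() inserts a space by default, paste0() inserts nothing
spaced <- paste("Camera", 1)                   # "Camera 1"
joined <- paste0("Camera", 1)                  # "Camera1"

# C-style formatting; %d expects an integer, hence 1L
formatted <- sprintf("%s%s%d", "Cam", "era", 1L)   # "Camera1"

# 5. Substring: positions start at 1, and both start and stop are included
piece <- substr("Camera1", 1, 3)               # "Cam"

# 6. Type conversion in both directions
as_int <- as.integer("5000")                   # 5000
as_chr <- as.character(5000L)                  # "5000"
```

Note that `paste()` and `paste0()` differ only in the separator, so `paste("Camera", 1, sep = "")` gives the same result as `paste0("Camera", 1)`.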
The code for the functions listed above is available at: http://rpubs.com/Sarah_R/90577
String matching and replacement is another important area, which will be discussed in a subsequent blog post.