As part of the tutorial series on Data Science with R from Data Perspective, this first tutorial introduces the basic data types of the R programming language.


**What we learn:** At the end of this chapter, an R console is provided so that you can practice what you have learned.

## R assignment operator

    x = 'welcome to R programming'  # assigning a string literal to variable x
    x
    #> [1] "welcome to R programming"
    typeof(x)  # check the data type of the variable x
    #> [1] "character"
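The example above assigns with `=`. Many R style guides prefer the `<-` operator for assignment, and at the top level the two behave the same way. A minimal sketch follows; the variable names `greeting_a` and `greeting_b` are illustrative and not part of the original tutorial.

```r
# Both operators bind a value to a name at the top level.
greeting_a = "welcome to R programming"
greeting_b <- "welcome to R programming"

# identical() checks that the two objects are exactly the same.
same_value <- identical(greeting_a, greeting_b)
same_value            # TRUE

# typeof() reports the internal storage type of an object.
typeof(greeting_b)    # "character"
```

Inside a function call, `=` names an argument rather than assigning a variable, which is one reason `<-` is often preferred for assignment.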

## Numeric

Numeric data represents decimal data.

    x = 1.5  # assigning the decimal value 1.5 to x
    x
    #> [1] 1.5

To check the data type, we use the class() function:

    class(x)
    #> [1] "numeric"

To check whether the variable x is numeric, we use is.numeric():

    is.numeric(x)
    #> [1] TRUE

To convert any compatible data to numeric, we use as.numeric():

    x = '1'  # assigning the string "1" to variable x
    class(x)
    #> [1] "character"
    x = as.numeric(x)
    x
    #> [1] 1
    class(x)
    #> [1] "numeric"

Note: if we try to convert a string that does not represent a number, the result is NA, together with a warning:

    x = 'welcome to R programming'
    as.numeric(x)
    #> [1] NA
    #> Warning message:
    #> NAs introduced by coercion
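In practice the NA produced by a failed conversion is useful for spotting bad input. The sketch below assumes a hypothetical vector `raw_values` of numbers stored as text; it is an illustration, not part of the original tutorial.

```r
# Hypothetical raw input: numbers stored as text, with one bad entry.
raw_values <- c("1", "2.5", "abc")

# suppressWarnings() silences the "NAs introduced by coercion" warning.
parsed <- suppressWarnings(as.numeric(raw_values))

# is.na() flags the entries that could not be converted.
failed <- is.na(parsed)

parsed                 # 1.0 2.5  NA
raw_values[failed]     # "abc"
```

Checking `is.na()` right after the conversion makes it easy to report exactly which inputs were rejected.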

## Integer

We use the as.integer() function to convert a numeric value to an integer value.

    x = 1.34
    x
    #> [1] 1.34
    class(x)
    #> [1] "numeric"
    y = as.integer(x)
    class(y)
    #> [1] "integer"
    y
    #> [1] 1

Note: to check whether a value is an integer, we use the is.integer() function.

In the example below, x is a numeric (decimal) value whereas y is an integer.

    is.integer(y)
    #> [1] TRUE
    is.integer(x)
    #> [1] FALSE
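Two details about integers are worth a short sketch: the `L` suffix creates an integer directly, and as.integer() truncates toward zero rather than rounding. The variable names `count` and `nearest` are illustrative.

```r
# The L suffix creates an integer literal directly.
count <- 5L
class(count)           # "integer"

# A plain number without the suffix is stored as numeric.
class(5)               # "numeric"

# as.integer() truncates toward zero; it does not round.
as.integer(1.99)       # 1
as.integer(-1.99)      # -1

# Round first if the nearest whole number is wanted.
nearest <- as.integer(round(1.99))
nearest                # 2
```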

## Complex

Complex values are created as shown below, though they are rarely needed in day-to-day data analysis:

    z = 3.5+4i
    z
    #> [1] 3.5+4i
    is.complex(z)
    #> [1] TRUE
    class(z)
    #> [1] "complex"
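For completeness, base R provides functions for pulling a complex value apart. The sketch below uses an illustrative value `w` chosen so that the modulus is a whole number.

```r
w <- 3 + 4i

Re(w)      # real part: 3
Im(w)      # imaginary part: 4
Mod(w)     # modulus, sqrt(3^2 + 4^2): 5
Conj(w)    # complex conjugate: 3-4i

# sqrt() of a negative numeric gives NaN with a warning,
# but it works when the input is complex.
root <- sqrt(as.complex(-1))
root       # 0+1i
```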

## Logical

The logical data type is used frequently, usually as the result of comparing two values. A logical value is either TRUE or FALSE.

    logical = T  # T is shorthand for TRUE
    logical
    #> [1] TRUE
    l = FALSE
    l
    #> [1] FALSE
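Since logical values usually come from comparisons, a short sketch of that is given here. The vector `scores` is made up for illustration. Note that `T` and `F` are ordinary variables that can be reassigned, so spelling out `TRUE` and `FALSE` is the safer habit.

```r
scores <- c(1, 5, 10)        # illustrative values

# A comparison on a vector returns one logical value per element.
above_four <- scores > 4
above_four                   # FALSE  TRUE  TRUE

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches.
sum(above_four)              # 2

# Logical values combine with & (and), | (or) and ! (not).
both   <- (5 > 3) & (2 > 3)  # FALSE
either <- (5 > 3) | (2 > 3)  # TRUE
negated <- !TRUE             # FALSE
```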

## Character

String literals are stored as character objects in R.

    str = "R Programming"
    str
    #> [1] "R Programming"
    class(str)
    #> [1] "character"
    is.character(str)
    #> [1] TRUE

We can convert other data types to the character data type using the as.character() function.

    x = as.character(1)
    x
    #> [1] "1"
    class(x)
    #> [1] "character"

Note: a variety of operations can be applied to character values, such as extracting substrings and finding lengths; these will be dealt with as and when appropriate.
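As a small preview of the string operations mentioned in the note above, here is a sketch using base R functions only. The variable name `course` is illustrative.

```r
course <- "R Programming"

nchar(course)               # number of characters: 13
substr(course, 3, 13)       # characters 3 to 13: "Programming"
toupper(course)             # "R PROGRAMMING"

# paste() joins strings, separated by a space by default.
paste("Data", "Science")    # "Data Science"
```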

So far we have learned about the basic data types in R. Let's move on to slightly more complex data types.

## Vector

How do we hold a collection of values of the same data type? This requirement comes up very frequently, and the vector data type solves it. Consider the numeric vector below:

    num_vec = c(1,2,3,4,5)
    num_vec
    #> [1] 1 2 3 4 5
    class(num_vec)
    #> [1] "numeric"

We can apply many operations to vectors, such as finding the length or accessing individual members.

The length of a vector can be found using the length() function.

    length(num_vec)
    #> [1] 5

We access each element of the vector num_vec using its index, starting from 1.

In the example below we access the members at the 1st, 2nd, and 3rd positions.

    num_vec[1]
    #> [1] 1
    num_vec[2]
    #> [1] 2
    num_vec[3]
    #> [1] 3

Similarly, character vectors, logical vectors, and integer vectors can be created.

    char_vec = c("A", "Course","On","Data science","R programming")
    char_vec
    #> [1] "A"             "Course"        "On"            "Data science"  "R programming"
    length(char_vec)
    #> [1] 5
    char_vec[1]
    #> [1] "A"
    char_vec[2]
    #> [1] "Course"
    char_vec[4]
    #> [1] "Data science"
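Beyond single-position access, vectors support ranges, negative indexes, and logical filters, and arithmetic is applied element by element. The sketch below redefines `num_vec` so that it runs on its own; the other names are illustrative.

```r
num_vec <- c(1, 2, 3, 4, 5)

# Arithmetic is applied to every element.
doubled <- num_vec * 2            # 2 4 6 8 10

# A range of positions selects several elements at once.
middle <- num_vec[2:4]            # 2 3 4

# A negative index drops that position.
without_first <- num_vec[-1]      # 2 3 4 5

# A logical condition keeps only the matching elements.
large <- num_vec[num_vec > 3]     # 4 5

sum(num_vec)                      # 15
mean(num_vec)                     # 3
```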

## Matrix

The matrix data type is used when we want to represent data as a collection of numerical values arranged in m × n (m rows by n columns) dimensions. Matrices are mostly used when dealing with mathematical equations, machine learning, and text mining algorithms.

How do we create a matrix?

    m = matrix(c(1,2,3,6,7,8),nrow = 2,ncol = 3)
    m
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    7
    #> [2,]    2    6    8
    class(m)
    #> [1] "matrix"

In R 4.0.0 and later, class(m) prints "matrix" "array".

The dimensions of the matrix are given by dim():

    dim(m)
    #> [1] 2 3

How do we access the elements of matrix m? Individual elements are accessed using indexes, as shown below. In this example we access the 1st, 2nd, and 6th elements of matrix m, then a single cell, whole rows, and whole columns.

    m[1]
    #> [1] 1
    m[2]
    #> [1] 2
    m[6]
    #> [1] 8
    m[2,3]  # accessing the 2nd row, 3rd column element
    #> [1] 8
    # accessing all elements of each row of the matrix m
    m[1,]
    #> [1] 1 3 7
    m[2,]
    #> [1] 2 6 8
    # accessing all elements of each column
    m[,1]
    #> [1] 1 2
    m[,2]
    #> [1] 3 6
    m[,3]
    #> [1] 7 8

What happens when we add different data types to a vector?

    v = c("a","b",1,2,3,T)
    v
    #> [1] "a"    "b"    "1"    "2"    "3"    "TRUE"
    class(v)
    #> [1] "character"
    v[6]
    #> [1] "TRUE"
    class(v[6])
    #> [1] "character"

In the above example, R coerced all the different data types into the single data type character, because a vector can hold only one data type.
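The coercion seen above follows a fixed order: logical, then integer, then numeric, then character, with the most general type winning. A minimal sketch, with illustrative variable names:

```r
# The result takes the most general type among the inputs.
class(c(TRUE, 2L))       # "integer"
class(c(2L, 3.5))        # "numeric"
class(c(3.5, "a"))       # "character"

# Logical values become 1 and 0 when combined with numbers.
mixed <- c(TRUE, FALSE, 5)
mixed                    # 1 0 5

# Numbers that were coerced to text can be converted back.
v <- c("a", "b", 1, 2, 3, TRUE)
recovered <- as.numeric(v[3:5])
recovered                # 1 2 3
```

When values of different types need to stay as they are, a list is the appropriate container.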

## List

What if we want to handle different data types in a single object? The list data type lets us store elements of different data types in a single object.

We create list objects using the list() function.

In the example below I have created a list object list_exp with 6 elements of character, numeric, and logical data types.

    list_exp = list("r programming","data perspective",12345,67890,TRUE,F)
    list_exp
    #> [[1]]
    #> [1] "r programming"
    #>
    #> [[2]]
    #> [1] "data perspective"
    #>
    #> [[3]]
    #> [1] 12345
    #>
    #> [[4]]
    #> [1] 67890
    #>
    #> [[5]]
    #> [1] TRUE
    #>
    #> [[6]]
    #> [1] FALSE

Using the str() function, we can inspect the internal structure of the list object. This is one of the most important functions in day-to-day analysis.

In the example below, str() shows a list of 6 elements of character, numeric, and logical data types.

    str(list_exp)
    #> List of 6
    #>  $ : chr "r programming"
    #>  $ : chr "data perspective"
    #>  $ : num 12345
    #>  $ : num 67890
    #>  $ : logi TRUE
    #>  $ : logi FALSE
    # accessing the data type of list_exp
    class(list_exp)
    #> [1] "list"
    length(list_exp)
    #> [1] 6
    list_exp[1]
    #> [[1]]
    #> [1] "r programming"
    # accessing the list elements using indexing
    list_exp[[1]]
    #> [1] "r programming"
    list_exp[[6]]
    #> [1] FALSE
    list_exp[[7]]  # accessing an element that does not exist gives the error below
    #> Error in list_exp[[7]] : subscript out of bounds
    # finding the class of individual list elements
    class(list_exp[[6]])
    #> [1] "logical"
    class(list_exp[[3]])
    #> [1] "numeric"
    class(list_exp[[1]])
    #> [1] "character"
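List elements can also be given names, which often reads better than positional indexes. The sketch below uses a hypothetical list `course_info`; the element names are made up for illustration.

```r
# A list whose elements are named (names are illustrative).
course_info <- list(
  title   = "r programming",
  site    = "data perspective",
  lessons = 12345,
  active  = TRUE
)

# Named elements can be read with $ or with [[ ]].
course_info$title            # "r programming"
course_info[["lessons"]]     # 12345

# Assigning to a new name appends an element.
course_info$level <- "beginner"
length(course_info)          # 5

# sapply() applies class() to every element in one call.
element_classes <- sapply(course_info, class)
element_classes
```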

## Data Frame

Many of us come from a SQL background and are comfortable handling data in the form of a SQL table because of the functionality that a table offers when working with data. How convenient would it be to have a similar object in R for storing and manipulating data easily and efficiently?

R offers the data frame for this purpose. We can treat a data frame much like a SQL table.

How do we create a data frame? Observe the data frame below. The output shown is from a version of R before 4.0.0, where data.frame() converts strings to factors by default; in R 4.0.0 and later the second column is a character vector unless stringsAsFactors = TRUE is passed.

    # creating a data frame
    data_frame = data.frame(first=c(1,2,3,4),second=c("a","b","c","d"))
    data_frame
    #>   first second
    #> 1     1      a
    #> 2     2      b
    #> 3     3      c
    #> 4     4      d
    # accessing the data type of the object
    class(data_frame)
    #> [1] "data.frame"
    # finding out the row count of data_frame using nrow()
    nrow(data_frame)
    #> [1] 4
    # finding out the column count of data_frame using ncol()
    ncol(data_frame)
    #> [1] 2
    # finding out the dimensions of data_frame using dim()
    dim(data_frame)
    #> [1] 4 2
    # finding the structure of the data frame using str()
    str(data_frame)
    #> 'data.frame': 4 obs. of  2 variables:
    #>  $ first : num  1 2 3 4
    #>  $ second: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
    # accessing an entire row using the row index number; leaving the
    # column position empty in data_frame[1,] selects all columns of row 1
    data_frame[1,]
    #>   first second
    #> 1     1      a
    # similarly, data_frame[,1] returns all values of the 1st column
    data_frame[,1]
    #> [1] 1 2 3 4
    # accessing the row names of the data frame
    rownames(data_frame)
    #> [1] "1" "2" "3" "4"
    # accessing the column names of the data frame
    colnames(data_frame)
    #> [1] "first"  "second"
    # column data can be accessed using the column names instead of column indexes
    data_frame$first
    #> [1] 1 2 3 4
    data_frame$second
    #> [1] a b c d
    #> Levels: a b c d
    # accessing individual values using row and column indexes
    data_frame[1,1]  # first row, first column
    #> [1] 1
    data_frame[2,2]  # second row, second column
    #> [1] b
    #> Levels: a b c d
    data_frame[3,2]  # third row, second column
    #> [1] c
    #> Levels: a b c d
    data_frame[3,1]  # third row, first column
    #> [1] 3
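To continue the SQL analogy, rows can be filtered by a condition and new columns can be derived from existing ones. The sketch below builds its own data frame `scores_df` (an illustrative name) and passes stringsAsFactors = FALSE explicitly so that it behaves the same way across R versions.

```r
scores_df <- data.frame(
  first  = c(1, 2, 3, 4),
  second = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE   # keep strings as character, not factor
)

class(scores_df$second)      # "character"

# Keep only the rows where a condition holds (like SQL WHERE).
big_rows <- scores_df[scores_df$first > 2, ]
nrow(big_rows)               # 2

# Derive a new column from an existing one.
scores_df$third <- scores_df$first * 10

# subset() filters rows and selects columns in one call.
picked <- subset(scores_df, second == "b", select = c(first, third))
picked
```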

Now consider a slightly larger data frame:

    dt_frame = data.frame(first=c(1,2,3,4,5,6,7),second=c("Big data","Python","R","NLP","machine learning","data science","data perspective"))
    dt_frame
    #>   first           second
    #> 1     1         Big data
    #> 2     2           Python
    #> 3     3                R
    #> 4     4              NLP
    #> 5     5 machine learning
    #> 6     6     data science
    #> 7     7 data perspective

Assume we have a dataset with 1000 rows instead of the 7 rows shown in the data frame above. If we want to see a sample of the data, how do we do it?

Using the head() function.

    head(dt_frame)
    #>   first           second
    #> 1     1         Big data
    #> 2     2           Python
    #> 3     3                R
    #> 4     4              NLP
    #> 5     5 machine learning
    #> 6     6     data science

The head() function returns the first six rows of a data frame so that we can get a look at what it contains.

We can also use the tail() function to see the last six rows of the data frame.

    tail(dt_frame)
    #>   first           second
    #> 2     2           Python
    #> 3     3                R
    #> 4     4              NLP
    #> 5     5 machine learning
    #> 6     6     data science
    #> 7     7 data perspective

The View() function displays the values of a data frame in tabular form.

    View(dt_frame)
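Both head() and tail() accept an `n` argument when six rows is not the right amount. The sketch below uses `mtcars`, a dataset that ships with R, as a stand-in for a larger table; it is an illustration rather than part of the original tutorial.

```r
# mtcars is a built-in dataset with 32 rows.
nrow(mtcars)                      # 32

# n controls how many rows are returned.
first_three <- head(mtcars, n = 3)
last_two    <- tail(mtcars, n = 2)

nrow(first_three)                 # 3
nrow(last_two)                    # 2

# View() opens a viewer window, so call it only in an interactive session.
if (interactive()) View(first_three)
```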
