Foundations R Software Book
Prof. Shalabh
Mathematics
IIT Kanpur
INDEX
1 Lecture 01
2 Lecture 02
3 Lecture 03
4 Lecture 04
Week 2
5 Lecture 5
6 Lecture 6
7 Lecture 7
8 Lecture 8
Week 3
9 Lecture 9
10 Lecture 10
11 Lecture 11
12 Lecture 12
Week 4
13 Lecture 13
14 Lecture 14
15 Lecture 15
16 Lecture 16
17 Lecture 17
Week 5
18 Lecture 18
19 Lecture 19
20 Lecture 20
21 Lecture 21
22 Lecture 22
23 Lecture 23
24 Lecture 24
25 Lecture 25
Week 7
26 Lecture 26
27 Lecture 27
28 Lecture 28
29 Lecture 29
30 Lecture 30
Week 8
31 Lecture 31
32 Lecture 32
33 Lecture 33
34 Lecture 34
35 Lecture 35
Week 9
36 Lecture 36
37 Lecture 37
38 Lecture 38
39 Lecture 39
Week 10
40 Lecture 40
41 Lecture 41
42 Lecture 42
43 Lecture 43
44 Lecture 44
Week 11
45 Lecture 45
46 Lecture 46
47 Lecture 47
48 Lecture 48
49 Lecture 49
Week 12
50 Lecture 50
51 Lecture 51
52 Lecture 52
53 Lecture 53
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 00
How to Learn and Follow the Course
Hello friends. Welcome to the course Foundations of R Software. In this lecture, we are going to talk about a very simple issue: how should you do this course? The reason for raising this issue is the following.
This is a course on software. When you are sitting in my class and we are face-to-face, if I ask you to do something and you try to execute it on your computer and face a problem, I can always come to you and explain: here, at this line, you have made this mistake, and you can correct it like this.
But now the problem is that we are not sitting face-to-face. I will try my best to make sure this does not become a problem in understanding and learning the R software. For that, we have to follow certain rules; by following them, we can avoid this problem and learn the R software in the best possible way. That is what I am going to explain in this lecture.
You see, I have a pen here, and I have prepared the slides for this course. With this pen, I can write on my computer screen. When I write something on the screen, there are going to be three different types of things which you have to understand.
First, the instructions and commands, which have to be executed inside the R software. Second, the things I write only to explain something to you; these are just expressions and are not to be executed in the R software. Third, whatever outcome we get when we execute something inside the R software; this will be copied here, and you have to understand the difference between the instructions you give to R and what R gives you back.
To help you understand these things, I have created some slides, and we have certain rules. So, let us try to understand these rules. While explaining, I will also use the R software and demonstrate how to execute those things. How both of these are combined is what we are going to learn in this lecture. So, let us begin the lecture.
This is my course title page. Here you can see a button with which I can change the colour of my pen. The first line is the title of the course, and the second line is the number of the lecture; for example, this is lecture 0.
After that comes the title of the lecture, which indicates what we are going to do in this particular lecture. Here the title is How to Learn and Follow the Course. After this come the details of my name and my affiliation.
(Refer Slide Time: 03:40)
Now, after this you will see that some things are typed in blue colour Courier New font. They will look exactly the same way as I have typed them here.
When something is written in blue colour Courier New font, it is a command which has to be executed in the R software. When something is written in black colour Calibri font, the same black Calibri font I have circled here, it is a usual expression. It is only there to explain things to you and need not be executed inside the R software.
For example, let me write this statement: the assignment operators are the left arrow with dash and the equal sign. In this statement, the words are written in black colour Calibri font, while these two symbols are written in blue colour Courier New font. So, the words are simple expressions for your understanding, and the symbols are R commands. What exactly they do, we are going to learn in the forthcoming lectures. But here I can write this symbol, less than followed by dash (<-), which means we are going to assign a value to a variable. An alternative for the same operator is the equality sign (=).
Now, here I am writing that x <- 20 (x, less than, dash, 20) assigns the value 20 to x. Let me change the colour of my pen. If you type the statement x <- 20 in the R software, it assigns the value 20 to x; x <- 20 is the R command, that is, something which you have to type inside the R software.
The same thing can also be written as x = 20, which also assigns the value 20 to x. This means you have to type x = 20 in the R software, and this is the R command.
So, now you can easily distinguish between the two types of sentences: those which are to be executed inside the R software are written in blue colour Courier New font, and those written in black colour Calibri font are for your understanding.
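A minimal R sketch of the two assignment operators described above; the second variable name, x_equal, is chosen only for illustration. Both forms store the same value.

```r
# Assign 20 to x with the left arrow: less than followed by dash
x <- 20

# Assign 20 to another variable with the equality sign
x_equal = 20

# Both operators store the same value
print(x)                       # [1] 20
print(identical(x, x_equal))   # [1] TRUE
```

In everyday R code the left arrow is the more common style, but at the prompt both behave the same way for simple assignments like these.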
Now, just for the sake of illustration: when we execute something inside the R software, how do we tell whether a line is an outcome from the R software or simply text written on the slide?
Look at this symbol, the greater than sign (>). This is the prompt: whenever you type something in the R software, you will always see this prompt, and you type x = 20 after it. So, whenever you see slides in which the first line starts with the greater than sign, this indicates that it is an outcome from the R software which I have copied and pasted from it.
So, if you type x = 20 in the R software, then type x in the next line and press Enter (the Enter key on your keyboard), you will get the value 20. This indicates the part which has to be executed when you do something inside the R software.
Similarly, if I want to continue further, let me change the colour of my pen. Next I write that y = x * 2 (y is equal to x star 2) assigns the value 2 * x to y. Here x * 2 means x multiplied by 2. If I make any mistake, suppose I write 3 instead, I can correct it with this pen and write 2 once again.
(Refer Slide Time: 08:07)
If you observe these types of actions while I am explaining, you can understand what I am trying to do.
So, y = x * 2 has to be typed in the R software, and its interpretation is that the value 2 times x is assigned to the variable y. After that, I type z = x + y, which assigns the value x + y to z. All of these, z = x + y, x + y and z, are typed in blue colour Courier New font, so they are related to the R software.
Whatever is typed in black Calibri font, like "assigns the value" and "to", is simple expression used to explain what I want to say. Then comes the outcome on the R console: this symbol indicates that whatever is shown here is copied and pasted from the R software.
If you type y = x * 2 in the R software, then type y and press Enter, you will get the value 40. After that, in the next line, if you type z = x + y, then type z and press Enter, you will get the value 60. This indicates the output from the R software. That is our convention.
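The three commands from the slide can be collected into one short R snippet. It uses the equality sign exactly as on the slide; typing a variable name at the prompt and pressing Enter prints its value.

```r
x = 20        # x holds 20
y = x * 2     # y is x multiplied by 2
z = x + y     # z is the sum of x and y

y             # at the prompt this prints [1] 40
z             # at the prompt this prints [1] 60
```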
(Refer Slide Time: 09:45)
Now, how will it look when I do it in the R software? I have executed these commands in the R software, in what is actually called the R console, and I have copied the screenshot of the relevant part here. What is the advantage? When you see the same outcome in the screenshot, it gives you confidence that when you execute these commands, you should also get the same output.
It is possible that you make a mistake in typing or misspell a command; then you will not get the same outcome. If you compare your input with the output shown here in the screenshot, that will possibly help you find where you are making the mistake. After all, if I executed the same commands in the R software and got this outcome, why shouldn't you get the same outcome? That is the basic idea.
With this approach, I personally feel that we can remove the problem of not sitting in the same class and working face-to-face. It is equivalent to the situation in which, if you make a mistake and ask me, I come to you, look at your computer and explain: here, at this point, you are making this typographical mistake.
You will get the same help from these screenshots. These are the very simple, basic rules by which we are going to work. Now, I will execute the same thing in the R software and show you how we are going to work there. You can see the command here.
For example, if I want to write x = 20, as written here, I can come to the R software and type x = 20 in front of you. Now I type x and simply press Enter. After that, I come to the next slide, where I have written y = x * 2 and z = x + y, and you can see that these lines begin with the greater than sign.
So, I can type the same thing, y = x * 2, and you can see the value of y, obtained by typing y and pressing the Enter key. After that, I define z = x + y. I type this command in front of you, press Enter, then type z and press Enter, and I get the value 60.
So, this is the outcome when I execute the commands in the R software. How all this is done, we are going to learn in the forthcoming lectures. But compare this outcome with the screenshot here. Looking at this screenshot will possibly give you the feeling that you are working in the R console, and it will give you confidence that when you execute whatever is written yourself, you should also get the same outcome.
So, with these simple instructions and rules, let us hold each other's hands and move forward. These instructions are going to work as our communication language, and they will surely reduce the gap that arises because I am sitting here and you are somewhere else. I request you to revise these simple instructions. From the next lecture, I will formally start the course on the R software. So, please revise them, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 01
Why R and Installation Procedure
Hello friends, welcome to the course Foundations of R Software. From this lecture, we are going to formally start learning the R software. But before going forward, let me clarify one thing: in 2017, I had floated a similar course, Introduction to R Software, and over a period of time many things have changed. So, this course is practically an update and revision of the earlier course Introduction to R Software.
So, you may find many topics which are common. As I said in the earlier lecture, this is a fundamental, introductory course, and when you teach the introductory part, you teach the basic fundamentals. They remain more or less the same, but I have definitely tried my best to add the new things which have been developed and which are important and necessary for understanding the basic fundamentals of the R software.
With this introduction, I would now like to start this lecture. Whenever you are going to learn any topic, the first question which comes to mind is: why should I learn this topic? Then, out of curiosity, you would like to know the background, how it happened, why it happened, and many more questions crop up.
For example: this is free software, so who developed it, why was it developed, why is it free, what are its advantages and disadvantages compared to other software, is it going to be good software, or are we going to lack something, and so on.
In this lecture, I am going to address all these basic queries which come to the human mind whenever we try to learn something new. The next questions are: where am I going to get it, and how am I going to begin learning this software? This lecture is going to be very elementary, a storytelling type of lecture, and then I will show you from where you can download the software and how you can install it, so that when we come to the next lecture, we are ready to learn the fundamentals and foundations of R software. So, let us begin. In this lecture, we are going to talk about why we should learn R and how to install the software on our computer for learning the fundamentals and foundations.
I am sure that you must have learned about software. Nowadays, students learn about software, computer programming, etc. right from childhood; they are taught about computers, programming languages, etc. in their schools, starting from the elementary classes.
So, the question is not what R software is; you can understand very easily from the terminology that R is simply a software, just like the many other software you have used. The main question is what it does and for what it is going to be used.
I can explain it in very simple language: R is an environment for data manipulation, statistical computing, graphical display, data analysis and various types of computations. When we talk about software, some software are used for typing, some for mathematical typing, and some for chemical typing, since chemical symbols are typed in a different way. Some software do statistical analysis, some do mathematical analysis, etc.
Similarly, R also has its role. In R software, effective data handling and storage of output are possible, and when it comes to calculations, both simple and complicated calculations can be done without any problem.
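As a small illustration of simple and slightly more involved calculations, here is a minimal R sketch; the numbers and variable names are arbitrary and chosen only for the example.

```r
# A simple calculation: multiplication is done before addition
a <- 3 + 4 * 2                 # 11

# A calculation on a whole set of numbers at once
v <- c(2, 4, 6, 8)
sum_of_squares <- sum(v^2)     # 4 + 16 + 36 + 64 = 120

a
sum_of_squares
```

Note that v^2 squares every element of the vector in one step; no explicit loop is needed.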
Whenever we talk of any statistical or mathematical software, there are usually a couple of things in which we are interested. The first is computation: what types and what varieties of computation the software can perform.
What about graphics? That is, can the software create different types of graphical displays, and when it creates such graphics, can it also give a soft copy or a hard copy; is it possible to save them? After that, what about programming? There are two types of software here. One type is based on clicking buttons: you go to a button, click it, and the execution is done.
In the second type of software, you go to a prompt, type the command and execute it, and then things are done. So, what happens in R? I can assure you that in R all of these are possible: you can execute different types of computations, create different types of graphics and also do programming. When working in R, both the graphical display on your screen and a hard copy are possible.
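A minimal sketch of saving a graph as a file that can later be printed, using the pdf() device from base R. The file name example_plot.pdf is hypothetical, and the file is written to a temporary folder only for this example. Without the pdf() and dev.off() lines, plot() would simply draw the graph on the screen.

```r
# Hypothetical output file in a temporary folder
out_file <- file.path(tempdir(), "example_plot.pdf")

pdf(out_file)                          # send graphics to a PDF file instead of the screen
plot(1:10, (1:10)^2, type = "b",
     main = "Squares of 1 to 10", xlab = "x", ylab = "x squared")
invisible(dev.off())                   # close the device so the file is written

file.exists(out_file)                  # TRUE if the file was created
```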
As I said, programming is also possible in R software. The programming language is quite effective and includes all the sorts of possibilities available in any other good programming language. Now, here is a very interesting story: why did R come into existence, what really happened, and what motivated people to develop the R software? I am a witness of that era; I have seen the birth of the R software and the reasons why it was possibly developed.
When I was doing my undergraduate and postgraduate studies, computers were just starting to come to India. Yes, that may appear to be a very old story for all the young candidates, but that was the truth. When computers and software came to India, there was only the command prompt, what you call DOS; there was no Windows.
So, even if you wanted to copy a file, you could not do it the way you do today, taking your mouse, right-clicking and choosing copy or paste. We used commands like Control C to copy or Control V to paste something, and for copying a file from one directory to another there was a command; we were taught how to write the path, etc.
Then came the Windows platform. In Windows, instead of typing commands, we could use the mouse to go to a block, that is, a window, click there and execute the basic commands and functionalities. So, what really happened? Suppose somebody wanted to find the arithmetic mean of some numbers. In the DOS-based software, someone had to write a program; those who have done basic programming might remember that we used to write sum = 0, num = 0, sum = sum + num, etc. When Windows came, people built such programs into the software, so that I could enter the data, press a button such as mean or sum, and automatically get the output.
But the disadvantage was this: suppose I wanted to find the square of the sum or the square of the mean. In Windows-based software with no provision for programming, that was not possible, whereas in DOS-based software it was possible. Then some software started coming to the market; they were usually paid software, and they were quite expensive.
During that time, among various software, one became very popular: S Plus, written as capital S followed by P l u s. One of the reasons it became very popular was that it had a sort of hybrid mode. Somebody had already written a program for finding the mean, so if you had a set of numerical values and needed the arithmetic mean, you simply wrote mean, m e a n, wrote the values within the parentheses, and it gave you the value of the mean.
For that, you did not have to write the entire program like sum = 0, num = 0, etc. At the same time, one could also write one's own program: if somebody wanted to write a program for the mean using sum = 0, num = 0, sum = sum + num, num = num + 1, etc., that was also possible.
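R follows the same hybrid idea. The sketch below computes an arithmetic mean in both ways: with the built-in mean() function, and with a hand-written loop in the spirit of sum = 0, num = 0. The data values and variable names are made up for illustration.

```r
values <- c(4, 8, 15, 16, 23, 42)

# Built-in function: just call mean()
builtin_mean <- mean(values)

# Hand-written version with a running sum and a counter
total <- 0
count <- 0
for (v in values) {
  total <- total + v    # add the current value to the running sum
  count <- count + 1    # count how many values have been added
}
loop_mean <- total / count

builtin_mean            # [1] 18
loop_mean               # [1] 18
```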
Here was the advantage: if I wanted to compute the square of the mean, I could simply find the mean and square that value, without writing the entire program, and this gained popularity. Many people started working with S Plus, which had a different style; the assignment operator and some other things were different, and the way we wrote programs was different in S Plus.
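The same convenience carries over to R: the result of a built-in function can be used directly in a further calculation. A minimal sketch with illustrative data:

```r
values <- c(4, 8, 15, 16, 23, 42)

# Reuse built-in results instead of rewriting the whole program
square_of_mean <- mean(values)^2   # 18^2 = 324
square_of_sum  <- sum(values)^2    # 108^2 = 11664

square_of_mean
square_of_sum
```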
So, people started learning S Plus and it was very popular, but it was very expensive; at least for me, it was not possible to buy it out of my own salary.
There were people like me around the world, and when they realized that this was very important and very good software, a couple of people from different academic communities from all over the world gathered and thought: why not develop similar software like S Plus?
That was the idea. People started working together, developed the R software, and gradually it became available for common people to use. But there was one issue: this was free software, and they decided that they would not charge any money for it. We are all human beings, and whenever something is given to us for free, our minds tell us to doubt it.
It is just like a friend coming to you and saying, let me offer you a very good, expensive coffee today. It is very natural that instead of enjoying the wonderful cup of coffee your friend is offering, you first start thinking: what is happening, why is he or she offering me a cup of coffee, what is the reason, what is the intention behind it? The same thing happened with the R software.
I am a witness that when the software initially became available, people, including me, were hesitant to believe that it was giving us the right calculations and the right outcomes. But gradually, over a period of time, R established its authority, people started trusting it, and they realized the importance of this software and the freedom it gives them.
That is how this software became very popular. Later on, well-established publishers also started publishing books on R software, and even complete series like statistics with R, computation with R, etc.; this is how R became very popular. Now, the programming language used in the S Plus software was called the S language, with a capital S.
There is a belief that, because of this, people call the programming language of the R software the R language. There is another reason, which I will explain, but this is my experience of growing with R. So, let us come back to our slides and understand the further things. The R language is very similar in appearance to the S language, which is the language of the S Plus software, S P l u s.
(Refer Slide Time: 14:45)
Now, this gives you an idea of how things developed. The people who started working together on the development of the R software grouped together, and to support the use of this software and provide a platform for it, the R Foundation was created. The R Foundation is a non-profit organization working in the public interest, as it provides the software for free.
This foundation was founded by the members of the R Development Core Team. What is the R Development Core Team? It is the group of people involved in the development of the R software. You see, R is huge software with many possibilities: for example, someone can develop their own package and contribute it, and there are many different types of things which have to be done in the software.
So, this group of people from the academic community all over the world works together, and the group of these members is called the R Development Core Team. The R Development Core Team provides support for the R project. It is called a project because it involves developing big, huge software, and the team also contributes different types of innovations in statistical and mathematical computing.
In the last decade, R has also developed in many different directions. The R Foundation holds and administers the copyright of the R software and its documentation. After all, someone has to take responsibility for ensuring that whatever is there is correct and has been scrutinized.
Just for your information, one question that comes up is: who started the R software? There were two academicians at the University of Auckland in New Zealand, Professor Ross Ihaka and Professor Robert Gentleman. I have also given their photograph here. I got this photograph from the website snipcademy dot com, and I duly acknowledge their help in providing a wonderful joint photograph of these two academicians.
These two professors came together and began an alternative implementation of the basic S language, which was completely independent of the S Plus software, and all this began in 1991. As I said, in 1991 I was doing my MSc; I was in the first year of my MSc program. That is why I said I have witnessed the development of this software.
People say that the software was named R because R is the first letter of the first names of these two professors. As you can see here, the software was partly named after the first names of its first two authors, and partly because the language of the earlier S Plus software was called the S language. So, this one was called R.
These people started working on this project, and the first official release of the R software came in 1995. Then the Comprehensive R Archive Network, briefly called CRAN, C R A N, was officially announced on 23rd April 1997 with 3 mirrors and 12 contributed packages. What does this mean?
You see, when the R software was developed, it had to be distributed to people so that they could download and use it. Obviously, if the software is uploaded on one server and many people suddenly download it from the same server, the load on the server becomes quite high and there is a possibility of the server crashing.
So, this load was divided: different academic institutions all over the world agreed to host this software on their websites. These websites of different academic institutions, which provide a copy of the software for downloading, are called mirrors. So, in 1997, three sites agreed to mirror this software.
In the R software, there is also the possibility to contribute packages. What does this mean? We will discuss it in the forthcoming lectures, but briefly: if I develop a program for doing some specific job, this program can be uploaded to the R software's website, and people can download it and use it.
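Packages are covered properly in later lectures; this is only a preview sketch. Installing a contributed package from CRAN needs an internet connection, so those lines are shown as comments with a hypothetical package name, somePackage. The runnable part checks and attaches stats, a package that ships with every R installation.

```r
# Installing and loading a contributed package (needs internet access).
# "somePackage" is a hypothetical name used only for illustration:
# install.packages("somePackage")
# library(somePackage)

# Offline check with a package that comes with R itself
has_stats <- requireNamespace("stats", quietly = TRUE)   # TRUE if installed
library(stats)                                           # attach the package
stats_attached <- "package:stats" %in% search()          # TRUE once attached

has_stats
stats_attached
```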
In layman's language, these are called contributed packages. When R started in 1997, there were only 12 contributed packages. Can you guess how many packages there are now? I will tell you later on. People started using it. We know that whenever software is introduced, it is in a sort of experimental mode, and the terminology we use for this is the beta version.
People started using it, there were some issues, people tried to correct them, and after several iterations, the first official stable version, version 1.0, was released on 29th February 2000. So, you can see that it took almost a decade to get a stable version of this free software. Based on the data I have, R began with 12 contributed packages, and as of November 2020 more than 16,000 packages are available. So, you can see the growth of the R software in the last two decades; 16,000 packages means you can do 16,000 different things with the same software.
Many people, research offices, design offices and analytical firms became confident that R was providing good results, so they started using this software and switching to it. Now the question for you is: why should you switch to the R software? I am sure everybody is using some software, so why should you come to a new software, why should you learn it, and what are the advantages?
Let us try to understand this. The first and biggest advantage is that R is free software. We always hear from different people that whenever there is a new version of a software, they have to buy it and pay once again, etc. Many students cannot afford to buy such software, and many institutions also cannot afford to buy it or to pay the recurring cost. I am not talking about those people who are rich, but about people like me, who cannot afford to buy software from their own salary. So, this is a big advantage for people like us: we get for free a software which can do the same things other software do, and which gives us the liberty of doing many more things.
And whenever there is a new version of the software, I can get it for free. Besides that, many statistical packages are freely available through the Comprehensive R Archive Network, that is, CRAN. They are uploaded to its website and cover a wide range of the tools used in modern statistics. Whenever there is a new development and you want to use it, you can simply download that package and use it on your data set. So, you may also consider switching to R.
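As a minimal sketch of the download-and-use step just described (not taken from the slides), the lines below install a package from CRAN only if it is missing and then load it. The package MASS and the repository URL https://cloud.r-project.org are illustrative choices; MASS normally ships with R as a recommended package, so the install branch is usually skipped.

```r
# Illustrative package name; replace it with any CRAN package you need
pkg <- "MASS"

# Download and install from CRAN only if the package is not already present
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "https://cloud.r-project.org")
}

# Attach the package so that its functions can be used directly
library(pkg, character.only = TRUE)
```

Checking with requireNamespace() first avoids downloading the same package again every time the script is run.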
(Refer Slide Time: 22:50)
Now, when you are deciding to switch to R, you may not be confident; you may be hesitant because you do not know what is inside R. I can assure you that R is a statistical computing environment. It is free, open-source software and therefore it is not a black box: you can see what is inside, how the computations are done and what programming has been done; if you are using any algorithm, you can see what that algorithm is really doing.
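Because R is open source, most functions can be inspected from the console itself. As a small illustration (sd is just a convenient example), printing a function's name without parentheses shows its definition, and body() extracts only the code. Some basic functions, such as sum, are primitives implemented in C; printing them shows only a short .Primitive() stub, and their C code is part of the R sources.

```r
# Printing a function without calling it shows its R source code
print(sd)

# body() returns just the expressions that make up the function
sd_code <- body(sd)
print(sd_code)
```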
R also has a programming language that is convenient for statistical and graphical applications. Without any problem, you can write programs for different types of computations, simulations and calculations, and you can produce wonderful graphics as well. In any standard programming language, you can write a program, save and store it, and call it and use it again whenever you need it.
All of this is possible in R: commands can be saved and stored in script files and run from them. In simple language, a script file is a program file.
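A minimal sketch of this save-then-rerun workflow: the commands are written to a script file and then executed with source(). Here the script is created from R with writeLines() in a temporary file only so that the example is self-contained; normally you would write it in the R editor and save it with a .R extension. The function name sq is hypothetical.

```r
# Create a temporary script file (normally you would save it from the editor)
script <- tempfile(fileext = ".R")
writeLines(c(
  "# a small stored program",
  "sq <- function(v) v^2"
), script)

# Run every command in the saved script, as if typed at the console
source(script)

# The function defined in the script is now available
sq(1:3)
```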
(Refer Slide Time: 23:56)
Then another question comes up: people use different operating systems, such as Windows, Unix, Linux and Macintosh. R is available for all these platforms. There may be small differences in the instructions, for example in how a file path is written on Windows, Unix or Linux, but people who work in these environments are very familiar with such things.
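The path differences mentioned above can usually be avoided. As a hedged sketch, file.path() joins path components with the separator R uses on every platform (a forward slash, which R also accepts on Windows), while .Platform and Sys.info() report which operating system R is running on. The folder and file names are hypothetical.

```r
# Build a path without hard-coding "\\" or "/"
data_file <- file.path("data", "survey.csv")
print(data_file)                 # "data/survey.csv"

# Which kind of operating system is this R session running on?
print(.Platform$OS.type)         # "windows" or "unix" (Linux and macOS report "unix")
print(Sys.info()[["sysname"]])   # e.g. "Windows", "Linux" or "Darwin" (macOS)
```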
So, this is not an issue: whatever platform you want to use, R is available for it. And, as I said, R was developed to compete with the commercial S-PLUS software; I have already explained that to you. Now, whenever you do a statistical analysis, there are some operations that are common and that most people would like to use, and there are other operations that not everybody needs.
Then there are other operations that arise because people are doing research and developing new statistical tools. When someone develops a new statistical tool, people would like to use it. How can they? The researcher can write the program and contribute it to R. That is why we have two types of packages here. In simple words, a package is a specific program, or collection of programs, written to do something.
So, there are built-in packages that come with the R software, and other programs called contributed packages; both are available. And if you want to develop something, you can also write a program or a package yourself. The tools for developing such packages are also available, so that you can contribute your own package to R.
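To see this split on your own machine, installed.packages() records each installed package's priority: "base" for the packages that are part of R itself, "recommended" for those shipped with the usual binary installers, and NA for contributed packages you installed yourself. A minimal sketch:

```r
# Matrix of installed packages; the "Priority" column marks base/recommended ones
pkgs <- installed.packages()

base_pkgs   <- rownames(pkgs)[pkgs[, "Priority"] %in% "base"]
contributed <- rownames(pkgs)[is.na(pkgs[, "Priority"])]

print(base_pkgs)            # includes "stats", "utils", "graphics", ...
print(length(contributed))  # how many contributed packages are installed
```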
Now, any good programming language has certain types of instructions, like branching, looping, logical control and modular programming. Just like any good programming language, R also provides logical control through branching and looping, and modular programming through functions. We will try to understand what a function is later on; the concept of functions is one of the beauties of the R language, and it is going to be extremely useful for us.
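A small sketch of these control structures (the function name classify is hypothetical): a function wraps a for loop, and an if/else branch decides what to record for each element.

```r
# A function: a reusable, modular piece of code
classify <- function(v) {
  out <- character(length(v))     # pre-allocate the result
  for (i in seq_along(v)) {       # looping over positions
    if (v[i] %% 2 == 0) {         # branching on a logical condition
      out[i] <- "even"
    } else {
      out[i] <- "odd"
    }
  }
  out                             # the last value is returned
}

classify(1:4)  # returns c("odd", "even", "odd", "even")
```

In everyday R the same result is usually written without a loop, as ifelse(1:4 %% 2 == 0, "even", "odd"); the loop is kept here only to show the control structures.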
Whenever we write a program, we get some error messages, and R is no exception. The error messages in R are written in fairly plain language, which helps us find where we have made a mistake in the program.
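For instance, applying log() to text produces a readable error. The sketch below captures the message with tryCatch() instead of letting it stop the session; the wording in the comment is what an English-language R session reports, and it is translated in other locales.

```r
# Try an invalid operation and keep the error message as a string
msg <- tryCatch(
  log("a"),
  error = function(e) conditionMessage(e)
)

print(msg)  # "non-numeric argument to mathematical function"
```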
(Refer Slide Time: 26:48)
Those who work with programming languages know that after we type a program, it has to be executed on some platform, and there are two types of approaches for that.
One is the interpreter and the other is the compiler. Those with a good knowledge of programming will know the difference, but for the others, here is a brief idea. Suppose we write a program with line number 1, line number 2, line number 3 and so on. When we execute the program, control comes to line number 1, and if there is an error in line number 1, there are two options: either the program stops there, or it moves on to line number 2. If the program stops, the programmer first has to look into the program and remove the mistake in line number 1. Then the program is re-executed; once the mistake in line number 1 is removed, control comes to line number 2, and if there is no mistake there, it moves on to line number 3.
If there is a mistake at line number 3, the program stops again; the programmer has to look inside the program, remove the mistake in line number 3, and then the program moves further. This is what happens with an interpreter. In the other approach, control comes to line number 1, and if everything is fine, it goes to line number 2. Suppose it finds a mistake in line number 2; it records it somewhere. Then it comes to line number 3; if there is another mistake there, it records that too, and it continues to line number 4 and so on, up to the last line n. Then it shows you the list of all the mistakes in the program. Now one can go through the program and rectify all these mistakes, because all of them are known to us.
This is the way a compiler works. R is actually an interpreted language; it uses an interpreter. So, whenever you execute a program, if there is a mistake in line number 1, the program will not go to line number 2 until you rectify the mistake in line number 1.
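A minimal sketch of this behaviour, assuming a three-line script whose second line fails (the script contents are hypothetical): source() runs the lines in order, so line 1 has already taken effect when line 2 stops execution, and line 3 never runs. One nuance: source() parses the whole file before running it, so a syntax error anywhere in the file stops it before even line 1 is executed.

```r
# A three-line script; line 2 raises an error on purpose
script <- tempfile(fileext = ".R")
writeLines(c(
  "a <- 1",
  "stop('mistake on line 2')",
  "b <- 3"
), script)

# Run it in a separate environment and catch the error
env <- new.env()
result <- try(source(script, local = env), silent = TRUE)

exists("a", envir = env, inherits = FALSE)  # TRUE: line 1 was executed
exists("b", envir = env, inherits = FALSE)  # FALSE: line 3 was never reached
```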
What is the command line interface? I will show you: it is the place where we type commands. Each command or expression to be evaluated is typed at the command prompt, and it is evaluated immediately when you press the Enter key, which completes the statement.
What does this mean? It is as simple as this: suppose I type x equal to 2, then y equal to 3, and then x plus y in the R software. As soon as I press Enter, I get the value.
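Typed at the > prompt, the example looks like this (in R, <- is the usual assignment operator, although = also works at the top level):

```r
x <- 2
y <- 3
x + y   # prints [1] 5
```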
I will show you this in the R software. Just like in other software, you can use the up and down arrow keys to recall previous commands and edit them. You can use the Escape key to cancel a command, and if a program is running, you can stop it immediately by pressing the Escape key.
This is just what is possible in any other good language. Your graphics can be saved directly into PostScript, PDF, JPEG and other formats. Whatever facilities are available in any standard software are available in R as well.
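A minimal sketch of saving a plot to a file rather than to the screen: open a file graphics device, draw, then close the device with dev.off(), which writes the file. The file name is hypothetical; postscript(), png() and jpeg() are used the same way, although png() and jpeg() depend on the graphics capabilities of your R build.

```r
set.seed(1)                                    # reproducible random data
out_file <- file.path(tempdir(), "histogram.pdf")

pdf(out_file)                                  # open a PDF graphics device
hist(rnorm(100), main = "Histogram of 100 normal values")
invisible(dev.off())                           # close the device and write the file

file.exists(out_file)                          # TRUE
```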
So, this is the brief background to convince you that by using R, which is available for free, you are not compromising on anything; you get the same quality of results as from any other software. Once you are convinced, the next question is: how do you install this software, and where do you obtain it?
There is a website, www.r-project.org, from which you can download the software for any platform: Windows, Macintosh, Unix, Linux etc. I have given you here a screenshot of this website. You can see the address of the website, and if you go there, there is a link, "download R". You can also see that at the time I am recording this lecture, the latest available version is R version 4.1.2.
We are going to work in this version. Recall that earlier I told you that version 1.0 was released in 2000; now it has come to 4.1.2.
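To check which version you are running, here is a small sketch (4.1.2 is simply the version used in these lectures):

```r
print(R.version.string)      # e.g. "R version 4.1.2 (2021-11-01)"
print(getRversion())

# numeric_version objects compare component by component
getRversion() >= "4.1.2"
```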
You simply click on the "download R" link, and it shows you this type of screen. These are the different mirrors: many institutions in different countries across the world host copies of this website, and R and its packages can be obtained from any of them.
You can see sites in Algeria, Argentina, Australia, Austria and so on, arranged alphabetically by country name. You can click on any of these sites and you will come to that mirror. For example, if I click on the Australian mirror, cran.csiro.au, I arrive at this address.
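Inside R, the mirror used by install.packages() is controlled by the repos option. As a sketch, assuming you want the Australian mirror shown on the slide (its https address is an assumption here), you can set it for the current session. The address https://cloud.r-project.org is another choice, served automatically from servers around the world; in an interactive session, chooseCRANmirror() offers the same list of countries as a menu.

```r
# Use a specific CRAN mirror for this session (address assumed, see above)
options(repos = c(CRAN = "https://cran.csiro.au"))
getOption("repos")

# Any later install.packages() call will download from this mirror
```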
(Refer Slide Time: 32:40)
This screen is the same whichever mirror you click on: it offers Download R for Linux, for Mac, for Windows and so on. Click on whichever you need, and you will get the software; after that you simply click through the installer.
Then you can install it in the usual way, just as you install any other software. Before I show you how these things are done, there is one question I would like to address. R is available on different websites hosted by different academic institutions across the world, so sometimes people wonder which website will give them good software.
So, I want to clear up this myth. Number 1: all the sites in that long list have the same software. Number 2: whether you download it from the site in Australia, Austria or Brazil, it does not make any difference. Sometimes people say that downloading the software from a neighbouring country helps. Well, I see no reason to make this claim or to be convinced by it. So, you can download it from any site and work with it.
(Refer Slide Time: 34:10)
Let me show you how this looks on the internet. I have opened the site www.r-project.org; let me increase the font size. Where I am moving my cursor, you can see the link "download R".
(Refer Slide Time: 34:44)
If you click on "download R", it takes you to the list of CRAN mirrors, and you can see that it is a long list of countries.
You can just click on any country, say Norway, or anywhere else. Suppose I click on Norway; it brings me to the same page that I showed you, and then you can install the software. With this detailed discussion, having given you reasons why you should switch to R and shown how to install it, I come to the end of this lecture.
I hope I was successful in making you understand the story behind the R software. Sometimes people ask me why people are doing this for free. You see, we work in the academic community, and in the academic community money is not everything. We always try to do research and publish our research papers. Many people think that once our research papers are published we get some money; this is wrong. That is our job.
And it is our duty towards society to try our best to develop things that help humankind. That is the very simple, modest objective of the academicians who came together and are still working; there are many people across the world developing different types of packages for R. So, one should not doubt their integrity or their intentions. They are doing it for the welfare of people, that is all.
With this comment, I conclude this lecture. From the next lecture, I will show you more of the R software. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 02
Introduction Help Demonstration, and Examples in R
Hello friends, welcome to the course Foundations of R Software. In this lecture, we are going to explore how we can get help from the R software, how we can get demonstrations and how we can get examples, so that we can understand how to execute a command and how to interpret its output. The reason I chose this topic is as follows. Recall what you expect whenever you study a new topic from a book.
You expect that first some basic fundamentals are given, and then some examples, solved problems etc., through which you can learn how to attempt and solve problems; based on that, you try to solve other problems. I am following the same approach in the R software. At this moment, we do not know how to move forward in learning the R software.
So, first of all we will see how we can get help on any particular topic in R. There are various ways: some approaches are built into the software, and some are based on external help. For example, if you go to any search engine and type something like "R tutorial" or "R help" along with the name of a package or command, you will find many forums and websites offering such help.
So, how do we explore these things? That is what we begin with in this lecture. We are going to talk about three topics: how to get help, how to get demonstrations of various commands and packages, and how to look at examples of various commands and of how to solve problems.
(Refer Slide Time: 02:47)
The first, very basic thing: after you have installed the R software on your computer, you will see an icon like this one. Simply click on it, and it will open a window like this one. This window is called the GUI window, that is, the Graphical User Interface window. This screen shows information about R, for example the version of R, the copyright, the platform etc., and this is the starting point from where we start working in R.
I will show you the R console also; this whole window is called the R console. These are some very basic terms, and people who use R use them all the time, so it is important that you understand what these small words mean.
(Refer Slide Time: 04:10)
Next, let us see how you can get help. I will demonstrate a couple of ways to seek help in the R software. The first option: go to the R GUI window and click on the Help menu. There you can see FAQ on R, FAQ on R for Windows, manuals, R functions, HTML help, search etc., and you can click on them.
Then you can explore how they give information on any particular aspect. Before I move forward, let me give you this demonstration on the R console also.
You can see that after you click on the R icon, a window like this opens. In this case, the fonts are quite light.
So, first of all I will increase the font size so that you can see. If you go to Edit, there is an option, GUI preferences, where I can set the font size to, say, 18 and the style to bold. You can also choose different background colours, control the size of the console and set many other options, but I am not going into those details; I will simply make it like this.
Because the font size is now larger, you cannot see everything, but you can see that this is R version 4.1.2, whose name is "Bird Hippie", followed by the copyright notice etc. So, whatever I showed you in the screenshot, you can see here also.
Now, you can see here the greater-than sign and the vertical line. The greater-than sign is the prompt, and the vertical line is the cursor, where whatever you type will appear. I will show you how we are going to work.
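The prompt string itself is stored as an option, which you can inspect. A small sketch; the comments show R's defaults, and a continuation prompt appears when a command is not yet complete:

```r
getOption("prompt")     # "> " by default
getOption("continue")   # "+ " by default, shown when a command is incomplete
```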
But what I want to show you here is the Help menu, where you can see all these items. If you spend some time exploring it, you will see how you can seek help in the R software.
Let me move forward so that I can show you something more. The second option, which is very popular nowadays, is to use a search engine such as Google: just go to google.com and search for the command, the syntax or anything related to the R software, and you will find it. But I am not going into that mode; my objective is to show how to seek help from within the R software.
When you want help from R, there are two situations: either you have some idea of the command for which you need help, or you do not know anything. In the second case, you simply know that you want to do something, but you do not know the command; you want to find it and then possibly get more information on it. I will take up these issues one by one.
The first option I am going to show you: suppose you know that there is a command, say read.table. We are going to learn about it and use it later; this is the command to read data given in tabular format. Now I want to know about this command. I can write a question mark followed by the command, that is, ?read.table, and I have to type this at the prompt.
You can see this on the same screen, where I am typing ?read.table. After that, you simply press Enter, and, as I will show you, this goes to the help server.
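For reference, ?read.table is shorthand for help("read.table"); the quoted form is convenient when the topic name contains special characters. If you only want the list of arguments, args() prints the function's usage line without opening the help page. A short sketch:

```r
# The argument list (usage) of read.table, without opening the help page
args(read.table)

# The argument names as a character vector
arg_names <- names(formals(read.table))
print(arg_names)
```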
The page is served by R's help server, and you will see that it gives you complete details about the read.table command. That is what I told you in the beginning: R is not a black box. Whatever information you want is available. Now I will first demonstrate how you are going to seek this help.
This is very obvious: whenever you want to learn a topic from a book, the first condition is that you find the book, but then you have to read it; without reading, how can you understand the topic and how to solve the problems? The same rule applies here.
For example, here is the command, and then the description of the read.table command: it reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file, and so on. After that, it tells you how it is to be used: you write read.table, and inside the parentheses you write file, that is, your file name, then header = FALSE or TRUE, then sep, which is the separator, then quote etc.; there is a long list. After that, it also gives information on read.csv, read.csv2 etc.; these are for different types of files that can be read in R. We are going to learn all these things gradually in the forthcoming lectures, so you need not be scared of them.
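A minimal, self-contained sketch of that usage line: the example writes a tiny comma-separated file to a temporary location (the data are made up for illustration) and reads it back with read.table(), giving header and sep explicitly, and then with read.csv(), whose defaults already assume a header row and comma separators.

```r
# Create a small comma-separated file (hypothetical data)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("name,score", "A,10", "B,12"), csv_file)

# header = TRUE: the first line holds the variable names
df1 <- read.table(csv_file, header = TRUE, sep = ",")

# read.csv() is read.table() with header = TRUE and sep = "," as defaults
df2 <- read.csv(csv_file)

# header = FALSE (the read.table default): the first line is treated as data
df3 <- read.table(csv_file, header = FALSE, sep = ",")

str(df1)
names(df3)   # "V1" "V2"
```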
I am simply trying to show you that these things are available, and whenever you need them, you do not need to go anywhere: R itself will give you all sorts of information. After that come the arguments; every argument written here, like file, header, sep etc., is explained in detail, including how you have to use it.
(Refer Slide Time: 11:23)
For example, for file it says that this is the name of the file, and so on; the file can also be a complete URL. For header, it says that this is a logical value indicating whether the file contains the names of the variables as its first line. For sep, it says that this is the field separator character, and it explains how it has to be given and used. Similarly, all the other details are there. I have taken screenshots because the font size is quite small here.
So, you may not be able to read it here, but I request you to repeat the same command on the R console and see what it shows. The bottom line is that if you are willing to read these details, you have the complete information; that is all.
Now I will show you on the R console itself how these things look.
So, I come to the R console and type ?read.table. You can see the place where I have typed the command; let us see what happens. I press Enter, and you can see a message that it is starting the httpd help server; this opens an internet browser, and you can see all these things coming up.
If I increase the font size, you can see very clearly the description, "Reads a file ...", and then the usage: read.table, then file, then header = FALSE etc.
(Refer Slide Time: 13:36)
(Refer Slide Time: 13:40)
(Refer Slide Time: 13:50)
Then it gives you all the details about file, header, sep etc.; you can see it has all the details. Beside this, it explains how these things are related to other commands and what the differences between them are.
(Refer Slide Time: 14:05)
It tells you where these things have been taken from, and, what else do you want, it also gives you different examples of how you can use it. So, R itself gives you the complete information, and once you read it, you will know everything about the read.table command. Surely, when you are working, not all the options given here may be useful to you.
Usually, only the first couple of options are the most useful. But as a programmer, when you want to get something done, you would like to have different options that you can use at your convenience rather than the convenience of the software. That is what I said in the beginning: you have complete flexibility; whatever type of data you have, you can use it here, and read.table gives you all the possible options.
So, it depends on you how much you want to study to understand this command. I have taken only one example here, but you can do the same for all the commands available in R. Now I come to another approach. Suppose my objective is to input some data, but I have no idea what the command is. What I can do is use the command help.search.
Type help.search, and within the parentheses, that is, these brackets, within double quotes (the double-quote key on your keyboard), write data input. It is just like using a keyword: you expect that if you search for "data input in R" on some web search engine, you may get some help.
Similarly, you have to think of an appropriate keyword and type it on the R console at the command line. You can see that I have typed it here; after that it starts the httpd help server and shows you some results. We will see what we get, but before that I would like to give you a warning.
The warning is this: when you use double quotes, type them inside the R software. This is very important, because if you type them in some editor, for example in MS Word, then as soon as you type the symbol and move on, it becomes a curly quote like this one, and R does not understand it. Many times we copy and paste commands and things do not work; this creates lots of hindrances, and time gets wasted.
So, my suggestion to you all is that whenever you use single quotes, double quotes etc., you type the command inside the R software. And if you are copying and pasting correctly but it still creates a problem, type symbols like single quotes, double quotes and the minus sign by hand inside the R console.
Many times, when you type a minus sign in MS Word, the character changes as soon as you move forward and becomes a little longer. It looks beautiful and just like a minus sign, but it is not really the minus sign that is used as a mathematical operator.
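The problem can be reproduced inside R. In this sketch (the helper name parses_ok is hypothetical), the same command is written once with straight quotes and once with the curly quotes a word processor inserts, given as the Unicode escapes \u201c and \u201d, and an expression is also written with an en dash (\u2013) in place of the minus sign; only the plain ASCII version parses.

```r
# TRUE if the text is valid R syntax, FALSE otherwise
parses_ok <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

good      <- 'help.search("data input")'
bad_quote <- "help.search(\u201cdata input\u201d)"  # curly quotes
bad_minus <- "5 \u2013 3"                          # en dash instead of minus

parses_ok(good)       # TRUE
parses_ok(bad_quote)  # FALSE
parses_ok(bad_minus)  # FALSE

# Spot the culprits: character codes above 127 are not plain ASCII
codes <- utf8ToInt(bad_quote)
codes[codes > 127]    # 8220 8221 (U+201C and U+201D)
```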
That is my very simple advice, and this is the way we are going to learn the R software: I will share all my experiences. These are small things, but after some time you will become an expert in R.
Once you use the command help.search with data input inside the parentheses, within double quotes, you get this type of web page. You can see that it shows utils::read.table, which is for data input, and it gives another option, data input from a spreadsheet.
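The same search can also be run without the browser and its matches inspected as a table. A short sketch: ??"data input" is shorthand for help.search("data input"), and the matches element of the returned object lists the topics found. The exact results depend on which packages are installed.

```r
# Search the installed help pages by keyword
res <- help.search("data input")

# Topic and package of each match, without opening the browser
hits <- res$matches[, c("Topic", "Package")]
print(head(hits))
```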
So, you can see that when you search for data input, it gives you two options: you can look into read.table, or you can look into read.DIF. Once you start looking into these and go deeper, you will learn more things, and finally you will converge to the point where exactly the help you need is available.
Now I can show you the same thing on the R console. I will take this command; I will show you more later on, but if you press Control+L, that is, Control plus L, it clears the console screen.
And if you type the same command, help.search with data input inside the parentheses within double quotes, you get the same result.
(Refer Slide Time: 20:33)
You can see that as soon as you press Enter, a new web page like this one opens, and if you click on read.table, you get the same page that we got earlier.
(Refer Slide Time: 20:47)
That gives you the information about read.table, and if you click on read.DIF, it gives you an idea of how you can input data from a spreadsheet. So, you can see that it is not so difficult to seek help in the R software; the only thing is that you should have the wish and the will to learn.
(Refer Slide Time: 21:14)
Another option, if you do not know anything and want to start from scratch: you can simply type help followed by opening and closing parentheses, that is, help(), or you can type help.start(); that is all.
If you use either of these commands, a web page opens; I have given a screenshot here where you can see that many things are available. Now it is up to you, and your capability, how efficiently you find the right help on the right command. Anyway, let me show you this on the R console also.
(Refer Slide Time: 22:26)
Here I type help.start(), that is, help.start with just an opening and a closing parenthesis, and let me reduce the font size so that you can see it. Right now everything you may want to know is here; for example, this part is about data import and export in R, and it explains all sorts of formats, such as the fixed-width format, and how to handle them. It is just like opening a book: once you have a good book, you can browse through the topics, but the only condition is that you have to read it. If you do not read it, nobody will come to help you.
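Both routes to this material can be sketched from the console, assuming an interactive session with a web browser available. RShowDoc("R-data") is an extra suggestion not used in the lecture: it opens the R Data Import/Export manual, which is the same document linked from the help.start() page.

```r
# Both commands open documents in a browser or PDF viewer, so they are
# only run when R is being used interactively.
if (interactive()) {
  help.start()          # HTML start page: manuals, package index, search
  RShowDoc("R-data")    # the "R Data Import/Export" manual
}
```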
After this, I come to another approach. Whenever we want to find or search for something in a text file or a data file, we use a function like Control+F or Find. Similarly, in the R software we use various types of functions. These functions are special types of programs that execute a specific job, and they are available inside packages. Now suppose we have a function and we want to know which package contains it. The reason is that, as I will explain in a forthcoming lecture, unless you install and load the package first, you cannot use or execute its functions.
So, the first step in any analysis, before using a function, is to know which package has it. Suppose I know that there is a function and I want to find out which package it is in. For example, suppose there is a function lowess and we want to know which R package contains it. For that we have the command find.
The syntax is: write find, and within the parentheses and double quotes write the function name lowess, that is, find("lowess"). It gives you this type of outcome, and if you run it on the R console you will get the same outcome.
If you read it, it says package:stats; that means this function is available inside an R package whose name is stats. So, you first need to have the package stats loaded on your computer, and then you can do anything further. You can also see a 1 written inside square brackets at the start of the output. This only indicates a position: the value printed next to it is the first value of the output.
I will explain this concept in more detail when we have more than one value, and then I will show you what this 1 really indicates; but since it appears here, I thought I should inform you so that it does not create any confusion.
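The same lookup in code, with a second call for comparison. This is a minimal sketch; the outputs noted in the comments assume a default R session, in which the stats and base packages are attached.

```r
# Which attached package provides lowess? -> "package:stats"
find("lowess")

# A function from the base package, for comparison -> "package:base"
find("mean")

# find() returns a character vector, so it can also be used in code
"package:stats" %in% find("lowess")
```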
(Refer Slide Time: 25:51)
Now, the second option. Suppose I want to use some function or command, but I cannot recall its exact name; I remember only some part of it. How do I find it? One option is to list all the functions whose names contain that part; then, using my memory, I can possibly recognize the function I am looking for.
For that we have the command apropos, spelled a p r o p o s. Suppose I can recall that some time back I used a command containing the two letters l and m. So, I write apropos, and within the parentheses and double quotes I write lm, that is, apropos("lm"). When I press Enter, it returns this type of output.
You can see that it now gives me all the functions available in my R software whose names contain lm. For example, in this entry lm is only part of a longer command name, and here lm appears as part of .lm.fit. Similarly, here lm is part of confint.lm. If you take any other entry, for example this one, lm is part of KalmanRun, which is another function. Likewise, there is the function predict.glm, in which lm also occurs.
So, it gives you all the matching functions available on your computer, and now it is your turn to look through them and try to recall the command you wanted to use. Let me show you these two things on the R console, and then I will move forward.
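A sketch of apropos() in use. Its argument is treated as a regular expression, so anchoring the pattern with ^ (an addition to what the lecture shows) narrows the list to names that start with lm. The exact names returned depend on which packages are attached.

```r
# All objects on the search path whose names contain "lm"
hits <- apropos("lm")
length(hits)
head(hits)

# Because the pattern is a regular expression, "^lm" keeps only
# the names that begin with "lm" (for example lm and lm.fit)
apropos("^lm")
```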
First I clear the screen by pressing Control+L, and then I type find("lowess"). When I press Enter, it gives me package:stats, which is exactly what you saw on the slide. So, you can gradually trust that whenever I show you a screenshot, this is exactly what will happen when you execute the command on the R console.
Now you also understand what the R console means. Similarly, if I type apropos("lm"), it gives all these names. Here I want to show you one more thing; this is the reason I took this example. If you look here, we have the numbers 1, then 5 and then 9 inside square brackets.
What does this mean? As I said earlier, the number in brackets gives the position of the first value on that line. On the first line we have the 1st, 2nd, 3rd and 4th values; the first value on the second line is the 5th value, which is why 5 is written there, followed by the 6th, 7th and 8th values. The first value on the third line is the 9th value in the list, followed by the 10th, 11th and 12th, and the next one is the 13th value, which is KalmanRun.
It is then obvious that if you shrink this window, the number of lines increases, and that is exactly what is happening here on the R console: because I have increased the font size, only two elements can be accommodated on each line. That is why there are 27 lines here, whereas in my slides there are only 25 lines.
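To see where the bracketed numbers come from, you can print a long vector at two different console widths. This is a small sketch using options(width = ...), which changes how many characters R prints per line, much as resizing the console window does; the exact line breaks depend on the width chosen.

```r
x <- 1:30

old <- options(width = 80)   # a wide console
print(x)                     # fewer lines, each starting with [index]

options(width = 30)          # a narrow console
print(x)                     # more lines; each [k] is the position of
                             # the first value printed on that line

options(old)                 # restore the previous setting
```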
The output also depends on which packages you have on your particular computer. So, when you repeat the same commands on your computer, the output may not match exactly, but this should not confuse you; I have explained why this happens.
Now I have given you a sufficient idea of how to seek help. Next I come to another aspect. You have got the book, you have searched its contents, and you have started reading it. Once you read a concept, the first thing you need is some solved examples; solved examples show us how to proceed in the correct way.
Similarly, the R software has a provision of examples, and the examples for a particular function or command can be viewed using the command example, spelled e x a m p l e. For instance, suppose we want to understand an example of the command lm.
lm is the command for fitting linear models, which are quite popular in statistics. We are not going to cover them here, but just for your information, suppose somebody wants to know how to execute the lm command and how to interpret its outcome. Then simply type example(lm), that is, example with lm inside the parentheses. You will get all the details about the outcome, in the form of text as well as graphics.
The only thing is that you have to read it. When I do it on the R console, I get this type of screenshot with the call to lm, the formula, the residuals and many other things, and you will also get graphics. There are more details, and rather than printing all the screenshots here, I will show you on the R console itself so that you can see what is really happening.
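A sketch of the example() calls described above. give.lines = TRUE, which returns the example code as text instead of running it, and ask = FALSE, which stops R from pausing between plots, are optional arguments of example() that the lecture does not use.

```r
# Show the first lines of the example code for lm without running it
code <- example(lm, give.lines = TRUE)
head(code, 10)

# Run the examples: the fitted models are printed and the diagnostic
# plots are drawn (ask = FALSE: no pause between plots)
example(lm, ask = FALSE)
```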
You can also see that I can control the width of this window just by dragging its edges. I am doing this because, as you will see later, some of the output is graphical, and the graphics appear in a different window, so I want to create some space. First, let me clear the screen by pressing Control+L.
Now I type example(lm) and press Enter. It starts running and first gives some details that you can read carefully; we are not going to do that here. If I press Enter, you get different types of output from the application, along with graphics, and if you move forward, you can see different types of graphics. If you want to close this window, simply close it, and I can clear the screen. I am not asking you to read all this at the moment; my objective is simply to show you how these things can be obtained.
After this, you would also like to have a demonstration. For example, you often ask your teacher in class, "Sir, can you please solve some questions on the blackboard?", because you want to follow how these things are actually solved. A similar facility is available in the R software for demonstrating R functions. The command is demo, and within the parentheses you write the topic for which you need a demonstration. Not every command has a demonstration, but several important topics do.
For example, suppose somebody wants to create a three-dimensional surface plot; in the R software the command for this is persp, spelled p e r s p. If you type demo(persp), you will get this type of screen, which I am showing you here. It has all the details; this is only the first screen, and after that it shows you more.
Similarly, if you want to see the demonstration of some other topic, for example graphics, you can simply type demo(graphics), that is, demo with graphics inside the parentheses. You will get this type of outcome with graphics and all the text; you simply have to read it, and then you can understand how things are happening.
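A sketch of how to list and run demonstrations. demo() with a package argument lists the demos that a package provides; the two demos used in the lecture, persp and graphics, belong to the graphics package. The interactive() guard is an addition so that the plotting demos only run where someone can page through them.

```r
# List the demonstrations shipped with the graphics package
d <- demo(package = "graphics")
d$results[, c("Item", "Title")]

# Run the two demos from the lecture; each one pauses and asks you
# to press Return before moving to the next page of output
if (interactive()) {
  demo(persp)
  demo(graphics)
}
```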
(Refer Slide Time: 36:05)
Let me show you this on the R console. I type demo(persp) and press Enter. Now you have to follow the instructions given with the demonstration; here it says "Type <Return> to start". I press Enter, and it gives me some details that you have to read. After that it tells me to click or hit Enter for the next page, so I keep pressing Enter, and the different types of 3D surface plots appear here.
(Refer Slide Time: 36:35)
You can see how beautiful curves and graphs can be prepared in the R software, and that too for free.
(Refer Slide Time: 36:50)
Similarly, let us take the demo of graphics. Let me deliberately make a mistake here: suppose I type only graphic, without the s, that is, demo(graphic). See what happens: it shows an error saying that no demo was found for the topic graphic. As I told you, R always gives you an error message whenever you make a mistake, and the language of the error message is quite understandable.
It is telling us exactly that: no demo found for the topic graphic, because graphic is not the correct topic name; the correct one is graphics. Now, to type demo(graphics), I can use the up arrow key to recall the earlier command and edit it. As soon as I type demo(graphics) and press Enter, it asks me to press Return.
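The mistake from the lecture can be reproduced safely in a script by catching the error. A minimal sketch; the exact wording of the message may vary between R versions and languages.

```r
# demo(graphic) fails because the demo topic is "graphics", not "graphic"
msg <- tryCatch(
  {
    demo(graphic)
    NULL                                  # not reached
  },
  error = function(e) conditionMessage(e)
)
msg   # an error message that mentions the topic 'graphic'
```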
(Refer Slide Time: 37:51)
(Refer Slide Time: 37:52)
(Refer Slide Time: 37:53)
(Refer Slide Time: 37:56)
Then you can see the text on the left-hand side, and it says click or hit Enter for the next page. I do that and keep pressing Enter, and I see how all these graphics are produced. You can see how beautiful graphics can be created in the R software without actually doing much. Now let us stop here; we have come to the end of this lecture.
In this lecture, I have tried my best to give you an idea of how you can seek help in different formats whenever you begin a new topic. I am not saying these are the only ways; many more are available, but if I start covering each and everything, it will become a very long course. As I said in the beginning, you have to hold my hand, and after that I will set you free.
My request is that you look at this lecture again, revise it, and then try the same commands, for example demo and example, with some other functions and see what you get. You may make some mistakes in the beginning; don't worry, try again. But at least use these commands a couple of times with different functions and examples so that they settle down in your mind.
Then, at a later stage, whenever you want to use them, they will come back to your mind and you can use them very easily.
So, practice them, and I will see you in the next lecture with more topics. Till then, good bye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 03
Packages and Libraries in R
Hello friends, welcome to the course Foundations of R Software. In this lecture, we are going to talk about the concept of packages and libraries in the R software. Before we try to understand what a library or a package is, let us discuss a very normal phenomenon that you follow in your life. Suppose you want to learn something about some topic; what is your first step? The first step is to get a book.
When you want to get a book, you again have two options. The first option is to go to the market or an online store and buy it. The second option is to go to a library, that is, a physical library; every college, university and school has a library where all types of books are kept. Buying all the books is practically very difficult, nearly impossible, so we go to our library.
Now try to recall what happens after that. You go to the library and look for the book; then you bring it to a counter where somebody issues it to you for some time; then you take it to your home, hostel or college. You study from the book, and once you are done, you go back to the library and return it. That is the usual process we follow.
Now suppose any of these conditions is violated; what will happen? If you have to buy all the books, that is expensive and practically impossible, so you cannot buy them all. That is number one. Is it possible to get a book without going to the library? No, because unless you go to the library yourself and sign against your roll number or name, you will not get the book.
Once you bring the book from the library to your home, would you like to keep it forever? No, because you want to learn many topics, and if you keep every book, your home will be full of books and it will be very difficult to manage them. So, you bring the book, but after some time you give it back. All these steps are necessary, and the same thing happens in the R software.
If you recall, when we downloaded the R software, it came with some built-in functions and facilities. With the growing popularity of R, many more things have been developed by different academicians and programmers, but not all of them are useful for everyone. People want to use only a few topics, just as you do not want all the books from the library but only a couple of them.
Similarly, in R you would not like to use all the functionalities. If all the functionalities were put into a single package, the package would become too big; it would not be convenient to download or upload, and it would also take a large amount of space on your computer. So, for different types of jobs, people have created different packages.
These packages have been uploaded to the website of the R software. Whenever you want, you can go to the website, download them, install them and work with them as much as you want; after that you can remove them, not from the computer, but from your library. This is the same process we follow in the R software, and how to get it done is the objective of today's lecture.
So, let us begin the lecture and understand these basic, fundamental concepts. There is going to be one difference between my slides and what you see on your computer, or on the computer on which I am recording this video. I prepared these slides on a different laptop, so the outputs shown in them relate to that laptop. Now I am recording on a different computer, so the contents may be different.
Similarly, when you do it on your computer, your contents may be different. So, do not get confused if you try to match the contents and they do not match. With this objective, and keeping this point in mind, let us begin today's lecture.
(Refer Slide Time: 06:03)
Now we are going to talk about packages and libraries in the R software. Recall what a library looks like; do you see this type of picture? There are so many books and people sitting there. So, what is a library? A library is essentially a collection of books or media that are easily accessible for use, and not just for display.
If somebody wants a book, the person goes to the library, searches for a suitable book, gets it issued and starts using it, and when done, the book is returned to the library.
(Refer Slide Time: 06:53)
A similar concept exists in the R software: there are libraries in R also. If you keep the usual process of a physical library in mind, it will be very easy for you to understand the concepts of libraries and packages in R. As we have already discussed a couple of times, when the R software was built, many people around the world got associated with it and contributed their work.
Some of the basic functions that are commonly used were put together inside the R software, and that part is called the base package. What you download from the website www.r-project.org is essentially the base package. If you want to do something more, you have to use the concept of a library: you get that package and then work with it.
These R packages are essentially collections of programs and data sets developed by different users, scientists and academicians from different parts of the world. This was the biggest advantage of the R software: so many people got involved and started thinking in different directions about what people want, and this increased the functionality and applications of R tremendously.
If all the functionalities were added to one package, the package would become huge and would occupy a large amount of space on individual computers. So, the commonly used, important functions were added to the base package of the R software, and, to keep that package from becoming too big, additional features were provided separately.
This is natural: suppose, as a statistician, I want to use only linear models and I am not interested in cluster analysis. Similarly, somebody else may be interested in cluster analysis but not in linear regression analysis. So, why put both inside the base package?
That was the concept: some functionalities are in the base package of the R software and others are additional. Let me take the example of cluster analysis. Cluster analysis is a methodology in which we try to group similar objects together. It has a statistical background and people use it, but I am not going to explain what cluster analysis is; I am only taking it as an example to explain how packages work.
This aspect of cluster analysis is not included in the base package of the R software. To conduct cluster analysis, a package has been developed by experts who know cluster analysis, and the name of this package is cluster.
So, this is not a part of the base package; whenever we want to use it, we can download it and use it completely for free. Similarly, think not only of cluster analysis but of so many applications of statistics and mathematics that are not needed by everyone; whoever needs them can download and use them for free.
Now the question is how to get it done. Besides this, as we have already discussed, the R software lets people write their own functions. This is a very big advantage: different people can write different functions and also provide data sets so that anybody can use them. To make them usable, these functions and data sets have been organized into libraries.
It is just like a library where there are books, CDs, DVDs, e-books, data sets available online and so on, and you can access them through the library. Similarly, in the R software, if you want to use a library, you first have to inform the software, and the command for that is library, spelled l i b r a r y, all in lower-case letters.
After this command, inside the parentheses, you write the name of the package that you want to use. It is just like going to a library for a particular book: if you ask an employee of the library for a book, the person will ask you which book, you tell the name, and then the employee can help you find it. Similarly, suppose I tell that employee that I need the spatial book, or in our language, that I want to use the spatial library. The spatial library, for example, is used for spatial analysis and statistics. So, I simply type library, and inside the parentheses I write s p a t i a l, that is, library(spatial), where spatial is the name of the library. Now, what will happen?
This library becomes ready to use. If the package is on your computer but you do not load it, you cannot use it, just as you cannot read a book in the library that has not been issued to you. So, first you have to get the book issued, and the command library(spatial) works just as if the book is being issued in your name.
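In code, issuing and then returning the book looks roughly like this. It is a sketch that assumes the spatial package is installed (the block is skipped otherwise). detach(), which removes a package from the search path without deleting it from the computer, plays the role of returning the book and is not shown in this part of the lecture.

```r
if (requireNamespace("spatial", quietly = TRUE)) {
  library(spatial)                       # "issue the book": attach spatial
  print("package:spatial" %in% search()) # TRUE: it is now on the search path

  detach("package:spatial")              # "return the book": detach it again;
                                         # the package stays installed
}
"package:spatial" %in% search()          # FALSE in a fresh session
```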
As I said, there are two types of libraries available in the R software. Some libraries are very common and popular, and they have been made part of the base package of the R software; other libraries are user dependent and need based, and you can get them whenever you need them. For example, spatial analysis is a very specialized topic that not everybody would like to use, whereas finding the arithmetic mean or variance is very common and everybody may like to do it.
So, functions like mean and variance have been made part of the base package of the R software, whereas spatial analysis, cluster analysis and so on are not part of the base package and have to be used separately.
Some of these libraries are also part of the base package of R, and one library that is quite popular is MASS, written with capital M, capital A and double capital S. The name comes from Modern Applied Statistics with S-PLUS: M from Modern, A from Applied, the first S from Statistics and the second S from S-PLUS.
If you recall, we discussed earlier that R was developed along the lines of another software, S-PLUS. When S-PLUS was in use, Professors Venables and Ripley wrote a book titled Modern Applied Statistics with S-PLUS, and in that book they used some data sets. R was developed along the same lines as S-PLUS.
So, people thought that if they used the same data sets, and because R has similar commands, users would be more comfortable using and learning the R software. So, all the data sets and functions that were available in that book have been compiled in the package MASS. Similarly, another built-in library is mgcv, all in lower-case letters; this is a library for generalized additive models. Not only these two; there is a long list of libraries that are part of the base package.
Two examples of libraries that are not part of the base package are spatial, spelled s p a t i a l, all in lower-case letters, which is used for spatial analysis in statistics, and boot, spelled b double o t, which is used for the bootstrap methodology in statistics.
Spatial analysis, bootstrapping and so on are different methodologies used in statistics. Because I am from a statistics background, I am giving you examples of libraries and packages used in statistics, but, as I said, a long list of such libraries is
available.
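If you want to check which of these libraries came with your own installation of R, installed.packages() can be filtered by priority. A sketch; the names listed depend on your installation. In standard R distributions, packages such as MASS, mgcv, boot, cluster and spatial are usually marked "recommended" and installed together with R, so some of them may already be on your computer even though they are not loaded until you call library().

```r
# Packages that ship with R: "base" ones are part of R itself,
# "recommended" ones are distributed with R but loaded only via library()
ip <- installed.packages(priority = c("base", "recommended"))
ip[, c("Package", "Priority")]

# Which of the libraries mentioned in the lecture are installed here?
c("MASS", "mgcv", "spatial", "boot") %in% rownames(installed.packages())
```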
(Refer Slide Time: 17:54)
Now, to begin, the first step is to use the command library to load a package. As I said, a package is a collection of functions bundled together conveniently. Once you use the library command, R brings the required package from where it is installed on your computer into the memory of the computer, and it becomes usable.
Now suppose you have a package and you want to see what it is and what its details are. As we have already discussed, R is not a black box; any type of information you want is available and one can see it. To see the description of a package, we have the command packageDescription. You have to be a little watchful here: all the letters are in lower case except the capital D, which is in upper case.
I will explain this in more detail later, but R is case sensitive; that means small y and capital Y are different. So, whenever you use the name of a package or a function, you have to be very careful to use lower-case and upper-case letters exactly as they are given.
You can see here the package name, priority, version, the date when it was built, Depends, Suggests, authors and so on. There will be some other details also, but I have given you a brief screenshot here just for the sake of understanding.
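A sketch of packageDescription(). Individual fields of the result can be read with $; stats is used for the second call because it is part of every R installation, while spatial may not be installed everywhere.

```r
# Full description of the spatial package, if it is installed
if (requireNamespace("spatial", quietly = TRUE)) {
  print(packageDescription("spatial"))
}

# Individual fields can be extracted with $ (note the capital D)
desc <- packageDescription("stats")
desc$Package    # "stats"
desc$Priority   # "base"
desc$Version    # depends on your version of R
```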
(Refer Slide Time: 20:54)
After that, you may also want to seek some help on a library. For example, suppose I want help on the spatial package, which is loaded with the library function. For this we use the help argument of the library function in the following way.
I write library, and within the parentheses I write help, spelled h e l p, then an equals sign, and then the name of the library for which I want the details, that is, library(help = spatial). It gives you details such as information on package spatial, its priority (recommended), version, date and so on, followed by a list of the functions and data sets available in it.
(Refer Slide Time: 21:57)
I have given you a screenshot here. You can do it yourself; that is not a big deal. You simply type library(help = spatial) on your computer and you will get the same thing. There are many details, so I would not like to show them on the R console, but you can do it yourself; it is not difficult.
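The same command in code. library(help = ...) returns an object describing the package, and that object is displayed when it is printed at the console. A sketch; stats is used as a fallback when spatial is not installed, and the printing is limited to interactive sessions because the information is shown through R's pager.

```r
# Choose spatial if it is installed, otherwise stats (always present)
pkg <- if (requireNamespace("spatial", quietly = TRUE)) "spatial" else "stats"

# Information on the package: DESCRIPTION fields, then the index of
# its functions and data sets; it is displayed when printed
info <- library(help = pkg, character.only = TRUE)
if (interactive()) print(info)
```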
Now it is also time for me to make you independent. So, whatever I am doing here, try to do it yourself on your computer and verify whether you are getting the same outcome.
Now I come to another aspect. Whenever you want to use a library, as I said, you first have to bring the package from the website of the R software to your computer.
As we have discussed, the base package contains programs only for basic operations, and it may not contain the functions and libraries for some advanced work, for example in statistics. These special requirements are met by special packages.
Now the question is how you will bring these special packages to your computer so that you can use them. As we have discussed, these packages are available on the website of the R software. So, you simply have to download them and then install them, just as you download any software and then install it. After that, you can use them.
Now, how do you download and install these packages on your computer? To install any package, you first have to start the R program. Then, on the command line of the R console, you simply type install.packages, spelled i n s t a double l dot p a c k a g e s, all in lower-case letters. This function helps you download the packages you want.
Inside the parentheses and within double quotes, you give the name of the package you want; for example, suppose you want to download the package boot or the package cluster. As we have just discussed, boot contains statistical tools for the bootstrap methodology, whereas cluster contains statistical tools for cluster analysis.
These two packages are not part of the base package of R, but they are available on the website of the R software. So, the question is how to install them on your computer. I will show you how to get them, and through screenshots I will show you what you are going to observe. Sometimes it may take a few seconds or minutes, depending on the size of the package.
So, I would not like to spend this time showing it live here, but I will try my best to explain it through the screenshots, and believe me, I promise you this is a very simple process. Suppose I want to install the package boot. I simply type install dot packages and, within the parentheses and within double quotes, the name of the package boot, b double o t, that is, install.packages("boot"). Similarly, if I want to install the package cluster, I type install dot packages and, within the parentheses and double quotes, the name cluster, c l u s t e r, that is, install.packages("cluster"). Then you will see what happens.
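As a minimal sketch of the two commands just described, typed at the R command line: on many R installations boot and cluster already come bundled as "recommended" packages, so this version calls install.packages() only when requireNamespace() reports that a package is missing.

```r
# Packages we want to have on this computer
pkgs <- c("boot", "cluster")

for (p in pkgs) {
  # requireNamespace() returns TRUE if the package is installed and loadable;
  # it does not attach the package to the search path
  if (!requireNamespace(p, quietly = TRUE)) {
    # Equivalent to typing install.packages("boot") or install.packages("cluster")
    install.packages(p)
  }
}
```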
Suppose I take the example of boot. You can see that as soon as I type install.packages("boot") on the command line inside the R software, something will be executed and a window like this will appear: "Please select a CRAN mirror for use in this session". If you recall, we had discussed earlier that the R software has been uploaded to different websites, hosted by different institutions.
This is similar to what you observed while downloading the R software itself; the same thing happens when you download a package. It will ask you from where you want to download the package, and it shows a list of locations in alphabetical order by country: Australia (Canberra), Australia (Melbourne), Australia (Melbourne 2), Australia (Perth), Austria, Belgium, etc. You can choose any of them.
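The mirror can also be chosen from the command line rather than from the pop-up list. A minimal sketch: chooseCRANmirror() shows the same list in an interactive session, while options(repos = ...) fixes the mirror for the current session without any prompt; the cloud.r-project.org URL is used here only as an example of a CRAN mirror.

```r
if (interactive()) {
  # Shows the list of CRAN mirrors, as install.packages() does
  chooseCRANmirror()
} else {
  # Fix the mirror for this session so install.packages() does not ask
  options(repos = c(CRAN = "https://cloud.r-project.org"))
}

# The repository R will now use for install.packages() and update.packages()
getOption("repos")
```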
(Refer Slide Time: 27:14)
And after that, what will happen? A screen like this one will appear, right, and it will show you the progress of the download; for example, here it shows 61 percent downloaded. After this, the whole package will be downloaded and automatically installed on your computer, right. Once the installation is done, after some time depending on the size of the package, you will get a message like "package 'boot' successfully unpacked and MD5 sums checked".
So, you just have to keep an eye on the words "successfully unpacked and MD5 sums checked"; the word successfully indicates that the package has been installed. After this, the boot package has arrived on your computer. It is something like a book that has been bought for the library: since you have not issued it yet, you cannot use it. That is what you have to keep in mind, right.
Similarly, as another example, if you are interested in the cluster package, you write install dot packages with cluster within parentheses and double quotes. Once again you will see this type of progress in the download and, finally, you will get the message which I have highlighted in green. You simply have to look at this part to see whether the download has been successful or not.
So, if you do this, you have now brought the two packages boot and cluster into your library, but in order to use them you first have to load them; you have to tell the library, "please issue it in my name so that I can use it".
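Issuing the book from the library corresponds to the library() command. A minimal sketch, assuming cluster is already installed (it is on most standard R installations): library() attaches the package, and search() lists what is currently attached.

```r
# Attach the installed package so that its functions can be used
library(cluster)

# The package now appears on the search path as "package:cluster"
print("package:cluster" %in% search())
```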
I will show you shortly how to do that. The next question is this: once you start working in the R software, from time to time you will be downloading different types of packages, and those packages will be available on your computer. Suppose you want to check which packages are installed on your computer. For that we have the command installed dot packages, that is, installed.packages(), and within the parentheses you do not have to write anything.
It is i n s t a double l e d dot p a c k a g e s, all in lowercase letters. If you use this command, you will see this type of detail. I am not showing it on the R console because these details are going to be different from what I have on my computer. You can see that it shows the base package, then cluster, and so on: the name of each package, where it is located, etc. In this screenshot you can see that you get a long list of the packages installed on your computer, right.
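The list shown in the screenshot can also be inspected with a few commands. A minimal sketch: installed.packages() returns a matrix with one row per installed package (the row names are the package names), so you can select a few columns or check for a particular package; the output will differ from computer to computer.

```r
ip <- installed.packages()

# Name, library location and version of the first few packages
head(ip[, c("Package", "LibPath", "Version")])

# How many packages are installed in total
nrow(ip)

# Is a particular package installed on this computer?
"cluster" %in% rownames(ip)
```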
So, that is how you can get this list. Now, suppose you keep storing all these packages on your computer; they are definitely going to take some space. So, if you are not using them, you may want to free some space, and for that you need to remove, that is uninstall, those packages from the R software on your computer.
For that we have the command remove dot packages, that is, remove.packages, r e m o v e dot p a c k a g e s, and within the parentheses and double quotes you give the name of the package which you want to remove, right. Obviously, only those packages which are already installed on your computer can be removed, right.
Suppose we have already installed the package cluster and I want to remove it. I will write remove dot packages and, within the parentheses and double quotes, the name of the package cluster, that is, remove.packages("cluster"); if you do this, it will remove the package, ok.
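A minimal sketch of a more careful way to uninstall, using a hypothetical helper remove_if_installed() written only for this example: it checks installed.packages() first and passes the library location to remove.packages() through its lib argument, so it does nothing for a package that is not installed. Because it really uninstalls, the call for cluster is left commented out.

```r
# Hypothetical helper: uninstall pkg only if it is installed
remove_if_installed <- function(pkg) {
  ip <- installed.packages()
  if (!(pkg %in% rownames(ip))) {
    message("Package '", pkg, "' is not installed; nothing to remove.")
    return(invisible(FALSE))
  }
  # Library folder where the package lives (first match if there are several)
  lib <- ip[rownames(ip) == pkg, "LibPath"][1]
  remove.packages(pkg, lib = lib)   # same as remove.packages("cluster")
  invisible(TRUE)
}

# remove_if_installed("cluster")   # uncomment only if you really want to uninstall
```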
(Refer Slide Time: 31:36)
The next aspect is this. As we have discussed, R is free software, and R has a very big advantage: whenever there is an update in the software, the updated versions can be downloaded for free. Now, suppose somebody has developed a package, uploaded it on the website of the R software, and people are using it. After some time, some more features are added to that package, and a revised or updated version of the package becomes available.
The author of the package will upload it on the website of the R software, and suppose you want to use that updated version. You would not like to remove the package; you simply want to update it. So, the question is how to update it. For that, the command is update dot packages, that is, update.packages, u p d a t e dot p a c k a g e s. Note that the first argument of update.packages() is the location of a library, not a package name, so to update just one package you give its name within double quotes through the oldPkgs argument. Obviously, the package has to be on your computer already.
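A minimal sketch of updating a single package from the command line. update_one() is a hypothetical helper for this example; it passes the package name through the oldPkgs argument of update.packages(), sets ask = FALSE so that R does not ask for confirmation, and uses the cloud.r-project.org mirror only as an example. It needs an internet connection, so the call itself is left commented out.

```r
# Hypothetical helper: update one installed package from CRAN
update_one <- function(pkg, repos = "https://cloud.r-project.org") {
  # The first argument of update.packages() is a library path (lib.loc),
  # so the package name goes into oldPkgs instead
  update.packages(oldPkgs = pkg, repos = repos, ask = FALSE)
}

# Installed version, useful to compare before and after an update
packageVersion("cluster")

# update_one("cluster")   # run this when you are online
```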
Suppose I want to update the package cluster. I will write update dot packages with the name c l u s t e r within double quotes, given as oldPkgs = "cluster", and you will get this type of screen. Once again it will ask you which CRAN mirror, in which country, you would like to use. Suppose I choose Australia (Melbourne); you just click on it, and the rest will be done automatically. The other alternative, if you are working in the RGui window directly, is the Packages menu, which contains options such as Set CRAN mirror and Select repositories, and also an option Update packages. So, you can do it from there as well. Here I would like to add one more thing: another approach to installing packages is to download only the package file somewhere on your computer.
Then, if you click on the menu option for installing packages from a local file, the computer will ask you where the package is located. You just go to the location where you have downloaded the package and select it there, and it will be installed, right.
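The local-file route has a command-line counterpart as well. A minimal sketch: with repos = NULL, install.packages() installs from a file on your computer instead of from CRAN, and type = "source" is used because the example file is a source package (.tar.gz). The path and file name below are hypothetical; replace them with the file you downloaded.

```r
# Hypothetical location of a package file you downloaded yourself
pkg_file <- path.expand("~/Downloads/mypackage_1.0.tar.gz")

if (file.exists(pkg_file)) {
  # repos = NULL: install from this local file rather than from a CRAN mirror
  install.packages(pkg_file, repos = NULL, type = "source")
} else {
  message("File not found: ", pkg_file)
}
```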
I will show you this on the R console also. Now, as we have discussed, once you have downloaded a package, in order to use it you have to use the command library. So, the first step is to install the package with the command install.packages, and after that you use the command library with the package name. Now suppose you have used it, your work is done, and you want to remove it from your session; you do not want to uninstall it.
Remember, you just want to tell the R software, "ok, my job is done". Whatever I have done with the command library, I now want to do the opposite. For that, the command is detach, d e t a c h. Within the parentheses and double quotes, you write the word package, a colon, and the name of the package, for example "package:cluster", and then you write unload = TRUE, that is, detach("package:cluster", unload = TRUE). This will unload the installed package. Remember, I am not talking about uninstalling.
No, I am not asking you to uninstall the package, right. If you do this with the cluster package, you will see this type of screenshot, right. Here I show that, once I have loaded the cluster package, I use detach for the cluster package.
You can see that nothing is printed, which means everything is fine. Once I have detached it, the package is no longer loaded, and if I use the command detach("package:cluster", unload = TRUE) once again, it gives an error, because a package can be detached only if it was loaded, and for loading you have to use the command library. So, this is something you have to be very careful about when you use it.
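The whole load, detach and detach-again sequence can be reproduced as below. A minimal sketch, assuming cluster is installed: tryCatch() is used only so that the second, failing detach() prints its error message instead of stopping the script, and the last line confirms the package is still installed after being unloaded.

```r
library(cluster)                          # load (attach) the package
print("package:cluster" %in% search())    # TRUE: it is attached

detach("package:cluster", unload = TRUE)  # unload it from this session
print("package:cluster" %in% search())    # FALSE: no longer attached

# Detaching again fails, because the package is no longer loaded
res <- tryCatch(detach("package:cluster", unload = TRUE),
                error = function(e) conditionMessage(e))
print(res)

# Unloading is not uninstalling: cluster is still on the computer
print("cluster" %in% rownames(installed.packages()))
```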
Let me also give you a glimpse of how these things are done in the RGui window of the R software. For example, in the Packages menu you can see Load package, Set CRAN mirror, Select repositories, etc. These are needed, for example, when you select from where you want to download your package: from Australia, Belgium or any other place.
(Refer Slide Time: 37:17)
You can select here which repositories you want to use, and you can also set the CRAN mirror; for example, you can choose whatever country you want. Similarly, the Install package(s) option is also in this menu, and it gives you the same type of options as the command you typed; but I am simply trying to show you things from the basic, fundamental point of view. Similarly, you have Update packages here, where you can choose which package you want to update and from which country.
(Refer Slide Time: 37:54)
I will just cancel it. The last option is Install package(s) from local files: if you have downloaded the package somewhere, you can also install it from here, right.
Now, let me bring this lecture to an end. Before leaving, I would like to tell you one more thing: whenever you download the R software or these packages, they are local to that computer. For example, suppose you have installed the R software on a computer at your office and on one at home.
Now, suppose you have installed the package cluster in your office. It will remain only on that laptop or computer; it will not be installed automatically on another computer you use. So, you have to be careful: if you work both in your office and at home and use the package cluster, you have to install it at both places. You may also notice that many of these things are available with a click in the RGui window. So, why have I not used that?
My answer is simple: I am trying to teach you things from the basic fundamentals. After some time we are going to talk about the RStudio software also, where everything I have told you is very easy to use, but I do not want you to be dependent on any particular software. I would like to teach you the R software on the R software only, and I am more interested in developing things from the basic, fundamental point of view, so that when you do serious programming you are not dependent on other software; that is my simple objective. So, now, look up these commands and practice them: install, load, unload, etc. some packages and see what happens.
Once you become comfortable with these commands and with this language, I promise you that learning R programming will become very easy. So, please revise this, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 04
Command Line and Data Editor
Hello friends, welcome to the course Foundations of R Software. In this lecture we are going to talk about the Command Line and the Data Editors; we will continue with the same topic in the next lecture as well. First, let me give you a brief idea of what we are going to do in this lecture. First of all, I will introduce you to the basic terminology, although I have already done so in the past.
But just for the sake of completeness, I will introduce it formally, and after that I will show you how we are going to work in the R software. When we work in the R software, we have two options. The first option is to work inside the R software without the help of any external software. The second option is to take the help of some external software.
These external programs are like friends of the R software. What do I mean by a friend? As an individual human being I can do everything that others can do, but if I have a friend, or if somebody gives me a helping hand like a friend, my job becomes easier. Similarly, some friendly software has been developed which helps us work in the R software.
There are a couple of such programs, and each has remained popular for some time. For example, in the last two decades I have seen a couple of them; at this moment, one of the popular ones is RStudio, and besides it there are many others. Remember one thing: I am not trying to market RStudio. It is simply one of the programs I want to use to show you how the help of some external software can make things easier; that is my sole objective. In this lecture I will introduce you to RStudio, but how to work in it I will discuss in detail in the next lecture. So, let us begin our lecture and try to understand the basic functionalities of the R software, ok.
(Refer Slide Time: 02:59)
Once you start the software, you know what you are going to get; we have already done that. So, I will show you here only a screenshot of the same thing, with which you are now very familiar. When we start the R software, we get this type of window, which we called the RGui window, that is, the R Graphical User Interface window.
Here you can see a sign which looks like a greater-than sign, >; this is the prompt. After it there is a vertical line, which is a sort of cursor, right; it gives you the position where you have to type. This line is called the command line, and we always type our commands, syntax, functions, whatever we want, at this place. So, this is called the command line, right.
Now, as we are quite familiar with the R software, we know that executing commands in R is not 100 percent menu driven. Some operations are available from the menus of the RGui window, such as File and Packages, but not all. So, it is not like clicking buttons and getting the outcome; we need to type the commands.
When you work in R, you can type your commands on a single line or over multiple lines; both are possible. As long as you are working with a one-line function or one-line syntax, there is no issue, but when you are working on a command which takes more than one line, then instead of typing it directly on the R console, it is better to use a text editor.
The advantage is this: if you type a multi-line program directly in the R console and make a mistake in any line, R stops with an error and you have to type the whole command again. Whereas, when you type your commands in a text editor and execute them from there, if you have made a typographical mistake, you have the option to go back to your editor, make the correction, and then come back to the R software for the execution.
That is why text editors are useful, and I will show you two options here. The first option is the editor which is built into the R software, and the second is some external software. The built-in editor of the R software can be accessed from the RGui menu bar; I will show you this on the R console also. You simply click on File and then on New script. What does this mean?
I will first show you with the help of a screenshot and then on the R console also. As soon as you start the R software, you will have this type of RGui window. Then you go to File and you can see New script. As soon as you click on New script, a new window opens where you can type the commands; from this window you can also execute them in a particular way, which I will show you, and you can also save the contents of this window as an R script file.
You can see that I am using a new term here, script; a script is something like a program. In most languages, for example C, Fortran, etc., you write a program, and here that is called a script. So, as soon as you click on New script, a new file opens whose title is "Untitled - R Editor".
Then you can type your commands inside this file and edit them, and if you want to execute a function or a command from this file, you have both options: if you want to execute only one line, you can do that, and if you want to execute more than one line or the entire program, you can do that too, in a single step.
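Besides highlighting lines and pressing Ctrl+R, a saved script file can be run in a single step with source(). A minimal sketch: a temporary file stands in for a script you saved from the R editor, and echo = TRUE prints each line as it is executed, much like running the lines from the editor.

```r
# Write three lines to a temporary .R file (stands in for your saved script)
script <- tempfile(fileext = ".R")
writeLines(c("x = 2", "y = 3", "z = x + y"), script)

# Execute the whole script at once; echo = TRUE shows each line as it runs
source(script, echo = TRUE)

# The variables created by the script are now available
print(z)
```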
What do you have to do for that? Suppose you want to execute a particular line. You simply highlight it; to highlight it, you can either use your mouse or bring your cursor to that point and use the Shift key together with the arrow keys. Once you have highlighted it, you press Ctrl+R; Ctrl is the key on your keyboard called the control key.
You press Ctrl together with the key for the letter R, and as soon as you do this, the execution happens on the R console. If you want to execute more than one line, that is also possible with the same process: you simply highlight the group of lines which you want to execute and press Ctrl+R, and the execution is done on the command line of the RGui window, ok.
So, let me show you this on the R console.
And then I will show you how the other options work. For example, this is the RGui window; I go to File and then to New script, right.
I can make this window smaller so that you can see all the things together, right. With these operations I am also showing you that it is possible to manage these different windows in the R software. So, this is your Untitled - R Editor window. Now, suppose I type some very simple commands; I have not yet told you the commands, but I believe that if I take very simple ones, you will understand them easily.
Suppose I type x equal to 2, then on the next line y equal to 3, and then z equal to x plus y; these are very simple operations which you understand. Now, suppose I want to execute the line for x. I just highlight it, using either my mouse or the Shift and arrow keys, and now I press Ctrl and R together.
So, I press Ctrl and R, and you have to observe what happens in the R console. You can see that an operation has been done in the R console window. Now, if you type x on the R console, you will see its value, 2, right. Similarly, if you highlight y equal to 3 and press Ctrl+R, y equal to 3 is executed on the R console, and you can see that y comes out to be 3. Then I highlight z equal to x plus y and press Ctrl+R, and you can see that the value of z comes out to be 5. Well, ok, now I have shown you these operations line by line; I will now show you how I can execute them as a group.
I clear the screen and, just for the sake of understanding, I change the values: I make it x equal to 20 and y equal to 30, and now I highlight all three lines together and press Ctrl+R. You can see that all these operations have been done. If you want to see the values of x, y and z, you can also do it from this window: I type x, y and z, press Ctrl+R, and you can see x equal to 20, y equal to 30 and z equal to 50, right.
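The two runs in the editor correspond to the following lines. A minimal sketch of what ends up being executed in the R console; print() shows a value, just as typing a variable's name on its own does.

```r
# First run: executed line by line
x = 2
y = 3
z = x + y
print(z)            # [1] 5

# Second run: new values, all three lines executed together
x = 20
y = 30
z = x + y
print(c(x, y, z))   # [1] 20 30 50
```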
This is how you can do such operations. Now, coming back to the slide, this is the way, as I have shown you, in which commands can be executed in the R software. The other option is to use some external editor software which helps us in running R commands.
Different editors are available, and most of them are free. For example, one editor is RStudio and another is Tinn-R, T i double n, Tinn-R. A long time back, when I was learning R, Tinn-R was very popular; nowadays RStudio is very popular. Both are free software.
I would like to give you an idea of how this software works. When you work with RStudio, Tinn-R or any other such software, the work often becomes faster and more efficient. That is the only thing; but please remember that when you do calculations and analysis in RStudio or Tinn-R, they are still done inside the R software only. RStudio and Tinn-R only give you a face, a front end: the face is in front, but the R software is behind the screen, right. So, we are going to understand a little of how things work through the RStudio software, but my objective here is to teach you the R software, and I would not like you to be dependent on any particular software of this type.
So, I will give you an idea, but after that I will come back to the R console and execute all the commands inside the R console only. That is what you have to keep in mind. If you want to download the Tinn-R software, you can go to this website and download it; it is free software.
(Refer Slide Time: 14:51)
Similarly, if you want the RStudio software, you can go to the website www.rstudio.com. This software is written in the C++ programming language, and it is a free and open-source integrated development environment for R. When you go to the website of RStudio, you will see that some versions are paid and some are free; I am talking here only about the free version, right.
If you go to the website www.rstudio.com (I have given you a screenshot here), you can see a Download section; just click on Download.
After that you will come to a page like this, where one of the options is RStudio Desktop, which is free, right, and there are different other versions.
Those are paid, and I am not talking about them; you simply download RStudio Desktop for free. When you click on it, you will get a file; click on that file to install it on your computer, and if you then double-click on the icon of the RStudio software, you will get a screen like this one, right. It has four windows, 1, 2, 3 and 4, and they have different types of functionalities, right.
I am not going to take you further now or discuss the uses of these four windows; I will stop here, and next time I will show you how you can work in the RStudio software. My starting point in the next lecture will be this slide, where I will talk about the different functionalities of these four windows.
So, I will stop here, but my request to you is that you at least install this RStudio software on your computer and play with it: have a look at what is happening in the software, what the windows are and what they look like. That will help me when I explain these intricacies in the next lecture, because then you will at least be familiar with the face of RStudio.
And I will give you only a brief overview; if I tried to cover all of RStudio, it would possibly take a very long time to explain all its functionalities. My objective is only to give you an idea of how such secondary software helps us in working with the R software. So, download this software, and try to see how you can work in the RGui window and also with its text editor. I will see you in the next lecture; till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Introduction
Lecture - 05
Introduction to R Studio
Hello friends. Welcome to the course Foundations of R Software. You may recall that in the last lecture we started a discussion on the use of external software for working with the R software. In that discussion we talked about a software called RStudio, and I requested you to download and install it on your computer. Today, in this lecture, we are going to understand how RStudio works.
I will try my best to give you a good idea of the basic functionalities available in RStudio. There are many, many things available in RStudio, and it is very user-friendly software. So, I am sure that once I tell you some basic features, it should not be very difficult for you to understand the remaining ones. As I said earlier, RStudio is only a helping hand; in very simple words, RStudio is a good friend of the R software.
So, if you want to do any analysis, computation or programming, it will become more convenient for you to work in RStudio than in the R software directly. But I would repeat once again that after this lecture I will come back to the R software and work only inside the RGui window, that is, the R console, ok. So, let us begin this lecture and try to understand what RStudio is, ok.
(Refer Slide Time: 02:05)
In the last lecture I gave you this slide, where I told you that you can download the RStudio software from the website rstudio.com by going into this section over here.
Then you finally download the software from here; this is for the desktop, and similarly you can go for other platforms also if you wish. I am looking at the free version of the RStudio software.
Once again, before going into the details, I would like to clarify that I am not at all trying to advertise RStudio. It is just one such software, and I have taken it only as an example to demonstrate how external software can help in working with the R software, ok.
As soon as you open this software, you will find the type of structure which I have shown you here. There are four windows: this one, this one, this one and this one. Before I explain the functionalities of this software and of these windows, let me show you the software itself; then you will have more confidence, and you can have a look yourself.
(Refer Slide Time: 03:34)
This software is on my computer, and if I start it, it looks like this. If I use my cursor, you can see where I am moving: this is my first window. Second, you can see the region I am highlighting here; this is my second window. Then I highlight here, and this is my third window. And then I move here, over Files, Plots, Packages, Help, etc.
(Refer Slide Time: 04:03)
So, this is my fourth window. Now, I will take one window at a time and explain the basic functionalities of the RStudio software, right.
You can see that this screen and this screen show the same screenshot; you can also open it on your computer and check whether it matches this screenshot. That is not difficult to follow, ok. Now, look at this window; I will call it window number 1. This is the place where you are going to type all the commands, right.
Here there is a menu bar which has all the standard menus, like File, Edit, Code, View, etc. And this is the place where you are going to type all the basic commands. Can you identify what this is? If you compare it with a window in the R software, what do you observe? What is this window? If you look at the R software, it looks like this: this is your R console.
So, in the RStudio software, this window is simply your R console, right. That is what I am trying to show you here: this is simply the R console, and you can identify it. Now, let me call this window number 2, and this one window number 3. You can see some options here, like Environment and History, then a symbol like a brush, and then Import Dataset, an option to save the global environment, etc.
This is the place where, whenever you define a variable or assign values to variables, they are stored and shown. The advantage is that you can see what you have done in the past; all those values, variables, etc. will be there. I will take all these windows one by one and explain them in more detail, but here I just want to give you a brief overview. Then, in window number 4, you can see options like Files, Plots, Packages, Help, Viewer, etc., and then New Folder, Delete, Rename, etc. So, these are the four windows, and now I will take them up one by one.
I will also give you some more details. Suppose my objective is to execute these three commands. You already know the meaning of some of them, and the others I will explain in the forthcoming lectures. The first command is library(MASS), which means that you want to load the package whose name is MASS. Then attach(bacteria) attaches a data set whose name is bacteria; this data set is available inside the package MASS. And then you want to edit it, so you write fix(bacteria). We learnt in the earlier lecture that if you want to execute these commands in the R console, you can open a text editor, highlight these commands and then press Ctrl+R.
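For reference, the three commands can be written as a small R script. This is a minimal sketch: it assumes the MASS package is installed (it ships with standard R distributions as a recommended package), and it wraps fix() in interactive() because the Data Editor can only open in an interactive session.

```r
# Load the MASS package, which contains the bacteria data set
library(MASS)

# Put the columns of the bacteria data frame on the search path
attach(bacteria)

# fix() opens the spreadsheet-style Data Editor, so only call it interactively
if (interactive()) {
  fix(bacteria)
}
```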
Now, I would like to do the same thing in the RStudio software also.
(Refer Slide Time: 08:26)
So, now I am going to show you how to get it done, and what happens in window number 1. In window number 1 I have written these three commands: library(MASS), attach(bacteria) and fix(bacteria). In the case of the R console, you used to highlight the commands and then press the keys Ctrl and R to execute them.
In the RStudio software, suppose you want to execute the command library(MASS). You simply highlight it and then click the button labelled Run, R-u-n. You can see in the console that this command is executed: library(MASS) appears there. If you do not want to press Run, you can also highlight library(MASS) and press Ctrl+Enter, that is, you hold the Ctrl key and press the Enter key, and the command will be executed. So, this is the function of window number 1. Here there is also a File menu, and if you open it you will see the standard options that are available in most software. And there is another button, Source: if you want to run your whole program at once, that is, source it, you can press this button.
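The Source button runs the whole script file in one go; in plain R the corresponding function is source(). A minimal sketch, where the script is a hypothetical file written to a temporary location just for illustration:

```r
# Write a small, hypothetical script file to a temporary location
script <- tempfile(fileext = ".R")
writeLines(c(
  "library(MASS)",
  "n_obs <- nrow(bacteria)"
), script)

# Run every line of the script, echoing each command as the console does
source(script, echo = TRUE)

n_obs
```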
But before going into this detail, let me show you this in the console itself. I am simply copying these commands so that I can save some time.
Now I come to RStudio and paste them here. I can now clear the console so that you can see everything clearly. For that we have the same shortcut, Ctrl+L; I press Ctrl+L and you can see that the console is empty.
Now I highlight library(MASS) and press Run. Watch what happens in the console, where I am moving my cursor. I press Run, and you can see that the command is executed in the console. This should give you a good feeling: window number 1 works just like the text editor you used with the R console.
And whatever you execute is not actually carried out by RStudio; it is carried out by the R software, which is now attached inside RStudio. So, now we come back to our slide and move slowly towards other things.
(Refer Slide Time: 11:59)
I have explained this to you now. If you want to open the data editor that you used in the R software, you can do it here also: go to the menu, choose Edit / Data Editor, and enter the name of the matrix or data frame that you want to edit; the Data Editor window will then appear for it. Alternatively, you can use the fix() function. Here I would like to clarify that I am using two names, matrix and data frame.
Up to now we have not discussed what a data frame is; we will discuss it in the forthcoming lectures, and we will also explain matrices and how to work with them. So, suppose I execute these three commands here.
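As a sketch of the two ways to open the Data Editor from code: fix() edits an object in place, while edit() returns an edited copy and leaves the original unchanged. The matrix m below is a made-up example, and both calls need an interactive session, so they are guarded.

```r
# A small example matrix: 2 rows, 3 columns, filled column by column
m <- matrix(1:6, nrow = 2)

if (interactive()) {
  fix(m)             # changes made in the Data Editor are saved back into m
  m_copy <- edit(m)  # changes go into m_copy; m itself is not modified
}

dim(m)
```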
(Refer Slide Time: 13:00)
First I would like to show you, through the slides and screenshots, what really happens when you do this in RStudio. As soon as you run attach(bacteria) and fix(bacteria), a Data Editor like this one opens.
This is simply the data file whose name is bacteria; it could be any other data file as well, without any problem. This data file is available inside the MASS package. If you wish, you can source the script, so that you can run it conveniently whenever you are doing this programming.
Now let me show you how it happens in RStudio; possibly that will make you more confident. Here I am running attach(bacteria).
Now, if I press Source, you can see that this window has appeared. This is simply the data file whose name is bacteria. It has some variables and some data values. I am not going to discuss what they show, because my objective here is something else. And if you write fix(bacteria) and click Run, you can see that the same file is opened.
So, this is the objective I wanted to fulfil: as soon as I ran fix(bacteria), this file was opened. Now you have understood what is really happening with RStudio, and you have a fair idea of what I meant when I said that RStudio is a sort of interface between R and us. As you have seen, it is especially useful for beginners, and it makes coding and programming easier.
(Refer Slide Time: 15:24)
When we start RStudio we see the following four windows: window 1, window 2, window 3 and window 4. Now you will understand very easily what is going to happen.
So, let me briefly tell you about window number 1, which is here. This window is used to write the commands and the syntax; it is called a script, where you write down the script of your program.
(Refer Slide Time: 15:55)
If you want to open a new file to write your script, you simply go to this button, and once you click it you will see different types of options: R Script, R Markdown, Shiny Web App etc. You simply click on R Script and a new file will be opened. If you want to save this file, you simply click this button, which indicates Save.
If you want to execute the commands which you have typed, you press Run. There is one more button on the right-hand side of Run; if you want to rerun the lines, you simply click it. As you know, in order to run commands you highlight them and then click Run. I can show you this here, and you can also see options for R Markdown, Shiny Web App, R Sweave, R HTML.
One thing I can share with you just for your information, without going into much detail: R has now developed in various directions, and many applications related to R have appeared, for example for handling data files, data management etc. All the options which I have shown you here come from those developments.
(Refer Slide Time: 17:22)
Here is File, here is Edit, here is Code, and you can see that there are many options.
So, even if you are a professional programmer, you will find it very useful when you are working on this.
(Refer Slide Time: 17:41)
And if you click over here, you can see the options R Script, R Notebook, R Markdown etc.
Once you have learnt the R software and understood it, possibly you can learn how to use R Markdown, Shiny etc. I am not going to explain these things.
(Refer Slide Time: 18:06)
Besides this, RStudio is also used for writing Python scripts, SQL scripts etc. These are the latest developments in RStudio. Now, if you click on R Script, you can see that a new window has been opened, where you can write whatever you want. If you want to save it, here is the Save button, and if you want to go to any file or folder, you can type its address here, using the backslash sign as the separator.
Then here is Run, here is Rerun, which I told you about, and here is Source. I hope this makes the understanding of window number 1 quite clear, so we can come back to our slides and try to understand more.
Now I come to window number 2. Window number 2 is very easy: it is simply your R console. All the syntax which you write inside the script window appears here when it is executed, and after spending so much time with the R software, you know that all calculations take place inside the console window only. One can write programs here as well, but as we have already understood, it is not so convenient to write a multi-line program in the R console. That is why we take the help of the R text editor. But anyway, that is not a big deal.
You have already seen this, but just for the sake of completeness I can show you that in RStudio this is your window number 2, which is your R console. I can clear it with Ctrl+L so that you can see it very clearly.
(Refer Slide Time: 20:04)
Now we come to window number 3. Let me first explain that this is the window where you get to know about the environment: which variables you have used, what values have been assigned to them, and what type of data has been stored in them. All such important information is there; all the variables and objects you have used in your program appear here. And if you shut down and restart your computer, or restart RStudio, those variables and values can still be available here, provided the workspace was saved when you quit, and you can use them again.
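Values survive a restart because the workspace is saved to a file (by default .RData in the working directory) and reloaded. A minimal sketch of doing the same by hand with save.image() and load(); the temporary file name is an assumption for illustration:

```r
x <- 1                                   # a variable in the workspace
ws_file <- tempfile(fileext = ".RData")  # hypothetical file location

save.image(file = ws_file)  # save every object in the global environment
rm(x)                       # x is now gone from the workspace
load(ws_file)               # restore the saved objects, including x

x
```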
The same thing is possible in the R software when we work in the R console, but there you have to look at those values by using certain commands that we will discuss in the forthcoming lectures; that means you have to check the contents of the workspace to see what variables are there. In RStudio, all these things are available directly, so you can see them at a single glance.
Now, suppose I define a variable x = 1. You can see that x with the value 1 appears here. Similarly, there are two tabs, Environment and History. History shows you the commands and programs you have run earlier, and hence the variables and values you have used. Next there is an option to open a file, then Save, and after that there is an option Import Dataset.
Import Dataset is something we have not done up to now; again, we will discuss it in a forthcoming lecture. In R, when you want to read data from different sources, such as MS Excel files or TXT or CSV files, there are certain commands in the R software which have to be executed so that those data sets can be brought into R. RStudio gives you direct access to this: just by clicking this button you can bring those data files into the R software.
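A simple CSV import can be done from the console with read.csv() from base R. A minimal sketch that first writes a small, made-up CSV file so that the example is self-contained; Excel files would need an add-on package such as readxl, which is not shown here:

```r
# Create a small example CSV file in a temporary location
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(2.5, 3.0, 4.1)),
          csv_file, row.names = FALSE)

# Read it back into a data frame
dat <- read.csv(csv_file)
str(dat)
```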
So, this is about Import Dataset. And if you want to erase the stored values, there is a button here which looks like a broom. You just click it and the stored values are erased. I request you to take some values and try these operations yourself, so that you are more comfortable with these things. Before I go further, I would like to show you this here as well: you have just opened the data set bacteria.
You can see bacteria here; just look where I am moving my cursor: it says 220 observations of 6 variables, and so on. Earlier I also used some programs on this computer, like cvboot, which is a list of 11 values, etc. So, you can see that all the information that has been used on this computer is available here. This is about the Environment tab.
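The "220 observations of 6 variables" entry corresponds to the dimensions of the data frame, and the broom button roughly corresponds to removing every object from the global environment with rm(list = ls()). A minimal sketch:

```r
library(MASS)

bacteria_copy <- bacteria  # a copy in the global environment
dim(bacteria_copy)         # 220 rows (observations) and 6 columns (variables)

# Remove every object from the global environment, like the broom button
rm(list = ls())
ls()                       # the workspace should now be empty
```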
(Refer Slide Time: 23:46)
Now, if you come to History, you can see what commands have been used in the past.
For example, you have just used the command fix(bacteria), so it is available here; before that you used the commands library(MASS) and attach(bacteria), which are also here, and after that you sourced these files.
So, this gives you a complete idea of what you have done in the past. This is about History.
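In the plain R console, the command history can also be inspected from code with history() and savehistory() from the utils package. Both need an interactive session with a history available, so this sketch guards them; the file name is hypothetical:

```r
if (interactive()) {
  history(max.show = 25)                      # show the last 25 commands
  savehistory(file = "my_session.Rhistory")   # hypothetical file name
}
```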
Then there are the Connections and Tutorial tabs, which you can look at if you want to know RStudio better.
(Refer Slide Time: 24:26)
Similarly, under Import Dataset you can see the options From Text (base), that is, .txt files, From Text (readr), From Excel, From SPSS, From SAS and From Stata; SPSS, SAS and Stata are different statistical software packages.
So, you can bring those data files into this software and then work with them. Now, after this, we come to our fourth window.
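For the other statistical packages, one console route is the foreign package, a recommended package shipped with R, which has readers such as read.spss() and read.dta(); RStudio's menu may use a different package for this. A minimal round-trip sketch with a made-up Stata file:

```r
library(foreign)

# Write a tiny example data frame as a Stata .dta file
dta_file <- tempfile(fileext = ".dta")
write.dta(data.frame(group = c(1, 2, 3), value = c(10.5, 12.0, 9.8)), dta_file)

# Read it back, as you would read a file produced by Stata
stata_data <- read.dta(dta_file)
stata_data
```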
Let me show you what happens in the fourth window, which has several functions. It is essentially an output window: whatever is the output of your execution appears here.
You can see the tabs Files, Plots, Packages, Help and Viewer. Files shows you what files are there. Plots shows you the plots you create; they appear here. Then there is Packages, which lists the packages available on your computer, and if you want to use one of them, it is very convenient: you simply put a check mark inside its box.
The package will then be loaded, instead of you having to use the function library(). Then there is Help, and then Viewer, if you want to view graphics and other output. So, basically the output of your programs appears here.
Now, for example, if you want to look into Packages, I can show you on the slide and then in RStudio as well how you use a package. Suppose you click here. Then you will see a list of the packages which are available on your computer. Remember, this list might be different from what you see on your own computer, because these are the packages available on my laptop. You can see that I have packages whose names are Agreement, bayesm, composition etc.
Now, if I want to use a package, which means using the command library(), I simply put a tick mark inside its box. It will load the package, and similarly, if I want to load another package, bayesm, I put a tick mark and it will be loaded. Now, if you look here, there is another button, Install. Do you remember that when we wanted to install a package, we used the command install.packages(), which goes to a site from where the package is downloaded and installed?
In this case, you simply click Install, type the name of the package which you want to install, and it will be installed automatically. Similarly, we learnt about the command to update packages; for that you simply click Update and those packages will be updated. After that, if you want to use the Help part, you simply go to this window. Suppose I want information about histograms.
As soon as you type h-i-s-t, more suggestions appear here, and you can click on hist; it will show you all the details about histograms and how you can create them. So, various types of help are available here, and within the histogram page you can also search for something and it will be shown here. You may recall that at the beginning of the course we discussed a couple of ways to take help in the R software.
But now, in RStudio, most of those things have been combined in one place.
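The Help pane search corresponds to the console help commands. A sketch; these calls open help pages or run examples, so they are only run interactively:

```r
if (interactive()) {
  help(hist)                 # the help page for hist(), same as ?hist
  help.search("histogram")   # search all installed help pages, same as ??histogram
  example(hist)              # run the examples from the hist() help page
}
```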
Let me show you these things in RStudio itself, and then I will show you an example also. Suppose I want to look at Files: these are the files which are available in my home directory.
(Refer Slide Time: 28:57)
Then we have Packages. You can see that on this computer there are various packages available.
For example, there is the boot package and the cluster package. If I want to load the boot package, I simply click its check box, and you can see that library(boot) has been executed. If I want to load the package cluster, I click here, and library(cluster) is executed. So, the only difference is that the commands you were using in the R console are the same commands which are executed here, but you do not have to type them yourself.
They are done with just a click. And if you want to unload a package, for example boot, you just click here again, and you can see that the detach command has been used; similarly, cluster has been unloaded from RStudio.
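The clicks in the Packages pane issue ordinary console commands. A minimal sketch with the boot package, a recommended package that ships with R; the install and update calls need an internet connection, so they are guarded, and bayesm is used only as an example name from the package list above:

```r
library(boot)                          # ticking the check box next to boot
"package:boot" %in% search()           # TRUE while boot is attached

detach("package:boot", unload = TRUE)  # unticking the check box
"package:boot" %in% search()           # FALSE after detaching

if (interactive()) {
  install.packages("bayesm")  # the Install button: download from CRAN
  update.packages()           # the Update button: update installed packages
}
```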
After this we have Help. For example, suppose you want some information about histograms.
(Refer Slide Time: 30:06)
If I type h-i-s-t, you can see suggestions coming in the drop-down, and you can click on one. Then the information about histograms is shown, and if you want to find something within this topic, it will also appear here. Now, let me clear this screen.
Let me take an example to show you how these four windows work together when we do something. Suppose I want to create a bar diagram. You know that a bar diagram is a very simple thing: it is a chart made of bars.
And I want to create this bar diagram for these values: 1, 2, 1, 1, 2, 3, 1, 2, 3, 1, 2, 2, 3. How to create a bar diagram and how to input the data are things we are going to discuss in more detail in forthcoming lectures; at this moment my simple objective is to show you how, when we work in RStudio, all four windows work simultaneously to produce an outcome.
Just for your information at this moment: to input the data, you write the values inside parentheses preceded by the command c, and store the result in a variable x. After that you just write the command barplot(table(x)), and this will give you a bar plot. So, let me execute these commands in RStudio and show you how things happen. Let me copy the data values so that there is no error.
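The two commands described above look like this in R; the comments show the frequency table that barplot() draws:

```r
# Store the data values in the vector x
x <- c(1, 2, 1, 1, 2, 3, 1, 2, 3, 1, 2, 2, 3)

# Count how often each value occurs: 1 occurs 5 times, 2 occurs 5 times, 3 occurs 3 times
freq <- table(x)
freq

# Draw one bar per distinct value, with height equal to its frequency
barplot(table(x))
```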
(Refer Slide Time: 31:57)
Here is my data, and now I type barplot. You see, as soon as I type barplot, suggestions appear in the drop-down, and if you move your cursor over them you will get a lot of information. This is how RStudio helps when we use the R software through it.
So, this is how RStudio helps us, and now you can have a fair idea.
If you click here, barplot will be filled in, and after that I have to type table. As soon as I start typing, table appears in the drop-down and I can choose it from there. This avoids mistakes, and even the parentheses are inserted automatically. So, I write barplot(table(x)). Now I highlight the commands and then press Run.
But what you have to observe is what happens in all four windows. Now I click Run.
And you can see what happened. In window number 2, these two commands are executed. In window number 3, x is shown with all these values. Even if you go to History, you can find what you have done today.
You can see the command you have used here, barplot(table(x)). And if you come to the fourth window, you can see your bar plot.
(Refer Slide Time: 33:26)
This gives you a bigger image, which is clearer, and if you want to save it, you just go to Export, which gives you options like Save as Image, Save as PDF etc.
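Export > Save as PDF has a direct console counterpart: open a file-based graphics device, draw the plot, and close the device. A minimal sketch; the output location is a hypothetical temporary path, and png() works the same way for an image:

```r
x <- c(1, 2, 1, 1, 2, 3, 1, 2, 3, 1, 2, 2, 3)

out_file <- file.path(tempdir(), "barplot.pdf")  # hypothetical output path

pdf(out_file, width = 6, height = 4)  # open a PDF graphics device (size in inches)
barplot(table(x))                     # draw the plot into the file
dev.off()                             # close the device so the file is written
```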
After that, if you want to remove this graphic, you simply click this red cross button.
(Refer Slide Time: 34:11)
It asks me, "Are you sure you want to remove the current plot?", and I say yes.
And you can see that the plot is removed. So, now you can see that it is quite helpful to work with the R software through RStudio. As I said, RStudio is not the only such software; there are others as well, and it is your choice which one you use.
But my idea was very simple. I wanted to demonstrate how, when you take the help of an external software like RStudio, you have to think and manage things, and how things differ from what you do in the R console. Above all, you can now see that whatever you do, RStudio itself is not doing anything; it is only a friend. Whatever happens, happens inside the R console.
Secondly, when you work in RStudio, to install or load a package you simply click. But think of a situation where you are writing a program, a big program; at that moment you need commands like install.packages() or library() so that you can write them inside the program.
That way the user does not have to install the package or load the library separately, because your user does not know what you have done. That is why these commands are still needed. So, that is the advantage of working with RStudio, but once again, RStudio has many more capabilities.
I have taken only some of its capabilities here, just to demonstrate how it can help you. Now I request you to play with the RStudio software, see what different options, functions and capabilities are available, and try to explore them.
The more you explore, the more you will learn. It would be good to check how whatever you have done up to now in the R console can be done in RStudio. This will be an open exercise: as you learn more commands in the further lectures, try to see how the same thing can be done in RStudio also. However, in this course we will work together only in the R console.
That way we can work with the basic fundamentals, and after you have learnt them it will be your choice whether you want to work in the R console or in RStudio. So, practise this, take some problems and try to solve them inside the RStudio software, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Introduction
Lecture - 06
Basic Operations in R
Hello friends, welcome to the course Foundations of R Software. In this lecture we are going to talk about some very basic, fundamental, elementary operations in the R software. This is a very basic course, and there are many operations in R which are very basic and not difficult at all, but which are needed to understand programming and the structure of programs, and which we need when we do programming.
For example, you may want to clear the screen, or see or set the working directory. These are very simple things, but they are the basic ingredients of any programming. So, in this lecture and in the next lecture we are going to understand these minor, elementary operations. I will show you first on the slides and then on the R console. So, let us begin our lecture.
One of the basic operations when you work in the R software is to see what is available in your current R session. For example, if you are working in MS-DOS, there is the command dir, which tells you the contents of the current directory.
Similarly, in the R software we have the command ls. If you write ls followed by parentheses, that is, ls(), you get the names of the objects stored in your workspace on your computer, not my computer, remember. For example, you can see that I have used the command ls() here and I have got this outcome.
This tells me that I have objects like "corboott", "corr power temp" etc. These are the programs which I am using, so they are available on the computer where I prepared these slides. If you run the same command on your computer, you will not get the same contents, but you will get the contents of your own workspace, and that is our basic objective.
Let me show you this on the R console itself: if I type ls() here, you will see these contents.
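Note that ls() lists objects (variables, functions, data) in the workspace, not files on disk; the closer counterpart of the MS-DOS dir command is list.files(), which base R also provides under the name dir(). A minimal sketch with a made-up variable:

```r
marks <- c(56, 72, 81)  # a made-up object in the workspace

ls()                            # object names in the workspace, including "marks"
list.files()                    # file names in the current working directory
identical(dir(), list.files())  # dir() is the same function as list.files()
```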
(Refer Slide Time: 03:13)
We have this type of data here; these are the objects available in this workspace. Now we come back to our slides and understand some more basic operations.
Suppose you start working in the R software and you would like to know the location of your working directory. The working directory is the place where R reads and stores your files by default. In order to know its location, we have the command getwd.
This means get working directory, and you write getwd followed by parentheses, that is, getwd(). If I execute this command on the R console, I get this outcome. This outcome will be different when you execute it on your computer, because this is the location on my laptop.
You can see that the location is "C:/Users/Shalabh/Documents" etc. Now suppose you want to change it.
Suppose I have created a folder somewhere, copied all my files into it, and I would like all my files as well as the output to be stored in that particular directory. This means we want to change the working directory. The question is how to get it done. For that we have the command setwd, which means set working directory; after this, all your output will be stored in that particular directory.
Your input data will be read from this directory, and whatever files you read or write in the R software will be taken from this directory. Just for the sake of understanding, I have created a folder Rcourse, with a capital R followed by c-o-u-r-s-e, on the C drive, and I want the R software to read files from this directory and store all my output in this directory.
In simple words, I want to change the working directory to the folder Rcourse on the C drive. For that I write the command setwd with the path C:/Rcourse inside the parentheses and within double quotes, that is, setwd("C:/Rcourse"). After that, if I check the working directory with getwd(), I get this new address.
One point I would like to make: at present I am working on a Windows system, but if you are working on any other system, you have to write the path the way that platform requires. You can see that the output on the R console goes like this.
getwd() gives me the directory "C:/Users/Shalabh/Documents"; after that I change the working directory, and if I then get the working directory, it shows "C:/Rcourse". So, you have to be watchful about how you write the path when working on Unix or Macintosh. But before going further, let me execute these commands on the R console also.
So, if I write the command getwd(), I get the working directory. You can see that it is "C:/Users/Studio-3/Documents". This is not the same computer on which I prepared my slides; this is the computer where I am recording. So, this is the address. Now I write setwd, because I would like to change my working directory.
You can see that if you do not get any message or error, the job is done. Now, if you want to see the working directory, you use the command getwd(), and you can see that it is now "C:/Rcourse".
So, now we come back and try to understand some more basic operations. Another very
basic, fundamental requirement when you execute a program is the ability to interrupt
it. Suppose you start a computation that takes some time, and while it is running you
realize that there is some mistake, so you want to stop it.
For that, you simply press the Escape key on your keyboard (this is in the RGui window;
in a terminal session, Ctrl+C does the same job). So, this is a very simple command.
(Refer Slide Time: 09:38)
I have already talked about the next command a couple of times, but to complete the
picture I will explain it here again. The question is how to clear the GUI window in R.
To clear the contents of the GUI (Graphical User Interface) window, we simply press two
keys together: the Ctrl key and the L key on the keyboard. So, if you press Ctrl+L, your
screen will be cleared. Although we have done it many times, I would still like to show
you. Suppose I want to clear this screen, which is the RGUI window. I simply press the
key marked c-t-r-l and then press L.
(Refer Slide Time: 10:39)
You can see that as soon as I do it, the screen is cleared. If I have entered many lines
and I want to return to the first line, I press Ctrl+L again and I come back to the first
line. These are very basic operations, but they are needed.
The next command is for when I want to search for some information on the web. One
option is to go to any search engine, type the question there and get it done, but R
also has a provision for this. To search the web for information and answers regarding
R, we can use the command RSiteSearch.
Now, you have to be very careful when you read this command: the R, the S of Site and
the S of Search are capital letters, and all other letters are lowercase. Once you write
this function, inside the parentheses you write what exactly you want to find.
If you type this command on the R console, control goes to the site
search.r-project.org, and it searches for that word on this website. The difference is
that if you go to a search engine such as Google and look for that word, it will possibly
give you many, many websites where it is available, but when you do it from inside the
R software, the search is restricted to this site only.
For example, if I want to know about mode (m-o-d-e), I type on the R console capital R,
capital S, then lowercase i-t-e, then capital S, and then e-a-r-c-h in lowercase. So, if I
write RSiteSearch("mode"), that is, mode within double quotes inside the parentheses,
you will see a screen like this one, and after that the browser opens automatically.
So, let me show you this: a website like this one opens, and from there it gives you the
different information related to the word "mode". You can see "mode" here. Let me show
you this on the R console also. I copy this command so that I save some time and do not
make any mistake.
As soon as I run it, it takes us to a website like this one, as you can see.
Now there is various information, for example multimodal, modes, one-mode projection,
and you can simply look through it. On the R console it looks like what I showed you, and
the screenshot is also here.
So, after this I come to a very interesting issue that many people ask me about. Whatever
I am telling you may look very simple and very common, but when somebody learns the R
software for the first time, these are the questions that come to mind, and they need
to be clarified; otherwise they will always remain a confusion.
Let me take an example and then explain what I mean. Suppose I give the command
x = 1:200, that is, 1 colon 200. You will see in the further lectures that this command
gives the numbers 1, 2, 3, 4, up to 200. So, you can see here 1, 2, 3 and then up to 200.
Now, look at the first column, where I have drawn a green box. There are numbers like
[1], [19], [37], [55], etc. What do these numbers indicate? The answer is very simple.
The numbers in square brackets in the first column indicate the index, or position, of
the first value in that line. They simply count the position of the starting value.
For example, this line contains 1, 2, 3, 4, 5 up to 18, and the next value is 19, so [19]
is shown at the start of the next line. Then it continues 20, 21, etc. up to 36, and the
next line starts with [37], because 37 is the first value in that line.
Because these values are the sequence 1 to 200, the index [37] and the value 37 happen to
coincide. In general, the printed values can be anything; the numbers inside the square
brackets only indicate the index of the first value in that line. For example, take 91
here. What is this 91?
The value 91 is one of the numbers from 1 to 200, but the 91 inside the square brackets
indicates the position of the first value in this line. Now there is a confusion:
different people who print the numbers 1 to 200 on their R console will see different
numbers in the brackets. Why does this happen?
Because the width of this GUI window can be changed by the user. If you want to make it
smaller or bigger, you can do it. So, if the width is reduced, for example, so that only
10 values fit in a line, then the values are split after every 10 values, and the next
line begins with [11]. It is as simple as that.
For example, here I have reduced the width of the GUI window and again printed the
numbers 1 to 200. Now you can see 1, 2 up to 10 in the first line, and then the value at
the 11th position is indicated by [11]. So, the bracketed number is the position, and the
other numbers are the values.
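The effect of the window width can also be reproduced without resizing anything. A minimal sketch, assuming the width option stands in for the RGui window width (the printed layout depends on it); the names wide, narrow and idx are illustrative only.

```r
x <- 1:200

# Print the same vector at two console widths; the width option plays the
# role of the RGui window width in the lecture
op <- options(width = 80)
wide <- capture.output(print(x))
options(width = 30)
narrow <- capture.output(print(x))
options(op)                      # restore the previous width

cat(head(wide, 2), sep = "\n")   # fewer, longer lines
cat(head(narrow, 2), sep = "\n") # more, shorter lines

# The bracketed number at the start of a line is the index of its first value
idx <- as.integer(sub("^[[:space:]]*\\[([0-9]+)\\].*$", "\\1", narrow[2]))
x[idx]                           # for 1:200 the value equals its index
```

The values are identical in both printouts; only the line breaks, and therefore the bracketed indices, change.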
So, let me show you this on the R console so that this type of confusion does not remain
in your mind when you work in the R software. First, let me resize the RGUI window like
this.
(Refer Slide Time: 19:03)
Suppose I set x = 1:100 and print x. You can see the values 1, 2, 3 up to 9, and the 10th
position is marked here. Similarly, by looking at [55], I can see that the index of the
first value in that line is 55. Now suppose I make this window smaller and look at the
same thing again. First, I clear the screen by pressing Ctrl+L.
(Refer Slide Time: 19:48)
If I type x now, you can see there are only four values in each line: 1, 2, 3, 4 in the
first row, and the second row starts with [5], and so on. This may raise a question: why
are the first value and the index number the same here?
(Refer Slide Time: 20:11)
So, let me print some other numbers, say x = 101:150. Now you can see 101, 102 up to 109,
which is 9 values, and then the next line is marked [10], but the value there is 110.
That means 110 occurs at the 10th position. Similarly, if you want to know which value is
at the 37th place in the sequence, it is 137. This is the type of information the
bracketed numbers give you.
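The same position-versus-value distinction can be checked by indexing the vector directly. A small sketch using the lecture's sequence 101:150:

```r
x <- 101:150

x[10]   # the value at the 10th position: 110
x[37]   # the value at the 37th position: 137
```

The bracketed [10] and [37] on the console are positions; 110 and 137 are the values stored there.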
(Refer Slide Time: 20:58)
Now, if I reduce the width and type x again, you can see 101 up to 104, and then at the
5th position we have 105. So, in the first case 110 was the first value in the second row,
but now 105, at the 5th position, is the first value in the second row. This is the
meaning of these numbers. Now I clear the screen and come back to our slides; I hope you
have understood this aspect.
These are very simple things that many people ask me about, so I thought, why not include
them here. Now, one more caution. Whenever we work in the R software, we install some
packages, and to use those packages we load them with the function library.
Now suppose I quit the R software and close it. When I restart R, all the libraries that
were loaded earlier and that I want to work with have to be loaded again. You cannot
expect that once you have loaded a library, it will always be there.
Once you close the R software or shut down the computer, these libraries are
automatically unloaded. (The packages stay installed; only the library() call has to be
repeated.) Many times it happens that you are working on your computer, something
happens, you restart the R software and continue working, but it gives you errors, and
people are unable to understand what is happening.
They think that they just loaded this library, but since they restarted the computer or
the software, those libraries were unloaded and need to be loaded again. That is very
simple.
Another point of concern: in R, you will often see numbers like this. For example,
5.2345, then a lowercase e, and then +07, that is, 5.2345e+07. What is the meaning of
this? It simply means 5.2345 × 10^7.
So, the part e+07 means "multiplied by 10 raised to the power 7". Similarly, if it is
written as 5.2345e-07, the number is 5.2345 × 10^-7, so e-07 means "multiplied by 10
raised to the power -7". Sometimes people misinterpret it as the exponential function or
something else. Please do not do this.
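The meaning of the e notation can be confirmed by comparing it with the explicit power of 10. A minimal sketch; all.equal() is used because decimal fractions are stored only approximately in floating point.

```r
a <- 5.2345e+07   # 5.2345 multiplied by 10 to the power 7
b <- 5.2345e-07   # 5.2345 multiplied by 10 to the power -7

# all.equal() compares up to floating-point tolerance
all.equal(a, 5.2345 * 10^7)       # TRUE
all.equal(b, 5.2345 * 10^-7)      # TRUE

# The "e" is not the exponential function exp()
isTRUE(all.equal(b, 5.2345 * exp(-7)))   # FALSE
```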
Now, when we work in the R software, we also need to clean up, and sometimes we want to
remove variables. When we do some programming in the R software, we always assign names
to the variables.
For example, if there is some data on the heights of students, I can define a variable
called height and store the data in it. This variable remains stored in the R workspace
as long as you do not remove it yourself. What is the consequence? Suppose you start
working on another problem and you define a variable named height once again.
What will happen is that the new values of height overwrite all the values you had stored
in the variable height for your earlier problem. The older values are removed and the new
values take their place. R does not give you any option here: it will not ask you whether
you want to overwrite or not. Once you press Enter, the old values are gone.
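The silent overwrite described above takes only two lines to see. A minimal sketch with hypothetical height data:

```r
height <- c(150, 160, 170)   # hypothetical heights from a first problem
height <- c(140, 145)        # same name reused: the old values are replaced
height                       # only the new values remain
```

R prints no warning at the second assignment, which is why reusing names across problems is risky.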
So, sometimes you want to clean up the window and remove the variables. In fact, it is a
good practice to remove the variable names given to any data frame or data set at the
end, after you finish your work in the R software. That way this type of confusion will
not arise and there will be no problem. So, the question is how to remove variables.
We have the command rm; rm is short for remove. If you type rm and write the variable name
inside the parentheses, this command removes the variable. You can remove one variable or
more than one variable at the same time. For example, if you want to remove three
variables x, y and z, you write rm(x, y, z), and this removes all three variables from
your R workspace.
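A minimal sketch of rm() with one and with several variables. It uses exists() with inherits = FALSE to check only the current environment, which is an addition for this example rather than a command from the lecture.

```r
x <- 1; y <- 3; z <- 5

rm(x)                           # remove a single variable
exists("x", inherits = FALSE)   # FALSE: x is no longer in the workspace

rm(y, z)                        # remove several variables at once
exists("y", inherits = FALSE)   # FALSE
exists("z", inherits = FALSE)   # FALSE
```

Typing x at the console after rm(x) gives the error "object 'x' not found", as in the demonstration that follows.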
Before I go further, let me show you these things.
(Refer Slide Time: 27:03)
So, suppose I take a value x = 1. You can see that x is 1. Now suppose by mistake I write
x = 3. As soon as I press Enter, R does not tell me that there was already an x; if you
now look at the value of x, it has been overwritten and is now 3. Similarly, I define
y = 3, z = 5, and so on.
Suppose I want to remove x, y and z. The first option is to type rm(x), with only x
inside the parentheses. Now, if you try to look at x, it has been removed, and R gives the
message "Error: object 'x' not found". That is obvious, because you have removed this
variable. Similarly, if you want to remove y and z together, you write rm(y, z), and now
if you look for y and z, both are missing.
So, this is how you can remove variables while you work. Similarly, if you recall, we
also discussed the command detach. This command removes an object, such as an attached
package or data frame, from the search path, that is, the list of places where R looks
for objects. The detach command is actually a very general command.
It is used for packages, data frames and other kinds of objects. Although we have not yet
covered data frames and related concepts, I would like to mention it at this moment, and
I will explain it once again later when we work on those concepts.
So, data frames, data sets, functions and packages that have been attached, for example
with the library function, can be removed from the search path by the detach function.
And above all, if you want to remove everything, including the data frames along with the
variables, you can use the command rm(list = ls()), that is, rm with list = ls() written
inside the parentheses.
I will not execute it here on my computer; otherwise everything will be cleaned. But this
is the command to remove everything from your workspace, including the data frames.
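Because rm(list = ls()) clears the whole workspace, a safe way to see what it does is to apply the same idea inside a scratch environment. A minimal sketch; the environment scratch and the variables height and weight are hypothetical, and new.env(), assign() and the envir arguments go beyond what the lecture covers.

```r
scratch <- new.env()                  # a throwaway environment for the demo
assign("height", c(150, 160), envir = scratch)
assign("weight", c(50, 60), envir = scratch)
ls(envir = scratch)                   # "height" "weight"

# ls() supplies the names of all objects; rm(list = ...) removes them all
rm(list = ls(envir = scratch), envir = scratch)
ls(envir = scratch)                   # character(0): nothing is left

# In the console itself, the lecture's command is simply:
#   rm(list = ls())
```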
For example, suppose I load a package using the command library(cluster). This loads the
cluster package. Then, if I write detach("package:cluster"), that is, the word package, a
colon, and cluster, it removes the package from the search path. If you then try to use it
again, R will say that it is not available.
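A minimal sketch of attaching and detaching a package. It swaps in the splines package for cluster, on the assumption that splines ships with every R installation but is not attached by default; search() lists the search path.

```r
library(splines)                   # attach a package (splines ships with R)
"package:splines" %in% search()    # TRUE: it is on the search path

detach("package:splines")          # remove it from the search path
"package:splines" %in% search()    # FALSE
```

The package stays installed after detach(); calling library(splines) again attaches it once more.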
(Refer Slide Time: 30:19)
Finally, when you have completed your job and want to quit R, the command is q(). Once you
type q() on the R console, it asks whether you want to save the workspace image: Yes, No
or Cancel. What does saving the workspace image mean?
It means that whatever you have done up to now, all the variables you have defined in
this session and all the data sets you have imported in this session, will be saved
(in a file named .RData in the working directory) if you press Yes; otherwise everything
will be lost. If you do not want to save them, simply choose No. It is very simple. Let me
show you what happens when I quit on the R console itself.
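Answering Yes in q() is roughly equivalent to saving the workspace with save.image(). A minimal sketch that does this without quitting, assuming a file in tempdir() as a stand-in for .RData in the working directory; height_demo and image_file are hypothetical names.

```r
height_demo <- c(150, 162, 171)                # hypothetical session variable
image_file <- file.path(tempdir(), ".RData")   # stand-in location for the demo

save.image(file = image_file)   # roughly what answering "Yes" in q() does
rm(height_demo)                 # simulate losing the session's objects
load(image_file)                # a saved image brings them back
height_demo
```

To skip the question when quitting, q() also accepts save = "yes" or save = "no".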
(Refer Slide Time: 31:22)
You can see that I first clear the screen by pressing Ctrl+L. Now, if I type q() and press
Enter, R asks me "Save workspace image?" with Yes and No. Depending on whether I choose Yes
or No, it proceeds accordingly. So, now we come to the end of this lecture.
You can see that this was a very basic, elementary lecture in which I covered a number of
small topics that are very important to learn when you work in the R software. There are
some more such commands, and I will try to cover them in the next lecture; otherwise, too
many commands at once would confuse you.
That is why I have taken only a selected number of commands. Some of these commands I had
already covered in the last couple of lectures, because they were needed at that time. I
can share with you very honestly that whenever we learn a software, we have two options.
The first option is to go through the theory first and then come back to the software and
learn the commands. The second option is to jump straight in, that is, to simply start
the software and begin doing something.
In this process, we learn some things at that moment itself, and a few things we learn
later on; that is why I often say that we will learn something in the forthcoming
lectures. I personally believe that this second approach makes learning more interesting
and easier to understand. Yes, I agree that in the beginning there will be many commands
that you have not seen.
But I have taken care not to use any complicated commands; I use only very simple commands
that are not difficult to understand. As we move further, you will gradually learn all
those commands, and by the end there will no longer be any need to say "we will see it in
the next couple of lectures".
So, have a look at the commands we have learned today, practice them, revise them, and I
will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Introduction
Lecture - 07
Some More Basic Operations in R
Hello friends, welcome to the course Foundations of R Software. You will recall that in
the last lecture we started a small discussion on some basic elementary operations in the
R software. In this lecture we are going to continue along the same lines and demonstrate
some more basic, fundamental operations to understand how the R software functions. So,
let us begin our lecture and learn some more aspects.
The first basic command I am going to tell you about is the greater-than sign, >. I have
already mentioned it many times and you know it, but to complete this lecture I need to
introduce it formally and tell you a bit more about it. This is the prompt sign: when you
come to the R console, you can see this symbol.
This is the place after which you write your commands. Next, consider the problem of how
to assign a value to a variable, that is, the assignment operator in the R software.
There are two assignment operators: one is the left arrow, written as a less-than sign
followed by a dash, <-, and the second is the usual equals sign, =.
When R started, there was only one assignment operator, the less-than sign and dash, <-.
This was the same operator that was available in the S-PLUS software, which is possibly
why the same operator continued in R. Later, when the R software was updated, the equals
sign was also incorporated as an assignment operator.
If you ask a hard-core programmer, they can possibly explain the differences between these
two symbols. But for users like us, and for simple assignments like the ones in this
course, it practically makes no difference whether we use the less-than and dash symbol or
the equals sign. From now on, I will mostly try to use the equals sign, but in many
resources and books you will also see the operator <-. So, please do not get confused:
both do the same thing here, and I have explained the reason too.
For example, if I want to assign the value 20 to x, I write x <- 20, that is, x, less than,
hyphen, 20. This assigns the value 20 to x. Similarly, if you write x = 20, this does the
same thing and also assigns the value 20 to x.
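Both assignment operators can be tried side by side. A minimal sketch; the names a and b are only for this comparison.

```r
x <- 20   # assignment with the left arrow
x
x = 20    # assignment with the equals sign
x

a <- 20
b = 20
identical(a, b)   # TRUE: both operators stored the same value
```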
So, I can show you in the R console that if you write x <- 20, x becomes 20, and when you
write x = 20, x also becomes 20. Once you have assigned a value to a variable, it is also
possible to assign a variable to another variable. For example, I choose x = 20 and then
define a new variable as x * 2; the star sign * is the multiplication sign.
So, 2 times x is assigned to a new variable y. After that, the value of x + y, which is
built from variables, is assigned to a new variable z. With these three statements, I can
show you that a numerical value can be assigned to a variable, and a variable can also be
assigned to another variable.
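The three statements described above, written out as a short sketch:

```r
x = 20
y = x * 2   # y is computed from the variable x
z = x + y   # z is computed from the variables x and y
y           # 40
z           # 60
```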
So, if I write y = x * 2, the value of y becomes 40. And if I write z = x + y, then x + y
is 20 + 40, which is 60. I will show you this on the R console also, so that you feel more
confident.
So, suppose I write x = 20. You can see that the value 20 is now assigned to x. You have
already done this many times, but now I am telling you formally.
On the other hand, suppose you write x <- 40. You can see what happens: now the value of x
is 40. But if you write x <- 20 once again, you can see that the value of x changes back
to 20.
(Refer Slide Time: 05:28)
Now suppose I define x = 20 and then define another variable y as 2 * x. You can see that
the value of y becomes 40: the value of x comes from the first variable, and from it the
value of y is obtained. Similarly, if you define z = x + y, you can see that the value of z
comes out to be 20 + 40, which is 60. So, both values and variables can be assigned to a
variable.
Now, when you assign a value to a variable, there are two types of values: numerical
values and characters. I have already discussed that numbers are assigned to a variable
in the usual way, as in x <- 20 or x = 20. But when you assign a character, a word or a
string of characters to a variable, it has to be written within double quotes or within
single quotes.
For example, suppose I want to assign the word apple (a-p-p-l-e) to a variable x. I write
double quotes and, within the double quotes, the value, here apple, so x = "apple". This
assigns the value apple to the variable x, and you can also use the operator <- here, as in
x <- "apple". Similarly, if you want to define apple1, you write x <- "apple1", which
assigns apple1 to the variable x. Besides this, you can also assign the value with single
quotes instead of double quotes. Whether you use the equals operator or the less-than and
dash operator, these statements assign the word to the variable x.
So, here you can see that I have defined x = 20, and here I am assigning the word within
double quotes and also within single quotes. In both cases there is no issue; both are
acceptable. But keep in mind that when you print the value, R always shows it within
double quotes.
So, by looking at the output you cannot judge whether the value was assigned with single
quotes or double quotes. Now let me show it on the R console, so that you feel more
confident.
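A minimal sketch showing that the quote style does not change the stored value or the printed output:

```r
x = "apple"     # double quotes
y = 'apple'     # single quotes
z <- "apple1"   # the left-arrow operator works the same way

identical(x, y)  # TRUE: the quote style leaves no trace in the value
print(y)         # printed with double quotes, whichever quotes were used
```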
So, suppose I write x = "apple"; you can see that x now becomes apple. Instead of this, I
now use single quotes, x = 'apple', and you can compare: it again shows "apple". Similarly,
if I define y <- "apple1", you can see that it shows "apple1", or I can use the equals
symbol instead, and again it shows "apple1". So, this is the basic structure we use in the
R software. Now let us understand another aspect of R programming.
(Refer Slide Time: 09:19)
Whenever you write a program, you handle some values. As you have seen, these values can
be numerical values or characters. So, the question is: whenever a value comes into your
program through some command, you would like to know whether it is a numerical value or a
character.
How can you do that? This type of situation is very common when you write a program. For
example, your program may read data from a file somewhere else, and you may have absolutely
no idea whether the data are numerical, characters, a data frame, a list, etc. You cannot
always look inside the file to check whether all the values are numerical or some of them
are characters, so you need a way to find this out.
To check whether a value is a number, we have the command is.numeric(), and within the
parentheses you write the variable. To check whether a value is a character, the command is
is.character(), spelled i-s-dot-c-h-a-r-a-c-t-e-r, and is.numeric is spelled
i-s-dot-n-u-m-e-r-i-c. In both cases, everything is in lowercase.
The outcome of these two commands, is.numeric() and is.character(), is either TRUE or
FALSE. I will take up this topic in a later lecture, but let me tell you here that TRUE and
FALSE are written in capital letters: capital T, capital R, capital U, capital E, and FALSE
entirely in capital letters. Instead of TRUE you can also write just a capital T, and
instead of FALSE just a capital F.
TRUE and FALSE are reserved words. Reserved means that these words are kept only to
indicate the logical values TRUE and FALSE. We will talk about logical variables later, but
at this moment I can tell you that you cannot use the words TRUE and FALSE, in all capital
letters, for anything else. The short forms T and F are different: they are ordinary
variables that R presets to TRUE and FALSE, so they can be overwritten, and writing TRUE and
FALSE in full is safer. One more thing to keep in mind: spellings such as true, True or
false, in small letters or mixed case, are something different; only the all-capital words
are reserved.
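The difference between the reserved words TRUE/FALSE and the shortcuts T/F can be checked directly. A minimal sketch; eval(parse(...)) and tryCatch() are used only so the failing assignment to TRUE can be shown without stopping the script.

```r
T            # TRUE: T is preset to the value TRUE
T <- 0       # allowed, because T is an ordinary variable
isTRUE(T)    # FALSE: the shortcut has been overwritten
rm(T)        # remove our T; the preset T from base R is visible again
isTRUE(T)    # TRUE

# TRUE itself is reserved, so assigning to it fails
res <- tryCatch(eval(parse(text = "TRUE <- 0")), error = function(e) "error")
res          # "error"
```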
So, the answer to these two commands is a logical value, either TRUE or FALSE, and you have
to understand what it is saying. For example, take x = 20. You know that this is a number,
so it is a numeric value.
If you use the command is.numeric() with x inside the parentheses, the answer is TRUE,
which means yes, x is numeric. Now, suppose you use the command is.character() with x
inside the parentheses.
Here x is 20. Is x = 20 numeric or character? It is numeric, not character. So,
is.character(x) gives the answer FALSE, which means that x is not a character. But there is
one thing you have to keep in mind when you interpret these results.
You have to be careful when you interpret them. When is.numeric() is TRUE, that means the
value is numeric. But when is.character() is FALSE, it only tells you that the value is not
a character; it does not tell you what the value actually is. It could be a number, or it
could be something else.
If you execute these commands on the R console, you will get the same results I showed you
here. Now let me take one more example, this time a character: I write y = "apple", with
apple within double quotes. This is a character. If I use the command is.character(y), the
answer is TRUE: yes, the apple you have stored in the variable y is a character.
When you use the command is.numeric(y), the result is FALSE. Obviously, y is apple, which is
a character, so when you check whether it is numeric, R tells you: no, it is not numeric,
and the statement is FALSE. But if you ask me what it is, whether a matrix, a data frame, a
list, etc., this result does not tell you. So, you have to interpret the result in this
particular way, and if you do it on the R console also, you will see the same outcome.
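Both examples from the lecture, x = 20 and y = "apple", can be checked together:

```r
x = 20
is.numeric(x)     # TRUE
is.character(x)   # FALSE

y = "apple"
is.character(y)   # TRUE
is.numeric(y)     # FALSE
```

Each FALSE only rules out one type; it does not say which type the value actually has.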
And then the question comes: if you have a value that is either numeric or character, can you interchange its role? Why not? Suppose I have a value, for example x equal to 20. To check whether it is a number, I use the command is.numeric(x), and it is TRUE: yes, this is a number. But suppose I want to use this value as a character. What is the difference between a number and a character? If you have two numbers, 20 plus 30, both are numbers, so their sum is 50.
But if something is a character, apple plus banana or apple plus 50, you cannot find the sum. Yet in some experiment the value 20 may indicate a category rather than a quantity, so I want to use it as a character. By default, however, R, like any mathematical software, treats a number as numeric. So, the question is: how can you convert a numeric value into a character?
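The difference can be tried on the console with the sketch below. It uses tryCatch(), which is not covered in this lecture, only to capture R's error message so that the script keeps running; the variable name msg is just for illustration:

```r
20 + 30   # 50: both values are numbers, so they can be added

# Adding a character and a number is not allowed, so R stops with an error.
# tryCatch() catches that error and returns its message instead.
msg <- tryCatch("apple" + 50, error = function(e) conditionMessage(e))
msg       # "non-numeric argument to binary operator"
```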
To convert a value into a number, the command is as.numeric, with the variable name inside the parentheses. To convert a value into a character, the command is as.character, again with the variable name inside the parentheses. Both commands, as.numeric and as.character, are written in lowercase letters.
So now, take the same example: x equal to 20, which is numeric. I apply the command as.character(x) and store the outcome in y. Now let us check whether this y is numeric.
For that you use the command is.numeric(y), and you get FALSE: although x was numeric, y is no longer numeric. Next you would like to check what y is. Is it a character? The answer is TRUE. Why? Because you applied the command as.character to change a number into a character, so y has now become a character. You can see this in its value: x was 20,
but y is now shown as "20" within double quotes. That means y is now a character, and you can see this in the screenshot of the same operation.
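The conversion described above, as a short sketch (y is printed with double quotes because it is now a character):

```r
x <- 20
y <- as.character(x)   # convert the number 20 into the character "20"
is.numeric(y)          # FALSE: y is no longer a number
is.character(y)        # TRUE
y                      # printed with quotes: "20"
```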
(Refer Slide Time: 17:52)
Now suppose you do the opposite: I have a character and I want to convert it into a number. What happens? Suppose I take the value apple, which is a character, and store it in y. Now I use the command is.numeric(y), which means I want to know whether y is numeric or not.
The answer comes out to be FALSE. Obviously, y is apple, a character, so it cannot be numeric. Now I check is.character(y), that is, whether y is a character, and the answer comes out to be TRUE: yes, y is a character.
Now I try to convert this y into a number. I use the command as.numeric(y) and store the result in a new variable z. Now R gives us a warning message. Whenever you work in the R software, you will sometimes get messages. These messages are of primarily two types: one is a warning, and the second category is an error.
So, what is the difference between warning messages and error messages? A warning is just that, a warning: R has done something, but possibly not exactly what you intended. An error message means there is a mistake, and the program cannot execute beyond that point. So, when you get a warning message, the program will continue.
But R is warning you that something unusual has happened, so please look at it and be careful. When there is an error message, the program simply stops there and does not move forward. Because there is a mistake, an error, the program will not run until you correct it. Here R gives you the warning message "NAs introduced by coercion", where coercion means a forced conversion.
NA stands for "not available" and represents a missing value; I will discuss it in more detail in the further lectures. Now, if you check is.numeric(z), it tells you TRUE: z is numeric. But what is the value of z? Tell me one thing: you are trying to convert apple into a number. What do we expect? Is it possible? No.
That is why, when you force the conversion, R replaces the word by NA, and if you check whether z is a character, the answer is FALSE. NA may look like a character at this moment, but it is not one. After I give you the details of the two reserved words NA and NULL, you will understand that, just like TRUE and FALSE, these are also reserved words.
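A sketch of the reverse conversion. The last line uses is.na(), which is not discussed in this lecture, to confirm that z holds a missing value. Note that R still assigns z after the warning, whereas an error would have stopped execution:

```r
y <- "apple"
z <- as.numeric(y)   # Warning message: NAs introduced by coercion
z                    # NA
is.numeric(z)        # TRUE: this NA is a numeric missing value
is.character(z)      # FALSE
is.na(z)             # TRUE: is.na() tests for missing values
```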
So, this is how we work in the R software when we convert a numeric value into a character or vice versa. You can see here the screenshot of what I have explained, but I would also like to show it to you on the R console. So, let me copy these commands so that I can save some time on the R console.
(Refer Slide Time: 21:44)
First, let me clear the console by pressing Ctrl + L, which you already know. Now if I take x equal to 100 and use is.numeric(x), the answer is TRUE. And if I ask whether it is a character, it is FALSE. But if I take y as apple and ask whether y is numeric, the answer is FALSE.
And when I ask whether it is a character, the answer is TRUE: yes, this is a character.
So now you have seen that x equal to 100 is a number and y equal to apple is a character. I would like to convert the number into a character and the character into a number. To convert a number into a character, the command is as.character, so I use as.character(x).
You can see that the output is now 100 within double quotes, and the double quotes indicate that it is a character. So, by comparing this 100 with that 100, you can very easily identify which of them is numeric and which is a character. Similarly, if you want to change the character into a number, your y is apple and you use as.numeric(y).
You can see that the value comes out to be NA, together with the warning message "NAs introduced by coercion", meaning by force. You are forcing R to do something that is not possible, so it follows your order as best it can and replaces apple by the value NA. So, this is how things work when we work with characters and numbers in the R software.
Now, after this, one more aspect. Whenever you write a program, as a good programmer you would like to add some information that can be used either by yourself or by somebody else. For example, suppose I am sitting at IIT Kanpur and I am writing a program in which I use three variables: x is height, y is weight and z is age.
Now when I send this program to somebody outside IIT Kanpur, how do I communicate this? If I describe it in a different file, that will be very confusing. So, within the program itself I would like to mention that x is this, y is this and z is this. So, how do we get it done?
In the R software, if you write the hash symbol # at the beginning of a line, then whatever you write after it becomes only a comment, and no mathematical operations are done on it. For example, suppose I write # mu is the mean.
If you press Enter, nothing happens: no warning, no error, etc. It is accepted, because as soon as you write a hash, all the characters after it up to the end of the line are simply ignored. R knows that it does not have to operate on this line and skips it.
Similarly, if you write # x = 20 or # x <- 20, it is considered only a comment. For example, if you write x <- 20 without the hash, R assigns the numerical value 20 to x. But when you put the hash in front of it, no assignment takes place.
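A small sketch of comments in R. Besides whole-line comments, R also ignores everything after a # that follows code on the same line, which is a common way to annotate an assignment:

```r
# mu is the mean
x <- 20   # the assignment runs; this note after the hash is ignored
# x <- 30
x         # still 20, because "# x <- 30" is only a comment
```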
(Refer Slide Time: 26:00)
And one very important thing: R is case sensitive, so small (lowercase) letters and capital (uppercase) letters are different. For example, a lowercase x and a capital X are two different names. I will show you this on the screen output and then on the R console.
Suppose you write lowercase x equal to 20 and then type x; it will come out to be 20. But when you type capital X and press Enter, it will say "Error: object 'X' not found", because you have defined a small x, but you are asking for the value of capital X. If you now assign the value capital X equal to 30, then the value of capital X will come out to be 30 and the value of small x will still be 20. I will show you this.
So, you have to be very careful when you work in the R console: lowercase and uppercase letters are different. That is the reason why, whenever I explain function names and commands to you, I clearly tell you whether a letter is lowercase or uppercase.
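A sketch of case sensitivity, again using tryCatch() (not covered in this lecture) only to capture the error message for the undefined capital X instead of stopping the script; the name msg is illustrative:

```r
x <- 20
# Capital X has not been created yet, so asking for it is an error
msg <- tryCatch(X, error = function(e) conditionMessage(e))
msg       # "object 'X' not found"
X <- 30
x         # 20
X         # 30: x and X are two different objects
```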
So, before we move forward, let me try to show you these two commands on the R
console and then we will move forward with some more commands. So, now, you can
see here.
Suppose I write mu is the mean; R will not understand it and will report an error, unexpected symbol in "mu is". But if I put the hash symbol in front and press Enter, R simply accepts it and there is no issue.
Now, if I type 20 + 40, the answer comes out to be 60. But if I put a hash symbol before this 20 + 40, let us see what happens. Nothing happens; there is no value. So, that is the difference: when you put a hash at the beginning, R does not execute whatever is written after it. Now suppose I set capital K equal to 5.
If I type K, R gives its value, 5. But if I type # K, there is no outcome, and even if I type # k with a small k, there is no outcome either, because both lines are comments. So, that is what I wanted to show you. After this, let me show you the difference between lowercase and uppercase letters.
(Refer Slide Time: 29:17)
Suppose I take capital L equal to 30, and then I ask for small l. I am not using small x and capital X, because on the screen it may be difficult for you to see the difference between them clearly; that is why I have taken l here. Now if I set small l equal to 20, you can see the result.
Now small l gives one value, and if you type capital L, it gives you the other. Similarly, if you take another variable, capital M equal to 30, and try to find the value of 2 * m with a small m, R says no: "object 'm' not found", because you have defined the value as capital M but you are using small m.
But if you change it to capital M, it gives you 60. You have to keep this in mind when you do programming: these are very common mistakes, and sometimes they take a very long time to find. Similarly, another common mistake is confusing the digit 0 (zero) with the letter o.
(Refer Slide Time: 30:26)
You may write 0 when you want o, or write o when you want 0, and it is very difficult to spot. Particularly when you write a mathematical expression where you have typed o in place of 0, it takes a considerable amount of time to find the mistake, ok.
Now I come to another aspect: combining values into a data vector. You see that I am using a new term here, data vector. Do not get confused with the word vector as used in physics or vector calculus. Here a data vector simply means several values combined into one object.
For example, suppose you want to combine the five values 1, 2, 3, 4, 5 and give them as an input to the R software. The command is a small c, and within the parentheses you write all the values, separated by commas. This combines the values 1, 2, 3, 4, 5.
If you try something else, I can show you with this screenshot what happens. Suppose I want to combine 1 2 3 4 5 in a data vector y. If I simply write y = 1 2 3 4 5, the outcome is an error. Similarly, if I write y = (1, 2, 3, 4, 5) with the values within parentheses, there is still an error.
But if you write y = c(1, 2, 3, 4, 5), with a small c, meaning lowercase c, and the values within parentheses, then y contains the values. So, remember one thing, which is very important for R programming: whenever you want to give more than one value to the R software, use the lowercase command c and define the data vector with it.
This is very important, so before moving on, let me also show it to you on the R console, so that you are clear about it.
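A minimal sketch of building a data vector with c(); length(), which counts the values, is used here only as an illustration:

```r
y <- c(1, 2, 3, 4, 5)   # c() combines the five values into one data vector
y                       # 1 2 3 4 5
length(y)               # 5 values in a single object
# Without c(), lines such as  y <- 1 2 3 4 5  or  y <- (1, 2, 3, 4, 5)
# are syntax errors, and y is not created.
```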
You can see that if I write y = 1, 2, 3, 4, 5, R gives an error and the values are not assigned. Even if I put them within parentheses, it still gives me an error. But when I write the small letter c, lowercase c, in front of the parentheses, it assigns the values.
So, this is the way. Whenever you do mathematical or other operations on several values, you need to enter them in this format using the c command; otherwise things may not work. So, be very careful when you do it, ok. Now I would like to give you one more command: mode, spelled m o d e.
The command mode gives the type or storage mode of an object. For example, if you set x equal to 6, this is a number, so the mode of x is numeric. To find the mode of a variable, you use the command mode and write the name of the variable within the parentheses.
Here I would like to give a caution to all of you. When I use the word mode, some people get confused and think it is the mode as in mean, median and mode, which we study in statistics, because the command for the mean is mean and the command for the median is median.
But the command mode does not find the statistical mode, that is, the value with the maximum frequency. So, be careful. In your childhood you learnt chapters like modes of transportation: there are different ways to travel, by ship, by air, by train, etc. This mode is like that mode, a kind or type; it is not the statistical mode.
Similarly, if you take a character, for example a variable y to which I assign the character apple, then the mode of y will be character. So, mode is another operation that helps you when you write a program and when you get data from some external source.
You may get a file with thousands or millions of values, and it is not possible to go through each and every value in the file to know its mode, yet you have to choose the mathematical operations based on it. This is the screenshot of this operation, and I will also show it to you on the R console.
You can see that if I set x equal to 5, then the mode of x is numeric.
But if I take y as apple, then the mode of y is character. In R we have some more modes, which we will use and learn whenever we need them: logical, integer, double, complex, raw, character, list, expression, name, symbol and function. We are not going to cover them here, but whenever you need such modes, you can always look them up.
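A sketch of mode() on the values used above; the last line is an added illustration that mode() reports a type, not the most frequent value:

```r
x <- 6
mode(x)               # "numeric"
y <- "apple"
mode(y)               # "character"
mode(TRUE)            # "logical"
mode(c(1, 2, 2, 3))   # "numeric", not 2: this is not the statistical mode
```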
There is another command related to mode: storage.mode. This command returns the storage mode of its argument, that is, the mode in which the value written inside the parentheses is stored. This command is usually used when we call functions written in other languages such as C or FORTRAN. It ensures that the R objects have the data type expected by the routine being called in those programming languages.
We are not really going to use it much in this course, but it is important for you to know. For example, if the mode is logical, then the storage mode is also logical; when the mode is numeric, the storage mode is integer or double. When the mode is complex, the storage mode is also complex; when the mode is character, the storage mode is also character; and when the mode is raw, the storage mode is also raw.
So, this is for your information. For example, if you want to know the storage mode of x equal to 6, which is numeric, you use the command storage.mode with x within the parentheses, and it gives you the answer double.
Similarly, if you take a logical variable, which I explained earlier, such as small x equal to TRUE, then the storage mode of x is logical. And if you take a character, such as x equal to apple, then the storage mode is character. You can see the same thing in the screenshot of the software. So, before I go further, let me show you these things on the R console, so that you are more convinced.
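A sketch of storage.mode(). The 6L form, which asks R for an integer rather than a double, is not discussed in the lecture and is included only to show the integer storage mode mentioned above:

```r
storage.mode(6)        # "double": plain numbers are stored as doubles
storage.mode(6L)       # "integer": the L suffix requests an integer
mode(6L)               # "numeric": doubles and integers share the mode numeric
storage.mode(TRUE)     # "logical"
storage.mode("apple")  # "character"
```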
(Refer Slide Time: 37:59)
If I take x equal to 78, this is a number, so the storage mode of x is double. But if I take y equal to apple, then the storage mode of y is character, and so on. This is often needed when you write programs for mathematical operations, and based on it you decide your operations, ok. Now, whenever you do some calculations
by writing a program, you sometimes get the answer infinity, and this creates a problem, especially when you are working with a longer program. Somewhere in between you get the value infinity, and the program does not give you the output you want. At that moment we would like to know whether the outcome of a command is finite or infinite.
It may happen that you divide one number by another, but during the computation the divisor has become so small that it is practically 0, and a nonzero number divided by 0 gives infinity. So, how do we detect these things? In the R software, if you want to know whether a number is finite or infinite, we have the commands is.finite and is.infinite. For example, you know that when you divide 3 by 0, the answer is infinity.
In R, if you divide 3 by 0, the answer comes out to be Inf, which indicates infinity. You also know that if you add 5 to infinity, the answer is again infinity; similarly, subtracting a finite number from infinity, or multiplying or dividing it by a positive finite number, still gives infinity. Now suppose there is a variable x equal to 5 plus infinity, written 5 + Inf, and I want to know what the outcome will be.
I ask is.finite(x). Obviously, we have added infinity, so x is not finite, and the answer is FALSE. If I ask is.infinite(x), the answer is TRUE: yes, it is infinite. You can see the screenshot here, and now let me show you these things on the R console. I define
the variable x equal to 6 plus infinity, 6 + Inf. You can see that x is Inf, is.infinite(x) is TRUE, and is.finite(x) is FALSE. This is how you can use such commands while doing programming, and they will help you, ok.
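A sketch of these checks. The last line is an added illustration that not every operation on infinity stays infinite: Inf - Inf has no defined value, and R returns NaN (not a number):

```r
3 / 0            # Inf
-3 / 0           # -Inf
x <- 5 + Inf
is.finite(x)     # FALSE
is.infinite(x)   # TRUE
is.finite(5)     # TRUE
Inf - Inf        # NaN
```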
So now we come to the end of this lecture, and you can see that in this lecture we have understood a variety of things. These are small commands that also give you information about the intermediate steps. So, my request is that you go through these commands, execute them and try to understand what they are doing; that is more important. I am not saying that these are the only commands possible in the R software; there are many more.
But I expect that once you know these commands, if you want to learn more and you consult any book or any good source, it should not be difficult for you to learn them.
At least you will understand what is really happening. Once you know what is.character and as.character do, and you then come to know that there are commands like is.matrix and as.matrix, you will automatically know that the first checks whether a variable is a matrix or not and the second converts it into a matrix.
So, my objective was to choose and give you some representative commands, so that your further learning in the R software becomes easier. So, please practice them, and I will see you in the next lecture. Till then, good bye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 08
R as a Calculator with Scalars and Data Vectors: Addition, Subtraction, Multiplication and Division
Hello friends, welcome to the course Foundations of R Software. You can see that up to now we have covered different aspects of the R software which are pretty elementary, but I believe it is important to understand them so that you get familiar with how the R software functions. As we have discussed many times, R is one of the most popular software for doing computation, simulation, calculations, etc.
So, from today we are going to learn different aspects of computation in the R software. R has a distinctive capability: the way it calculates with scalars and vectors is quite different from many other software. That is why it is important for us to understand how R functions and how R does the calculations.
Once you understand how R does the calculations, it will be easier for you to write your own programs, and you can write functions to compute something in the way you want. Now, when we come to calculations, there are several possibilities. The first possibility is that R works just like a calculator, where you add, subtract, divide, multiply, etc. scalars, that is, single values like 1, 2, 3, etc.
The second option is that instead of scalars you can use data vectors. So, there are two further cases: how the mathematical operations are done between a scalar and a data vector, and how the calculations are done between a data vector and another data vector.
So, that is our objective: to learn how R makes computations. After learning the computations, you will also learn that there are some built-in functions which can compute some of the standard things without any programming, that is, without writing your own functions for the mathematical computation. So, now let us begin our journey and try to understand all these things one by one.
So, in today's lecture we are going to understand how R works in the simplest way. I am sure that you have all used simple calculators, and R also works like a calculator. That is what we are going to understand in today's lecture, and after that I will take up scalar versus data vector, data vector versus data vector, etc., all those cases. So, let us begin our lecture.
As we have discussed, R is a very good software for doing different types of computations and calculations. R can perform all the standard calculations like addition, subtraction, multiplication, division, powers, etc. R also has built-in functions for doing some computations, and R can operate on scalars as well as on data vectors. So, computations in R can be carried out as scalar versus scalar, scalar versus data vector, and data vector versus data vector.
(Refer Slide Time: 04:07)
So, now let us try to understand. Instead of giving you theory, I will take a couple of examples to show you how things happen and how the computations are made. We begin with the simplest operations, scalar versus scalar. I am taking very simple examples here, but these examples can be extended to any level, ok.
First, let me show you how to do addition. Addition is very simple: the symbol is the usual plus sign. If you want to add 2 and 3, you simply write 2 + 3 at the prompt, and as soon as you press Enter, you get the outcome 5. Similarly, for multiplication the symbol is the star (asterisk) *. In mathematics we use a cross ×, but in R, as in most computer software, you use the star.
So, if you want to multiply 2 and 3, you simply write 2 * 3, and as soon as you press Enter, it gives you the value 6. You can see here the screenshot of the outcome, and I will also show it to you on the R console, ok.
(Refer Slide Time: 05:37)
Similarly, for subtraction we have the usual minus sign. If you want to subtract 3 from 2, you simply write 2 - 3, and as soon as you press Enter, the answer comes out to be -1. For division, the usual symbol in mathematics is ÷, but in R the symbol is the slash /.
So, if you want to divide 3 by 2, you write 3 / 2, and as soon as you press Enter, you get 1.5. Besides these, R also follows the usual mathematical rules when you take more than two numbers and when you combine different mathematical operations, for example a computation involving addition, division, multiplication and subtraction.
For example, suppose I write 2 * 3 - 4 + 5 / 6. In this case the usual mathematical rules, which we were taught in class, are applied, and R follows the same rules to compute the expression. As soon as you press Enter, you get the value 2.833333, and this is the screenshot of the same outcome.
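The slide examples above, collected into one sketch; the comment on the last line shows how the rules split the expression:

```r
2 + 3               # 5
2 * 3               # 6
2 - 3               # -1
3 / 2               # 1.5
2 * 3 - 4 + 5 / 6   # 2.833333, computed as 6 - 4 + 0.8333333
```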
(Refer Slide Time: 07:24)
Before we move forward, let me show you these computations on the R console. Suppose you want to add 5 and 6; you can see that it comes out to be 11. Similarly, if you add 78 + 98 + 67 + 64, it gives you the value 307. For subtraction, if I type 6 - 4, you can see that this is 2. If I take 56 - 98, this is -42.
Similarly, if you take 45 - 65 - 78 - 65, you can see that the answer comes out to be -163. So, whether I involve two numbers or more than two numbers, R handles it in the usual way, as a calculator does.
Similarly, for multiplication: 8 * 3 gives 24, as you can see. And 8 * 5 * 4 * 9 gives the answer 1440. So, there is not much difference between the mathematical operations of the R software and those of any calculator, ok. After this, let me look at division also: for example, 8 / 2 is 4.
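The console examples so far, collected in one sketch so that you can check each value yourself:

```r
5 + 6               # 11
78 + 98 + 67 + 64   # 307
6 - 4               # 2
56 - 98             # -42
45 - 65 - 78 - 65   # -163
8 * 3               # 24
8 * 5 * 4 * 9       # 1440
8 / 2               # 4
```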
Similarly, 2 / 8 is 0.25, and if you divide -2 by 8, the answer is -0.25, and so on. Let us see what happens if you divide 2 by -8: you get -0.25, so R handles negative signs as well. R can also handle brackets, and the rule for handling brackets in mathematical operations is very simple.
It is the same rule that you were taught in your elementary classes, BODMAS: B O D M A S. What does BODMAS mean? B stands for Brackets, O for Orders, that is, powers (2 raised to the power 3 is written 2^3 in R), then D for Division, M for Multiplication, A for Addition and S for Subtraction. This is the order in which the mathematical operations are performed under the BODMAS rule. Note that division and multiplication have the same priority, as do addition and subtraction, and operators of the same priority are evaluated from left to right. So, this is the same thing you learnt in your elementary classes; there is nothing new to learn, and the only thing you have to understand is how to execute it on the R console.
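A few illustrative expressions (not from the lecture slides) showing the priority rules, including left-to-right evaluation for operators of equal priority:

```r
2 + 3 * 4     # 14: multiplication before addition
2^3 * 2       # 16: the power is computed first, 8 * 2
8 / 2 * 4     # 16: / and * have equal priority, so (8 / 2) * 4
10 - 4 + 2    # 8:  - and + have equal priority, so (10 - 4) + 2
```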
The only difference between the usual mathematical notation and the R software is that only the round brackets, the parentheses ( ), are used for grouping in mathematical operations in R, including under BODMAS. The curly brackets { } and the square brackets [ ] are not used for calculations in the R software. You will see later that these two kinds of brackets are used for other jobs, and that is why we do not use them here in mathematical expressions.
For example, suppose I write the expression (2 + 3) * 5 + 5 - 10, that is, 2 plus 3 inside parentheses, multiplied by 5, plus 5, minus 10. First of all, the expression inside the bracket is solved, so it becomes 5. Then 5 into 5 is 25. Then comes the addition: 25 plus 5 is 30, and finally 30 minus 10 is 20. So you can see that the same BODMAS rule operates here. And in case you want more brackets, you can have that type of operation as well.
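The evaluation order described above can be traced step by step. In this sketch the intermediate names step1 to step4 are hypothetical and only make each stage visible.

```r
step1 <- 2 + 3          # the bracket is solved first: 5
step2 <- step1 * 5      # then the multiplication: 25
step3 <- step2 + 5      # then the addition: 30
step4 <- step3 - 10     # then the subtraction: 20
step4
(2 + 3) * 5 + 5 - 10    # the whole expression at once: also 20
```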
You can see that I have written this expression here, but you have to understand how the brackets are given. Always keep in mind that whenever you give an opening bracket, you also have to give the closing bracket: open parenthesis, close parenthesis. If you look here, this is the pair of parentheses for 2 plus 3, this is the pair shown in red, and this is the pair shown in black.
So these two are paired, these two are paired, and these two are paired. And we know that in mathematics, when there are brackets within brackets, the evaluation starts from the innermost bracket: the operation is first done inside the innermost bracket, then it moves to the next enclosing bracket and computes whatever is in it, then to the third bracket, and so on.
That is the usual way of doing calculations, and the same thing is followed in the R software; here you can see a screenshot of the same operation. But do not simply believe it: please do it yourself and verify whether you get the same outcome.
Before I move forward, let me show you these examples on the R console also, so that you can be confident that these things work.
You can see that this gives the answer 20. And if I add more brackets, for example one more bracket here and one more bracket here, the answer is still 20.
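The exact inputs typed on the console are not reproduced in the transcript, so the following is a reconstruction under that assumption. Extra bracket pairs that do not change the grouping leave the result at 20, while replacing the final subtraction by a division gives 30 / 10. The last line is an added contrast showing why the outer bracket matters.

```r
(((2 + 3) * 5) + 5) - 10   # redundant brackets, same result: 20
((2 + 3) * 5 + 5) / 10     # the bracket gives 30, then 30 / 10 = 3
(2 + 3) * 5 + 5 / 10       # without the outer bracket: 25 + 0.5 = 25.5
```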
That is how you can do it. If you now include a division, the result changes: now it will be 30 divided by 10, so you get 3. This is the same arithmetic you have learnt, and the same rules are followed here.
Now, a very basic but fundamental point: when you are doing mathematical calculations in the R software, blank spaces are ignored; they have no effect on the result.
For example, suppose you write 2+5 with no blank space, simply writing 2, the plus sign and 5 next to each other; it gives the value 7. If you put a blank space before and after the plus sign, the answer is still 2 plus 5, which is 7. Even if you increase the number of blank spaces before and after the plus symbol, the answer is still 7.
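A quick sketch of the spacing cases described here and in the next paragraph (the variable names s0 to s3 are arbitrary): every variant of 2 + 5 evaluates to 7.

```r
s0 <- 2+5          # no blank space
s1 <- 2 + 5        # one space on each side
s2 <- 2    +    5  # several spaces on each side
s3 <- 2    +5      # spaces on the left of the operator only
c(s0, s1, s2, s3)  # 7 7 7 7
```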
And even if you have more space on the left-hand side of the operator and no space on the right-hand side, the answer is still the same, 7; this is the screenshot.
But remember one thing: this is true only in mathematical calculations. What does this mean? Do you remember that earlier we discussed two types of values, one being numerical values and the other characters?
We took a couple of examples using the word apple. When you write something as a character value within double quotes, the blank spaces are kept and treated as part of the value. For example, if you write apple, leave some space, and then write apple again, both words will be shown on the screen together with the space between them, and R will also take these 3 additional blank spaces into account in any operations.
What type of operations this involves we will see in the forthcoming lectures, when we print the results of mathematical calculations and print characters. The bottom line you have to understand here is that blank spaces play no role when we are doing mathematical calculations.
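By contrast, spaces inside double quotes are part of a character value. A small sketch using the apple example, assuming one string with a single space and one with three spaces between the words; nchar() counts the characters, spaces included.

```r
one   <- "apple apple"     # 5 + 1 + 5 characters
three <- "apple   apple"   # 5 + 3 + 5 characters
nchar(one)      # 11
nchar(three)    # 13
one == three    # FALSE: the extra blanks make the two values different
print(three)    # the blanks are printed as they are
```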
If you want to see this on the R console also, I can show you. If you type 6+7, you can see that I have not given any space, and the answer is 13. If I give many spaces, 6     +     7, this is again 13, and even if I give no space on the left and a lot of space on the right-hand side of the operator, that is again 13. This is true for all the operators.
For example, 6-7 is -1, and even if I write 6, blank space, minus, blank space, 7, you can see it is still -1. Similarly for multiplication: 6*7 without any blank space is 42, and if you write 6, some blank space, the multiplication sign, some blank space, and 7, it is again 42. The same holds for division: 6/2 is 3, and 6 / 2 with spaces is again 3.
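The same checks for the other three operators, using the values from the console demonstration above:

```r
6-7            # -1
6   -   7      # -1
6*7            # 42
6   *   7      # 42
6/2            # 3
6   /   2      # 3
```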
This makes it simple to understand how things happen in the R software. Now that you have understood how operations between two scalars are performed, we can extend this concept to a scalar and a data vector. Do you remember what a data vector is? We discussed it in an earlier lecture: to create a data vector you use the c operator, lowercase c.
Within the parentheses you give the values, like 1, 2, 3, 4, 5, and if you store the result in a variable, say x, as x <- c(1, 2, 3, 4, 5), then x is a data vector. So now I am going to perform operations involving a scalar and a data vector. After that, I will explain operations involving data vectors only. Once you understand these operations, you can conduct any type of operation involving scalars, one data vector, two data vectors and so on.
R has a somewhat different approach to computation when data vectors are used. My objective in this lecture is to explain how R functions. Once you understand that, you can use the same logic to write your own programs and your own functions without any problem. To explain it, I will once again take some examples. And I promise you: if you understand just the first example, all the other examples will become very simple and straightforward.
Now let me take the first example. I have taken a data vector which contains four values, 2, 3, 5 and 7, combined using the operator c, and I add to it the scalar 10. If you press enter, you get the outcome 12, 13, 15, 17. Now you have to understand what R is doing. When you write c(2, 3, 5, 7) and then say plus 10, this 10 is added to every number.
So the operation is 2 + 10, 3 + 10, 5 + 10 and 7 + 10. What is happening is that the scalar enters the data vector inside the parentheses and is combined, using the given operator, with each element of the data vector; that is why the outcome is 12, 13, 15, 17.
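The first example as R code, with the element-by-element version written out by hand for comparison (x is just an illustrative name). Strictly speaking, R treats the single number 10 as a vector of length one and reuses it for every element, which produces exactly the behaviour described here.

```r
x <- c(2, 3, 5, 7)
x + 10                                          # 12 13 15 17
# What R does, written out position by position
c(x[1] + 10, x[2] + 10, x[3] + 10, x[4] + 10)   # 12 13 15 17
```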
Now, suppose you have understood this basic fundamental: whenever there is an operation between a data vector and a scalar, the scalar enters the data vector and the same operation is performed on each element. What does this mean? It means that whether I use addition, subtraction, division or multiplication, the same logic holds. That is why I am spending time on the first calculation; after this, you will see that everything becomes very simple and straightforward.
Similarly, what will happen with subtraction? Take a data vector with 4 values, 12, 13, 15, 17, and subtract the scalar 10. I do not need to explain this; you can understand very easily that this -10 will be applied to each element.
So this becomes 12 - 10, 13 - 10, 15 - 10, 17 - 10, and the answer is 2, 3, 5, 7. Once again: here are 12, 13, 15 and 17, and the -10 jumps inside the data vector and operates on each element; that is all. So the answer comes out like this.
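The subtraction example, together with an explicit loop that performs the same element-wise work. The loop is only for illustration, since y - 10 already does it in one step.

```r
y <- c(12, 13, 15, 17)
y - 10                          # 2 3 5 7
res <- numeric(length(y))       # empty result vector of the same length
for (i in seq_along(y)) {
  res[i] <- y[i] - 10           # subtract 10 from the i-th element
}
res                             # 2 3 5 7
```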
Similarly, for multiplication, do I need to explain? The same logic works here. If I take the data vector of 4 values 2, 3, 5, 7 and multiply it by 10, then obviously the multiplication by 10 is applied to each element.
When 2 is multiplied by 10, this gives the answer 20; when 3 is multiplied by 10, this gives 30; when 5 is multiplied by 10, this gives 50; and when 7 is multiplied by 10, it gives 70. This is exactly what happens here.
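The multiplication example in R. For multiplication the scalar can be written on either side of the vector with the same result:

```r
x <- c(2, 3, 5, 7)
x * 10    # 20 30 50 70
10 * x    # 20 30 50 70
```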
And for division, do I need to explain? Actually, I do not. If I take a data vector of four values 12, 13, 15, 17 and divide it by 10, the division by 10 is first applied to 12: 12 divided by 10 gives 1.2.
Then it comes to 13: 13 divided by 10 gives 1.3; then 15 divided by 10 gives 1.5; and then 17 divided by 10 gives 1.7. This is the screenshot. So you can see that this is not a very difficult operation. But, as I said, my interest is not in the operation itself; my objective is that you understand what is happening in these operations.
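The division example. Unlike multiplication, the order matters here: y / 10 divides each element by 10, whereas 10 / y (an added contrast) divides 10 by each element.

```r
y <- c(12, 13, 15, 17)
y / 10    # 1.2 1.3 1.5 1.7
10 / y    # 10/12, 10/13, 10/15, 10/17: a different result
```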
Now let me show you these operations on the R console also. If I take some numbers, say 10, 11, 12, 13, and add 5, you can see that 5 is added to each and every element of this data vector, and the answer is 15, 16, 17, 18.
Similarly, if I subtract 5 from the same data vector, the subtraction of 5 is applied to each of the elements 10, 11, 12, 13. So what do you expect? Every element will be reduced by 5, and the answer will be 5, 6, 7, 8. Similarly with multiplication: the multiplication is applied to each of the elements 10, 11, 12, 13, and if you press enter, you get 50, 55, 60, 65. And for division, I take the same data vector and divide every element by 5. You know that when 5 operates on 10, 10 divided by 5 is 2.
Similarly, if you press enter, you get the answer 2.0, 2.2, 2.4 and 2.6.
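The four console operations can be reproduced as follows (z is an illustrative name for the data vector 10, 11, 12, 13):

```r
z <- c(10, 11, 12, 13)
z + 5    # 15 16 17 18
z - 5    #  5  6  7  8
z * 5    # 50 55 60 65
z / 5    # 2.0 2.2 2.4 2.6
```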
So you can see that operations between a data vector and a scalar are not difficult; whether it is addition, subtraction, multiplication or division, they are all pretty simple. There are some more operations, like the power operation and some other types of arithmetic operations, which I will show you in the forthcoming lectures.
So I stop this lecture here. Take some arbitrary data vectors yourself and repeat these operations. When you repeat them, you already know what the outcome should be; for example, you know what 2 into 10 is. So check the same thing in the R software and see whether you get the same result.
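One way to make this check part of your own code is base R's stopifnot(), which stops with an error when a condition is not TRUE. A sketch, where expected holds values worked out by hand beforehand; isTRUE(all.equal(...)) compares numbers up to a tiny floating-point tolerance, which is safer than == for results such as divisions.

```r
expected <- c(20, 30, 50, 70)          # 2*10, 3*10, 5*10, 7*10 worked out by hand
result   <- c(2, 3, 5, 7) * 10         # the same computation done by R
stopifnot(isTRUE(all.equal(result, expected)))   # silent if R agrees

# A division check, where exact comparison with == could be fragile
stopifnot(isTRUE(all.equal(c(12, 13, 15, 17) / 10, c(1.2, 1.3, 1.5, 1.7))))
```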
Remember one thing: the human being is supreme, and R simply follows the rules created by human beings. The rules of mathematics are not created by the R software; they are created by human beings, and R follows them. So before doing any computation, before doing any mathematical calculation, you have to make sure that R is doing the same thing you want.
And that is always good practice in programming: whenever you write a program, you are essentially doing some computation. So before executing it on the final data, take a couple of values, do the calculations manually, and check whether R gives you the same outcome as your manual calculation; if not, you have to check where the problem in the program is. So, try to practise it, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 09
Calculations with Data Vector: Addition, Subtraction, Multiplication and Division
Hello friends, welcome to the course Foundations of R Software. You can recall that in the last lecture we started a discussion on how to do calculations in the R software, and we considered two cases: first, how to do calculations between or among scalars, and second, how to do calculations between a data vector and a scalar.
At this moment we are restricting ourselves to four operations: addition, subtraction, multiplication and division. Later on, once we understand these basic concepts of how R works, I will take up some more types of calculations. In this lecture, we are going to consider how R does calculations when there are two data vectors.
For example, if there are two data vectors, how are additions done? How are multiplication, division and subtraction done when we do these types of calculations in the R software? As we have seen, scalar versus scalar was just like a calculator. But with a data vector versus a scalar, the operation was a little bit different, and not the usual operation in the way most software works.
Similarly, today, when we consider operations between data vectors, you will see that the way R works is a little bit different. You have to understand the process: how R behaves and how R follows the instructions.
I will take a couple of examples, and through those examples I will illustrate how R works and what rules R follows. That in turn will help you when you write your own programs: if you want to do a certain type of calculation, you will know how to write it so that R understands it and does what you want.
So let us begin this lecture. I will take some examples, and in the beginning I will explain in more detail, so that you can understand how these things happen. Once you understand the first operation, understanding the remaining operations will be quite quick.
Now we are going to consider operations between data vectors. I say "between" because we are considering a very simple case of two data vectors; this can be extended to any number of data vectors, which I will discuss later on. Let me first take a very simple example in which I have one data vector with four values, 2, 3, 5 and 7, and a second data vector which also has four values, -2, -3, -5 and 8.
You can see that both data vectors have the same number of elements: four values in data vector 1 and four values in data vector 2. Now, how will these data vectors operate? First I will show you what happens, and then you can understand it. Suppose I write data vector 1 and data vector 2 one above the other. These are the positions: position number 1, position number 2, position number 3 and position number 4.
In data vector 1, position number 1 holds the element 2; the values are 2, 3, 5 and 7. In data vector 2 the values are -2, -3, -5 and 8. Now you are doing an addition, and what happens is the following: the value at the 1st position in data vector 1 is added to the value at the 1st position in data vector 2. So this becomes 2 plus -2, which is 0.
Next, the control comes to the next value: the value at the 2nd position in the 1st data vector is operated with the 2nd value in the 2nd data vector. This becomes 3 plus -3, which is 0. Now the control comes to the 3rd element, and the operation is done on the 3rd value in the 1st data vector and the 3rd value in the 2nd data vector.
So this becomes 5 plus -5, which is 0. Finally, the operation moves further to the 4th value in data vector 1 and the 4th value in data vector 2, and the operation is 7 + 8, which is 15. So what you can observe here is that the operations are made element-wise.
The 1st element is operated with the 1st element, the 2nd element with the 2nd element, and the 3rd and 4th elements with the 3rd and 4th elements of the other data vector, respectively. So 2 and -2 are operated, 3 and -3 are operated, 5 and -5 are operated, and finally the fourth values of the two data vectors are operated. And you get the values 0, 0, 0 and 15.
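The element-wise addition above, in R code; the second line spells out the position-by-position pairing (v1 and v2 are illustrative names for data vector 1 and data vector 2).

```r
v1 <- c(2, 3, 5, 7)
v2 <- c(-2, -3, -5, 8)
v1 + v2                                                        # 0 0 0 15
c(v1[1] + v2[1], v1[2] + v2[2], v1[3] + v2[3], v1[4] + v2[4])  # same pairing by hand
```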
The same type of operation happens if you replace addition with subtraction, multiplication, division and so on: the operations are element-wise.
Now let me take one more example. In this one, I have taken four elements in the 1st data vector and two elements in the 2nd data vector. What happens if I write them down in exactly the same way? There are four positions, 1, 2, 3 and 4, and data vector 1 has the values 2, 3, 5 and 7, which I have written here.
The 2nd data vector has only two values, 8 and 9, so I write down 8 and 9. You want to do an addition. The same rule is going to be applied and the same type of operation happens, but let us see what happens. As we have discussed, the operations are element-wise, so the 1st element of data vector 1 is added to the 1st element of data vector 2. There is no issue: 2 + 8 becomes 10.
Then the control moves further, considers the 2nd value in the 1st data vector and the 2nd value in the 2nd data vector, and adds 3 and 9, which gives 12. Now the control comes to position number 3 in data vector 1. It takes 5 and then plus, but now, in data vector 2, this place is blank.
The same thing happens when the control comes to the 4th value: in the 2nd data vector this place is also blank. What R does is the following: it takes the first two values and copies them into the next two positions. So these two blank places are filled with 8 and 9, exactly in the same order in which they are written.
Then the operations are done: the value at the 3rd place, 5, and the value at the 3rd place in the 2nd data vector, 8, are added, and you get 13. Similarly, the values in the 4th places of data vector 1 and data vector 2, 7 and 9, are added, and you get 16. So you get these four values: 10, 12, 13, 16.
So what you have to understand here is that the two values of the shorter data vector are first added at the respective positions of the 1st data vector; after that there are no values left for the second set, so these values come again and are added to the remaining values. So there is a sort of repetition: the values of the data vector which has fewer elements are repeated.
But notice that I have taken an example where the number of elements in the 1st data vector is an exact multiple of the number of elements in the 2nd: data vector 1 has four elements and data vector 2 has two. What happens if data vector 1 has four elements and data vector 2 has, say, three elements? Let us try to understand this case also.
To explain this operation, I take one data vector which has four elements, 2, 3, 5 and 7, and a 2nd data vector which has the values 8, 9 and 10. Let us understand the operation in the same way as in the other two cases. The 1st data vector is c(2, 3, 5, 7), with its values located at positions 1, 2, 3 and 4, and the values in the 2nd data vector are 8, 9 and 10.
Now let us do the addition. The same rule we have discussed is followed: the control comes to the 1st position in the 1st data vector and the 1st position in the 2nd data vector, and these two values are added: 2 + 8, which is 10.
After this, the control comes to the 2nd value in data vector 1 and the 2nd value in data vector 2, and this becomes 3 + 9, which is 12. Following the same rule, the control then comes to the 3rd value in the 1st data vector and the 3rd value in the 2nd data vector; this becomes 5 + 10, which is 15.
Now the control comes to the 4th place in data vector 1 and the 4th place in data vector 2. You can see that this place is blank; there is no value here. So R is confused: what to do? What it does is the following: it takes the data vector of shorter length, 8, 9 and 10, once again and repeats it here.
So this place is filled with 8, and the two further values 9 and 10 also come after it. R then carries out the operation and adds 7 and 8, which gives 15. After this, R would once again be confused: there are two values left in data vector 2 but no corresponding values in data vector 1. R does not need to go further, so it stops here. It gives the answer 10, 12, 15, 15, as you can see, but it also gives you a warning message.
The warning message says that the longer object length is not a multiple of the shorter object length. R is informing you that the lengths of the two data vectors are not the same, and that the longer length is not an exact multiple of the shorter one either. So R does compute something, but it asks you to be watchful and to check whether what it did is what you intended. You can see the same operation done on the R console here also.
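A sketch of the 4-versus-3 case that records the warning instead of only printing it. withCallingHandlers() and invokeRestart("muffleWarning") are base R; the variable msg is an illustrative name, and rep_len() shows how far the shorter vector is recycled. The exact wording of the message may differ if R runs in a non-English locale.

```r
v1 <- c(2, 3, 5, 7)
v2 <- c(8, 9, 10)
msg <- NULL
res <- withCallingHandlers(
  v1 + v2,
  warning = function(w) {
    msg <<- conditionMessage(w)     # keep the warning text
    invokeRestart("muffleWarning")  # continue without printing it
  }
)
res                        # 10 12 15 15
msg                        # "longer object length is not a multiple of shorter object length"
rep_len(v2, length(v1))    # 8 9 10 8: v2 recycled up to length 4
```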
So this is how R actually works, and I am sure you now understand how R does these things. But before moving further with other operations, I would like to show you what happens when you do it on the R console.
Suppose I take the data vector 2, 3, 4 and 5 and add to it another data vector, say 10, 11, 12 and 13. You can see that the first value in data vector 1, which is 2, is added to the 1st value in data vector 2, which is 10, and the value 3 in data vector 1 is added to the value 11 in the 2nd data vector; both are in the second place.
Similarly, the third value in data vector 1 is 4 and the 3rd value in data vector 2 is 12, so 4 and 12 are added and you get 16. Similarly, the 4th values in data vector 1 and data vector 2 are added together, and you get 5 plus 13, which is 18. Now I do the same thing, but I reduce the number of elements in the 2nd data vector.
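The first console case, where both vectors have four elements. As an illustration only, sapply() over the positions reproduces the same pairing that R's + performs automatically (a and b are arbitrary names).

```r
a <- c(2, 3, 4, 5)
b <- c(10, 11, 12, 13)
a + b                                          # 12 14 16 18
sapply(seq_along(a), function(i) a[i] + b[i])  # same, position by position
```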
Now my 1st data vector has four elements, 2, 3, 4, 5, and the 2nd data vector has only two elements, 10 and 11. Let us see what happens. The values 10 and 11 move over to 2 and 3, and the values at the respective places are added together: 2 is added to 10 and 3 is added to 11. So this becomes 2 + 10 = 12 and 3 + 11 = 14.
Then 10 and 11 come again, over the 3rd and 4th elements of data vector 1: 4 and 10 are added to give 14, and 5 and 11 are added to give 16. So this is how things happen. Now I do the third operation, keeping four elements in data vector 1 and only three elements in data vector 2.
If I do it here, what will happen? The values 10, 11 and 12 move over the values in the first three positions of data vector 1, and the respective values are added: 2 is added to 10, 3 to 11 and 4 to 12. After that, there is nothing left, so the same data vector is repeated again, and the last value in data vector 1, which is 5, is added to 10.
But now there are no more values in data vector 1 to which 11 and 12 can be added. So 5 plus 10 becomes 15, and after that R gives you the warning message that the longer object length is not a multiple of the shorter object length. This is the way the R software actually works and does the calculation.
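The rule behind the warning can be written as a small check. A sketch in which lengths_mismatch() is a hypothetical helper (not part of R) that returns TRUE exactly when the longer length is not a multiple of the shorter one; it assumes both vectors are non-empty. Recycling the shorter vector by hand with rep_len() gives the same numbers as a + b without triggering the warning.

```r
a <- c(2, 3, 4, 5)
b <- c(10, 11, 12)
# Hypothetical helper: TRUE when R would warn about the lengths
lengths_mismatch <- function(x, y) {
  n <- c(length(x), length(y))
  max(n) %% min(n) != 0
}
lengths_mismatch(a, c(10, 11))  # FALSE: 4 is a multiple of 2
lengths_mismatch(a, b)          # TRUE: 4 is not a multiple of 3
a + rep_len(b, length(a))       # 12 14 16 15, no warning
```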
I hope I have explained in detail how this addition happens. The same thing happens in the other operations.
So now it will not be difficult for you to understand them. Let me take subtraction. I take two data vectors: 2, 3, 5 and 7 in data vector 1, and -2, -3, -5 and 8 in data vector 2.
What will happen? The same element-wise operation: the 1st value in data vector 1 and the 1st value in data vector 2 are operated.
So this becomes 2 minus (-2), which gives the answer 4. Similarly, the control comes to the 2nd value in data vector 1 and the 2nd value in data vector 2, and they are operated as 3 minus (-3), which becomes 6.
Once again the same operation is repeated: the 3rd value in data vector 1, which is 5, and the 3rd value in data vector 2, which is -5, are operated with the minus sign, as 5 minus (-5), and this gives the answer 10.
Finally, the values in the 4th position of data vector 1 and data vector 2, 7 and 8, are operated: 7 minus 8, which is -1. You can see that exactly the same thing happened here also.
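The subtraction example in R. Subtracting a negative number adds its magnitude, which is why the first three results grow:

```r
v1 <- c(2, 3, 5, 7)
v2 <- c(-2, -3, -5, 8)
v1 - v2     # 4 6 10 -1
2 - (-2)    # 4: the first pair on its own
```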
Now, if I take an example similar to the one I took for addition, I am sure it is not difficult for you to understand what is happening. I take four values in data vector 1, 12, 13, 15 and 17, and two values in data vector 2, 8 and 9. What will happen? The two values 8 and 9 first come here and are operated with the first two values of data vector 1.
One value, 12, is the 1st value in data vector 1, and the other, 8, is the 1st value in data vector 2; they are operated as 12 minus 8, and the answer is 4. Then the value 9 is operated with 13: the 2nd value in data vector 1 and the 2nd value in data vector 2, 13 and 9, are subtracted, and the answer is 4.
Now, what does R do after this? It copies the shorter data vector, 8 and 9, and brings it here, and this 8 and 9 are operated with the remaining two values of data vector 1. So this becomes 15 minus 8, which is 7, and then 17 minus 9, which gives 8.
So the 3rd value in data vector 1 is operated with the so-called 1st value in data vector 2 again, and the 4th value in data vector 1, which is 17, is operated with the 2nd value in data vector 2. This repetition goes on, and you get the answer 4, 4, 7, 8. I hope it is not difficult for you to understand.
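A sketch that makes the recycling pattern visible: idx computes, for each position of y, which element of the shorter vector s is used (1, 2, 1, 2). The names y, s and idx are illustrative; %% is R's remainder operator.

```r
y <- c(12, 13, 15, 17)
s <- c(8, 9)
y - s                                        # 4 4 7 8
idx <- (seq_along(y) - 1) %% length(s) + 1   # position in s used for each element of y
idx                                          # 1 2 1 2
y - s[idx]                                   # 4 4 7 8, same as y - s
```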
Now I repeat the same type of calculation I did earlier for addition. I take four values in the 1st data vector, 12, 13, 15 and 17, and three values in data vector 2. So data vector 1 is c(12, 13, 15, 17), and the values in data vector 2 are 8, 9 and 10.
First, this block of three is operated, and the answer is 12 - 8, 13 - 9 and 15 - 10, which is 4, 4, 5. After this, when the R software wants to repeat data vector 2, the problem is that only one value is left in data vector 1. If it repeats data vector 2, the value 8 gets a partner, but the two further values 9 and 10 have no corresponding values in data vector 1.
So in the second step this block is operated, but only with the first available value: 17 - 8. There are no values left for the other two, so these would be something like question mark minus 9 and question mark minus 10. R does not know what to do with these, so it stops at this place and gives the value 4, 4, 5, 9, but it also gives the warning message that the longer object length is not a multiple of the shorter object length.
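Because R still returns a result in this case, the warning is easy to overlook in a longer program. A sketch using base R's tryCatch() to react to the warning explicitly; the returned string is an arbitrary illustrative label.

```r
y <- c(12, 13, 15, 17)
s <- c(8, 9, 10)
suppressWarnings(y - s)   # 4 4 5 9: the value R computes anyway
# Treat the length mismatch as a problem instead of accepting the result
outcome <- tryCatch(y - s, warning = function(w) "length mismatch detected")
outcome                   # "length mismatch detected"
```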
You can recall that in the very first lecture, when I introduced R, I had informed you that R
gives you error and warning messages which are user friendly. If you think about them and
try to understand them, you can see what R is trying to tell you. So, this is how subtraction
happens in the R software.
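The following sketch reproduces the non-multiple case and shows that R returns a full-length result together with a warning, not an error. The variable `warning_text` and the use of `withCallingHandlers()` to record the warning are choices made for this illustration only.

```r
dv1 <- c(12, 13, 15, 17)
dv2 <- c(8, 9, 10)

# Record the warning text while still keeping the computed values
warning_text <- NULL
result <- withCallingHandlers(
  dv1 - dv2,
  warning = function(w) {
    warning_text <<- conditionMessage(w)
    invokeRestart("muffleWarning")   # continue after recording the warning
  }
)
print(result)         # 4 4 5 9  (17 - 8 uses the recycled 8)
print(warning_text)   # the "longer object length is not a multiple ..." message
```

The result has the length of the longer vector; the recycled copies of 9 and 10 simply have nothing left to pair with.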
So, now let me show you this operation on the R console. Suppose I take the values 11,
12, 13 and 14 and subtract c(1, 2, 3, 4) from them. You can see that the 1st value in data
vector 2 is subtracted from the 1st value in data vector 1, so that becomes 11 - 1, which is
10. Then the 2nd value in data vector 2 is subtracted from the 2nd value in data vector 1, so
this becomes 12 - 2, which is 10.
Then the 3rd value in data vector 2 is subtracted from the 3rd value in data vector 1, 13,
which is 13 - 3 = 10. Similarly, the 4th value in data vector 2, which is 4, is subtracted from
the 4th value in data vector 1, which once again gives 14 - 4 = 10.
So, now you can see that all the values are 10, but these four 10s are obtained by different
operations. Now, I take one more example and subtract only two values, say 2 and 3. What
is happening now? The values 2 and 3 come over the first two values, 11 and 12, and 11 - 2
and 12 - 3 both give the answer 9.
Once again, 2 and 3 operate over 13 and 14. This becomes 13 - 2 and 14 - 3, which are 11
and 11, and so you get 9, 9, 11, 11. Similarly, if you take one data vector with four values
11, 12, 13, 14 and a 2nd data vector with only three values 2, 3, 4, what do you see? The
values 2, 3 and 4 are operated over the first three values 11, 12 and 13. So, 11 - 2 is 9,
12 - 3 is 9 and 13 - 4 is also 9, and you get these three 9s.
After that, 2 is subtracted from 14, which gives 12, but then there are no values left in data
vector 1 from which the remaining two values of data vector 2, which are 3 and 4, can be
subtracted. So, R stops here, but it gives you a warning message. Now, I am sure you have
understood how R works, and it will be easier and faster for us to understand division and
multiplication also.
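The three console calls can be replayed as one short script. `suppressWarnings()` is used only to keep the output tidy (removing it shows the same warning as the console), and the data frame `pairs` is an illustrative addition that lists which value of the shorter vector meets each value of the longer one.

```r
x <- c(11, 12, 13, 14)

# Equal lengths: every difference happens to be 10
print(x - c(1, 2, 3, 4))          # 10 10 10 10

# Length 2 divides length 4, so 2 and 3 are reused once
print(x - c(2, 3))                # 9 9 11 11

# Length 3 does not divide length 4: four values are still returned, with a warning
y <- c(2, 3, 4)
print(suppressWarnings(x - y))    # 9 9 9 12

# Which value of y was paired with each value of x
pairs <- data.frame(position = seq_along(x), x = x, y_used = rep_len(y, length(x)))
print(pairs)
```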
(Refer Slide Time: 29:46)
So, now I take the example of multiplication, which is quite quick. I take a similar example
with two data vectors of the same length. The 1st data vector has four values 2, 3, 5 and 7,
and the 2nd data vector has the values minus 2, minus 3, minus 5 and 8. The same thing
happens: I write c(2, 3, 5, 7) for data vector 1 and c(-2, -3, -5, 8) for data vector 2, and
multiply them.
So, what happens? The values at the respective positions are multiplied together: 2 into -2
gives -4, then 3 into -3, then 5 into -5, and then 7 into 8.
So, you can see that the 1st position in data vector 1 is multiplied with the 1st position in
data vector 2, which is 2 into -2, that is -4. Then the values at the 2nd position in data
vector 1 and data vector 2, which are 3 and minus 3 respectively, are multiplied, and the
answer comes out to be -9.
After this, the 3rd values in data vector 1 and data vector 2 are multiplied, and the answer
is 5 into -5, which is -25. Finally, the 4th values in data vector 1 and data vector 2, 7 and 8,
are multiplied, and you get the answer 56. And this is the screenshot of the same
operation.
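A minimal sketch of the same multiplication, assuming the vectors from the slide. The explicit loop is not how one would normally write this in R; it is included only to show that the vectorised `*` pairs values position by position.

```r
dv1 <- c(2, 3, 5, 7)
dv2 <- c(-2, -3, -5, 8)

# Vectorised multiplication, position by position
product <- dv1 * dv2
print(product)   # -4 -9 -25 56

# The same thing written as an explicit loop, only to show the pairing
by_position <- numeric(length(dv1))
for (i in seq_along(dv1)) {
  by_position[i] <- dv1[i] * dv2[i]
}
print(identical(product, by_position))   # TRUE
```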
(Refer Slide Time: 31:52)
Now, I take a similar example where the 1st data vector has four elements and the 2nd data
vector has two elements, so that the length of the longer data vector is an exact multiple of
the length of the shorter one. What will happen in this case? Once again c(2, 3, 5, 7)
comes, and then c(8, 9) comes. So, what happens in the first step?
The first block is operated: you multiply the values at the 1st positions of data vectors 1
and 2, which gives 2 into 8, and the values at the 2nd positions, which gives 3 into 9. Now,
8 and 9 are going to be repeated.
So, 8 and 9 are repeated, and then 5 into 8 and 7 into 9 are computed; that is, the 3rd
position in data vector 1 is multiplied with the 1st position in data vector 2, and the 4th
position in data vector 1 with the 2nd position in data vector 2. You get the answer 16, 27,
40 and 63.
(Refer Slide Time: 33:18)
So, you can see that the same logic and the same rule have been applied here also. Now, I
take a similar example to the one in the earlier three cases: data vector 1 has four elements
and data vector 2 has only three elements. So, the same thing happens here once again.
Data vector 1 is c(2, 3, 5, 7) and data vector 2 is c(8, 9, 10).
The values at the respective positions are multiplied: first 2 and 8, then 3 and 9, and then
5 and 10. The 1st positions, 2 and 8, multiply to give 16. The 2nd positions, 3 and 9,
multiply to give 27.
Then the 3rd positions, 5 and 10, multiply to give 50. After this, the same data vector 8, 9,
10 is recycled, so 7 is multiplied by the 1st element 8, which gives 56.
But now, the remaining two values in data vector 2, 9 and 10, have no partners left in data
vector 1, and R does not know what to do with them. So, it gives you the first, second,
third and fourth values, but it cannot do anything further, and it gives you a warning that
the longer object length is not a multiple of shorter object length. This is the screenshot
of the same operation when you do it on the R console.
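Whether R will warn can be predicted from the lengths alone. The helper `is_clean_recycling()` below is hypothetical, written only for this illustration; it checks whether the longer length is an exact multiple of the shorter one (for non-empty vectors).

```r
dv1 <- c(2, 3, 5, 7)

# Hypothetical helper: TRUE when the longer length is an exact multiple of the
# shorter one, i.e. when R can recycle without a warning (non-empty vectors only)
is_clean_recycling <- function(a, b) {
  long  <- max(length(a), length(b))
  short <- min(length(a), length(b))
  long %% short == 0
}

dv2_two <- c(8, 9)
print(is_clean_recycling(dv1, dv2_two))      # TRUE
print(dv1 * dv2_two)                         # 16 27 40 63

dv2_three <- c(8, 9, 10)
print(is_clean_recycling(dv1, dv2_three))    # FALSE
print(suppressWarnings(dv1 * dv2_three))     # 16 27 50 56
```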
So, let me quickly show you these operations in the R software also, so that you can
understand them very easily. I take the data vector c(2, 3, 4, 5) and multiply it by another
data vector, c(7, 8, 9, 10). Now you can see that it is giving an error. Do you know what
mistake I have made? I have typed a capital C, but it has to be a small c. If you correct it,
now it works.
So, 2 is multiplied with 7, and 3, at the 2nd position in data vector 1, is multiplied with the
value at the 2nd position in data vector 2, so this is 3 into 8. Then the 3rd and 4th values in
data vector 1, which are 4 and 5, are multiplied with the 3rd and 4th values in data vector 2,
which are 9 and 10 respectively, and you get 36 and 50: 4 into 9 is 36 and 5 into 10 is 50.
Now, suppose I make the 2nd data vector of length only two, so that the length of the longer
data vector is an exact multiple of it. You can see the answer comes out like this, because 7
and 8 come to the 1st data vector and multiply the elements in the 1st and 2nd positions,
and then the elements at the 3rd and 4th positions are once again multiplied by 7 and 8. So,
you get 4 into 7 is 28 and 5 into 8 is 40.
Now, suppose I add 9 as a 3rd value, so that the 2nd data vector has three elements and the
1st data vector has four elements. Then 7, 8, 9 are multiplied with 2, 3 and 4 respectively,
and after that 5 is multiplied with the 1st element, which is 7. So, you get 2 into 7 is 14,
3 into 8 is 24, 4 into 9 is 36, and then 5 into 7 is 35.
But after this, R cannot do anything, so it gives you the warning message that the longer
object length is not a multiple of shorter object length. So, this is how multiplication
works.
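The console session can be replayed as a short script, assuming the vectors typed in the lecture. Note the lowercase `c()`: R is case-sensitive, which is why the capital `C` failed. The warning in the last line is suppressed only to keep the output tidy.

```r
# R is case-sensitive: the function is c(), written with a small c
x <- c(2, 3, 4, 5)

print(x * c(7, 8, 9, 10))                 # 14 24 36 50
print(x * c(7, 8))                        # 14 24 28 40
print(suppressWarnings(x * c(7, 8, 9)))   # 14 24 36 35 (warning suppressed)
```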
Now, finally, division. I am sure you have understood all the operations, so it will not be a
difficult thing for you. I take two data vectors: the 1st data vector has four elements 24, 20,
8 and 16, and the 2nd data vector also has four elements 3, 4, 2 and 8.
We divide them. Once again, division is done with respect to position; that means the 1st
value in data vector 1 is divided by the 1st value in data vector 2. So, this becomes 24
divided by 3, which is 8.
Then the value at the 2nd position in data vector 1, which is 20, and the value at the 2nd
position in data vector 2, which is 4, are divided, which gives 5. Then the value at the 3rd
position in data vector 1, which is 8, and the value at the 3rd position in data vector 2, which
is 2, are divided, and 8 divided by 2 gives 4.
And finally, the value at the 4th position in data vector 1, which is 16, and the 4th value in
data vector 2, which is 8, are divided, and 16 divided by 8 gives 2. This is the screenshot.
So, this is a very simple operation.
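A minimal sketch of this division, assuming the values used in the step-by-step explanation (24, 20, 8, 16 divided by 3, 4, 2, 8). The last line is an added reminder that ordinary division `/` keeps the fractional part, which becomes relevant when integer division is introduced later.

```r
dv1 <- c(24, 20, 8, 16)
dv2 <- c(3, 4, 2, 8)

quotient <- dv1 / dv2
print(quotient)   # 8 5 4 2

# Ordinary division keeps any fractional part
print(7 / 2)      # 3.5
```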
Now, I take the same kind of examples I took earlier for addition, subtraction and
multiplication: a 1st data vector of length four and a 2nd data vector of length two, so that
the length of the 1st data vector is an exact multiple of the length of the 2nd.
Now, the same operation happens. The values 4 and 2 first come to the first two elements of
data vector 1 and operate over them, which are 24 and 20. So, 24 is divided by 4 and 20 is
divided by 2, and you get 6 and 10.
What happens in the second step? Once again, 4 and 2 travel to data vector 1 and come over
8 and 16. So, 8 is divided by 4 and 16 is divided by 2, and you get 2 and 8 respectively.
This is the screenshot of the same operation.
Similarly, take the last case, where the 1st data vector has length four, with values 24, 20,
8 and 16, and the 2nd data vector has only three values, 4, 2 and 8, so that the length of the
1st data vector is not an exact multiple of the length of the 2nd.
Once again, the same operation happens. First, the first block is operated element-wise: 24
is divided by 4, 20 is divided by 2 and 8 is divided by 8, and you get 6, 10 and 1.
After this, R copies the block 4, 2, 8 once again, but only one value, 16, is left in the 1st
data vector. So, 16 divided by 4 is computed, which gives 4, but R does not know what to do
with the values 2 and 8 of data vector 2. So, it stops here and gives you a warning message.
It is the same warning message, that the longer object length is not a multiple of shorter
object length. So, this is how the R software works when you perform division with data
vectors.
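Both division cases can be checked in a few lines, assuming the slide values. The `tryCatch()` call is an illustrative addition: it turns the warning into a string so you can see that R raises a warning (the computation completes), not an error.

```r
dv1 <- c(24, 20, 8, 16)

# Length 4 is a multiple of length 2: 4, 2 is used twice
even_case <- dv1 / c(4, 2)
print(even_case)     # 6 10 2 8

# Length 4 is not a multiple of length 3: values are still returned ...
uneven_case <- suppressWarnings(dv1 / c(4, 2, 8))
print(uneven_case)   # 6 10 1 4

# ... and the condition R raises is a warning, not an error
msg <- tryCatch(dv1 / c(4, 2, 8), warning = function(w) conditionMessage(w))
print(msg)
```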
So, let me show you these operations on the R console, so that you can understand them very
easily. I take two data vectors: the 1st is 4, 8, 12 and 16, and I divide it by the data vector
2, 8, 4 and 8.
What happens? The values at the respective positions of the 1st data vector, 4, 8, 12 and 16,
are divided by 2, 8, 4 and 8, and you get the answer 2, 1, 3, 2: 4 divided by 2, 8 divided
by 8, 12 divided by 4 and 16 divided by 8. Now, suppose you remove two elements so that the
length of the 1st data vector is an exact multiple of the length of the 2nd.
Then the answer becomes 2, 1, 6, 2, because 2 and 8 are operated over 4 and 8 and then over
12 and 16. And if you make the 2nd data vector of length three, something like 2, 8, 6, you
can see that 2, 8, 6 is operated over 4, 8 and 12. After this, 16 is divided by 2, but what to
do with 8 and 6 is unknown. So, it gives you a warning message.
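The three console divisions, replayed as a script with the values typed in the lecture; the warning in the last line is suppressed only to keep the output short.

```r
x <- c(4, 8, 12, 16)

print(x / c(2, 8, 4, 8))                  # 2 1 3 2
print(x / c(2, 8))                        # 2 1 6 2
print(suppressWarnings(x / c(2, 8, 6)))   # 2 1 2 8 (warning suppressed)
```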
So, this is how operations are conducted in the R software when you deal with data vectors.
You can see it is not difficult, but it is a different approach. And I will show you later, in
the forthcoming lectures when we do some computations, that once you understand how R
behaves and operates with these numbers, your programming becomes much easier.
Very complicated expressions can be programmed very easily in the R software, and this is
one of the reasons why R became so popular. Calculations made in this specific way help us
when we do real programming, at least in statistics, as far as I know.
So, once again you have good homework today: take some values yourself, make some data
vectors and try such operations. Just to keep the lecture simple, I have taken only two data
vectors, but you can take three, four, as many as you want. Try different combinations; for
example, out of four data vectors, three could have the same length and the fourth a
different length, either an exact multiple or not.
Then try these combinations in the same expression, using plus, minus, division and
multiplication, and apply the BODMAS rule. There are many things you can understand after
today's lecture. So, practise them, and I will see you in the next lecture with more
operations. Till then, goodbye.
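As one possible starting point for this homework, here is a sketch that mixes vectors of different lengths and several operators in one expression. The vectors `a`, `b` and `d` are arbitrary choices for illustration; the point is that multiplication and division are evaluated before addition and subtraction, and recycling happens inside each step.

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20)
d <- c(4, 4, 4, 4)

# * and / are evaluated before + and -:
# b * 2 is 20 40, recycled to 20 40 20 40; d / 2 is 2 2 2 2
combined <- a + b * 2 - d / 2
print(combined)     # 19 40 21 42

# Brackets change the order: a + b first, then the product
bracketed <- (a + b) * 2
print(bracketed)    # 22 44 26 48
```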
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 10
R as a Calculator with Scalars and Data Vectors: Power Operations, Integer and Modulo Divisions
Hello friends, welcome to the course Foundations of R Software. You can recall that in the
last two lectures we discussed different types of mathematical operations in the R software,
when we consider scalars, data vectors and combinations of them.
We essentially learnt how to do addition, subtraction, multiplication and division with data
vectors and scalars. Now, in this lecture we are going to move forward and consider some
more operations. As in the last two lectures, we are going to understand how R does the
power operation, modulo division and integer division.
These things are important, and when you do mathematical computations they are required
in many situations. We are going to follow exactly the same pattern as in the last two
lectures and try to understand how R performs these operations.
Now, suppose you want to compute an expression like 2 cube, that is, 2 multiplied by itself
three times. How are you going to do it?
In the R software there are two notations for such a power operation. One is the hat, ^,
which is one of the keys on your keyboard, and the second option is two stars, **. You can
recall that one star indicates multiplication, whereas two stars indicate the power
operation.
So, if you want to compute 2 cube, in R you write it as 2 ^ 3, or alternatively as 2 ** 3. In
this example I want to compute 2 cube; I have written it as 2 ^ 3, which gives the answer 8.
And I have written the same thing as 2 ** 3, which also gives the answer 8, and this is the
screenshot. So, this is how you can do the power operation with a scalar.
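A quick check that the two notations agree. The last line relies on the fact, stated in R's `Arithmetic` help page, that the parser translates `**` into `^`, so both spellings produce the same expression.

```r
print(2^3)                    # 8
print(2**3)                   # 8
print(identical(2^3, 2**3))   # TRUE

# The parser rewrites ** as ^, so both spellings are the same expression
print(quote(2**3))            # 2^3
```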
Similarly, suppose you want to find the square root of something, say 2. You know that the
square root of 2 can be written as 2 raised to the power 1 upon 2, that is, 2 raised to the
power 0.5. So, you can very easily write this quantity in R as 2 ^ 0.5 or 2 ** 0.5, and this
is what I have shown here: the value of the square root of 2 is 1.414214.
The same operation with the double star gives the same value. And if you want to find 1 upon
the square root of 2, this can be written as 2 raised to the power minus 1 upon 2, which is 2
raised to the power minus 0.5. Then you can write it as 2 ^ -0.5 or 2 ** -0.5.
Do not get confused here: the minus sign belongs to the exponent, so R reads 2 ^ -0.5 as 2
raised to the power -0.5. So, if you want to compute 1 upon the square root of 2, you can
write it as 2 ^ -0.5, and this gives the value 0.7071068. This is the screenshot of the same
operation.
I hope it is very easy for you to understand after the earlier lectures.
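A minimal sketch of these fractional and negative powers. `sqrt()` is used only as a reference value, and `all.equal()` compares with a small tolerance because floating-point results may differ in the last bits. The final two lines are an added contrast: a minus sign written in front of the base is applied after `^`.

```r
root2 <- 2^0.5
print(root2)                                      # 1.414214
print(isTRUE(all.equal(root2, sqrt(2))))          # TRUE

inv_root2 <- 2^-0.5                               # read as 2^(-0.5)
print(inv_root2)                                  # 0.7071068
print(isTRUE(all.equal(inv_root2, 1 / sqrt(2)))) # TRUE

# A leading minus is different: ^ is applied before it
print(-2^2)     # -4, i.e. -(2^2)
print((-2)^2)   # 4
```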
So, these operations are power operations with scalars. Now, I do the power operation when
one value is a data vector and the power is a scalar. For example, when you write 2 cube,
there is one value, 2, and another value, 3; let me call them value 1 and value 2.
Now you have different combinations: value 1 and value 2 are both scalars; value 1 is a data
vector and value 2 is a scalar; or, as a third option, both value 1 and value 2 are data
vectors. We will look at all these combinations and try to understand how R functions.
So, I take the data vector 2, 3, 5, 7 with the power 2. The same rule as in the other
mathematical operations applies here: the ^ 2 is operated over all the elements. So, this
becomes 2 square, 3 square, 5 square and 7 square, which gives the values 4, 9, 25 and 49.
It is as simple as that. Just like addition, subtraction, multiplication and division, the
power operator enters inside the data vector and operates on each and every element. This is
the screenshot of the same thing.
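A short sketch of a scalar power applied to a data vector, assuming the slide values; the comparisons simply confirm that `^ 2` squares each element and that `**` behaves the same way.

```r
dv <- c(2, 3, 5, 7)
squares <- dv^2
print(squares)                        # 4 9 25 49
print(identical(squares, dv * dv))    # TRUE
print(identical(squares, dv**2))      # TRUE
```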
After this, I take a similar example with two data vectors; let us call them data vector 1,
DV1, and data vector 2, DV2. The first data vector has four elements and the second data
vector has two elements.
So, the number of elements in the first data vector is a multiple of the number of elements
in the second data vector. If you recall the earlier operations, the positions are 1, 2, 3 and
4. DV1 has the four values 2, 3, 5 and 7, and DV2 has the two values 2 and 3.
First, the first block is operated, and you get 2 ^ 2 and 3 ^ 3, that is 2 square and 3 cube,
which are 4 and 27. After this, when R moves further, it does not find any more values in
DV2. So, it copies the values 2 and 3 once again, the values at the third and fourth
positions are operated with these two values, and you get 5 square and 7 cube.
This gives the values 25 and 343. So, the power operation is done in the same way as
multiplication, division, addition and subtraction.
Now, I take one more example where I increase the number of values in the two data vectors,
but keep the lengths as multiples. Now the first data vector has the 6 values 1, 2, 3, 4, 5, 6,
and the second data vector has the values 2, 3, 4. So, 2, 3, 4 comes over the first three
positions, and then 2, 3, 4 comes once again over the last three positions.
Then you get 1 square, which is 1; 2 cube, which is 8; 3 to the power 4, which is 81; 4
square, which is 16; 5 cube, which is 125; and 6 to the power 4, which is 1296. If you
compute this in the R software, the values come like this without any issue, and this is the
screenshot of the same operation.
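Both recycling examples for powers can be checked together, assuming the slide values. `all.equal()` is used in the checks as a precaution for floating-point powers, although these small integer results are exact.

```r
# Length 4 with length 2
four_by_two <- c(2, 3, 5, 7)^c(2, 3)
print(four_by_two)    # 4 27 25 343

# Length 6 with length 3
six_by_three <- c(1, 2, 3, 4, 5, 6)^c(2, 3, 4)
print(six_by_three)   # 1 8 81 16 125 1296
```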
Now, I take one more example where the length of the first data vector is not an exact
multiple of the length of the second data vector.
DV1 has 4 values and DV2 has only 3 values. The same thing happens here also. DV1 is 2,
3, 5, 7 and DV2 is 2, 3, 4, with positions 1, 2, 3 and 4.
With this power operation, the first block is operated first, and you get 2 square, 3 cube
and 5 raised to the power 4. After that there is no value left in DV2, so R copies the same
block 2, 3, 4 once again.
R operates the first combination, 7 and 2, which gives 7 square, but after that there are no
values in data vector 1 to operate with 3 and 4. So, it stops here, but it gives you a warning
that the longer object length is not a multiple of shorter object length, and this is the
screenshot.
So, you can see that it is not a very difficult thing; it works exactly the same way as the
other operations.
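A sketch of the non-multiple power case, assuming the slide values. `rep_len()` is used only to display which exponent each element of DV1 actually receives; raising DV1 to those explicitly recycled exponents gives the same numbers without any warning.

```r
dv1 <- c(2, 3, 5, 7)
dv2 <- c(2, 3, 4)

# The exponents actually used at positions 1..4 (7 gets the recycled 2)
print(rep_len(dv2, length(dv1)))        # 2 3 4 2

powers <- suppressWarnings(dv1^dv2)
print(powers)                           # 4 27 625 49

# Same values as raising dv1 to the explicitly recycled exponents
print(isTRUE(all.equal(powers, dv1^rep_len(dv2, length(dv1)))))  # TRUE
```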
Before I move further, let me give you this illustration in the R software. If you want to
find 3 square, that is 3 ^ 2, and if you do the same thing with the two-star symbol, 3 ** 2, it
is again 9.
Now, if you want to find 1 upon 3 square, you can express it as 3 ^ -2, which is 0.111 and
so on. And if you find the same value using the double star notation, simply replacing the
hat with a double star, you still get the same value, as you can see here.
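The console values can be reproduced as below. For comparison, the last line shows the actual square root of 3, using the power 0.5 as in the earlier slides.

```r
print(3^2)      # 9
print(3**2)     # 9
print(3^-2)     # 0.1111111, i.e. 1 / 9
print(3**-2)    # same value
print(3^0.5)    # 1.732051, the square root of 3
```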
(Refer Slide Time: 09:59)
Now, I take one more example, the data vector 2, 3, 4, 5, and take its cube.
So, ^ 3 means cube, and the cube operation is applied to each element inside this data vector
with values 2, 3, 4 and 5. This becomes 2 cube, 3 cube, 4 cube and 5 cube, and you can see
the values: 2 cube is 8, 3 cube is 27, 4 cube is 64 and 5 cube is 125. And if I do the same
operation with a negative power, that also works; you can see there is no issue at all.
And if I replace the hat with the double star, you can see it gives the same values.
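A minimal sketch of these element-wise powers. The lecture does not say which negative power was typed; `-1` is an arbitrary choice here, which simply produces reciprocals.

```r
x <- c(2, 3, 4, 5)
cubes <- x^3
print(cubes)                          # 8 27 64 125
print(identical(cubes, x**3))         # TRUE

# A negative power also works element by element; -1 gives reciprocals
reciprocals <- x^-1
print(reciprocals)                    # 1/2, 1/3, 1/4, 1/5
print(isTRUE(all.equal(reciprocals, 1 / x)))   # TRUE
```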
Now, I take an example where both are data vectors. I take c(2, 3, 4, 5) and, just for the
sake of illustration, one more data vector of the same length, say 5, 4, 3 and 2. This becomes
2 raised to the power 5, 3 raised to the power 4, 4 raised to the power 3, that is 4 cube, and
5 square. If you press enter, you get these values: for 2 raised to the power 5, 2 into 2 is 4,
4 2s are 8, 8 2s are 16 and 16 2s are 32. Similarly, 3 3s are 9, 9 3s are 27 and 27 3s are 81.
Then 4 into 4 into 4 is 64 and 5 square is 25. Now, if I remove two values so that the second
data vector has only two values, 3 and 2, then 3 and 2 are operated on the first 2 values,
giving 2 cube and 3 square, and once again 3 and 2 are operated on the remaining 2 values,
giving 4 cube and 5 square. You can verify these values without any problem.
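The two vector-to-vector power examples, replayed with the values typed in the lecture:

```r
x <- c(2, 3, 4, 5)

same_length <- x^c(5, 4, 3, 2)
print(same_length)    # 32 81 64 25

two_powers <- x^c(3, 2)
print(two_powers)     # 8 9 64 25
```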
Now, let me show you one more operation. If I write the second data vector as c(-3, -2), and
alternatively take the minus sign outside as -c(3, 2), what happens? You can see that both
give the same values, and that is correct: in the first case -3 and -2 are used as the powers
on these numbers, and in the second case you just take the minus sign common, and inside the
bracket there are only 3 and 2.
On the other hand, if the length of the first data vector is not an exact multiple of the
length of the second vector, it works like this: 5, 4, 3 are operated over 2, 3, 4, so this
becomes 2 raised to the power 5, 3 raised to the power 4 and 4 cube, and then 5 raised to the
power 5, which is 3125. But there is also a warning message.
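A sketch of both points, assuming the vectors described above: writing the negative powers inside `c()` or taking the minus sign outside gives identical results, and a length-3 exponent vector recycles its first value onto 5 (with a warning, suppressed here).

```r
x <- c(2, 3, 4, 5)

# Writing the negative powers inside c() or taking the minus sign outside
inside  <- x^c(-3, -2)
outside <- x^-c(3, 2)
print(identical(inside, outside))   # TRUE
print(inside)                       # 1/8, 1/9, 1/64, 1/25

# Three powers for four numbers: 5 gets the recycled power 5, plus a warning
uneven <- suppressWarnings(x^c(5, 4, 3))
print(uneven)                       # 32 81 64 3125
```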
So, that is how it works. You can see that power operations in the R console are also not
very difficult and can be done without any problem.
(Refer Slide Time: 12:51)
So, after this, we take one more operation, integer division.
What is integer division? Let me use primary-class arithmetic. If you divide 2 by 2, 2 1s
are 2, so the quotient is 1 and the remainder is 0. Similarly, if you divide 5 by 2, 2 2s are
4, so the quotient is 2 and the remainder is 1. And if you divide 7 by 3, 3 2s are 6, so the
quotient is 2 and the remainder is 1.
That is how I was taught arithmetic in my childhood, and it is the same thing here. When you
divide a value by something, you get two answers: one is the quotient and the other is the
remainder. In integer division, the fractional part, which comes from the remainder, is
discarded and only the quotient is kept.
So, for 5 divided by 2 and 7 divided by 3, integer division gives 2 and 2. We need such
operations in many programs, and these are very common operations in mathematics. To do
integer division in R, the operator is a percent sign, then a forward slash, and then another
percent sign, that is %/%.
So, if you compute 2 integer division 2, that is 2 %/% 2, you get the value 1. Similarly, if
you compute 5 integer division 2, that is 5 %/% 2, you get the quotient 2.
And similarly, if you divide 7 by 3 using integer division, 7 %/% 3, you get the value 2. So,
this is the concept of integer division in the R software, and here you can see the
screenshot of the same operation.
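A minimal sketch of these integer divisions. `%%` is R's modulo operator; it is used here only to show the remainder that `%/%` discards, and to check that divisor times quotient plus remainder rebuilds the original number. The `floor()` comparison holds for the positive numbers used here.

```r
print(2 %/% 2)   # 1
print(5 %/% 2)   # 2
print(7 %/% 3)   # 2

# %% (R's modulo operator) returns the remainder that %/% throws away
q <- 7 %/% 3
r <- 7 %% 3
print(c(quotient = q, remainder = r))   # quotient 2, remainder 1
print(3 * q + r == 7)                   # TRUE: divisor * quotient + remainder

# For positive numbers, %/% matches dropping the fractional part of 7 / 3
print(floor(7 / 3) == 7 %/% 3)          # TRUE
```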
Similarly, if you do integer division where one value is a data vector and the other is a
scalar, the same thing happens as in the earlier operations, just like addition, subtraction,
division and multiplication: the operator operates on each of the values 2, 3, 5 and 7.
So, this becomes 2 integer division 2, 3 integer division 2, 5 integer division 2 and 7
integer division 2. You can see that 2 integer division 2 gives the answer 1, 3 integer
division 2 also gives the answer 1, and for 5 integer division 2, 2 2s are 4 and the
remainder is 1.
So, the value that comes here is 2. When you divide 7 by 2, 2 3s are 6 and the remainder is
1, so 3 comes here. So, the same process as in the other operations is followed in the case
of integer division also.
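A short sketch of integer division of a data vector by a scalar, assuming the slide values. The `%%` line (remainders) and the `floor()` comparison are illustrative additions that make the discarded part visible.

```r
dv <- c(2, 3, 5, 7)
int_div <- dv %/% 2
print(int_div)                                  # 1 1 2 3
print(dv %% 2)                                  # remainders: 0 1 1 1
print(identical(int_div, floor(dv / 2)))        # TRUE
```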
And if you take two data vectors of the same length, each value in one data vector undergoes
integer division by the value at the same position in the other. That is now very obvious for
you, so just for the sake of example I have taken two data vectors: one of length 4 with 4
values, and another of length 2 with the 2 values 2 and 3.
So, integer division is operated first over the first set of values and then over the second
set. That is, 2 integer division 2 and 3 integer division 3, which give 1 and 1. Then integer
division is operated over 5 and 7: 5 integer division 2 and 7 integer division 3.
In both cases the value is 2, because 2 2s are 4 and 3 2s are 6. So, you get these 2 values,
and this is the screenshot of the same operation.
(Refer Slide Time: 16:50)
And similarly, take an example where the first data vector has length 3 and the second data vector has length 2, so that the length of the first data vector is not an exact multiple of the length of the second. The same thing happens: the 2 values of c(2, 3) operate over the first 2 values, 2 and 3, of the first data vector.
So, we get 2 integer division by 2 and 3 integer division by 3, and then c(2, 3) comes once again over the 5. So, 5 is divided by 2 using integer division, but the remaining value 3 of the second data vector has no value left in the first data vector to operate on. R still returns the three results, but it also gives you a warning message; you can see this in the screenshot of the same operation.
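To see both the result and the warning for this length-3 by length-2 case, the warning can be captured explicitly. This is a sketch: withCallingHandlers() and invokeRestart("muffleWarning") are standard base R, but using them here is only an illustrative choice, and the variable names are hypothetical.

```r
warning_text <- NULL

# Record the warning message while still letting R finish the calculation
res <- withCallingHandlers(
  c(2, 3, 5) %/% c(2, 3),   # c(2, 3) is recycled to c(2, 3, 2)
  warning = function(w) {
    warning_text <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(res)           # 1 1 2
print(warning_text)  # "longer object length is not a multiple of shorter object length"
```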
So, let me show you these operations on the R console, so that you get more confidence. If I compute 7 integer division by 3, the answer is 2, and if I divide 7 by 2 instead, then 2 times 3 is 6, so the answer comes out to be 3.
And in case you take the values 5, 7, 8 and 4 and do the integer division by a scalar, say 2, you get 2, 3, 4 and 2, because 2 times 2 is 4, 2 times 3 is 6, 2 times 4 is 8 and 2 times 2 is 4. Similarly, you can use a data vector as the divisor also.
Suppose the second data vector is c(2, 3, 3, 2). Then the value in each position of data vector 1 is divided, using integer division, by the value in the same position of data vector 2: 5 integer division by 2 is 2 (2 times 2 is 4), 7 integer division by 3 is 2 (3 times 2 is 6), 8 integer division by 3 is 2 (3 times 2 is 6) and 4 integer division by 2 is 2 (2 times 2 is 4).
Similarly, if I make the second data vector c(4, 3) by removing the other 2 values, so that the length of the first data vector is an exact multiple of the length of the second, the result is 1, 2, 2, 1. Why? 5 integer division by 4 is 1, 7 integer division by 3 is 2 (3 times 2 is 6), 8 integer division by 4 is 2 (4 times 2 is 8) and 4 integer division by 3 is 1 (3 times 1 is 3).
And in case you take the second data vector as c(4, 3, 2), its values 4, 3 and 2 are used for the integer division of the 3 numbers 5, 7 and 8. For the last number 4, only the value 4 is used once again, which gives 1 because 4 times 1 is 4, so the result is 1, 2, 4, 1. But for the remaining 2 values, 3 and 2, there are no numbers left in the first data vector with which the integer division can be performed.
So, R gives you a warning here. You can see that none of this is difficult, and it is easy to understand once you have learnt all these topics.
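The console session above can be reproduced with a short script using the lecture's values. suppressWarnings() is added only so that the last, non-multiple case runs quietly; without it, R prints the same recycling warning.

```r
# Integer division of scalars
print(7 %/% 3)   # 2
print(7 %/% 2)   # 3

x <- c(5, 7, 8, 4)

q_scalar <- x %/% 2              # 2 3 4 2
q_same   <- x %/% c(2, 3, 3, 2)  # 2 2 2 2, position by position
q_short  <- x %/% c(4, 3)        # 1 2 2 1, c(4, 3) is used twice
# Length 4 is not a multiple of length 3, so R warns; the divisor is recycled as 4, 3, 2, 4
q_warn   <- suppressWarnings(x %/% c(4, 3, 2))  # 1 2 4 1

print(q_scalar); print(q_same); print(q_short); print(q_warn)
```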
(Refer Slide Time: 19:45)
So, after this, I will give you an idea about one more topic, which is the last topic of this lecture: modulo division, popularly written as x mod y. So, what is modulo division? Remember the exercise we have just done: when I divide 2 by 2, 2 times 1 is 2 and the remainder is 0.
Similarly, if you divide 3 by 2, 2 times 1 is 2 and the remainder is 1. If you divide 7 by 3, 3 times 2 is 6 and the remainder is 1. And if you divide 7 by 4, 4 times 1 is 4 and the remainder is 3. Modulo division gives us this remainder after dividing one number by another.
What does this mean? When we divide 2 by 2, the remainder is 0; when we divide 3 by 2, the remainder is 1; when we divide 7 by 3, the remainder is 1; and when we divide 7 by 4, the remainder is the integer value 3.
Modulo division provides these remainder values. The symbol for modulo division in R software is two percent signs, %%. So, if you compute 2 mod 2, that is, 2 %% 2, the answer is 0.
Similarly, 3 mod 2, that is, 3 %% 2, gives the remainder 1, which is the answer here. The next operation is 7 modulo division by 3, 7 %% 3, and its remainder is 1. And the last one is 7 modulo division by 4, that is, 7 %% 4, and its remainder is 3.
So, modulo division gives you the remainder. So far, I have illustrated it using scalars only.
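The four scalar cases can be checked directly in R. The last lines add a check that is not on the slide but follows from the definitions: for positive whole numbers like these, the quotient from %/% and the remainder from %% together rebuild the original number, x = (x %/% y) * y + x %% y.

```r
# Remainders for the four examples: 2 mod 2, 3 mod 2, 7 mod 3, 7 mod 4
remainders <- c(2 %% 2, 3 %% 2, 7 %% 3, 7 %% 4)
print(remainders)   # 0 1 1 3

# Quotient and remainder together give back the dividend
x <- c(2, 3, 7, 7)
y <- c(2, 2, 3, 4)
rebuilt <- (x %/% y) * y + x %% y
print(rebuilt)      # 2 3 7 7
```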
Now, in case you take the first data vector as c(2, 3, 5, 7) and the second value as the scalar 2, the modulo division is operated on each of the elements 2, 3, 5 and 7.
So obviously, when you divide 2 by 2 the remainder is 0, when you divide 3 by 2 the remainder is 1, when you divide 5 by 2 the remainder is 1, and when you divide 7 by 2 the remainder is 1. This is the screenshot.
So, you can see that modulo division works along exactly the same lines as the other operations.
(Refer Slide Time: 22:31)
Now, suppose I take both operands as data vectors. The first data vector is c(2, 3, 5, 7) and the second data vector is c(2, 3). I could also take the second data vector of the same length, but by now you know how that is going to be operated, so there is no issue in understanding it.
So now, c(2, 3) is placed over the first two values and the modulo division is carried out: 2 operates over 2 and 3 operates over 3. This gives 2 divided by 2, remainder 0, and 3 divided by 3, remainder 0.
Now, once again, 2 and 3 are placed over 5 and 7. So, 5 modulo division 2 gives the remainder 1, and 7 modulo division 3 gives the remainder 1. So, you get the values 0, 0, 1, 1, and you can see this in the screenshot.
So, you can see here that in this case also the modulo division works exactly in the same
way as other operations.
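Both data-vector cases above, the scalar divisor 2 and the shorter vector c(2, 3), can be written in R as follows, with the same values as on the slides:

```r
x <- c(2, 3, 5, 7)

# Every element is divided by 2; only the remainders are kept
print(x %% 2)          # 0 1 1 1

# c(2, 3) is reused: 2 %% 2, 3 %% 3, 5 %% 2, 7 %% 3
print(x %% c(2, 3))    # 0 0 1 1
```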
(Refer Slide Time: 23:32)
Now, the last example: I again take two data vectors for modulo division, but this time the first data vector has three elements and the second data vector has two elements.
So, the number of elements in the first data vector is not an exact multiple of the number of elements in the second. By contrast, in the previous case the length of the first data vector was 4 and the length of the second data vector was 2, so the length of the first data vector was an exact multiple of the length of the second.
So now, the same operation happens: c(2, 3) is placed over the first two values, c(2, 3), giving 2 modulo division by 2 and 3 modulo division by 3, and after that 1 value is left. So, c(2, 3) comes here again and the modulo division of 5 with respect to 2 is carried out. But the 3 in the second data vector has no value left in the first data vector to operate with.
So, R returns the three remainders 0, 0, 1, and it also gives you a warning message that the longer object length is not a multiple of the shorter object length. This is how this operation is done, and this is the screenshot of the same operation. Now I will show you these operations on the R console so that you get more confidence.
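A short sketch of this non-multiple case. suppressWarnings() is used only so that the example runs without printing the warning; the comment gives the message R issues without it.

```r
# Length 3 vs length 2: c(2, 3) is recycled to c(2, 3, 2)
# Without suppressWarnings(), R adds: "longer object length is not a
# multiple of shorter object length"
r <- suppressWarnings(c(2, 3, 5) %% c(2, 3))
print(r)   # 0 0 1
```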
(Refer Slide Time: 24:50)
So, if I compute 7 mod 3, you can see that 3 times 2 is 6 and the remainder is 1. Similarly, if you take the first value as the data vector c(7, 9) and the second value as 4, then dividing 7 by 4 leaves the remainder 3 and dividing 9 by 4 leaves the remainder 1.
Now, suppose the second operand is also a data vector. For that, I increase the length of the first data vector to c(7, 9, 3, 4), and I take the other data vector as c(2, 3, 2, 3). You can see that 2 and 3 are operated over 7 and 9, and again 2 and 3 are operated over 3 and 4, and you get the answer 1, 0, 1, 1.
Similarly, suppose I change the length of the second data vector and take it as c(3, 2). What will happen? The values 3 and 2 are operated first over 7 and 9, and then over 3 and 4. So, this modulo division gives you the answer 1, 1, 0, 0. Why? Because 7 divided by 3 gives 3 times 2 is 6, remainder 1; 9 divided by 2 gives 2 times 4 is 8, remainder 1; 3 divided by 3 leaves remainder 0; and 4 divided by 2 is 2 with remainder 0. That is how things work. And if you take numbers of elements that are not exact multiples, say the second data vector c(3, 2, 3), you can see that 3, 2, 3 is operated with 7, 9, 3.
So, 7 divided by 3 leaves remainder 1, 9 divided by 2 leaves remainder 1, 3 divided by 3 leaves remainder 0, and 4 divided by the recycled 3 leaves remainder 1. But R gives you a warning message, because there is no value left over which the remaining 2 and 3 can be operated. That is the way these operations work. So, with modulo division you have learnt one more topic, namely how the different operations, that is, the power operation, integer division and modulo division, are carried out in the R software.
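The console examples from this part can be collected into one script, again with the lecture's values; the non-multiple case is wrapped in suppressWarnings() only to keep the output quiet.

```r
print(7 %% 3)                  # 1
print(c(7, 9) %% 4)            # 3 1

x <- c(7, 9, 3, 4)
m_same  <- x %% c(2, 3, 2, 3)  # 1 0 1 1
m_short <- x %% c(3, 2)        # 1 1 0 0, c(3, 2) is used twice
# Length 4 vs length 3: the divisor is recycled as 3, 2, 3, 3 and R warns
m_warn  <- suppressWarnings(x %% c(3, 2, 3))  # 1 1 0 1

print(m_same); print(m_short); print(m_warn)
```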
So, as we move forward, more operations are coming and you need to remember more things. But my advice to you all is that whatever we are doing today, please try to combine it with the earlier lectures also, and try to see how you can do better. There are some more aspects of these calculations which we still have to learn, but after today's lecture, once again, you have a good amount of homework to do.
Please take these operations, choose some values yourself, and do those calculations manually with your own pen and paper. Then verify them with the output of the R software. That will give you more confidence that the way R is thinking is the way you are also thinking. Then, whatever you want to get done through R, you can write an appropriate syntax, command or function for it.
So, practice it, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 11
Built in Functions and Assignments
Hello friends, welcome to the course Foundations of R Software. You can recall that in the last couple of lectures we talked about the different types of mathematical operations which R does, and we understood how R does them. You have seen that R has a special way of doing these mathematical computations and calculations. So, continuing along this line, today we are going to talk about some more aspects of this computation.
Suppose you want to find the sum or the mean of some values. One option is to write a program for it yourself. The second option is that somebody has already written a program for finding the sum or the mean. Then you simply have to input the data and use that program. Suppose the name of the program is sum: you simply write sum and give the input data within the parentheses, and as soon as you execute it, this function finds the value of the sum.
The difference is that in this case you do not have to do any programming; you can use these programs directly. That is the beauty of R: you have the option to write your own programs, and you can also use the built-in programs. So, in the lecture today we are going to discuss some popular built-in functions; there is a long list of functions available in the R software.
But you will surely understand that discussing all of them in a lecture is very difficult. So, I will take some popular representative functions here. My objective is not really to tell you about the functions themselves; the main objective is to show how you are going to use them. After that I will show you how you can use these built-in functions along with the simple computations that you have learnt in the last couple of lectures. So, let us begin and try to understand what they do.
As we have learnt, R can perform all types of standard calculations like addition, subtraction, multiplication, division, power operation, modulo division, integer division, etc., and R also has some built-in functions for computation.
So, what are those functions? Let me take a couple of examples and explain. Suppose you want to find the maximum among some given values. Your first option is to write a program yourself; the alternative is to use a built-in program, and these built-in programs have been given names.
For example, if you want to find the maximum of some numbers, the name of the program is max. You can call this program by this name, and its job is to provide you the maximum value among the given numbers. So, I write max and, within the parentheses, the numbers among which I want to find the maximum value.
You can see that I have taken three numbers: 1.2, 3.4 and -7.8. You know that the maximum of these 3 numbers is 3.4, so the output comes out to be 3.4.
Now, I want to address one more issue. As you have learnt in the earlier lectures, whenever you give more than one numerical value as input in the R software,
you always use the concept of a data vector, and in order to define a data vector you write all the numbers inside parentheses and use the command lowercase c. Now, in the first option here you have not used a data vector to input the values; you have simply given the values 1.2, 3.4, -7.8 without using the command c, and in the second option you use the command c.
So, is there any difference? No: the input is the same, 1.2, 3.4 and -7.8, and the output is the same, 3.4. So, in this case, whether or not you use the command c, that is, the concept of a data vector, the outcomes are the same. But now I will show you some more examples, and then I will address what we should do.
(Refer Slide Time: 06:14)
So, as I said in the beginning, I am going to take some commands where I demonstrate the application both with and without the concept of a data vector. Similarly, you can use the command min.
The command min is used to find the minimum value among the given values. Exactly in the same way, if you write the 3 numbers 1.2, 3.4, -7.8, the minimum value among them is -7.8.
In this case you are not using the concept of a data vector; you are simply giving the values inside the parentheses. Now, if you do the same thing but use the concept of a data vector and input your data with the command c, once again the answer comes out to be the same, -7.8.
So, for both maximum and minimum, whether or not you use the concept of a data vector in giving the input values, the output is the same and is not affected by the use of the c command. You can see here the outcome for the minimum, and this was the output for the maximum.
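The four calls described on the max and min slides can be checked in R as follows; max() and min() accept the numbers either as separate arguments or as one data vector.

```r
# Separate arguments and a single data vector give the same answer
max_args <- max(1.2, 3.4, -7.8)
max_vec  <- max(c(1.2, 3.4, -7.8))
min_args <- min(1.2, 3.4, -7.8)
min_vec  <- min(c(1.2, 3.4, -7.8))

print(c(max_args, max_vec))   # 3.4 3.4
print(c(min_args, min_vec))   # -7.8 -7.8
```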
(Refer Slide Time: 07:40)
But now I will show you something else, and you have to be very careful with what I am showing here. If you want to find the arithmetic mean, the command is mean. You know that if you have some values x1, x2, ..., xn, the arithmetic mean is defined as their sum divided by the number of observations.
For example, if you have the numbers 2, 4, 6, their arithmetic mean is (2 + 4 + 6)/3. In R software there is a command mean, all in lowercase, and suppose we want to find the mean of the 3 numbers 2, 3 and 4.
First, I give the input data in the simple way, that is, without using the concept of a data vector and without the c command, and then I do the same exercise after giving the input with the concept of a data vector. Let us see what happens.
When I compute mean(2, 3, 4), it gives the answer 2, but the arithmetic mean is (2 + 3 + 4)/3, which is equal to 3. So, the mean should be 3, but R reports 2.
So, there is a big question mark: what is happening? You should get the answer 3, and that is why I have always suggested in the past that whenever you do any calculation, you should first do the manual calculation and then cross-check and verify it with the output of the R software.
So, now I change my approach: I use the data vector approach and give the input data using the command c, as mean(c(2, 3, 4)). Now you can see that the mean comes out to be 3, which matches the arithmetic mean that you found manually.
Looking at these commands, for max there was no difference whether you used the command c or not, and the same was true for min. But in the case of mean the result changes, and only when you use the command c do you get the correct answer.
So, what should we do, and why is this happening? I do not know the exact reason, but my guess is this: when R started, different people across the world were participating in its development, and it might be that these different functions were developed by different people. Some people might have written their functions to take the input data through the c command, others might have written their functions to take the input data without the c command, and gradually all these functions were incorporated into the R software.
That is why some functions give you the correct answer only with c, and some functions give you the correct answer with or without c. I understand that this happens, but what should I do as a user? Think about it: you are writing a program in which you want to find the value of the mean, and you have not used the c command. What will happen?
You will be inputting some data values, they will come to the mean function, and the mean function will not give you the correct value. If the program is long, it will be very difficult for you to find out whether the final outcome is correct or not.
So, that is one problem which may happen. But our objective is to find a solution; there are many problems in life, but the more important part is to find the solutions. One solution is very clear: when you use the c command, you get the correct answer in all three cases, minimum, maximum as well as mean.
So, why can we not make a rule that whenever we input data, we always use the c command? At least this will ensure that the output is correct, and it will guarantee that there is no mistake, at least in giving the data as input to any program.
So, that is my very sincere advice to you all: whenever you input data, use the c command, that is, use the data vector concept. So, let us now continue with our lecture. From now on I will always use the data vector command to give the input values. That is clear to me, I hope it is clear to you also, and I hope it was a convincing argument.
Now, a very simple question: what is happening in the first case, where you find the mean of 2, 3, 4 and it gives you the value 2? In this case mean reads only the first value as the data; it does not read the remaining values 3 and 4 as data.
So, it computes 2 divided by the number of data values, which is 1, and that is why it gives you the value 2. So, you have to be careful now, and you can see the screenshot here. Before moving further, let me show you these operations on the R console also.
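The mean() pitfall can be reproduced directly. A concrete reason, beyond the historical guess above, can be seen from the function definitions: max() and min() collect all of their inputs through the ... argument, while mean() takes the data only through its first argument x. So in mean(2, 3, 4), R treats 2 as the data and passes 3 and 4 on to other arguments of mean() (trim and na.rm), which is why the answer is 2. The variable names below are only illustrative.

```r
# Only the first argument of mean() is the data; 3 and 4 go to other arguments
wrong <- mean(2, 3, 4)
right <- mean(c(2, 3, 4))   # the data vector holds all three values

print(wrong)   # 2
print(right)   # 3

# max() reads every argument as data, so c() makes no difference there
print(max(2, 3, 4) == max(c(2, 3, 4)))   # TRUE
```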
(Refer Slide Time: 14:01)
So, let me use the command max and give the data without using the c command, say 23, 170, -98, and you can see that it gives the value 170. If you use the c command, it gives the same value. Now let me use the min command with the same values: the minimum of 23, 170 and -98 is -98, so the answer is -98, and if you use the c command you again get the same value. But if you compute the mean of 2, 3 and 4 as mean(2, 3, 4), you can see that the result is 2.
If you use the c command and input your data as a data vector, you can see that it gives you the correct value, 3. So, you can see that using these built-in functions is not difficult.
(Refer Slide Time: 15:16)
Similarly, there are many more functions. Some common useful functions in the R software are abs, which is used to find the absolute value, and sqrt, which finds the square root. Remember, when we learnt computation with the power operator, we computed the square root of 2 as 2^0.5.
Now you can also write it as sqrt with 2 inside the parentheses. Similarly, we have functions like round, floor and ceiling, which are used for rounding a number, taking its floor or taking its ceiling; you know that these are basic mathematical operations.
Similarly, if you want to find the sum or product of some numbers, we have the commands sum and prod. If you simply give some numbers inside a data vector within the parentheses, it directly gives you the sum of those numbers. Different types of log functions are also available, such as log and log10, where log10 gives the logarithm with base 10, and so on.
The function exp is used for the exponential function. Then we have the trigonometric functions sin, cos and tan, and their inverse functions asin, acos and atan. Similarly, we have the hyperbolic functions sinh, cosh, tanh and their inverses asinh, acosh, atanh, etc. Using these functions is straightforward: exactly the way you used min, max and mean, you can use all of these functions.
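A small sketch of the functions in this list that are not demonstrated later in the lecture: floor(), ceiling() and the trigonometric and hyperbolic families. Note that asin(), acos() and atan() are inverse functions (arcsine and so on), not cosec, sec and cot; to my knowledge base R has no separate cosec, sec or cot, and the reciprocals below are one simple way to get them. The names angle, cosec, sec and cot are illustrative variables, not R functions.

```r
print(floor(c(1.7, -1.5)))     # 1 -2  (largest integer not above the value)
print(ceiling(c(1.2, -1.5)))   # 2 -1  (smallest integer not below the value)

angle <- pi / 6                # 30 degrees; trigonometric functions work in radians
s <- sin(angle)                # about 0.5
print(asin(s))                 # back to pi/6, about 0.5236

# Reciprocal functions computed by hand
cosec <- 1 / sin(angle)        # about 2
sec   <- 1 / cos(angle)        # about 1.1547
cot   <- 1 / tan(angle)        # about 1.7321, i.e. sqrt(3)

# Hyperbolic functions and an inverse
print(c(sinh(0), cosh(0), tanh(0)))   # 0 1 0
print(asinh(sinh(1)))                 # 1, up to rounding
```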
So, let me take some examples to explain the different aspects of using these functions. For example, the function abs is used for finding absolute values. If you compute abs(-4), the absolute value of -4 is 4, so the answer is 4. Wherever needed, these functions can also be operated over data vectors, and they then operate on each element of the data vector.
For example, suppose you want the absolute values of -1, -2, -3, 4 and 5, and you use the concept of a data vector to input the values. The function abs goes inside the data vector and finds the absolute value of -1, of -2, of -3, of 4 and of 5. So, this gives you the answer 1, 2, 3, 4, 5, and you can see the screenshot here.
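The two abs() calls from the slide, as R code:

```r
print(abs(-4))                      # 4
a <- abs(c(-1, -2, -3, 4, 5))       # applied element by element
print(a)                            # 1 2 3 4 5
```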
(Refer Slide Time: 18:33)
Similarly, if you want the square root of a number, you simply use sqrt and give the number within the parentheses. For example, the square root of 4 is 2. Similarly, I can use a data vector: I take the values 4, 9, 16, 25 and find the square root of all these values.
When I operate it, the square root function goes inside the parentheses and becomes the square root of 4, of 9, of 16 and of 25. So, the result is 2, 3, 4 and 5, and this is the screenshot.
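The sqrt() examples in R, together with a check of the earlier remark that sqrt(2) and 2^0.5 give the same value (compared with all.equal() to allow for floating-point rounding):

```r
print(sqrt(4))                       # 2
roots <- sqrt(c(4, 9, 16, 25))       # element-wise square roots
print(roots)                         # 2 3 4 5

# sqrt() and the power operator agree (up to floating-point rounding)
print(all.equal(sqrt(2), 2^0.5))     # TRUE
```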
Similarly, suppose you want the sum of the values 2, 3, 5 and 7, that is, the value of 2 + 3 + 5 + 7. Then you can use sum, write all these values as a data vector, and the value comes out to be 17.
And similarly, suppose you want the product of some numbers, for example the product of 2, 3, 5 and 7, which is 2 times 3 times 5 times 7. This comes out to be 210. So, you can see that it is very convenient to do mathematical computations in the R software.
Similarly, suppose you want the rounded-off value of 1.23. Rounding gives the nearest whole number: a number between 1 and 2 that is more than 1.5 is rounded off to 2, and one that is less than 1.5 is rounded off to 1. Here 1.23 is less than 1.5, so it is rounded off to 1, but 1.83 is more than 1.5, so it is rounded off to 2.
So, you can see that these operations are very simple and easy to use. Similarly, you can use log.
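The sum(), prod() and round() examples from the slides in R. The last lines are an addition not covered in the lecture: for numbers exactly halfway between two integers, R's round() follows the IEC 60559 convention of rounding to the even digit, so 2.5 becomes 2 rather than 3.

```r
print(sum(c(2, 3, 5, 7)))    # 17
print(prod(c(2, 3, 5, 7)))   # 210

print(round(1.23))           # 1
print(round(1.83))           # 2

# Exact halves go to the even neighbour
halves <- round(c(0.5, 1.5, 2.5))
print(halves)                # 0 2 2
```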
(Refer Slide Time: 20:31)
If you find the log of 10, it comes out to be about 2.303. You have to remember that this log function gives you the natural log, that is, the log with respect to the base e, and not the log with respect to the base 10.
Why? Because the log of 10 with base 10 is equal to 1. If you want to verify it, you can find the log of exp(1). This is like ln(e), the log of e with base e, so this value comes out to be 1.
So, log is used for finding natural log values, and this operation is valid on data vectors also. If you write log(c(10, 100, 1000)), it gives you the values of log of 10, log of 100 and log of 1000.
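The natural-log examples in R. The printed values are rounded to 7 significant digits, as R shows them by default. The base argument of log() in the last line is a standard option of that function, included here as an extra.

```r
print(log(10))                     # 2.302585 (natural log, base e)
print(log(exp(1)))                 # 1, since ln(e) = 1
nat <- log(c(10, 100, 1000))
print(nat)                         # 2.302585 4.605170 6.907755

# The same function with an explicit base
print(log(100, base = 10))         # 2
```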
Similarly, if you want the log with base 10, you use the command log10. So, log10 of the value 10 is 1, and similarly you know that the log of 100 with base 10 is equal to 2, so this comes out to be 2.
Similarly, if you use a data vector, log10 of 10, 100 and 1000 gives the logs of 10, of 100 and of 1000, which come out to be 1, 2 and 3. Now let me show you examples of these things on the console, so that you get more confidence about how they work in the R software.
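The base-10 versions in R; all.equal() is used in the checks because these are floating-point results.

```r
print(log10(10))                     # 1
print(log10(100))                    # 2
dec <- log10(c(10, 100, 1000))       # element-wise base-10 logs
print(dec)                           # 1 2 3
```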
So, I will take examples of all the functions I have considered. First, the absolute value of -8: you can see that this is 8. Then the absolute values of a data vector, say -8, -9, 1 and 2.
You can see that the absolute value of -8 is 8, the absolute value of -9 is 9, and the absolute values of 1 and 2 are 1 and 2. So, the absolute values of -8, -9, 1 and 2 are 8, 9, 1 and 2, respectively.
Similarly, for the square root, suppose I take the square root of 9; this is 3. In the earlier lecture you found the square root of 2 as 1.414, and it comes out the same here; I can show you that the square root of 2 which you computed earlier is the same value.
Similarly, if you find the square roots of some values like 2, 3, 4, 5, the first value, 1.414, is the square root of 2, and the remaining 3 values are the square roots of 3, 4 and 5. Now, suppose you give a negative value. What do you expect?
This gives you a warning, because a negative number does not have a real square root, and the result is NaN. I will discuss what NA, NaN, NULL, etc. are in more detail in the forthcoming lectures, but here I wanted to show you that if you try to do a wrong mathematical calculation, R gives you a warning message. So, now let us look at the other operations, sum and product.
If you compute the sum of c(2, 3, 4, 5), or of whatever values you want, it gives you the sum, and if you want the product of these numbers you simply use prod. If you want to verify it: 2 + 3 + 4 + 5 gives 14, and 2 times 3 times 4 times 5 gives 120.
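The square-root and sum/product console examples in R. For the negative input, suppressWarnings() is added only to silence the "NaNs produced" warning that R otherwise prints, and is.nan() confirms the result.

```r
print(sqrt(c(2, 3, 4, 5)))          # 1.414214 1.732051 2.000000 2.236068

# A negative input has no real square root: the result is NaN
neg <- suppressWarnings(sqrt(-4))
print(neg)                           # NaN
print(is.nan(neg))                   # TRUE

print(sum(c(2, 3, 4, 5)))            # 14
print(prod(c(2, 3, 4, 5)))           # 120
```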
So, this is how you can compute sums and products. Now, suppose you want to use round. If I take 100.4, it becomes 100, and if I take 100.6, it becomes 101. That is the way rounding of values works in mathematics.
So, now, in case if you try to find out here the value of log. So, you can log you can see
here log of say here 10 this will give you here this value. But if you try to find out here
the log of say exponential of here 1. So, you can see here this value will come out to be
1. So, that ensures that the log function is trying to find out the natural log. So, if you
want to find out the log of a data vector. So, you can see here 10, 12, 17 etc. and you can
see here there are three values which are representing the log of 10, log of 12 and log of
14; three values, right.
Similarly, if you want the logarithm with respect to base 10, you know that the log of 100 to base 10 is 2, and you can see that log10(100) gives 2 here. To see more detail, take log10 of values like 100, 200, 300, and you get: log of 100 is 2, log of 200 is 2.301030 and log of 300 is 2.477121. So you can see that finding these values is not difficult. Similarly, exp(2), for example, is 7.389056.
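The base-10 logarithms and the exponential above, as a short console sketch (variable names are illustrative):

```r
y1 <- log10(100)               # 2
y2 <- log10(c(100, 200, 300))  # one base-10 log per element
y3 <- exp(2)                   # e squared
print(y2)  # [1] 2.000000 2.301030 2.477121
print(y3)  # [1] 7.389056
```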
(Refer Slide Time: 27:27)
So, this is how you can move further without any problem, and I want to show you one more operation, which will really help you when you write your own programs. Whatever values you compute from a data vector can also be assigned to a variable.
For example, I give as input x1, a data vector of the values 1, 2, 3, 4, then I compute x1 squared and store it in a new variable x2. You can see that x2 gives the values 1, 4, 9, 16, which are the squares of 1, 2, 3 and 4, right. So, this is also possible, and you can see these outcomes on the R console, right.
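A minimal sketch of storing an element-wise result in a new variable, using the names x1 and x2 from the lecture:

```r
x1 <- c(1, 2, 3, 4)
x2 <- x1^2   # squares each element and stores the result in x2
x2           # [1]  1  4  9 16
```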
Now, I want to give you one more idea. You have learnt two types of operations: operations with scalars and data vectors, and operations with built-in functions. You can combine any of them, and the same mathematical rules will be followed. For example, take the data vector c(1, 2, 3, 4) and add to it the sum of c(1, 2, 3, 4) multiplied by the product of 1 and 2, right.
If you look at the parentheses, first the product of 1 and 2 is evaluated, which is 2. Then c(1, 2, 3, 4) is multiplied by 2, which gives c(2, 4, 6, 8), and the sum of this is 2 + 4 = 6, 6 + 6 = 12 and 12 + 8 = 20. So, this entire value becomes 20.
Now this 20 is added to the first data vector, giving 20 + 1, 20 + 2, 20 + 3 and 20 + 4. That is the same exercise you did when adding a scalar to a data vector, right. Similarly, you can find the absolute value of an expression.
The values are contained inside the parentheses, and once again we have c(1, 2, 3, 4). Its sum is 1 + 2 = 3, 3 + 3 = 6 and 6 + 4 = 10, and the product of 1 and 2 is 2. So, 10 times 2 makes the whole value 20.
Now this 20 is subtracted from the data vector, giving 1 - 20 = -19, 2 - 20 = -18, 3 - 20 = -17 and 4 - 20 = -16. Then the absolute value function operates over this and gives the values 19, 18, 17, 16.
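The exact expressions typed on the slides are not reproduced in the transcript; the following is one way to write them that matches the steps described above (v, r1 and r2 are illustrative names):

```r
v <- c(1, 2, 3, 4)
# prod(c(1, 2)) is 2; v * 2 is c(2, 4, 6, 8), whose sum is 20; 20 is added to v
r1 <- v + sum(v * prod(c(1, 2)))
r1  # [1] 21 22 23 24
# sum(v) is 10, times prod(c(1, 2)) = 2 gives 20; v - 20 is c(-19, -18, -17, -16)
r2 <- abs(v - sum(v) * prod(c(1, 2)))
r2  # [1] 19 18 17 16
```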
So, you can see that these things are not very difficult, but let us run them on the R console and see whether they really work, right. Let me copy these commands and run them; this will save us some time.
(Refer Slide Time: 31:07)
So, before going further, let me show you this example. I take x1 as a data vector with values like 2, 32, 45, and then I define x2 as the cube of x1. You can see x1 here and x2 here. And if you want to define another variable, say x3, as the square root of x2, you can do that too, and you can see that x3 comes out like this. So, I have not done any manual calculation; R is doing the job for me.
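A minimal sketch of the cube and square-root steps. The full vector from the slide is not given in the transcript, so c(2, 32, 45) is only an illustrative input:

```r
x1 <- c(2, 32, 45)   # illustrative values only
x2 <- x1^3           # cubes: 8, 32768, 91125
x3 <- sqrt(x2)       # square roots of the cubes
print(x2)
print(x3)
```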
And similarly, if you do the same operation here, you get 21, 22, 23, 24, and if you do this operation on the R console, you again get the same outcome. So, you can see that working with built-in functions is not difficult at all, and it is not difficult to understand how R is working on them.
And I hope you will appreciate that built-in functions help us a lot, particularly when we are doing complicated or lengthy calculations. This is one of the very important features of the R software which has made it so popular.
So, now I stop this lecture, but as I always say, today again you have a good amount of homework. You need to practice: consider these built-in functions, both the ones I have covered and the ones I have not, try them with scalar values and data vectors, and check whether they work.
And above all, whatever you have learnt in the last couple of lectures about the different types of operations with scalars, data vectors etc., try to combine it with the built-in functions; this will give you deeper knowledge and a better hold on how the R software works. So, practice it and I will see you in the next lecture; till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 12
Matrices
Hello friends. Welcome to the course Foundations of R Software. You may recall that in the last couple of lectures we learnt different aspects of calculation and computation in the R software: how to do calculations with scalars, with data vectors and with built-in functions.
Now, continuing on the same line, another very important part of calculation is the matrix. You know what a matrix is, and the rules for matrix operations are different from the usual mathematical operations. So, the next question before us is how to handle matrices in the R software, and then how to do various types of calculations and computations related to them.
Matrices are very important because when you use the R software for various applications in statistics and other areas, the input data often have to be in matrix format, and the output is also often in matrix format. So, you need to understand how to read a matrix and how to extract a particular part of it, which can be an element, a row vector, a column vector or a submatrix.
After that, there are many operations, such as transpose, inverse and finding eigenvalues, which are very popular when you use statistics and other sciences where calculations are needed. So, in this lecture and in the next couple of lectures, my target is to discuss matrices.
So, in this lecture, I will give you the basic idea of how you can define a matrix inside the R software, and in the forthcoming lectures, I will take up some common, popular operations related to matrices. So, let us begin our lecture and first understand how we can handle a matrix in the R software, ok.
(Refer Slide Time: 02:31)
So, you know that matrix theory plays a very important role in modeling and other areas, and in R also, matrices are important objects in any calculation. The first question is: what is a matrix? I believe you all have a reasonable background in matrix theory, but just for a quick revision, a matrix is a rectangular array in which values are arranged in rows and columns.
For example, suppose there is a matrix which has n rows and p columns, right. It looks like this: there are rows 1, 2, up to n, and columns 1, 2, up to p, and the data are arranged in them, that you know. The values inside this matrix are its elements, and the element in the ith row and jth column is denoted by X_ij, that is, X with the subscript i j.
Well, that is the notation we use when doing matrix algebra on the blackboard or in a book. But when we write it in a program, that is, in the software, a popular notation is X followed by i and j inside square brackets, X[i, j], where i and j are going to be some values. For example, i can be any value between 1 and n, and j can be any value between 1 and p.
Now, you can see that, for the first time, I am showing you an application of the square brackets. You may recall that at some point I showed you that in mathematical calculations we have simple brackets (parentheses), curly brackets and square brackets, and that these brackets have an important role in the R software.
So, this is the first place where I can show you that the square bracket has a role different from mathematical operations. Now, if you understand a matrix in a broader way, an element of a matrix can also be an object such as a string. But in mathematics, we are mostly interested in matrices whose elements are numerical values, that is, real numbers, right.
So, the first question is: how can you create a matrix in the R software? We know that whenever we want to create a matrix, three ingredients are needed: the total number of rows, the total number of columns, and the data that have to be arranged in these rows and columns. After that, more questions come, such as whether you want to arrange the data row-wise or column-wise.
We will seek answers to all those questions. So, the first thing I would like to discuss is how to create a matrix. In the R software, the command to create a matrix is matrix, spelled m a t r i x, all in lowercase letters. After that, within the parentheses, you give the basic information about how the matrix has to be created: the required number of rows, the number of columns, and the data.
To tell matrix how many rows there are going to be, we have the argument nrow; if you write nrow = 4, for example, this indicates that there are 4 rows in the matrix. Similarly, for the number of columns in the matrix, we have the argument ncol, spelled n c o l, which means number of columns.
So, if you write ncol = 2, this indicates that there are going to be two columns in the matrix. Now, when you have 4 rows and 2 columns, you need 4 × 2 = 8 data values, which are to be arranged inside the matrix in rows and columns. And you know that any particular element in the matrix has an address, and that address is given by its row and column; for example, the element in the second row and third column.
To give the input data to the matrix, we have the argument data, spelled d a t a, and you write data = followed by the values. There are a couple of options here: you can give the data as a sequence, as individual values, and so on. But at this moment we have not done many things in the R software, so I will keep it very simple: I take the eight values 1, 2, 3, 4, 5, 6, 7, 8 in the format of a data vector.
These values are going to be arranged in 4 rows and 2 columns. If I assign this command to a variable x and execute it on the R console, then x will look like this. You can see that this is the first row, say r1, then the second row r2, the third row r3 and the fourth row r4, and these are the columns.
Suppose this is column 1 and this is column 2. Now you can see there are 8 positions where the data have to be placed, and all the values given in the data vector will be placed in these eight positions. Let us see how they are placed; the data are in the order 1 to 8.
The first value comes here, then the second value, then the third value, then the fourth value, and after this the control goes to the first row of the second column. So, the first four values are inserted in the first column, in four rows. After this, the 5th, 6th, 7th and 8th values are entered in the second column.
So, now you can see how the data are entered: column-wise, that is, the first column is filled first and then the next one. Yes, that can be interchanged also, and how to get it done is something we will understand in the forthcoming slides. This is the screenshot: if I write the command like this, I get this type of matrix as output, ok.
So, in this matrix command, I have explained the role of nrow, ncol and data, and that the data values are
written column-wise. That is what you have to understand.
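A minimal sketch of the command described above, with the output R prints shown as comments; note how the values fill the first column before the second:

```r
x <- matrix(nrow = 4, ncol = 2, data = c(1, 2, 3, 4, 5, 6, 7, 8))
x
#      [,1] [,2]
# [1,]    1    5
# [2,]    2    6
# [3,]    3    7
# [4,]    4    8
```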
(Refer Slide Time: 10:55)
Now, the next question is how you identify or access a particular value in the matrix. One thing I can share with you comes from matrix theory: when I write x 1, 2, this means the value in the first row and second column. So, this 1, 2 indicates the position of the row and the position of the column.
For example, in the matrix given here, if I take x 1, 2, then 1 means the first row, like this, and 2 means the second column, like this. The value at the intersection of these two lines is x 1, 2, which is equal to 5. Now, I want to do the same thing in the R software, so the question is what command we are going to use. To access a single element of a matrix, say x (you have to give the matrix a name, and here I have named it x), you write, inside square brackets, first the position of the row and then the position of the column of the element you want to access. For example, if I want the element in the third row and second column, I write x[3, 2]; that means row r = 3 and column c = 2.
So, now if you look at x[3, 2], what is this value? This is row number 3 and this is column number 2, so the value at the intersection is 7. As soon as you write x[3, 2], with 3 and 2 separated by a comma and written within square brackets, you get the value 7, and this 7 is the one here. If you execute it on the R console, there will not be any problem either, ok, right.
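A short sketch of accessing single elements with square brackets, row first and column second, on the same matrix:

```r
x <- matrix(nrow = 4, ncol = 2, data = c(1, 2, 3, 4, 5, 6, 7, 8))
x[1, 2]  # [1] 5: row 1, column 2
x[3, 2]  # [1] 7: row 3, column 2
```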
So, before I move further, let me show you this example on the R console. Let me define my matrix like this: you can see, x is matrix with nrow = 4, ncol = 2, etc. If you press enter and then look at the value of x, it comes out like this. You can see that the values are entered column-wise: 1, 2, 3, 4 and then 5, 6, 7, 8, right. If you want to access any particular element, you write the name of the matrix, x, and then, suppose, 4 comma 2.
So, x[4, 2] gives 8, because this is the fourth row and the second column, and the value here is 8. Similarly, if you take x[2, 4], what do you expect? This is the second row and the fourth column. But where is the fourth column? I have only two columns, column number 1 and column number 2.
So, let us see what R does: it gives an error, subscript out of bounds. That means you are asking for a column which is not in the range of the columns present in the matrix.
So, rather, if you write something like x[2, 2], what does this mean? This is the second row and second column, so the value here is 6, and if you press enter, you get the value 6. And if you use some other bracket, like x(2, 2) with parentheses, you can see that this will not work: R reads x(2, 2) as a call to a function named x, and since there is no such function, it gives an error. So, you have to use only the square brackets; that is what makes R understand that you are accessing elements of the matrix.
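A sketch of the console session described above. tryCatch() and conditionMessage() are used here only to capture the error messages so the script keeps running; the message text is R's English wording and may be translated under other locales.

```r
x <- matrix(nrow = 4, ncol = 2, data = c(1, 2, 3, 4, 5, 6, 7, 8))
x[4, 2]  # [1] 8
x[2, 2]  # [1] 6
# x has only 2 columns, so asking for column 4 is an error
msg1 <- tryCatch(x[2, 4], error = function(e) conditionMessage(e))
msg1     # "subscript out of bounds"
# Parentheses make R look for a function called x, which does not exist
msg2 <- tryCatch(x(2, 2), error = function(e) conditionMessage(e))
msg2     # says it could not find function "x"
```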
So, now we come back to our slides and look at some more operations, ok. You have seen that when you define a matrix in this particular way, the data are entered column-wise, as you can see here: 1, 2, 3, 4 and then 5, 6, 7, 8, like this.
Now, my objective is that suppose I want to enter these data row-wise. How should I control it? When the data 1 to 8 are entered column-wise, it goes 1, 2, 3, 4 and then to the second column, 5, 6, 7, 8. But when the data are entered row-wise, 1 goes in the first column and 2 in the second column of the first row; then 3 goes in the first column and 4 in the second column of the second row; then 5 in the first column and 6 in the second column; and then 7 in the first column and 8 in the second column.
So, how do we control it? For that we have the option byrow, spelled b y r o w. byrow is a logical variable, so it can take two possible values, TRUE and FALSE. Now, if you write byrow = FALSE, think for a while: what does it mean? You are saying byrow is FALSE; that means the data are not to be arranged by rows. When the data are not to be arranged in rows, what is the other option? The data have to be arranged in columns.
So, in this case, the outcome comes out like this, with the data arranged in columns. But what does this mean? If you compare the command which does not use byrow with the second command which uses byrow = FALSE, they give you the same outcome.
This means that when you do not use the byrow option, the matrix command assumes byrow = FALSE as the default. Whether you give it or not, it will
always assume that byrow is equal to FALSE.
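A minimal sketch that checks the default directly: the matrix built without byrow is identical to the one built with byrow = FALSE (the names d, x_default and x_false are illustrative):

```r
d <- c(1, 2, 3, 4, 5, 6, 7, 8)
x_default <- matrix(nrow = 4, ncol = 2, data = d)                 # byrow not given
x_false   <- matrix(nrow = 4, ncol = 2, data = d, byrow = FALSE)  # byrow given explicitly
identical(x_default, x_false)  # [1] TRUE: byrow = FALSE is the default
```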
So, now the question is: if you want to arrange your data row-wise, what do you have to do? That is now very obvious: simply use byrow = TRUE. Let us use the same command, but now assign it to a different variable name, say y: the matrix command with nrow = 4, ncol = 2 and the same data, but now byrow = TRUE. Now, you can see the outcome y.
You can see that 1 comes here, then 2 goes in the second column. The next value, 3, comes in the first column, then 4 goes in the second column; then 5 comes in the first column and 6 in the next column; then 7 comes in the first column and 8 in the next column. So, you can see that this is a row-wise assignment of the data, and it is in your control how you want to assign the data in the
matrix.
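The same data with byrow = TRUE, as a short sketch with the printed result shown as comments; now each row is filled before moving to the next:

```r
y <- matrix(nrow = 4, ncol = 2, data = c(1, 2, 3, 4, 5, 6, 7, 8), byrow = TRUE)
y
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
# [3,]    5    6
# [4,]    7    8
```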
And if you look at the screenshot: when there is no byrow option, that is, you do not write TRUE or FALSE, the data are column-wise, like this. And when you write byrow = TRUE, the data are row-wise, like this.
(Refer Slide Time: 19:15)
So, let me show you these operations on the R console also, so that you become more confident. You can see this is your matrix x, where you have not used the byrow option, and this is x. Now, with the same command, let me reduce the font so that you can see everything on the same screen; suppose I make it 16.
So, now you can see that I add byrow = FALSE, and the value of x looks like this. Let me reduce the font size like this so that you can see clearly. Yes, now it is clear. So, you can see that when you use byrow = FALSE, this outcome and the earlier one are the same. But now, if you take this command and change it to TRUE,
and assign the result to a new matrix, say y, then you can see y. Now you can see that with byrow = FALSE the data 1, 2, 3, 4 run down the first column, and with byrow = TRUE the data 1, 2 run across the first row, then 3, 4 across the second row, and so on. That is the difference between the two, right, ok. So, now, let me come back to our slides and look at some more operations, right.
So, now let us consider the same matrix x, which has four rows and two columns with the data 1 to 8 arranged column-wise, and learn some more operations on it.
(Refer Slide Time: 20:48)
Now, suppose some matrix is given to you and you want to know some information about it.
One piece of information you are always interested in is the dimension of the matrix, that is, how many rows and how many columns there are. To know the dimension of the matrix, we have the command dim, spelled d i m. If you write the name of the matrix within the parentheses, it gives you the information like this, 4, 2: 4 is the number of rows and 2 is the number of columns.
If you want to know only the number of rows, the command is nrow, with the name of the matrix inside the parentheses. So, nrow(x) gives the answer 4. Why? Because you can see that in this matrix x you have rows 1, 2, 3, 4, and the number of columns is 2.
So, to know the number of columns of a matrix, you simply use the command ncol, spelled n c o l, with the name of the variable in which the matrix is saved inside the parentheses. It tells you the number of columns present in the matrix, which here
it is here 2. So, you can see that these are not very big operations.
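A minimal sketch of the three queries on the same 4 by 2 matrix:

```r
x <- matrix(nrow = 4, ncol = 2, data = c(1, 2, 3, 4, 5, 6, 7, 8))
dim(x)   # [1] 4 2: rows, then columns
nrow(x)  # [1] 4
ncol(x)  # [1] 2
```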
(Refer Slide Time: 22:08)
Similarly, you may want to know the mode of the matrix. Do you remember mode, which we did in an earlier lecture, where we had numeric, character etc.? Remember, this mode is not the statistical mode, as in mean, median and mode.
This tells us how the values in the matrix x are stored, and you can see that here it is numeric. Similarly, we have one more command, attributes, which provides all the attributes of an object.
In this case, because this is a matrix, if you write attributes(x), you get this type of information, which tells you the dimension of x, 4 and 2, that is, 4 rows and 2 columns. The spelling of attributes is a, double t, r, i, b, u, t, e, s, all in lowercase
alphabets, small alphabets.
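A short sketch of mode() and attributes() on the same matrix; for a plain matrix without row or column names, the only attribute is the dimension:

```r
x <- matrix(nrow = 4, ncol = 2, data = c(1, 2, 3, 4, 5, 6, 7, 8))
mode(x)        # [1] "numeric": how the values are stored
attributes(x)  # a list with one element, $dim, equal to 4 2
```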
So, this is how you can obtain this type of information. There are many other options available for handling matrices. I will not go into all the details, but I request you to look into the help on matrix: type help and, within double quotes, matrix, that is, help("matrix"), and it will give you the description of matrix, as.matrix, is.matrix etc.
After that, it gives you the usage, then the details about the different arguments, such as dimnames, and then this type of detail.
I have simply taken this from the help which is available in R, right. So, I will not go into detail here, because I have already explained to you the concept of is and as. For example, the command is.matrix shows you whether a variable is in the format of a matrix or not, and if you want to convert an object, such as a data vector, into a matrix, you use as.matrix, right.
So, all these things are there; I request you to look into this help page and read it. Now, let me show you the operations dim, nrow, ncol, mode and attributes on the R console. So, let me
try to show you these things on the R console.
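A minimal sketch of the is/as pair for matrices; as.matrix() turns a plain data vector into a one-column matrix:

```r
v <- c(1, 2, 3)
is.matrix(v)       # [1] FALSE: a plain data vector is not a matrix
m <- as.matrix(v)  # converts the vector into a one-column matrix
is.matrix(m)       # [1] TRUE
dim(m)             # [1] 3 1
```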
So, you can see that y is our matrix, and if I want the dimension of y, it is 4, 2. If I want the number of rows of y, you can see it is 4, and if I want the number of columns of y, it is 2.
And if I find the mode of y, it is numeric, and if I find the attributes of y, the dimension is 4, 2. So, you can see that getting information about a matrix is not difficult at all, ok. So, now we come to the end of this lecture; it was a very simple and short lecture, short in terms of content.
My idea was that this is the first lecture where you are trying to understand the concept of a matrix in R, so I should stop here so that you get some time to do all these operations yourself on the R console. It is not difficult; the only thing is that you have to settle these concepts in your mind: how nrow, ncol, data, byrow etc. work.
Once you are familiar with these arguments, I promise you that handling matrices is very simple and it will be very easy for you to do such operations. So, have a quick revision, practice it, and I will see you in the next lecture with more details on matrix operations; till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 13
Matrix Operations - Row, Column and Other Operations
Hello friends, welcome to the course Foundations of R Software. You can recall that in the last lecture we talked about matrices. We learnt how to define a matrix in the R software, and that three main ingredients are needed to define it: the number of rows, the number of columns and the data. After that we also learnt about the byrow option, which lets you assign the values column-wise or row-wise.
So, now we continue on the same lines, and in this lecture we are going to take up some more operations related to matrix theory. Essentially these are very simple operations, related to rows, columns and some other operations like transpose etc. So, we begin our lecture and try to understand how you can implement these concepts in the R software.
So, consider a matrix in this particular format: there are 4 rows and 3 columns, and the data I am giving are from 1 to 12.
So, this is another way of giving the data, as a sequence 1, 2, 3, 4 up to 12. You can see that there are 4 rows and 3 columns, so we need 4 × 3 = 12 values, which are given here as 1 to 12. This gives us the following matrix. You can see this is my 1st row, this is the 2nd row, this is the 3rd row and this is the 4th row, and similarly this is the 1st column, this is the 2nd column and this is the
here 3rd column; just follow my pen where I am writing.
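A minimal sketch of this 4 by 3 matrix, assuming the slide writes the sequence with R's colon operator 1:12 (the transcript only says it is "another way of giving the data"):

```r
x <- matrix(nrow = 4, ncol = 3, data = 1:12)  # 1:12 is the sequence 1, 2, ..., 12
x
#      [,1] [,2] [,3]
# [1,]    1    5    9
# [2,]    2    6   10
# [3,]    3    7   11
# [4,]    4    8   12
```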
So, now you can see that these rows and columns have labels. For example, row number 1 is indicated by a square bracket, then 1 and a comma, that is, [1,]. Similarly, row number 2 is indicated by [2,], and the 3rd and 4th rows are indicated by [3,] and [4,]. The columns have also been given labels: column number 1 is a square bracket, then a comma and then 1, that is, [,1].
Similarly, column number 2 is [,2] and the 3rd column is [,3]. These are the labels which the R software automatically shows for the rows and columns. Now suppose we want to rename these rows and columns, that is, give them names of our choice. To do this, we have the command rownames, spelled r o w n a m e s, with x inside the parentheses, and colnames, spelled c o l n a m e s, which renames the columns.
So, look at these two operations. Suppose I want to change the names of the rows and give them names like r1, r2, r3, r4. For that I write rownames, all in lowercase, with the name of the matrix inside the parentheses, and then I use the c command to create a data vector with these names within double quotes, as you can see here.
These are characters. This works as follows: the names of the rows in the matrix x are changed to these names. For example, if you look at this outcome now, you can see that the names have changed, ok. Similarly, suppose you want to change the names of the columns, which are given here like this.
I use the command colnames, with the name of the matrix x inside the parentheses, and then I give the names which I want to use instead. Suppose I decide that the names of the columns are going to be c1, c2, c3; I give them in the format of a data vector using the command c, with all the names inside double quotes,
so that they are characters. If you execute it, the function colnames changes the names of the columns of the matrix x, and you get c1, c2, c3. So, this
is how you can rename the rows and columns of any given matrix.
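A short sketch of the renaming, again assuming the data are written as 1:12. Once names are set, R prints them in place of the default labels, and elements can also be accessed by name:

```r
x <- matrix(nrow = 4, ncol = 3, data = 1:12)
rownames(x) <- c("r1", "r2", "r3", "r4")
colnames(x) <- c("c1", "c2", "c3")
x
#    c1 c2 c3
# r1  1  5  9
# r2  2  6 10
# r3  3  7 11
# r4  4  8 12
x["r2", "c3"]  # [1] 10
```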
For example, this is the screenshot of the same operation on the R console.
So, let us do this operation on the R console and see what we get.
So, first we define this matrix x; you can see it looks like this. Then I apply the command rownames on this matrix x. Now if you look at x, it comes out like this: the earlier labels were 1, 2, 3, 4, but now the new names are r1, r2, r3 and r4. Similarly, you can change the names of the columns.
Let me give some other names: suppose I give the command colnames on this matrix x with the names apple, banana and oranges. You can do such things too; you are simply giving names. Now if you look at x, the column names are apple, banana and oranges. So, this is how you can change the names of the rows and columns of any given matrix. So now, we come back to our slides and learn some more operations.
Now, suppose you need a matrix in which all the values are the same. Operations of this type are needed when we handle real data sets, and they are useful. So, my objective here is to create a matrix in which all the values are the same.
Suppose I want to create a matrix of order 4 by 2, that is, with four rows and two columns, in which all the values are 2. To do this, I simply write data = 2, use it together with the parameters nrow and ncol in the matrix command, and assign the result to a variable x. Then x comes out like this: you can see the 1st, 2nd, 3rd and 4th rows and the 1st and 2nd columns, and all the values are 2.
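The same constant matrix as a short R sketch; the single value given in data is repeated to fill all eight positions.

```r
# 4 x 2 matrix in which every entry is 2 (the single value is recycled)
x <- matrix(data = 2, nrow = 4, ncol = 2)
print(x)
```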
Similarly, you can choose any other value. This is the screenshot of the same operation done in the R console.
Similarly, when we do matrix operations, we often need a diagonal matrix. A diagonal matrix is a square matrix whose off-diagonal elements are all 0, so only the diagonal elements can be nonzero. Recall that in a square matrix, the values on the main diagonal are called the diagonal elements, and the values in the triangles above and below it are called the off-diagonal elements.
To create such a matrix, you simply use the command diag and specify the values you want to put on the diagonal. Suppose I want to create an identity matrix, that is, a matrix in which the diagonal elements are 1 and the off-diagonal elements are 0. For a 3 by 3 identity matrix, I write diag with the value 1, nrow = 3 and ncol = 3.
If I execute it, it gives this matrix, in which you can see that the diagonal elements are all 1 and all the off-diagonal elements are 0. If you want, you can also build the same matrix with the matrix command, but then you have to be very careful when you specify the data and the byrow option: the data have to be given so that they are arranged exactly as you want. The diag command avoids these complications and directly gives us the diagonal matrix.
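A sketch comparing the two routes to a 3 by 3 identity matrix. The matrix() version is written out by hand to show why the data must be listed carefully; the names I3 and I3_manual are illustrative only.

```r
# 3 x 3 identity matrix: 1 on the diagonal, 0 off the diagonal
I3 <- diag(1, nrow = 3, ncol = 3)

# The same matrix with matrix(): every entry must be listed in the
# order R fills it (column by column, since byrow is FALSE by default)
I3_manual <- matrix(c(1, 0, 0,
                      0, 1, 0,
                      0, 0, 1), nrow = 3, ncol = 3)

print(I3)
all(I3 == I3_manual)  # TRUE
```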
Similarly, you can choose some other value for the diagonal. Suppose I want to create a matrix with three rows and three columns in which all the diagonal elements are 5 and all the off-diagonal elements are 0.
In that case, I simply use the same command diag with the value 5, nrow = 3 and ncol = 3, so that it generates a 3 by 3 matrix with 5 on the diagonal. If you execute it, you get this matrix. Now, think about what to do if you want a matrix in which the diagonal elements are 1, 2, 3 and the off-diagonal elements are 0.
How would you do it? Think about it. But before that, let us first try these operations on the R console. Suppose you want to create a matrix in which all the elements are 2.
If you use the same command, you get a matrix of 2s. And if you give some other value, say 20, you get a 4 by 2 matrix in which all the values are 20.
(Refer Slide Time: 10:04)
Similarly, suppose I define a diagonal matrix using diag with the data 1, nrow = 4 and ncol = 4. This gives me an identity matrix of order 4, like this. If I change the value on the diagonal to, say, 100, you can see that all the diagonal values of this matrix are now 100.
Now, as I asked you earlier, suppose you want four different values, 1, 2, 3, 4, on the diagonal, with all the off-diagonal elements equal to 0. How do you give them? I can use my logic and supply the data as a data vector with the c command. Let us see what happens: you can see that the diagonal now contains 1, 2, 3 and 4.
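The diagonal examples above, collected into one R sketch (the names d5, i4, d100 and d1234 are illustrative).

```r
# 3 x 3 matrix with 5 on the diagonal and 0 elsewhere
d5 <- diag(5, nrow = 3, ncol = 3)

# 4 x 4 identity matrix, and the same shape with 100 on the diagonal
i4 <- diag(1, nrow = 4, ncol = 4)
d100 <- diag(100, nrow = 4, ncol = 4)

# Different diagonal values: pass them as a data vector built with c()
d1234 <- diag(c(1, 2, 3, 4), nrow = 4, ncol = 4)
print(d1234)
```

When a vector of values is given, nrow and ncol can be left out, since diag(c(1, 2, 3, 4)) already produces a 4 by 4 matrix.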
So, you can see how different operations from matrix theory can be combined. Experiment with them, play with them, and see what is really happening.
Now I come to some more operations. I am sure that those who are familiar with matrix theory know about the transpose of a matrix: in the transpose, the rows and columns are interchanged. For example, take a very simple matrix with the first row 1, 2 and the second row 3, 4. If I find the transpose of this matrix, its rows and columns are interchanged.
So, the first row 1, 2 becomes the first column, and the 2nd row becomes the 2nd column, like this. How do we do such an operation in the R software? Suppose we define a matrix of order 4 by 2 with the data values 1, 2, 3, 4, up to 8, and byrow = TRUE. You know that instead of TRUE and FALSE we can also use capital T and capital F, respectively.
So, I use T here, and this gives a matrix in which the observations are arranged row-wise; you can see 1 2 3 4 5 6 7 8.
Now I find its transpose. The command for the transpose of a matrix is simply t, in lowercase, with the matrix written inside the parentheses. So, I write t(x), give the result the name xT, and you can see that the matrix has changed.
From the screenshot, you can see that the first column, which was 1, 3, 5, 7, now becomes the first row, and the 2nd column, 2, 4, 6, 8, becomes the 2nd row. So, this is the transpose operation, and you can do it very easily in the R software.
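A sketch of the transpose in R. Note that the function is the lowercase t(), while the capital T used for byrow is simply shorthand for TRUE (T is an ordinary variable that can be overwritten, so writing TRUE is the safer habit).

```r
# 4 x 2 matrix filled row-wise; T is shorthand for TRUE
x <- matrix(1:8, nrow = 4, ncol = 2, byrow = T)

# t() returns the transpose: columns of x become rows of xT
xT <- t(x)
print(xT)  # first row 1 3 5 7, second row 2 4 6 8
```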
Similarly, suppose you want to find the sums of the values in the rows and columns. How do we do it? Let me take an example. Suppose I define a 4 by 2 matrix with the data 1 to 8 arranged column-wise, which looks like this. What do we mean by the sums of the elements in the rows and columns? The sum of the values in the first row is 1 + 5 = 6.
Similarly, in the 3rd row it is 3 + 7 = 10, and if you add the values in the first column, you get 1 + 2 + 3 + 4 = 10. Such operations are required many times in calculations, and to do them in the R software we have the commands rowSums and colSums.
But look carefully at how these commands are spelled. rowSums is r-o-w-S-u-m-s, where the S of Sums is capital (uppercase). Similarly, the second command is c-o-l-S-u-m-s, and here also the S is capital; this one finds the column sums. If you simply write rowSums or colSums with the name of the matrix inside the parentheses, it finds all the sums.
For example, look at the output: the row sums of x are 6, 8, 10, 12 and the column sums of x are 10, 26. What is it doing? It adds 1 + 5 = 6, 2 + 6 = 8, 3 + 7 = 10 and 4 + 8 = 12.
Similarly, for the column sums it adds 1 + 2 + 3 + 4 = 10 and 5 + 6 + 7 + 8 = 26. These are exactly the values 6, 8, 10, 12 and 10, 26 given in the output.
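The row and column sums as an R sketch. The slide matrix is not reproduced in the text; as an assumption inferred from the sums quoted above (first row 1 and 5), it is taken to be matrix(1:8, nrow = 4, ncol = 2), filled column-wise, which is R's default.

```r
# 4 x 2 matrix filled column-wise (the default), so the first row is 1, 5
x <- matrix(1:8, nrow = 4, ncol = 2)

rowSums(x)  # 6 8 10 12
colSums(x)  # 10 26
```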
Similarly, instead of the sums, suppose you want the arithmetic means of the values in the rows and columns; these can also be found very easily. Take the same matrix that we just used for the row sums and column sums, and suppose we want the arithmetic means of the values in its rows and columns.
For example, the first row has the values 1 and 5, so its arithmetic mean is (1 + 5)/2 = 3. Similarly, the mean of the values in the first column is (1 + 2 + 3 + 4)/4 = 10/4 = 2.5.
To find these means, we have the commands rowMeans and colMeans. But be careful: here the M is uppercase, that is, a capital M. So the spelling is r-o-w, then capital M, then e-a-n-s in lowercase. Similarly, colMeans is c-o-l in lowercase, then capital M, and then e-a-n-s in lowercase. Inside the parentheses you give the name of the matrix.
This is a very simple operation. For the rows, it computes (1 + 5)/2 = 3, then (2 + 6)/2 = 8/2 = 4, then (3 + 7)/2 = 10/2 = 5, and (4 + 8)/2 = 12/2 = 6.
These values 3, 4, 5, 6 are exactly the output of the command rowMeans(x). For the column means, you find the sum of the values in the first column, 1 + 2 + 3 + 4, and divide by 4, which gives 2.5.
The mean of the values in the 2nd column is (5 + 6 + 7 + 8)/4 = 6.5, and you can see these two values, 2.5 and 6.5, in the output of the command colMeans. Now let us do these operations on the R console and see how to obtain them.
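The corresponding means in R, using the same column-wise matrix assumed for the sums. The last line is an extra check, not from the lecture, showing that each row mean is the row sum divided by the number of columns.

```r
# Same assumed column-wise 4 x 2 matrix as for the sums
x <- matrix(1:8, nrow = 4, ncol = 2)

rowMeans(x)  # 3 4 5 6
colMeans(x)  # 2.5 6.5

# A mean is a sum divided by the number of values
all(rowMeans(x) == rowSums(x) / ncol(x))  # TRUE
```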
(Refer Slide Time: 17:50)
Let us use the x that we defined earlier, in which all the values are 20. If you find the transpose of this matrix x, all the values are still the same, because it is a constant matrix, but the numbers of rows and columns are interchanged: x has 4 rows and 2 columns, while its transpose has 2 rows and 4 columns. So, why not take the same example that we considered in our slides?
So, let us consider the same matrix here. This matrix has 4 rows and 2 columns, and the data 1, 2, 3, 4, 5, 6, 7, 8 are arranged row-wise. Now if you find its transpose, you can see that it has changed: the first row, 1 and 2, becomes the 1st column, the 2nd row, 3 and 4, becomes the 2nd column, and so on. This is how the transpose operation is done.
Now, let us find rowSums, colSums, rowMeans and colMeans, using the same matrix. If I find rowSums of x, it comes out like this: 1 + 2 = 3, 3 + 4 = 7, and so on.
Similarly, colSums of x gives 16 and 20: 1 + 3 + 5 + 7 = 16 and 2 + 4 + 6 + 8 = 20. Now, if you want rowMeans, be careful with the spelling when you write it. Every row has two elements, so dividing 3, 7, 11, 15 by 2 gives the row means 1.5, 3.5, 5.5 and 7.5.
Similarly, the column means of x are 4 and 5. There are four elements in each column, so dividing the column sums by 4 gives 16/4 = 4 and 20/4 = 5. These are very simple, elementary operations, but when you write bigger programs, these concepts are used at many, many places, and that is why I have chosen these operations.
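The console example above as one sketch. Because this x is filled with byrow = TRUE, every result differs from the column-wise example on the slides; the last line, added only for comparison, repeats colSums on the column-wise arrangement.

```r
# Same data 1 to 8, but arranged row by row
x <- matrix(1:8, nrow = 4, ncol = 2, byrow = TRUE)

rowSums(x)   # 3 7 11 15
colSums(x)   # 16 20
rowMeans(x)  # 1.5 3.5 5.5 7.5
colMeans(x)  # 4 5

# For comparison: the column-wise arrangement gives different column sums
colSums(matrix(1:8, nrow = 4, ncol = 2))  # 10 26
```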
I have taken only a small number of operations, so that you can settle these concepts in your mind and have time to practice them. So, practice these commands: take some examples yourself and check whether the calculations you do manually match the outcome from the R software. Practice it, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 14
Matrix Operations - Access and Mathematical Operations
Hello friends, welcome to the course Foundations of R Software. You may recall that in the last two lectures we have been talking about Matrix Operations. In this lecture also, we will continue with similar matrix operations, which are very simple and elementary, but very useful.
Once you have learnt how to define a matrix, the first question that comes up is that sometimes you are interested in extracting a part of the matrix. That can be a particular row, a particular column, a submatrix or certain elements.
So, the question is how to do it. After that, there are several mathematical operations, such as adding, subtracting or multiplying a matrix by a scalar, or adding, subtracting and multiplying matrices, etcetera. These are various operations that are possible in matrix theory, and they can be done very easily in the R software.
So, in this lecture, we are going to learn how to access elements of a matrix, such as rows, columns or a submatrix, and we will begin our discussion of some basic elementary operations of matrix theory. One request: when we deal with matrix theory, remember that it has mathematical rules that differ from those for the addition, subtraction, etc. of scalars and data vectors. I am assuming that you all have a good knowledge of matrix theory, in the sense that you know all the rules of its mathematical operations. With this assumption, I begin this lecture now.
(Refer Slide Time: 02:20)
I will take some examples and, through them, illustrate how these operations can be done in the R software. Let me create a matrix of order 5 by 3, that is, with 5 rows, defined by the parameter nrow, and 3 columns, defined by the parameter ncol. The data are arranged row-wise, and the data are 1 to 15, because with 5 rows and 3 columns you need 15 values. So, I simply give the data as 1, 2, 3, up to 15. If you create this matrix, it looks like this. Now we want to know how to access a particular row, a particular column or a submatrix. You can see the rows here, and these are the 1st, 2nd and 3rd columns.
Now suppose I want to access a row, say the 3rd row. The syntax is: write the name of the matrix x, then a square bracket, and to access a particular row, write the row number followed by a comma. On the other hand, to access a particular column, write the name of the matrix and a square bracket, then a comma, and then the column number.
That is how we access them. For example, to access the 3rd row of this matrix x, you write x, then square brackets containing 3 and a comma, that is, x[3, ]. After the comma you do not have to write anything; you just leave it blank. This gives the values 7, 8, 9, and you can see that this is the 3rd row.
Similarly, to access the 2nd column, you write the name of the matrix x and then square brackets: leave the place before the comma blank and write the column number, say 2, after it, that is, x[, 2]. You get the values 2, 5, 8, 11, 14.
If you look at this matrix, this is column number 2: 2, 5, 8, 11, 14. You can store these values in a variable and use them further; with these commands you can access any particular row or column.
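A minimal sketch of row and column access for the 5 by 3 matrix described above (third_row is an illustrative name).

```r
# 5 x 3 matrix filled row-wise with 1, ..., 15
x <- matrix(1:15, nrow = 5, ncol = 3, byrow = TRUE)

x[3, ]   # 3rd row: 7 8 9 (column position left blank)
x[, 2]   # 2nd column: 2 5 8 11 14 (row position left blank)

# Extracted values can be stored in a variable and reused
third_row <- x[3, ]
```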
Similarly, suppose you want to access a particular section of this matrix, say a submatrix consisting of the 4th and 5th rows and the 2nd and 3rd columns.
I write it in exactly the same way: the matrix name, then square brackets containing the information about the rows and then the information about the columns, separated by a comma. For this submatrix from the 4th and 5th rows and the 2nd and 3rd columns, the rows are given by 4:5 and the columns by 2:3, that is, x[4:5, 2:3].
Now you can see the outcome here. What are the 4th and 5th rows and the 2nd and 3rd columns? These are the 4th and 5th rows, and these are the 2nd and 3rd columns. Their intersection is 11, 12, 14 and 15, and this is the outcome. You can verify it yourself by doing it in the R console. So, that is how we can get these values.
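The submatrix of the 4th and 5th rows and the 2nd and 3rd columns, sketched in R; sub is an illustrative name.

```r
x <- matrix(1:15, nrow = 5, ncol = 3, byrow = TRUE)

# Rows 4 to 5 and columns 2 to 3
sub <- x[4:5, 2:3]
print(sub)  # rows: 11 12 and 14 15
```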
You may also want rows and columns that are not adjacent. In the example above, the rows 4 and 5 are consecutive and the columns 2 and 3 are consecutive. But suppose you want a submatrix consisting of the elements in the 1st and 4th rows and the 1st and 3rd columns.
This is very simple once you understand the basic rule: first write the name of the matrix, then the square bracket, and then the information about the rows and the information about the columns. Here you give the row numbers as c(1, 4) and the column numbers as c(1, 3), that is, x[c(1, 4), c(1, 3)].
Now you know how to tell R that you want to read not just one value but all the values in a data vector. If you look at the matrix in R, let me remove the earlier markings so that you can see it more clearly. The 1st row is this one, and you have also taken the 4th row.
You can see this is the 4th row. Which columns have you taken? Columns 1 and 3, so take column number 1 and column number 3. The elements at their intersection are 1, 3, 10 and 12, and you can verify that they appear in the output. So, this is how you can access any part of a matrix with these operations.
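Non-adjacent rows and columns are selected by passing data vectors built with c(); a sketch (sub14 is an illustrative name).

```r
x <- matrix(1:15, nrow = 5, ncol = 3, byrow = TRUE)

# Rows 1 and 4, columns 1 and 3
sub14 <- x[c(1, 4), c(1, 3)]
print(sub14)  # rows: 1 3 and 10 12
```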
Before I show you these operations on the R console, let me show you something. Look at this screenshot of the output of the R software, and suppose I hide these two parts: the first command is about a row and the second one is about a column, but I hide the commands and give you only these two outputs.
By looking at these two outputs, 7, 8, 9 and 2, 5, 8, 11, 14, can you tell whether they are the values from a row or from a column, or which row or which column? That is the question I am trying to answer here.
If you look only at this output, can you really tell whether it is a row or a column? No, you cannot. I am pointing this out because in some software, when you extract a particular row or column, the output also tells you that.
In R, however, the extracted values are printed as a plain vector, with [1] at the beginning only marking the position of the first element, so the output does not tell you whether the values came from a row or a column.
That is something to keep in mind when you do these operations in the R software. Now, let us do these operations in the R software and understand how they work.
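A sketch related to the point above. The drop argument of R's indexing operator is not part of this lecture; it is shown only as one standard way to keep the orientation visible, by returning a 1 by 3 or 5 by 1 matrix instead of a plain vector.

```r
x <- matrix(1:15, nrow = 5, ncol = 3, byrow = TRUE)

r3 <- x[3, ]    # plain vector 7 8 9
c2 <- x[, 2]    # plain vector 2 5 8 11 14
is.matrix(r3)   # FALSE: nothing records that it was a row

# drop = FALSE keeps the result as a matrix, so its shape shows the orientation
r3m <- x[3, , drop = FALSE]   # 1 x 3 matrix
c2m <- x[, 2, drop = FALSE]   # 5 x 1 matrix
dim(r3m)   # 1 3
dim(c2m)   # 5 1
```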
Let me create this matrix; you can see the matrix x here. Now suppose I want to access the 3rd row and the 3rd column. As a student, there is always a confusion about whether the 3rd row should be written like this or like this.
This is a very common confusion among students when they want to access the 3rd row or the 3rd column: which one is the correct option? Here is a very simple trick. Look at the printed matrix: the 3rd row is labelled [3,], that is, a bracket, 3 and a comma, and the 3rd column is labelled [,3], that is, a bracket, a comma and 3.
So, whatever you want, just look at this label and write it after x.
For example, if you write x with the row position blank and then 3, you get the 3rd column, and if you want the 3rd row, you write it like this.
(Refer Slide Time: 11:17)
Similarly, with this matrix x, suppose you want a submatrix from rows 1 to 2 and columns 4 to 5, that is, the 1st and 2nd rows and the 4th and 5th columns. You can see that there is an error. Why? When you write 4:5 for the columns, you are asking for the 4th and 5th columns, but this matrix has only 3 columns; there are 4th and 5th rows, but no 4th or 5th column.
That is why R gives an error saying "subscript out of bounds". If instead you ask for a submatrix with the 4th and 5th rows and the 1st and 2nd columns, you get this matrix: the 4th and 5th rows are here, the 1st and 2nd columns are here, and the result contains 10, 11 and 13, 14.
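A sketch reproducing the error and the corrected call. tryCatch is used here only so that the script continues after the error; the exact wording of the message may depend on R's language settings.

```r
x <- matrix(1:15, nrow = 5, ncol = 3, byrow = TRUE)

# x has only 3 columns, so columns 4:5 do not exist
err <- tryCatch(x[1:2, 4:5], error = function(e) e)
conditionMessage(err)  # "subscript out of bounds" in an English locale

# Rows 4:5 and columns 1:2 both exist, so this works
ok <- x[4:5, 1:2]
print(ok)  # rows: 10 11 and 13 14
```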
Similarly, if you want to choose arbitrary rows that are not adjacent, you can use the c command: say c(2, 3) for the rows and c(1, 2) for the columns. You can see that the four values 4, 5 and 7, 8 come out. That is how you can access any rows or columns, or any submatrix, of this matrix.
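The non-adjacent selection from the console, as a short R sketch (pick is an illustrative name).

```r
x <- matrix(1:15, nrow = 5, ncol = 3, byrow = TRUE)

# Rows 2 and 3, columns 1 and 2
pick <- x[c(2, 3), c(1, 2)]
print(pick)  # rows: 4 5 and 7 8
```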
Now we consider some more operations, and we come to mathematical operations. First, we consider mathematical operations involving a scalar and a matrix. Suppose I want to know what happens to a matrix when a scalar is added to it. In R, the rule is very simple: the scalar is added to every element of the matrix.
So, consider a matrix x with 4 rows, 2 columns and the data from 1 to 8 arranged by row; this gives the matrix here. If you add 5 to it, each element is increased by 5, like this, and this is the outcome.
Similarly, when you subtract a scalar from a matrix, the operation is analogous: just as the number was added to every element in addition, in subtraction it is subtracted from every element. Take the same matrix x and subtract 5 from it; every element is reduced by 5, like this, and you can see the outcome here.
So, this is how R works when scalars are added to or subtracted from a matrix.
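Adding and subtracting a scalar, sketched on the same 4 by 2 matrix (x_plus5 and x_minus5 are illustrative names).

```r
x <- matrix(1:8, nrow = 4, ncol = 2, byrow = TRUE)

x_plus5  <- x + 5   # 5 added to every element; first row becomes 6 7
x_minus5 <- x - 5   # 5 subtracted from every element; first row becomes -4 -3
print(x_plus5)
print(x_minus5)
```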
(Refer Slide Time: 14:13)
Multiplication works the same way: just as with adding and subtracting a scalar, when you multiply a matrix by a scalar, every element is multiplied by that scalar.
For example, take the same matrix and multiply it by 5. Every element in this matrix is multiplied by 5, like this, and you can see the outcome here. That is pretty simple and straightforward.
Similarly, you can divide a matrix by a scalar. Remember that I am not talking about dividing a matrix by a matrix, which is not defined in matrix theory. I am simply asking what R does when you divide a matrix by a scalar.
Take the same matrix that we considered earlier and divide it by 2. Every element of the matrix x is divided by 2, like this, and this is the result; you can verify it.
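Multiplication and division by a scalar on the same matrix, as a short sketch (x_times5 and x_half are illustrative names). Division turns the integer entries into decimal numbers, so 1 becomes 0.5.

```r
x <- matrix(1:8, nrow = 4, ncol = 2, byrow = TRUE)

x_times5 <- x * 5   # every element multiplied by 5; first row becomes 5 10
x_half   <- x / 2   # every element divided by 2; first row becomes 0.5 1
print(x_times5)
print(x_half)
```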
So, that is not difficult at all. Before I move on to operations between matrices, let me show you these operations on the R console also.
(Refer Slide Time: 15:40)
Let us take this matrix here, and suppose I add 10 to it. You can see that 10 has been added to every element of the matrix. Similarly, I can subtract 10 from every element.
Then this is x and this is x - 10: you can see 1 - 10, 3 - 10, and so on.
(Refer Slide Time: 16:08)
Similarly, if I multiply this matrix by 5, every element of the matrix x is multiplied by 5, as you can see here.
And if I take the same matrix and divide it by 2, each element is divided by 2, and this is the outcome. So, you can see that these operations are simple, straightforward and easy to understand.
But you have to understand what R does when you work with a matrix and a scalar. After this, I come to a very simple aspect: the addition and subtraction of matrices. That is, I take two matrices and see how R adds and subtracts them.
Remember that when we add or subtract two matrices, this is done following the rules of matrix theory. When two matrices are added, both must have the same order; that is, the number of rows and the number of columns must be the same in both matrices.
The same holds when you subtract two matrices; that is the first condition: both matrices must have the same numbers of rows and columns. After this, the addition or subtraction is done element-wise: an element of the first matrix and the element of the second matrix located at the same address, that is, in the same row and column, are added or subtracted together.
To see how R does such matrix operations, let us consider two matrices. I define two matrices x and y, which are 4 by 2 matrices, that is, each has 4 rows and 2 columns.
The data in the first matrix are 1 to 8 and in the second matrix 11 to 18, and in both matrices the data are arranged by row. So, this is x and this is y: you can see 1, 2, 3, 4, 5, 6, 7, 8 and 11, 12, 13, 14, 15, 16, 17, 18.
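The two matrices x and y from the slides, defined in R, with a check that they have the same order:

```r
x <- matrix(1:8, nrow = 4, ncol = 2, byrow = TRUE)
y <- matrix(11:18, nrow = 4, ncol = 2, byrow = TRUE)

# Same number of rows and columns, so they can be added or subtracted
identical(dim(x), dim(y))  # TRUE
```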
(Refer Slide Time: 18:38)
Now I add them. What happens? In addition or subtraction, the element in the 1st row and 1st column of x is combined with the element in the 1st row and 1st column of y.
Similarly, take any other element, say the one in the 3rd row and 2nd column of x; it is combined with the element in the 3rd row and 2nd column of y. In the same way, the elements at all corresponding addresses are combined. So, if you add them, 1 and 11 are added to give 12, and if you subtract them, 1 - 11 gives -10.
That is what happens. This is x and this is y. If you compute x + y, you get 1 + 11 = 12, then 2 + 12 = 14, and so on, and this is the outcome. Similarly, if you subtract them, the same thing happens: the corresponding elements at the same addresses in the two matrices x and y are subtracted.
For example, in this matrix, the element of y in the 3rd row and 2nd column is subtracted from the element of x in the same position, giving 6 - 16 = -10, which you can see here.
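Element-wise addition and subtraction of x and y, sketched in R (s and d are illustrative names). The last line picks out the 3rd-row, 2nd-column pair discussed above.

```r
x <- matrix(1:8, nrow = 4, ncol = 2, byrow = TRUE)
y <- matrix(11:18, nrow = 4, ncol = 2, byrow = TRUE)

s <- x + y   # element-wise: 1 + 11 = 12, 2 + 12 = 14, ...
d <- x - y   # element-wise: every entry is -10 for these two matrices
print(s)
print(d)

x[3, 2] - y[3, 2]  # 6 - 16 = -10
```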
(Refer Slide Time: 20:12)
You can see such an operation here. In this case all the values come out to be -10, because 1 - 11 = -10, 2 - 12 = -10, and so on, so do not be surprised. Now let me show you these things on the R console, so that you get more confidence in doing them. Let me define these two matrices x and y.
This is x and this is y, and you can see that both of them are of the same order.
(Refer Slide Time: 20:51)
Now, if you add them, x + y, and subtract them, x - y, you get results like this. Notice that two matrices are added with the same plus symbol and subtracted with the same minus symbol that we use for scalars, so there is no issue.
So, we come to the end of this lecture, and you have seen that we have once again learnt very elementary, basic operations in the R software. But I can promise you that they are very important. Whenever you deal with matrix operations in R, operations such as accessing a particular row or column, extracting a submatrix from a bigger matrix, multiplying a matrix by a scalar or adding a scalar to a matrix are needed again and again.
These operations are very useful. More importantly, you should understand what R is doing, and I will emphasize this once again: if you understand the way R works, then you can write and modify your programs in a much better way.
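As a quick reminder of the operations just listed, here is a short sketch; the 3 by 2 matrix x below is an assumed example, not one taken from the slides.

```r
# Assumed example matrix: 3 x 2, filled row-wise
x <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)

x[2, ]     # the 2nd row: 3 4
x[, 1]     # the 1st column: 1 3 5
x[1:2, ]   # sub-matrix made of rows 1 and 2
x * 2      # scalar multiplication: each element doubled
x + 2      # adding a scalar: 2 added to each element
```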
So, once again you have some homework to do today. Take some matrices of your choice and play with them. For example, if you want to see how two matrices of different orders are added or subtracted, take two matrices x and y of different orders and check whether they can be added or subtracted. Then verify what matrix theory tells you.
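Once you have tried it yourself, one way to check the answer is sketched below. The matrices a and b are hypothetical choices of different orders, and tryCatch() is used only so that the error message is captured instead of stopping the script.

```r
a <- matrix(1:6, nrow = 3, ncol = 2)   # order 3 x 2
b <- matrix(1:4, nrow = 2, ncol = 2)   # order 2 x 2

# Try to add matrices of different orders; capture the error message if R refuses
res <- tryCatch(a + b, error = function(e) conditionMessage(e))
print(res)
```

In an English-language R session the message is "non-conformable arrays", which agrees with the rule from matrix theory.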
Matrix theory says that unless two matrices are of the same order, you cannot add or subtract them. Check whether R follows the same rule or not. So, play with these matrix operations in the R software, and I will see you in the next lecture with more matrix operations. Till then, goodbye.
18
328
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 15
Matrix Operation - Mathematical and Other Operations
Hello friends, welcome to the course Foundations of R Software. You may recall that in the last couple of lectures we started our discussion on matrix operations and considered different types of them. In this lecture, we will continue with some more matrix operations. First, just as a quick review, I will give you an example of the addition and subtraction of two matrices.
After that, I will discuss the multiplication of matrices, finding the inverse of a matrix, and some more operations such as finding the characteristic roots, or eigenvalues. One thing I would like to make clear: whenever you do any matrix operation, you have to follow the rules of that particular operation. For example, today we are going to consider matrix multiplication, so you have to follow the rules of matrix multiplication.
Similarly, when we find the inverse of a matrix, we need to follow the rules for the existence of a unique inverse: for example, the matrix has to be square and nonsingular, that is, its determinant must be nonzero (a positive definite matrix, for instance, always satisfies this). I am not going to discuss all these fundamentals of matrix theory; I assume that you know them. Also, the examples I have taken involve only two matrices, but these operations can be extended to any number of matrices, provided you fulfil the rules of the matrix operation.
So, now we begin, and first we consider the addition and subtraction of two matrices, just for a quick review, ok.
(Refer Slide Time: 02:10)
Let me take one more example, the same one that I have used a couple of times earlier: a matrix x of order 4 by 2 in which the data values are 1 to 8 and the data are arranged by row. So, you can see this matrix, in which the data are arranged as 1, 2, 3, 4, 5, 6, 7, 8. Now, if you multiply this matrix by a scalar, then, as you know, each element of the matrix gets multiplied by that scalar.
For example, if I multiply it by 4, then every element is multiplied by 4, and I get one more matrix, 4 into x. Now I add and subtract x and 4x. The matrix x has order 4 by 2, and the matrix 4x also has the same order, 4 by 2.
So, there is no problem in doing the addition and subtraction. If you add x and 4x, you get this outcome, and you can see that the same rule has been applied: the corresponding elements at the respective places are added together. For example, the elements in the first row and first column of the two matrices x and 4x are added.
So, this becomes 1 plus 4, which is 5. Similarly, the elements in the second row and second column, which are 4 in x and 16 in 4x, are added together, and you get 4 plus 16, which is 20, and so on. This is how the sum of two matrices is obtained. Similarly, you can subtract these two matrices; here I compute 4 into x minus x.
You can see that the same rule is followed here also. For example, in the 4x matrix the value in the second row and first column is 12, and the element of x at the same position is 3; subtracting, 12 minus 3 gives 9, right. Similarly, the element in the fourth row and second column of x is 8, and the element in the fourth row and second column of 4x is 32.
So, if you subtract, 32 minus 8 gives 24, right. This is how the operation goes on. Let me circle these, so that you can identify which operation corresponds to which outcome; the same thing is obtained here also. So, you can see that it is not very difficult to get such operations done in the R software.
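The same calculation can be sketched in R as follows; x is the 4 by 2 matrix with values 1 to 8 arranged by row, and the name x4 for 4 into x is only an illustrative choice.

```r
# The 4 x 2 matrix with values 1 to 8 arranged by row
x <- matrix(1:8, nrow = 4, ncol = 2, byrow = TRUE)
x4 <- 4 * x   # every element multiplied by the scalar 4

x + x4        # e.g. [1,1]: 1 + 4 = 5, [2,2]: 4 + 16 = 20
x4 - x        # e.g. [2,1]: 12 - 3 = 9, [4,2]: 32 - 8 = 24
```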
Now I consider the multiplication of matrices. You know that matrix multiplication has a condition. Suppose I take two matrices x and y, and suppose the order of x is m cross n. Then, if I want to multiply x by another matrix y, that is, x into y, the order of y has to be n cross p for some p (for example, n cross n); that means the number of columns in the first matrix and the number of rows in the second matrix have to be the same, right.
We will conduct all the calculations following this rule of matrix operations. The operator in the R software to multiply matrices is percentage star percentage, that is, %*%, right. And if you recall, if you use only the star sign *, then only the elements in the corresponding places are multiplied.
But you know that in matrix multiplication you follow a different rule; the respective elements at the same positions are not simply multiplied. Following this rule, let me show you an example. I take a matrix x of order 4 by 2, and then a matrix y of order 2 by 4.
Since y is of order 2 by 4, multiplying x into y should not be a problem. The data values in the matrix x are the numbers from 1 to 8, and the data values in the matrix y are the numbers from 11 to 18, and both are arranged by row. So, this is the matrix x and this is the matrix y.
Now, you know what happens when I multiply x into y as a matrix operation: this row gets multiplied with this column, so the first element will be 1 into 11 plus 2 into 15, right. That is the way matrix multiplication is conducted.
(Refer Slide Time: 07:45)
So, if you simply multiply the two matrices x and y, you get this outcome; you can verify it with manual calculations also. Similarly, if you compute y into x: the order of y is 2 cross 4 and the order of x is 4 cross 2, so the result is a matrix of order 2 by 2. The same reasoning applies to x into y: the order of x is 4 by 2 and the order of y is 2 by 4, so the result is a matrix of order 4 by 4.
That is why you can see 4 rows and 4 columns in the first operation, x into y, while for y into x you get a matrix of order 2 by 2, following this rule. So, you can see that multiplying two matrices is not very difficult; you simply have to use the correct operator, right.
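A short sketch of these two products, using the matrices described above (x with values 1 to 8 and y with values 11 to 18, both arranged by row); the names xy and yx are only illustrative.

```r
x <- matrix(1:8, nrow = 4, ncol = 2, byrow = TRUE)     # order 4 x 2
y <- matrix(11:18, nrow = 2, ncol = 4, byrow = TRUE)   # order 2 x 4

xy <- x %*% y   # (4 x 2)(2 x 4) gives a 4 x 4 matrix
yx <- y %*% x   # (2 x 4)(4 x 2) gives a 2 x 2 matrix
dim(xy)         # 4 4
dim(yx)         # 2 2

# First element of x %*% y: 1st row of x times 1st column of y
xy[1, 1]        # 1 * 11 + 2 * 15 = 41
yx              # rows: (210, 260) and (274, 340)

x * x           # the plain star multiplies element by element instead
```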
Now I want to give you one more example, with which I will show you that the R software has a particular way of computing a specific type of matrix product. For example, take a matrix x and find its transpose, denoted here by x transpose. If you want to multiply these two matrices, the first option is simply to follow the usual rules of matrix multiplication, but we also have a function called crossprod, c r o double s p r o d.
Let us first understand how it works, and then I will explain why it is important and why we need it. I consider a matrix of order 4 by 2 in which the data values are from 1 to 8, arranged by rows, the same matrix that we considered earlier. To find the transpose of this matrix, we have the command t, and inside the parentheses you write the name of the matrix. So, this is the transpose of the matrix x.
Now, if you want to multiply x transpose into x, the first option is simply to write the transpose of x, then the matrix multiplication operator, then x, that is, t(x) %*% x, and you get this outcome. Similarly, if you want x into x transpose, you can write x, then the matrix multiplication operator, then the transpose of x, that is, x %*% t(x), and you get this result.
You can see that x is of order 4 by 2. So, for x transpose x the orders are 2 cross 4 into 4 cross 2, and the outcome is of order 2 by 2. For x x transpose the orders are 4 cross 2 into 2 cross 4, and the outcome is of order 4 by 4. That is what is happening here. And you know that this quantity, x transpose x, is very popular in statistics, right.
That is why a function called crossprod, c r o double s p r o d, has been created. If you use this function crossprod and write the matrix inside the parentheses, you get x transpose x. So, it does exactly what the longer statement does; you do not need to write the entire statement, only one simple command.
If you run it, you can see that crossprod(x) gives this value. You can verify it on this screen also: I have the matrix x, then I run the command crossprod, and then I use the basic command for x transpose x, and you can see that both give the same outcome.
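A minimal sketch of this equivalence with the same 4 by 2 matrix. It also uses tcrossprod(), a base R companion function not mentioned in the lecture, which computes x x transpose. Note that x transpose x here has entries 84, 100, 100, 120, the same values used for the matrix y in the inverse example later in this lecture.

```r
x <- matrix(1:8, nrow = 4, ncol = 2, byrow = TRUE)

a <- t(x) %*% x     # x'x written out with t() and %*%
b <- crossprod(x)   # the same product with a single call
print(b)            # rows: (84, 100) and (100, 120)
all.equal(a, b)     # TRUE

# tcrossprod() gives the other product, x x'
all.equal(x %*% t(x), tcrossprod(x))   # TRUE
```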
When you deal with bigger matrices, it is said that the function crossprod executes the multiplication faster than the conventional method, which is this one. The reason is that operations such as matrix multiplication and inversion are carried out by algorithms.
Whenever you use an algorithm, there is always the question of which algorithm produces the result faster, and crossprod is believed to execute this multiplication faster than the conventional method; that is why it is here. Now, let me first show you these operations on the R console, so that you become more confident, and then I will show you some more operations.
So, let me first create this matrix x; this is the matrix x. Now, multiply this matrix by 4, so that we have the matrix 4x. Now, if you write x plus 4 into x, you can see that the outcome is like this.
Similarly, if you compute 4 into x minus x, you get this result. So, you can see that it is not very difficult to do. Now, let me consider these two matrices x and y and show you how you can do matrix multiplication.
Let me clear the screen and create these two matrices x and y. You can see x is like this and y is like this. If I write x, then percentage star percentage, then y, that is, x %*% y, the outcome is obtained like this.
And if I use the multiplication y into x, that is, y %*% x, it gives this result, right. So, you can see that it is not very difficult.
Similarly, take x to be this matrix; t(x), the transpose of x, comes out like this. If you multiply the transpose of x by x, that is, t(x) %*% x, you get this, and if you use the command crossprod with the name of the matrix inside the parentheses, you can see that both outcomes are the same.
Here I am taking a very small matrix, which is why you cannot observe any difference in speed, but I am sure that when you deal with bigger matrices you will see the difference, right.
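If you want to try the speed comparison yourself, a sketch like the following can be used. The matrix big is a hypothetical random matrix, and the timings depend on your machine and on the linear algebra library R is linked to, so no particular numbers are claimed here.

```r
set.seed(123)   # for reproducible random numbers
big <- matrix(rnorm(2000 * 200), nrow = 2000, ncol = 200)   # hypothetical large matrix

time_conv  <- system.time(p1 <- t(big) %*% big)   # conventional method
time_cross <- system.time(p2 <- crossprod(big))   # crossprod
print(time_conv)
print(time_cross)

# Both give the same 200 x 200 matrix, up to floating-point rounding
all.equal(p1, p2)
```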
After this, I come to another operation related to matrices. Suppose I want to concatenate matrices. What is the meaning of concatenate? It means I want to join, or bind, them. I have two options: I can join the matrices row-wise or column-wise. The R command for combining, or concatenating, matrices row-wise is rbind, and inside the parentheses you write the matrices in the same order in which you want them to be joined.
This is r b i n d, all in lower case. Similarly, to concatenate matrices column-wise, the command is cbind, c b i n d, all in lower case, and inside the parentheses you write the matrices in the same order in which you want to join them. Let us understand this operation through an example. Let me create two matrices x and y.
Both x and y are of order 3 by 2. In the matrix x the data values are from 1 to 6, and in the matrix y they are from 11 to 16; I have chosen them intentionally so that you can very easily identify which matrix each element comes from. The elements are arranged row-wise. So, this is x and this is y.
You can see that the elements of x are 1, 2, 3, 4, 5, 6 and the elements of y are 11, 12, 13, 14, 15, 16. Now, if I use the operation rbind with x comma y inside the parentheses, you can see that these are the elements of the matrix x and these are the elements of the matrix y, right. So, what is the outcome of rbind? It is row-wise.
All the rows are combined like this. That is what you have to observe: what exactly R is doing, right. The matrices are joined vertically, x and then y, and if you had some more matrices, they would come below y.
Similarly, consider cbind with the same matrices: this is the matrix x and this is the matrix y, and now you want to join them column-wise. You can run the command cbind with x comma y inside the parentheses.
You can see that this part is x and this part is y. In this case, the matrices are joined horizontally, which is column-wise.
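Both operations can be sketched with the two matrices described above; the result names r and cb are only illustrative. Note that rbind() needs matrices with the same number of columns, and cbind() needs the same number of rows.

```r
x <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)     # values 1 to 6
y <- matrix(11:16, nrow = 3, ncol = 2, byrow = TRUE)   # values 11 to 16

r  <- rbind(x, y)   # rows of y placed below the rows of x
cb <- cbind(x, y)   # columns of y placed to the right of the columns of x
dim(r)    # 6 2
dim(cb)   # 3 4
r[4, ]    # first row of y: 11 12
cb[1, ]   # first row of x followed by first row of y: 1 2 11 12
```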
So, once again I request you to observe how R works when it joins matrices row-wise and column-wise. Let me first show you these two operations in the R software, so that you become more confident, and then I will show you more operations. Let me clear the screen.
These are the two matrices: x is like this and y is like this. If you use rbind with x and y inside the parentheses, you can see that the result is like this.
(Refer Slide Time: 19:20)
Similarly, if you use cbind, you get this; I am printing both matrices as well, so that you can observe them very clearly. So, once again you can see that the matrix operations in these two cases are easy to understand and very simple to perform.
This is the screenshot of the same operations that I just showed you on the R console, so you can be confident that these things work, ok.
(Refer Slide Time: 19:45)
Now I consider the next operation, which is finding the inverse of a matrix. If there is some matrix, say A, its inverse is denoted by A inverse. Actually, we are trying to find the unique inverse. I am writing the words unique inverse because there is one more concept of the inverse of a matrix, called the generalized inverse, but we are not going to consider it here, ok.
You know that there are certain conditions which have to be satisfied before you can find the inverse of a matrix; for example, the matrix has to be square and nonsingular (a positive definite matrix is one such case). The R command to find the inverse of a matrix is solve, s o l v e, and inside the parentheses you give the matrix. Now let me give you a very simple example.
For example, create a matrix y of order 2 by 2 with the values 84, 100, 100, 120 arranged row-wise. This is the matrix y, and if you want to find its inverse, you simply write solve(y), and it gives you this outcome. If you want to verify it, multiply y by solve(y); the outcome comes out to be an identity matrix. This holds for any invertible matrix.
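A minimal sketch of this example. The det() call is an extra base R function, used here only to confirm that y is nonsingular; the name z for the inverse matches the one used on the console later in this lecture.

```r
y <- matrix(c(84, 100, 100, 120), nrow = 2, ncol = 2, byrow = TRUE)

det(y)          # 84 * 120 - 100 * 100 = 80, nonzero, so y is invertible
z <- solve(y)   # the inverse of y
print(z)        # rows: (1.5, -1.25) and (-1.25, 1.05)

# y times its inverse should be the 2 x 2 identity matrix, up to rounding
all.equal(y %*% z, diag(2))
round(y %*% z, 10)   # round() hides tiny floating-point errors
```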
The only thing you have to be careful about is that these computations are based on algorithms. If you increase the order of the matrices, this creates more computational complexity, but this condition will still hold. It is also true that different software use different algorithms to find the inverse of a matrix.
So, if you find the inverse of a matrix of some higher order in different software, the values may vary very slightly. There will be some difference, but it will not be large, and in any case this condition will always hold. So, if this happens, you should not doubt the integrity of the R software.
Similarly, there is one more operation in matrix theory: finding the eigenvalues and eigenvectors of a matrix, which are also called the characteristic roots and characteristic vectors. These are special things, but finding eigenvalues, or characteristic values, is taught in undergraduate classes, so I thought I would add it here.
After that I will stop with matrix operations, because there is a long list of matrix operations, and they can all be done in the R software as well. I take the same matrix y that I just used for finding the inverse, and then I find its eigenvalues and eigenvectors.
(Refer Slide Time: 23:15)
The R command to find the eigenvalues and eigenvectors is eigen, e i g e n, all in lower case, and inside the parentheses you write the matrix. If you execute it, these two values are the eigenvalues, and these two columns are the characteristic vectors, or eigenvectors, corresponding to these eigenvalues.
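A short sketch with the same matrix y. It checks the defining relation y v = lambda v and two standard facts from matrix theory (the eigenvalues sum to the trace and multiply to the determinant). Keep in mind that the sign of an eigenvector is arbitrary, so another software may report the same vector multiplied by -1.

```r
y <- matrix(c(84, 100, 100, 120), nrow = 2, ncol = 2, byrow = TRUE)

e <- eigen(y)
e$values    # the two eigenvalues, in decreasing order
e$vectors   # each column is a unit-length eigenvector for the matching eigenvalue

# Check the defining relation y v = lambda v for the first eigenpair
v1 <- e$vectors[, 1]
all.equal(as.vector(y %*% v1), e$values[1] * v1)

# The eigenvalues add up to the trace and multiply to the determinant
sum(e$values)    # 84 + 120 = 204
prod(e$values)   # 84 * 120 - 100 * 100 = 80
```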
Those who are familiar with eigenvalues and eigenvectors will understand this very easily. And when you compute them for matrices of higher order, similar small differences may appear if you use different software, right. Now let me show you these things in the R software. First I will show you the inverse: I will create this matrix and then find its inverse and its eigenvalues.
This is the matrix y. If you want to find its inverse, it is like this, right. If you save this value in some variable, say z, and then compute y %*% z, you can see that the product of y and solve(y) is an identity matrix, right. The same thing happens if you compute z %*% y.
This also gives the identity matrix, except that the off-diagonal entries may not be exactly 0 but very close to 0. That is what I wanted to show you, because inverses are computed using numerical algorithms, ok.
Now, for this matrix y, let me find the eigenvalues and eigenvectors. The command is eigen, with y inside the parentheses, and if you run it, you see that these are the eigenvalues and these are the eigenvectors.
So, now we come to the end of this lecture, and with this lecture I will also stop with matrix operations.
As I said, there is a long list of matrix operations, and if I try to cover all of them, possibly the entire course will be spent on matrix operations alone. My objective in this part was to show you that R can handle matrix operations, and that whatever matrix operations you learnt in your classes can all be done in the R software also.
The only thing is that you have to find the correct function or command for that operation. Once again, I advise you that whenever you do any matrix operation, first check the conditions that have to be satisfied; those conditions come only from the theory of the topics you studied in your classes.
Beyond this, different people, different students and different candidates may require different types of matrix operations. For that, I suggest you look into the help and find the operation appropriate for your requirement. I also request you to take some examples, create some matrices and play in the R software with the different matrix operations we have learnt so far; this will make you more confident.
And if you want to go into statistics or data science, etc., believe me, you cannot survive without matrix theory and matrix operations. So, with the objective that you will now learn the remaining matrix operations yourself, I will see you in the next lecture with more operations. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 16
Logical Operators
Hello friends, welcome to the course Foundations of R Software. From this lecture we are going to begin a new topic, Logical Operators. You know that we have two types of operators: mathematical operators and logical operators. A mathematical operator gives you a numerical value, whereas a logical operator gives you an answer in terms of whether a statement is true or false.
These logical operators are actually very useful. You will see that whenever we want to frame a query in databases, they are very useful; similarly, when we want to execute something under some constraints or conditions, logical operators help us. So, the first questions are: what are these logical operators in the R software, how are they used, and how do we interpret their outcomes?
Let me take an example. Suppose I say 8 is greater than 6; the answer is yes or no, that is, true or false. So, this greater-than sign is a logical operator. But if I find the difference between 8 and 6, that is 8 minus 6, which is equal to 2, then minus is a mathematical operator. And when I check whether 8 minus 6 is greater than 0 or not, the greater-than sign again becomes a logical operator.
Similarly, we have other operators like less than, greater than, less than or equal to, greater than or equal to, not equal to, and equal to. Here, equal to is not the mathematical equal-to sign but a logical equal-to sign. These are some of the logical operators that we are going to discuss in this lecture and in some forthcoming lectures, and we will see how we can manage computations and calculations using logical operators. So, let us begin our lecture.
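The distinction between the two kinds of operators in this 8 and 6 example can be seen directly in R:

```r
8 > 6         # TRUE: a logical operator gives TRUE or FALSE
8 - 6         # 2: a mathematical operator gives a number
(8 - 6) > 0   # TRUE: the numerical result is then compared with 0
```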
(Refer Slide Time: 02:46)
So, these are the logical operators, and I will show you what they do. Remember one thing: logical operators always give their answers in terms of TRUE or FALSE, and you can also write TRUE as T and FALSE as F in your R commands. You might recall that in an earlier lecture I explained these two, TRUE and FALSE, and said that they are reserved words, which are our logical values.
This is the place where we are going to talk about them. We have different types of logical operators. Let me first explain what these operators are and what the corresponding operators are in the R software. When I want to compare something, there are two options: whether a value is greater than or less than another. To perform the greater-than comparison, we have the greater-than sign, >. This is the usual sign in mathematics also, when we compare two values.
When we want to compare with greater than or equal to, as in mathematics, this is indicated in the R software by writing the greater-than sign followed by the equal sign, >=. Similarly, if I want to check whether something is smaller than something else, the operator is our usual less-than sign, <, which performs the comparison for less than.
When we want to compare with respect to less than or equal to, I write the less-than sign followed by the equal sign, <=, in the R software; just as we have greater than or equal to, we have here less than or equal to. Now, if I write something like 3 equal to 2, this is wrong, but if I write 2 equal to 2, this is correct.
So, what are we doing here? We are actually comparing the numbers on the left-hand and right-hand sides of the equality sign. The equality sign used here is usually a mathematical symbol, but here we mean to use it as a logical operator.
The logical operator for equality is two equality signs, ==, which tests whether the two values are exactly equal. If I want to compare for not equal to, which in mathematics is indicated by this symbol, the operation is done with an exclamation sign followed by an equality sign, !=. And when we just want negation, that is, not, we use the exclamation sign, !, which is a symbol on your keyboard.
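A few one-line checks of these comparison operators follow. Keep in mind that a single = in R is an assignment (like <-), not a comparison, which is why equality is tested with ==.

```r
3 == 2      # FALSE: "==" tests equality
2 == 2      # TRUE
3 != 2      # TRUE: "!=" means "not equal to"
5 >= 5      # TRUE
4 <= 3      # FALSE
!TRUE       # FALSE: "!" negates a logical value
!(3 == 2)   # TRUE
```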
After this, when I want to join two such operations, I have two options. One option is to say this condition and that condition. When I want to join two logical statements through and, meaning both, we have the ampersand symbol, but it can be used as a single symbol, &, or as a double symbol, &&. What is the difference between using the ampersand once or twice? That I will explain to you.
Similarly, when I want to make a logical comparison with or, for example, this happens or that happens, the symbol is the vertical line, which is also available on your keyboard. We can use a single line, |, or a double line, ||, and I will show you the difference between them with some examples. And similarly, if I have two statements x and y, I may want an either-or operation: either x holds or y holds.
For such either-or operations, we have the command xor, and inside the parentheses we write x comma y, meaning either x holds or y holds, but not both. Now, if I have some statement and I want to know whether it is TRUE or FALSE, we have one more command, isTRUE. Look at how it is written: i s is in lower case letters and TRUE is in upper case letters.
Whatever I want to test, I give inside the parentheses. Suppose my statement is x and I want to know whether it is TRUE; the command isTRUE(x) tests whether x is TRUE. Similarly, if I want to test whether x is FALSE, we have the command isFALSE, but once again you have to be very careful: i s is in lower case letters and F A L S E is in upper case letters.
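A small sketch of xor(), isTRUE() and isFALSE(), taking x equal to 8 as an assumed value. It assumes a reasonably recent R, since isFALSE() was added in R 3.5.0. Note that isTRUE() returns TRUE only for a single TRUE value, not for a vector of several TRUE values.

```r
xor(TRUE, FALSE)   # TRUE: exactly one argument is TRUE
xor(TRUE, TRUE)    # FALSE: both are TRUE

x <- 8
isTRUE(x > 6)           # TRUE
isFALSE(x > 6)          # FALSE
isTRUE(c(TRUE, TRUE))   # FALSE: isTRUE() needs a single TRUE value
```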
These conditions are often needed in programming; for example, if this condition is TRUE, then execute this statement, and if this condition is FALSE, then execute that statement, right. After this, as we have already discussed, we have the two logical values TRUE and FALSE. Capital T, capital R, capital U, capital E indicates that a statement is TRUE, and capital F, capital A, capital L, capital S, capital E indicates that a statement is FALSE.
I would like to draw your attention to the following: if you write True, that is, T in capital and all other letters in lower case, this is not the same as TRUE. TRUE and FALSE in upper case are reserved words, which means you cannot use them for anything else; you cannot give a variable a name like TRUE or FALSE. But if one or more of the letters in TRUE or FALSE are not in upper case, then it is a different word.
So, similarly if you try to write down here all the alphabets in lower case in TRUE, then
this is also not the same as TRUE. And similarly for the FALSE also if you try to write
down here False like this means F is capital and a l s e is in lower case then it is also not
the same as here FALSE. And similarly if you try to write down here false like this all in
lower case alphabets this is also not the same as FALSE which is a reserved word. So,
that is what you have to keep in mind, yeah.
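A short sketch, assuming a fresh R session, that checks these spellings: assigning to TRUE fails, while True and false are ordinary variable names. The assignment is wrapped in tryCatch() only so that the script keeps running after the error; the names res, True and false are illustrative.

```r
# Assigning to the reserved word TRUE is an error; tryCatch() captures it
res <- tryCatch({
  eval(parse(text = "TRUE <- 5"))
  "assigned"
}, error = function(e) "error")
print(res)                    # "error"

# Other spellings are ordinary names, not logical constants
True  <- 5
false <- "just a character string"
print(identical(True, TRUE))  # FALSE
print(is.logical(false))      # FALSE
```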
I am sure that at this moment you might be confused about what these operations are and
how they work, but I promise you that as soon as I take some examples, everything is
going to be 100 percent clear.
So, let us take some examples. But before that, as I said, we have two types of
operators, using the ampersand and the vertical line, which are used for and and or, and
we have the option to use either one symbol or two symbols, and these two have different
interpretations.
At this moment I will simply make the statement, and later on, towards the end of this
lecture, I will show you some examples and then these statements will become clear. The
shorter form, that means using only one symbol, & or |, performs element-wise
comparison in almost the same way as arithmetic operators, whereas the longer form, that
means two symbols, && or ||, evaluates left to right, examining only the first element
of each vector, and evaluation proceeds only until the result is determined. So, you will
see that when you are dealing with more than one value in a data vector, these
operations will make a difference.
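A minimal sketch of the two forms, using arbitrary logical vectors. The stop() calls are never reached, which illustrates that && and || stop evaluating as soon as the result is determined; the names and_each and or_each are only illustrative.

```r
# Longer forms: evaluation stops once the result is known
FALSE && stop("never evaluated")   # FALSE, the right side is skipped
TRUE  || stop("never evaluated")   # TRUE, the right side is skipped

# Shorter forms: element-by-element, like arithmetic operators
and_each <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
and_each                           # TRUE FALSE FALSE
or_each <- c(TRUE, FALSE, TRUE) | c(FALSE, FALSE, FALSE)
or_each                            # TRUE FALSE TRUE
```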
(Refer Slide Time: 13:07)
And what the difference is, I will show you with some examples, but let us first begin
from scratch and understand the meaning and utility of these logical operators.
So, now let me take one example. Suppose I choose the value x equal to 8, x <- 8; you
know what this means: the value 8 is assigned to the variable x. Now, I want to check
whether this value 8 is smaller than 10 or whether 8 is less than 2. For or I have the
symbol of two vertical lines, and I write this statement as (x < 10) || (x < 2), that is,
x less than 10, two vertical lines, and then x less than 2; for clarity I have written the
two conditions inside parentheses.
Now let us first understand what happens. Do you think that 8 is smaller than 10? The
answer is yes. Do you think that 8 is smaller than 2? The answer is no. That means the
statement 8 is less than 10 is TRUE, and the statement 8 is smaller than 2 is FALSE. And I
want the answer to be TRUE if any of the conditions is TRUE.
So, TRUE or FALSE gives TRUE, because one of the conditions is TRUE, and that is why TRUE
appears in this outcome. I will take more examples so that this concept becomes clear.
Now I take one more value, x equal to 18, use the same condition, and test whether 18 is
smaller than 10 or 18 is smaller than 2. Since x now has the value 18, I write x less than
10, then the logical operator || for or, and then x less than 2, exactly as it is written
here. Now tell me, is 18 smaller than 10, TRUE or FALSE? The answer is no, it is not TRUE.
Similarly, if you check whether 18 is smaller than 2, the answer is no, this is not
correct.
So, when I compare 18 less than 10, this statement gives the answer FALSE, and when I
compare 18 smaller than 2, it also gives the answer FALSE. So, when I combine these two
statements, FALSE or FALSE, the answer is FALSE, because neither of the statements is
correct. Both statements are FALSE, so the answer comes out to be FALSE, and this FALSE
is shown here.
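The two single-value examples above can be reproduced as follows; the names res_8 and res_18 are only illustrative.

```r
x <- 8
res_8 <- (x < 10) || (x < 2)     # TRUE || FALSE gives TRUE
x <- 18
res_18 <- (x < 10) || (x < 2)    # FALSE || FALSE gives FALSE
print(c(res_8, res_18))          # TRUE FALSE
```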
So, now let me take one more example and show you what happens when you use one symbol
and two symbols for this or operator. Now I take two values, x equal to 8 and x equal to
18, put them inside a data vector, x <- c(8, 18), and use the same condition as before,
that x is less than 10 or x is less than 2.
What do you want and what do you expect? You want R to operate like this: first it takes
the value x equal to 8 and verifies the answer of the condition x less than 10 or x less
than 2, and then, in step 2, x should take the value 18 and R should test whether x is
less than 10 or x is less than 2. This is what you want to do.
So, to do this, you use the or operator with two vertical lines. Let us see what happens.
The answer comes out to be only TRUE. Now I replace these two vertical lines by one
vertical line, which is also the or operator, and run it. Here I am using the same
condition, but with the single-symbol operator, and the answer comes out to be TRUE and
then FALSE.
Now, what is happening? In the earlier slide, this is your example 1 and this is your
example 2. Example 1 considers x equal to 8 and example 2 considers x equal to 18, and
their answers are TRUE for example 1 and FALSE for example 2.
So, when you consider example 1, x equal to 8, the answer should be TRUE, and when you
consider x equal to 18, the answer should be FALSE. But when you use two vertical lines,
the answer is only TRUE, and when you use only one vertical line, both TRUE and FALSE
come out.
So, what is really happening? This is exactly what you have to understand here, and the
same type of logic applies to other values also. When R finds two vertical lines, it
takes the first value, that is x equal to 8, and as soon as it gets the answer TRUE, it
stops.
So, the double symbol operates only on the first element of the data vector and does not
move to the remaining elements. On the other hand, when you use only the single
operator, that means only one vertical line, with x <- c(8, 18), R starts working, comes
first to this 8, and finds the answer TRUE.
Then R moves further to the next value in the data vector and finds the correct value,
which is reported here as FALSE. So, the single vertical line operates on all the
elements in the data vector, and that is the difference between using the single line
and the double line. And if you read again the sentence I showed you earlier, you will
now understand it: the shorter form performs element-wise comparison in almost the same
way as arithmetic operators.
The longer form evaluates left to right, examining only the first element of each vector.
So, when you use the longer form with two lines, it evaluates only the first element in
the data vector; that is what I wanted to say. As I promised, as soon as I take an
example, things become clearer, and now you can understand how R works with these
logical operators.
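A sketch of the data-vector example with x <- c(8, 18). In older versions of R, (x < 10) || (x < 2) silently used only x[1]; since R 4.3.0, passing a vector of length greater than one to && or || is an error, so this sketch selects the first element explicitly to reproduce the same TRUE. The names each_or and first_or are illustrative.

```r
x <- c(8, 18)

# Single |: the condition is evaluated for every element
each_or <- (x < 10) | (x < 2)
each_or                            # TRUE FALSE

# Double ||: older R looked only at the first element; here the first
# element is taken explicitly, since R >= 4.3.0 gives an error otherwise
first_or <- (x[1] < 10) || (x[1] < 2)
first_or                           # TRUE
```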
So, now after this I take one more example, and I use the and operator. First I use the
double & operator and then I will show you the single one, but before that I have a
question for you. When you were using two vertical lines in the earlier examples, why
was it giving you the correct answer?
The answer is very simple: in the earlier examples you were considering only one value,
which is why you could not see whether R was creating any trouble or not. But as soon as
you came to a data vector with more than one value, you found that the double symbol
creates a problem. The same story is going to happen with the & operator also; let us
see what happens when we use the single and double & operators in the R console with a
data vector.
So, once again I will take some examples and explain. Let me take the first value, x
equal to 5. Now x has been assigned the value 5, and I test the condition whether 5 is
less than 10 and 5 is greater than 2, that is, (x < 10) && (x > 2). Is 5 smaller than
10? Yes, it is TRUE. And is 5 greater than 2? Yes, this is TRUE.
Now you are joining TRUE and TRUE. When both statements are TRUE, TRUE and TRUE is
TRUE, and that is what is shown here. Only when both conditions are satisfied is the
combined condition TRUE. You may have seen this whenever grading is done in
examinations and the final results are declared. For example, suppose you have two
subjects, say subject 1 and subject 2.
And you have a rule for how you are going to be declared passed in the examination. I
have two options: first, you have to pass in both the subjects, and second, you have to
pass in either of the subjects. When I say that the candidate has to pass in both
subjects, that means the candidate has to pass in subject 1 and the candidate has to pass
in subject 2; only then is the candidate declared passed.
Under the other rule, if the student passes subject 1 but unfortunately fails in subject
2, or vice versa, fails in subject 1 and passes subject 2, then the student has passed
either subject 1 or subject 2, and in this case the student is declared passed.
So, these are the conditions that we are testing here using these two operators. Now I
take another example, x equal to 15, and test whether 15 is smaller than 10 and 15 is
greater than 2. These are the two conditions I want to test. Is 15 smaller than 10? No,
this is FALSE.
When I test the condition 15 greater than 2, the answer is yes, this condition is TRUE.
When I combine them with and, one condition is FALSE and the other is TRUE, so the
answer is FALSE. Why? Because and gives TRUE only when both results are TRUE, which
happened in case number 1.
Even if the second condition were also FALSE, that is, both outcomes were FALSE and
FALSE, the answer would still be FALSE. Some of you might recall that there is a concept
called a truth table, and that table gives you these types of outcomes, for example TRUE
and TRUE is TRUE, TRUE and FALSE is FALSE. I will discuss it a little later, but first I
want to explain what is happening.
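The two single-value && examples can be checked with a short sketch; the names res_5 and res_15 are only illustrative.

```r
x <- 5
res_5 <- (x < 10) && (x > 2)     # TRUE && TRUE gives TRUE
x <- 15
res_15 <- (x < 10) && (x > 2)    # FALSE && TRUE gives FALSE
print(c(res_5, res_15))          # TRUE FALSE
```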
Now I do the same operation with the single and double symbols for and that I did with
the or operator, and you will see that the behaviour is the same. Since you have now
understood the first operation, it will not be difficult to understand the second. Now
I take two values, x equal to 8 and x equal to 18, and combine them in a data vector like
this, x <- c(8, 18).
I use the same condition as in the earlier example, x less than 10 and x greater than 2,
and I apply it with both the single and the double operator. What is your objective? x
has two values, so you expect that R will first take the first value and judge whether x
is less than 10 and x is greater than 2 for x equal to 8.
For x equal to 8 the answer comes out to be TRUE. Then, in the next step, R should move
further, pick up the second value in the data vector, x equal to 18, and test the same
condition, x less than 10 and x greater than 2, and you have seen that this condition
gives the answer FALSE.
But what do you observe? If you run this in the R console with two & operators, the
answer is only TRUE, for the first value. On the other hand, if you take the same x and
apply the same condition with a single &, the answer is TRUE and FALSE, as you can see
here.
So, with a single &, R starts from the first value in the data vector x, gets the answer
TRUE and reports it, then moves to the second value, finds the answer and reports it as
FALSE. The same thing happens as before: with the double operator, R operates only on
the first element and not on the remaining elements, and with the single & operator, it
operates on all the elements. But once again, why was this problem not there in the
earlier examples? Because you were dealing with only a single value while using the
double && operator. The double && works only on the first value, and since there was
only one value, that value was the first value, and that is why you did not face this
type of problem.
And as soon as you consider a data vector with more than one value, you observe this
problem. Once again, look at the statement I showed you: the longer form evaluates left
to right, examining only the first element of each vector, and the shorter form performs
element-wise comparison in almost the same way as the arithmetic operators.
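A sketch of this & example with x <- c(8, 18). As with ||, R 4.3.0 and later refuse && on a vector longer than one, so the double-operator result from the lecture is shown by taking x[1] explicitly. The names each_and and first_and are illustrative.

```r
x <- c(8, 18)

# Single &: the condition is checked element by element
each_and <- (x < 10) & (x > 2)
each_and                           # TRUE FALSE

# Double &&: only the first element, selected explicitly with x[1]
first_and <- (x[1] < 10) && (x[1] > 2)
first_and                          # TRUE
```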
So, I hope I have made this aspect clear and shown you how to use these operators. After
this, I take one more example to show how things work, and through this example I will
show you some more operations that can be done in R with the logical operators.
I consider a data vector 1 to 6, written 1:6. This gives the values 1, 2, 3, 4, 5, 6; it
is the same as writing the data vector x <- c(1, 2, 3, 4, 5, 6). Now I have the
condition that x is greater than 2 and x is smaller than 5, that is, (x > 2) & (x < 5).
First you have to see what you want: you want this condition to be checked for each and
every value in the data vector x.
So, for that you have to use the single & operator. The condition then checks whether
the values are greater than 2 and less than 5 for each element in the data vector x. R
takes the first value, x equal to 1, and checks whether 1 is greater than 2 and 1 is less
than 5; the answer is FALSE, as you can verify very easily. Then it picks up the second
value and checks whether 2 is greater than 2 and 2 is smaller than 5; this also comes out
to be FALSE.
Then it takes x equal to 3 and checks whether 3 is greater than 2 and 3 is smaller than 5,
and this condition comes out to be TRUE. You can see the answer here. The first value
corresponds to x equal to 1, the second value to x equal to 2, the third value to x equal
to 3, the fourth value to x equal to 4, the fifth value to x equal to 5, and the sixth
value to x equal to 6.
So, you can see that for the values x equal to 1 and 2 this condition is FALSE, for the
values 3 and 4 it is TRUE, and for the values 5 and 6 it is once again FALSE. So, that is
how we can check a condition over a data vector.
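The element-wise check described above, as a minimal sketch (cond_and is an illustrative name):

```r
x <- 1:6                           # same values as c(1, 2, 3, 4, 5, 6)
cond_and <- (x > 2) & (x < 5)
cond_and                           # FALSE FALSE TRUE TRUE FALSE FALSE
```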
Now, suppose you want to find the values for which the condition is TRUE. You can see
that there are two values, 3 and 4, for which the outcome is TRUE. Now you want to find
these values in R in an automated way.
Here I am taking only six values, so I can show you all this logic and manipulation
manually. But suppose you are working with thousands, millions or billions of values in
a data file; how can you check manually which values satisfy the condition? In that
case you need to work through programming, and programming will help you find the
values for which the outcome is TRUE.
So, now I am going to explain how you can do it. Suppose I want to know the values in x
for which the condition x greater than 2 and x less than 5 is TRUE. I repeat it once
again for the sake of understanding; look at my pen: I consider the data vector x, and I
want to find the values in x for which the condition x greater than 2 and the condition
x smaller than 5 are both TRUE.
Look at how I have expressed it: x[(x > 2) & (x < 5)]. First I have written the data
vector in which I want the condition to be checked, then a square bracket, and inside
this square bracket I have written the condition. This is the fundamental rule; whether
your condition uses the & operator or the or operator, this methodology remains the same.
So, if you execute it in the R console, you get the values 3 and 4. You can see that
these are the two values which you had found manually. So, the rule is very simple: if
you want to judge whether a condition is TRUE or FALSE, simply write the condition, and
if you want to know which values in the data vector satisfy the condition, write it in
this particular way.
Just write the name of the data vector and, inside square brackets, write the condition;
this gives you the values. For example, here it finds the values which are greater than
2 and smaller than 5. So, this is how it actually works.
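A minimal sketch of this subsetting rule. The sum() line is an extra illustration not shown in the lecture: it counts how many elements satisfy the condition, because each TRUE counts as 1. The names selected and n_true are illustrative.

```r
x <- 1:6
selected <- x[(x > 2) & (x < 5)]
selected                           # 3 4

# Extra illustration: count the TRUE values (TRUE is treated as 1)
n_true <- sum((x > 2) & (x < 5))
n_true                             # 2
```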
Now let me first repeat this example with the or operator, and after that I will show it
on the R console also. Since these screenshots are here, it should not be very difficult
for you to understand.
So, I once again take the same data vector x <- 1:6 and the condition that x is greater
than 2 or x is smaller than 5, that is, (x > 2) | (x < 5). In this data vector x, with
values 1, 2, 3, 4, 5 and 6, I want R to check whether the condition x greater than 2 or x
smaller than 5 is TRUE or FALSE. So, what will happen? R takes the first value 1 and
checks whether 1 is greater than 2 or 1 is less than 5, and since you have used the
single operator, control also goes to the next value; for x equal to 2 it checks whether
2 is greater than 2, which is FALSE, or 2 is smaller than 5, which is TRUE.
In this case you find that the answer is TRUE in the first case and TRUE in the second
case, and similarly you can continue up to x equal to 6 and find the answers; the
outcome is given here. The first value is the outcome for x equal to 1, the second
value, TRUE, is the outcome for x equal to 2, and the third value, TRUE, is the outcome
for x equal to 3.
Similarly, the fourth value is the outcome for x equal to 4, which is TRUE, the fifth
value is the outcome for x equal to 5, the fifth value in the data vector, and finally
the last TRUE is the outcome for x equal to 6. So, this is how it actually works. Now,
suppose I want to find the values of x for which the answer is TRUE.
You can see that this condition is TRUE for all the values x equal to 1, 2, 3, 4, 5 and
6. If I want to do this with an R command, I follow the same rule, the same methodology
which I just explained for the and operator: write the data vector and then, inside
square brackets, write the condition, whatever it is, that is, x[(x > 2) | (x < 5)].
So, the rule is very simple: write the data vector, and inside the square brackets write
the condition. Because TRUE comes for x equal to 1, 2, 3, 4, 5 and 6, the same values are
given here; these are the values of x for which the condition is TRUE. Which condition?
That the values are greater than 2 or smaller than 5.
So, all those values which are greater than 2 or smaller than 5 are listed in this
outcome.
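The same subsetting rule with the or condition, as a short sketch (cond_or is an illustrative name):

```r
x <- 1:6
cond_or <- (x > 2) | (x < 5)
cond_or               # TRUE TRUE TRUE TRUE TRUE TRUE
x[cond_or]            # 1 2 3 4 5 6
```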
You can see that this is not a very difficult operation, and if you do it in the R
console you get this type of outcome. Before going further, let me show you these
operations in the R console also. I will copy these conditions and run them in the R
console.
So, if I set x equal to 8 and run this condition, the answer is TRUE. Now if you want x
equal to 18, you have to type the condition once again; I type the condition and the
answer is FALSE. Now, if I define x as the data vector of 8 and 18 and give this
condition, you can see that the answer is only TRUE, because there are two vertical
lines, the double operator.
But if I change the same condition to the single operator and press enter, you can see
that it gives the answer TRUE and FALSE.
Now I take the other condition, choose x equal to 5, and run it with the & operators;
you can see that with the double & operator it gives the value TRUE, and when I choose
the value x equal to 15, this condition gives the answer FALSE.
And when I put both values inside a data vector like this, c(5, 15), you can see that it
gives only a single value, because I have used the double & operator, but if I use only
one & operator, it gives the answer TRUE and FALSE.
So, after this I take this example where I use six numbers.
(Refer Slide Time: 45:27)
So, x is 1:6, as you can see, so the values of x are 1, 2, 3, 4, 5, 6, and if I execute
the condition with the single and operator, the outcome is FALSE FALSE TRUE TRUE FALSE
FALSE. Now, if I want to know which values give TRUE in this outcome, I simply write x
and inside the square brackets the condition, and you can see that this comes out to be
3, 4.
And similarly, if I want to check this condition with the or operator, you can see the
result here.
I take x as 1 to 6, and if I apply this condition, it is evaluated over the entire data
vector and all the outcomes come out to be TRUE. And if I want to know which of the
values give TRUE, I simply write the data vector x and inside the square brackets the
condition, and it gives the answer 1, 2, 3, 4, 5, 6.
Before I move further, let me give you one more concept so that I can explain something.
I take the values of x from 11 to 18, x <- 11:18. Now if you write x and inside the
square brackets write 1, that is x[1], see what you get.
You get the value 11. That means the 1 inside the square brackets is the location in the
data vector, and this expression gives you the first value in the data vector. Similarly,
if you write x[7], you get the seventh value in that data vector.
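Indexing by position, as a minimal sketch (first_val and seventh_val are illustrative names):

```r
x <- 11:18                 # 11 12 13 14 15 16 17 18
first_val <- x[1]          # 11, the value stored in position 1
seventh_val <- x[7]        # 17, the value stored in position 7
print(c(first_val, seventh_val))
```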
(Refer Slide Time: 47:20)
If you have understood this concept, then I will give you an example to explain the
meaning of "the longer form evaluates left to right, examining only the first element of
each vector", which I promised you at the beginning of the lecture.
You have taken x from 1 to 6. Now if you write x greater than 2, the double & operator,
and x smaller than 5, that is, (x > 2) && (x < 5), it gives you the answer FALSE. You know
why this is happening, but I want to explain what is happening. If you take only the
first element of the data vector, written as x followed by 1 inside square brackets,
x[1], and evaluate the condition with only this first element, (x[1] > 2) & (x[1] < 5),
using the single & operator, you get the same outcome.
So, x[1] indicates only the first element, and the condition is evaluated only on the
first element. I have explained this concept with a couple of examples, and I hope you
are now clear about how logical operators work in R. They are very useful, and you will
see that when you do programming you will use them in many, many places.
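A sketch of the comparison just described. Because R 4.3.0 and later give an error for (x > 2) && (x < 5) when x has six elements, the sketch shows the equivalent first-element computation next to the element-wise one. The names first_only and same_with_single are illustrative.

```r
x <- 1:6
(x > 2) & (x < 5)                        # FALSE FALSE TRUE TRUE FALSE FALSE

# What the older && behaviour amounted to: only x[1] (which is 1) is used
first_only <- (x[1] > 2) && (x[1] < 5)
same_with_single <- (x[1] > 2) & (x[1] < 5)
first_only                               # FALSE
identical(first_only, same_with_single)  # TRUE
```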
But now your job, after I finish this lecture, is to take some values and try to
understand the difference between mathematical and logical operators: how they operate,
how they give the outcome, and what their evaluation process is. Try to do it manually
and then in R also, because finally you will not have the opportunity to check everything
manually. This practice will give you the confidence that whatever you are thinking, R
is doing the same thing. So, practice it, and I will see you in the next lecture. Till
then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 17
Relational and Logical Operators
Hello friends, welcome to the course Foundations of R Software. You may recall that in
the last lecture we started a discussion on logical operators, and we understood the
different types of logical operators and how they work. In this lecture also we will
continue with the logical operators and relational operators and see how they work. We
will investigate some more applications of these operators and learn some more
operations.
For example, suppose you have some variable and you want to know whether this variable
is a logical variable or not, because unless you know that this variable is logical, you
cannot apply logical operations. So, in this lecture we are going to learn about these
types of very simple operations and see how they work in R.
So, let us begin. Before that, I will give you a quick review of all the operations that
we learnt in the last lecture. We are going to talk about relational and logical
operators in this lecture.
So, in the last lecture you learnt a couple of logical and relational operators: greater
than, indicated by the greater-than sign, >; greater than or equal to, indicated by the
greater-than and equality signs, >=; less than, indicated by the less-than sign, <; less
than or equal to, indicated by the less-than and equality signs, <=; exactly equal to,
indicated by two equality signs, ==; not equal to, indicated by an exclamation sign and an
equality sign, !=; and negation, indicated by the exclamation sign, !.
And one thing you have to keep in mind is that TRUE and FALSE are two reserved words and
they are the logical constants. Zero is considered as FALSE and non-zero numbers are
taken as TRUE.
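A small sketch of these operators on two illustrative numbers, a and b (hypothetical values), followed by the conversion of numbers to logical values:

```r
a <- 7
b <- 3
rel <- c(a > b, a >= b, a < b, a <= b, a == b, a != b)
rel                          # TRUE TRUE FALSE FALSE FALSE TRUE
!(a > b)                     # FALSE, because ! negates the comparison

# Numbers converted to logical: zero is FALSE, any non-zero value is TRUE
num_as_logical <- as.logical(c(0, 1, -2.5))
num_as_logical               # FALSE TRUE TRUE
```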
Then you also learnt about the and and or operators. The exclamation sign is the logical
not, and for the and operator we have two options: use the single & or the double &&.
The single & gives element-wise logical and operations, whereas the double && is also a
logical and but examines only the first element in the data vector; the single & works
element-wise over the entire data vector.
Similarly, the or operator is indicated by a single vertical line, |, or double vertical
lines, ||. Both are logical operators for or; the single operator gives element-wise
logical operations, that means it works for the entire data vector, whereas the double
vertical lines are also logical or but work only on the first element of your data
vector.
After that you also learnt about one more operator, xor, with x and y inside the
parentheses, xor(x, y). This gives an exclusive or type of logical operation: either x
or y, but not both. Then you learnt isTRUE(x) and isFALSE(x), to know whether a
statement x is TRUE or FALSE, respectively. Here you have to notice that TRUE and FALSE
in these two function names have to be written in capital letters. And then you learnt
about TRUE and FALSE, which are simply the logical constants true and false.
Now, based on this, I will give you the idea of the truth table. In the last lecture we did
different types of operations, and sometimes I was saying, ok, true and true is true, false
and false is false, true or false is true, etc. The table that collects all these results is
called the truth table, and it tells us what the outcome will be when we operate on true
and false values using the and (&) and or (|) operators.
So, suppose I consider two statements, statement 1 and statement 2, indicated by x and y,
respectively. I consider two logical operations: one is x and y, and the other is x or y.
These are the two outcomes that we are going to look at.
First, if the outcome of statement 1 is true and the outcome of statement 2 is true, then x
and y, that is, true and true, is true, and x or y, that is, true or true, is also true.
Similarly, if statement 1 is true and statement 2 is false, then x and y is false, while x or y,
that is, true or false, is true.
Similarly, if statement 1 is false and statement 2 is true, which is just like the earlier case
with the roles swapped, then x and y is false and x or y is true.
And finally, if both statements are false, then x and y is false and x or y is also false. So,
now you can see why, in the earlier lecture, when I took two statements and said, ok, this
one is true and the other one is true, I used to say that their operation, true and true, is
true.
All of those results were coming from the truth table. I had promised you in the last
lecture that I would explain the concept of the truth table, and since you have already seen
how it is applied, it should not be difficult for you to understand this truth table now.
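The whole truth table can be generated in R in one step, because & and | work element by element. This is a minimal sketch; the column names x_and_y and x_or_y are hypothetical labels chosen here, not names used in the lecture.

```r
# All four combinations of statement 1 (x) and statement 2 (y)
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

# One row per combination; & and | are applied element-wise
truth_table <- data.frame(x = x, y = y,
                          x_and_y = x & y,
                          x_or_y  = x | y)
print(truth_table)
```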
Now, let me give you some examples to explain the meaning and utility of the outcomes in
this truth table. Suppose I take two variables x and y, with x as TRUE and y as FALSE,
and I perform the different operations from the truth table.
This corresponds to the row of the truth table in which one statement is true and the other
is false, and we want to evaluate x and y and x or y. From the table, x and y is false and x
or y is true.
So, when x is TRUE and y is FALSE, x & y gives the outcome FALSE and x | y gives the
outcome TRUE. If you want the negation of x, you write the exclamation sign followed by
x, that is, !x. Since x is TRUE, the negation of x is FALSE. So, this is how the truth table
works and how its concepts are used, right.
(Refer Slide Time: 09:30)
So, before moving on, let me show you these operations on the R console itself. I take two
variables, x equal to TRUE and y equal to FALSE, right. Now, x & y is FALSE and x | y is
TRUE.
The negation of x, !x, is FALSE because x is TRUE, and the negation of y, !y, is TRUE
because y is FALSE. So, you can see that these operations give the same results that we
have just considered, right.
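The console steps above can be reproduced as a small script. The result names (res_and, res_or, not_x, not_y) are hypothetical, introduced only so the values can be printed together.

```r
x <- TRUE
y <- FALSE

res_and <- x & y   # FALSE
res_or  <- x | y   # TRUE
not_x   <- !x      # FALSE, because x is TRUE
not_y   <- !y      # TRUE, because y is FALSE

print(c(res_and, res_or, not_x, not_y))
```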
After this, I will give you another command, is.logical, which is helpful in finding out
whether the outcome of an operation, or a variable, is logical or not. You know that in the
earlier lectures we used similar commands, like is.numeric, is.character, etc.
Let us take a couple of examples to see how this works. I take a variable x whose value is
5, and then I define a variable Logical1 equal to x > 2. The outcome of the statement
"5 is greater than 2" is stored in the variable Logical1. Since 5 is greater than 2, the value
of Logical1 is TRUE.
Now, to check whether Logical1 is a logical variable, that is, to check its mode, you write
is.logical with Logical1 inside the parentheses, and this gives the answer TRUE.
Similarly, let us take another operation, smaller than.
I take x < 10 and assign its outcome to another logical variable, Logical2. Here x is 5,
and 5 is less than 10, so this is TRUE, and the value of Logical2 is TRUE.
If you check Logical2 using is.logical, with the variable name Logical2 inside the
parentheses, it gives TRUE; that means this variable is a logical variable. Similarly, take
one more operation, x != 5, that is, x is not equal to 5. Since x is 5, this asks whether 5 is
not equal to 5. The outcome is stored in the variable Logical3 and comes out to be
FALSE; obviously, "5 is not equal to 5" is a false statement.
Now, the output of is.logical with Logical3 inside the parentheses comes out to be TRUE.
Do not get confused just because the earlier answers were also TRUE; these two outcomes
represent two different things. The value of Logical3 is FALSE, but this TRUE indicates
that Logical3 is a logical variable, which is correct.
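The three examples above, written as a script. The point to notice is that the value of a comparison and the answer of is.logical() are two different things: Logical3 holds FALSE, yet is.logical(Logical3) is TRUE.

```r
x <- 5
Logical1 <- x > 2    # TRUE, since 5 > 2
Logical2 <- x < 10   # TRUE, since 5 < 10
Logical3 <- x != 5   # FALSE, since 5 is equal to 5

# The values differ, but all three are logical variables
print(c(Logical1, Logical2, Logical3))  # TRUE TRUE FALSE
print(is.logical(Logical3))             # TRUE
```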
And this is the screenshot of the operations I just explained. I will show them on the R
console also, but before that let me take two more examples, to show what happens if you
use some mathematical operations on these variables.
I define one more variable, Logical4, equal to 2 * x > 11, with x equal to 5. So, it asks
whether 2 times 5, that is 10, is greater than 11, which is a false statement. This outcome
comes out to be FALSE, and if you check whether it is a logical variable, the answer
comes out to be TRUE.
Similarly, take one more operation, 3 * x < 20, and store the outcome in the variable
Logical5. Here 3 times 5 is 15, and 15 is less than 20, so the answer is yes.
That is why this statement is TRUE, and if you check whether it is a logical variable using
is.logical, the answer comes out to be TRUE; this is the screenshot. So, this is how R
works with these logical operators. Before I move forward, let me show you these
operations on the R console also, right.
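A sketch of the two arithmetic examples. R evaluates the multiplication first and then the comparison, so 2 * x > 11 is read as (2 * x) > 11.

```r
x <- 5
Logical4 <- 2 * x > 11   # 10 > 11 is FALSE
Logical5 <- 3 * x < 20   # 15 < 20 is TRUE

print(Logical4)
print(is.logical(Logical4))  # TRUE: FALSE is still a logical value
print(Logical5)
print(is.logical(Logical5))  # TRUE
```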
So, let me take x equal to 5 and Logical1 equal to x > 2. If you check is.logical with
Logical1 inside the parentheses, this comes out to be TRUE. If I change the logical
variable to x < 10, the outcome is again TRUE, which is correct. And if you change it to
x < 3, the same check will again come out to be TRUE.
Why? Because the value of Logical1 is now FALSE, since 5 is not smaller than 3, but it is
still a logical value, so the answer to is.logical is yes.
Similarly, if I consider more operations, like 4 times x equal to 3, the value of this logical
variable is FALSE, but is.logical on this variable gives TRUE. And if you change it to
4 * x > 3, the value is TRUE, and this is also a logical variable.
So, you can see that it is not difficult to find whether a variable is logical or not, right.
And the answer is always either TRUE or FALSE.
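The examples above only show is.logical() answering TRUE. As a contrast (this case is not in the lecture), it answers FALSE for values that are not logical, such as the number 5 itself or the character string "TRUE". The variable names are hypothetical.

```r
x <- 5
check_comparison <- is.logical(x > 2)    # TRUE: a comparison gives a logical value
check_number     <- is.logical(x)        # FALSE: 5 is numeric
check_string     <- is.logical("TRUE")   # FALSE: a character string, not the constant TRUE

print(c(check_comparison, check_number, check_string))
```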
(Refer Slide Time: 17:55)
Now, let us take some more examples, which give quick and handy answers without much
effort. I write 8 > 7: is this true? Yes, and the answer here is TRUE.
My next statement is 7 < 5. The answer is no, and that is why the answer comes out to be
FALSE. Then I take 7 > 7: the answer is no, so it comes out as FALSE. But if I rephrase
this question as 7 >= 7, that is, is 7 greater than or equal to 7, the answer is yes.
So, this comes out to be TRUE. Similarly, if I take the statement 8 < 8, the answer is
FALSE, and when I rephrase it as 8 <= 8, it is correct, so the answer comes out to be
TRUE, right.
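The six comparisons above can be collected into one logical vector, which makes it easy to see the difference between the strict operators (<, >) and the "or equal to" operators (<=, >=). The name comparisons is a hypothetical label.

```r
comparisons <- c(
  8 > 7,    # TRUE
  7 < 5,    # FALSE
  7 > 7,    # FALSE: 7 is not strictly greater than 7
  7 >= 7,   # TRUE
  8 < 8,    # FALSE
  8 <= 8    # TRUE
)
print(comparisons)
```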
Similarly, you can take some more operations. 8 != 9, that is, 8 is not equal to 9: this is
correct, so it is TRUE. 9 != 9 is FALSE. 7 == 7, is 7 equal to 7? The answer is obviously
TRUE.
So, the answer comes out to be the logical TRUE, right. Is 7 == 8? The answer is FALSE;
7 cannot be equal to 8. Now, if you take a variable x equal to TRUE, then its negation !x
comes out to be FALSE.
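A sketch of the equality and inequality checks. Note that equality is tested with the double sign ==, and "not equal to" with !=. The name equality_checks is hypothetical.

```r
equality_checks <- c(
  8 != 9,   # TRUE
  9 != 9,   # FALSE
  7 == 7,   # TRUE
  7 == 8    # FALSE
)
print(equality_checks)

x <- TRUE
negated_x <- !x   # FALSE
print(negated_x)
```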
So, this is how these operations work on the R console. But in all these examples I have
taken only scalars, only one value at a time. The next question is what happens if we
consider more than one value, stored in the form of a data vector.
So, I consider an example with two data vectors x and y, where x takes the values 1, 2 and
3 and y takes the three values 4, 5 and 6. Now, I write the statement x > y. Remember,
both x and y are data vectors with more than one value.
Now, the outcome comes out to be FALSE, FALSE and FALSE. What is this indicating? It
indicates that the elements of x and y are compared position-wise. The first element of
the data vector x is compared with the first element of the data vector y; after that, the
control moves to the second element, and the second elements of x and y are compared;
then the control moves to the third element, and the values at the third positions of x and
y are compared.
So, let us see what is really happening when you write x > y. Here x is 1, 2, 3, y is 4, 5,
6, and the operation is greater than. So, R checks whether 1 is greater than 4, 2 is greater
than 5, and 3 is greater than 6. The first FALSE is the outcome of the operation 1 > 4, the
second is the outcome of 2 > 5, and the third FALSE is the outcome of the operation
3 > 6, right.
Similarly, if you take the next operation, x < y, R compares the three pairs of values: the
first TRUE in the outcome is the result of 1 < 4, the second TRUE is the result of 2 < 5,
and the third TRUE is the result of 3 < 6, because now the greater than sign has been
replaced by the less than sign.
Similarly, the operation x != y again has three outcomes; now the sign is replaced by not
equal to. The first TRUE, the value at the first position, is the outcome of 1 != 4, which is
TRUE; the second value is the outcome of 2 != 5, and the third value is the outcome of
the operation 3 != 6.
After this, I consider the equality operator. All these symbols are replaced by the logical
equality sign, the double equal to sign ==. The first value of the outcome is the result of
comparing 1 and 4, that is, whether 1 is equal to 4; the second value is the outcome of
2 == 5, and the third value is the outcome of 3 == 6, so all three are FALSE. This is how
the operations are done with data vectors, right.
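The position-wise comparisons above, as a script. Each operator pairs x[1] with y[1], x[2] with y[2] and x[3] with y[3], so every result has three values. The result names are hypothetical.

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)

greater   <- x > y    # FALSE FALSE FALSE
less      <- x < y    # TRUE TRUE TRUE
not_equal <- x != y   # TRUE TRUE TRUE
equal     <- x == y   # FALSE FALSE FALSE

print(greater)
print(less)
print(not_equal)
print(equal)
```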
(Refer Slide Time: 23:54)
So, let me first show you these operations in the R software, and then I will explain the
last slide. If I consider 7 > 8, it is FALSE, but 7 > 3 is TRUE. 7 < 8 and 7 < 9 are TRUE,
and 7 > 9 is FALSE. Now, if I write 7 = 9 with a single equal sign, R does not give
FALSE; it gives an error. Why? Because a single = is not the logical equality operator.
So, let me write 7 == 9 with the double equal sign, and this gives FALSE. And 7 != 9 gives
TRUE, right. So, you can see these operations here.
Similarly, take some more operations: 8 < 8 is FALSE. If you type "less than or equal to"
with the symbols in the wrong form, this is again wrong; the operator has to be written as
<=, and then 8 <= 8 is TRUE. Similarly, 9 > 9 is FALSE, but 9 >= 9 is TRUE, right.
Similarly, look at the operations with two data vectors. Let me take x equal to c(1, 2, 3)
and y equal to c(4, 5, 6). Then x < y gives TRUE, TRUE, TRUE; x > y gives FALSE,
FALSE, FALSE; x == y gives FALSE, FALSE, FALSE; and x >= y gives FALSE, FALSE,
FALSE.
Similarly, x != y gives TRUE, TRUE, TRUE, right. So, that is how you can make such
calculations without any problem, and you can see these are very simple operations. Now,
I want to show you the outcome of two more functions, isTRUE and isFALSE. I write "is"
in lowercase letters and TRUE in uppercase letters, that is, isTRUE, and I want to check
whether 8 is smaller than 6; the answer is no.
So, if you write isTRUE(8 < 6), the answer is FALSE. Similarly, if you want to check
whether 8 is greater than 6, write "is" in lowercase letters and TRUE in capital letters, and
inside the parentheses write the condition 8 > 6. The condition is true, so the answer
comes out to be TRUE.
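A sketch of the two isTRUE() checks. isTRUE() asks "is this condition TRUE?" and answers with TRUE or FALSE. The names check1 and check2 are hypothetical.

```r
check1 <- isTRUE(8 < 6)  # FALSE: 8 is not smaller than 6
check2 <- isTRUE(8 > 6)  # TRUE: 8 is greater than 6

print(check1)
print(check2)
```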
Now, I would like to show you one application of isFALSE. Suppose I want to check
whether 5 is less than 8. I write "is" in lowercase letters and then FALSE, F A L S E, in
uppercase letters, and inside the parentheses I write 5 < 8; the answer comes out to be
FALSE. Don't you think this is confusing? So, let us understand what it is doing.
The condition it checks is 5 < 8, and the answer to that is yes, which means TRUE. But
what are you asking? You are asking whether it is FALSE. It is not FALSE, because "5 is
smaller than 8" is correct, and that is why the answer comes out to be FALSE. This is how
you have to understand what it is doing.
Similarly, use the condition isFALSE(5 > 8). Once again, let us see what is happening. Is
5 greater than 8? No, this is not true, but the answer here is TRUE. What does this mean?
5 is not greater than 8, so the statement 5 > 8 is FALSE. Your question is whether it is
FALSE, and yes, it is FALSE, so the answer comes out to be TRUE.
That is the reason it comes out to be TRUE. So, do not get confused, and try to understand
these operations logically. Unless you understand these basic operations and concepts, you
cannot do it.
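A sketch of the isFALSE() examples. It may help to read isFALSE(condition) as "is this condition FALSE?": when the condition itself is TRUE, the answer is FALSE, and the other way round. The names q1 to q4 are hypothetical.

```r
q1 <- isFALSE(5 < 8)  # FALSE: 5 < 8 is TRUE, so it is not FALSE
q2 <- isFALSE(5 > 8)  # TRUE:  5 > 8 is FALSE
q3 <- isFALSE(8 < 6)  # TRUE:  8 < 6 is FALSE
q4 <- isFALSE(8 > 6)  # FALSE: 8 > 6 is TRUE

print(c(q1, q2, q3, q4))
```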
So, let me show you these operations on the R console. isTRUE(8 < 6) is FALSE and
isTRUE(8 > 6) is TRUE. And if I do the same thing, replacing TRUE by FALSE, you can
see the change very clearly: isFALSE(8 < 6) is TRUE, and isFALSE(8 > 6) comes out to
be FALSE. So, that is how things work with logical operators and relational operators in
the R software.
So, now I have given you a decent introduction to the use of logical operators and
relational operators. Now it is your turn, because, as I said, these things are very useful,
particularly when you are dealing with databases and want to create different types of
queries. For example, you can ask whether x is greater than 3 and y is greater than 4 and
z is less than 8. And one more important thing: in these examples I have taken only two
variables, and at a time I have taken only one operation, either & or |.
But if you want to have more than one statement and test more than one condition at a
time, you can also do it. For example, I can write the statement x > 2 & y < 3 | z > 5, but
then you have to understand what the outcome will be and why you want to do it. You
should be able to understand the logic of the outcome, and this is going to be really
helpful.
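A small sketch with hypothetical values x = 1, y = 2 and z = 6 shows why understanding the logic of the outcome matters. In R, & is evaluated before |, so x > 2 & y < 3 | z > 5 is read as (x > 2 & y < 3) | z > 5; moving the parentheses changes the answer.

```r
x <- 1
y <- 2
z <- 6

default_grouping <- x > 2 & y < 3 | z > 5    # & is evaluated before |
explicit_left    <- (x > 2 & y < 3) | z > 5  # same as the line above
explicit_right   <- x > 2 & (y < 3 | z > 5)  # different grouping

print(default_grouping)  # TRUE:  (FALSE & TRUE) | TRUE
print(explicit_left)     # TRUE
print(explicit_right)    # FALSE: FALSE & (TRUE | TRUE)
```

Writing the parentheses explicitly, even when they match the default order, makes a multi-condition query easier to read and check.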
So, my request to you all is that you create your own examples. Take very simple values,
and first work out what will happen when you apply operations like and, or, etc. Then
check whether you are getting the same outcome that you expected. This will give you
more confidence to practice, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 18
Missing Data Handling
Hello friends, welcome to the course Foundations of R Software. In this lecture, we are
going to begin a new topic, Missing Data Handling. What is missing data handling? Let
me take a simple example to explain: suppose you are collecting some data, and you have
to go to five houses, ask for some value and then enter it. Suppose you go to the fourth
house and find that the house is locked.
So, you cannot obtain the value of the observation from the house which is locked. In this
case, what do we have to do, and how do we enter the data so that it is handled in the R
software? One thing I can inform you is that in statistics there is a whole area called
missing data models, in which we try to impute the values of the missing data using the
data available to us, and so repair the data. That is, you try to find the values of the missing
data, replace them, and then conduct your statistical analysis.
So, there is a whole area, but we are not going into that; I just wanted to tell you how
missing data is handled in statistics. Here in this course, we have a very different
approach. The first question is: if a data value is missing, how should it be entered, or
what is the rule by which R will know that this value is missing? And once a value is
missing, how do we detect it and how do we handle different types of operations? These
are the questions we are going to answer in this lecture.
(Refer Slide Time: 02:36)
So, let us begin this lecture and understand how to handle missing data values, ok. The
first question is: how do you know whether any value is missing? For that, there is a rule
in the R software: whenever we enter data that has to be read in the R software, a missing
value is indicated by capital N, capital A, that is, NA. This is also a reserved word.
Take the example I mentioned. Suppose there are house numbers 1, 2, 3, 4 and 5, and a
person goes to collect the observations. Suppose from house number 1 the value is 100,
from house number 2 the value is 110, from house number 3 the value is 120, and house
number 4 is locked.
So, the value for house 4 is recorded as NA, and then house number 5 has some value, say
105, and that is all. This is how the data is recorded. As soon as you write NA, the R
software understands that this value is missing; that is the first rule. The second point is
this: when you are doing any analysis in the R software, the data usually comes from
some external source, in the form of some file, etc.
And that file may contain thousands or even millions of values. It is not really possible for
you to scroll through the whole file to find whether any value is missing. So, you would
like a command with which you can detect the missing values in a file or in a variable.
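A sketch of the house example, entering NA for the locked house. It uses is.na(), the command introduced just below, together with the base R functions any(), which() and sum(); combining them this way to scan a long variable is a common idiom, not something stated in the lecture. The name houses is hypothetical.

```r
# Values from houses 1 to 5; house 4 was locked, so its value is NA
houses <- c(100, 110, 120, NA, 105)

n_values          <- length(houses)          # 5: NA still occupies a position
has_missing       <- any(is.na(houses))      # TRUE: at least one value is missing
missing_positions <- which(is.na(houses))    # 4: position of the missing value
n_missing         <- sum(is.na(houses))      # 1: number of missing values

print(c(n_values, n_missing))
print(missing_positions)
```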
So, the command is is.na; "is dot na" is written entirely in lowercase letters. If you want
to know whether any value is missing in some variable, you write is.na with the variable
name inside the parentheses. This returns a logical value, TRUE or FALSE.
So, if the outcome is TRUE, that means the value is missing, and if the outcome is
FALSE, that means the value is not missing. In all those locations where the value is
missing, it is represented by capital N, capital A; that is the rule we follow in the R
software, right.
Here is a screenshot, and I will also show it in the R software: I take a value x equal to
capital N, capital A, that is, NA, and is.na with x inside the parentheses gives the answer
TRUE, right.
When you yourself give the value NA, you are stating that the value is definitely missing.
You assign the value NA to the variable and then ask whether it is missing; since you have
assigned NA, the value is definitely missing, and you can see that the answer comes out to
be TRUE.
(Refer Slide Time: 06:10)
Now, suppose the data is a data vector with four values, two of which are missing. Let me
write this data as x with four values: 11 and 13 are known numerical values, but the
values at the second and fourth positions are missing.
So, in this data vector the values at the first and third positions are available and the
values at the second and fourth positions are missing. How do we detect this? All these
values are stored in the variable x, so I execute the command is.na with the name of the
variable x inside the parentheses.
You get this type of outcome, and now our job is to see what it means. The first value is
FALSE, corresponding to the value 11. This value is not missing; it is available. The
question is.na asks is whether the value is missing, and the answer FALSE means the
value is not missing.
Similarly, the second value is missing, NA, and the answer is TRUE. What does that
mean? You ask is.na, and the answer is yes, the value is missing, indicated by TRUE.
Similarly, where the value is available, at the third place, you get FALSE, and where the
value is missing, at the fourth place, you get TRUE.
So, FALSE and TRUE appear at those places where the value is not missing and missing,
respectively. At the first and third positions the value is available, so you get FALSE, and
at the second and fourth positions the value is missing, so you get TRUE. So, whenever
you get the answer TRUE, the value is missing.
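A sketch of is.na() on a single value and on the four-value vector. On a vector it returns one TRUE/FALSE per position, with TRUE marking the missing ones. The variable names are hypothetical.

```r
single_missing <- is.na(NA)  # TRUE: the value is missing
single_present <- is.na(8)   # FALSE: the value is available

x <- c(11, NA, 13, NA)
missing_flags <- is.na(x)    # FALSE TRUE FALSE TRUE

print(single_missing)
print(single_present)
print(missing_flags)
```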
Now the question is: what happens when values are missing and we want to use a
mathematical function? For example, in this data set, suppose I want to find the
arithmetic mean. You know that the arithmetic mean is the sum of all the numbers in the
data vector divided by the total number of values.
If I write mean(x), then 11 + NA + 13 + NA would be divided by 4, since there are four
values, and the answer comes out to be NA, which does not help us. What did we actually
want? In this vector x there are two available values, 11 and 13, at the first and third
positions, and two missing values at the second and fourth positions.
We want the missing values to be removed from the data vector x, and then the mean to be
found for whatever data is left, which is 11 and 13. How do we get this done? If you want
the missing values to be removed in a function, and the function to operate over the
available values, then you have to use the option na.rm = TRUE; na.rm means remove the
NA values.
This option has two possible values: TRUE or FALSE. When you give TRUE, you are
asking the R software: Mister R, please look at the values inside this data vector x; since I
am setting na.rm = TRUE, please remove the missing values from the data and find the
arithmetic mean of the values that remain after removing the missing values.
Now, the answer comes out to be 12. What is happening? The NA values are removed,
leaving the two values 11 and 13, and their arithmetic mean is 24 divided by 2, which is
12. That is what happens when you deal with missing values in the R software.
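A sketch of mean() with and without the na.rm option on the vector from the lecture. With the default na.rm = FALSE, a single NA makes the result NA; with na.rm = TRUE the NA values are dropped first.

```r
x <- c(11, NA, 13, NA)

mean_default <- mean(x)                  # NA: any NA makes the result NA
mean_keep_na <- mean(x, na.rm = FALSE)   # NA: same as the default
mean_drop_na <- mean(x, na.rm = TRUE)    # 12, that is (11 + 13) / 2

print(mean_default)
print(mean_keep_na)
print(mean_drop_na)
```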
So, this is how you find the missing values and how you operate on them. Now, I will first
show you these operations in the R software, and then I will come to the remaining topics.
So, let me take a variable x equal to NA; you yourself are giving x the value NA. Now, if
you type is.na(x), the answer comes out to be TRUE. On the other hand, if you take another
value, y equal to 8, then is.na(y) says FALSE, right.
Similarly, take a vector with the values 11, NA, 12 and NA. Now this is your x, and if you
give is.na(x), you can see that the values at the first and third positions are available, so it
gives the answer FALSE, and the second and fourth values are NA, which are missing, so
it gives the answer TRUE, right.
This is how it works. Now, suppose you use lowercase na; let us see what happens: R says
this object is not found. Even with capital N and lowercase a, Na, it is also not found. But
with NA it works. So, NA, capital N, capital A, is the reserved word that you need.
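A sketch of why the spelling matters. NA is the reserved word; na and Na are just ordinary names that have not been defined, so R reports an error when they are used. tryCatch() is used here only so the script keeps running and the error message can be printed.

```r
x <- c(11, NA, 12, NA)
flags <- is.na(x)   # FALSE TRUE FALSE TRUE
print(flags)

# na and Na are undefined names, not the reserved word NA
msg_na <- tryCatch(na, error = function(e) conditionMessage(e))
msg_Na <- tryCatch(Na, error = function(e) conditionMessage(e))
print(msg_na)   # an "object not found" error message
print(msg_Na)
```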
So, now let me take x equal to the four values 11, NA, 12 and NA, and find mean(x), the
simple arithmetic mean. This comes out to be NA, but if you add the option
na.rm = TRUE, you can see that it now comes out to be 11.5.
Why? Because 11 plus 12 is 23, and 23 divided by 2 is 11.5. And if you set
na.rm = FALSE, you will see that this is the default option used when you write just
mean(x): it gives NA, because you are telling R, Mister R, please do not remove the NA
values while finding the mean, ok.
So, you can see this is not a very difficult operation, but I would like to address one more
concept. Here you have seen the options na.rm = TRUE and na.rm = FALSE, and you have
understood what happens in each case.
In the forthcoming lectures, you will see that many of the commands we learn have this
option. You have to decide whether your data has missing values or not. If values are
missing, then always use na.rm = TRUE, whether you are finding the mean, variance,
median or anything else that we are going to do in the forthcoming lectures. So, I have
explained how to remove the missing values and then how to compute the function.
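A sketch of the same option in a few other functions. sum(), median(), max() and var() are standard base R and stats functions that accept na.rm; choosing these four as examples is my own, and not every R function has this argument, so check the help page of each command.

```r
x <- c(11, NA, 12, NA)

total  <- sum(x, na.rm = TRUE)     # 23
middle <- median(x, na.rm = TRUE)  # 11.5
top    <- max(x, na.rm = TRUE)     # 12
spread <- var(x, na.rm = TRUE)     # 0.5, the sample variance of 11 and 12

print(c(total, middle, top, spread))
```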
Now, just like NA, there is another reserved word, NULL, and you will see that sometimes
when you execute a function, it gives you NULL as the outcome. So, let me explain the
difference between NA and NULL. NULL is written N, U, double L, all in capital letters.
NA and NULL are not the same: NA is a placeholder for something that exists but is
missing, and NULL stands for something that never existed at all.
Let me take a very simple example to explain the difference between NA and NULL. Suppose a school conducts an entrance examination, 100 students appear in the exam, and out of those 100 students, 70 are admitted to the school.
Now you have two groups of students: those who got admission to the school and those who did not. In the school, attendance is taken every day, and the class teacher marks each student absent or present.
Suppose a student who has been admitted to the school does not come; the student is absent, so the teacher marks the student absent. Now consider one more situation: a student from the other group, that is, from the students who were not given admission, also does not come to the school.
Do you think the teacher is going to mark this student absent? No. Why? Because this student was not admitted to the school, so the teacher does not expect the student to be there. In the first case, when a student who got admission does not come to school, the teacher marks the student absent, because the teacher expected the student to come.
Since the admitted student is missing today, we would mark NA instead of absent; that is the same thing. For the student who was not admitted, if we still had to record whether that student is absent or present, it would be indicated by NULL. Why? Because that student did not get admission to the school, so nobody expects the student to be there. That is why we have two options, NA and NULL.
So, NA indicates that something was expected but its value is missing, and NULL indicates that nothing was expected at all. That is the difference between NA and NULL.
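A small sketch of the difference in R terms: in the attendance analogy, the absent admitted student still occupies a slot (NA), while the non-admitted student simply has no entry (NULL). The vectors below are illustrative only:

```r
# NA holds a place: the vector still has three elements
with_na <- c(1, NA, 3)
length(with_na)      # 3
is.na(with_na)       # FALSE  TRUE FALSE

# NULL is nothing at all: it adds no element
with_null <- c(1, NULL, 3)
length(with_null)    # 2
length(NULL)         # 0
is.null(NULL)        # TRUE
```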
Let us continue our lecture and consider some more basic operations on NA. Whenever a data vector has some missing values indicated by NA, we would like to know which values are missing. For that, we want to know the locations of the NAs in the data vector, because those locations tell us where the values are missing.
To find the locations of the missing values in a data vector, we use the function which, written all in lower case. Inside its parentheses we write is.na, and inside the parentheses of is.na we give the data vector, that is, which(is.na(x)). For example, take a data vector with four values, x = c(11, NA, 13, NA).
Now I write which(is.na(x)), and it gives me 2, 4. What does 2, 4 mean? If you look at this data vector, there are four values at positions 1, 2, 3 and 4: 11 is at position 1, NA at position 2, 13 at position 3 and NA at position 4. So, 2 and 4 are the positions of the NAs in the data vector, and I can say that the values at the second and fourth places in the data vector x are missing.
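A minimal sketch of the two steps: is.na(x) first gives one TRUE or FALSE per element, and which() then returns the positions where that logical vector is TRUE.

```r
x <- c(11, NA, 13, NA)

is.na(x)           # FALSE  TRUE FALSE  TRUE: one logical per element
which(is.na(x))    # 2 4: positions where is.na(x) is TRUE
```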
On the other hand, if we are interested in how many values are missing in the data vector, we count the NAs using the function sum, written as sum(is.na(x)): inside sum we write is.na, and inside is.na the data vector.
Take the same example: x has two missing values, the NAs at the second and fourth positions, and two available values, 11 and 13, at the first and third positions. If I now compute sum(is.na(x)), the answer comes out to be 2, indicating that two values in the data vector x are missing. This is how we can extract different kinds of information about the missing values in our data.
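Why does sum count the NAs? is.na(x) returns a logical vector, and sum treats TRUE as 1 and FALSE as 0, so the total is the number of missing values. A short sketch (the second line, counting available values with !, is an added illustration):

```r
x <- c(11, NA, 13, NA)

sum(is.na(x))       # 2: number of missing values
sum(!is.na(x))      # 2: number of available values
```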
(Refer Slide Time: 21:22)
Another question: when you handle a data vector with missing values, you would like to identify which of the data values are complete. For this we have the function complete.cases (c-o-m-p-l-e-t-e dot c-a-s-e-s), written all in lower case, and inside its parentheses you give the name of the data vector.
This returns a logical vector that identifies the complete cases: wherever the data is available, that is, not missing, it gives TRUE, and wherever the data is missing, it gives FALSE. Let us consider the same data vector, which has 11 and 13 at the first and third positions as available data and two NAs at the second and fourth positions.
So, this is my vector x, and I write complete.cases(x). After this operation, I get TRUE, FALSE, TRUE, FALSE. Let us see what is happening: the first TRUE corresponds to 11, which means the data is available and this is a complete case. Similarly, the other TRUE is at the third position, so it corresponds to the third value in the data vector x, which is 13; that value is available, so it is also a complete case.
Similarly, look at the FALSE values. The first FALSE is at the second place in the outcome, so it corresponds to the second value in the data vector x; since the outcome is FALSE, that value is missing and the case is incomplete. The same holds for the FALSE at the fourth position of the outcome, which corresponds to the fourth value in x.
So, the values at the first and third positions are not missing but available, and the values at the second and fourth positions, where the answer is FALSE, are missing.
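A sketch of complete.cases on the same vector. The second part uses a small hypothetical data frame to show that, for data frames, a row counts as complete only if none of its entries is NA:

```r
x <- c(11, NA, 13, NA)
complete.cases(x)    # TRUE FALSE TRUE FALSE

# Hypothetical data frame: row 2 has NA in column a, row 3 in column b
df <- data.frame(a = c(1, NA, 3), b = c("p", "q", NA))
complete.cases(df)   # TRUE FALSE FALSE
```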
Next, let me show you one more very simple operation. Suppose you want to create a data set after removing, or omitting, the missing values. How do you do it? You are not always interested only in operations on the data values; sometimes you also want to extract the part of the data set that is complete.
For this we have the function na.omit (n-a dot o-m-i-t). Inside its parentheses we give the data, and it returns the object with listwise deletion of the missing values: it drops any row that has a missing value anywhere in the data set, and those rows are left out of the returned object (the original object itself is not changed).
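A minimal sketch of listwise deletion on a data frame. The heights and weights are hypothetical values chosen so that rows 2 and 3 each have one missing entry:

```r
# Hypothetical data with missing entries in different columns
df <- data.frame(height = c(150, NA, 162, 171),
                 weight = c(50, 60, NA, 68))

clean <- na.omit(df)
clean          # keeps only rows 1 and 4
nrow(df)       # 4: the original data frame is unchanged
nrow(clean)    # 2
```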
Let me explain this concept through an example. I take the same data set, x = c(11, NA, 13, NA), in which the data at the first and third positions is available and the data at the second and fourth positions is not. Now I create another data vector y using the command na.omit(x). In other words, I have a data vector x, and I am asking R to omit the NAs and give me back the complete data.
Now you can see that the outcome is 11, 13, together with an attribute called na.action whose value is 2, 4: these are the positions of the values that were removed. So, out of the four values, two were missing and have been removed, and two are available. The na.action attribute itself has the class "omit", which records that the values were removed by the omit operation.
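A sketch for inspecting what na.omit returns for a vector; attr() reads the na.action attribute, and as.numeric() is used here only to show the values without the attribute:

```r
x <- c(11, NA, 13, NA)
y <- na.omit(x)

y                              # 11 13, plus the na.action attribute
as.numeric(y)                  # 11 13 without the attribute
attr(y, "na.action")           # 2 4: positions of the removed values
class(attr(y, "na.action"))    # "omit"
```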
Now let us see what happens from the computational point of view. We have just learnt how to find the mean after removing the missing values using the option na.rm. Now I am giving you an alternative way: I take x as above and create another data vector y, obtained after removing the missing values, which has the values 11 and 13.
If you find the mean of x, it comes out to be NA, whereas the mean of y is 11 plus 13 divided by 2, that is, 24 divided by 2, which is 12. This is the same result that we obtained earlier with mean(x, na.rm = TRUE), so either approach can be used to get the numerical value. Now let me show you these operations on the R console as well, so that you become more confident about them.
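A short sketch comparing the two routes to the same mean, with the vector c(11, NA, 13, NA) used above:

```r
x <- c(11, NA, 13, NA)
y <- na.omit(x)

mean(x)                 # NA
mean(y)                 # 12
mean(x, na.rm = TRUE)   # 12: same result without creating y
```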
First, I use the which command. Suppose I take this data vector x; you can see that it has four values, where the values at the first and third places, 11 and 12, are available and the values at the second and fourth positions are missing. Now I use which(is.na(x)) to ask which of the values are missing, and it gives the answer 2, 4, so the values at the second and fourth positions are missing.
After this I use sum(is.na(x)) to find how many values are missing, and you can see that two values are missing. Next, if you want to see the complete cases, use complete.cases(x), which gives TRUE, FALSE, TRUE, FALSE.
The TRUEs at the first and third positions correspond to 11 and 12, and the FALSEs at the second and fourth positions correspond to the two NAs at the second and fourth positions in the data vector x. Similarly, if you want to obtain the data vector y after omitting the missing values from x, use na.omit(x), and you can see what y looks like now.
Now, if you find the mean of x, it comes out to be NA. Why? Because x has missing values. And if you find the mean of y, it comes out to be 11.5, that is, 11 plus 12 divided by 2, which is 23 divided by 2, or 11.5.
So, you can see that we can handle missing values in the R software very easily; you just have to keep in mind that the symbol for a missing value in R is NA. Different software packages handle missing data in different ways, and usually every package has a special symbol to indicate a missing value.
The same is true of R, which indicates it by NA; keep this in mind, and do not confuse it with NULL, which is a different thing, as I explained. So, why don't you create an artificial data set with a couple of values and try these options with some more functions that you know, such as sum, to see how you can handle the missing values? For example, find the sum using the option na.rm and then using na.omit, and check whether you get the same value.
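One way to do this exercise, as a sketch; the five values below are an arbitrary artificial data set:

```r
# An artificial data set with two missing values
x <- c(4, NA, 7, 9, NA)

sum(x)                  # NA
sum(x, na.rm = TRUE)    # 20
sum(na.omit(x))         # 20: the same value
```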
Similarly, as we move further into the R course, you will come to know many more functions; try to operate them both with and without missing values. In fact, that should be your default rule while learning R: whenever you learn a new function, always check how it handles missing values.
For example, when you learn commands to find the median, variance and so on in the coming classes, also check for yourself how to handle missing values and how to find the same median or variance when some data are missing. That will give you good practice. So, please practice, and I will see you in the next lecture with more commands, more functions and more details; till then, goodbye.
Basics of Calculations
Lecture - 19
Conditional Executions - If and If-Else
Hello friends, welcome to the course Foundations of R Software. In this lecture, we are going to begin a new topic, Conditional Executions. The first question is: what are these conditional executions? Suppose we decide that if a student gets more than 50 percent marks, the student is declared pass, and if the student gets less than 50 percent marks, the student is declared fail. How are you going to execute this rule?
First, some condition has to be checked. What is that condition? Whether the student has got more than 50 percent marks or less than 50 percent. Based on that, you decide the outcome: pass or fail.
Similarly, we have many more such situations, where the complexity increases or the needs of the situation change. To handle them, we have different kinds of conditional executions. Essentially, we are now beginning a topic on control structures; a control structure means that you control the flow of the different parts of your program.
The first part that we are going to discuss under the broad heading of control structures is conditional execution. In this lecture, we are going to talk about the if and if-else conditional execution commands, and we will see how these conditional executions are done in the R software: what these conditions are and how they are executed.
To explain this, I will take a couple of examples and, through them, show you how these conditional executions are implemented in the R software. So, let us begin our lecture.
(Refer Slide Time: 02:36)
As I said, we are now going to begin the topic of control structures in the R software. First we are going to discuss control statements, then functions, and after that loops. This will continue over the next couple of lectures, and we are going to cover these topics one by one.
I have already explained why we need conditional executions: they are needed when we want to execute a command only when certain conditions are TRUE, that is, satisfied. These conditions can be one or more than one, and based on that we have different types of constructs and commands in the R software. Among them, we are first going to discuss the if condition; that is its popular name, and that is how we address it.
The syntax for conditional execution using if is as follows. First we write the keyword if, and then inside the parentheses we write the condition that we want to test. This condition is a logical statement, so its outcome is either TRUE or FALSE.
If the condition is TRUE, then whatever commands we want to execute are written inside curly brackets. Inside these brackets I write all the commands that are to be executed when the condition is TRUE. If the condition is FALSE, then all the commands written inside the curly brackets are ignored. This is how the if condition works in the R software.
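A minimal sketch of this syntax, using the pass rule from the beginning of the lecture; the value of marks is hypothetical:

```r
marks <- 62   # hypothetical percentage obtained by a student

if (marks > 50) {
  # executed only because the condition marks > 50 is TRUE
  print("pass")
}
```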
Now you can see a couple of things. Number 1: you learnt about logical statements earlier, and now you see one application of them. The condition is tested first, and its outcome is a logical value, TRUE or FALSE, so you would also like to check whether the outcome of the condition is logical. All the commands that we learnt in the earlier lectures are now going to be useful as we proceed further.
Number 2: I had discussed in the beginning that in mathematics we have three types of brackets: simple brackets, curly brackets and square brackets. I had explained that in R software these brackets have different roles, and I had requested you to use only parentheses, the small or simple brackets, for mathematical expressions.
Earlier I explained one use of square brackets, and now I have explained one use of curly brackets. If you recall, I had told you that in R software parentheses, curly brackets and square brackets have different uses, and this is one of the places where curly brackets are used.
What you have to keep in mind is that the if condition should be used only with a scalar condition; it should not be applied when the condition that we want to evaluate is a vector. It is best used when the condition is a single element, that is, when there is only one condition to be met, as I will show you.
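A sketch of what goes wrong with a vector condition, using a hypothetical vector v. The behaviour depends on the R version: older versions used only the first element and issued a warning, while R 4.2.0 and later stop with an error, so the code catches either case instead of assuming one:

```r
v <- c(2, 7)
v > 4                     # FALSE TRUE: a vector, not a single TRUE/FALSE

outcome <- tryCatch(
  if (v > 4) "greater",              # condition of length 2
  warning = function(w) "warning",   # R versions before 4.2.0
  error   = function(e) "error"      # R 4.2.0 and later
)
outcome

v[1] > 4                  # a single element is a proper if condition
```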
So, the basic syntax for the if condition is very simple: write if, then the condition inside parentheses, and then, inside curly brackets, the statements or commands that need to be executed. The flow chart of the if conditional execution goes like this: you start, then test the condition, and there are two possibilities, the condition is TRUE or it is FALSE.
If the condition is FALSE, we simply stop. If the condition is TRUE, we execute the commands and then the program stops. That is how the if condition works.
(Refer Slide Time: 07:31)
Now let me take an example and show you how it works. Suppose that in some program we want to multiply by 3 every value that is greater than 4. Somebody enters a value, and the program checks whether it is greater than 4.
If the condition that the entered number x is greater than 4 is TRUE, then the number is multiplied by 3. If it is FALSE, that is, the input number is not greater than 4, nothing happens. That is what we want to do, and using the concept of conditional execution and the if command, we can do it very easily.
Suppose I give the input x = 5, and then I write the if condition: if, and inside the parentheses x > 4, and after that x * 3, which means the number is multiplied by 3. In this case, whether or not you write curly brackets makes no difference, because there is only one statement to execute. But later on you will see that when there is more than one statement, the curly brackets become necessary.
On the R console, I will show you both versions. Since you have taken the value 5 and 5 is greater than 4, the condition is TRUE, so the number is multiplied by 3 and you get 3 times 5, which is 15. Now, if you take another number, x = 3, what happens? 3 is less than 4, so the condition is FALSE.
In this case no outcome is obtained: if you execute if (x > 4) x * 3, there is no response. So, when I take x = 5 I get this outcome, but when I take x = 3, you can see that as soon as I press Enter on the R console there is no outcome. Let me show you.
Now I would like to show you the same example with curly brackets, so that you do not get confused. This is the same program: I take x = 5 and the same condition, x greater than 4, but now I use curly brackets, and you can see that this gives the same outcome.
Similarly, if you take x = 3 and use the same condition with curly brackets, it gives no response. The condition is the same; the only difference is that here I have used curly brackets and in the earlier example I had not. As I said, this makes no difference as long as you are executing a single statement. Now let me show you this execution on the R console, so that you become more confident.
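A sketch of both versions in R. The last two lines add one detail not shown on the slides: an if without else, whose condition is FALSE, returns NULL invisibly, which is why nothing appears on the console:

```r
x <- 5
if (x > 4) x * 3          # 15
if (x > 4) { x * 3 }      # 15: braces make no difference for one statement

x <- 3
r <- if (x > 4) x * 3     # condition FALSE: nothing is computed
is.null(r)                # TRUE
```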
(Refer Slide Time: 11:06)
When you execute such statements directly on the R console, you first have to give the value of the variable. For example, I write x = 5 and then give the condition, and as soon as I give this condition, you can see that it gives the value 15.
If you take the same condition but now add curly brackets, you get the same outcome. But if you take x = 2, what happens? This gives no output, because the condition is FALSE. Even if you write the curly brackets, it does not give you anything. This is what I meant when I explained this example.
Now let us take one more example and see what happens. There is a command print, which is used to print something on the screen. We are going to discuss different such options in the forthcoming lectures, but here I would like to show you something based on the use of print. I want to print a message if the value is more than 3. Suppose I take x = 6 and then write this requirement in R syntax, starting with the condition.
The condition is x greater than 3, so we write if, and inside the parentheses x > 3. Inside the curly brackets I write the command to be executed, which is print. Whenever we want to print text, that is, a character string, we give it inside double quotes, so I write "the value is more than 3" within double quotes and put it inside the parentheses of print.
Now, if you execute it, since x is 6 and 6 is greater than 3, the condition is TRUE, so whatever is inside the curly brackets is executed and you get this outcome. So, if you take x = 6 it gives you this outcome, but if you take x = 2 it gives no outcome. Why? Is 2 greater than 3? No, so the condition is FALSE.
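The print example as a short R sketch, run once with x = 6 and once with x = 2:

```r
x <- 6
if (x > 3) {
  print("the value is more than 3")   # runs: 6 > 3 is TRUE
}

x <- 2
if (x > 3) {
  print("the value is more than 3")   # skipped: 2 > 3 is FALSE
}
```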
Let me show you what happens with x = 2. The condition 2 > 3 is FALSE, so whatever is written inside the curly brackets is not executed, and if you execute this, no outcome is obtained. Let us try this example on the R console and see what happens. Let me clear the screen; as you know, the shortcut is Ctrl + L.
Now, if I take x = 6 and give this statement, you can see that the outcome is: the value is more than 3. On the other hand, if I take the value 2 and repeat the same statement, you can see that there is no outcome. But if you change x to 10 and repeat the command, the outcome is once again: the value is more than 3. It is also possible to print what the value of x is.
We are going to learn these things in the forthcoming lectures: besides printing that the value is more than 3, you can also print the value of x that you are comparing. But we leave that for the future lectures.
(Refer Slide Time: 15:54)
Now you have understood how the if condition works. But you have seen that it has a limitation: the commands are executed only if the condition is TRUE, and if the condition is FALSE, nothing happens.
Suppose instead that your requirement is that if the condition is TRUE, something should be executed, and if the condition is FALSE, something else should be executed. For example, in the example I took in the beginning, if a student has got more than 50 percent marks, it should be printed that the student has passed, and if the student has got less than 50 percent marks, it should be printed that the student has failed.
These types of conditions can be executed using the if-else conditional execution. As the name suggests, if the condition is TRUE then something is executed, else, that is, if it is not TRUE, something else is executed.
If you look at this situation through the flow chart of the if-else execution, it goes like this. You start, then a condition is checked, and there are two possibilities: the condition is TRUE or it is FALSE. If the condition is TRUE, it works just like the if condition: the code you have written is executed, and after executing it the program stops.
But if the condition is FALSE, then instead of producing no output as in the earlier case, we now execute some more code, called the else code. When the condition is FALSE, the commands under else are executed, and after their execution the program stops. This is how the if-else condition works.
In R, the syntax for the if-else condition is as follows. You write if, in lower case, and then inside the parentheses the condition, which is going to be either TRUE or FALSE. If the condition is TRUE, whatever you want to execute has to be written inside the curly brackets just after it.
These commands go with the TRUE condition and will be executed. And if the condition is FALSE, what happens? After the first block you write else (e-l-s-e), all in lower case, and then, within curly brackets, the commands that you want to execute when the condition is FALSE. It is a very simple extension of the if condition: you add one more block of syntax for the else.
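The pass/fail rule from the beginning of the lecture, written as an if-else sketch. The value of marks is hypothetical, and in this sketch a mark of exactly 50 falls into the else branch. When typing this at the console, keep else on the same line as the closing curly bracket of the if block; otherwise R treats the if as already complete and reports an unexpected else.

```r
marks <- 45   # hypothetical percentage

if (marks > 50) {
  result <- "pass"
} else {
  result <- "fail"   # runs whenever marks > 50 is FALSE
}
result               # "fail"
```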
In this if-else condition you have to be a little watchful: the condition in the control statement should not be vector valued. Older versions of R used only the first element of such a vector and gave a warning; since R 4.2.0, a condition of length greater than one gives an error. So, the if-else condition should not be applied when the condition that we want to evaluate as TRUE or FALSE is a vector.
The if-else condition is best used with a single element. The condition you judge as TRUE or FALSE may be a simple expression or a complex expression using logical operators such as "and" and "or". Whatever logical operators we have studied can be used here to define the conditions properly.
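A sketch of compound conditions with hypothetical values. Here && ("and") and || ("or") are used because they are the forms that return a single TRUE or FALSE, which suits an if condition; & and | work element by element on vectors.

```r
x <- 7
if (x > 3 && x < 10) {
  msg <- "x lies between 3 and 10"
} else {
  msg <- "x does not lie between 3 and 10"
}
msg

y <- 15
if (y < 0 || y > 10) {
  status <- "out of range"
} else {
  status <- "in range"
}
status
```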
Now let me take one example and show you how things work; more importantly, as I always say, you have to understand what R is going to execute and how. Suppose I write a function that takes the value x minus 1 if x is equal to 3, and the value 2x otherwise. That is how we would write it in the simple language of mathematics.
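Written as a piecewise definition, the function described above is:

```latex
f(x) =
\begin{cases}
x - 1, & \text{if } x = 3,\\
2x,    & \text{otherwise.}
\end{cases}
```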
So, what are you going to do? You are saying that if the input is equal to 3, then x minus 1 has to be computed, and if the number is anything else, that is, x is not equal to 3, then the number has to be multiplied by 2.
How do you write this statement? First, look at the condition: if x is equal to 3, then replace x by x minus 1, and if x is not equal to 3, replace x by twice x. What is this x? It is the outcome obtained after giving the input value.
Next, look at this equality operator: is it mathematical? No, it is logical. That is why, when you write the condition, you have to use the logical operator consisting of two equal-to signs, ==, which means "exactly equal to" in logical operations. Now let us see how we execute it. Suppose the input is x = 5. I write if, and inside the parentheses x == 3, that is, exactly equal to, using the double equality operator.
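A tiny sketch of the difference: == only compares and returns TRUE or FALSE, while assignment changes the variable.

```r
x <- 5
x == 3    # FALSE: a comparison; x is not changed
x         # 5
x == 5    # TRUE
```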
In this case x should be replaced by x minus 1, so I write curly brackets and within them x = x - 1. If the condition is not TRUE, that is, x is not equal to 3, then I write else and within curly brackets x = 2 * x.
One thing I would like to draw your attention to, and my advice is: do not misunderstand
this statement. For example, when I write x = x - 1, please understand that this is not a
mathematical equation; it says replace x by x minus 1. Some people may read it
mathematically as x equals x minus 1 and try to cancel x on both sides, which is simply
meaningless.
Similarly, x = 2 * x means replace x by twice x. It does not mean that you write x = 2x,
cancel x on both sides and then "prove" that 1 is equal to 2, which is wrong. So, please
do not misunderstand these expressions.
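A minimal R sketch of the point above, assuming an input of 5 as in this lecture. It uses `<-` for assignment; the lecture writes `x = x - 1`, which does the same thing at the top level. The line with `==` only compares and leaves x unchanged.

```r
x <- 5
x == x - 1   # comparison: is 5 equal to 4? Prints FALSE; x is not changed
x <- x - 1   # replacement: compute 5 - 1 and store the result back in x
x            # 4
x <- 2 * x   # replacement: take the current value 4 and double it
x            # 8
```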
So, now, what will the output be? If I take x equal to 5, then x is not equal to 3, and so
x will be replaced by 2 into x. So, the value comes out to be 10, and you can see the 10
here on the slide. Let me execute this expression first on the R console and then I will
move further. Let me copy this condition, so that I can save some time.
Now I take x equal to 5. Remember, first you have to give the value of x as input, then
write the condition, and then ask for the value of x. You can see that you had given the
value of x as 5, but now the value of x has been transformed to 10. Similarly, if you take
x equal to 9, the condition is evaluated once again, and if you print the value of x, it is
18, that is, 2 into 9.
Now, if I take x equal to 3 and repeat the condition, what happens to the value of x? It
becomes x minus 1, that is, 3 minus 1, which is equal to 2. So, this is how you operate
with if-else conditions.
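A small sketch, assuming the if-else statement from this lecture is wrapped in a function so that it can be checked for several inputs at once. The name f is only illustrative; in the lecture the statement is run directly on x at the console.

```r
f <- function(x) {
  # Keep `else` on the same line as the closing brace: at the top level of
  # the console, an `else` that starts a new line is a syntax error.
  if (x == 3) {
    x <- x - 1   # x is exactly 3: replace x by x - 1
  } else {
    x <- 2 * x   # any other value: replace x by 2 * x
  }
  x              # the last value is returned
}
f(5)   # 10
f(9)   # 18
f(3)   # 2
```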
(Refer Slide Time: 26:38)
So, this is exactly what I just explained to you on the R console: you take x equal to 3,
and because x == 3 is now TRUE, the first branch is executed. x is replaced by 3 minus 1,
which is equal to 2, and you get the outcome 2.
So, now I take one more example. It is an extension of the earlier example, where we
wanted to print whether a value is more than 3. Earlier we printed "the value is more than
3", but if the value was smaller than 3, there was no outcome. So, now I extend it and
show you how to handle both cases. You give an input x, and x is checked to see whether it
is more than 3 or not.
This check gives the outcome TRUE or FALSE, and based on that there are two outcomes. If
the condition is TRUE, you print "The value is more than 3", and if the outcome is FALSE,
you print "The value is less than 3". So, now, how do we get it done?
I write the statement if, inside the parentheses x > 3, and then, within the first set of
curly brackets, print and, within its parentheses and double quotes, whatever I want to
display: "The value is more than 3". The condition x > 3 has two possible outcomes.
When it is TRUE, this first block is executed. For the FALSE outcome I have to write else,
followed by another pair of curly brackets containing what we want to do. Here I want to
print "The value is less than 3", so I again use print with the statement inside double
quotes.
So, now, if you consider x equal to 6, the condition is TRUE because 6 is greater than 3.
So, the statement under the TRUE part, "The value is more than 3", is printed.
Similarly, if you take one more value, x equal to 2, what will happen? Is 2 greater than 3?
This condition is FALSE. When the condition is FALSE, R executes whatever is written after
else, which here is to print "The value is less than 3". You can see the outcome here, and
this is the screenshot of both these operations. But let me show you these operations on
the R console to make you more confident.
So, let me copy this command, so that I can save some time.
Now I take x equal to 5, and if I execute this command, you can see it prints "The value
is more than 3". And if I take x equal to 2, it prints "The value is less than 3". So, you
can see this is not a very difficult command. This is the screenshot.
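The same command as a reusable sketch; the function name compare_with_3 is hypothetical. Note that x equal to 3 makes `x > 3` FALSE, so it also falls into the else branch and prints "The value is less than 3".

```r
compare_with_3 <- function(x) {
  if (x > 3) {
    print("The value is more than 3")   # TRUE branch
  } else {
    print("The value is less than 3")   # FALSE branch
  }
}
compare_with_3(5)   # [1] "The value is more than 3"
compare_with_3(2)   # [1] "The value is less than 3"
```

Because print() returns its argument invisibly, the function's value is the printed string itself.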
So, now, we come to the end of this lecture. I have explained to you two types of
conditional execution: one using if and the other using if-else. You can now clearly see
that they play different roles. The if statement is used when there is only one case to
handle: if the condition is TRUE, then something has to be executed.
But if there are two possible outcomes, and you want one thing done when the condition is
TRUE and something else done when it is FALSE, then you have to use if-else. And believe
me, these are very useful, because whenever you write programs, at least in statistics as
far as I know, you often have conditions of the form: this has to be evaluated only when
the condition is TRUE.
For example, many times you want to evaluate a mathematical expression only when its
value is greater than 0. Suppose you want to find the square root of some complicated
expression whose value can be positive or negative. When evaluating it, you would like to
find the square root only when the value is positive.
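A minimal sketch of the square-root idea, with a hypothetical helper safe_sqrt. It checks `value >= 0` (the square root of 0 is simply 0) and returns NA otherwise; calling sqrt() directly on a negative number would give NaN with a warning.

```r
safe_sqrt <- function(value) {
  if (value >= 0) {
    sqrt(value)   # defined for non-negative values
  } else {
    NA            # skip the calculation for a negative value
  }
}
safe_sqrt(16)   # 4
safe_sqrt(-4)   # NA, and no warning
```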
In such cases this type of conditional execution helps us: we check the condition, and
only if the value is greater than 0 do we find the square root. These are the ingredients
of a program. So, now, my request to all of you is that you take some examples. Try to
create the examples yourself, because when you create an example yourself, you also know
the answer.
Then check whether executing the same concept in the R software gives you the same
outcome. That will give you more confidence, and after that you will become a good
programmer. So, practise it, and I will see you in the next lecture with more details on
conditional execution.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 20
Conditional Executions - Nested If else If and If-Else
Hello friends, welcome to the course Foundations of R Software. You may recall that in the
last lecture we started our discussion on Conditional Execution and considered two
commands, if and if-else, and you understood under what conditions we use them. If you
recall how we proceeded: first we considered the condition if, where something is executed
only when the condition is TRUE.
Then we considered if-else, where we proceed in one of two ways: if the condition is TRUE,
do this, and if the condition is FALSE, do that. With two options you can control things
with TRUE or FALSE. But suppose you have more than two options, or more than two
conditions, and based on them you have to execute something. How are you going to do it?
One simple idea that comes to mind automatically is: can we extend the if-else condition?
You used if, then else; can you add more conditions, else if, else if, and so on? Yes. So,
we are going to talk about such repeated if-else conditions, and then I will tell you about
one more conditional execution command, ifelse, which also chooses an outcome depending on
whether a condition is TRUE or FALSE. It is similar to the if-else condition, but its
syntax is different.
(Refer Slide Time: 02:26)
So, we begin this lecture, and I will explain these two new concepts. As we initiated in
the last lecture, we are now discussing control structures, namely control statements,
functions and loops, and these control structures are needed when you do programming.
At the moment we are considering conditional execution, and just for your recollection, I
will give you a quick review of the commands.
In the last lecture, we learnt about the if condition, and the rule was: you write if and
then, within the parentheses, the condition. If the condition is TRUE, whatever you write
inside the curly brackets is executed; if it is FALSE, the code inside the brackets is
skipped and there is no outcome.
(Refer Slide Time: 03:17)
After that we considered the conditional execution in which, if the condition is TRUE,
some code has to be executed, and if the condition is FALSE, some other code has to be
evaluated.
The syntax was: you write if and then, within the parentheses, the condition; if the
condition is TRUE, whatever you write just after it inside the curly brackets is executed.
For the case when the condition is FALSE, you write else and then, inside curly brackets,
the commands to be evaluated when the condition comes out to be FALSE. This is how the
if-else condition works.
Now, I am going to work on another aspect, which is an extension of the if-else condition,
called nested if-else. Do you know what a nest is? Have you ever observed how a bird makes
a nest? It takes some sticks and puts one stick over another, and so on. In the same way,
here we use if else if a couple of times, and this statement allows us to execute a block
of code when there are more than two alternatives.
As I said, this is an extension of the if-else condition. So, how do you give this
condition? Suppose you have several conditions, say condition 1, condition 2, condition 3
and so on, and based on whether each condition is TRUE or FALSE, you have to execute
certain commands. If you understand the way I write this syntax, it will be very easy for
you to implement it. I use the same concept which I used in the case of if-else.
In the case of if-else, I write if and, in the parentheses, the condition, followed by
whatever has to be executed if this condition is TRUE. So, suppose my first condition is
condition 1. If it is TRUE, I write whatever is to be executed inside the curly brackets;
here I write "executes commands if condition 1 is TRUE". And if the condition is FALSE,
R moves further, and you write else, just as you did in the earlier case.
But earlier, after the else you just wrote the commands inside the curly brackets; now you
continue with another if-else. So, after this else I write if again, with the second
condition, say condition 2, inside the parentheses, and then whatever is to be executed
inside the curly brackets.
This is the statement "executes commands if condition 2 is TRUE". And if this is also not
TRUE, I simply repeat: I write else, then if with condition 3, and then, inside curly
brackets, whatever I want to execute, "executes commands if condition 3 is TRUE". This
continues, and finally you write else and, within curly brackets, whatever is to be
executed if none of the earlier conditions, condition 1, condition 2 and so on, was TRUE.
So, whatever you want to execute finally is written there: "executes commands if all
conditions are FALSE". If you look at the structure carefully, you are creating a
structure of if and then else, then once again a structure of if and then else, and so on.
Essentially you start with an if condition and close it at the end, and within it there
are conditions if else, if else, etc. That is why this looks just like a nest, and this
type of execution is called a nested if-else condition. I hope this is clear.
Now let me take an example to show you that it is not difficult to execute this condition
and get an outcome. Suppose I write a function f(x) which takes the value x minus 1 if x
is equal to 3, and the value x plus 5 if x is less than 3. Otherwise, that is, if x is
neither equal to 3 nor less than 3 but greater than 3, it takes the value twice x.
You have learnt this type of function and are quite comfortable with it. This is what I
said in the beginning: when more than two conditions have to be checked in terms of TRUE
and FALSE, we use this type of conditional execution.
So, I write this down. If you have understood and practised the if-else condition, writing
the syntax for such a function is not difficult at all. I write if and then x == 3, and my
aim is that if this condition is TRUE, x is replaced by x minus 1, which I write inside the
curly brackets.
Now, if this condition is FALSE, what happens? R checks the else if part, that is,
whether x is less than 3, and whatever I want to execute in that case I give inside curly
brackets: x = x + 5, which means replace the value of x by x plus 5.
Now there are two options for the condition x < 3. If it is TRUE, x = x + 5 is executed.
But if it is FALSE, that means the first condition and the second condition are both
FALSE, so whatever you want to do now, you write after else inside the curly brackets, that
is, x = 2 * x.
Now let me use a black pen to mark the correspondence between the function and the
program. The first piece of the function, x minus 1 if x is equal to 3, is executed here.
Similarly, x plus 5, which applies when x is smaller than 3, is executed here.
After this you have 2x, which is written as 2 * x and executed here. So, this is what we
wanted to do: if x is equal to 3, execute x = x - 1; if x is smaller than 3, replace x by
x + 5; and if x is greater than 3, execute x = 2 * x. Now I show you an example. I take
the value x equal to 5. Where does x equal to 5 fall? 5 is greater than 3, so only the
last case applies.
The control first comes to "5 is exactly equal to 3", which comes out to be FALSE; then it
comes to "5 is less than 3", which is also FALSE; and finally it comes to the else part,
which computes 2 into 5, and the answer comes out to be 10, as obtained here. So, this is
how this conditional execution works.
Now I take one more value in the same example, x equal to 2. What will happen? First the
condition "2 is equal to 3" is checked; this is FALSE, so the first block is not executed
and the control comes to the second part. Now it checks "2 is less than 3", and the answer
comes out to be TRUE.
So, whatever is written inside these curly brackets is executed, and you get the value
2 plus 5 equal to 7. After this the control does not go to the third condition, because as
soon as a condition becomes TRUE, its block is executed and the remaining conditions are
skipped. You can see that the outcome comes out to be 7, exactly as I explained.
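A short sketch showing that only the first TRUE branch is used. The conditions here are illustrative, not from the lecture: for x equal to 10 both x > 5 and x > 2 are TRUE, but R never reaches the second one.

```r
x <- 10
if (x > 5) {
  result <- "greater than 5"
} else if (x > 2) {
  result <- "greater than 2 but not greater than 5"   # skipped: x > 5 was already TRUE
} else {
  result <- "2 or less"
}
result   # "greater than 5"
```

So the order in which the conditions are written matters whenever they can overlap.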
Similarly, if you take one more value, x equal to 3, what will happen? The control starts
execution and comes to the first condition, "3 is equal to 3", and the answer comes out to
be TRUE.
As soon as it is TRUE, whatever is written in the first curly brackets is executed, that
is, 3 is replaced by 3 minus 1 equal to 2, and the remaining conditions are skipped. You
can see the outcome here. So, now let me show you this example on the R console, so that
you become more confident, and then I will move further.
I copy this command so that I can save some time. Now, if I write x equal to 9, execute
this condition and then type x, the value comes out to be 18.
Why? Because x is replaced by 2 into x, that is, 2 into 9, which is 18.
Similarly, if I take x equal to 2 and repeat it, the value of x comes out to be 7. Why?
Because x equal to 2 is smaller than 3, so the second condition is satisfied and x becomes
2 plus 5, which is 7.
Similarly, if you take x exactly equal to 3 and execute the same command, the value of x
comes out to be 2. Why? Because at x equal to 3 the first condition becomes TRUE and
x - 1 is executed, which gives 3 minus 1 equal to 2.
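The nested if-else from this example, wrapped in a function with the hypothetical name f_nested so that all the inputs used above can be checked in one call with sapply().

```r
f_nested <- function(x) {
  if (x == 3) {
    x <- x - 1       # x equal to 3
  } else if (x < 3) {
    x <- x + 5       # x less than 3
  } else {
    x <- 2 * x       # x greater than 3
  }
  x
}
# sapply() calls f_nested once per input, because `if` needs a single value
sapply(c(5, 2, 3, 9), f_nested)   # 10 7 2 18
```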
So, you can see it is not difficult at all to work with this nested if-else condition.
Now I will give you one more option for conditional execution. It is also called if else,
but there is a difference: earlier we wrote if and then else as separate keywords with a
blank space between them, but now there is no blank space. I write ifelse, i f e l s e,
without any blank space and all in lower-case letters.
This command works much like the if-else condition, but with a minor difference:
everything is specified inside one pair of parentheses, in the form ifelse(test, yes, no).
You give a condition as test, and this condition is tested.
If it is found to be TRUE, that is, the answer is yes, then whatever you give under yes is
used as the outcome. And if the condition results in FALSE, then whatever you write under
no is used as the outcome.
So, this is a very simple statement, just like the if-else statement that you used
earlier. The only difference is that here you write ifelse as one word and then, within
one pair of parentheses, you first write the condition and then the outcomes for TRUE and
for FALSE, separated by commas.
This command is useful when we want to evaluate conditions on a vector. For the earlier
conditions I always told you that they are useful when you deal with a scalar value; with
ifelse, you can handle a vector.
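A small sketch of the difference, using an illustrative vector c(2, 7). An `if` condition must be a single TRUE or FALSE: in R 4.2.0 and later, `if (x > 3)` with this x stops with the error "the condition has length > 1" (older versions only warned and used the first element). ifelse() instead checks every element.

```r
x <- c(2, 7)
x > 3                                           # FALSE TRUE: one result per element
ifelse(x > 3, "more than 3", "not more than 3")
# [1] "not more than 3" "more than 3"
```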
So, now the first point is clear: in what type of situation you use the if-else condition
and in what type you use this ifelse command. Just be careful, because I always say the
words "if else" for both, and I cannot pronounce ifelse differently; that would not sound
good.
So, this is the first advantage of this command. The way it works, as I said, is that each
component of the vector-valued logical expression given as test is checked: wherever it is
TRUE, the corresponding value under yes is used, and wherever it is FALSE, the
corresponding value under no is used.
Let me take an example to show you how this actually works; it will make things very
clear. Let me take the set of numbers x equal to 1 to 10, that is, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10. Now I want something like this: f(x) is equal to x square if x is 1, 2, up to 5,
and x plus 1 if x is 6, 7, 8, 9 or 10. This is the function that we want to evaluate.
So, I write this condition as ifelse, without any blank space, and within the parentheses
I first write the condition, which here is x < 6. What we want is that if x is less than 6,
the outcome is x square. I wrote the function for the values 1, 2, up to 10 because those
are the values I have taken, but the same rule works for any real values. Otherwise, the
outcome is x plus 1.
So, the condition is x < 6, and it has two outcomes: if it is TRUE, the outcome is
x square, and if it is FALSE, the outcome is x plus 1. As soon as you execute it, the
control first comes to the input value, x equal to 1, 2, up to 10.
It first takes the first value 1 and tests whether "1 is less than 6" is TRUE or FALSE.
The answer is TRUE, so this becomes 1 square and the outcome is 1. Similarly, for the next
value x equal to 2, it tests whether 2 is smaller than 6; the answer is TRUE, and this
gives the outcome 2 square, which is 4, and so it continues.
Then suppose it takes the value 7: is 7 less than 6? The answer is FALSE, so the outcome
is x plus 1, that is, 7 plus 1 equal to 8. Similarly, for the value 8, "8 less than 6" is
FALSE and the outcome becomes 8 plus 1 equal to 9, and this continues up to 10.
For 10, "10 less than 6" is FALSE, and the outcome becomes 10 plus 1, which is 11. After
this it stops, because there are no more values in the data vector x.
So, this is how it works. If x < 6 is TRUE, the outcome is x square, which is given under
yes. If x is greater than or equal to 6, the condition is FALSE and the outcome is x plus 1,
which is given under no. So, for x equal to 1, 2, 3, 4, 5 we get the values 1, 4, 9, 16, 25,
and for x equal to 6, 7, 8, 9, 10 we get the values 7, 8, 9, 10 and 11.
The same outcome is shown here as well, in this screenshot. Now I show you this on the R
console; let me just copy this command, so that I can save some time.
I come to R and define x equal to 1 to 10, so x is like this. If I execute this condition,
it gives this outcome. Even if you change the input, suppose I define x as 100 to 120,
then x is like this, and executing the command gives this outcome. So, you can see that it
is not difficult in R to execute such operations.
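The commands from this example, as a sketch: the test `x < 6` is evaluated for all elements together, and each element takes x^2 where the test is TRUE and x + 1 where it is FALSE.

```r
x <- 1:10
ifelse(x < 6, x^2, x + 1)
# [1]  1  4  9 16 25  7  8  9 10 11

x <- 100:120
ifelse(x < 6, x^2, x + 1)   # every test is FALSE, so this is x + 1: 101 up to 121
```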
Now let me give you one more example, and after that I will explain the utility of this
ifelse condition. Suppose your objective is to write a program that finds whether an input
number is even or odd.
The first thing you need to understand is what the logic can be. The logic for deciding
whether a number is even or odd is that you divide the number by 2: if the remainder comes
out to be 1, the number is odd, and if the remainder is 0, the number is even. Now recall
how you can do this in the R software: you have done Modulo Division.
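A quick reminder of the modulo operator `%%`, applied to the numbers used in the next example: dividing by 2 leaves remainder 1 for odd numbers and 0 for even numbers.

```r
7 %% 2              # 1
8 %% 2              # 0
c(7, 9, 8, 4) %% 2  # 1 1 0 0, computed element by element
```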
Modulo division is done with two percent signs, %%, and it finds the remainder after
dividing one number by another. Now, you want to write this program, and if you use this
ifelse condition, the job becomes very simple. I write ifelse and then x %% 2 == 0.
That means you divide the number by 2, and if the remainder comes out to be 0, the
condition is TRUE, and under this TRUE condition you want to write that the number is an
even number.
If the condition is FALSE, then obviously the number is not even, so it has to be odd
(here I am talking only about integers), and in that case you print odd number. So, I take
the data vector x equal to 7, 9, 8, 4. This data vector is operated on inside the ifelse
condition. x takes the value 7; it is divided by 2 and the remainder comes out to be 1, so
"1 is equal to 0" is FALSE.
Since the condition is FALSE, it prints odd number. Then x takes the next value 9; 9 is
divided by 2, the remainder comes out to be 1, and "1 is equal to 0" is FALSE, so it prints
odd number. The next value is x equal to 8; 8 is divided by 2, the remainder comes out to
be 0, and "0 is equal to 0" is TRUE, so it prints even number.
Finally, x takes the value 4; 4 divided by 2 gives remainder 0, "0 is exactly equal to 0"
is TRUE, and so it prints even number. You can see that this is exactly what happens here:
odd number, odd number, even number, even number. Because you wanted to print characters,
you have given them inside double quotes.
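The even/odd check as a runnable sketch. It assumes the input contains whole numbers only: a value such as 2.5 leaves remainder 0.5 and would be labelled "odd number". Negative whole numbers still work, because in R the result of `%%` takes the sign of the divisor, so -3 %% 2 is 1.

```r
x <- c(7, 9, 8, 4)
ifelse(x %% 2 == 0, "even number", "odd number")
# [1] "odd number"  "odd number"  "even number" "even number"

ifelse(c(1, 2, 3, 4) %% 2 == 0, "even number", "odd number")
# [1] "odd number"  "even number" "odd number"  "even number"
```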
So, that is what happens here, and this is the screenshot of the same operation. Now I will
show you this on the R software also, so that you become more confident.
I copy this condition and take some numbers; suppose I choose x equal to 1, 2, 3 and 4.
You know that 1 and 3 are odd numbers and 2 and 4 are even numbers.
If you write this condition, you can see it is executed and you get this outcome.
Similarly, you can take any other numbers; it is not difficult to identify them. So, now
we come to the end of this lecture. You can see that this was a pretty simple lecture, and
you have understood the application of two more conditional executions.
Now, it will be your decision which control statement, that is, which conditional
execution statement, you want to use in a given problem: if, if-else, or something else.
You basically have to think about which condition will make the programming easier,
because the program should not be unnecessarily long.
These commands are there to help you write an efficient program. So, once again I request
you: now it is your turn. Think about some problem and first judge which of these four
syntaxes (if, if-else, nested if-else and ifelse) is going to be more useful for you, which
command is going to help you more.
Then write down the program and see whether the outcome you expected and the outcome R
gives match. If yes, be happy; if not, look for where the problem is. One thing I can
assure you is that these commands are very helpful when you do statistical calculations,
mathematical calculations and simulations. These are very basic fundamentals which you
need to learn for writing good programs in the future. So, try to understand it, practise
it, and I will see you in the next lecture; till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 21
Functions for Conditional Executions - switch and which commands
Hello, friends. Welcome to the course Foundations of R Software. You may recall that in the
last two lectures we talked about conditional execution, and we covered four different
commands which are used for conditional execution under different conditions.
Continuing on the same broad topic, today I am going to talk about two more commands,
switch and which, which are also used in conditional execution and give different types of
outcomes. These types of outcomes are often helpful in programming. So, we begin this
lecture and try to understand the roles of the switch and which commands.
As we discussed earlier, we are trying to understand the broader topic of control structures in the R software, and we are dealing with control statements, functions and loops one by one.
Today we first consider the command switch; the spelling is s w i t c h, and the R command is written in all lowercase letters. Do you know what the word switch means? Sometimes you say, ok, switch from this classroom to another classroom, and what do you do? You simply move from one classroom to another.
On the other hand, suppose there are five classes, you are sitting somewhere at present, and the teacher asks you more specifically to go to class number 3. What will you do? You will simply go to class number 3. The switch command works just like this. Do not get confused with the meaning of switch as an electrical switch.
An electrical switch only turns something on or off, whereas switch is also an English word used in the sense of moving from one place to another. People switch between offices, countries, cities and so on. That is the meaning you have to keep in mind while understanding this control statement.
Actually, switch is a substitute for long if statements that compare a variable to several values. For example, suppose you have a sequence of options numbered 1 to 100 and you want to pick the option numbered 99; instead of writing a long chain of if statements, you can obtain it directly from a function such as switch.
So, switch is a multi-way branch statement: it tests a variable for equality against a list of values. You have to give the different options. As I said, suppose there are five classrooms; the list of classrooms is class number 1, class number 2, up to class number 5, and you are asking for one of them out of that list. So, switch maps the expression onto, and searches over, a list of values.
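To make the comparison with long if statements concrete, here is a small sketch, assuming a hypothetical helper that maps a fruit name to a colour (the names fruit_colour_if and fruit_colour_switch and the colours are illustrative, not from the lecture). The same decision is written once as an if / else if chain and once with switch.

```r
# Hypothetical example: map a fruit name to a colour.

# Version 1: a chain of if / else if statements
fruit_colour_if <- function(fruit) {
  if (fruit == "apple") {
    "red"
  } else if (fruit == "banana") {
    "yellow"
  } else if (fruit == "orange") {
    "orange"
  }
  # no final else: an unmatched fruit gives NULL
}

# Version 2: the same multi-way branch written with switch()
fruit_colour_switch <- function(fruit) {
  switch(fruit,
         apple  = "red",
         banana = "yellow",
         orange = "orange")
}

fruit_colour_if("banana")      # "yellow"
fruit_colour_switch("banana")  # "yellow"
```

Both versions also agree on an unknown fruit such as "kiwi": each returns NULL.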
If there is more than one match for a given value, switch returns only the first matched value. Suppose there are two rooms numbered 3, and you ask someone to go to room number 3. The person starts searching: goes to room number 1, then room number 2, then reaches the first room numbered 3 and stops there. The person never goes on to the second room numbered 3. R works exactly the same way with switch.
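A minimal sketch of the first-match rule, using made-up room labels: when two alternatives carry the same name, switch uses the first one and never reaches the second.

```r
# Two alternatives share the name "room3"; switch() stops at the first match
switch("room3",
       room1 = "ground floor",
       room3 = "first floor",    # first match: this value is returned
       room3 = "second floor")   # never reached
# [1] "first floor"
```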
So, switch evaluates an expression and, according to its value, chooses one of the further arguments given inside the parentheses. For example, there is some expression, which can be a number or a character string, that you want to evaluate, and there are several options against which you want to match this expression.
So, you write the command switch, s w i t c h, all in lowercase letters, and inside the parentheses you first write the expression, which can be a number or a character string, and after that you write the list of alternatives.
For example, you can write switch(expr, case 1, case 2, ...). What will it do? The switch will evaluate the expression inside the parentheses and then pick the value associated with this expression from the alternatives listed after it.
The flowchart of the switch command looks like this. When the switch command starts operating, it comes to Case 1 and compares the expression with it. If this comparison is True, the corresponding statement is executed and the switch stops (break). If the comparison is False, control moves to the next case and compares again.
If that comparison is True, that is, there is a match, the corresponding statement is executed and the switch stops; if it is False, the process continues until a match is found. If there are, say, n cases and the switch finds only False outcomes at every one of them, then finally the default statement is executed. This is the way switch actually works.
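In R, the default branch of this flowchart is written as an unnamed alternative, which is used when no name matches a character expression. The sketch below uses a hypothetical day_type() helper; it also shows a related R feature, where an alternative left empty (Saturday = ,) falls through to the next non-empty one.

```r
# Hypothetical helper: classify a day name
day_type <- function(day) {
  switch(day,
         Saturday = ,          # empty alternative: falls through to the next one
         Sunday   = "weekend",
         "weekday")            # unnamed alternative: the default
}

day_type("Saturday")  # "weekend"
day_type("Sunday")    # "weekend"
day_type("Monday")    # "weekday" (no match, so the default is used)
```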
It is like asking someone to go to room number 3 out of five rooms. The person starts at, say, room number 1 and checks whether it matches room number 3; the answer is no. So the person moves forward to room number 2 and checks again: the person has to go to room number 3 and this is room number 2, so the answer is False.
Then the person comes to the third room and matches it with room number 3. Now the answer is True, and whatever has to be done after the switch statement is done.
For example, if the person has to sit in room number 3, the person sits there; if the person has to take food in room number 3, the person takes the food, and so on. So, this is a pretty simple operation, but a very useful one.
Keep in mind that in switch, a character string expression is always matched exactly against the names of the listed cases. When the expression is not a character string, it is coerced to an integer, and that integer selects the case at that position. And if there are multiple matches, as I said, only the first matching element is used.
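A quick check of the two matching rules, with illustrative values: a numeric expression is coerced to an integer (2.9 becomes 2, by truncation), and a character expression must match a name exactly, including its case.

```r
# Numeric EXPR: coerced to integer, so 2.9 selects the 2nd alternative
switch(2.9, "low", "medium", "high")            # "medium"

# Character EXPR: matched exactly against the names
switch("medium", low = 1, medium = 2)           # 2
is.null(switch("Medium", low = 1, medium = 2))  # TRUE: "Medium" is not "medium"
```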
Now let us take some examples to understand the functionality of the switch statement. In the first example, I use switch with an integer. Suppose I have a list of three values, say apple, banana and orange; because these are character strings, I write them inside double quotes.
Now you have two options: you may want to know what is at a given position, for example the second place, or, as I will consider after this, what value is stored under a given name. First, let us understand the positions of these three values in the list: apple is at position number 1, banana is at position number 2 and orange is at position number 3.
Now, in the switch command, you set the expression expr equal to 2. So, you write 2, asking what is at the second location. Starting from the first location, R tries to match: this is False because this is position number 1. So it moves forward and tries to match the next position.
This position is number 2, and it matches with 2. So it stops here, the condition is True, and it gives you the value written at that position, which is banana. So, you can see that you used the integer 2, and it was matched to the value present at the second position.
Similarly, if you repeat this experiment and write 1, then obviously whatever is written at position number 1, which is apple, comes out as the outcome.
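The integer examples can be typed directly at the R console; the fruit names are the ones used in the lecture.

```r
# The integer selects the alternative at that position
switch(2, "apple", "banana", "orange")   # "banana"
switch(1, "apple", "banana", "orange")   # "apple"
switch(3, "apple", "banana", "orange")   # "orange"
```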
Before I go to the R console, let me take one more example, in which I use a string in place of an integer. You have seen that I used 2 and 1 as integers; now I would like to use a string. The switch function can be used with a string as well, and in this case the value assigned to the matching name is returned.
For example, I create a list like this. The first element in my list is what we earlier called case 1; if you remember, I gave you the idea of case 1, case 2 and so on. In case number 1, you write colour and blue, that is, colour is equal to blue, and both are character strings, so they are written inside double quotes.
Similarly, the second value, case 2, is gender equal to male, and the third case is volume equal to 50. The name volume is a string, so it is given inside double quotes, while 50 is a number, so I write it without quotes. Now you want to know the value of colour. So, you write your expression expr as c o l o u r, colour, within double quotes, and now this expression starts working.
It comes to position number 1 and tries to match this colour with the name colour; the answer is True. So it finds the value assigned to colour, which is blue, and blue is printed. This is how switch works: you give a name, and it gives you the value assigned to that name.
As another example, let me repeat the same thing, but now give the expression as volume. What will happen now? It comes to the first case and finds that volume does not match colour, so the result is False. Then it comes to the next case and tries to match volume with gender; once again the result is False.
So it moves further and comes to case number 3. Now volume matches volume, and the value assigned to volume, which is 50, is reported. You can see that this is a very simple operation, but it is very useful when you are programming and need this kind of control structure.
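The string examples can be sketched the same way. Wrapping the three named cases in a small function (lookup() is a hypothetical name, not from the lecture) avoids retyping them for each query.

```r
# Hypothetical wrapper around the lecture's three named cases
lookup <- function(key) {
  switch(key,
         colour = "blue",
         gender = "male",
         volume = 50)      # 50 is a number, so no quotes
}

lookup("colour")   # "blue"
lookup("gender")   # "male"
lookup("volume")   # 50
```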
Now, in both these examples, what happens if you give a value, either an integer or a string, which is not available in the list? When the expression is a character string and there is no match, an unnamed alternative, if there is one, is returned as the default. Otherwise, for example with an integer that has no corresponding position, switch returns NULL invisibly, so nothing is printed and there is no visible outcome.
For example, in the same example where I took apple, banana and orange, I write 4. Apple is at position number 1, banana at 2 and orange at 3, but there is nothing at position 4. So, there will not be any outcome.
Similarly, take the earlier example where colour is case 1, gender is case 2 and volume is case 3, and now give the value size. Size is compared with the first case; there is no match, so the result is False. Then it comes to the second case, gender; there is no match. Then it comes to the third case, volume; there is no match, and after that there is nothing left. So, there is no outcome.
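Because switch returns NULL invisibly when nothing matches, the console shows nothing. Saving the result and testing it with is.null() makes the no-match case visible; the values are the ones from the lecture.

```r
out_num <- switch(4, "apple", "banana", "orange")    # no 4th alternative
out_chr <- switch("size",
                  colour = "blue", gender = "male", volume = 50)  # no name "size"

is.null(out_num)   # TRUE
is.null(out_chr)   # TRUE
```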
Let us first execute these examples on the R console so that you gain more confidence.
I copy and paste this command. You can see that when you use switch with 2, whatever is written at position number 2 comes out: banana. If you replace 2 by 3, orange comes out, and if you replace 3 by 1, apple comes out.
Now I make it 4 instead of 1. There is no element at the 4th position, so it does not give you anything. Likewise, if you use 5 instead of 4, there is no outcome. That is what I was trying to explain. Similarly, let us see what happens if you use a string.
Here I have written colour equal to blue, gender equal to male, and so on. You can see that colour is matched and you get blue. Now let me deliberately make one mistake to show you something. You may have seen that colour is sometimes spelled c o l o r, so I use that spelling.
What do you expect to happen? It does not give you any outcome, because the spelling c o l o r does not match the spelling c o l o u r used in the list. So, you have to be very careful about this. Similarly, if you try gender, the outcome is male, because the value given to gender is male.
Similarly, if you take one more value which is not available, say size, you can see that there is no outcome. This is what I meant when I was explaining. So, you can see that these are very simple outcomes.
Next, I give you some details about another simple operation which is very useful in conditional execution. This is which, w h i c h, and you know the meaning of the word which. The which command is used in R with exactly that meaning: when you apply it to a logical vector, it gives you the positions (indices) of the elements that are TRUE.
I will explain with some examples, but before that let us understand how it is written. You write which, w h i c h, all in lowercase letters, and give the data inside the parentheses. The data is a logical vector with values TRUE and FALSE, and based on these values which gives you the positions of the TRUE elements in that vector.
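A minimal illustration on a made-up logical vector: which() returns the positions of the TRUE entries, and it treats NA as FALSE.

```r
flags <- c(FALSE, TRUE, FALSE, TRUE, TRUE)   # made-up logical vector
which(flags)               # 2 4 5

# NA entries are skipped, as if they were FALSE
which(c(TRUE, NA, TRUE))   # 1 3
```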
The full form is which(x, arr.ind = FALSE, useNames = TRUE), where x is the logical vector and the N in useNames is a capital letter. The option arr.ind, written a r r dot i n d, is a logical value: if x is an array, such as a matrix, setting it to TRUE returns the array indices (row and column positions) instead of single positions. The option useNames is also logical, and it controls whether the returned array indices carry dimension names.
So, why not take an example and see how the which command is useful for us? Let me take a data vector x with the values 10, 15, 8, 14, 6 and 12.
Now I want to know at which position the value 14 appears in this data vector. You can see that 10 is at location number 1, 15 at position number 2, 8 at location number 3, 14 at location number 4, 6 at location number 5 and 12 at location number 6.
To find the position of the number 14, you write which, and inside the parentheses x, the logical equality operator, which is two equal signs, and 14, that is, which(x == 14). This gives the answer 4. What is this 4? It is the location of 14 in x.
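The same query in R, using the lecture's data vector. Indexing x with the result gives back the matching value.

```r
x <- c(10, 15, 8, 14, 6, 12)
which(x == 14)      # 4: the position of 14
x[which(x == 14)]   # 14: the value stored at that position
```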
Similarly, you can use other comparisons without any problem. Suppose you want to know which numbers in the data vector x are not equal to 12. You write which, and inside the parentheses x not equal to 12, that is, which(x != 12). The positions of the numbers which are not equal to 12 are reported.
The number 12 is at location number 6, and all the other numbers, 10, 15, 8, 14 and 6, are at locations 1, 2, 3, 4 and 5, which is what is reported. So, the which command provides the locations of those numbers which are not equal to 12. Similarly, if you want to know which elements of x have a value greater than 10, you write which, and inside the parentheses x greater than 10, that is, which(x > 10).
What happens? The operation goes inside the parentheses and checks which values are greater than 10. Follow my blue pen: control comes to position number 1, where the value is 10.
You want values greater than 10, so this position is not reported. Then it comes to the second position, which has the value 15; 15 is greater than 10, so the answer is TRUE. Let me write this down: for x equal to 10, it compares 10 greater than 10; the answer is FALSE, so it is not reported.
Then it comes to the next value, x equal to 15, and checks 15 greater than 10; the answer is TRUE, so it reports the position of 15, which is 2. Similarly, it goes on to x equal to 8 and x equal to 14. At 14 the answer is TRUE, and 14 is at position number 4.
Similarly, it checks all the numbers, and you can see that there are three numbers greater than 10: 15, 14 and 12. So, it gives you their locations: 2, 4 and 6. Before I move further, let me show you this on the R console as well, so that you can be more confident.
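The step-by-step comparison can be seen directly in R: the condition first produces a logical vector, and which() then reports the positions of its TRUE entries. The data vector is the lecture's.

```r
x <- c(10, 15, 8, 14, 6, 12)

x > 10              # FALSE TRUE FALSE TRUE FALSE TRUE
which(x > 10)       # 2 4 6
x[which(x > 10)]    # 15 14 12: the values at those positions

which(x != 12)      # 1 2 3 4 5
```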
Let me take x equal to some values, say 5, 7, 10, 20 and 13. Now suppose I ask which elements of x are equal to 14. Is there any value equal to 14? No. So the output is integer(0), meaning there is no such position. But if you use 10 instead, you can see that it checks the first, second and third elements and reports that the value 10 is at location number 3.
Similarly, if you want to find which values are greater than 10, it gives you 4 and 5, which correspond to the fourth and fifth positions of the data vector x, holding the values 20 and 13 respectively.
Similarly, if you want to find all the other values in x which are not equal to 10, you can see that 10 occurs at the third position, and all the other values 5, 7, 20 and 13 are not equal to 10; they occur at positions 1, 2, 4 and 5. So, you can see that this is not a very difficult thing.
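A short sketch of this console session with the lecture's second vector. When nothing matches, which() returns integer(0), an empty integer vector, so checking its length is a convenient test for "no match".

```r
x <- c(5, 7, 10, 20, 13)

idx <- which(x == 14)   # integer(0): no element equals 14
length(idx) == 0        # TRUE

which(x == 10)          # 3
which(x > 10)           # 4 5
which(x != 10)          # 1 2 4 5
```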
This was an operation on a data vector; now I apply such an operation to an array. I consider a matrix of order 3 by 3 with the data 1 to 9; you already know how to create a matrix, and this matrix x looks like this. Suppose I want to know where the minimum value is located in this array. For that there is the command which.min, written w h i c h dot m i n, and inside the parentheses you write the name of the array.
Among the values 1, 2, 3, 4, 5, 6, 7, 8, 9, the value 1 is the minimum, and which.min returns its position, which is 1. Note that which.min returns the position (index) of the first minimum, not the minimum value itself; here the two coincide because x holds the numbers 1 to 9 in order. Similarly, if you want to know where the maximum value is, the command is which.max, written w h i c h dot m a x, and inside the parentheses you write the array x.
This finds the position of the maximum value. Out of 1, 2, 3, 4, 5, 6, 7, 8, 9, the maximum value is 9, and it sits at position 9, which is reported.
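Because this matrix holds 1 to 9 in order, positions and values coincide, which can hide the difference. A small made-up vector makes it clear that which.min() and which.max() return positions, and that which.min() reports only the first minimum when there are ties.

```r
y <- c(40, 12, 75, 12, 90)   # made-up values; 12 appears twice
which.min(y)      # 2: position of the first minimum
y[which.min(y)]   # 12: the minimum value itself
which.max(y)      # 5: position of the maximum

m <- matrix(1:9, nrow = 3)
which.min(m)      # 1
which.max(m)      # 9
```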
Now let me give you some more. Suppose I want to know which elements of this matrix are odd, that is, when they are divided by 2 using modulo division, which numbers give remainder 1? I take the same matrix x and write which, and inside the parentheses x modulo 2 logically equal to 1, that is, which(x %% 2 == 1).
The matrix contains the numbers 1, 2, 3, up to 9, and all the odd numbers give remainder 1 when divided by 2, so their positions are reported. The numbers 1, 3, 5, 7 and 9 give remainder 1, and you can see that the output is 1, 3, 5, 7, 9. Strictly speaking, these are the positions of the odd elements, counted column by column; they coincide with the odd values themselves because x holds 1 to 9 in order.
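The odd-number query in R. The second matrix uses made-up values so that the positions returned by which() differ from the values, showing that which() reports positions counted column by column.

```r
x <- matrix(1:9, nrow = 3)
which(x %% 2 == 1)      # 1 3 5 7 9 (positions; equal to the values here)

z <- matrix(c(2, 9, 4, 7, 6, 1), nrow = 2)   # made-up values
which(z %% 2 == 1)      # 2 4 6: positions of 9, 7 and 1
z[which(z %% 2 == 1)]   # 9 7 1: the odd values themselves
```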
Now suppose you want to know the locations of these values, that is, the places (row and column) in x where the odd values occur. I use the same command, which with x modulo 2 logically equal to 1, with the option arr.ind equal to TRUE.
If you remember, we talked about this option at the beginning, and now if you run it you get this type of outcome. Please look at what it shows. The first odd element is the value 1, and it occurs at row number 1 and column number 1, as you can see.
Similarly, take one more example: the third odd value is 5. Where does 5 occur? At row number 2 and column number 2; this is the location of 5, and so on. So, by using the option arr.ind equal to TRUE, which also gives you the locations in an array where such elements occur.
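With arr.ind = TRUE the result is a two-column matrix of row and column indices, and that matrix can itself be used to pull the elements out of x.

```r
x <- matrix(1:9, nrow = 3)
pos <- which(x %% 2 == 1, arr.ind = TRUE)
pos
#      row col
# [1,]   1   1
# [2,]   3   1
# [3,]   2   2
# [4,]   1   3
# [5,]   3   3

x[pos]   # 1 3 5 7 9: indexing with a row/col matrix selects those elements
```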
You can now see that these things are going to be extremely useful when you are doing different types of data manipulation.
The slide shows a screenshot of the same operation. Let me also show you these operations on the R console so that you gain more confidence.
This is your matrix x, and now I run the command which.min to find the position of the minimum value. You can see that it comes out to be 1, and similarly, if you want the position of the maximum value, it comes out to be 9.
After this, you want to know which values are odd. These are the odd values, and if you want the locations of these values in the array x, which here is a matrix, you have to use the command with arr.ind equal to TRUE.
Similarly, if you repeat this command for even numbers, with the same x, you want to know the places where the even numbers occur. You can see that the first even number occurs at row number 2 and column number 1, which is the value 2. Similarly, 6 occurs at row number 3 and column number 2, and so on.
If you just want the positions without the row and column form, you can remove arr.ind and it gives you 2, 4, 6, 8. These are the positions of the even elements, which here are also the even values themselves.
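The even-number version, with and without arr.ind. The last line shows useNames = FALSE, which keeps the same indices but drops the row and col column labels.

```r
x <- matrix(1:9, nrow = 3)

which(x %% 2 == 0, arr.ind = TRUE)
#      row col
# [1,]   2   1
# [2,]   1   2
# [3,]   3   2
# [4,]   2   3

which(x %% 2 == 0)   # 2 4 6 8

# Same indices, but without the "row" / "col" column names
which(x %% 2 == 0, arr.ind = TRUE, useNames = FALSE)
```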
So now I stop this lecture. You can see that this was a very simple lecture, and my objective was to give you some functions which are used for controlling programs. I have discussed only two commands here, switch and which, but you will see that R has a long list of such functions, and it is really not possible for me to cover all of them. So, I leave it to you, but it is very important for you to understand what these functions are doing. Take some examples, practice them, and I will see you in the next lecture; till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 22
Loops - for loop
Hello friends, welcome to the course Foundations of R Software. You can recall that in the last couple of lectures we discussed some topics about control structures, which are essentially used to control the flow or the structure of a program. Continuing on the same lines, in this lecture we are going to begin a new topic of control structures, namely loops.
The first question is: why are loops needed? Loops are needed whenever you want to repeat the same command more than once. Let me take a simple example. Suppose in a class the mark sheets and grade sheets are to be prepared for all the students, there are 100 students, and there are only two classifications: if a student has secured more than 50 percent marks, the student is graded as pass; otherwise, fail.
That is the rule, and you can write that command very easily, but now this command has to be repeated for all 100 students. The first option is to take one student at a time, execute the program, record the outcome and create the grade sheet.
The second option is to supply the list of all the students, have the program executed repeatedly for each of them, and finally have the grade sheets of all the students prepared automatically. In these types of situations, we use the concept of loops.
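A minimal sketch of the second option, assuming made-up marks for five students (the lecture's class has 100; the vector marks and the labels "pass"/"fail" are illustrative). The loop applies the same pass/fail rule to every student.

```r
# Made-up percentage marks; the lecture's example would have 100 values
marks <- c(72, 45, 50, 88, 31)
grade <- character(length(marks))    # one empty slot per student

for (i in seq_along(marks)) {
  # the rule from the lecture: more than 50 percent is a pass
  if (marks[i] > 50) {
    grade[i] <- "pass"
  } else {
    grade[i] <- "fail"
  }
}

grade   # "pass" "fail" "fail" "pass" "fail"
```

Note that exactly 50 percent gives "fail", because the rule requires more than 50 percent.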
There are different types of loops. In this lecture we are going to talk about the for loop, and in the next lecture about the while and repeat loops. Let us begin and consider some examples to understand how to use the for loop in the R software.
You know that we started a discussion on control structures in R, and we have already covered control statements and conditional executions like if, if else, if else if, etc. Then we considered some functions, namely switch and which, and now we come to the concept of loops.
Why are loops needed? Loops are needed to repeat commands: if you want to execute a particular command or a set of commands more than once, you use a loop. There are three popular loops: the for loop, the while loop and the repeat loop.
In this lecture we talk about the for loop. The three loops are used in different types of situations, and depending on the need and requirement you can choose whichever loop you want. I will explain the conditions under which each of them is used.
First, let us understand the for loop. The for loop is used when the number of repetitions is known in advance, that is, you know how many times you want to repeat the loop. For example, consider the case where we know that there are 100 students and we want to prepare their grade sheets. Whatever my program is, its commands have to be repeated 100 times.
In such a case we use the for loop, and its flowchart looks like this: we start and enter the loop. Whenever you start a loop, you first have to initialize it, that is, set the counter at 0 or at whatever point from which you want to measure.
For example, suppose I am driving my car and I want to know how many kilometres I drive today. What will I do? At the start, there is a counter, and I set its reading to 0 0 0 0; after I have driven the car for the whole day, I can see how many kilometres I have driven. This process of setting the counter to the initial value 0 0 0 0 is called initialization. You know how such a counter works, and in loops also you have to define a counter, which controls the number of times the loop is repeated.
Now we enter the loop, and there is a condition which we have to check. Based on whether this condition is true or false, we decide whether to execute the instructions. If the condition is true, the instructions, that is, the commands, are executed, and after that the counter is updated so that the R software knows it has to move further.
This information is supplied back to the condition, and if the condition is true again, the instructions are executed again, and this process continues. As soon as the condition becomes false, the loop stops.
So, this is how the for loop works. The program starts and has to be repeated for, say, 100 students. The program checks the condition; if the condition is true, the grade of the student is reported, and then, by updating the counter, control moves to the next student.
The next student's record is taken as input, the pass or fail condition is checked, the instructions are executed based on it, and the counter is updated, until we reach the last student, that is, the 100th student. As soon as the counter reaches 101, the condition becomes false and the process stops. This is the way the for loop works.
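A sketch of the counter idea with 100 hypothetical students (the marks are made up). In R, the for statement manages the loop counter itself, so there is no explicit "stop at 101" test in the code; the pass counter, by contrast, is initialized and updated by hand.

```r
n_students <- 100
marks <- rep(c(35, 65), length.out = n_students)   # made-up marks

passed <- 0                       # initialization of the pass counter
for (i in 1:n_students) {         # i takes the values 1, 2, ..., 100 in turn
  if (marks[i] > 50) {            # condition checked for each student
    passed <- passed + 1          # update the pass counter
  }
}

passed   # 50
i        # 100: after the loop, i keeps the last value it took
```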
So, now the question is: how do we work with the for loop in the R software? The syntax is that first you write f o r, for, and then, inside the parentheses, you give a name, the keyword in, and a vector. You have to be careful here: the name takes its values one by one from the values contained in this vector.
This name acts as the counter variable, which runs through the values in this vector. After that, whatever commands we want to execute are given just after it inside the curly brackets. Then the loop starts, and these commands are executed once for each of the values in the vector, with the counter variable taking each value in turn.
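The general form just described can be sketched in R as below. Only the commented template reflects the slide; the loop variable `v`, the result vector `visited` and the three letters are illustrative assumptions, added so that the behaviour can be checked.

```r
# General form:
# for (name in vector) {
#   commands that use name
# }

# A concrete instance: v takes each element of the vector in turn
visited <- character(0)
for (v in c("a", "b", "c")) {
  visited <- c(visited, v)   # record the current value of v
}
print(visited)
```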
So, this is how the for loop works. Do not worry: as soon as I take some examples, this concept will become very clear.
(Refer Slide Time: 09:03)
So, why not take an example and try to understand what it does? Suppose I want to print the squares of the numbers 1 to 5, that is, of 1, 2, 3, 4 and 5.
So, in order to do it, I use the concept of the loop and write down my for loop as follows. Here is the counter variable. Counter variable means that it controls the number of times the commands are going to be repeated. So, I define here a counter variable i, which informs the for loop how many times, or for what values, the loop has to be repeated.
After specifying the counter variable, I write in, and then the values for which I want to repeat the commands. Because I want to repeat it for the values 1, 2, 3, 4, 5, I have written 1:5 (1 colon 5); otherwise you can give any values here, also in the form of a data vector. Then I give the instructions inside the curly brackets: print i square. What is this i? It comes from the counter variable, right.
So, let us see what will happen. As soon as you execute it in the R software, the control comes to i and chooses the first value. i is going to take the 5 values 1, 2, 3, 4, 5. So, first the value i equal to 1 is considered, then this value of i is carried inside the curly brackets and all the instructions there are executed.
Here the instruction is print 1 square, so this prints 1. After this, the next value in the counter, i equal to 2, is considered, and the command inside the curly brackets is executed as print 2 square, which gives the result 4. Similarly, the next value, i equal to 3, is considered, the control comes inside the curly brackets, and the instruction to print 3 square is executed, giving the value 9.
Similarly, it comes to i equal to 4 and prints 4 square, which is 16, and then to i equal to 5 and prints 5 square, which is 25. After this, there are no more values in this vector, so there is no sixth value for i to take.
So, the program stops here. This is how it works, and you can see here the outcome which you get when you execute it on the R console. Before going further, let me show you how this command works inside the R console.
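A minimal sketch of this example, assuming the slide's command is `for (i in 1:5) { print(i^2) }`. The vector `squares` is an addition (not on the slide) that only keeps the printed values so they can be inspected afterwards.

```r
squares <- numeric(0)   # extra: collects the values that are printed
for (i in 1:5) {        # i takes the values 1, 2, 3, 4, 5 in turn
  print(i^2)
  squares <- c(squares, i^2)
}
```

Note that `^` in R always returns a double, so `squares` is a numeric (double) vector even though `1:5` is an integer sequence.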
So, now I use this command on the R console, and you can see that as soon as you type this command and press Enter, it gives you this outcome. Now, at the same place, let me show you what happens if you replace the values 1 to 5 by some other values which are not in a sequence.
So, you can see what I do here: instead of 1:5, I give a data vector, say c(2, 4, 6, 7). What do we expect? The control will come to 2 and print 2 square, then come to 4 and print 4 square, then come to 6 and print 6 square, and finally come to 7 and print 7 square, which is 49.
So, let me press Enter and see what happens. You can see that the values 2 square, 4 square, 6 square and 7 square are obtained. So, you can give any values in this data vector, and they are considered in the same order in which they are mentioned inside the data vector.
So, this is the same command which I have shown you on the R console. I have taken the data vector 2, 4, 6, 7 in place of 1 to 5, and it prints these values. The same thing happens: first i takes the first value, i equal to 2, and 2 square is printed; then the next value, i equal to 4, is considered and 4 square is printed; then the third value, i equal to 6, is considered and 6 square is printed; and finally the last value, i equal to 7, is considered and 7 square is printed.
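The same loop with the data vector c(2, 4, 6, 7) in place of 1:5 might look as follows. As before, `squares` is an illustrative addition, used here to confirm that the values are taken in the order in which they are written.

```r
values <- c(2, 4, 6, 7)
squares <- numeric(0)   # extra: collects the printed values in loop order
for (i in values) {     # i takes 2, then 4, then 6, then 7
  print(i^2)
  squares <- c(squares, i^2)
}
```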
(Refer Slide Time: 14:41)
So, now let me give you one more example. One point where I would request you to be a little bit considerate is that I am going to use the concept of a function here. I agree that we have not considered this concept up to now; I have not explained it to you yet, but it is the topic which comes just after the topic of loops.
So, I would request you to take this example as it is, and after you understand functions you can come back here; I am sure that you will then understand it more easily. On the other hand, this concept is very simple, and my objective here is to show you the application of the loop. As I said in the beginning itself, the way we are learning this software is by taking commands, and there is a good possibility that some of the commands which I use here will be considered in the forthcoming lectures. This is how we have to understand each other and help each other, right.
So, I request you to please help me and try to understand this example. My objective is not at all to explain the concept of a function here; I will only describe it briefly, because my main objective is to show how this for loop works. In this example, I have a set of values contained in the data vector x: 2, 4, 6, 8, 10 and 12. I want to check, for each value in this data vector, whether half of the value, that is, the value divided by 2, is greater than 3.
For example, take 8: 8 divided by 2 is 4, which is greater than 3, yes. So, I want to count the values which, when divided by 2, yield a result greater than 3. In order to do this, you can see that I have to repeat the commands once for every value contained in this data vector.
So, for that I write a function. In common language, I can say that a function is just a small program. It is written as f u n c t i o n, and then inside the parentheses you write the input variable; if there is more than one input variable, you can also write them inside these parentheses. After that, whatever commands you want to execute are given inside the curly brackets.
Let me label these curly brackets as number 1 so that you can follow it. Whatever you want to execute has to be written between this pair of brackets. Now I give the commands which I want to execute. What I want is that all the values in the data vector x are divided by 2 and then checked for whether the result is greater than 3 or not. For this, I use a for loop.
Since I want to count values, I need to define a counter variable. I define it as count, c o u n t, but you can choose any name you want, and I initialize it at 0 because the counting starts from 0. If you want to start the counting from some other number, you can control it here without any problem.
Now I write my loop as for xval. This xval is my loop variable, which takes the values inside the data vector x. So, I write xval, then in, and then the data vector x. After this, I enclose all the commands which I want to use inside the curly brackets, like this. Let me call these curly brackets number 2.
Now my commands are extremely simple. I use conditional execution with the if command: if xval divided by 2 is greater than 3, then count is replaced by count plus 1. So, what will happen? The loop variable xval picks up all the values from the data vector x one by one, and each time this condition is true, the counter variable count is replaced by count plus 1.
Otherwise, count remains the same. After all the values have been used, the control comes out of this loop and prints the value of the variable count, and then the function stops. After that, if you want to execute this program, you write the name of the function, xcount, and within the parentheses the input variable x, and it gives you the value 3.
How? Let us try to understand how this works. xval takes the values 2, 4, 6, 8, 10 and 12. The first value chosen is xval equal to 2. Is 2 divided by 2, that is 1, greater than 3? The condition is false, so count remains 0, which is its initial value.
After this, the next value of xval, which is 4, is taken, and 4 divided by 2 is considered. The answer is 2; is it greater than 3? The answer is false, and once again count remains 0. After this, the next value, xval equal to 6, is taken; 6 divided by 2 is 3, and 3 is not greater than 3, so this is false and count still has the value 0.
After this, the next value, xval equal to 8, is taken. 8 divided by 2 is 4; is it greater than 3? The answer is true, so count is replaced by count plus 1, that is, 0 plus 1, which is 1. After this, the next value, xval equal to 10, is taken and the same operation is done: 10 divided by 2 is 5, and 5 is greater than 3, so the condition is true.
So, count becomes count plus 1, that is, 1 plus 1, which is 2. Finally, the last value, xval equal to 12, is taken; 12 divided by 2 is 6, which is greater than 3, so this is true and count becomes count plus 1, that is, 2 plus 1, which is 3. This is the value reported by the function. So, this is how the loop works in this case.
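Putting the pieces together, the function might be written as below. The name `xcount` is read from the spoken transcript, so treat its exact spelling as an assumption; the body follows the steps described: start `count` at 0, loop `xval` over `x`, and add 1 whenever `xval / 2 > 3`.

```r
# Count how many elements of x are greater than 3 after being halved
xcount <- function(x) {
  count <- 0                 # running total, starts at zero
  for (xval in x) {          # xval takes each element of x in turn
    if (xval / 2 > 3) {
      count <- count + 1     # condition TRUE: increase the count by one
    }
  }
  print(count)               # report the final count
}

x <- c(2, 4, 6, 8, 10, 12)
xcount(x)
```

Because `print()` returns its argument invisibly, a call such as `result <- xcount(x)` also stores the count, 3.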
(Refer Slide Time: 23:01)
So, now let me take one more example, and after that I will show you both these examples on the R console. In this case, I want to show you that we can have a loop inside a loop. There can be more than one loop in a function or a program, and those loops can be nested inside one another. That is, you write a for command, inside this for command you give one more for command, and it is possible that within this inner for loop you give one more loop, and so on.
So, let me explain with this example. Suppose I take 2 data vectors: one is child and the other is sweet. In the data vector child there are 3 children, child1, child2 and child3, and in sweet there are 3 sweets, sweet1, sweet2 and sweet3. I simply want to print the combinations of the values in the two data vectors child and sweet.
So, I want to print every child with every sweet: child1 with sweet1, child1 with sweet2, and so on, up to child3 with sweet3. For this I can use the concept of the for loop and define 2 for loops, one for the data vector child and another for sweet. I do it like this.
Please try to understand this, and then I will explain how it works. I write the for loop, and inside the parentheses I write a loop variable x, which works for the data vector child, right. So, it takes the values in the vector, which are child1, child2 and child3, and then I write the curly brackets. Let me call these curly brackets number 1.
Whatever I write inside these curly brackets is executed by the for loop related to child. Now, within this loop, I write one more loop: for y in sweet. So, I give one more loop variable, y, which takes the values in the data vector sweet, that is, sweet1, sweet2 and sweet3, and whatever I want to do is contained in the curly brackets labelled number 2.
I simply want to paste the values of child and sweet together and print them. For pasting there is a command called paste, which is used for joining values together. We have not done this command yet, but we are going to do it in a forthcoming lecture; as the name suggests, it pastes values together, so it is not very difficult to understand.
So, now let me explain what will happen here. The loop starts, and it first executes the loop for child. There are three values, and the loop picks up the first value, child1. Then it comes inside the loop for sweet, chooses all the values in sweet one by one, and executes the printing command for each of them.
After this, when all the values in sweet are consumed, it comes back once again to the loop for child and picks up the second value, child2. Once again it picks up all the values in the data vector sweet and combines them with child2.
So, let us first see what really happens. First of all, the control comes to x in child and picks up the first value, child1. Then the control comes to the second data vector, y in sweet, and picks up the first value in sweet, which is sweet1. It then takes x as child1 and y as sweet1, pastes them and prints the result.
Now, you have to understand the following. Suppose I have one loop, say loop 1, and another loop, loop 2, inside it. For each value of loop 1, loop 2 runs completely first: all the values in loop 2 are used, then the control comes out of loop 2 and chooses the next value in loop 1.
That is what is happening here. The inner loop selects all the values in the data vector sweet, which are sweet1, sweet2 and sweet3, while child1 is kept fixed. Once this is done, the control comes to the for loop for child and picks up the second value, child2. Then it comes to the loop for y and picks up the first value in the sweet vector, sweet1.
Then it takes the second value, sweet2, and the third, sweet3, while child2 remains the same, and each combination is pasted and printed. The same process continues: finally, the control comes to the data vector in x and chooses the value child3, then comes to the data vector sweet and takes sweet1, then sweet2, then sweet3, and this completes the loop.
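A sketch of the nested loop, assuming the two vectors hold the character strings named on the slide. `paste()` joins its arguments with a single space by default; the vector `combos` is an illustrative addition used only to collect the nine printed combinations.

```r
child <- c("child1", "child2", "child3")
sweet <- c("sweet1", "sweet2", "sweet3")

combos <- character(0)        # extra: stores every printed combination
for (x in child) {            # outer loop: one pass per child
  for (y in sweet) {          # inner loop: runs completely for each child
    combo <- paste(x, y)      # e.g. "child1 sweet1"
    print(combo)
    combos <- c(combos, combo)
  }
}
```

The inner loop finishes all three sweets before the outer loop moves to the next child, which is why the output starts with child1 sweet1, child1 sweet2, child1 sweet3.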
So, this is how the nested loop works, and you can see here the screenshot of the same operation. Let me first show you these examples on the R console so that you get more confidence, and then I will do some other executions. Here I take the vector x, like this, and then I just copy and paste the command to save time.
So, you can see that it is like this, and if I want to execute it, I write xcount and then, inside the parentheses, x; you can see that this gives me the value 3. So, this is how the program works, without any problem. Similarly, I choose these two data vectors and also choose this command.
So, I simply paste it here, and you can see the outcome coming out like this. So, you can see that executing these commands is not difficult at all. Now let me give you some more options, which are useful in using loops.
These are some additional commands which are required. For example, let me first consider the command break, b r e a k. This command is used to stop the loop before it has looped through all the items.
Suppose you want to stop somewhere in the middle of the data vector; then you have to use the command break. For example, if you write x in c(3, 7, 9, 10) and you want the program to work only for 3 and 7 and not for 9 and 10, then you have to give the command break.
So, how do we do it in the for loop? Let me explain with a small example. Suppose I take the vector drink, which has four values: coffee, lemonade, tea and juice. I want to execute the program so that it prints coffee, lemonade, tea and juice, but I want it to stop at tea: as soon as the control comes to tea, the program should stop.
So, what I do is write the for loop as for x in drink. So, x is my controlling variable, which means it takes the values in the drink vector one by one. Then I give my commands inside the curly brackets, which I label number 1, and I write my condition: if x is logically equal to tea, that is, x == "tea".
You have to give tea inside double quotes. Then the command inside the curly brackets number 2 is executed, and what is that command? break, which means break out of the loop. After this if statement comes the print command, so every value reached before the break is printed.
So, now let us see what will happen. The control starts with x taking the first value, coffee, and it prints coffee. Then x takes the next value in the data vector drink, lemonade, and prints lemonade. Then it comes to the third value, which is tea, but the command says that if the value of x is exactly, or logically, equal to tea, then break. So, the loop stops here, and you can see that the outcome is coffee and lemonade.
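The break example can be sketched as follows, assuming `print(x)` is placed after the `if` block as described. The vector `printed` is an illustrative addition that records what was printed.

```r
drink <- c("coffee", "lemonade", "tea", "juice")

printed <- character(0)   # extra: records the values that get printed
for (x in drink) {
  if (x == "tea") {
    break                 # leave the loop entirely when tea is reached
  }
  print(x)
  printed <- c(printed, x)
}
```

Since break exits the whole loop, juice is never reached.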
After the break command, there is another command, next, n e x t. This command is used when we want to skip an iteration without terminating the loop. For example, consider the same four values, coffee, lemonade, tea and juice, and suppose I want to execute the same program for all the values except lemonade.
So, what will I do? I write the same program, but where I used break, I now use next. What will happen here? The loop starts and chooses the values in drink one by one through the variable x, and whatever is written inside the curly brackets number 1 is executed. The commands are: if x is exactly, or logically, equal to lemonade, then next.
Now, do not forget that here also there are curly brackets after the condition; these are the curly brackets number 2. Whatever you write within these curly brackets is executed when the condition is true.
So, let us see what will happen. The program starts and takes the first value in the vector drink, coffee. There is no issue, so it prints coffee. Then it jumps to the next value, lemonade, but it finds that the program says that when x is equal to lemonade, then next, so the print command is not executed for lemonade.
So, it skips lemonade and comes to tea, and prints tea; after that it chooses the next value, juice, and prints juice. That is how next works.
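The corresponding sketch for next, skipping lemonade. Compared with the break sketch, only the keyword and the value in the condition change; as before, `printed` is an illustrative addition.

```r
drink <- c("coffee", "lemonade", "tea", "juice")

printed <- character(0)   # extra: records the values that get printed
for (x in drink) {
  if (x == "lemonade") {
    next                  # skip the rest of this iteration only
  }
  print(x)
  printed <- c(printed, x)
}
```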
Suppose I want to skip tea instead. In the same program, I write: if x is exactly equal to tea, then skip it, using the command next. The program starts with drink and takes the first value, coffee; there is no issue with coffee, so it prints coffee. Then it comes to lemonade, and there is no issue, so it prints lemonade. After that it comes to tea, and the program says that if x is equal to tea, then next. So, it skips tea, comes to juice and prints juice.
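To see the difference between the two commands side by side, the sketch below applies both to the same condition, `x == "tea"`. The result vectors `with_next` and `with_break` are hypothetical names used only for this comparison.

```r
drink <- c("coffee", "lemonade", "tea", "juice")

with_next <- character(0)
for (x in drink) {
  if (x == "tea") next       # skip tea, then carry on with juice
  with_next <- c(with_next, x)
}

with_break <- character(0)
for (x in drink) {
  if (x == "tea") break      # stop the whole loop at tea
  with_break <- c(with_break, x)
}

print(with_next)    # coffee, lemonade, juice
print(with_break)   # coffee, lemonade
```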
So, you can see how this program works. Let me show you these things on the R console so that you get more confidence. I copy these values and execute them here, as you can see.
So, drink is like this, and I use the for loop with a break when x is equal to tea. The first two values, coffee and lemonade, which come before tea, are printed. Similarly, I take another command in which I want to skip lemonade.
So, what will happen? It is the same data vector, but I use the command next when x is lemonade, so all the values except lemonade are printed. Similarly, if you want to skip tea, you just replace lemonade by tea in the condition, and you can see the outcome here; this is the data vector drink.
(Refer Slide Time: 38:33)
You can see that tea is skipped, and coffee, lemonade and juice are printed. So, now we come to the end of this lecture. You can see that in this lecture we have covered only one simple command, but it was long. Why? Because this is the first time you are learning the concept of the loop in the R software.
That is why I had to explain all the basic functioning, because unless you understand how R works with loops, you cannot use them for your own work. So, I have tried my best to explain it to you in detail and in quite a lucid way.
The functioning is very simple and the command is very simple; the only thing is that, depending on your need and requirement, you have to use the for loop with different options, such as break or next, or a combination of them, and so on.
So, I would request you to take some examples and practice, and see what you really want to do. For example, take two or more for loops with two or three values each, and check whether the outcome matches what you wanted. I have taken three values in both loops; you can take two values in loop number 1 and, say, three values in loop number 2, that is, an unequal number of values in the two loops, and see what outcome you get.
You can also use different types of commands, like sum, product, etc., which you have already done; I have used the commands paste and print here, but anyway you understand them. So, please practice, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 23
Loops - while and repeat
Hello friends, welcome to the course Foundations of R Software. You can recall that in the last lecture we started a discussion on the topic of loops and discussed the for loop. We understood how the for loop works and how it gives the outcome.
So, I believe that now you have understood the functioning of the loop, and in this lecture we are going to consider two more loops: the while loop and the repeat loop. As we have seen, the for loop can be used in those places where the number of times the program has to be repeated is known in advance.
Similarly, the while and repeat loops also have their own utility under different types of conditions. So, let us begin this lecture and try to understand the basic functioning of the while loop and the repeat loop, and I will take some examples to explain how they work and how they give the outcome. So, let us begin our lecture.
As you can recall, we are discussing control structures in R, and we have already talked about conditional execution, like the if statement, the if else statement, if else if statements, etc. Similarly, we have considered functions like switch and which, and now we are on the loops.
Just for the sake of a quick review, I will show you the commands. Under loops, we are going to consider these three: the for loop, the while loop and the repeat loop.
The way the for loop works is that it is used when the number of repetitions is known in advance; that means you know how many times the program has to be repeated.
The command for the for loop was: you write for, then inside the parentheses you write the controlling variable as name, then in, and then the vector of values that you want this name to take. Whatever commands you want to execute are written inside the curly brackets, like this. That is what we did in the last lecture.
So, now we consider another loop, which is called the while loop. As the name suggests, this loop is executed as long as the condition remains TRUE. In this case, the number of times we want to repeat the loop is not known to us, which is why we give a condition using while.
One of the basic things you have to keep in mind is that the while loop is used when the number of iterations is not known in advance. For example, when we use iterative algorithms, the algorithm has to run until we get convergence. For instance, when we use maximum likelihood estimation in statistics, we use this type of algorithm because we do not know whether convergence is going to be achieved after 5000 repetitions or 7000 repetitions.
So, we give a condition like: repeat the algorithm until you get convergence, and for that we use the while loop. The flow chart of the while loop is as follows. We start, and then there is a condition, which is checked as TRUE or FALSE. If this condition is TRUE, then the statements or commands that we want to execute are executed.
Then, because the condition was TRUE, the control goes back to the initial place and once again checks whether the condition is TRUE or not. If it comes out to be yes, then once again these statements are executed and we come back to the starting point. Once again the condition is checked for TRUE or FALSE, and suppose after some time the condition becomes FALSE. When the answer comes out to be no, the statements written outside the while loop are executed, and the program stops or does whatever else has been asked in the program.
So, you can see that it is not known before the start of the program how many times this condition is going to be TRUE. That is why we do not specify any particular number of times the program has to be executed; instead we express it in terms of a condition: execute the program as long as this condition is TRUE.
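The idea of "repeat until convergence" can be sketched with a much simpler iterative algorithm than maximum likelihood. The example below is a stand-in chosen for illustration, not taken from the lecture: it uses the Newton (Babylonian) iteration x = (x + a/x)/2 to approximate the square root of a = 2, and the tolerance 1e-10 is an arbitrary assumption. The number of passes is not fixed in advance; the loop simply runs while the change between successive values is still larger than the tolerance.

```r
a <- 2            # number whose square root we want (illustrative)
x <- 1            # starting guess
tol <- 1e-10      # convergence tolerance (assumed)
change <- Inf     # guarantees the condition is TRUE on the first check
iterations <- 0

while (abs(change) > tol) {
  x_new <- (x + a / x) / 2   # one Newton step for solving x^2 = a
  change <- x_new - x
  x <- x_new
  iterations <- iterations + 1
}

print(x)            # close to sqrt(2)
print(iterations)   # how many passes were needed
```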
(Refer Slide Time: 05:25)
The way the while loop is written in the R software is as follows. We write while, and then inside the parentheses we write the condition. This condition is going to be either TRUE or FALSE. Whatever you want to be executed when the condition is TRUE is written just after this, inside the curly brackets.
All the commands written inside the curly brackets are executed as long as the condition remains TRUE, and as soon as the condition becomes FALSE, the control comes out of the loop. If the condition is not TRUE before entering the loop, then none of the commands within the loop are executed.
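A sketch of the general form, followed by a check of the last point: when the condition is FALSE at the start, the body is never run. The starting value `i <- 20` and the counter `runs` are illustrative assumptions.

```r
# General form:
# while (condition) {
#   commands, executed repeatedly while condition is TRUE
# }

i <- 20
runs <- 0
while (i < 10) {   # 20 < 10 is FALSE, so the body is never entered
  runs <- runs + 1
  i <- i + 1
}
print(runs)        # 0
```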
So, let me take some examples and explain how it works. These are very simple examples, and I have tried my best to take examples similar to those we took for the for loop, so that you can understand it better and can also compare.
Here also I want to print some numbers; you may recall that in the case of the for loop we printed the squares of the numbers 1, 2, 3, 4, 5. So, now I write a command which prints the squares of the odd integers which are smaller than 10. I start with the integer i equal to 1; this is just like a counter.
Then I write the while command, while i is less than 10, and whatever commands I want to give are given inside these curly brackets, like this. My commands are: print i square, and then, after printing, replace i by i plus 2. That means you add 2 to the value of i that you used earlier, and then check whether the new value of i is smaller than 10 or not. If the condition is TRUE, the loop runs again; otherwise it stops.
So, now let us see how it works. The first value of i which is considered is i equal to 1, which is given here, as you can see. After this, the condition i less than 10 is checked, and the answer comes out to be TRUE. Since it is TRUE, it prints the value of i square, which is 1 square, and then sets i equal to 1 plus 2, which is 3.
Now that i is equal to 3, the control goes back to the while condition. It takes this value of i and checks whether i is less than 10. Since 3 is less than 10, the condition is TRUE, so it prints 3 square, and then i is replaced by 3 plus 2, which is 5.
Now i equal to 5 is taken back to the while loop, which checks whether i is smaller than 10. This is TRUE, so it prints 5 square, and i is replaced by 5 plus 2, which is 7.
Now we are at the position where i is equal to 7. Once again this value is brought to the while condition, which checks whether i equal to 7 is smaller than 10. The answer comes out to be TRUE, so 7 square is printed, and i becomes 7 plus 2, which is 9. Next, i equal to 9 is taken back to the while condition.
With i equal to 9, it checks whether i is smaller than 10; the answer comes out to be TRUE, and so 9 square is printed. Then i is replaced by 9 plus 2, which is 11. Now you have to watch carefully what happens. This value i equal to 11 is taken to the while condition and checked: is 11 less than 10? The answer comes out to be FALSE, and so the loop stops here, right.
So, that is how these values are printed, and this is the screenshot. This is how the while loop works, right. Before I move forward, let me run this example on the R console so that you become more confident, and then I will give you one more example in which I am once again going to ask for your help.
You can see that I am just copying and pasting this command, and the outcome comes out like this, right. So, that is what is happening.
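A minimal sketch of the same while loop as it might be typed at the R console. The vector squares is an addition for illustration only; it keeps a copy of each printed value so the result can be inspected after the loop.

```r
i <- 1                  # counter, initialised before the loop
squares <- numeric(0)   # added for illustration: records each printed value
while (i < 10) {        # condition checked before every pass
  print(i^2)
  squares <- c(squares, i^2)
  i <- i + 2            # without this update the condition would never become FALSE
}
squares                 # 1 9 25 49 81
i                       # 11, the first value that failed i < 10
```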
(Refer Slide Time: 11:08)
So, now I take one more example. Once again I am going to use the concept of a function, which I used in the case of the for loop. Since I have already explained briefly how such a function works, I do not need to repeat it here, but as I promised, I will take up the topic of functions in the next lecture, right.
So, what is my objective here? I want to input a number, and this number has to be less than or equal to 25. Then, whatever number I enter, I want to find the sum of all the integers beginning from that number up to 25, right. For example, if I enter the number 5, which is less than 25, then I want to find the sum 5 + 6 + ... + 25, right.
So now, how do we do it? In this program I am going to use a few more commands, which I will explain. The first one is readline. You know that when you execute a program, sometimes as you go further it asks you something; you read the question and then you give the input.
For that job the readline command is used, along with its prompt argument, right. I will discuss these two in detail later on, but here I can just show you what happens, and once you see how it functions you will understand it very easily, ok. So, I write down a function, and there is no input argument here, because I am asking for the input inside the program.
So, I can leave the parentheses blank, but I still have to give them, and whatever I want to execute I write inside curly bracket number 1, right. Next I take an initial variable sum equal to 0; this is the initialization, right. After that I choose a number; this is the number that I want someone to enter, and the objective is to find the sum starting from this number up to 25.
This number has to be an integer; that is my requirement. So, I use the command as.integer. Now tell me, do I really need to explain the meaning of as.integer? It simply converts the input into an integer, as you know, right, because you have used such commands earlier.
After this I use readline, and inside it I write prompt and then, within double quotes, the statement "Please select any number less than 25", right. So, when you execute the program, it will stop here and prompt you to first enter the value.
After this I write down the statements that I want to execute under the while loop. I write while and give the condition that, as long as the number is less than or equal to 25, these statements are executed. These statements are written inside curly bracket number 2. The first statement I want to execute is sum = sum + number.
The initial value of sum I have taken to be 0. So, it starts from 0, the number I entered is added to it, and then I execute number = number + 1. So, the number that has been entered is used in the first line, and then this number is replaced by number + 1, that is, 1 is added.
This is repeated as long as the condition of the while loop is satisfied. After that, control comes out of the while loop, and it prints "The sum of number received from while loop is" followed by the value of sum obtained here. So, I am using some more programming concepts here, like print and paste, but believe me, these are very simple commands.
And I promise you that just after a couple of lectures you will be comfortable with all these things; they are very simple, so I am sure it is not difficult for you to understand them. So, now let us understand what really happens when this function runs.
Suppose you execute the program. It comes to the first line and sets sum equal to 0. Then it comes to the second line and asks for a number; suppose I give the number 3. The condition is that, while the number is less than or equal to 25, the statements are executed. There are two statements. First, sum = sum + number: the sum is 0 and the number is 3, so the sum becomes 3. Then the number becomes 3 plus 1, that is, 4. Now 4 is checked once again under the while condition: is 4 less than or equal to 25? This is TRUE. So, the statements are executed once again: the sum, which is now 3, plus the number, which is now 4, gives 7, right. Then 1 is added to the number, so it becomes 4 plus 1, that is, 5, and this value once again goes to the while condition. This process continues as long as the number is less than or equal to 25, right.
So, for example, if I execute this program on the R console, you can see what happens.
I just type the function name followed by parentheses, and it asks me "Please select any number less than 25"; suppose I give 22. So, what will happen now? It will add up all the numbers from 22 onwards, that is, 22, 23, 24 and 25, because my condition is to repeat the loop while the number is less than or equal to 25. The outcome printed is "The sum of number received from while loop is" followed by the value of sum, which comes out to be 94, and you can verify that the sum of 22, 23, 24 and 25 is 94. This is the screenshot.
So, let me first execute this program on the R console, so that you become confident that these things work. I am copying these commands to save time.
(Refer Slide Time: 19:29)
And if you look at this function, written with the keyword function, f u n c t i o n, you can see it appears like this; this is what I have pasted.
Now, to execute it, I type the function name with parentheses and press enter, and it asks me "Please select any number less than 25". I say 22, and it gives me the answer, the sum of number received from the while loop. Let me make the font smaller here so that you can see the screen.
It is 94, right. Similarly, if you run this function once again, it again asks you to select any number less than 25, and now you can see the use of readline and prompt. I give 3. The printed line is now longer, so I have to make the font on the screen smaller. You can see that the sum comes out to be 322, which is 3 + 4 + ... + 25. So, this is how this program works, right.
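Because readline() only waits for keyboard input in an interactive session, the sketch below moves the while-loop logic into a hypothetical helper, sum_from(), which takes the starting number as an argument. The wrapper sum_interactive() is also a hypothetical name (the lecture's own function name is not shown in this text); it adds the readline() prompt back, and it is only called when R is running interactively.

```r
# Hypothetical helper: the lecture's while loop, with the starting number
# passed in as an argument instead of being read from the keyboard.
sum_from <- function(number, limit = 25) {
  total <- 0                   # initialisation (the lecture calls this sum)
  while (number <= limit) {    # keep going while number <= 25
    total <- total + number    # add the current number
    number <- number + 1       # move on to the next integer
  }
  total
}

# Hypothetical interactive wrapper in the style of the lecture.
sum_interactive <- function() {
  number <- as.integer(readline(prompt = "Please select any number less than 25: "))
  print(paste("The sum of number received from while loop is", sum_from(number)))
}

sum_from(22)   # 22 + 23 + 24 + 25 = 94
sum_from(3)    # 3 + 4 + ... + 25 = 322
if (interactive()) sum_interactive()
```

Naming the running total total rather than sum is a stylistic choice here: it avoids hiding the built-in sum() behind a variable of the same name.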
So, before we come to another aspect of this topic, let me make the font size clear, so that next time when I show you things, they look clear on the screen, right.
(Refer Slide Time: 20:54)
After this we come to our next loop, the repeat loop. In the case of the for loop, the number of times the commands have to be repeated is known in advance, and in the while loop the repetition depends on a condition that is checked each time. The repeat loop is independent of such conditions: unlike the while loop, it does not test any condition, right.
Instead, it depends on the programmer how many times the commands are repeated, and in this case we also do not know in advance at which iteration the stopping condition is going to be satisfied. So, what we do here is simply ask the program to repeat, right.
So, we have to define inside the program how it is to be terminated, and that is the job of the programmer: the programmer has to decide when the loop is going to stop. For this we use the command break, and possibly next, which we have already discussed.
So, if you want to use the repeat loop, the syntax in the R software is like this: write the word repeat, r e p e a t, all in lower case, and within curly brackets write all the commands that you want to execute. That is all, very simple.
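A small illustrative sketch (not from the lecture) of the point that repeat tests nothing by itself: even when the stopping condition is already met, the body runs once before the programmer's if (...) break is reached. The starting value 15 and the counter runs are assumptions for illustration; without any break, the loop would never end.

```r
i <- 15                 # already greater than 10 before the loop starts
runs <- 0               # hypothetical counter of how often the body executes
repeat {
  runs <- runs + 1
  if (i > 10) break     # the only exit: a condition written by the programmer
  i <- i + 2
}
runs                    # 1: the body ran once before break was reached
```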
(Refer Slide Time: 22:37)
So, let us take one example to understand how it works; this will give you more clarity. Suppose I take the same example that I considered for the while loop: I want to print the squares of the odd integers starting from 1 as long as they are less than or equal to 10, and after that I want to break the loop, right.
So, I write i equal to 1; that is the initial value, as you have now understood. Then I write repeat, and whatever I want to repeat I give inside curly bracket number 1. I simply ask it to print i square, then replace i by i plus 2, and then, if i comes out to be greater than 10, break, that is, stop the program.
So, let us understand how it actually works. It takes the first value i equal to 1 and prints 1 square. Then i becomes 1 plus 2, equal to 3, and it checks whether i is greater than 10. Since i is now 3, the answer to the condition i > 10 is FALSE. So, it once again executes the commands: it prints 3 square, and i becomes 3 plus 2, equal to 5. So, the new value of i is 5.
Now it checks whether 5 is greater than 10 or not; the answer is FALSE. So, it prints 5 square, and i becomes 5 plus 2, equal to 7. After this it checks whether i equal to 7 is greater than 10; the condition comes out to be FALSE, so it prints 7 square, and i becomes 7 plus 2, equal to 9.
After this it checks once again whether i equal to 9 is greater than 10; the answer comes out to be FALSE, so it prints 9 square, and i is replaced by i plus 2, that is, 9 plus 2, equal to 11. Now see what happens: i is equal to 11, and it checks whether i is greater than 10.
The answer comes out to be TRUE, and so the program stops here, because you asked it to repeat the commands as long as i is less than or equal to 10 and to break as soon as i becomes more than 10. The same outcome is shown here. So, you can see that this is not a difficult thing to understand once you have understood the for and while loops.
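The repeat version of the same example can be sketched as follows; as before, the squares vector is only an addition for illustration, so that the printed values can be checked after the loop ends.

```r
i <- 1
squares <- numeric(0)      # added for illustration: records each printed value
repeat {
  print(i^2)
  squares <- c(squares, i^2)
  i <- i + 2
  if (i > 10) break        # stop as soon as i exceeds 10
}
squares                    # 1 9 25 49 81, the same output as the while loop
```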
And similarly, I can show you in a similar program how you can use next and break together, right. I choose the initial value i equal to 1, use the command repeat, and give whatever I want to repeat under curly bracket 1. First, i becomes i plus 1. Then I have a condition: if i is less than 10, use the command next, which skips the remaining commands of this pass; otherwise print i square; and if i is greater than or equal to 13, break the program.
So, let us understand how it functions. It chooses i equal to 1, and then i becomes i plus 1, that is, 1 plus 1 equal to 2. Then it checks whether i is less than 10. Since i equals 2 and 2 is less than 10, the condition i < 10 is TRUE.
So, what happens after this? It comes to next, and so it does nothing else in this pass; the print statement is skipped. Then the new i becomes 2 plus 1, equal to 3, and 3 is less than 10, so this is TRUE again. So, once again it executes next, and it keeps doing so up to i equal to 9.
Then i becomes 9 plus 1, equal to 10, and the condition is checked: is 10 less than 10? This is FALSE. So, the command print(i^2) is executed and 10 square is printed; the break condition 10 >= 13 is FALSE, so the loop continues. Then it considers the next value of i: 10 plus 1 equals 11, and 11 less than 10 is FALSE.
So, it prints 11 square, and it continues in the same way for i equal to 12 and i equal to 13. As soon as i becomes 13, 13 square is printed, and then the condition i greater than or equal to 13 is checked. This is TRUE, so the loop breaks here, and you get the outcome 10 square, 11 square, 12 square and 13 square.
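A sketch of this next-and-break example, with the order of statements reconstructed from the narration above (increment, then next, then print, then break). The printed vector is an addition for illustration only.

```r
i <- 1
printed <- numeric(0)      # added for illustration: records each printed value
repeat {
  i <- i + 1
  if (i < 10) next         # skip the rest of the body for i = 2, ..., 9
  print(i^2)
  printed <- c(printed, i^2)
  if (i >= 13) break       # leave the loop once 13^2 has been printed
}
printed                    # 100 121 144 169
```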
So, this is how things work in the repeat loop also. Let me show you these examples on the R console as well, but now you know that these are very simple things to execute. So, I copy these commands and paste them on the R console.
(Refer Slide Time: 28:54)
You can see the result is like this, the same outcome that I showed and explained to you, right. Similarly, if you use the commands break and next together, as in example 4, and paste those commands here, it gives you the same outcome as shown before, right.
So now, we come to the end of this lecture. I have explained the concepts of the repeat and while loops, and you can see that these are not very difficult things to understand. The main thing is to understand what is happening inside the loop when it runs, because based on that you have to decide in your programming which loop you are really going to use: the for loop, the while loop or the repeat loop.
So, why don't you take some examples and think about conditions under which you would like to use the while, repeat and for loops? Try to write a small program of one or two lines and execute it, and by the time you do it, I will also have covered the topic of functions. After that you will be in a position to write small programs as well. So, practice, and I will see you in the next lecture, where I will give you a brief explanation of functions.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Basics of Calculations
Lecture - 24
Functions
Hello friends, welcome to the course Foundations of R Software. In this lecture we are going to learn about a new topic, Functions. So, the first question is: what is a function? Instead of asking you what a function is, let me ask whether you know what a program is in a computer language; I am sure you all know what a program is.
Now, whatever a program means in any other computer language is called a function in the R software. After this, do you think it is very difficult to understand what we are going to do today? Not at all. I am sure you have done some programming in some other language, and whatever you have done earlier, we now want to know how it can be done in the R software.
Programming is one of the strongest tools of the R software: you can write a program, and within that program you can also call another program. That is why R programming is so powerful.
So, once you define a function, you can also define functions within that function, and so very complicated jobs can be programmed quite easily in the R software. First we will understand what a function is, and I will give you a very interesting illustration of how to write such a program in just a couple of minutes; I can promise you that, ok.
So, let us begin our lecture and understand what a function is. But before that, do you remember the functions you studied in class 10, class 11 or class 12? Recall the function y = f(x).
(Refer Slide Time: 02:26)
So, let us begin our lecture with this y = f(x). Do you remember that in class 11 or class 12 you used to write a statement like y = f(x)? You read this statement as "y is a function of x", and exactly the same meaning carries over into the R software; we use the same language here.
So, first let us understand the ingredients of a function. In this function there are two quantities, x and y. Here x is the input variable; based on whatever you input, you get an output, which is indicated by y. And how are the input and output related? This is controlled by f, and f is known to us: it is exactly what we want to do. For example, you can write the function as x square.
This means y is equal to x square, so whatever input value you give, say x equal to 100, y is going to be 100 square, right. To write a function in R programming we follow the same setup. I simply write y as a function, like this: y = function, and then inside the parentheses I write the input variables; there can be one variable, or there can be more than one.
So, I list all of them there, and then whatever I want to execute, that is, whatever the form of f is, I write inside the curly brackets; that is all, right. Similarly, suppose I write z = x square + y square. What does this mean?
Your input variables are x and y, and your output variable is z. That means if you take x equal to 2 and y equal to 3, then z is going to be 2 square plus 3 square, and your f is: square each input and then add them. This is the same thing I am going to do in this lecture today. So, let us begin and try to understand it.
So, the first question is: what is a function? Functions are simply like programs, and what are programs? Programs are bunches of commands grouped together in a sensible way so that they are executable inside the software or programming language. These programs are written to execute a particular task, in a sequence that makes sense, and they give you the output you really want. To write a program you can use all the concepts you have learned: mathematical operators, logical operators, conditional execution, etc.
So, what do functions do? They take input arguments, do calculations, make some graphics, call other functions, or whatever you want, and produce some output. This output is the result obtained after executing the function, and this returned value can be a complex construct such as a list or a data frame; what lists and data frames are, we will discuss in the forthcoming lectures.
So, now we have understood what is needed when we want to write a function or a program; from now on I will use the word function instead of program. You need a couple of ingredients. The first is the name of the function: the function is stored under this name, just as you give a program a name, right. Then the arguments: within the parentheses you have to write all the input values.
It is possible that a function contains some values inside the arguments, that is, the parentheses, or it may not contain any value inside the parentheses, because sometimes you give the input values during the program, or the input comes from some other program; then you may not need to write these values.
But otherwise, you have to write all the input variables inside the parentheses. Then comes the function body, which is something like your f: this body contains all the statements that define what the function has to do, right. After that comes the outcome, that is, the return value: whatever calculations and evaluations have been done, the value of the last expression of the function body is returned as the outcome.
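A minimal sketch of the two points above, using hypothetical function names: the value of the last expression in the body is what the call returns, and that value can be a compound object such as a list (lists are covered in later lectures).

```r
# Hypothetical function: the last expression evaluated is the return value.
add_two <- function(x) {
  y <- x + 1      # intermediate step; an assignment alone shows nothing
  y + 1           # last expression: this value is returned
}
add_two(5)        # 7

# Hypothetical function returning a list with two named components.
square_and_cube <- function(x) {
  list(square = x^2, cube = x^3)
}
result <- square_and_cube(2)
result$square     # 4
result$cube       # 8
```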
So, now we have two options: we can write our own functions, or we can use built-in functions. You have already seen built-in functions like sum, prod, mean, max, etc. Built-in functions are also functions that were written by someone else and given a name, and that name is what we see as the name of the built-in function.
Somebody wrote the program for the sum and gave it the name sum; someone wrote the function for finding the product and gave it the name prod, and so on. So, these functions are built in and available inside the R software, or you can create functions yourself.
So, today we are going to learn how you can create a program. Although we are going to discuss very simple examples, these examples can be extended to any level; that is my promise to you if you understand them. And when you write a program, you can also use these built-in functions inside it without any problem; that is the beauty of the R software, right.
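As a small illustration of calling built-in functions inside your own function, the sketch below uses a hypothetical name, describe_numbers, together with the built-ins sum, prod, mean and max mentioned above.

```r
describe_numbers <- function(v) {
  # each named element is computed by a built-in function
  c(total = sum(v), product = prod(v), average = mean(v), largest = max(v))
}
out <- describe_numbers(c(1, 2, 3, 4))
out   # total 10, product 24, average 2.5, largest 4
```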
(Refer Slide Time: 08:58)
So, as I explained, the structure of writing a function is like this. First you give a name to the function. Then you write the assignment operator: you can use this operator, <-, or you can use the equality operator, =; you can see the same thing written with the equality operator at the bottom of the slide.
After that you write function, f u n c t i o n, in lower-case letters, and then, within parentheses as arguments, you give all the input variables. You write the list of all the input variables that are required to execute the commands you are going to write further; Argument 1, Argument 2 indicate the names of these input variables. After this you write curly brackets, and within these curly brackets you write all the expressions, commands, etc. that you want to execute.
These expressions can have more than one command, or they can have a single command; that depends on your need. And when you give the function a name, instead of the operator less-than hyphen, <-, you can also use the equality sign, =; that will not make any difference, right, ok.
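A minimal sketch of the general form name <- function(arguments) { expressions }, written once with each assignment operator. f1 and f2 are hypothetical names; for an ordinary top-level definition like this one, both operators store the same kind of function.

```r
f1 <- function(x) {
  x^2
}
f2 = function(x) {
  x^2
}
f1(100)   # 10000
f2(100)   # 10000
```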
(Refer Slide Time: 10:20)
So, before you go on to create a function, let me give you some tips. Always try to give function arguments meaningful names. Function arguments can also be set to default values, and functions can have some special arguments as well.
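No code for these tips appears at this point in the lecture, so the following is only a sketch under stated assumptions: power_of and sum_all are hypothetical names, p = 2 illustrates a default value, and the ... argument is used as one example of what a "special argument" can be.

```r
# Meaningful argument names, with p defaulting to 2 when it is not supplied.
power_of <- function(x, p = 2) {
  x^p
}
power_of(3)       # 9, uses the default p = 2
power_of(3, 3)    # 27, p supplied explicitly

# The special argument ... accepts any number of values and passes them on.
sum_all <- function(...) {
  sum(...)
}
sum_all(1, 2, 3)  # 6
```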
So, let us understand these things with the help of some very simple examples, right. Suppose I simply want to write a function for computing y = f(x) = x square; that means I give the input and it gives me the output x square.
So, for that I give it the name abc, right, and after that I write function and, inside the parentheses, x, which is my input variable. Then I write the curly brackets, and within them I write the syntax for computing x square, which is x^2, very simple. When I press enter, this creates a program named abc in the R console.
Now the next question is: if you want to see this program, or what its contents are, you can simply type abc, the function name, on the R console. In this screenshot I have given the program, and after that, to see what abc looks like, I type abc and get this type of outcome, which shows the contents of the function.
And if you want to execute it, this is the way: write the name of the function and, inside the parentheses, give the input values. If there is only one input, you give only one value, but if there is more than one, you have to give the inputs in the same order in which they were defined in the function; that is what you have to keep in mind, ok.
So, when I write abc(3) on the R console, it means x is equal to 3. And when x is equal to 3, what happens? x takes the value 3, this goes inside the curly brackets, it is computed as 3 square, and the outcome is given as 9.
So, you can see that executing it is very simple: just write the name of the program and, within the parentheses, give all the values of the input variables in the same order in which you gave them while defining the program. That is very important; if you interchange them, you will make a mistake, right.
For example, if your function has the arguments x, y, and you want x equal to 2 and y equal to 3, then you have to write f(2, 3). But if you write f(3, 2), then obviously x becomes 3 and y becomes 2, and you may get a wrong outcome, right. So, you have to be careful.
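A short sketch of why the order matters, using a hypothetical function f whose result changes when x and y are swapped (for x^2 + y^2 the swap would go unnoticed). R also accepts arguments by name, in which case their order in the call no longer matters.

```r
f <- function(x, y) {
  x - y
}
f(2, 3)           # -1: x = 2, y = 3
f(3, 2)           #  1: x = 3, y = 2, so the result changes
f(y = 3, x = 2)   # -1: named arguments can be given in any order
```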
Similarly, if you want to execute this function for x equal to 6, simply write abc(6), and it gives you the value 6 square, which is 36. Similarly, if you execute abc(9), that means x equal to 9, the outcome is 9 square, which is 81. So, you can see that dealing with a single variable in a function is very simple, and instead of x square you can write whatever you want to compute, ok.
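The single-variable function abc from the slides, sketched as it would be entered at the R console, together with the calls discussed above.

```r
abc <- function(x) {
  x^2          # the body computes x square
}
abc            # without parentheses: prints the definition of abc
abc(3)         # 9
abc(6)         # 36
abc(9)         # 81
```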
Now, let me give you an example with two variables. Suppose I want a function like z = x square + y square, right, which I can write as f(x, y). I give it the name abc; I am taking the same name for the sake of convenience, otherwise you can take a different name as well. So, I write function, f u n c t i o n, and then within the parentheses, as arguments, I write the input variables.
There are two input variables, x and y, so I write x, y, and after this, within the curly brackets, I write the commands for what I want to compute. I want to compute x square plus y square, so I write x^2 + y^2, right. Now, how do you execute it on the R console?
You simply write abc; if you want to execute it for x equal to 3 and y equal to 4, you write abc(3, 4). This becomes 3 square plus 4 square, which is 9 plus 16, equal to 25. Similarly, if you want to execute this program for x equal to 10 and y equal to 10, you write abc(10, 10), and it gives you the value 10 square plus 10 square, which is 200.
Similarly, if you want to do it for x equal to minus 2 and y equal to minus 3, the value becomes 4 plus 9, equal to 13, as given here. So, you can see that executing these functions in the R console is not a very difficult job, and this is the screenshot as well, right.
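The two-variable version of abc, sketched with the three calls worked out above; the inputs are matched to x and y by position.

```r
abc <- function(x, y) {
  x^2 + y^2
}
abc(3, 4)      # 25
abc(10, 10)    # 200
abc(-2, -3)    # 13
```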
Similarly, let me give you one more example, right. Suppose I want to write the function f(x) = x + sin square x + cos square x, right. Those from a mathematics background know that sin square x plus cos square x is equal to 1, but there may be some candidates who do not have a mathematics background, so that is why I am taking this example here, ok.
So, what do I do in this case? I simply give the function the name abc; once again I am taking the same name just for the sake of convenience, but it is up to you what name you want to give. Then I write function and, inside the parentheses, the input variable x; there is only one variable. Then I use the built-in functions for computing sin and cos, so the body becomes sin(x)^2 + cos(x)^2 + x, and I write this inside the curly brackets.
So, when I execute it on the R console and give x equal to 9, what happens? sin²9 + cos²9 becomes 1, and 1 plus 9 becomes 10. Now, if you take x equal to 99, then sin²99 + cos²99 becomes 1, and 1 plus 99 becomes 100.
Similarly, if you write abc(-15), that means you want to execute the function at x equal to minus 15. This gives sin²x + cos²x = 1 plus (minus 15), which is minus 14.
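A sketch of the same idea in R code, assuming the function is again stored under the name abc. Because sin(x)^2 + cos(x)^2 is computed in floating-point arithmetic, the result can differ from the exact value in the last few binary digits, so a comparison is better done with all.equal() than with ==.

```r
# f(x) = sin(x)^2 + cos(x)^2 + x, which equals 1 + x mathematically
abc <- function(x) {
  sin(x)^2 + cos(x)^2 + x
}

abc(9)     # prints 10
abc(99)    # prints 100
abc(-15)   # prints -14
```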
So, you can see that it is not difficult to do this on the R console either, but let me give you some more examples so that you can be confident when you do it on the R console yourself. Let me take one more function, where I do not give any input variable and simply want to print the values of 1 cube, 2 cube and 3 cube. So, I write the name abc and then function, and now there is no input, because everything is given inside the commands.
So, I simply leave the parentheses blank. Then I open the first pair of curly brackets, and inside it I use a loop: for i in 1, 2 and 3, execute whatever is written inside the second pair of curly brackets, which is print(i^3). That is all. Now you can understand these things very easily. So, what will happen?
First i becomes 1 and it prints 1 cube; then i becomes 2 and it prints 2 cube; then i becomes 3 and it prints 3 cube. So, the output is 1, 8, 27. To execute it, you simply write abc followed by empty parentheses, abc(); you do not have to write any value inside them.
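A minimal sketch of the no-argument function, assuming the loop runs over the vector c(1, 2, 3) (the slide may have written the values 1, 2 and 3 in another way):

```r
# A function with no input arguments
abc <- function() {
  for (i in c(1, 2, 3)) {  # i takes the values 1, 2 and 3 in turn
    print(i^3)             # prints i cube
  }
}

abc()  # prints 1, then 8, then 27, each on its own line
```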
Now, let us see how these functions work in the R console; they are very simple. I am sure you have understood them by now, and that was the reason I wanted to explain them to you first, before you try them yourself, so that you do not get scared about what will happen when they are executed in the R console.
So, you can see here that this is the function abc. If you execute abc(10), it gives 100, and if you execute it for x equal to 99, it gives 9801, which is 99 square.
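The definition of this single-input function is not part of this excerpt. The sketch below assumes it simply squares its input, which agrees with abc(10) giving 100 and abc(99) giving 9801.

```r
# Assumed definition: abc returns the square of its input
abc <- function(x) {
  x^2
}

abc(10)  # 100
abc(99)  # 9801
```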
Similarly, take the example where the function is x square plus y square. Once again, I copy and paste this function to save some time, but now it has two variables, x and y. So, when you execute it, you have to give two values, x equal to 2 and y equal to 3, in the same order in which you defined them in the function.
If you press Enter, it gives 2 square plus 3 square, which is 13. Similarly, if you want to find 12 square plus 13 square, you give x equal to 12 and y equal to 13, and it gives the value 313, and so on.
Now, similarly, take the example in which we compute sin square x plus cos square x plus x.
(Refer Slide Time: 21:04)
So, this is 1 plus x, and I write the same function. If you execute abc(10), this becomes 1 plus 10, which is 11, because sin square 10 plus cos square 10 equals 1. Similarly, if you take abc(1000), that is, x equal to 1000, it gives the corresponding value 1001.
So, you can see that these functions are not difficult at all. Now, let us execute the function for which you do not need to write anything inside the parentheses.
So, here I print 1 cube, 2 cube and 3 cube, but now I do not give any value inside the parentheses, and if you press Enter, it gives the values 1, 8, 27. Now we come to the end of this lecture, and you can see that the concept of a function is very simple to understand. To write a function, you need all the information that we have learnt in the earlier lectures and that we are going to learn in the further lectures.
Functions are the soul of programming. Besides these three very simple examples, you can now go back to the earlier two lectures, where we took examples of functions using the for loop and the while loop, and see that what we were doing there was not very difficult.
So, why don't you take very simple examples? Do not take complicated examples at this stage. After you practise and learn more, you will become a very good programmer, and I am confident that you will be able to write very complicated and longer programs as well. But at this moment, if you do not know programming, take very simple examples, try to execute them, and also try to use some built-in functions inside your programs.
And if you know programming in other languages, then try to write those programs in the R language too. So, practise, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 25
Sequences
Hello friends. Welcome to the course Foundations of R Software. From this lecture, we are going to begin a new topic, and all these topics are related to the management of data. You know that R software is useful for computation and simulation as well as for managing data sets.
When we handle databases, we often have to manipulate them. Also, when we do computation, simulation, etc., we often need to generate particular types of values. As a beginning in this direction, we are going to start the topic of sequences.
So, in this lecture and in the next two lectures, I am going to talk about sequences. Now, what is the literal meaning of a sequence? A sequence is an arrangement of numbers in a proper, predefined way. That means you need to define the way in which you want to arrange your numbers.
How to do all these things in the R software is our main motive. In this lecture and in the further lectures, I will take up some more operations and topics related to sequences of numbers. There are many, many operations possible on sequences.
I will explain some of the most representative ones, which are quite popular and useful, with the help of several examples. So, let us begin this lecture and understand how you can handle sequences in the R software.
(Refer Slide Time: 02:29)
So, what is a sequence? A sequence is essentially a set of related numbers, events, movements or items that follow each other in a particular order, fashion or way. Regular sequences can be generated in R without any difficulty, and the command to generate such sequences is seq.
Inside the parentheses, you write the different values, parameters and related options. There are many options in the syntax of seq, which is written all in lower-case letters, but I am going to consider some popular options which I personally feel will be useful when you handle numerical values.
One of the basic operations on a sequence is to fix the value at which the sequence should start. To control this, we give the option from inside the parentheses, as from = whatever value you want. Here I am writing 1, but you can write any allowed value.
Then you would like to fix how far the sequence goes, that is, what the last value of the sequence should be. This is controlled by the option to, written t-o. You write to = whatever value you want; here I am writing 1, but you can choose anything. So, from and to control the starting point and the end point of the sequence.
The usual way in which a sequence progresses is by 1 unit, like 1, 2, 3, 4, etc. But you may want the sequence to be 1, 2, 3, 4, etc., or 1, 3, 5, 7, etc., or something else. So, you need to specify how the sequence should progress between two consecutive numbers, that is, by what value it should increase.
To control this jump, we have the option by, written b-y, where we give the value by which the sequence progresses. In the usage shown in the help, the default value of by is written as ((to - from)/(length.out - 1)), that is, the difference between to and from divided by length.out minus 1. There are also other options, such as length.out (l-e-n-g-t-h dot o-u-t) and along.with (a-l-o-n-g dot w-i-t-h), and there is a long list of such options.
I will consider a couple of examples, through which I will show you how these options can be used. But I request you to look at the help on seq and understand how things actually work.
If you look at the help for seq, you will get this type of screenshot, and you can see that there are many details of the various operations on sequences. I am leaving this exercise up to you, and I request you once again to at least read what the different options are.
I am not asking you to remember everything, but once you read it, you will know what the different possibilities are, and whenever you need them, you can look into the help and get the right syntax.
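To get a feel for the options mentioned above, one can try a few calls directly at the console. The particular numbers below are only illustrative; the option names from, to, by, length.out and along.with are the ones listed on the help page of seq.

```r
# Open the help page of seq to see all its options
# ?seq

# from and to fix the start and the end, by fixes the step
seq(from = 1, to = 9, by = 2)            # 1 3 5 7 9

# With from, to and length.out, the step is (to - from)/(length.out - 1),
# here (1 - 0)/(5 - 1) = 0.25
seq(from = 0, to = 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# along.with takes the length from another vector (here 3 elements)
seq(from = 10, by = 10, along.with = c("a", "b", "c"))  # 10 20 30
```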
Now, let me take some examples, and through them I will show you how the outcome of a sequence can be controlled according to your requirement. One point you have to keep in mind is that the default increment in a sequence is plus 1 or minus 1, that is, 1 unit, either positive or negative. If you want to generate a sequence with an increment or decrement of 1 unit, you write seq, and inside the parentheses you write from = the value at which you want to start the sequence, and to = the value up to which you want the sequence.
For example, suppose I want the sequence 2, 3, 4. Here 2 is the beginning point and 4 is the end point. The beginning point is indicated by the option from and the end point by the option to, so I can write from = 2, to = 4. Please understand that I am saying to, t-o, not the number 2. You have to be watchful during the entire lecture: whether I mean to (t-o), too (t-double-o) or the number 2, my audio always sounds the same, so just be careful. Now, if you execute this command, it gives the outcome 2 3 4. It is not necessary that the value given in from be smaller than the value given in to. If you want a sequence like 4 3 2, which decreases by 1 unit, you can write seq with from = 4 and to = 2 inside the parentheses.
Once you execute it, you get the values 4 3 2. In the seq command, the values given in the from and to options can also be negative. For example, suppose I want to generate a sequence that begins at -4 and goes up to +4.
For that, I give the command seq, and inside the parentheses I write from = -4 and to = 4. What will happen? The sequence starts at -4 and progresses in steps of 1 unit: -4 -3 -2 -1 0, and after that 1 2 3 4. Here -4 corresponds to from and 4 corresponds to to, and this is the screenshot. Before I move forward, let me show you these examples on the R console, so that you can understand what is really happening and how it looks.
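A sketch of the three commands just described, written out as they might be typed at the console:

```r
seq(from = 2, to = 4)    # 2 3 4
seq(from = 4, to = 2)    # 4 3 2: from larger than to gives a decreasing sequence
seq(from = -4, to = 4)   # -4 -3 -2 -1 0 1 2 3 4
```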
So, I write seq(from = 2, to = 6), and you can see that this gives 2 3 4 5 6. One thing I would like to show you is that even if I simply write seq(2, 6), I still get the same outcome, 2 3 4 5 6.
In the seq command, the first value is matched to the option from, and the second value, after the comma, is matched to the option to (t-o). But my advice to all of you is to always write the complete command, so that you understand it and there is no confusion; I leave this decision to you. Now, if I want a sequence that begins at 6 and goes down to 2, I write from = 6 and to = 2, and you can see that the result is 6 5 4 3 2. You can also write the same thing as seq(6, 2), and it gives the same outcome. Similarly, if you want a sequence that begins at -6 and goes up to 6, you write from = -6 and to = 6, and the outcome is -6 -5 -4 and so on, up to 6.
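A short sketch of the positional form discussed above, where the option names are left out:

```r
# Without option names, the first value is matched to from and the second to to
seq(from = 2, to = 6)    # 2 3 4 5 6
seq(2, 6)                # same result
seq(6, 2)                # 6 5 4 3 2
seq(from = -6, to = 6)   # -6 -5 -4 ... 4 5 6 (13 values)
```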
So, you can see that it is not a difficult operation, and I am sure you are now confident that when I show you these operations on the computer screen and in the slides, they will work. Now, let me take one more operation: I want to generate a sequence with a constant increment other than the default. Suppose I want a sequence that starts at 10 and ends at 20, with an increment of 2 units between the values.
So, the sequence is 10 12 14 up to 20. How do we define this sequence in the R software? You write seq, then from = 10 and to = 20, and then, after a comma, by (b-y) = 2. Here 2 corresponds to the increment of 2 units, from corresponds to 10 and to corresponds to 20. This generates the sequence 10 12 14 16 18 20. You can see that it starts from 10, then 10 plus 2 is 12, 12 plus 2 is 14, and so on. This continues up to 18 plus 2, and as soon as it reaches 20, R stops.
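A sketch of this command. The second call is an extra illustration, not taken from the lecture, showing what happens when to cannot be reached exactly with the chosen step.

```r
# Constant increment of 2 units from 10 to 20
seq(from = 10, to = 20, by = 2)   # 10 12 14 16 18 20

# Illustration (not from the lecture): if to cannot be hit exactly,
# seq stops at the last value that does not go past it
seq(from = 10, to = 21, by = 2)   # 10 12 14 16 18 20
```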
So, that is not difficult. Similarly, suppose I want a decreasing sequence that begins at 20 and ends at 10, with a decrement of 2 units, like 20 18 etc. down to 10. I use the command seq, and inside the parentheses I write from = 20 and to = 10.
Because I want a decrement, I give the value of by (b-y) as minus 2: plus 2 indicates an increment and minus 2 indicates a decrement. If you do this, you get the values 20 18 16 14 12 10. You can see that 20 minus 2 is 18, then 18 minus 2 is 16, and this continues until 12 minus 2 is 10, where the sequence stops. So, once again, it is not a difficult operation.
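A sketch of the decreasing version; diff() is used here only to confirm that each step is minus 2.

```r
# Constant decrement of 2 units: by is negative because from > to
s <- seq(from = 20, to = 10, by = -2)
s         # 20 18 16 14 12 10
diff(s)   # -2 -2 -2 -2 -2
```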
Now, let me show you one more option: the increment or decrement need not be an integer. It can be a fractional value, and it can be positive or negative. So, you can increase or decrease your sequence by any value you choose.
For example, suppose I want to generate a sequence starting at 3 and going down to -2 with a constant decrement of 0.5 units. That means the difference between two consecutive terms should be 0.5 and the values should be decreasing. So, I give the command seq with from = 3 and to = -2, and, because there is a decrement of 0.5 units,
I write by = -0.5. What will happen? The sequence starts at 3, then 3 minus 0.5 gives 2.5, then 2.5 minus 0.5 gives 2, then 2 minus 0.5 gives 1.5, and so on. This continues until you reach the value -2, and if you do it in the R console, you get this type of screenshot.
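A sketch of the fractional decrement, storing the result in a variable s (a name chosen here only for illustration) so that its length can be checked:

```r
s <- seq(from = 3, to = -2, by = -0.5)
s            # 3.0 2.5 2.0 1.5 1.0 0.5 0.0 -0.5 -1.0 -1.5 -2.0
length(s)    # 11 values, since (3 - (-2))/0.5 = 10 steps
```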
Now, let me give you one more option. Suppose you want a sequence of predefined length; that means you know how many elements you want in the sequence, and you know only one of the values, to or from. For simplicity, say that we want the default increment of plus 1.
What would you do in such a case? Suppose I want a sequence that ends at the last point 10 and has 10 values. Counting backwards, the values are 10, 9, 8, 7 and so on, and the sequence stops as soon as it has 10 values.
Without doing any manual calculation, you can generate this sequence in the R software with the command seq. Inside the parentheses, you write to = 10 and then length (l-e-n-g-t-h) = 10. This is the length of the sequence, that is, the number of elements you want in it; R accepts length here as an abbreviation of the option length.out.
As soon as you execute it, you get this type of outcome. Look at the last point: this 10 corresponds to to. Counting the elements, this is my 1st value, this is the 2nd, the 3rd, the 4th, the 5th, the 6th, the 7th, the 8th, the 9th, and this is my 10th value. So, this count of 10 corresponds to the length of the vector, that is, the length of the sequence.
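A sketch of this command. The second call relies on R's partial matching of argument names, which is why writing length instead of length.out gives the same result.

```r
# Only the end point and the number of elements are given; step defaults to +1
seq(to = 10, length.out = 10)   # 1 2 3 4 5 6 7 8 9 10

# length is accepted as an unambiguous abbreviation of length.out
seq(to = 10, length = 10)       # same result
```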
So, you can see that these things are quite easy to do. Similarly, let me take one more example where you want to generate a sequence of length 10, but now you want it to start at 10. Earlier 10 was the end point, but now I want 10 to be the starting point. So, you write seq, and inside the parentheses from = 10 and length (all in lower-case letters) = 10.
As soon as you execute it, you get this outcome. You can see that it begins at 10, so this 10 corresponds to the value of from. Now, 10 is my 1st value, 11 is the 2nd, 12 is the 3rd, 13 is the 4th, 14 is the 5th, 15 is the 6th, 16 is the 7th, 17 is the 8th, 18 is the 9th, and 19 is the 10th value. So, the count of 10 corresponds to length = 10: there are 10 elements in this sequence, beginning at 10. It is not difficult to generate such a sequence in the R software.
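A one-line sketch of this command, with the start point and the length given:

```r
# Start point and length given; the step defaults to +1
seq(from = 10, length.out = 10)   # 10 11 12 13 14 15 16 17 18 19
```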
Now, let me take a combination: I want to generate a sequence of predefined length with a constant fractional increment. You have just used the seq command with from and length; now I will also use the option by in this command. Suppose I want a sequence that starts at the value 10, has 10 elements, and increases in steps of 0.1.
So, I write from = 10, length = 10 and by (b-y) = 0.1, and this gives this type of outcome. You can see that it begins at 10, which corresponds to from = 10, and then this is my 1st value, the 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th, and this is the 10th value.
So, this count of 10 corresponds to length = 10, and the difference between two consecutive terms, for example 10.4 minus 10.3, is 0.1, which corresponds to by. This is how you can generate such a sequence without any problem.
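A sketch of the combination of from, length and a fractional by. The consecutive differences are 0.1 only up to floating-point rounding, so the check uses all.equal().

```r
s <- seq(from = 10, length.out = 10, by = 0.1)
s         # 10.0 10.1 10.2 ... 10.9
diff(s)   # every consecutive difference is 0.1 (up to rounding)
```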
Similarly, suppose I want to generate a sequence like the previous one, but with a constant decrement: the length of the sequence is predefined and there is a constant decrement. I can use either from or to, whichever you want. Suppose I want a sequence that begins at 10, that is, from = 10, and has 10 values.
So, I write length = 10. What about by? by = -2, because I want a constant decrement of 2 units; if you want some other value, you can give that too. This gives the sequence 10 8 6 4 2 0 -2 -4 -6 -8, and you can see that these are the 10 elements.
The first value, 10, corresponds to from, the number of elements corresponds to length = 10, and the difference between two consecutive terms is 2 units with decreasing values, which corresponds to by = -2.
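A one-line sketch of this command:

```r
# Start at 10, 10 elements, constant decrement of 2 units
seq(from = 10, length.out = 10, by = -2)   # 10 8 6 4 2 0 -2 -4 -6 -8
```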
(Refer Slide Time: 20:35)
So, you can see that it is not difficult. Now, in the same example, suppose I want a constant decrement that is not an integer but a fraction. How do we do it? By now, I am sure you have understood how things work, and you can use your common sense to make different types of combinations.
Now I want a sequence that starts at the value 10 and has 5 elements, so the option is length = 5, and I want a constant fractional decrement of 0.2, so I write by = -0.2. What will happen? The sequence begins at 10, and there is a decrement of 0.2, so the next value is 10 minus 0.2, which is 9.8, and this continues.
So, there will be 1, 2, 3, 4, 5 values, which is controlled by length = 5. It is not a difficult operation, and if you do it in the R console, you get this type of outcome. Now, let me show you these things on the R console. Note that here I have taken the examples using from, but you can also use to, and there are many combinations that you can try.
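A sketch of this command, together with the same idea using to in place of from. The second call is an illustration of the remark above, not a command shown in the lecture.

```r
# from, length and a fractional decrement
seq(from = 10, length.out = 5, by = -0.2)   # 10.0 9.8 9.6 9.4 9.2

# The same idea with to instead of from: the sequence now ends at 10
seq(to = 10, length.out = 5, by = -0.2)     # 10.8 10.6 10.4 10.2 10.0
```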
(Refer Slide Time: 22:01)
For example, suppose I want to generate this type of sequence, which begins at 10, goes up to 20, and has a constant increment of 2 units; it looks like this. Now, if you want a constant decrement here, you might simply write by = -2, which is supposed to decrease every term by 2, and you can see what happens.
This is wrong. Why? You have to understand that, mathematically, a sequence cannot start at 10, go to 20, and be decreasing. This is a mathematical contradiction, so R does not return a sequence but stops with an error ("wrong sign in 'by' argument"). I wanted to show you this so that you think before you create a sequence. But if you write that you want to begin the sequence at 20 and go down to 10,
then it is a decreasing sequence, and you can see that it gives these values; there is no issue. Similarly, if you want a sequence starting at 10 and going up to 12 with a constant fractional increment of 0.2, you can see that it looks like this: 10.0 10.2 up to 12.
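A sketch of these console calls. The contradictory call is wrapped in tryCatch() (a choice made here only so the sketch keeps running) to capture the error message instead of stopping; the exact wording of the message assumes an English locale.

```r
seq(from = 10, to = 20, by = 2)     # 10 12 14 16 18 20
seq(from = 20, to = 10, by = -2)    # 20 18 16 14 12 10

# A decrement from 10 up to 20 is contradictory, so seq() stops with an error
res <- tryCatch(seq(from = 10, to = 20, by = -2),
                error = function(e) conditionMessage(e))
res                                  # in an English locale: wrong sign in 'by' argument

seq(from = 10, to = 12, by = 0.2)   # 10.0 10.2 ... 12.0 (11 values)
```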
Similarly, if you want a decreasing sequence starting at 12 and going down to 10 with a decrement of 0.2 units, you can see that it gives this type of output. After this, let us take the commands in which the length is defined.
Suppose I want a sequence of length 10 tied to the value 10. I give to = 10 together with length = 10, and you can see that these are my 10 elements. Similarly, if you want a sequence that ends at 20, you give to = 20, with the length again 10. You can see that it starts at some number, but the end point is given by to.
Similarly, if you want a constant increment or decrement, you can give, say, a constant increment by = 2. You can see that the sequence ends at 20 and reads 2 4 6 up to 20, because the length fixes 10 elements in the data vector.
Now, suppose I use by = -2; let us see what happens. Yes, that works. Why? Because to is fixed at 20, and you are simply asking for 10 elements that are decreasing. So, the sequence starts at 38, goes 38 36 34 and so on, and ends at 20.
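With to fixed, R in effect works backwards from the end point, so the first value is to - (length.out - 1) * by. A sketch of the three console calls:

```r
seq(to = 20, length.out = 10)            # 11 12 ... 20
seq(to = 20, length.out = 10, by = 2)    # 2 4 6 ... 20
seq(to = 20, length.out = 10, by = -2)   # 38 36 34 ... 20
```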
Now, in the same command, suppose I change the value of to to 2 and the step to the fraction 0.5. So, I want my sequence to begin at some point such that it ends at 2, with 10 elements at a constant increment of 0.5. You can see that the final value is 2.0, the values before it go back in steps of 0.5, and the total number of values is 10. If you want a constant decrement, you can give the option by = -0.5, and you can see these values here.
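A sketch of these two commands; in both, the last value is fixed at 2 and the step decides where the sequence starts.

```r
seq(to = 2, length.out = 10, by = 0.5)    # -2.5 -2.0 -1.5 ... 1.5 2.0
seq(to = 2, length.out = 10, by = -0.5)   # 6.5 6.0 5.5 ... 2.5 2.0
```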
Similarly, just as you have used to, you can also use from, say seq with from = 5, length = 10 and by = 2, which increases the values by 2. The step can also be a decrement: if you give by = -2, you get this type of decreasing sequence. So, from can also be combined with a constant increment or decrement.
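A sketch of the from-based version of the same commands:

```r
seq(from = 5, length.out = 10, by = 2)    # 5 7 9 ... 23
seq(from = 5, length.out = 10, by = -2)   # 5 3 1 ... -13
```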
So, if you take a constant increment in the same sequence, you can see it here, and if you want a constant decrement, it is here. All sorts of combinations are possible in the R software to generate different types of sequences, so there should not be any problem now in generating the particular type of sequence that you want.
So, now we come to the end of this lecture. My objective was very simple: I have used only one command, seq, but this command has many applications. How you control its parameters so that you get the outcome you require is a judgment you have to make according to your need.
According to that need, you have to choose the correct values of the parameters from, to, by, etc. Besides these, there are many more possibilities and options, and you can combine any two or three of them to get different types of sequences. So, why don't you go the other way round: first understand what these parameters and options do and how they work, and then take a particular sequence and see how you can generate it. That will be a very interesting exercise.
So, practise, and I will see you in the next lecture with more options for the seq command. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 26
More Sequence and Other Operations
Hello friends. Welcome to the course Foundations of R Software. You may recall that in the last lecture we started a discussion on sequences, and in this lecture we are going to continue the same topic and learn some more operations that can be done on sequences. You may remember that many times in the earlier lectures, whenever I wanted to write values like 1, 2, 3, 4, 5, I wrote 1 colon 5.
At all those places, I told you that we would come to it in a future lecture, and this is the lecture where we learn about such operations. In the last lecture, I took a couple of examples, and through them I tried my best to explain how R reacts to those commands, that is, how R produces an output depending on how we write the command.
In this lecture also, we will take a couple of examples and understand how R works for these sequences. So, let us begin our lecture.
As we understood in the last lecture, a sequence is a set of related numbers, events, movements or items that follow each other in a particular order, and we explored the command seq. I had also asked you to look into the help if you need more information.
Today I am going to take up another aspect. Recall the earlier command: if I write seq with from = 1 and to = 10, what will happen? It gives the values 1 2 3 4 up to 10. Similarly, if instead of from 1 to 10 you write from 10 to 1, it gives the sequence 10 9 8 7 etc. down to 1.
Now, I give you an alternative to the seq command. To generate the numbers from 1 to 10, I can write 1 colon 10, that is, 1:10. The first value gives the starting point, just like from, and the value after the colon gives the end point, just like to.
If you write it in this way, R generates a continuous sequence with a constant unit
increment, and if the first value is larger than the second, it gives a unit decrement
instead. So, 1:10 gives the values 1 2 3 4 5 6 7 8 9 10. Similarly, if you want a decreasing
sequence starting from 10 and ending at 1, you can write 10:1. This starts at 10 and goes
down through 9, 8 and so on to 1.
This is the outcome that you are going to get. One question comes here: what is the
difference between this type of command and the earlier command based on seq? In seq
you had many more options, but with the colon you can only generate a continuous
sequence with a constant unit increment or decrement. That is the difference.
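A minimal sketch of this comparison that you can paste into the R console. The last line, with by = 2, is an extra illustration (not from the slides) of the additional control that seq gives and the colon does not.

```r
# seq() with only from and to: unit steps, up or down
seq(from = 1, to = 10)   # 1 2 3 ... 10
seq(from = 10, to = 1)   # 10 9 8 ... 1

# The colon operator gives the same values
1:10
10:1

# seq() can also change the step size, which ':' cannot
seq(from = 1, to = 10, by = 2)   # 1 3 5 7 9
```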
So, obviously, seq is the more general command and gives you more control over the
values than the colon, but our objective here is to learn both options and then use
whichever suits what you want. Let me take one more example to show you that from
and to need not begin or end at 1. You can choose any values.
Suppose I want to create a sequence beginning at 5 and ending at 15, that is 5 6 7 up to
15. For that I write 5:15. And if you want to start the sequence at 15 and end at 5, giving
15 14 13 12 and so on, that is a decrement.
Here you are not giving an option like by = -1 or by = -2. Whenever the value written
first (the from value) is higher than the value after the colon (the to value), the
decrement is automatic. So, I write this as 15:5.
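A short sketch of the two sequences just described; note that no by argument is needed for the decreasing one.

```r
5:15   # 5 6 7 ... 15
15:5   # 15 14 13 ... 5: decreasing, with no 'by' needed
```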
Before I move forward, let me show you these options on the R console so that you
become more confident. If I want to generate a sequence from 1 to 10, you can see it is
done like this, and a sequence from 10 to 1 can be generated like this.
If you want another sequence, say from 5 to 15, this is the way I write it, and a sequence
beginning at 15 and going down to 5 can be written like this. Now, what happens with
negative numbers, say from -1 to 10? Let us see: yes, R also handles negative numbers.
Now, if you give -1 to -10, what do we expect? The sequence goes from -1 to -10. And if
you give -10 to -1, you can see that this also works.
(Refer Slide Time: 06:12)
After this you will have some confidence, and I will explain these things. This is the
same thing I just showed you on the R console: if you take from and to as negative
values, say -1:-10, you still get a sequence -1, -2 up to -10.
And if you take -10 to -1, that is -10:-1, the values begin at -10 and go up to -1. You can
also take an intermediate sequence that does not start or end at -1. For example, you may
want to start at -5 and end at -15.
Then -5:-15 starts with -5, -6, -7 and so on and ends at -15. Similarly, if you want to
reverse it, starting at -15 and ending at -5, you can write -15:-5, and it gives a sequence
that begins at -15 and ends at -5.
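A small sketch of the negative-valued sequences above. One detail worth knowing: in R the unary minus binds more tightly than the colon, so -1:-10 is read as (-1):(-10).

```r
-1:-10    # -1 -2 ... -10   (read as (-1):(-10))
-10:-1    # -10 -9 ... -1
-5:-15    # -5 -6 ... -15
-15:-5    # -15 -14 ... -5
```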
(Refer Slide Time: 07:22)
That is how you can generate these sequences. I have already shown you this on the R
console, and this is the screenshot of the same operation. I hope you understand it easily;
these are very simple commands.
Now let me show you something more. Up to now, in all my examples I have taken only
integer values for from and to, that is, for the points where the sequence starts and ends.
They could be positive or negative, but all were integers.
Now I give you an example showing that you can also begin with a fractional number, and
the main thing to observe is what R does in such a case at the point where the sequence
has to finish. With this objective I take one more example, in which the sequence begins
at 1.23 and ends at 10, that is 1.23:10.
What will happen? It begins at 1.23 and, since the default increment is plus 1, it goes to
2.23, 3.23, 4.23 and so on; finally it comes to 7.23, 8.23 and 9.23. The next number would
be 10.23, but the sequence has to go only up to 10. So, 10.23 is not included, and the
sequence stops at 9.23.
So, the sequence stops at the last value that is less than or equal to the value given after
the colon sign, here 10. That is why the sequence stopped at 9.23, a value less than or
equal to 10. This is what you have to keep in mind. In this example, the second number
is an integer.
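A quick check of the rule just stated, as a sketch you can run in the console: the colon keeps adding 1 to the starting value and stops before passing the end value.

```r
s <- 1.23:10
s           # 1.23 2.23 3.23 4.23 5.23 6.23 7.23 8.23 9.23
length(s)   # 9: the next value, 10.23, would exceed 10
max(s)      # 9.23
```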
If I take a non-integer end value instead, for example a sequence starting at 1.23 and
ending at 10.54, the logic is the same as I explained. The sequence starts at 1.23 with a
constant unit increment: 1.23, 2.23, 3.23, 4.23 and so on up to 8.23 and 9.23, and from
9.23 it comes to 10.23. After that the next value would be 11.23.
But this value is greater than the end value 10.54 given here. So, the program stops and
does not go to 11.23; instead of reaching 10.54, the sequence stops at 10.23. This is how
R works. Similarly, let us take one more example, where the starting value is greater than
the end value. I create a sequence from 10.54 to 2.23, that is 10.54:2.23.
What will happen now? The sequence begins at 10.54 and goes from 10.54 to 9.54, 9.54 to
8.54 and so on. It continues to 3.54 and 2.54, and after this the next value would be 1.54.
Compare this 1.54 with the end value 2.23: 1.54 lies beyond 2.23, so it is not included, and
the sequence stops at 2.54.
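A sketch of both fractional examples, increasing and decreasing, so you can compare the last values with the end points given after the colon.

```r
up <- 1.23:10.54
up            # 1.23 2.23 ... 10.23 (11.23 would exceed 10.54)

down <- 10.54:2.23
down          # 10.54 9.54 ... 2.54 (1.54 would fall below 2.23)
```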
Similarly, let us take one more example with negative values, like the ones I just showed
you. Suppose I take a sequence beginning at -1.23 and ending at -10, that is -1.23:-10.
What will happen? It begins at -1.23, then -2.23, -3.23, and so on up to -9.23. The next
value would be -10.23, but the sequence has to end at -10.
So, this value is not included: the sequence starts at -1.23 and finishes at -9.23, the last
value that does not go past -10. Similarly, take a combination where the starting value is
negative and the end value is positive.
For example, take -5.23:6. What will happen here? It begins at -5.23, then -4.23, -3.23, and
at some point it becomes positive, at 0.77, and then continues. Finally it comes to 5.77;
the next value would be 6.77, which is greater than 6. So, the sequence stops at 5.77 and
6.77 is not included. This is how the colon works.
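A sketch of the two examples above, one entirely negative and one crossing zero; the end values follow the same stopping rule as before.

```r
neg <- -1.23:-10
neg           # -1.23 -2.23 ... -9.23 (-10.23 would go past -10)

mixed <- -5.23:6
mixed         # -5.23 -4.23 ... -0.23 0.77 ... 5.77 (6.77 would exceed 6)
```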
(Refer Slide Time: 12:02)
This is the screenshot of all these operations. Before we move further, let me show you
these things on the R console. I take the same example.
Suppose I write 1.23:10; the result comes out like this, and the sequence stops at 9.23.
Similarly, if I start at 10 and go to 1.23, the result is 10 9 8 7 down to 2. After this the
next value would be 1, which is lower than 1.23, so the sequence stops at 2.
Similarly, let us take the example where both numbers are fractions. If I write
1.23:8.98, you can see that it goes up to 8.23 and then stops. Similarly, 8.98:1.23 works
in exactly the same way: it begins at 8.98 and finishes at 1.98, because the next value
would be smaller than 1.23.
Now let us take some negative values too. -1.23:10 gives this, and if both values are
negative, as in -1.23:-10, the sequence begins at -1.23 and ends at -9.23. Whatever the
signs of the two values, it works, because R automatically detects whether the sequence
has to increase or decrease.
I have taken similar types of examples here, so now you should be confident.
(Refer Slide Time: 13:49)
Now, some more basic operations. When you write seq(10), with the value 10 inside the
parentheses, you get a sequence beginning at 1 and ending at 10. This is the same as
writing 1:10 or seq(from = 1, to = 10).
They all give the same result. I just wanted to show you this so that, if you use it
somewhere, you know what the outcome is going to be.
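A minimal sketch confirming that the three forms produce the same values; the names a, b and d are just illustrative.

```r
a <- seq(10)                 # a single number n gives 1, 2, ..., n
b <- 1:10
d <- seq(from = 1, to = 10)
a; b; d                      # all three print 1 2 3 ... 10
```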
Second, the arguments of a sequence can be given in terms of variables, so the sequence
can be controlled by a variable. So far we have given fixed values for from, to, and so on,
but these values can also be variables. This is really going to help you when you write
programs.
For example, I take x = 2, where x is some variable, and then define a sequence in terms
of this variable: a sequence from 1 to x. Whatever value of x we give as input is used
here.
Then there is a third value, and with it I want to show you something: even though I am
not writing from, to and by, if you write three values separated by commas in the seq
command, the first value is taken as from, the second as to and the third as by, by default,
that is, by position.
So, essentially you are writing a sequence beginning at 1 and going up to 2, with the
third value x/10, so the increment is 2/10 = 0.2. This gives the sequence 1, 1.2 up to 2.
Similarly, if you change the value of x to 50 and want a sequence beginning at 0 and going
up to 50, the by value, that is the increment, is 50/10 = 5 units.
The sequence then begins at 0, goes up in increments of 5 (5, 10, ...) and finishes at 50.
These are very simple operations based on what you have learnt so far: you know how to
define a variable, you are now learning how to define a sequence, and now you are
combining the two.
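A small sketch of these two cases, assuming (as described) that the increment is written as x/10 in both and that the second sequence starts at 0. The three values are passed by position, so R reads them as from, to and by.

```r
x <- 2
seq(1, x, x/10)    # positional: from = 1, to = 2, by = 0.2
                   # 1.0 1.2 1.4 1.6 1.8 2.0

x <- 50
seq(0, x, x/10)    # from = 0, to = 50, by = 5
                   # 0 5 10 ... 50
```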
(Refer Slide Time: 16:23)
These concepts are going to be really useful when you do programming. Similarly, just as
an example, I show you that the outcome of a sequence can also be stored in a variable. I
am taking examples whose output I can show you on the screen in bigger fonts.
Suppose I take a sequence x beginning at 1, so from = 1, ending at to = 50, and increasing
by 0.5, which I give as by = 1/2. You can see the outcome, starting at 1 and ending at 50.
Now I multiply all the values in the sequence by 2 and store them in a new variable y.
So, I write y = 2 * x, and you can see that all the values are multiplied by 2: the value 1
multiplied by 2 gives 2, the value 2.5 multiplied by 2 gives 5, and so on, and finally the
value 50 multiplied by 2 gives 100.
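A sketch of storing a sequence and then transforming it element by element; here x holds the sequence and y its doubled values.

```r
x <- seq(from = 1, to = 50, by = 1/2)   # 1.0 1.5 2.0 ... 50.0 (99 values)
y <- 2 * x                              # every element doubled: 2 3 4 ... 100
head(y)      # 2 3 4 5 6 7
tail(y, 1)   # 100
```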
(Refer Slide Time: 17:31)
So, the outcome of a sequence can also be stored in a variable, and this is the screenshot
of the same operation. Before I move further, let me show you these operations on the R
console so that you become more confident.
If I write seq(10), it gives me these values; if I write 1:10, it gives me the same values;
and seq(from = 1, to = 10) also gives the same values. So, you can use whichever you
want, depending on your need and requirement.
After this, let me show you the example in which I use a variable. I choose a variable x,
say x = 5, and then give a command like seq(from = 1, to = x, by = 2/x); you can see
these values.
Here x is 5 and by is 2/5 = 0.4. Even the to value can be given in terms of the variable, for
example as 3 * x, and you can see these values. If you want to store these values, I can
store the result of this command as y.
So, y is this sequence, and typing y recalls these values. After that I define a new
sequence z = 2 * y. Every value is multiplied by 2, and you can see z here.
(Refer Slide Time: 19:03)
So, 1 multiplied by 2 becomes 2, 1.4 multiplied by 2 becomes 2.8, and so on, and all
these values are stored in the new variable z. You can see that all these operations are
possible. Now I give you one more aspect of the sequence command.
This gives you one more operation by which you can do many data manipulations: the
along option. I want to show you, first, what the application of along (a l o n g) is, and
then how you can assign values to an index vector. Do you know what an index vector
is? Whenever you read a book, there is an index whose entries indicate locations.
The index says, for example, chapter 1 is on page 20, chapter 2 is on page 40, and so on.
Similarly, when you consider a data vector, every value in it has a location, and that
location can be indexed. What is the advantage? Take the example of your book: suppose
you want to read chapter 2.
You see from the index that chapter 2 is on page 40, so you simply jump to page 40. In
other words, the index tells you the location of chapter 2: at entry number 2 of the index,
the value is page 40.
Let us consider a very simple example to illustrate all this. Consider a data vector x with
4 values, 9 8 7 6. These are the values, and each of them has a position.
In this data vector I can assign every number a seat: seat number 1, seat number 2, seat
number 3 and seat number 4. These are the positions of the values 9 8 7 6, which I have
written like this. This is actually the index, and if I combine all these positions in a
vector, it is called an index vector.
Now, the question is how you can create such an index vector. For that you use the
command seq, and inside the parentheses you write along (a l o n g, all in lower case)
= x, that is seq(along = x). What happens? along reads the data vector x and assigns the
values 1 2 3 4 and so on, depending on the number of values in the data vector. So, the
outcome here is 1 2 3 4.
This outcome is stored in the variable ind. It indicates that index number 1 corresponds to
the first value in the data vector x, index number 2 to the second value, index number 3
to the third value, and index number 4 to the fourth value in the data vector x. So, what
is the advantage?
Suppose you want to know which value is located at index number 2. You can see that at
index number 2 the value is 8, but I am finding it manually, and I want to do it using R. To
access a particular value of a vector through the index vector, you first create the index
vector and then write the position in the index vector that you want to access.
You write ind and, inside square brackets, the position in the index vector, that is ind[2].
Then you write the name of the data vector x in which you want to find the value, and put
ind[2] inside its square brackets: x[ind[2]]. So, the structure is: the data vector, square
brackets, and inside them the index vector with its own square brackets containing the
position.
This gives the value 8. So, what is happening? Working from the inside out, ind[2] picks
the entry at position 2 of the index vector, and x[...] then returns the value of the original
data vector at that position.
This is how you can access a value: in this case, the value in the data vector at index
number 2, that is, at the second position of the index vector.
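A minimal sketch of the index-vector example. Writing along = x works because R partially matches it to the full argument name along.with; base R also has the dedicated function seq_along(x), shown at the end, which gives the same result.

```r
x <- c(9, 8, 7, 6)          # data vector
ind <- seq(along = x)       # index vector: 1 2 3 4 ('along' partially matches 'along.with')
ind
x[ind[2]]                   # value at index number 2: 8

seq_along(x)                # the dedicated base R function, also 1 2 3 4
```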
(Refer Slide Time: 24:19)
This is a very simple command, and I will show it to you on the R console, but it is more
important for you to understand what is really happening. After this, I would like to
show you some more types of sequences, such as sequences of dates; for example, you
can create sequences with respect to days, months, years, etc.
But before that, I want to show you that in R it is possible to get the current time and
date. Once you have this idea, in the next lecture I will show you how to generate
sequences of dates in terms of years, months, etc. This type of command is very useful
when you are generating a report. First let me show you what I am trying to do, and then
I will explain its utility.
If you want the current time and date, you have the command Sys.time. This is the time
of the system, where system means the computer on which you are working. The syntax
is: capital S, then y and s in lower case, then a dot, then time (t i m e) all in lower case,
followed by parentheses: Sys.time(). This gives you the current time and date from the
computer system.
When I execute Sys.time(), it gives me this type of outcome. Here 2021 is the year, 11 is
the month number, that is November, and 29 is the day. So, the date is
29-November-21, and the time is given as hours, minutes and seconds, followed by the
time zone, IST.
Certainly, when I show it to you on the R console you will get a different value, because
this output gives the time when I prepared the slides and used this command. So, when
you use it on your computer, do not expect the same output; it will give you the time and
date at which you run the command.
Similarly, if you want only the date, the command is Sys.Date, but you have to be careful
when you write it: capital S, lower-case y, lower-case s, then a dot, then a capital D
followed by ate in lower case, and then the parentheses: Sys.Date(). If you execute this
command, with S and D in upper case, it gives a value like this. It gives me the same date
I showed you here, 29-November-21, and this screenshot shows the time when I
prepared the slide.
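A small sketch of both commands. The printed values will of course depend on when and where you run it; the format string in the last line is only an illustration of writing the date as day-month-year.

```r
now <- Sys.time()     # current date and time of the computer
today <- Sys.Date()   # current date only
class(now)            # "POSIXct" "POSIXt"
class(today)          # "Date"
format(today, "%d-%B-%y")   # e.g. day, full month name, two-digit year
```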
Now let me show you these commands on the R console also, so that you become more
confident.
I create a data vector x = c(19, 10, 12), and now I create the index vector like this. If
you want to know, first, what is at index number 3, let us see what happens: it is 3,
because the index vector is 1 2 3.
When you write only ind, the name of the index vector, with 3 inside the square brackets,
you get the value stored at the third position of the index vector. Now you want to know
what is at the third position in the data vector x. So, I write x and then this index position
inside the square brackets, and it gives 12.
This is the third position in the data vector x, whose value is 12. Similarly, if you ask for
the value at position 1, you see that the value at the first position in the data vector is 19.
When you write it like this, R first goes to the first position of the index vector and then
finds the value of x at that location.
So, this is not a very difficult command. Next, I give you the two commands Sys.time and
Sys.Date.
When I run Sys.time(), it gives me the time and date at which I am recording this video.
The parentheses are essential: if you type the name without them, R does not return the
time; it only prints the definition of the function. So, you have to write it exactly like this,
and then it gives this value.
Here it is 30-November-21, and I am recording this video at 12:21:07, that is 12 hours 21
minutes 07 seconds IST. Similarly, Sys.Date without parentheses again does not give the
date, but with the parentheses, Sys.Date(), it gives the date 30-November-2021. You can
see that it is not a very difficult thing to learn.
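A sketch of the difference the parentheses make: without them the name refers to the function object itself, and with them the function is actually called.

```r
# No parentheses: R prints the function's definition, not the time
Sys.time
is.function(Sys.time)   # TRUE

# With parentheses the function is called and returns a value
Sys.time()
Sys.Date()
```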
So, what have we done today? We considered almost the same kind of sequence we learned
in the last lecture, but generated it in a different way. Whether you write sequences with
the colon sign or with the seq command is your decision; it depends on what you really
want to do and what type of sequence you want to generate. As long as you need a
simple sequence with unit steps, the colon will work.
But if you want more options, the seq command is there. I have also told you the
commands for finding the time and date. They are very useful when you write a program
to generate a report; for example, you have given some assignment to your office staff
and you want to know when they completed it and when they generated the report.
You will write many commands, and if you include Sys.time() and Sys.Date() among
them, the date and time will be recorded automatically. That gives you a foolproof way to
identify when the report was generated. So, take some examples, practise them, and I will
see you in the next lecture with more commands on sequences. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 27
Sequences of Dates and Alphabets
Hello friends, welcome to the course Foundations of R Software. You can recall that in the
last two lectures we were talking about sequences, and we considered different commands
and their options for generating various types of sequences in R. In this lecture we
continue with the same topic, but we learn how to generate sequences of a different type,
namely sequences of dates and sequences of alphabets.
Up to now, you have considered only numerical sequences. So, the first question is: why
should I learn this? Whenever you handle different databases and want to generate a
report or manipulate a data file, you may sometimes need particular dates in a sequence,
like January 21, February 21, etc. How do you do that in R? Similarly, you have alphabets
like a b c d, in lower case and upper case, and sometimes we want to generate sequences
of these alphabets. How do we get that done? So, let us begin this lecture and learn it. I
will take some examples, and through them I will explain how you can generate
sequences of dates and alphabets.
So, let us begin our lecture. You can recall that in the last two lectures we learned the
sequence command, seq, for which you need a couple of ingredients: where you want to
begin the sequence, where you want to stop it, and what the increment is, which you
control through by, along with some other options. We also learned how to use the colon
to generate a sequence.
In this lecture, we first want to learn how to generate sequences of dates. For example, a
sequence of dates could start on 30 November 2021, and you may want to generate 5 dates
where the increment is by day, by week, by quarter or by year. If you want 5 dates
incremented annually, they will be 30 November 2021, 30 November 2022, 30 November
2023, 30 November 2024 and 30 November 2025.
This is what a sequence of dates looks like. The basic command to generate it is the same
seq that we learned earlier, but the way you give options like from, to, by, etc. is a little
different. As with the numeric seq command, first you give from.
Here from is the starting date, and it is a compulsory argument: you must give this value.
Next you give to, the end date, that is, the date at which you want to finish the sequence;
this is optional. Then the third value is by.
by can be day, week, month, quarter or year. What is the use? If you want a sequence in
which the dates change by day, you write by = "day", with day inside double quotes.
Similarly, if you want dates on a weekly basis, you write by = "week", inside double
quotes, and so on.
Similarly, you have length.out and along.with. length.out is an integer that gives the
desired length of the sequence, and along.with takes the length of the sequence from the
length of its argument. We will take some examples to understand how to give these
inputs so that you get output of the required type.
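A minimal sketch of length.out and along.with for dates, reproducing the five yearly dates from 30 November 2021 mentioned earlier. The weekly example and the vector labels are purely illustrative, not from the slides.

```r
start <- as.Date("2021-11-30")

# 5 dates, one year apart, using length.out instead of an end date
seq(start, by = "year", length.out = 5)
# "2021-11-30" "2022-11-30" "2023-11-30" "2024-11-30" "2025-11-30"

# Weekly dates; along.with takes the length (3) from another vector
labels <- c("first", "second", "third")
seq(start, by = "week", along.with = labels)
# "2021-11-30" "2021-12-07" "2021-12-14"
```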
First, let us consider how to generate sequences of years; suppose I want the sequence of
the first days of years. The way I write these commands, the first value inside the
parentheses is from, the second value is to and the third value is by. The commands are
quite long, so I write them in this shorter, positional form so that they fit on my screen.
That is the only reason; otherwise you can always write from = ..., to = ... and by = ...,
which is actually the proper approach.
Suppose I want to generate dates from 1 January 2010 to 1 January 2017, changing yearly.
How do we give this input? It is not difficult to understand. First I write from, and for the
date 1 January 2010 I write "2010-01-01".
So, this is the year, this is the month and this is the day, and it is written inside the
double quotes. Then we want R to read this value as a date, right, because otherwise it
could be read simply as a character string. To inform R to read it as a date we use the
command as.Date, but you have to be very careful: “as” is in lower case, the D of Date is
a capital letter, and all the other letters are in lower case, so it becomes as.Date.
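To see what as.Date changes, here is a minimal R sketch; the variable names x and d are chosen only for illustration. Note that base R has no function as.date in lower case, so the capital D matters.

```r
x <- "2010-01-01"   # a plain character string
class(x)            # "character"

d <- as.Date(x)     # the same text read as a calendar date
class(d)            # "Date"

# Because d is a Date, date arithmetic works: 31 days later is 1st February 2010
d + 31
```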
So, now, you have already used a couple of such commands, like as.numeric,
as.character, etc. So, you now know what as.Date means. Just as you learnt about
as.numeric and as.character, we have as.Date, and this is the way I would like you to
learn the course content in this lecture, because if I try to cover each and everything it
will be a very long lecture.
So, I am trying my best to give you the basic fundamentals. Once I have told you how
to use commands such as as.numeric and as.character, it should be obvious that
as.Date simply converts the given value into a date, right. So, I give from, and then I
give to, also as a date, written in the same format: “2017-01-01”.
Now, after this I have to write by. Because I need these dates to be incremented by
years, I write “years” inside the double quotes, and the outcome is: 1st January 2010,
then 1st January 2011, then 1st January 2012, 2013, 2014, 2015, 2016 and finally
1st January 2017, and this is the screenshot. So, you can see this is not a very
difficult thing to understand.
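The yearly sequence from the slide can be reproduced with a short R sketch; only the variable name yearly is an illustrative choice.

```r
# First day of each year from 2010 to 2017
yearly <- seq(as.Date("2010-01-01"), as.Date("2017-01-01"), by = "years")
print(yearly)
length(yearly)   # 8 dates
```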
Let me give you some more examples and then I will show you on the R console.
Similarly, if you want to take the same example and change years to days, months,
etc., you can do it without any problem, right.
But now I want to generate a sequence of days where I know the starting date but not
the last date; my requirement is that I need 6 dates, right. So, suppose, for example, I
want to generate 6 dates starting with 1st January 2017. What will those dates be?
1st January, 2nd January, 3rd January, etc.
In order to do that, recall that in the case of sequences you learnt how to use the length
option to generate a sequence of the required length when you know that length. I am
going to use the same command here: I write seq, and inside the parentheses I first give
as.Date(“2017-01-01”). This is going to be your from.
So, this is the same format as I explained to you, right, and then you have to write by is
equal to “days” and then length is equal to 6. You can recall that earlier you used this
length option to inform R how many values are required in the sequence.
So, the same thing I am doing here: I am informing R that I need 6 dates beginning
from 1st January 2017, and you can see the result is 1st January 2017, then 2nd January
2017, 3rd January, 4th January, 5th January and finally 6th January 2017, and this is
the screenshot, right. So, you can see that we have the same seq command, but now it is
used in a different way for a different job.
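A minimal R sketch of the same six daily dates; it spells out the full argument name length.out, which the lecture's length = 6 matches by partial name matching. The variable name days6 is illustrative.

```r
# Six consecutive days starting on 1st January 2017
days6 <- seq(as.Date("2017-01-01"), by = "days", length.out = 6)
print(days6)
```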
Similarly, suppose I want to generate six dates which change with respect to the
month. Here I used by = “days”, so I change it to by = “months”. You can see that the
starting date, from, is 1st January 2017, and I want six dates changing month by month.
So, I write length is equal to 6. You can see the first date is 1st January 2017, then
1st February, 1st March, 1st April, 1st May and finally 1st June 2017, right, and these
are 6 values. So, this is not a very difficult thing; the only thing you have to understand
is how to input the value of the dates and then how to use the different options for by,
right.
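The monthly version differs only in the by argument; a short R sketch (variable name months6 is illustrative):

```r
# Six dates, one month apart, starting on 1st January 2017
months6 <- seq(as.Date("2017-01-01"), by = "months", length.out = 6)
print(months6)
format(months6, "%m")   # month numbers "01" to "06"
```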
Similarly, in the same example, instead of monthly dates suppose I want annual dates,
starting from 1st of January 2017, and I want to generate 6 dates. So, the length is
going to be 6 and I want to increase the date with respect to “years”. The outcome is
1st January 2017, 1st January 2018, 2019, 2020, 2021 and 1st January 2022, and you
can see this in the screenshot.
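An R sketch of the six annual dates; the year labels confirm that 2020 is included. The name years6 is illustrative.

```r
# Six annual dates starting on 1st January 2017
years6 <- seq(as.Date("2017-01-01"), by = "years", length.out = 6)
print(years6)
format(years6, "%Y")   # "2017" "2018" "2019" "2020" "2021" "2022"
```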
(Refer Slide Time: 11:05)
So, this is not a very difficult thing to do; you only have to understand how things
happen. Whatever output you get for the dates can also be saved in a variable, and these
values can be used in programming. Let me give you an example: suppose I want to
generate the dates beginning from 1st January 2016 up to 1st January 2017, right.
So, I store the start date in a new variable, start date, and the final date in another
variable, end date, right. Then I write the sequence from end date to start date, and now
I want to show you one more option. If you look at by, I have written “-1 month” here,
right; this option is also available. So, see what happens if you write by = “-1 month”,
within the double quotes.
The first value comes out to be 1st January 2017, and then it goes in the reverse
direction: 1st December 2016, 1st November 2016, 1st October 2016 and so on, finally
down to 1st January 2016. So, 1st January 2016 is the start date and 1st January 2017 is
the end date. This works just like the decreasing numeric sequences you created earlier.
There you wrote from 10 to, say, 2 with by = -1 or -2. Similarly, when you write the
sequence with the start date, the end date and by set to a year or a month, the outcome
can be saved in another variable; for example, here I have taken the variable to be out,
right.
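The reverse monthly sequence can be sketched in R as below. The names start_date and end_date are assumed spellings of the "start date" and "end date" variables mentioned in the lecture; out matches the name used on the slide.

```r
start_date <- as.Date("2016-01-01")
end_date   <- as.Date("2017-01-01")

# Go backwards one month at a time, from end_date down to start_date
out <- seq(end_date, start_date, by = "-1 month")
print(out)
length(out)   # 13 dates: January 2017 back to January 2016
```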
And this is the screenshot of the same operation which I just showed you. Now let me
show you these operations on the R console, so that you become more confident that
these things really work. First I compute the dates from 1st January 2010 to 1st
January 2017.
The dates are increasing annually, and we can see that this works.
(Refer Slide Time: 13:23)
Well, I can reduce the font size so that you can see it easily, right. This is how you
can change the layout of your screen. Similarly, for this command, you want to generate
dates which change by days and you want six of them. So, you can write starting from
1st of January, and it gives you six days.
And similarly, if you want to do it by months or by years, you can write by = “months”
and see what happens with months. Similarly, if you want to do it by years, you can
change by to “years”, and it gives you six dates: 1st January 2017, 2018, 2019, etc.
So, this is not a very difficult thing to do; you only have to understand how to do it.
Now let me show you this example, which is very interesting. I store the variable start
date as 1st January 2016 and the variable end date as 1st January 2017, and suppose I
want to store the values from end date to start date by “-1 month”. You can see that out
looks like this.
So, you can see this is the same outcome, right. Now, suppose I rewrite this command
and give only by = “month”. What do you expect will happen? There is going to be an
error. Why?
You see, the end date comes first and the start date comes later, so a sequence that
increases month by month can never reach the start date. So, do not make this type of
mistake. If you really want to do it, you have to write the sequence from start date to
end date, and then write by = “months”, and you will get these values.
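A minimal R sketch of both cases: the wrong order raises an error, which tryCatch captures here so the script keeps running, and the correct order gives the increasing sequence. The variable names start_date, end_date and res are illustrative assumptions; out1 follows the lecture.

```r
start_date <- as.Date("2016-01-01")
end_date   <- as.Date("2017-01-01")

# Wrong direction: a positive monthly step cannot go from a later date to an
# earlier one, so R signals an error; tryCatch returns its message instead
res <- tryCatch(seq(end_date, start_date, by = "month"),
                error = function(e) conditionMessage(e))
print(res)

# Correct order: from the earlier date to the later date
out1 <- seq(start_date, end_date, by = "months")
print(out1)
```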
And if you want to store these values, you can give them a new variable name, say out1,
and you can see that out1 contains the same dates, now in increasing order, right. So,
that is how R actually works. You can see that generating dates is not a very difficult
thing. Let me increase my font so that when I come back next I can show you things
more clearly.
So, I hope you have understood how to generate different types of sequences of dates.
Now I come to another aspect. You know the alphabets A, B, C, D, etc. These are upper
case alphabets, or in common language, capital letters. Similarly, you have small a,
small b, small c, small d, etc.; these are lower case alphabets, called small letters in
common language.
In R, you can also generate sequences of alphabets, and how to do it is what we aim to
learn here. The first command is letters, all in lower case; this gives you all the
alphabets in lower case: small “a”, small “b”, up to “z”, and this is the screenshot.
(Refer Slide Time: 16:42)
Similarly, if you want to generate a sequence of these alphabets, the question is how to
do it. First, let me show you something. You see, “a” comes here and “b” comes here.
So, what are the positions, or locations, of these alphabets? The position is 1 for “a”,
2 for “b”, 3 for “c”, and so on.
So, this is just like an index for these alphabets. The alphabets can be called by their
location, and a sequence can also be generated using their locations. So, what can you
do to generate a sequence or to call a particular alphabet?
You can simply use the command letters, all in lower case, and then inside the square
brackets write from:to, that is, the starting point in the index, or location, then a
colon, and then the end point, right.
So, for example, if I want the lower case alphabets occurring at the first, second and
third positions, I write letters and then, inside the square brackets, 1:3. These are the
locations of your alphabets. So, the first 3 alphabets, a, b and c, whose locations are 1,
2 and 3 respectively, will come out here, right.
Similarly, if you want a sequence of lower case alphabets from position number 3 to
position number 1, you write letters and then, inside the square brackets, 3:1, right,
and this gives you 3 alphabets: “c”, “b” and “a”. You can also ask for any arbitrary
positions. For example, suppose you want to know which alphabets are at locations 21
to 23.
You write letters with 21:23 inside the square brackets, and this gives “u”, “v” and
“w”. Similarly, if you want a particular alphabet at a given location, you simply write
letters in lower case and then 2 within the square brackets, right, and this gives you the
second alphabet, that is, “b”.
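The four indexing examples above can be checked with a short R sketch:

```r
letters[1:3]     # "a" "b" "c"
letters[3:1]     # "c" "b" "a"
letters[21:23]   # "u" "v" "w"
letters[2]       # "b"
```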
So, this is not a very difficult thing to understand, but before going further let me
show you these things on the R console.
(Refer Slide Time: 19:17)
So, letters gives you this output, and remember one thing: if you write only letter,
without the s, you will get an error, because it does not exist. Now, suppose you want to
know which alphabet is at the 17th position in letters; what will happen?
It is “q”. And if you want the letters from position 14 to position 17, you can see they
are “n”, “o”, “p” and “q”. You can check this: the 13th position is “m”, so the 14th is
“n”, the 15th is “o”, the 16th is “p” and the 17th is “q”. Similarly, if you want them
from 14 down to 7, in reverse order, that can also be done, right.
And suppose you want to know the letter at position 18; you can see this is “r”.
Similarly, the letter at position 26 is “z”. Now, what will happen if you give any other
number?
(Refer Slide Time: 20:13)
Suppose I say 27; there is NA. Do you know why? Because there are only 26 alphabets,
so if you give any index after 26, which is 27 here, R says that it is not available,
right, ok. So, it is not a very difficult thing. Similarly, whatever we have done for the
lower case alphabets can be done for the upper case alphabets in exactly the same way.
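The console checks above, including the NA for an index past 26, can be reproduced in R:

```r
length(letters)      # 26
letters[17]          # "q"
letters[14:17]       # "n" "o" "p" "q"
letters[27]          # NA: there is no 27th letter
is.na(letters[27])   # TRUE
```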
In order to generate the upper case alphabets we have the command LETTERS, with
everything written in capital letters, and if you execute it on the R console you get all
the alphabets: capital A, capital B and so on. In the same way, capital A is at location
number 1, capital B is at position number 2, capital C is at position number 3 and so on.
So, similarly, we can generate sequences of upper case alphabets exactly as we did for
lower case alphabets. Suppose I want the upper case alphabets located at positions 1 to
3. I write LETTERS, all in upper case, then the square brackets, and inside them 1:3.
This gives me 3 letters: “A”, “B” and “C”.
Similarly, if I want to go in the reverse direction, I type LETTERS with 3:1. This gives
me 3 alphabets in reverse order, at the third, second and first positions, that is, “C”,
“B” and “A”. Similarly, if you want the alphabets at locations 21 to 23, you write
LETTERS in upper case and then 21:23 inside the square brackets, and it gives you
“U”, “V” and “W”.
And similarly, if you want the alphabet at location number 2, you write LETTERS with
2 inside the square brackets, and it gives you the alphabet capital “B”, which is at the
second location. So, this is not a very difficult thing, but let me show it to you on the R
console so that you gain more confidence.
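The same indexing applies to LETTERS; the last line is an extra check added here showing that LETTERS is the upper case version of letters.

```r
LETTERS[1:3]     # "A" "B" "C"
LETTERS[3:1]     # "C" "B" "A"
LETTERS[21:23]   # "U" "V" "W"
LETTERS[2]       # "B"

# LETTERS is the upper case version of letters
identical(toupper(letters), LETTERS)   # TRUE
```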
So, I write LETTERS, and this gives you the 26 alphabets. Now, if you want LETTERS
from 1 to 4, this will be “A”, “B”, “C”, “D”, and similarly the letters from location
number 4 to 2 will be “D”, “C” and “B”. And if you want to see what is at location
number 21, you can see this is “U”; similarly, the alphabet at location number 26 is “Z”.
Now, once again, if you ask for the alphabet at position 27, this will give you NA,
because there are only 26 alphabets. So, this is how you can work with sequences of
alphabets. Now we come to the end of this lecture, and you can see that we have
considered different types of sequences: of numbers, of dates and also of alphabets.
These sequences are going to help us in different ways, which you will realize when you
start doing real-life programming, right. But my objective was to make you learn the
basic ingredients of programming, that is, these different components.
So, if you learn all these components, then whenever you need them you can use them.
My request to you all is this: we have used the seq command to generate different types
of sequences of dates, which are monthly, annual, by days, etc.
So, why don't you experiment with all the options which I showed you in the slide; I
have used only a couple of them here. Then go back to the lecture where we learnt about
sequences and used different options such as from, to, by, etc. Why don't you try to
implement those concepts here and see what happens? The more you practise, the better
you learn. So, practise it and I will see you in the next lecture; till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 28
Repeats
Hello, friends. Welcome to the course Foundations of R Software. In this lecture we are
going to begin a new topic, which is also related to data manipulation: Repeats. As the
name suggests, repeat means you want to do something again and again. So, now we are
going to learn how to repeat some numbers or a sequence of numbers.
Well, these are very small operations, but they are always required when you do bigger
programming and write longer programs with a complicated structure; then these types
of things help you a lot. So, let us try to understand what repeat is and how it works,
and I will once again explain it through various examples so that you can understand it
easily. So, let us begin our lecture.
So, we have the function rep; rep is the short form of repeat. This rep function
replicates a numeric value, or the values of a vector, a specified number of times, right.
Suppose you want to repeat the value 2 five times, or you want to repeat a sequence
three times, and so on. Whatever you want to repeat, the syntax is that you write rep
and, within the parentheses, the value or the vector that you want to repeat.
So, for example, if there is a vector x and you want to repeat it, just write rep(x). Now,
you have two options: whether you want to repeat the entire sequence or you want to
repeat the values in the sequence. Based on that, we have two options with the rep
command: you can use the option times = n or the option each = n.
If you use times, it is going to repeat the whole data vector n times. And if you use
each = n, it will repeat each cell, or element, n times.
So, what this means and how it works, let me take some examples and show you. Before
I move further, let me inform you that there are three possible ways to get an output of
a desired length, say n: one is rep(x, length.out = n); or you write rep(x, length = n);
or you write rep_len(x, length.out = n). They all give the same output: they repeat x
until the desired length is reached, right.
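The three ways of asking for a given length can be compared directly in R; x = c(1, 2, 3) and n = 7 are arbitrary illustrative values, and r1, r2, r3 are hypothetical names.

```r
x <- c(1, 2, 3)
n <- 7

r1 <- rep(x, length.out = n)   # full argument name
r2 <- rep(x, length = n)       # partial name, matched to length.out
r3 <- rep_len(x, n)            # dedicated function rep_len(x, length.out)

print(r1)                      # 1 2 3 1 2 3 1
identical(r1, r2) && identical(r1, r3)   # TRUE
```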
(Refer Slide Time: 03:26)
So, in case you want to learn more about the rep command, my suggestion is that you
look into the help: write help and, inside the parentheses, rep within double quotes,
that is, help(“rep”). You will get many options and all the details about the rep
command, because I will be choosing only some selected options, those which are
commonly used.
So, first let me show you how you can use the rep command to repeat an object n times.
Suppose I want to repeat the value 3.5 ten times. I write rep and, within the parentheses,
the number 3.5 which I want to repeat, which is x here, and then the option times = 10.
You can see this repeats the value 3.5 ten times: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, right.
And similarly, suppose you write rep and, within the parentheses, 1:4. What does 1:4
mean? You know it is 1, 2, 3 and 4; this is your x, a vector, and then you write 2. So, it
is going to repeat it 2 times. You can see 1 2 3 4 once and 1 2 3 4 a second time. So, the
rep command can be used for a scalar like 3.5 and for a vector like 1 2 3 4, and this is
the screenshot.
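Both examples can be reproduced in R; the names a and b are illustrative.

```r
a <- rep(3.5, times = 10)   # the scalar 3.5 repeated ten times
b <- rep(1:4, 2)            # the vector 1 2 3 4 repeated twice
print(a)
print(b)                    # 1 2 3 4 1 2 3 4
```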
So, after this I illustrate the difference in the outcome when we use the options times
and each, right. I use them on the same input and explain how their outcomes look. I
take the data vector 1:4; it has 4 values, 1, 2, 3 and 4. Now, I repeat this x using the
option times = 3 and then each = 3. See what the outcome looks like.
When I use the option times, you can see the repetition once, twice and three times, so
the entire sequence is repeated 3 times. Now I use the option each, as each = 3. What
will happen now? The first element, the element in the first cell of the data vector,
which is 1, is repeated 3 times. So, 1 is repeated 3 times, 2 is repeated 3 times, 3 is
repeated 3 times and 4 is repeated 3 times.
So, you can see how their outcomes differ, and this is the screenshot of the same
operation. Let me first show you these things on the R console and then I will show you
something more.
So, if I say rep(2.7, times = 10), you can see it is repeated 10 times; and even if you do
not write the option name times but simply write 10, you get the same result.
Now, suppose we repeat the sequence 1 to 4 with times = 3. You can see 1 2 3 4, 1 2 3 4
and 1 2 3 4; it is repeated 3 times.
And if you do the same thing without writing the option name, simply rep(1:4, 3), you
get the same result. And if you use each in place of times, what will happen? With
each = 3 you can see 1 1 1 2 2 2 3 3 3 and 4 4 4; all the numbers are repeated 3 times.
So, one thing you can see is that whether you write times or simply write the number,
whatever value is in the second place, after the comma, is always taken as times by R.
That is the default choice. Unless you write each, it is not going to be taken as each.
That is what you have to keep in mind when you execute these operations, right.
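A short R sketch of the difference between times and each, and of the default that an unnamed second argument is taken as times:

```r
x <- 1:4

rep(x, times = 3)   # whole vector three times: 1 2 3 4 1 2 3 4 1 2 3 4
rep(x, each = 3)    # each element three times: 1 1 1 2 2 2 3 3 3 4 4 4

# An unnamed second argument is taken as times, not each
identical(rep(x, 3), rep(x, times = 3))   # TRUE
identical(rep(x, 3), rep(x, each = 3))    # FALSE
```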
Now, I show you what happens when you use the each and times options together. I take
the sequence 1 to 4, that is, 1 2 3 4, and I say each = 2. So, every value 1, 2, 3 and 4 is
going to be repeated 2 times. Now, in this case I add the option times = 3.
So, what will happen now? You can see that each is applied first, giving 1 1 2 2 3 3 4 4,
and then this whole block is repeated 3 times, right. On the other hand, if you write the
same command with the positions of times and each interchanged, that will not make
any change.
You can see 1 1 2 2 3 3 4 4, 1 1 2 2 3 3 4 4 and 1 1 2 2 3 3 4 4; they are the same,
right. There is no change. So, that is what you have to observe and keep in mind about
how R works, right, ok.
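The combined each and times example in R, with the arguments given in both orders (a and b are illustrative names):

```r
a <- rep(1:4, each = 2, times = 3)
b <- rep(1:4, times = 3, each = 2)   # same arguments in the other order

print(a)          # 1 1 2 2 3 3 4 4, three times over
identical(a, b)   # TRUE: argument order does not matter
```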
(Refer Slide Time: 09:04)
And this is the screenshot of the same operation.
Now, I give you one more example. Suppose I take the sequence 1 to 4, that is, the four
values 1, 2, 3 and 4, and after that I write another sequence, 2 to 5, that is, 2, 3, 4 and
5, right. So, what is going to happen? First, observe how it works. The second sequence
is going to be taken as times. So, first the value 1 is picked up and it is repeated 2 times,
as you can see.
Then the next value, 2, is picked up and it is repeated 3 times. Then the third value, 3,
is picked up and it is repeated 4 times, and finally the last value, 4, is picked up and it is
repeated 5 times. So, this is another way the rep command works, when both arguments
are data vectors. This is what you have to understand about how these operations work,
right.
And this is the screenshot, so you can be confident that when you do it on the R
console, you will get the same outcome, ok.
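When times is a vector of the same length as x, each element gets its own count; a minimal R sketch (r is an illustrative name):

```r
# Element i of 1:4 is repeated (2:5)[i] times
r <- rep(1:4, 2:5)
print(r)     # 1 1 2 2 2 3 3 3 3 4 4 4 4 4
length(r)    # 2 + 3 + 4 + 5 = 14
```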
Now, I show you a similar operation where every object is repeated a different number of
times, right. Suppose I create a sequence from 2 up to 8 by 2. This sequence is 2 4 6 8,
and its value is stored in the variable ans, for answer. Now, you write the command
rep(1:4, times = ans), where 1:4 is 1, 2, 3 and 4 and ans is 2, 4, 6 and 8.
So, what will happen? The same thing: let me write 1, 2, 3 and 4 and then 2, 4, 6, 8. So,
1 is going to be repeated 2 times, as you can see; next, 2 is going to be repeated 4 times;
then it picks up the third value, 3, which is repeated 6 times, 1 2 3 4 5 6; and finally the
last value. What is the last value? 4. So, 4 is going to be repeated 8 times. You can count
1 2 3 4 and then 1 2 3 4, that is, 8 times.
So, observe how R works with these commands; that will help you create more such
sequences using repetitions, right.
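The ans example in R; table() is used here only as a quick way to count how often each value appears (r is an illustrative name).

```r
ans <- seq(2, 8, by = 2)      # 2 4 6 8
r <- rep(1:4, times = ans)    # 1 twice, 2 four times, 3 six times, 4 eight times
print(r)
table(r)                      # counts of each value: 2 4 6 8
```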
And now, before going further, let me show you these things on the R console also, so
that you get confidence that they work, right. Let me take the data vector x as 1 to 5,
and then I write rep(x, times = 3) and then rep(x, each = 3). You can see the outcomes
here. Now one more example, right.
Suppose I write rep with each = 3 and also add times = 2. You can see that in this case
1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 appears once, and then 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
appears a second time. So, that is what happens.
Now, suppose I take one more example, where x takes the values 1 to 3. If you write
rep(x, times = 4), 1 2 3 is repeated 4 times, as you can see: 1, 2, 3 and 4. Now, in the
same command also use each = 2. You can see that each is applied first: each cell of
1 2 3 is repeated 2 times, and then the result is repeated 4 times.
So, this is what you have to keep in mind when working in R software: how R actually
works. Whoever has written the program, you have to follow its rules unless you write
your own program. Similarly, what will happen with rep(1:4, 2:5), as I showed you? It
is the same command: 1 is repeated 2 times, 2 is repeated 3 times and so on, right. So,
it looks like this.
And take the same example which I showed you before: your ans looks like this, and
then if you repeat with it, what will happen? You repeat the sequence 1 to 4 the number
of times given by the values of ans. So, 1 is repeated 2 times, 2 is repeated 4 times, 3 is
repeated 6 times and 4 is repeated 8 times.
(Refer Slide Time: 14:02)
So, you can see these are not very difficult operations, but you have to understand how
they give you the output. Now, I take one more example, where I consider a matrix of
order 2 by 2 in which the data is 1 2 3 4 and the observations are arranged by rows. You
have already learnt matrix operations, so the matrix x looks like this.
Now, if you use the command rep(x, 3), the way it works is very different, so you have to
understand it, right. The outcome is 1 3 2 4, 1 3 2 4 and 1 3 2 4: once, twice and three
times. This 3 corresponds to the 3 in the command; x looks like this, but the data has
been read column by column, as 1 3 2 and 4.
So, you have to understand how the matrix is repeated when you use the rep command.
Actually, you may not want this; you may want the whole matrix x, with rows 1 2 and
3 4, to be repeated as a matrix, with its three copies placed together. For that, if you
remember, you had the commands rbind and cbind. These two commands bind row wise
or column wise.
So, just be careful; my objective was to show you what happens when you operate on a
matrix, right, ok.
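A minimal R sketch of the matrix case: rep() drops the matrix structure and reads the values column by column, while rbind() and cbind() keep copies of the matrix intact.

```r
x <- matrix(1:4, nrow = 2, byrow = TRUE)   # rows: 1 2 and 3 4
print(x)

# rep() drops the matrix structure and reads the values column by column
rep(x, 3)          # 1 3 2 4 1 3 2 4 1 3 2 4

# To keep the matrix structure, bind copies instead
rbind(x, x, x)     # 6 x 2: three copies stacked by rows
cbind(x, x, x)     # 2 x 6: three copies side by side
```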
(Refer Slide Time: 15:24)
So, whatever repetition commands you have learnt for numbers are also valid for
characters. Suppose I take a data vector of three alphabets given inside double quotes,
“a”, “b” and “c”, and I repeat them two times, that is, times = 2. You can see that a b c
and a b c are repeated two times; here times is the default.
Similarly, if instead of a b c you take apple, banana, cake and repeat them 2 times, you can see apple, banana, cake and apple, banana, cake, so the whole vector is repeated 2 times, ok.
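A minimal sketch of repeating character vectors, using the same values as on the slide (the variable names are only for illustration):

```r
# times = 2 repeats the whole character vector twice
letters3 <- rep(c("a", "b", "c"), times = 2)
print(letters3)   # "a" "b" "c" "a" "b" "c"

# Passing 2 without a name has the same effect: it is taken as times
food <- rep(c("apple", "banana", "cake"), 2)
print(food)       # "apple" "banana" "cake" "apple" "banana" "cake"
```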
And, similarly, you may want some more operations like this one: I want to repeat 2, but how many times? The length of the resulting vector should be 5; that means, five times. So, the length is prespecified. Now, with a single value you may not see the difference, but as soon as I take a data vector you will see how the outcome changes.
So, if I repeat 2 and give the option length.out = 5, it gives you five values, no issues. The same thing happens if you write the option simply as length = 5, that is, rep(2, length = 5); it again gives you five values, no issue. But if you take a data vector, let us see what happens. If you take a data vector of two values, 2 and 3, and give the option length equal to 5, that means you need only five values.
So, 2 3 starts repeating: the first time gives two values and the second time gives two more values. Now there is space for only one value. Ideally, the cycle would be completed with 3, but there is no place for this 3, so the process stops at 2 and you get 2 3 2 3 2, right. Similarly, if you take three elements 2, 3 and 4 and still keep the length equal to 5, then 2, 3, 4 appears once and then 2, 3, 4 starts again, but there is no place for the last value 4, because the fifth place is occupied by 3. That is why it stops there and you get only 2 3 4 2 3, right. So, this is what you have to keep in mind.
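The length.out behaviour described above can be checked with a short sketch: the input is recycled from the start until the requested length is reached, and the last cycle is simply cut off.

```r
# length.out fixes the length of the result; the input is recycled until full
r1 <- rep(2, length.out = 5)          # 2 2 2 2 2
r2 <- rep(c(2, 3), length.out = 5)    # 2 3 2 3 2  (the final 3 does not fit)
r3 <- rep(c(2, 3, 4), length.out = 5) # 2 3 4 2 3  (the final 4 does not fit)
print(r1); print(r2); print(r3)
```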
And, the same thing happens with characters. If you take the character string apple and repeat it 5 times, you get apple 5 times; you can count 1 2 3 4 and 5. Similarly, if you take the sequence a, b, c but your length is 2, only the first two letters, a and b, are printed, right. And, if you take the length equal to 5, then a, b, c appears, and ideally after this a, b, c would be repeated again, but there is no place for c, because your limit is that you want only 5 characters. So, counting 1, 2, 3, 4 and 5, the process stops at b and no c enters here.
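The same recycling rule for characters, as a small sketch with the values from the slide (variable names are illustrative only):

```r
r_apple <- rep("apple", 5)                       # "apple" five times
r_ab    <- rep(c("a", "b", "c"), length.out = 2) # "a" "b": only the first two fit
r_abcab <- rep(c("a", "b", "c"), length.out = 5) # "a" "b" "c" "a" "b": c is cut off
print(r_abcab)
```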
So, that is how things work, and this is the screenshot. Let me now show you on the R console how these commands work; you have seen that they are not very difficult, right.
So, first let me take this matrix example. You can see the matrix here, and then I repeat this matrix three times, and you get the outcome like this, right. In this case, if you write each = 3 instead, you can see that the outcome is different: each value, in the same order 1 3 2 4, is repeated three times, first 1, then 3, then 2 and then 4, right.
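A sketch contrasting times and each on the same row-filled matrix; in both cases the values are taken in column-major order 1 3 2 4.

```r
x <- matrix(1:4, nrow = 2, byrow = TRUE)

r_times <- rep(x, times = 3)  # 1 3 2 4 1 3 2 4 1 3 2 4: whole sequence repeated
r_each  <- rep(x, each = 3)   # 1 1 1 3 3 3 2 2 2 4 4 4: each value repeated in place
print(r_times); print(r_each)
```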
And, similarly, if you take some characters and want to repeat them, you can see the sequence a, b, c repeated 2 times. Similarly, for the sequence apple, banana and cake repeated 2 times, you can see apple, banana, cake and apple, banana, cake.
Similarly, for the commands with a prespecified length: if you repeat the value 2 with length 5, it is repeated five times, but if you take a data vector, say 2 comma 3, with length 5, the answer comes out to be 2, 3, 2, 3 and then it stops at 2, right.
And, if you lengthen the vector, say to 2, 3, 4, then you can see the whole vector 2 3 4 appears, but after that it stops after 2 and 3, because the output cannot be longer than five elements, right. Similarly, this is valid for sequences of characters: you repeat apple five times, and if you take the 3 values a b c with length 2, you can see that only a and b appear; but if you make the length 5, then a b c appears and after that only a b, right. So, this is how we do all such calculations without any problem, ok. So, now we come to the end of this lecture, and you can see that we have done very simple operations today.
We have learnt how to generate sequences in which values are repeated, and yes, these things are going to be useful when you do programming, handle databases or generate different types of reports; that is where you will see the importance of these basic commands.
At this moment, just learn how to repeat values when you are asked to do so, right. So, now it is your turn: take a couple of examples based on numbers and on characters, use each, times, etcetera and their combinations, and see how they work.
You have seen that R repeats values in a particular way when you use each or times or a combination of them, or when both the data and the number of repetitions are given as vectors. So, the main thing is not just to get the output, but to understand how this works.
So, practice it, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 29
Sorting, Ordering and Mode
Hello friends. Welcome to the course Foundations of R Software. You can recall that in the last couple of lectures we discussed various aspects of data handling. Continuing on the same line, in this lecture we are going to take up one more topic of data handling: Sorting, Ordering and Mode.
Whenever you handle data related to calculations, simulations or database management, etcetera, many times you need to arrange the data in a particular order, either in increasing order of values or in decreasing order of values, or you want to find the order of the values.
These are very common operations, which you can do in any software. So, the question here is how you can do the same operations in R and how R works. And if I ask you what the difference between sorting and ordering is? These are the types of questions which we are going to answer in this lecture.
So, we are going to take a couple of examples, and I will explain how we manage sorting and ordering and what the difference between sorting and ordering of data is. After that, I will give you a quick review of modes, right. So, let us begin our lecture.
So, first we talk about sorting. I am 100 percent confident that you all know the meaning of sorting. Sorting means arranging the given data in increasing or decreasing order, right. For example, if I have the values 5 1 3, you would arrange them in increasing, or ascending, order as 1, 3, 5.
And, if you want to arrange the data in descending, that is, decreasing, order, then the largest value comes at the first position, then the second largest, and finally the minimum value, right. So, the question here is how you can do this in R. We have the function sort, s o r t, which sorts the values of a vector in ascending or descending order.
If you use only sort, the default is ascending order, ok. Sorting in R is extremely simple. You write the command sort, all in lower case letters, and within the parentheses you write the data vector in which you have stored the values that you want to sort. After this, there is an option decreasing, d e c r e a s i n g, all in lower case letters.
This is a logical argument, and you give its value as TRUE or FALSE. Suppose you give it FALSE, that is, decreasing = FALSE. If decreasing is FALSE, what do you get? Increasing. So, decreasing = FALSE means that the data in the vector x are arranged in increasing order, right.
So, this is how we decide whether the data are arranged in increasing or decreasing order. Then we have one more option, na.last, which handles missing values. If you give it the value TRUE, the missing values in the data are kept at the end.
And, if you give it the value FALSE, they are put at the beginning. And, if you give it the value NA, as I have written here, the missing values are removed; this is the default for sort. There are many more options in sort, and I would request you to look into the help, ok.
Now, let me take some examples to show you what really happens and how these things are done in R. Suppose I take the data vector 8, 5, 7, 6, which has 4 values, and store it in a variable y. If you look at the values in y, they are not arranged in increasing or decreasing order, ok.
Now I sort it: I use the function sort and in the parentheses I write y. You can see that the minimum value out of 8 5 7 6, which is 5, is placed first. Once 5 is out, the minimum value among 8, 7 and 6 is found, which is 6, then the minimum of the remaining values 8 and 7, which is 7, and finally the last value, 8. This is the order in which the sorting is done, and you get the outcome 5 6 7 8.
Now, if you want to arrange these values in decreasing order, you have to use the option decreasing = TRUE, right. What happens with decreasing = TRUE? The arrangement obtained by the earlier command, where we got the outcome 5 6 7 8, is just reversed, right.
In other words, first the maximum value out of 8, 5, 7, 6, which is 8, is obtained, and then the maximum among the remaining values 5, 7, 6, which is 7. After this, the maximum between the remaining values 5 and 6 is obtained, which is 6.
And, finally the value that remains, 5, is obtained, and the outcome is 8 7 6 5. So, you can follow either of these approaches to understand what really happens when we give the command sort(y, decreasing = TRUE), ok.
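Both directions of sorting for the example vector, as a short sketch; with no tied values the decreasing result is exactly the reverse of the increasing one.

```r
y <- c(8, 5, 7, 6)

up   <- sort(y)                     # 5 6 7 8 (decreasing = FALSE is the default)
down <- sort(y, decreasing = TRUE)  # 8 7 6 5
identical(down, rev(up))            # TRUE for this vector
```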
So, now let me show you these operations in the R console first, and then I will come to our next topic, order. Suppose I take y as c(8, 5, 7, 2) and type sort(y). You can see that the outcome is 2 5 7 8, and if you use the option decreasing = FALSE, let us see what you get. This is the same.
So, you can see that decreasing = FALSE is the default option in the sort function. And, if you give decreasing = TRUE, then you can see that the values are arranged so that the maximum value occurs first and the minimum value occurs at the end, right, ok.
After this, our next command is order. So, the questions are: first of all, what is an order? What is the difference between sorting and ordering, and how do you get it done in R? First I explain the command in R for ordering values, and then I will explain the difference between sorting and ordering, right.
The function in R for ordering values is order, o r d e r. Whatever values you want to order, you store in a data vector, say x, just as we did with the sort command. Then you write order, within the parentheses you write x, and after that we have the option decreasing = TRUE or FALSE.
When you use the option decreasing, d e c r e a s i n g, all in lower case letters, as decreasing = FALSE, then, as the name suggests, you do not want decreasing; you want increasing. So, if you want to order the values in increasing order of the variable, you use decreasing = FALSE, which is the default option, so you do not even need to write it.
And, if you want to arrange them in decreasing order of the variable, you use decreasing = TRUE, ok. Similarly, there is one more option, na.last, which is a logical option. If you give it TRUE, the missing values in the data are put at the end, that is, towards the end; for order, this is the default.
If you give it FALSE, they are put at the beginning, and if you give it NA, they are removed and the order is obtained for the remaining values, that is, the values which are available in the data vector x. So, this works just like what we have learnt for sort.
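The na.last settings for order, sketched on a hypothetical vector w whose missing value sits at position 2. Note that order returns positions, so it is the position of the NA that moves or disappears.

```r
w <- c(9, NA, 5, 7)                # hypothetical data; the NA is at position 2

o_last  <- order(w)                # 3 4 1 2  (default na.last = TRUE: NA's position last)
o_first <- order(w, na.last = FALSE) # 2 3 4 1
o_drop  <- order(w, na.last = NA)  # 3 4 1    (NA's position dropped)
print(o_last); print(o_first); print(o_drop)
```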
But first, let us understand the difference between sorting and ordering, right. Suppose I take a data vector which has five values: 9, 8, 5, 7, 6, right. Whenever we write values in a data vector, there are two components: one is the value and the second is the position. For example, the value 9 is at the 1st position, 8 is at the 2nd position, 5 is at the 3rd position, 7 is at the 4th position and 6 is at the 5th position.
To explain this, I write down these values here: the values 9 8 5 7 6 and, below them, their locations or positions, that is, their addresses in the data vector, which are 1 2 3 4 5. Now, first you order the values: out of 9, 8, 5, 7, 6, I write down the minimum value, which is 5.
Then the minimum of the remaining values 9, 8, 7 and 6 is 6. Then the minimum of the remaining values 9, 8, 7 is 7. Then the minimum between the remaining values 9 and 8 is 8, and after that we have the value 9. These are my ordered values, and I write them down like this: 5 6 7 8 and 9.
Now, let us compare these with the values given in the first table; let me call it table 1. From table 1, I write down these values, but I move each value together with its position. What is my first value here? 5. So, I go to table 1, pick up 5, bring it to the first place and write 5 with its position 3.
After this, my next value is 6: I go to table 1, pick it up and bring it here, the value along with its position. Then I pick up the third value, 7, and bring it here with its position. After that, I do the same operation with 8.
I bring 8 and its position here, and finally I choose the last value 9, which is at position number 1, and bring it here. So, if you now look at the positions, they are 3, 5, 4, 2, 1, ok. Just keep this in mind.
Now, let us look here. I execute the command order(y) on the R console and I get the outcome 3 5 4 2 1. So, the question is: what is this 3 5 4 2 1? It is exactly what I obtained by hand, 3, 5, 4, 2, 1; I have copied and pasted these 2 tables here as well, and you can see the same 3 5 4 2 1. So, this is the outcome.
Many times people get confused and think that sorting and ordering in R are the same, right. But here you have to be very careful, and that is what I always emphasize: you have to understand what R is doing and manage your actions accordingly.
So, when we use the command sort, it arranges the values. But when we use the command order, it arranges the locations of the values, right. So, the order command gives us the outcome 3 5 4 2 1, which are the locations of the values when they are ordered.
Or, in very simple words, when you sort the values, you are not only sorting the values, but you are also carrying their corresponding positions along: whenever a value is moved, its position moves with it.
So, after sorting, the sort command gives you the values, and the order command gives you their original locations, listed in sorted order. That is what is happening. If you use the command order with decreasing = TRUE, you can see the outcome 1 2 4 5 3. What is happening here? This is just the reverse of 3 5 4 2 1, right. So, this is the difference between sorting and ordering, right.
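The relation between sort and order for the vector 9, 8, 5, 7, 6 can be sketched as follows: indexing y by its order positions reproduces the sorted values.

```r
y <- c(9, 8, 5, 7, 6)

s  <- sort(y)                       # the values:    5 6 7 8 9
o  <- order(y)                      # the positions: 3 5 4 2 1
y[o]                                # 5 6 7 8 9: values picked up by position
identical(y[o], s)                  # TRUE
od <- order(y, decreasing = TRUE)   # 1 2 4 5 3
```

This is why order is often used to rearrange one vector (or the rows of a table) according to the values of another.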
(Refer Slide Time: 15:40)
And, you can see here the screenshot of the same outcome. Now, let me first show you these things on the R console, and then I will move forward. Let me show you order here also, so that you can see the difference, right. I write order(y), where y is 8, 5, 7, 2, like this, right.
You can see that the outcome now is 4 2 3 1, but when you sorted it, the values of y were 2 5 7 8. Similarly, if you use the option decreasing = FALSE, you can see that this is the same outcome, right.
But if you set decreasing = TRUE, it is just reversed: 4 2 3 1 becomes 1 3 2 4, that is, 4, then 2, then 3 and then 1 in reverse order. So, this is how we sort and order values.
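The console example with y = 8, 5, 7, 2, written as a sketch that puts the values from sort next to the positions from order:

```r
y <- c(8, 5, 7, 2)

s_up   <- sort(y)                       # 2 5 7 8: the values
o_up   <- order(y)                      # 4 2 3 1: positions of 2, 5, 7, 8 in y
o_down <- order(y, decreasing = TRUE)   # 1 3 2 4: positions of 8, 7, 5, 2 in y
print(o_up); print(o_down)
```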
(Refer Slide Time: 16:56)
So, once again, the more important part is that you have to understand, first, the difference between sorting and ordering in R and, second, how these functions are executed in R. And, one request: please do not get confused and think that sorting and ordering are the same thing in R and produce the same outcome; as I have shown you, they do not.
Now, I come to another topic, which is mode. We have briefly learnt about mode already. If you recall, in the first few lectures we talked about mode, and I told you very clearly at that time itself not to confuse this mode with the mode that is used in statistics, where mean, median and mode are used to measure the central tendency of the data, right.
So, this is not that mode. Think of it like modes of transportation, which are, say, train, air, road, etc. That is this sense of mode. I would like to give you a brief overview of mode here, because we have now learnt some more topics. Every object has a mode, and the mode indicates how the object is stored in memory, for example, as a number, as a character string, as a list of pointers to other objects, or as a function, and there are a couple of other possibilities also, right.
(Refer Slide Time: 18:33)
So, in order to know the mode of an object, the command is mode, and inside the parentheses you write the data vector or the object whose mode you want to find.
For example, if you take a number like 1.234, its mode is numeric. And, when you take a data vector in which all the values are numbers, this is essentially a vector of numbers, so its mode is also numeric. If you take a word like India and write it within double quotes, this is a character string, and its mode is character.
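A short sketch of mode for numbers and character strings, using the values mentioned on the slides:

```r
m1 <- mode(1.234)                 # "numeric"
m2 <- mode(c(3, 4, 5, 6, 7, 8))   # "numeric": a vector of numbers
m3 <- mode("India")               # "character"
m4 <- mode(c("India", "CANADA"))  # "character": a vector of strings
print(c(m1, m2, m3, m4))
```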
And, similarly, if you take a data vector consisting of character strings like India and CANADA, this is again a vector of character strings, so its mode is again character. Now, take a factor. We have not covered factors up to now, but, as a very simple definition for the moment, a factor assigns a number to each distinct character value. In the next couple of lectures we are going to talk about factors.
So, suppose I write the character strings MP and UP, where MP means Madhya Pradesh and UP means Uttar Pradesh. They form a data vector of characters, so their mode is character, as you have seen. But if you apply the factor command to this vector, the resulting factor has the mode numeric. That is something new, and that was my reason for taking up the topic of mode once again.
After this, you are going to learn about one more topic, list, very soon. If you write some character strings inside double quotes within the parentheses of the list command, you get a list, and its mode is also list. Similarly, we are going to learn about one more object in a forthcoming lecture, the data frame, right.
You can think of a data frame as something like a spreadsheet, in which you arrange the data in rows and columns, right. You know that in spreadsheets you can have numerical data as well as character data. One software that provides spreadsheets is Microsoft Excel, and in usual words we call such a file an Excel file.
In an Excel file you can enter names as well as numbers, and you can also do mathematical operations. So, in very simple language, you can understand a data frame as the framework in R for handling such a spreadsheet. Its mode is also list.
And, we have used many functions, for example, print, sum, etc. These are functions, and their mode is function, right. Keep this in mind, so that you can understand these objects whenever we use them in the forthcoming lectures.
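The modes of the newer kinds of objects can be sketched as follows. The data frame columns a and b are hypothetical, chosen only to mix numbers and characters.

```r
states <- c("MP", "UP")

m_chr  <- mode(states)                 # "character"
m_fac  <- mode(factor(states))         # "numeric": a factor stores integer codes
m_list <- mode(list("India", "USA"))   # "list"
m_df   <- mode(data.frame(a = 1:2, b = c("x", "y")))  # "list"
m_fun1 <- mode(print)                  # "function"
m_fun2 <- mode(sum)                    # "function"
print(c(m_chr, m_fac, m_list, m_df, m_fun1, m_fun2))
```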
(Refer Slide Time: 22:04)
So, just for the sake of example, I show you that the mode of 2.432 comes out to be numeric. If you take the mode of a data vector consisting of only the numerical values 3, 4, 5, 6, 7, 8, it is numeric. And, if you take a character string like India, its mode comes out to be character.
And, if you take a data vector of character strings, say the two strings India and CANADA, and find its mode, it comes out to be character, right, and this is the screenshot.
And, similarly, if you take the factor of the characters UP and MP, its mode comes out to be numeric. If you consider a list, say of the two character strings India and USA, its mode comes out to be list. If you consider the mode of a data frame, which I will explain later, this also comes out to be list. And, if you find the mode of a function like print, for example, it comes out to be function, right.
So, this is the screenshot of the same operations. Now I show you these operations on the R console, so that you can understand them very easily, right.
So, if I find the mode of a number, you can see that this is numeric. And, if I find the mode of a data vector, say 2, 4, 5, etc., this also comes out to be numeric. Now, if you find the mode of a character string like apple, right, you can see that this is character.
And, suppose you take a data vector of characters, say apple, banana and cake, right. You have to put all of these inside double quotes, so that R understands that they are strings. This is also character.
Now, suppose I add one more value, 12, which is a number. What do we expect the mode of this data vector, which has apple, banana, cake and the number 12, to be? It will again be character. Why? Because a vector can hold values of only one type, so once you add a character string to a numerical vector, R converts the numbers into character strings (here 12 becomes "12") and the mode becomes character. You can no longer do mathematical operations on it, right.
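The coercion described above can be seen directly in a small sketch: the number 12 is stored as the string "12" once it shares a vector with character strings.

```r
mixed <- c("apple", "banana", "cake", 12)
mode(mixed)   # "character"
mixed[4]      # "12": the number was converted to a character string
```

Trying arithmetic such as mixed[4] + 1 would therefore fail with an error, because "12" is no longer a number.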
Similarly, let us take the factor example; let me clear the screen first. You can see that the mode of this data vector is character. But when you find the mode of the factor of the same variable, see what happens: it comes out to be numeric, right.
We will study how a factor works later, but at this moment you can see that factor changes the mode from character to numeric, because a factor stores its values as integer codes. Then I use the list command, say mode(list("UP", "MP")), and you can see that this is list, right. And, even if you replace the characters in the list with some numbers, you can see that it remains a list; it does not change to numeric or character.
Similarly, if you take the mode of a data frame, you can see that this is also list. And, if you find the mode of a function like print, you can see that this is function, and if you take any other function, like sum, you can see that this is again function.
So, now we come to the end of this lecture, and you can see that this was a pretty simple lecture. But the main objective was to understand the difference between sorting and ordering, right. People often treat them as exactly the same, and yes, the words have similar meanings also, right. But the way R treats them is more important for you to understand, because you have to do all the operations in R.
You have to give the instructions to R, but only you know what is to be done. That is why it is very important to understand whether you want to find the values or the locations of the values, and based on that you choose between the sort and order commands.
Similarly for mode: although we had already covered it, you can see that I have introduced some more types of modes, which we are going to handle in the next couple of lectures. So, I thought, ok, let me give you this brief idea, so that you are comfortable at this moment.
Yes, at the moment you may not understand the difference between list, factor and data frame, but I promise that I will clarify all these concepts very soon. So, my request is: why don't you take some numerical values, operate on them with sorting, ordering and mode, and see what outcome you get.
But the main thing is this: take the mode operation, for example. Take some value, first think about what you expect the mode to be, and then execute it. If your thought process and the outcome of the software match, that means you are done: you have understood how R is going to think. So, think, practice, and I will see you in the next lecture.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 30
Lists
Hello friends. Welcome to the course Foundations of R Software. You can recall that in the last couple of lectures we talked about different topics related to data handling and data management. In this lecture also, we are going to talk about a new topic: List.
Now, you know that in R we have different types of observations, for example, numerical values, characters, etc. When you want to combine them together, that is, you want to create a list of all such objects, how do you get it done?
A list is an option by which you can collect different types of objects in R and then handle them. So, first we try to understand what a list is and how it is created, and after that we will look at different types of operations. For example, once you have created a list, how are you going to extract a particular element from it, how are you going to extract a sub-list from the list, and how are you going to do different types of data manipulation on the list? These are the topics which we are going to learn here, ok.
Before we try to understand what a list is, let me ask you a very simple question. Consider a situation that happens at home: suppose your mom asks you to go to the market and bring something, and she tells you, ok, bring some milk, some clothes, some vegetables, etc. What is your first reply? You simply say, mom, please prepare a list and give it to me. What is this list? That is what you have to understand: this list is the same as the list in R which we are going to consider today.
What do you do? You simply write whatever you want to buy in that list: number 1 milk,
number 2 vegetables, number 3 medicines, number 4 clothing, etc. All these things are
different. Clothing is clothing, medicine is medicine, and above all they are not sold at
the same shop; you have to go to different places.
So, a list is an object in which you club together objects of various types. Once we have
prepared the list, we want to do different types of operations on it which are helpful.
For example, suppose you go to the market and you want to understand what exactly your
mom meant when she wrote vegetables.
What will you do? You will pick that particular item number, call your mom and ask, "You
have written vegetables here; what do you really want me to bring?" That means you are
using the address of that object to get more information about it. These are the types of
operations which we are going to handle here.
So, remember that whenever you work with a list, it will contain different types of
objects, and based on that we have to learn how to call an element and how to do
different types of manipulations on the list. So, let us begin our lecture and first try
to understand this basic concept of a list.
A list is a combination of different objects like vectors, matrices, arrays, etc. Each of
these objects contains only one type of data, that is, numeric or character etc.; for
example, a vector contains either all numeric data or all character data. A list, in
contrast, is a special type of object which can contain data of multiple types, and lists
are characterized by the fact that their elements do not need to be of the same object
type. For example, in your shopping list you can write clothing at place number 1, then
medicine, then vegetables, etc. Similarly, in R a list can hold different types of data
objects: character, numeric, data frames, etc.
So, when you create a list, what you have to understand is that a list has elements, and
these elements may have different modes: one may be character, another numeric, another
something else.
So, a list is a platform that helps us collect all such types of data, on which we can
then do different types of operations. Moreover, a list can even contain other structured
objects such as lists, data frames, etc., which allows us to create recursive data
structures. You will understand this statement better as we proceed and look at data
frames: from a list you can also create a sub-list, which is again a list and which has
structured data.
That means you know where each type of data is, and a list can be indexed by position.
For example, you write point number 1 vegetables, point number 2 clothing, point number 3
medicine, and these elements can be accessed by their position. Suppose I have a list
called x and I want to call the fifth element of this list.
Then I write x, then double square brackets, and inside them the location, that is, the
index or position of the value which I want to call: x[[5]]. So, different values in a
list can be called by their indexes.
These indexes can be a single value or a vector. When you give more than one index
together, R extracts a sub-list. For example, suppose x is a list and you want to extract
the second and fifth elements. Then you write x, single square brackets, and inside them
c(2, 5), that is, x[c(2, 5)]. This sub-list is again a list; its mode is list. The
elements of a list can also have names, and you can call those elements by their names.
For example, if I write x and, inside the double square brackets, the name "Students"
within double quotes, that is, x[["Students"]], then it refers to the element whose name
is Students. Equivalently, I can write x$Students. Both of them refer to the element
whose name is Students. I am not going to demonstrate these here; I only want to show you
what the possibilities are.
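The three ways of indexing described above can be tried on a small list. This is a minimal sketch; the list x and its element names (Students, Marks, Passed, Course, Year) and values are hypothetical, made up only for illustration.

```r
# A hypothetical list with five named elements of different types
x <- list(Students = c("S1", "S2", "S3"),
          Marks    = c(78, 91, 64),
          Passed   = c(TRUE, TRUE, FALSE),
          Course   = "Foundations of R",
          Year     = 2023)

x[[5]]             # double brackets: the 5th element itself (the number 2023)
x[c(2, 5)]         # single brackets with a vector: a sub-list of elements 2 and 5
mode(x[c(2, 5)])   # "list": the sub-list is still a list
x[["Students"]]    # the element selected by its name
x$Students         # the same element, using the $ operator
```

Note that double brackets return the element itself, while single brackets always return a list, even when they select several elements.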
Now, one basic question comes up: what is the difference between a vector and a list? In
a vector also you can write, for example, c(5, "apple"), where 5 is numeric and apple is
a character. So, what is the difference between a list and a data vector?
In a data vector, all the elements must have the same mode. Only when all the values are
numbers is the mode numeric; if you mix 5 with "apple", R converts everything to
character. In the case of a list, the elements can have different modes: for example, one
value can be numeric, another can be character, and so on.
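The difference can be checked with mode(). A minimal sketch, using the same two values 5 and "apple" from the text; the variable names v and l are only for illustration.

```r
v <- c(5, "apple")     # a vector: 5 is coerced to the string "5"
v                      # prints "5" "apple"
mode(v)                # "character"

l <- list(5, "apple")  # a list: each element keeps its own mode
mode(l)                # "list"
mode(l[[1]])           # "numeric"
mode(l[[2]])           # "character"
```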
That is the main difference between the two. Now let me give you some examples through
which I can explain the concept of a list in a better way.
Let me create two matrices. You already know how to create a matrix. I create two
matrices x1 and x2. x1 is a 2 by 2 matrix in which the data are 1, 2, 3, 4, arranged by
row, and x2 is another 2 by 2 matrix in which the data are 5, 6, 7, 8, again arranged by
row. So, this is my x1 and this is my x2.
Now, if you add x1 and x2, you obtain their sum like this; you know how matrix operations
work. When you add x1 and x2, you get a result, and you can see that all the values in x1
and x2 are numbers; they are numeric.
Now, I replace one element by a character. In the matrix x1, I replace the value in the
second row and first column by "hello". Now x1 becomes like this; you know how to address
a particular element in a matrix. But look at the output: even 1, 2 and 4 are now printed
in quotes as "1", "2" and "4". Because "hello" is a character, the whole matrix has been
converted to character mode.
So, if you now try to obtain x1 plus x2, you will not get an output, because these two
matrices cannot be added: their modes are different. x2 is numeric, whereas the mode of
x1, after replacing the element by "hello", has become character. That is the difference.
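The matrix example above can be reproduced as follows. This is a sketch; wrapping the failing addition in try() is an addition of ours so that the script keeps running after the error, and the names s and res are only for illustration.

```r
x1 <- matrix(1:4, nrow = 2, ncol = 2, byrow = TRUE)  # rows: 1 2 / 3 4
x2 <- matrix(5:8, nrow = 2, ncol = 2, byrow = TRUE)  # rows: 5 6 / 7 8
s <- x1 + x2                                         # rows: 6 8 / 10 12
s

x1[2, 1] <- "hello"   # replace the element in row 2, column 1
x1                    # every entry is now a character string
mode(x1)              # "character"

# Adding a character matrix to a numeric one fails; try() reports the
# error ("non-numeric argument to binary operator") instead of stopping.
res <- try(x1 + x2, silent = TRUE)
class(res)            # "try-error"
```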
Now, I consider the same matrices x1 and x2 which I just showed you.
I want to show you how you can create a list from these matrices and then how to operate
on it. The basic command to create a list is very simple: write list, all in lowercase
letters, and then, inside the parentheses, write whatever elements you want to include in
the list, separated by commas.
So, suppose I have these two matrices x1 and x2. I write list and inside the parentheses
x1 comma x2, and I store this list under the name matlist, a short form of matrix list,
so that you can recall what we are doing. Now, let us see what this matlist looks like.
You can see that your x1 matrix comes here and your x2 matrix comes here, and they are
labelled by double square brackets: [[1]] and [[2]]. This is how the list looks.
Don't you think this is like when your mom asked you to bring some vegetables, some
clothing and some medicine, and then you created a list? Vegetables are at location
number 1, clothing is at location number 2 and medicines are at location number 3. That
is the same thing we are doing here with this operation.
Now, as I told you, it is possible to extract a particular element of this list. How do
we do it? We write the name we have given to the list, here matlist, then square
brackets, and inside them the index of the element we want, first or second. So, if I
write matlist and inside the square brackets 1, that is, matlist[1], I get a sub-list
containing the first element, the matrix x1.
And if you write matlist[2], you get a sub-list containing the second matrix x2. But
notice one thing: the label [[1]] appears in both cases, because it now indicates the
position of this element in the new sub-list. That is what you have to keep in mind. So,
let us first try these operations on the R console. Let me first copy this matrix.
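Creating matlist and indexing it can be sketched like this. The contrast with double brackets, matlist[[2]], goes slightly beyond what is shown on the slide and is included only to make clear that single brackets return a sub-list while double brackets return the matrix itself.

```r
x1 <- matrix(1:4, nrow = 2, byrow = TRUE)
x2 <- matrix(5:8, nrow = 2, byrow = TRUE)

matlist <- list(x1, x2)   # a list with two elements
matlist                   # printed with the labels [[1]] and [[2]]

matlist[1]    # single brackets: a sub-list containing x1
matlist[2]    # a sub-list containing x2, printed under the label [[1]]
matlist[[2]]  # double brackets: the matrix x2 itself
```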
So, this is my x1 matrix, and then I create my x2 matrix; you can see this is my x2. So,
x1 looks like this and x2 looks like this, and x1 plus x2 looks like this.
Now, suppose inside the x1 matrix you replace the element in the second row and first
column by, say, "apple". Your x1 becomes like this and x2 remains like this, but if you
now try to add x1 and x2, you get an error, because these matrices cannot be added.
Now let us come back to our original matrices, so that I can show you what we really want
to do. Let me create x1 as earlier, and x2 is like this. Now I create a list named
matlist: I write the name matlist, then list, and inside the parentheses simply the names
of the two elements, x1 and x2, and the list is created.
If you look at matlist now, you can see that the 1st element is the matrix x1 and the 2nd
one is the matrix x2.
Let me show it more clearly. Now, if you want to extract the first element, or the
element at index number 1, you simply write the index number inside the square brackets,
and you get this.
Similarly, if you want the value at index number 2 in this list, you write the name of the
list and the index number inside the square brackets. You can see this is now your second
matrix, with elements 5, 6, 7, 8. Notice that in matlist[1], x1 is in the first position,
and in matlist[2], x2 is also in the first position of the returned sub-list; that is why
the label [[1]] appears in both outputs.
Now we come back to our slides and take one more example, so that you can understand what
a list does.
Let me give you a more realistic example. Earlier I considered only two elements, and
both were numeric. Now I create a list with 3 different types of values. I take the first
data vector as a character vector with 3 values: water, juice and lemonade. The second
value is rep(1:4, each = 2): each value of the sequence 1, 2, 3, 4 is repeated two times
because of the option each = 2, and this is numeric.
So, the first value is character, the second value is numeric, and the third value is a
matrix. This is the same matrix x2 that we considered just now, with data 5, 6, 7, 8: a 2
by 2 matrix, with number of rows equal to 2 and number of columns equal to 2, and the
data arranged by row.
So, I have a character data vector, a numeric vector and a matrix, and I create a list of
these 3 things together. You can see that the modes of these individual elements are
different.
Now I combine them together with the command list and store the result under the name z1.
If you look at z1, at position number 1 you have water, juice and lemonade; this element
is located at the first position in the list.
The second element is 1, 1, 2, 2, 3, 3, 4, 4, which is the value at the second position in
the list, obtained from the command rep. And the third value is a matrix, which is present
at index number 3 in the list.
So, you have created a list which contains different types of objects with different
modes. This is the role of a list. And the good part is that the matrix still looks like a
matrix: the order of the values inside the matrix is not changed.
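The list z1 from the slide can be built and inspected as below. Using sapply() to print the mode of every element at once is our own convenience, not a command from the lecture.

```r
z1 <- list(c("water", "juice", "lemonade"),               # character vector
           rep(1:4, each = 2),                            # 1 1 2 2 3 3 4 4
           matrix(5:8, nrow = 2, ncol = 2, byrow = TRUE)) # 2 x 2 matrix
z1
sapply(z1, mode)   # "character" "numeric" "numeric"
```

The matrix keeps its dimensions inside the list, so dim(z1[[3]]) is still 2 by 2.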
This is the screenshot of the same operation; I will show it to you on the R console also.
But before that, let me show you some very simple operations and how to execute them.
Suppose I want to access a particular element in the list; how do I do it? For that I
have to use the double square bracket operator: write a square bracket and then another
square bracket. For example, if you want to access the contents of the element at the
first place, you write the list name, here z1, and then inside the double square brackets
the index, or location, that is, z1[[1]].
That means you want to access the element in the list which is at position number 1, or
index number 1. What is at position number 1? It is water, juice and lemonade, so that
is what you get. Now I have another job: I want to access the second element of the first
object in the list, which is juice. How do you obtain juice? Let us see what it is.
It is the second value of the object located at the first position. Now I will take two
options, and I will show you the type of mistake you should not make. Suppose I write the
name of the list, z1, and then, using single brackets only, I write 1 and 2, that is,
z1[1][2], believing that 1 is the index in the list and 2 is the index of the element
within the object at the first position of the list.
This gives you NULL, but we wanted juice. So, this is not the correct command. What you
have to tell R is that you want the 2nd element of the object which is located at the
first position. To address the object located at index number 1 in the list z1, you write
z1, then two square brackets with 1 inside, that is, z1[[1]]. What mistake did you make?
You wrote only a single square bracket.
That is why you get NULL. If you recall, earlier in the course we understood the meaning
of NULL when we considered NA and NULL. NULL is a value which never existed. That goes
back to the same example I took: suppose a student has not got admission in the school.
When we take the attendance, that student's entry is NULL, because the student was never
admitted. On the other hand, if a student is admitted in the school and is absent today,
we mark the student absent, and that is NA. Here z1[1] contains only one element, so a
second element never existed, and that is why you get NULL.
So, the correct option, if you want to access the second element of the first object of
the list, is to write the correct address: first z1, then inside the double square
brackets 1, and after that inside single square brackets 2, that is, z1[[1]][2]. In the
first object, position 1 is water, 2 is juice and 3 is lemonade.
So, control goes to the second position and gives you the answer juice. That is how
things work in R software.
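The correct address and the common mistake can be compared side by side. This sketch rebuilds z1 so that it runs on its own; the comment on z1[1][2] describes what R returns, a one-element list holding NULL.

```r
z1 <- list(c("water", "juice", "lemonade"),
           rep(1:4, each = 2),
           matrix(5:8, nrow = 2, ncol = 2, byrow = TRUE))

z1[[1]]       # the first element: "water" "juice" "lemonade"
z1[[1]][2]    # second value of the first element: "juice"

# The mistake: z1[1] is a one-element sub-list, so asking for its
# 2nd element goes out of range and R returns a list holding NULL.
z1[1][2]
```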
This is the screenshot of the same operation which I showed you. But let me first show you
these operations on the R console. Let me first create this list.
I come to the R console and copy and paste my command. This is your actual list. Now, if
you want to access a particular element, I can explain a very simple approach. Look at the
labels: this is the address of the first element of the list, this is the address of the
second element, and this is the address of the third element.
These addresses are given inside double brackets. Now suppose you want to extract juice.
You have to write the correct address, and there can be confusion over whether to write
single or double brackets. As I showed you in the example, after the double-bracket
address, the positions within the element are shown with single brackets only.
So, you simply write z1, then double brackets with whatever index you want, and then
single brackets with the location 1, 2 or 3. For a matrix element, you have to see how
these locations work. Now let me show you the operation on the console where I want to
access a particular address.
You can see that if I write z1[[1]], this gives me water, juice and lemonade, and if I
write z1, double square brackets 1, and then single bracket 2, that is, z1[[1]][2], this
gives you juice.
Now, suppose you use only single brackets; that is the mistake I am going to make here
deliberately. You can see that this gives you the answer NULL, because this value does
not exist. So, we come to the end of this lecture. In this lecture I have introduced the
concept of a list and explained how you can access a particular element in a list. There
are many more operations possible on lists, and I will take them up in the next lecture.
But the more important part today is that you understand this concept. Take different
types of objects, numeric, character, etc., and create a list. Yes, I agree that at this
moment you cannot use data frames and factors, because we have not covered them yet. But
whatever you have learnt, use it and see what happens. And here is one exercise for you
which I have not shown on the screen: in this example the third element was a matrix.
Try to see how you can access different elements of the matrix in this list. You know how
to access a particular element of a matrix; in this case, write z1 with double brackets 3,
followed by single brackets with 1, 2, 3 or 4, and see what you get. Observe it carefully,
because unless you experiment with these things you will not understand how R works and
how R thinks.
So, practice it, and I will see you in the next lecture with more commands on lists. Till
then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 31
Operations on List
Hello friends, welcome to the course Foundations of R Software. You can recall that in the
last lecture we started a discussion on lists. We understood how a list can be created and
what the different properties and characteristics of a list are. A list can contain
different types of objects, and we learned how to access the element at a particular
location inside the list.
Today we are going to extend this discussion and learn various other operations on lists:
suppose you want to add something at some place, or remove something from some place, and
so on. These operations follow exactly the example we took in the last lecture: your mom
has asked you to bring a couple of things from the market, you asked her to write them in
the form of a list, and you take the list and bring the things. Now suppose you are in the
market and your mom calls you and says, "Please add some milk to the list." Up to now
there were three objects in the list: vegetables, medicine and cloth, and now she asks you
to add milk. So, you add it.
What do you have to do? You have to write a fourth value at some location. How do you get
it done? Number two, there is another option: she tells you, "Whatever I have written at
point number 2, do not bring it." That means whatever is written at the second place in
the list must not be brought. And there is yet another possibility: you are in the market
and she suddenly calls you and says, "There are a couple more things you have to bring. I
am sending you another list with your brother or sister." Your brother or sister comes to
the market, you combine both lists together, and then you complete the shopping.
Similarly, in R we have different options for how to add, remove, append or merge
elements and lists.
How to get this done in R software is exactly what we are going to learn today. Let us
take some examples to understand how R works and what the commands are. So, let us begin
our lecture.
In the last class, we understood how to create a list. To create a list, we simply write
the command list, l i s t, and inside the parentheses give the values. I create two lists:
list1 consists of 3 numbers, 1, 2, 3, and list2 contains three character values, say
drinks: water, juice and lemonade.
I write list and inside the parentheses all 3 elements separated by commas. If you look at
these two lists, they look like this: in list1 you have 1, 2 and 3, and in list2 you have
water, juice and lemonade. Now I want to merge these two lists. How to merge lists is the
first thing I would like to explain to you.
Merging lists is very simple. Just treat the lists as you would data vectors: use the
command c and inside the parentheses write list1 comma list2. Both lists are joined in the
order in which you write them inside the parentheses.
So, suppose I merge both lists and store the outcome in list12. The outcome looks like
this: the first part is your list1 and the second part is your list2. So, both lists are
merged together without any problem.
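Merging with c() can be sketched as follows, using the same list1 and list2 as on the slide.

```r
list1 <- list(1, 2, 3)
list2 <- list("water", "juice", "lemonade")

list12 <- c(list1, list2)   # elements of list1 first, then those of list2
list12
length(list12)              # 6
```

Writing c(list2, list1) instead would put the drinks first, since c() keeps the order of its arguments.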
So, this is not a very difficult operation. In the output you can see list1 and list2, and
where each of them appears in the merged list.
I will show you this operation on the R console also, but before that let us understand
one more aspect of lists. So far, we have put values into lists; now I want to do the
opposite thing and convert a list into a vector. For this operation you use the command
unlist, u n l i s t, and inside the parentheses you give the name of the list which you
want to unlist.
What happens? This changes the mode. If you recall, the mode of a list is list. So, what
happens when you use the command unlist on a list? Let us see through an example,
considering the same list1 and list2.
I simply write unlist(list1). Now it is converted into a vector. How do you identify this?
Earlier, when you looked at list1, each element was labelled inside double square
brackets, but now it is only 1 2 3. Similarly, you can unlist list2: simply write unlist
and inside the parentheses the name of the list, list2. You can see this comes out simply
as a vector. To verify it, recall that earlier list2 looked like this: its elements had
addresses in the format of double square brackets. That was the particular structure of
the list.
Now, suppose you want to verify what is really happening with the unlist command. If you
find the mode of list1 and list2, both of them are list; you can see the mode of list1 is
list. But when you unlist it, the mode of unlist(list1) becomes numeric. So, it is now a
numeric vector; that is what I wanted to show you.
Similarly, I will show you what happens if you unlist list2. Just think about what you
expect will happen if you unlist list2. First let me create the two lists, list1 and
list2, on the console. You can see that list1 is like this and list2 is like this.
Let me correct the command and write it like this. So, this is list2.
Now I want to combine them. I write list12 = c(list1, list2), and you can see what
happens: this is your list12.
You can see that the first 3 elements come from list1 and the remaining 3 elements, water,
juice and lemonade, come from list2.
Now, let us unlist them. This is your list1, and if you unlist list1, this is what
happens. You can see the difference very clearly: this is the structure of a list, and
this is only a vector. If you find the mode of list1, it comes out to be list, whereas the
mode of unlist(list1) comes out to be numeric.
Similarly, look at list2: this is what it looks like, and its mode is list. Now unlist
list2, and you can see it becomes just a vector. If you find the mode of the unlisted
list2, it comes out to be character, as I told you.
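The effect of unlist() on both lists can be checked with mode(), as in this short sketch.

```r
list1 <- list(1, 2, 3)
list2 <- list("water", "juice", "lemonade")

mode(list1)           # "list"
unlist(list1)         # 1 2 3, now a plain vector
mode(unlist(list1))   # "numeric"

mode(list2)           # "list"
unlist(list2)         # "water" "juice" "lemonade"
mode(unlist(list2))   # "character"
```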
Now let us consider some more operations. I will show you how to append something to a
list, that is, add something to it. You have two options: append something at a
particular location, or at the end.
First consider the same lists, list1 and list2; both of them have 3 elements. list1 has
the three elements 1, 2 and 3, and list2 has the three elements water, juice and
lemonade. Now suppose I want to add 100 to list1. For that I use the command append,
a p p e n d, all in lowercase letters. I write append and inside the parentheses the name
of the list and then whatever is to be appended, here 100. Now see what happens: earlier
list1 went up to 1, 2, 3, but now you have appended a 4th element, 100. So, this is your
new list.
Similarly, if I want to add coffee to list2, I use the command append, a p p e n d, and
inside the parentheses I write list2, the list to which I want to append, and then,
without giving anything else, I add the value "coffee". Once you press enter, you see that
the first 3 elements are from list2 and the 4th value, coffee, is added at the end.
One thing you have to keep in mind, and it is very important: you have not given any
location, so whatever you add, whether numeric or character, simply goes at the end. But
now suppose you want to add these values at a particular location, say 100 somewhere in
the middle of list1, or coffee somewhere in the middle of list2. How do you get it done?
This is the screenshot of the same operation. You can see that once you add 100, it comes
at the end of list1, and coffee comes at the end of list2.
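Appending at the end can be sketched as below. The names list1_new and list2_new are hypothetical; they are used because append() returns a new list and leaves list1 and list2 unchanged unless you assign the result back to them.

```r
list1 <- list(1, 2, 3)
list2 <- list("water", "juice", "lemonade")

# append() returns a new list; the originals are unchanged unless reassigned
list1_new <- append(list1, 100)        # elements 1, 2, 3, 100
list2_new <- append(list2, "coffee")   # water, juice, lemonade, coffee
list1_new
list2_new
```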
Now, suppose you want to add something after a given location. What are you going to do?
We again use the command append. I take the same lists, list(1, 2, 3) and
list("water", "juice", "lemonade"), and suppose I want to add the number 100 after the
2nd position.
So, I write append, then list1, then the value I want to add, 100, and then after = 2. Now, where will this 100 appear? It appears at position 3: after the elements at positions 1 and 2, and the old 3rd element moves to the 4th position. Similarly, if you want to add coffee after the 2nd position, you give the command append with list2, then coffee, and then after = 2 (a f t e r equal to 2). You can see that the 1st element is water and the 2nd element is juice; the 3rd element used to be lemonade, but now coffee has entered after the 2nd position, and lemonade has shifted to the 4th position. So, this is how you can append a value to a list after a given position.
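The after argument can be sketched as follows, again assuming the lists are built with c(); the result names are illustrative only.

```r
list1 <- c(1, 2, 3)
list2 <- c("water", "juice", "lemonade")

# after = 2 places the new value immediately after the 2nd element
list1_after2 <- append(list1, 100, after = 2)
list2_after2 <- append(list2, "coffee", after = 2)

print(list1_after2)  # values: 1 2 100 3
print(list2_after2)  # values: "water" "juice" "coffee" "lemonade"

# after = 0 puts the value at the very front
list1_front <- append(list1, 100, after = 0)
print(list1_front)   # values: 100 1 2 3
```

When after is omitted, append() uses the length of the vector, which is why the earlier calls added the value at the end.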
(Refer Slide Time: 14:21)
And these are the screenshots, because the output was quite long. This is your list1 and this is your list2. You can see that the 1st and 2nd elements of list1 stay where they were, 100 has entered at the 3rd position, and the old 3rd element now comes at the 4th position. The same thing happens in list2, where coffee has been appended after juice.
Similarly, let me give you one more operation: removing something from a list. To remove an element, you have to tell R its location. So, if you want to remove something at a given position, you use a command like this: you write the name of the list, then, inside square brackets, a negative sign followed by the number of the position. So, suppose I want to remove the 2nd element, that is, the element at the 2nd position. Then I write -2, and similarly, if I want to remove the element at the 4th position, I write -4.
So, let us take the same example and remove some elements from those lists. Consider the same lists: list1 consisting of 1, 2, 3 and list2 consisting of water, juice and lemonade. Now, suppose I want to remove the 2nd element from list1, which is 2. You simply write the name of the list, list1, and inside the square brackets write -2.

You can see that 2 is removed and only the two values 1 and 3 remain. You can also save the result as a new list and work with it. Similarly, suppose I want to remove water from list2. Water is at position 1, juice at position 2 and lemonade at position 3. So, I simply write list2, the name of the list, and inside the square brackets -1.

You can see that water is removed and only juice and lemonade remain, which were the 2nd and 3rd elements of list2. This is how you can remove an element at a given position from a list.
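A short sketch of removal by negative index, assuming c()-built lists as before. The names list1_without2nd and the others are illustrative, and the multi-position removal with -c(1, 3) is an extra example not shown in the lecture.

```r
list1 <- c(1, 2, 3)
list2 <- c("water", "juice", "lemonade")

# A negative index drops the element at that position
list1_without2nd <- list1[-2]   # values: 1 3
list2_without1st <- list2[-1]   # values: "juice" "lemonade"

# Several positions can be dropped at once by negating a vector of indices
list2_only_juice <- list2[-c(1, 3)]  # value: "juice"

# Nothing is changed in place; reassign if you want to keep the result
list1 <- list1[-2]
print(list1)
```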
You can see this in the screenshot. From list1 you remove the 2nd element: the 1st element stays, the 2nd element is removed, and the old 3rd element now becomes the 2nd element. Similarly, in list2 you wanted to remove the 1st element, so water is removed, juice comes to the 1st place and lemonade comes to the 2nd place.
So, these are not very difficult operations. But let me first show you these operations on the console, so that you are more confident about them. First, we create these two lists and then we use the command append.
So, I take list1 and list2, which we have already created. You can see that this is your list1 and this is your list2.

Now, suppose I take list1 and, instead of 100, I want to append 200. I write append with the name of the list, list1, and then 200. Notice that list1 itself still has only 3 elements, so suppose I save the result under a new name, say list1 with 200 appended.
(Refer Slide Time: 18:46)
If you print this new list, you can see that 200 has been added at the 4th position.

Similarly, take list2 and suppose I want to add my own name, Shalabh. You can see that after water, juice and lemonade, my name Shalabh is added. And now, if I want to add Shalabh after the 2nd position instead, I simply add after = 2.
(Refer Slide Time: 19:27)
And you can see that Shalabh is now added at the 3rd position; earlier it was at the 4th position.

Similarly, take list1. Earlier you added the value 200 at the end, but suppose I want to add 200 with after = 1. You can see that it is now added after the 1st position, whereas in the earlier example you added 100 after the 2nd position. So, these numbers 1, 2, 3 and so on give you the location, that is, the index: with after = 1, the value 200 is added after the 1st position, that is, at position number 2.
Now, this is your list1, and suppose I want to remove the element at the 2nd position. I write list1 and inside the square brackets -2, and you can see that only 1 and 3 remain; 2 has been removed.
Similarly, take list2 and suppose I want to remove the 1st element. You simply write -1 inside the square brackets and you get juice and lemonade. These are not very difficult operations, but you have to keep in mind what you can do and how R works on them.
Now, let me give you one more operation: you have a list and you want to extract part of it, that is, you want to create a sublist. How do you get it done? The operation is very simple: you write the name of the list from which you want to extract and, inside the square brackets, the locations in terms of indices.
Let me take one more example where list1 contains the numbers 1, 2, 3, 4, 5, 6 and list2 contains water, juice, lemonade, tea, coffee and milk. So, I have added some more elements to the earlier list1 and list2. Now, I execute these four commands and show you what happens.
I have written it like this. This is your list1, consisting of six numbers. Now, I write list1 and inside the square brackets 2:4, which stands for 2, 3 and 4. So, the elements at the 2nd, 3rd and 4th positions are extracted, and these are 2, 3, 4.
Similarly, suppose you want to extract only the elements at particular given positions, say the 1st, 3rd and 5th positions. I write list1, the name of the list, and inside the square brackets the locations of these values as a data vector, c(1, 3, 5). You can see that this gives you the elements at the 1st, 3rd and 5th positions.
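A minimal sketch of extracting a sublist from the six-element list1, assuming it is built with c(). The note about list1[1, 3, 5] failing is standard R behaviour for a plain vector; the variable names are illustrative.

```r
list1 <- c(1, 2, 3, 4, 5, 6)

# A range of consecutive positions: 2:4 is the index vector 2 3 4
sub_range <- list1[2:4]          # values: 2 3 4

# Non-consecutive positions must be combined into one vector with c()
sub_picked <- list1[c(1, 3, 5)]  # values: 1 3 5

# Writing list1[1, 3, 5] instead is an error for a plain vector,
# because R reads the commas as extra dimensions
print(sub_range)
print(sub_picked)
```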
And this is the screenshot of the same operation which I have shown you.
Now, do the same operation on list2, which consists of water, juice, lemonade, tea, coffee and milk. If you write list2 and then, inside the square brackets, 2:4, that is, 2, 3 and 4, you extract a sublist consisting of the elements at the 2nd, 3rd and 4th positions of list2. In list2, these positions contain the three elements juice, lemonade and tea.

So, they come out as juice, lemonade and tea, and this is your sublist. Similarly, suppose you want to do the same operation as for list1 and extract the elements of list2 at the 1st, 3rd and 5th positions. At the 1st position you have water, at the 3rd lemonade and at the 5th coffee. So, if you run it, you get a list with these three elements: water, lemonade and coffee.
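The same extraction on the character list2, sketched under the same c() assumption. Using seq() to build the index vector is an addition for illustration, not something the lecture shows.

```r
list2 <- c("water", "juice", "lemonade", "tea", "coffee", "milk")

sub_range  <- list2[2:4]          # "juice" "lemonade" "tea"
sub_picked <- list2[c(1, 3, 5)]   # "water" "lemonade" "coffee"

# seq() can generate the same index vector 1, 3, 5 (every 2nd position)
sub_seq <- list2[seq(1, 5, by = 2)]
identical(sub_picked, sub_seq)    # TRUE
```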
And this is the screenshot of the same operation: you have all 6 elements and then the extracted elements. So, now let us do these operations in the R software, so that you are convinced about how things work. First, I create these two lists.
(Refer Slide Time: 24:40)
So, this is my list1, consisting of 6 values.

Now, suppose I extract from list1 by giving the positions 2:5 inside the square brackets. Then the values at the 2nd, 3rd, 4th and 5th positions come out, as you can see.
(Refer Slide Time: 25:14)
And if I do the same operation on list2, you can see that juice, lemonade, tea and coffee are the outcomes.

And suppose you want to extract a sublist consisting of the elements at particular positions, say only 2 and 5; let me take only 2 values here. This gives you the 2nd and 5th values in list1, which are 2 and 5, easy to remember. If you do the same operation on list2, you get the values at the 2nd and 5th positions, which are juice and coffee.
So, now we come to the end of this lecture, and we stop here. We have understood various very elementary operations on lists. You can now imagine that whenever you handle data, you always need to extract the data, merge the data, combine the data and so on; then how are you going to do it?
These are very simple commands. But the most important part, as I always say, is that you understand how R functions and how R behaves with these commands. Once you understand the behavior of a command, you can modify or control your statements to do what you want in the R software. So, practice it.
Why don't you take some examples, or create examples yourself? They need not be difficult, because creating a list is not a very difficult thing. Think about the common operations, execute them in the R software, and see whether you get the outcome you expected; I am sure you will. So, practice it and I will see you in the next lecture.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 32
Vector Indexing
Hello friends, welcome to the course Foundations of R Software. You can recall that in the last couple of lectures we have been talking about tools for data management, and we have learnt a couple of commands for data manipulation. In this lecture also, we are going to continue on the same broader topic, and we are going to talk about the indexing of vectors, that is, Vector Indexing.
So, the first question is: what is a vector index? Recall that whenever you look into a book, after the cover page the first thing is the index. What does this index give you? Usually, on the left-hand side there are the topics, which are the contents of the book, and on the right-hand side there is the page number corresponding to every topic.
The situation is similar for data vectors: in a data vector, the values are located at the first, second, third, fourth positions and so on. So, we need to understand how to move between the values of the variable and the locations of those values. Besides that, we will look at some other small topics which are very important tools for data management. So, let us begin our lecture.
So, first let me give you a very simple operation related to vectors. Suppose I consider a data vector x which has the values 1 to 10, like this. Now I want to show you a couple of operations related to this vector. Suppose you want to know how many values in this data vector are more than 5 and what those values are.
If you want to know which values are more than 5, you can use the command x > 5. This gives an outcome like this: for the values 1 to 5 it says FALSE and for 6 to 10 it says TRUE. Now, your objective is not really to find only TRUE and FALSE, but to find which values are greater than 5, or in simple words, the values for which x > 5 is TRUE.
The rule is very simple: write the name of the data vector, then square brackets, and enclose the condition inside the square brackets. When I write x > 5 alone, the outcome is TRUE or FALSE.

But what I am doing now is the following: I am asking which values in this data vector x are greater than 5, that is, which values of x correspond to TRUE in the logical vector x > 5. Writing x[x > 5] gives the answer 6 7 8 9 10.

So, you can see that these values, 6 7 8 9 10, are the ones which are more than 5.
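A minimal sketch of logical indexing with the x = 1:10 example. The helpers sum() and which() are standard base R additions shown for comparison, not commands from the lecture; the variable names are illustrative.

```r
x <- 1:10

is_big <- x > 5          # logical vector: FALSE for 1 to 5, TRUE for 6 to 10
big_values <- x[is_big]  # keep only the positions where is_big is TRUE
print(big_values)        # values: 6 7 8 9 10

# Related helpers on the same logical vector
sum(x > 5)               # how many values exceed 5: 5
which(x > 5)             # positions of those values: 6 7 8 9 10
```

Here the positions returned by which() coincide with the values only because x is 1:10; for other data they differ.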
Similarly, suppose you want to know which values in this data vector x are even or odd. In that case, for each value in x we can perform modulo division by 2 (the operator %%) and find the values for which the remainder is 0.

When you write x %% 2 == 0, once again the answer is in terms of TRUE and FALSE, and you need the values corresponding to which the outcome is the logical TRUE (T R U E). So, when you write x and put this logical condition inside the square brackets, x[x %% 2 == 0], this gives you the values in x for which modulo division by 2 gives the remainder 0.
And this comes out to be 2 4 6 8 10. What happens is that R goes through 1, 2, 3, 4, 5 and so on: for 1 the answer is FALSE, for 2 it is TRUE, for 3 it is FALSE, for 4 it is TRUE, and so on. The FALSE values are not reported; only the values where the condition is TRUE are reported, namely 2 4 6 8 and 10.
Similarly, if you want the odd values, you again do modulo division by 2, and you want the values for which the remainder is 1. So, you write x and, inside the square brackets, the condition that the even test x %% 2 == 0 gives the logical FALSE (F A L S E), which is the same as the remainder being 1. This gives 1 3 5 7 9.

So, all the values whose remainder is 1 are reported from the vector.
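The even/odd selection can be sketched in R as follows; both ways of writing the odd condition described above are shown. Variable names are illustrative.

```r
x <- 1:10

# %% is R's modulo operator: the remainder after division
remainders <- x %% 2              # 1 0 1 0 1 0 1 0 1 0

evens <- x[x %% 2 == 0]           # values: 2 4 6 8 10
odds  <- x[x %% 2 == 1]           # values: 1 3 5 7 9

# The odd values can also be selected by negating the even test
odds_again <- x[!(x %% 2 == 0)]
identical(odds, odds_again)       # TRUE
```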
Similarly, let me show you one more operation, and after that I will show you these things in the R software. Suppose I replace the 5th value in the data vector by NA. Here I am also telling you one more operation: if you have a data vector x and you want to access a particular value, you write square brackets and, inside the square brackets, the location index.

For example, to access the 5th value of x, I write x[5]. So, writing x[5] <- NA replaces the 5th value of x by NA. Now this data vector is 1 2 3 4, then the 5th value, 5, is replaced by NA, and then we have 6 7 8 9 10. Now, you use a logical operation to remove the missing values from this data vector and store all the available values in a data vector y.
To find the available values, I use the exclamation mark !, which is the logical negation, followed by is.na with x inside the parentheses. You know that is.na() is the command to find missing values, NA, in the data vector x, and the negation says that you do not want those.

So, y <- x[!is.na(x)] keeps all the values of x which are available, that is, not missing. This outcome is 1 2 3 4 and then 6 7 8 9 10; the 5th value is missing.
Now, the advantage is this: if you find the mean of x, which contains NA, you will not get a value; you will only get NA. But if you find the mean of the values in y, it comes out to be 50/9, about 5.56 (R prints 5.555556).
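A minimal sketch of the NA example. The na.rm argument of mean() is a standard base R alternative added here for comparison; it is not part of the lecture's slide.

```r
x <- 1:10
x[5] <- NA           # replace the 5th value by a missing value

# is.na() flags missing values; ! negates, so this keeps the available ones
y <- x[!is.na(x)]    # values: 1 2 3 4 6 7 8 9 10

mean(x)              # NA, because x contains a missing value
mean(y)              # 50 / 9, printed as 5.555556

# mean() can also skip missing values itself through its na.rm argument
mean(x, na.rm = TRUE)
```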
And you can see this operation on the R console. So, let us first see these operations on the R console.
So, now I create a data vector x like this, and suppose I want to know which values of x are greater than 6; this gives 7 8 9 10. Similarly, if I want the values which are smaller than 6, you can see 1 2 3 4 5. And if you want the even values, I find those values of x for which modulo division by 2 gives a remainder logically equal to 0.

If a value divided by 2 gives the remainder 0, the number is even. Similarly, for the odd values you take modulo division by 2 and check where the remainder is 1, and these are the values 1 3 5 7 9.
Now, this is your x, and suppose you want to access a particular value, say the 5th value. This is 5, and if you want to replace this 5th value by NA, you can see that x now becomes like this.

Now I want to store in y all the values in x which are not NA. So, I write the negation sign, then is.na, and then x, like this. You can see that y contains all the values except the 5th value, which is NA. If you find the mean of x, it gives NA, but if you find the mean of y, it gives this value.
So, you can see that this is a very useful operation, and it helps you search for a particular type of value: the condition for the search is given inside the square brackets, and just outside the square brackets you write the vector in which you want to search.

Besides this, you can do some other types of operations, which I will now show you.
Now, about indexing, let me show you how to index. Take the same data vector x, 1 to 10. What does x[-(1:5)], that is, x with minus 1 colon 5 inside the square brackets, indicate?
It is like giving the indices -1, -2, -3, -4 and -5. A negative index tells R to leave out the element at that position, so R drops the elements at positions 1 to 5 and returns the remaining values 6 7 8 9 10; it does not count from the end.

This is the same outcome you get by writing x[6:10], that is, 6 colon 10 inside the square brackets, which gives 6 7 8 9 10. So, you can see that the same outcome can be obtained in two different ways, but you have to keep in mind the mathematics behind it and how it works.
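A short sketch contrasting negative indexing with positive indexing, assuming the slide's expression is x[-(1:5)]. The tail() call is an extra base R option shown for the "last five values" case.

```r
x <- 1:10

# Negative indices remove the listed positions; -(1:5) is -1 -2 -3 -4 -5
dropped <- x[-(1:5)]   # values: 6 7 8 9 10
kept    <- x[6:10]     # values: 6 7 8 9 10
identical(dropped, kept)  # TRUE

# The parentheses matter: -1:5 means (-1):5, i.e. -1 0 1 2 3 4 5,
# and mixing negative with positive indices is an error in R

# If you really want "the last 5 values", tail() expresses that directly
tail(x, 5)             # values: 6 7 8 9 10
```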
Now, let me give you one more example, and this operation is about changing the names in a list. names, spelled n a m e s, is a function used to get or set the names of an object. When you put values into a data vector or a list without names, R identifies them only by their position, and you may want to give or change the names. How do you get it done? Once you give names, you can access the values by name instead of by index.
For example, I create a list z whose first element is named a1 with the value 1, whose second element a2 is "c", a string character, and whose third element a3 contains the numbers 1 2 3. So, you are giving the names a1, a2, a3. If you do not give any names, the list prints its elements with their positions in double square brackets, like [[1]], [[2]].

Now, if you print z, you see 1, "c" and 1 2 3, and the first value has the name a1, the second the name a2 and the third the name a3, as you have given in the list.
Now, suppose somebody has given you a list and you want to know the names of all the objects in it. You simply write names (n a m e s) and, inside the parentheses, the list. This gives "a1", "a2" and "a3", which are strings. You can do the same operation on the R console, and this is the screenshot.
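A minimal sketch of reading names from the list z described above. The $ and [[ ]] access by name are standard R and are added here to illustrate "access by name instead of index".

```r
z <- list(a1 = 1, a2 = "c", a3 = 1:3)

names(z)     # "a1" "a2" "a3"

# Once elements are named, they can be reached by name instead of position
z$a2         # "c"
z[["a3"]]    # 1 2 3
```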
Now, suppose I want to change the name of the third object. The third object is 1 2 3, whose name is a3, and suppose I want to change this name to "c2". How do you get it done? Very simple; try to understand how I do it. I write names (n a m e s) and, in the parentheses, the name of the object, which is z.

Now you want to access the third element, so you write square brackets with 3 inside. So, names(z)[3] indicates that you are taking the names of the object z and, from them, the name of the third element.

Obviously, this currently has some value, but you assign it a new name, "c2": names(z)[3] <- "c2". You write this name within double quotes because it is a character string. Now see what happens: in z, the first and second names remain the same, but the third name is changed to "c2".
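The renaming step as a short R sketch; the two checks at the end are added only to show that the value is untouched and the old name is gone.

```r
z <- list(a1 = 1, a2 = "c", a3 = 1:3)

# names(z)[3] is the name of the 3rd element; assigning to it renames only that one
names(z)[3] <- "c2"

names(z)       # "a1" "a2" "c2"
z$c2           # 1 2 3, the value itself is unchanged
is.null(z$a3)  # TRUE, the old name no longer exists
```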
Similarly, suppose you are given the names and you want to know the value corresponding to a name. How do you get it done? Let me take one more example with 3 values: water equal to 1, juice equal to 2 and lemonade equal to 3.

Notice that I am intentionally not taking a list here; I am simply taking a data vector created with c(), x <- c(water = 1, juice = 2, lemonade = 3). The names in this vector x are "water", "juice" and "lemonade", and now you want to know the value of juice in x.

So, you write the name "juice" inside the square brackets, x["juice"], which asks for the value in x whose name is juice. Press enter and this gives you the value 2, the value assigned to juice. This is the screenshot of the same operation.
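A minimal sketch of looking up values by name in a named vector. The difference between single and double brackets is standard R behaviour added here for clarity.

```r
x <- c(water = 1, juice = 2, lemonade = 3)

names(x)                   # "water" "juice" "lemonade"

x["juice"]                 # single brackets keep the name: juice 2
x[["juice"]]               # double brackets return the bare value: 2
x[c("water", "lemonade")]  # several names at once: water 1, lemonade 3
```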
(Refer Slide Time: 15:26)
Then there is one more very simple operation. When you write x <- 1:10, x gives you this output, but if you write x followed by empty square brackets, x[], it gives you the same outcome. So, depending on your need, you can use these forms and your data manipulation becomes simpler.
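A sketch of why empty brackets can be useful. The v[] <- 0 idiom is an assumption about the kind of "need" the lecture has in mind; it is standard R but not shown on the slide. The names v and w are illustrative.

```r
x <- 1:10
identical(x[], x)      # TRUE: empty brackets select every element

# A practical use: overwrite all values while keeping the names
v <- c(water = 1, juice = 2, lemonade = 3)
v[] <- 0               # every element becomes 0, names are kept
print(v)

w <- c(water = 1, juice = 2, lemonade = 3)
w <- 0                 # plain reassignment replaces the whole object
print(w)               # just 0, the names are gone
```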
Now let me show you these commands on the R console, so that you get more confident.

I create a data vector x with the values 1 to 10, like this. Now, if you want to see the values at positions 1 to 5, you can write x[1:5], which gives 1 2 3 4 5. But if I write x with minus 1 to 5, x[-(1:5)], you see what happens: you get 6 7 8 9 10, which is the same as writing x[6:10]. There is no problem in this. Now, I create a list.
This is your list, and if you want to know the names in z, you can see them like this. Now, if you want to change the name at the third place, you take names(z)[3]; you can see that it comes out as "a3". Now I want to give it a new name, say "Shalabh", and press enter.
Then you find the names of z and you see "a1" "a2" and "Shalabh". So, this is how you can operate with names.

Now, suppose you take a simple list without names, like 1, 2 and 3. If you look at names(z), it is simply NULL, because you have not given any names; only the default position indices are used. But suppose you want to give the second value a name, say "second". You can see that the names now are NA "second" NA.
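The unnamed-list case as a short R sketch, showing why the other names become NA.

```r
z <- list(1, 2, 3)

is.null(names(z))      # TRUE: an unnamed list has no names at all

# Naming just the 2nd element fills the remaining names with NA
names(z)[2] <- "second"
names(z)               # NA "second" NA
z$second               # 2
```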
You can practice this type of thing yourself, and I will show you something more. Suppose you have a list of mixed mode. Mixed mode means that the data values are not all of the same mode, like all numeric or all characters.
In this case, suppose you want to arrange the values in the form of a matrix. It is not really a matrix in the usual sense, because a matrix is something like a mathematical operator; what I mean is that you want to arrange the values in some rows and columns. How can you do it? Let me show you with an example. Suppose I create a list of the 6 values 1, 2, 3 and "X", "Y", "Z", give it the name ab, and then set the dimension of ab with dim(ab) <- c(2, 3).

You are informing R that the values in ab have to be arranged in two rows and three columns. Just as in a matrix, the values are filled column by column: the first column contains 1 and 2, the second column contains 3 and "X", and the third column contains "Y" and "Z". But remember one thing: this is not an ordinary numeric or character matrix; its elements are still list elements.
What is the mode of ab? mode(ab) gives "list". These are the operations you can do on it. Now let me show you this operation on the R console also, so that you can be confident that these things work. This is your ab; you can see it like this.
(Refer Slide Time: 19:36)
Now I set the dimension of ab to c(3, 2), and if I print ab, you see 1 2 3 in the first column and "X" "Y" "Z" in the second column. If you set it to c(2, 3), it looks like this. So, you can see that the data is arranged column-wise, and if you check the mode of this object, it is list.
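A minimal sketch of giving a mixed-mode list a dimension. The [[row, col]] lookups and the is.matrix() check are additions to show the column-wise filling; is.matrix() returns TRUE only because ab now carries a two-element dim attribute, while its mode stays list.

```r
ab <- list(1, 2, 3, "X", "Y", "Z")
dim(ab) <- c(2, 3)     # 2 rows, 3 columns, filled column by column

print(ab)
mode(ab)               # "list": the elements keep their own modes
is.matrix(ab)          # TRUE, only because ab now has a 2-element dim
ab[[1, 2]]             # 3   (row 1, column 2)
ab[[2, 2]]             # "X" (row 2, column 2)

dim(ab) <- c(3, 2)     # rearrange as 3 rows, 2 columns
ab[[2, 2]]             # "Y"
```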
So, now we come to the end of this lecture. We have considered very elementary operations in this lecture; my objective was that there are many, many small things, and I wanted to compile them in this lecture so that I can give a logical break to such operations. But definitely there is a very long list of such operations which can be done in the R software.
I will stop here with these operations, but my request to you all is to look into the books and different resources and see what other operations are available, particularly the operations you need, because you are working in a particular area or field and you would like to continue in it, so you will need the tools for those operations.

I have tried my best to cover the most commonly used operations, but you will need to know the operations for the area in which you are working. So, look for them, practice them, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 33
Factors
Hello friends. Welcome to the course Foundations of R Software. In this lecture, we are going to begin a new topic, factors. So, as usual, my first question is: what is a factor? If I give you a very simple example, you will understand it easily. You have seen that many times we collect data on the gender of people, like male, female, male, female etc.
When we collect data such as male and female, we always want to give each category a value. For example, if the person is male I will write 1, and if the person is female I will write 0. Why is this needed? Because after collecting the data, you want to do some mathematical operations on it; for example, you may want to know how many out of 100 students are male and how many are female.
So, you need to count them and for counting them inside a software you need to convert
them into a numerical value. So, how to convert those characters, because male is a
character female is a character, how you can convert those values into a numerical value,
that is the job of this operation factor. And, in statistics this is called as categorical
variable or binary variable. Binary means when there are only two classes, but if there
are more than two classes, in general you can define it as a categorical variable.
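A minimal R sketch of the counting idea above, assuming a small hypothetical vector `gender` of survey responses (not data from the lecture). `table()` counts each category and `as.integer()` shows the numeric codes R stores internally. Note that, by default, R orders string levels alphabetically, so here female gets code 1 and male gets code 2, not the 1/0 coding used in the text.

```r
# Hypothetical survey responses (illustration only)
gender <- c("male", "female", "male", "male", "female")
g <- factor(gender)
print(levels(g))       # the categories, sorted alphabetically
print(as.integer(g))   # internal numeric codes: female = 1, male = 2
print(table(g))        # number of observations in each category
# Counting with a 1/0 indicator, as described in the text
n_male <- sum(gender == "male")
print(n_male)
```

The same count can be read from `table(g)`, which is usually more convenient when there are more than two categories.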
For example, suppose you collect data on how people feel after drinking a coffee: good,
bad or ok. Now there are three categories, and for these three categories you want to
assign three numerical values. How to get it done? That is what we are going to do in the
lecture today.
We will see how we can convert, or how we can associate, a number with a categorical
variable. That job is done by factor in the R software; we are doing the factorization.
(Refer Slide Time: 02:45)
So, let us begin this lecture and learn how to create factors in the R software. We have
two types of variables. One is quantitative variables, in which we get the data as
numerical values, like height measured in meters: 1.65 meter, 1.76 meter, etc. The other
type is qualitative variables; for example, gender, which is male or female, or
performance, which can be excellent, good, average, bad, etc.
We always want to convert this data into numerical values. For example, take the
variable gender. There are two possible values, male and female. So, I define the variable
X = 0 if the person is male and X = 1 if the person is female. Similarly, if performance is
recorded as excellent, average, good or bad, then we give each value a number.
If the performance is excellent we give the number 1, if it is average we give the number
2, if it is good we give the number 3, and if it is bad we give the number 4. The values
which I am writing here, excellent, average, good, bad, or male and female, are called
labels. The numbers assigned to them, like 1, 2, 3, 4, or 0 and 1 in the case of male and
female, are the numeric codes assigned to these labels, and the categories are actually
stored internally as 1, 2, 3, 4. The labels are chosen so that they provide a meaningful
name for each code.
So, the main thing you have to learn is that factors in the R software represent
categorical variables and are used as grouping indicators. You will see in statistics that
many times we work with categorical variables, categorical data analysis, etc.
(Refer Slide Time: 04:57)
In R, the same thing is called a factor. Let me give you one more example to explain
what a factor is. Suppose we have balls of three colours: red, blue and green. Now we
give them numbers: for the colour red we give the number 1, for blue the number 2 and
for green the number 3.
Suppose now we have five balls. The first ball is red, the next two balls are green, the
fourth ball is blue and the fifth ball is red. This outcome of five balls can also be coded
by numbers: red is indicated by 1, green by 3, blue by 2 and red again by 1, giving 1, 3, 3,
2, 1. So, if somewhere it is written that I have got balls number 2 and 3, that means I have
got the blue and green balls.
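The ball example can be reproduced in R as a small sketch. The variable names `colour_levels`, `balls` and `balls_f` are hypothetical; the `levels` argument fixes the order red, blue, green so that the codes match the ones chosen in the text.

```r
# Codes from the text: red = 1, blue = 2, green = 3
colour_levels <- c("red", "blue", "green")
balls <- c("red", "green", "green", "blue", "red")
balls_f <- factor(balls, levels = colour_levels)
print(as.integer(balls_f))   # codes 1 3 3 2 1
# Decoding: codes 2 and 3 correspond to blue and green
print(colour_levels[c(2, 3)])
```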
So, each character value is mapped to a code. Factors represent categorical variables and
are used as grouping indicators. The categories are stored internally as numeric codes,
and these numeric codes are given labels. A label provides a meaningful name for each
code, like 1 is red, 2 is blue, and so on.
The order of the labels is very important, because the first label is mapped to code 1, the
second label is mapped to code 2, and so on. The values of these codes are always
restricted to a finite set, say 1, 2, ..., k; otherwise, how would you give each category a
number? These k values indicate the k discrete categories. In this example, the red ball is
mapped to code 1, the blue ball to code 2 and the green ball to code 3. So, this is the
name and its code.
So, when we talk about factors, we have a vector of character strings or integers. What
we call a categorical variable in statistics is called a factor in the R software. In the R
software, each possible value of a categorical variable is called a level. Now, you have to
listen carefully, because there are two similar words: label and level.
So, please be careful whether I am saying label (l a b e l) or level (l e v e l). A vector of
levels is called a factor, and a categorical variable is characterized by a number of levels,
which are called factor levels. In this case, the number of factor levels is finite.
So, how do we define factors in the R software? We start with a vector of values. Then
we define a second vector that gives the collection of possible values. Then there is a
third vector that gives the labels for the possible values.
(Refer Slide Time: 08:13)
To do this, we have the command factor, f a c t o r. This function encodes a vector of
discrete values into a factor. So, if you have a data vector of strings or integers, say x,
you write factor(x): the command is f a c t o r, and inside the parentheses you write x.
If your vector does not contain all the possible values, but only a subset of them, then we
include one more argument that gives the possible levels of the factor: inside the
parentheses you write x, a comma, and then levels. At this moment, simply try to
understand what I am saying; as soon as I take an example, each of these things will
become very easy. That is my promise to you.
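A minimal sketch of the subset case, assuming hypothetical coffee responses that happen to contain only "good" and "bad". Supplying `levels` keeps the unobserved category "ok" as a level, and `table()` then reports a count of zero for it.

```r
# Observed responses cover only some of the possible categories
x <- c("good", "bad", "good")
f <- factor(x, levels = c("good", "ok", "bad"))
print(levels(f))   # "ok" is kept even though it never occurs
print(table(f))    # counts: good 2, ok 0, bad 1
```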
So, the usual form of the command is factor(x, levels, labels, exclude). Here x is the
vector of integers or strings. levels determines the categories of the factor variable; by
default, it is the sorted set of all the distinct values in x. labels is an optional vector
giving the names of the categories, one for each level.
Finally, exclude handles values that should not become levels, such as missing values: it
defines which values are excluded when the levels are formed, and such values are coded
as NA in the resulting factor. By default exclude = NA, so missing values do not form a
level.
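A small sketch of how `exclude` behaves, using a hypothetical vector `x` with one missing value. It compares the default (`exclude = NA`), keeping NA as its own level (`exclude = NULL`), and excluding an ordinary value.

```r
x <- c("a", "b", NA, "c", "b")
# Default exclude = NA: the missing value stays NA and is not a level
f1 <- factor(x)
print(levels(f1))       # "a" "b" "c"
# exclude = NULL: NA is kept as a level of its own
f2 <- factor(x, exclude = NULL)
print(nlevels(f2))      # 4
# Excluding "c": it is dropped from the levels and coded as NA
f3 <- factor(x, exclude = "c")
print(levels(f3))       # "a" "b"
print(sum(is.na(f3)))   # 2 (the original NA and the former "c")
```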
If you want more information about the factor function, please look into the help menu,
for example by typing ?factor. There you can see the factor function and related
functions such as is.factor, as.factor, etc.
(Refer Slide Time: 10:16)
Now let me give you an example, which will make things very clear. You all know what
this is: this is a die. The die has six faces, with 1 dot, 2 dots, and so on up to 6 dots, so
there are six possible values, 1, 2, 3, 4, 5, 6, when you roll a die. Suppose the die is
rolled seven times and we observe the points 1, 4, 3, 5, 4, 2 and 4. All these values are
stored in a data vector y.
Since there are 6 possible values on the die, I store all 6 values in a data vector c(1, 2, 3,
4, 5, 6) and give it the name possible.dieface. I am giving it a name which also conveys
the meaning: these are the possible values of the points on the faces of the die.
Now, what do we want? We have got this outcome, and the outcome comes from these
possible values. I would like to write 1 as one, 2 as two, 3 as three, and so on, so that
when I get an output like 4, I see it as the word four (f o u r). How to get it done? First, I
define the labels of the die faces.
For the values 1, 2, 3, 4, 5, 6, I define a corresponding data vector of strings: one, two,
three, four, five and six, and I give it the name labels.dieface. Now I take the data y and
use the command factor, telling it that the levels are possible.dieface, which is the vector
c(1, 2, 3, 4, 5, 6).
The labels for these possible die faces are given by the argument labels, which is the data
vector labels.dieface.
Now, if you execute this command, what do you see? I store the outcome in the variable
facy, meaning the factor of y. If you look at facy, it gives the values one, four, three, five,
four, two and four. What are these? Recall the outcome data vector, which was 1, 4, 3, 5,
4, 2, 4.
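The die example, written out as R code using the same names as the text (`y`, `possible.dieface`, `labels.dieface`, `facy`). Note that "six" is still a level even though 6 was never rolled, because it is listed in `levels`.

```r
y <- c(1, 4, 3, 5, 4, 2, 4)                       # observed rolls
possible.dieface <- c(1, 2, 3, 4, 5, 6)           # all possible values
labels.dieface <- c("one", "two", "three", "four", "five", "six")
facy <- factor(y, levels = possible.dieface, labels = labels.dieface)
print(facy)
# The values as plain strings: "one" "four" "three" "five" "four" "two" "four"
print(as.character(facy))
print(levels(facy))   # the labels now serve as the level names
```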
Now you can see what the factor command has done. It has converted the number 1 into
one, the number 4 into four, the number 3 into three, and similarly 5, 4, 2, 4 into the
words five, four, two, four. Below the values, R prints the levels: one, two, three, four,
five, six. These are the levels that have been used, and the data in the vector y has been
converted using these labels.
So, that is how things work, and if I show you this operation on the R console, it will
make things very clear.
So, this is y, this is the data vector possible.dieface, and then I define the labels. Then I
apply the factor operation and store the result in the variable facy. Now you can see what
facy is. Suppose you change the data vector and take, say, 3, 2, 3, 2.
Then, if you do the factor operation once again, you can see the output: facy is now
three, two, three, two, that is, 3, 2, 3, 2 in words. So, you can see that these operations
are working, and now we come to the end of this lecture. You can see it was a very
simple topic: we are simply associating numerical codes with categories, using the levels
and labels provided by you. Such things are very common in real life, when we conduct
some experiment and collect data.
Whenever you analyze data, you always need to associate some numerical values; until
you associate numerical values, you cannot do the computations. For example, how do
you find how many students are male and how many are female? You give the data
values male and female the values 1 and 0, and you simply sum them: 1 + 0 + 1 + 0 + ...
That gives the number of people in category 1, and the remaining are in category 0. Since
1 corresponds to male and 0 corresponds to female, this gives the total number of male
and female students in the class or in the school. So, we stop here. Now it is your turn to
take some examples and practice, and I will see you in the next lecture with a new topic.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 34
Factors - Class and Unclass
Hello friends. Welcome to the course Foundations of R Software. You may recall that in
the last lecture we initiated a discussion on factors, and we understood how factors work
and how they help us. In this lecture, I will give you some more applications and
commands for factors.
For example, if you have a data vector, how can you convert it into a factor? Besides that,
there is a very important application of factors with the commands class and unclass. We
briefly talked about class in one of the earlier lectures, but today I am going to give you
an application of the class and unclass functions using factors. So, let us begin our
lecture and understand further topics on factors.
You can recall that in the last lecture we learnt about the command factor, f a c t o r, all
in lower case letters, and this factor function encodes a vector of discrete values into a
factor. In very simple language, we understood that a factor is a categorical variable. For
example, suppose a student has to be classified as male or female, and male takes the
value 1 and female takes the value 0. Then the data will look like 1 0 0 1 etc., where 1
means male and 0 means female.
Similarly, if we have some competition in which people perform and we give grades like
excellent, good, better, worse, etc., they can also be classified into categories 1, 2, 3, 4.
For example, 1 indicates excellent, 2 indicates good, and so on.
So, these are our factors, and to use factor you give the data vector x, then you specify
the levels, and then you specify the labels. The levels are the possible values of the
categories, and the labels contain the names that you want to assign to those levels. You
can also handle NA values using the option exclude.
After this, I would like to show you a couple of things through examples. Suppose you
have a numerical data vector and you want to convert it into a factor. How to get it done?
The command is as.factor, a s dot f a c t o r. You can see that as.factor is similar to the
commands as.numeric, as.character, etc.
So, let us understand how it works. We consider a data vector x consisting of the values
3, 4, 5, 6, 1, 2, 3, 3, 4, 4, 5, 6, and I want to convert these numerical values into a factor.
The factor will identify the unique numbers in this data vector x, classify the values by
them, and use those unique numbers as the names of the levels.
So, I write as.factor and, inside the parentheses, the data vector, and I store the outcome
in a variable y. The outcome looks like this: first it gives the data, now in the form of a
factor, and after that it gives the levels, which have been given the labels 1, 2, 3, 4, 5 and
6.
One point you have to observe is that when you convert numerical data, the levels are
ordered numerically. You can see that they are arranged in increasing order: first 1, then
2, then 3, 4, 5 and finally 6. So, this is how the as.factor command works when you deal
with numeric data and want to convert it into a factor.
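The numeric example as a short R sketch. The two extra lines with 10, 9 and 2 are an added illustration (not from the lecture) of the difference between numeric ordering and the alphabetical ordering of strings; note that level names are always stored as character strings.

```r
x <- c(3, 4, 5, 6, 1, 2, 3, 3, 4, 4, 5, 6)
y <- as.factor(x)
print(levels(y))   # "1" "2" "3" "4" "5" "6"
# Numeric input: levels follow numeric order, so 10 comes after 9
print(levels(as.factor(c(10, 9, 2))))
# The same digits stored as strings are sorted as text instead
print(levels(as.factor(c("10", "9", "2"))))
```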
Similarly, let me take one more example where the data is in the form of strings.
Suppose I take a data vector which has the values lemonade, lemonade, juice, lemonade
and water. You can see there are exactly 3 categories: juice, water and lemonade. Now, if
you convert it into a factor, you can see what happens. So, this is the data vector.
I store its outcome as a factor in the variable x. The result looks like this. First, the data
vector is reproduced, and after this R has automatically assigned the levels. There are
three unique values in the data vector, juice, lemonade and water, and these values have
been assigned as the levels: juice, lemonade and water.
Recall that when we considered numerical values, the levels were arranged in increasing
order. But when we handle strings, the levels are ordered alphabetically.
Alphabetically means that the word whose first letter comes earlier in the alphabet
appears first, then the next one, and so on. Here you have three words: juice, lemonade
and water. In the sequence a, b, c, ..., z, j comes first, then l, then w. So, juice comes in
the first position, then lemonade, and then water. You can see that the levels have been
arranged in alphabetical order. So, this is an application of factors.
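The string example as an R sketch; the name `drinks` for the raw data is a hypothetical choice, while `x` holds the factor as in the text.

```r
drinks <- c("lemonade", "lemonade", "juice", "lemonade", "water")
x <- as.factor(drinks)
print(x)
print(levels(x))    # "juice" "lemonade" "water", in alphabetical order
print(nlevels(x))   # 3 distinct categories
```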
This is the screenshot of the same operation which I have shown you. Now let me show
you these operations on the R console also, so that you become more confident.
So, this is x. If you want to convert this x into a factor, you write as.factor(x), and you
can see that different levels have been assigned to this data vector.
Similarly, if you take the string variable, where all the values are in the form of strings,
you can see the outcome: all the values are shown, and then the levels are assigned in
alphabetical order, juice, lemonade and water. So, you can see this is a very simple
operation, not difficult, but it gives you a good way to convert numerical and character
data into factors.
(Refer Slide Time: 07:40)
After this, I would like to give you one more application of factors. This is about class
and its opposite, unclass. You can recall that in one of the earlier lectures I briefly
explained the function class, c l a s s, all written in lower case letters. We understood that
class gives us information about the class of an object in the R software: every object in
the R software has a class, and this function reports the class of that object.
We have different classes like numeric, logical, character, list, matrix, array, factor and
data frame. Such class attributes are used essentially in object-oriented programming in
R. We are not considering object-oriented programming here; that is for those who are
quite familiar with and expert in programming.
First, let us understand what class means and how it gives us information. If you take a
number, say 9, and want to find its class, you write class and, inside the parentheses, 9,
and it gives the class as “numeric”. Now I convert this 9 into a character: I write double
quotes and put the value 9 inside them, so that it becomes a character.
Now the class of “9” comes out to be “character”. Similarly, if you take a function like
print, the class of print is “function”. You just write class and, within parentheses, the
name of the function.
Similarly, take a matrix: I have taken a matrix of order 2 by 2 in which the data is 1, 2, 3,
4. If you check the class of this x, it is “matrix” as well as “array”. You know that a
matrix is a special type of array, in which the values are arranged in rows and columns,
according to the theory of matrices.
So, you can see the outcome here.
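The `class()` calls above, collected into one runnable sketch. The matrix name `m` is a hypothetical choice. The two-element result `"matrix" "array"` is what R 4.0.0 and later report; older versions report only `"matrix"`, which is why the check below uses `inherits()`.

```r
print(class(9))       # "numeric"
print(class("9"))     # "character": the double quotes make it a string
print(class(print))   # "function"
m <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
print(class(m))       # "matrix" "array" in R 4.0.0 and later
print(inherits(m, "matrix"))
```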
(Refer Slide Time: 10:07)
After this, I will give you a command and its utility using factors. The command is
unclass, u n c l a s s. In simple words, this function is just the opposite of class: it returns
a copy of the object with its class attribute removed, so the effect of the class is removed
temporarily while the original object stays unchanged. For example, if you have an
object like a data frame, it is printed in a certain way, and the plot function displays it
graphically in a certain way.
We are going to consider data frames in forthcoming lectures. When you use the unclass
function, it temporarily removes the effect of the class. If you want more information on
the unclass function, I would request you to look into the help.
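A minimal sketch of the "temporary" nature of `unclass()`, assuming a small hypothetical data frame `df`. The unclassed copy is an ordinary list (printed as a list, with the row names shown as an attribute), while `df` itself keeps its class.

```r
df <- data.frame(a = c(1, 2), b = c("x", "y"))
u <- unclass(df)
print(u)           # printed as a plain list
print(class(u))    # "list"
print(class(df))   # "data.frame": the original object is unchanged
```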
But anyway, I will take an example and, through that example, explain the use of the
unclass function. I am going to continue with this example over the next three slides, so
please do not lose track; that is my request.
First of all, I consider a group of strings and store it as a factor. Suppose there are three
brands of some clothing, and I categorize them as A, B and C. The data on these brands is
character data, stored in a data vector which I have named brands.
The brands are stored as A, A, B, B, B, B and C. Now I convert brands into a factor: I use
the factor command, write brands inside the parentheses, and store the outcome in a new
variable brands_fac, meaning the factored brands.
The outcome looks like this. There are three unique strings, A, B and C, just like the
example of juice, water and lemonade. So, A, B and C are assigned as levels, and the data
on brands is reproduced as well. So, in this slide we have simply considered the brands
and converted them into a factor.
Now, if you use the unclass function, it converts the factor into its numeric codes. The
values here are A, A, B, B, B, B and C, and if you apply unclass to the outcome
brands_fac, A is converted into 1, B into 2 and C into 3. You can see 1, 1, 2, 2, 2, 2 and 3,
which was originally A, A, B, B, B, B and C.
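The brands example as an R sketch with the names used in the text. `unclass()` returns the integer codes and keeps the level names in a `levels` attribute, which is why R prints `attr(,"levels")` below the codes.

```r
brands <- c("A", "A", "B", "B", "B", "B", "C")
brands_fac <- factor(brands)
print(brands_fac)
codes <- unclass(brands_fac)
print(codes)                      # 1 1 2 2 2 2 3, plus attr(,"levels")
print(as.vector(codes))           # just the integer codes
print(attr(codes, "levels"))      # "A" "B" "C"
```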
You can see that the levels are the same as before, A, B and C, but the values have been
converted into numbers. Now, what is the use? Suppose I want to assign different colours
to these brands A, B and C, say blue to A, green to B and red to C. How to get it done?
In very simple words: whatever your data is, you want to replace A, B and C by their
colours, A by blue, B by green and C by red.
So, what I do is the following. I create a data vector of the colours blue, green and red and
store it in a variable colours, c o l o u r s.
Now, what is happening here? You can recall this output; it is the same output you got
before, and I have just copied it here so that you can see both on the same slide. Now I use
the command colours and, inside the square brackets, I write unclass(brands_fac). Let us
see what happens.
Your brands were: the first two were A, the next four were B and the last was C. Now they
are converted into colours. A is converted to blue, so you can see two blues
corresponding to the two A's. Then there are four greens, corresponding to brand B.
Finally, the last value, C, is replaced by red.
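The colour replacement as an R sketch, using the names from the text. The integer codes from `unclass()` act as positions in `colours`; `as.integer(brands_fac)` gives the same codes without the `levels` attribute and works equally well here.

```r
brands <- c("A", "A", "B", "B", "B", "B", "C")
brands_fac <- factor(brands)
colours <- c("blue", "green", "red")
# Code 1 picks the 1st colour, code 2 the 2nd, code 3 the 3rd
brand_colours <- colours[unclass(brands_fac)]
print(brand_colours)   # two blues, four greens, one red
```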
So, that is the advantage: if you want to change something, you can do these types of
operations using the class and unclass functions and then extend them further. The main
utility of the unclass function here is that, if your data consists of strings like these,
stored as a factor, then unclass converts them into numbers. If I show you all these
operations on the R console, you will understand them very easily.
So, if I find the class of the numerical value 8, it comes out to be “numeric”. If I convert
this 8 into a character by writing it within double quotes, the class comes out to be
“character”. Similarly, if I find the class of a string like “apple”, which we have used
many times in earlier lectures, it also comes out to be “character”.
Similarly, if I create a matrix with nrow equal to 2, ncol equal to 2 and data equal to 1 to
4, you can see the matrix here. If you find the class of this matrix x, it comes out to be
“matrix” and “array”.
Now, let us consider the example of class and unclass. I have the brands with three
values A, B and C, which are strings. I convert them into a factor with factor(brands) and
store the result in the variable brands_fac, and you can see brands_fac here.
Now I want to give them colours. So, let me write colours = c(blue, green, red). You can
see it gives an error here, because I have not written the values within double quotes.
These are strings, so let me write them within double quotes, and then you will see that
the error does not happen. Surely, when you do such programming, these things are going
to happen.
Now you can see colours here. After this, I first use the function unclass, and then I will
use colours on it. If I write unclass(brands_fac), you will see the outcome here.
If I apply colours to unclass(brands_fac), you get this outcome. I am just putting it on the
same screen. Whatever was A has now become blue, because blue is in the first place.
Then the data vector has 4 Bs; please have a look where I am highlighting it.
The second value in the colours vector is green, so the next four values are green. The
last value in the data is C, and the last value in the colours vector is red, so the last value,
3, becomes red. So, you can see that this is a very important function, which is going to
be useful when you deal with real-life data.
Just to make you more confident, let me take one more example and simply show you
how you can use the class and unclass functions to convert a data vector, obtained by
applying factor to a string variable, into numbers.
So, let me take the factor of the data lemonade, lemonade, juice, lemonade and water.
This is the same example that we did in the beginning. The outcome shows this data, and
then juice, lemonade and water as the levels, arranged in alphabetical order.
Now, if you unclass it, the data becomes 2, 2, 1, 2, 3. Lemonade is at place number 2, so
every lemonade is converted into the number 2.
Juice is at place number 1, so wherever there is juice in the data vector x, it is converted
into 1. Finally, water is at place number 3, so water is replaced by the number 3. So, the
outcome is 2, 2, 1, 2, 3, and the levels are the same: juice, lemonade and water. The only
thing is that the data has now been converted into numbers.
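A short R sketch of this unclass step. With the default alphabetical levels, juice is 1, lemonade is 2 and water is 3.

```r
x <- factor(c("lemonade", "lemonade", "juice", "lemonade", "water"))
codes <- unclass(x)
print(as.vector(codes))        # 2 2 1 2 3
print(attr(codes, "levels"))   # "juice" "lemonade" "water"
```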
This is the screenshot of the same operation which I have shown you, and I will show it to
you on the R console also.
Now, notice that in all the examples we have considered so far, the names of the levels are assigned automatically. For example, in this case the levels are automatically assigned as juice, lemonade and water, in alphabetical order. But suppose you want to give the levels in an order of your choice.
Remember that earlier we had juice, lemonade and water in alphabetical order; now you want a new arrangement, say levels equal to water, juice and lemonade. This is not alphabetical order, but it is your choice.
So, how do you give the levels of your choice, or change the levels obtained by the default operation? You apply factor to the same data, lemonade, lemonade, juice, lemonade and water, and add one more argument, levels, equal to the levels you want, in the order you want them.
So, if you want a different assignment of levels, use the parameter levels, l e v e l s, and give the names of the levels in the order you want. You will then see that the levels become water, juice, lemonade. The printed data themselves do not change, because you are only specifying the levels and their order.
Now, if you unclass it, you will see that the new levels are water, juice and lemonade, and the data are now coded as 3, 3, 2, 3, 1. This works exactly in the same way as before. Suppose you look at this outcome and want to get your data back.
Water is at place 1, juice is at place 2 and lemonade is at place 3. The first two codes are 3 and 3, so they stand for lemonade, lemonade. The next code is 2, which is juice. Then you have 3 once again, which is lemonade, and finally 1, which is water.
Now you can verify that your original data were lemonade, lemonade, juice, lemonade and water. So, the data do not change; only the levels, and hence the numeric codes, change. Once you unclass it, the codes follow the new levels, and if you find the levels of this x, they come out as water, juice and lemonade.
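The "get your data back" reasoning above can be checked in R. This is a sketch with the lecture's data; indexing levels(x) by the codes is one way, among others, to recover the strings.

```r
# Same data, but with levels given in an order of our choice
x <- factor(c("lemonade", "lemonade", "juice", "lemonade", "water"),
            levels = c("water", "juice", "lemonade"))
levels(x)                   # "water" "juice" "lemonade"
codes <- as.integer(unclass(x))
codes                       # 3 3 2 3 1

# Recover the original data by looking each code up in the levels
levels(x)[codes]            # "lemonade" "lemonade" "juice" "lemonade" "water"
```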
(Refer Slide Time: 24:37)
So, this is how you can carry out these different types of operations, and this is the screenshot of the same operation. First, let me show you this outcome on the R console, and then I will show you one more operation.
So, I create x as a factor of this data, and then I unclass x. You can see that the levels are listed and the values are converted into numerical codes. After that, I change the levels: I apply factor to the same data, but I also add the option levels. So, I write factor, and whereas earlier I had given only the data, now I add levels = c("water", "juice", "lemonade") and see what type of data I get.
Earlier, the levels of x were juice, lemonade and water, and now I am replacing them by water, juice and lemonade. If you look at the new x, its levels are now water, juice and lemonade, whereas earlier they were juice, lemonade and water.
Now, if you unclass this x once again, remember that earlier you observed 2, 2, 1, 2, 3, but now you get 3, 3, 2, 3, 1. Please observe this; I am highlighting it.
So, the data lemonade, lemonade, juice, lemonade and water were earlier coded as 2, 2, 1, 2, 3, but since you have now given the new levels water, juice and lemonade, the same data are coded as 3, 3, 2, 3, 1. So, you can see that this is not a very difficult operation.
And if you obtain the levels of this new x, you get water, juice and lemonade, and the levels attribute shown in the unclassed output lists the same water, juice and lemonade.
After this, I give you the last example, where I use the command ordered. Sometimes your data have a natural order. For example, temperature may be low, medium or high. Suppose you have temperature data like high, high, low, medium and medium, and you define the levels as low, medium and high.
These levels are actually ordered, in the sense that a low temperature, indicated by low, is less than a medium temperature, indicated by medium, which in turn is less than a high temperature, indicated by the word or string high.
Now, you want to factorize these data, so you use the command ordered, o r d e r e d. In this case, the same process that happens under the factor command also happens here, but the levels are ordered. You can see that the data come out the same, high, high, low, medium, medium, but the levels are now shown as low < medium < high.
Remember that earlier the levels, which were strings, were arranged in alphabetical order; now they are arranged in the order you specified, from the lowest to the highest. So, this is shown as low < medium < high, whereas earlier the order was alphabetical. This is what you have to keep in mind.
Now, if you use the command unclass on the same output, income, you get 3, 3, 1, 2, 2, and you can see that the levels are again low, medium and high. Low is indicated by 1, so the third value, low, becomes 1.
Medium is indicated by 2, so the last two values, which are medium, become 2. Finally, high is indicated by 3, so the first two values, which are high, become 3 and 3. So, this is how you can do all such operations without any problem.
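A minimal sketch of the ordered example; the object name temp is hypothetical (chosen here only for illustration). The last line goes slightly beyond the lecture: because the levels carry an order, comparisons such as < are meaningful for an ordered factor.

```r
# Hypothetical object name 'temp'; data and levels as in the lecture
temp <- ordered(c("high", "high", "low", "medium", "medium"),
                levels = c("low", "medium", "high"))
temp                         # Levels: low < medium < high
is.ordered(temp)             # TRUE
as.integer(unclass(temp))    # 3 3 1 2 2

# Because the levels are ordered, comparisons between values are meaningful
temp[3] < temp[1]            # TRUE: "low" < "high"
```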
This is the screenshot of the same operation, but let me show you these things on the R console, so that you become more confident. You can see that this is my data.
(Refer Slide Time: 30:00)
This has been ordered, so the output income looks like this. If you unclass it, the result looks like this.
And if you want to see what happens when you simply use the factor command instead, I replace the ordered command by factor in front of you, and we look at income again.
(Refer Slide Time: 30:21)
You can see that earlier the levels were shown as low < medium < high, but now there is no ordering. The levels are still listed as low, medium and high, because that is the order given in levels, but no ordering relation is attached to them. So, we now come to the end of this lecture. We have learnt a very interesting application of factor when it is combined with the class and unclass functions.
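To see the factor versus ordered comparison in one place, here is a small sketch. The names vals, lv, f and o are my own, and I am assuming that the levels argument is kept when ordered is replaced by factor, as in the console demonstration.

```r
vals <- c("high", "high", "low", "medium", "medium")
lv   <- c("low", "medium", "high")

f <- factor(vals, levels = lv)    # levels in the given order, but unordered
o <- ordered(vals, levels = lv)   # levels in the given order, and ordered

identical(levels(f), levels(o))   # TRUE: same level order
is.ordered(f)                     # FALSE
is.ordered(o)                     # TRUE

# Without a levels argument, factor() falls back to alphabetical order
levels(factor(vals))              # "high" "low" "medium"
```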
Earlier, when we discussed the class command, I explained only briefly what it represents, because I wanted to show how it is useful together with the factor application. That is why I did not explain it then, and have explained it in this lecture instead.
I hope you have understood it. To understand it better, please revise this lecture, and write down these data values by hand, with your own pen, on your own paper. When I say low, medium, high, or juice, lemonade, water, it looks very trivial until you understand the difference between juice, lemonade, water and water, lemonade, juice.
So, type the commands yourself, execute them, and always compare the outcomes: when you change the levels and when the default levels are used, what happens to how the levels are reported and how the data are converted. Unless you observe how the outcomes change when you make such small changes in the command, you cannot understand how R works, which, as I always say, is very important for you.
So, why don't you take some examples and create your own examples with both factor and ordered? Run them in R and see what outcome you get. Is it the outcome you expected? If there is some change, try to see why it occurs. So, we stop here. Now it is your turn to take some examples and practice, and I will see you in the next lecture with a new topic.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 35
Strings - Display and Formatting
Print and Format Function
Hello friends, welcome to the course Foundations of R Software. From this lecture, we are going to begin a new topic and see how to implement it in R. Whenever you write a program, it has some output, and you want to write that output in a particular way. For example, many software packages are popular because they can generate reports in a very good format.
In R also, you can format your reports the way you want: this statement here, this number on this line, and so on. How to get these things done in R is the broad topic of this lecture and the forthcoming lectures. For example, you have already learnt the function print, which we have used many times. Print shows the outcome on your screen, and you may want to write some output in a particular way. For example, suppose you want to write "the sum of 2 and 2 is" followed by the sum, which is an output of the program.
So, how do you write "the sum of 2 and 2 is 4", where the 4 comes from the program and, if you wish, the two 2s are also inserted by the program? That is exactly what we are going to learn in this lecture and the forthcoming lectures, and we have a couple of commands for it. So, let us begin our lecture and try to understand all these things.
(Refer Slide Time: 02:27)
Whenever you work with strings, you would like to present them in a formatted way and control how they are displayed. Sometimes you also want to operate on those strings to get a required outcome. So, we need formatting and display of strings to obtain the results of an operation in the required format.
In R, we have a couple of commands that help us do so, for example print, format, cat, paste, etc. We are going to learn about these one by one, and I will take a couple of examples to show you what the outcomes of these functions look like. First, let us consider the function print. We have already used it many times, but this is the place where I am going to formally introduce it; earlier I always said that we would talk about it in future lectures.
You use the print function as p r i n t, and inside the parentheses you write whatever you want to print. It is a generic command, available for every object class. Now you will understand the meaning of object class: it is the class of an object, which we found using the function class.
Now, let me take some examples to show how the print command works. Suppose I want to print the value of the square root of 2. I use the command print, p r i n t, and within the parentheses I write sqrt with 2 inside its parentheses, that is, print(sqrt(2)).
You can see that the outcome is shown on the computer screen as 1.414214, and you know that this number has more digits. Suppose you also wish to control the number of digits in this output. For that we have one more option: we write print, within the parentheses the square root of 2, a comma, and then the option d i g i t s, digits, in lower case letters.
This option gives the number of significant digits you want in the value of the square root of 2. Suppose you say you want 5 digits. Now compare this outcome with the earlier one: you get 1.4142.
If you count the digits, there are 1, 2, 3, 4 and 5, and you can see that the decimal point is not counted, because it is not a digit; digits refers to significant digits. In the earlier command you had more than five digits, 1, 2, 3, 4, 5, then 6 and 7, because the default is 7 significant digits. Similarly, if you want the number of digits to be 10, you use the same command, print with the square root of 2,
and add the option d i g i t s, in lower case letters, equal to 10. You can see the outcome here. How many digits are there? 1 is the 1st, 4 is the 2nd, 1 is the 3rd, 4 is the 4th, 2 is the 5th, 1 is the 6th, 3 is the 7th, 5 is the 8th, 6 is the 9th and 2 is the 10th. So, there are 10 digits in this outcome, and this is how you can control the outcome with this option. This is the screenshot of the same operation, which I will show you on the R console.
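The digits option can be sketched as follows, assuming the default setting options(digits = 7) has not been changed in your session.

```r
print(sqrt(2))                # [1] 1.414214     (default: 7 significant digits)
print(sqrt(2), digits = 5)    # [1] 1.4142
print(sqrt(2), digits = 10)   # [1] 1.414213562
print(sqrt(2), digits = 17)   # 17 significant digits
```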
So far I have used the print command with numeric values. Now let me use the same command with characters. Suppose I take the word apple, which is our popular example in this course, along with banana and cake. I write apple within double quotes and use the command print, and it prints "apple".
Here you cannot specify how many letters you want, because these are strings. Similarly, if you want to print a data vector with two strings, apple and banana, and use the command print, you get apple and banana.
Now, suppose you take a data vector consisting of characters, apple and banana, and some numbers, 6 and 10. In that case the mode of this data vector is character, so everything is printed as a character. You can see that apple, banana, 6 and 10 are all inside double quotes.
This is because 6 and 10 are now taken as characters, whereas earlier, when we printed 1.414214, the square root of 2, it was not inside double quotes. Let me first show you these operations on the R console, and then I will introduce more options.
For example, I can print 89, and it is printed like this. Similarly, if I take a data vector, say 89, 98, 78, it is printed as 89 98 78. Similarly, if you print the square root of 2 and add the option d i g i t s, digits equal to 7,
you get 1.414214, and if you make digits equal to 17, you now have 17 digits in this outcome. That is not difficult, as you can see.
Now, let me take some examples related to characters. I print apple, and it shows "apple". Then I print a data vector of characters by adding banana, and it shows "apple" and "banana". Now, if I add some numbers, say 5 and 7, you can see that they are printed as "5" and "7", inside double quotes, because they have been converted into characters.
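A short sketch of printing characters; the vector name v is my own. The key point is that c() coerces all elements to a common mode, so once a string is present, the numbers become strings too.

```r
print("apple")                  # the string is printed inside double quotes
print(c("apple", "banana"))

# Mixing strings and numbers: c() coerces everything to character
v <- c("apple", "banana", 6, 10)
mode(v)                         # "character"
print(v)                        # 6 and 10 now appear in quotes as "6" and "10"
print(89)                       # a numeric value is printed without quotes
```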
(Refer Slide Time: 10:00)
Now I will show you some more examples using the next command, format. As the name suggests, you want the outcome in a particular, formatted way. The command format is used for nice printing, so that you can control how the outcome appears. The command is format, f o r m a t, and inside the parentheses you write the object, which is typically a numeric value.
There are some other options also, which I will show you with this command, and then I will show you some examples so that you understand how to use the format command along with the print command. Can you recall what happens when you type a document, for example in a popular word processor such as MS-Word, Microsoft Word?
When you type something there, you have the option to justify the text on the left-hand side, on the right-hand side or in the centre, and you can control the width, etc. R is not a word processor, but similar things can be controlled using the format command, which has many options. For example, the option trim takes the value TRUE or FALSE; with TRUE, the leading blanks that would pad numbers to a common width are suppressed.
Similarly, you can use the option digits, which shows how many significant digits are to be used. If you want to control how many digits appear on the right-hand side of the decimal point, you use the option nsmall, which gives the minimum number of digits to the right of the decimal point. Then you have justify, to justify character strings on the left-hand side, the right-hand side, the centre, or not at all (none); the default is left. Similarly, you have width, etc.
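A few one-line sketches of the options just listed. The input values are my own, chosen only to make each option's effect visible.

```r
format(3.14159, digits = 3)              # "3.14"  (3 significant digits)
format(2, nsmall = 2)                    # "2.00"  (at least 2 decimal places)
format(c(1, 10, 100))                    # "  1" " 10" "100" (padded to a common width)
format(c(1, 10, 100), trim = TRUE)       # "1" "10" "100"   (padding suppressed)
format(c("A", "BB"))                     # "A " "BB" (strings: justify = "left" by default)
format(c("A", "BB"), justify = "right")  # " A" "BB"
```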
So, there are many options available, and I request you to look into the help and see the application of all of them. Let me take some examples to show you what happens. Suppose I want to print 0.5 with the number of digits equal to 10 and the value of nsmall equal to 15.
You can see the outcome. If you count how many digits there are after the decimal point, there are 1, 2, 3, up to 15. You asked for at least 15 digits on the right-hand side of the decimal point, and that is what you get. You also asked for 10 significant digits.
But 0.5 needs only one significant digit, so digits equal to 10 alone would give just 0.5; because you have also given nsmall, the number of decimal places is increased to 15. That is what you have to keep in mind. I will show you on the R console what happens when you use only one of these options.
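The interplay of digits and nsmall can be sketched like this; the sqrt(2) line is added only for contrast with a value that has many digits.

```r
format(0.5, digits = 10, nsmall = 15)  # "0.500000000000000"
format(0.5, digits = 10)               # "0.5": 0.5 needs only one significant digit
format(0.5, nsmall = 15)               # "0.500000000000000"
format(sqrt(2), digits = 10)           # "1.414213562": here digits does matter
```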
After this, I will show you the use of the option width. Suppose I have a data vector of characters: A, double B, triple C and 4 times D, that is, "A", "BB", "CCC" and "DDDD". I choose the option justify equal to centre; how justify works I will show you in the next example.
Here, you can see that all the values are centred, and I use width equal to 7. Next, I use the same data vector, A, double B, triple C and 4 times D, with the same justify equal to centre, but now with width equal to 14. If you compare the first and the second commands, everything is the same except width, so only width causes the difference in the outcome.
Now look at how many characters are inside the double quotes: this is the width. Compare it with the width in the next outcome, where width is equal to 14, and you can see the difference. Similarly, compare the second element with its counterpart, and the fourth element, DDDD, with its counterpart. The screenshot shows this more clearly: in the first case the width is like this, and in the second, with width equal to 14, it is like this.
So, width increases the width of the printed outcome of every element in the data vector; each element is padded with blanks up to that width. That is the use of the width option, and with this example it is quite easy to see how width controls the outcome.
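A sketch of the width option with the lecture's data vector; the names a and b are my own. nchar() confirms that every element is padded to the requested width.

```r
x <- c("A", "BB", "CCC", "DDDD")
a <- format(x, width = 7,  justify = "centre")
b <- format(x, width = 14, justify = "centre")
nchar(a)   # 7 7 7 7
nchar(b)   # 14 14 14 14
print(a)   # each element padded with blanks to 7 characters
```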
Similarly, I consider 4 cases, numbered 1, 2, 3 and 4. In all 4 cases I have taken the same data vector, A, double B, triple C and 4 times D, and the same width, 7. The only thing I change is the option justify.
In the 1st case I take justify equal to centre, in the 2nd case justify equal to left, in the 3rd case justify equal to right, and in the 4th case justify equal to none, and then see how the outcomes vary. Whatever difference appears in these outcomes is due to the justify option.
Let us look at the first element only. In the 1st case, A is located in the middle, between the double quotes. In the 2nd case, A is on the left-hand side within the double quotes, and the rest of the space is blank. Similarly, in case number 3, A is on the right-hand side within the double quotes, and the rest is blank; this is due to the option.
In the 4th case, because you have given justify equal to none, no padding is added and A is simply printed as it is. So, when you use the option centre, all the values are in the centre. Centre of what? Centre of the space between the double quotes.
Similarly, in the 2nd case, where you choose left, all the values are on the left-hand side within the double quotes; in case number 3, where you choose right, all the values are on the right-hand side; and in the 4th case, with none, the values are printed without any padding.
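The four justify cases can be reproduced with a small loop. This is a sketch; wrapping each element in | | via paste0() is my own device for making the padding blanks visible, in place of the double quotes shown by print.

```r
x <- c("A", "BB", "CCC", "DDDD")
for (j in c("centre", "left", "right", "none")) {
  out <- format(x, width = 7, justify = j)
  # Surround each element with | | so that the padding blanks are visible
  cat(sprintf("%-7s", j), paste0("|", out, "|"), "\n")
}
```

With none, the strings are left alone, so the width of 7 is not applied.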
(Refer Slide Time: 18:39)
So, this is how the outcome changes, and you can see this change very clearly in the screenshot: here A is in the centre, here on the left, here on the right, and here it is left as it is. Similarly, you can observe what happens to the other elements. So, let us do these operations on the R console and see what they do. Let me first take the print command for 0.5.
First, let us see the case where I print 0.5 with digits equal to 10 and nsmall equal to 15, and you get this outcome. Now let me show you what happens if you use only one of the options. Suppose you use 0.5 with the format command and give only digits inside it.
It gives you only the value 0.5. Even if you write print with 0.5 and digits equal to 10, it still gives you 0.5. Why? Because 0.5 is represented exactly with one significant digit, whereas the square root of 2 has many more digits, so there the option controls how many digits are shown.
But if you want to print 0.5 with more digits, then instead of digits use the option nsmall, and it gives you the required value. You can see that there are 15 digits after the decimal point. This is how you can control it.
Now I add the option justify. If I give justify equal to centre, the outcome comes out like this. Let me reduce the font size so that you can see it clearly.
(Refer Slide Time: 21:02)
This is centre. Similarly, if you make it left, you can see that everything comes on the left-hand side; if you make it right, everything comes on the right-hand side; and if you make it none, the values are printed without padding. This is what I wanted to show you: how you can control all these operations in R without any problem.
After this, I will show you what happens with different R objects. For example, consider a matrix of order 3 by 2 in which the data values 1 to 6 are arranged by rows. This is your matrix x. A matrix has a pattern, and if you want to print it, you simply write print(x), and it shows x exactly in the layout of the matrix.
You get the same output when you type only x and press Enter, because R then prints the stored value for you. In this simple case it makes no difference, since the matrix contains only numerical values.
But when you work with numerical values and characters together, you will see the difference between these approaches more clearly. If you want to control the digits here, you can do that too.
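A sketch of the matrix example. The loop lines are an extra illustration, not from the lecture: typing x at the console prints it automatically, but inside a loop there is no automatic printing, so print() must be called explicitly.

```r
x <- matrix(1:6, nrow = 3, byrow = TRUE)  # 3 x 2, filled row by row
print(x)          # same display as typing x and pressing Enter
x                 # auto-printed at the console

# Inside a loop there is no auto-printing, so print() is needed
for (i in 1) x           # shows nothing
for (i in 1) print(x)    # shows the matrix

# digits also works for matrices
print(sqrt(x), digits = 3)
```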
Next, I give you one more option of format. Sometimes you want to print large quantities, that is, numbers with many digits. For example, think of how you write amounts in Indian rupees. If I have to write 500 or 5000, I write them like this,
but if I have to write 50,000, we put a comma, and if there are more digits, we put commas at suitable places. Similarly, if you need to print such large numbers and want to control this spacing, or insert some symbol inside the number, how do you get it done?
What exactly do we want here? We want to print the number as groups of digits separated by some symbol, possibly a blank space. Suppose I want to write 1234567 as 1,234,567. Then I use the option big.mark, b i g dot m a r k. This option is used along with the format command, and it separates the groups of digits by the given symbol.
Suppose I want to separate them by commas. I write format, the number, and then big.mark equal to a comma within double quotes. The commas are placed starting from the right-hand side: the first comma after three digits, then another after the next three digits, and so on. Now, suppose I increase the number of digits to 8.
Again, starting from the right, the first comma comes after three digits, the next comma after another three digits, and the remaining two digits come at the front. Similarly, if you take 9 digits, 1 2 3 4 5 6 7 8 9, the commas are placed beginning from the right-hand side: the first comma after three digits, the second comma after another three digits, and the first group is 1 2 3. This is how you can control these outcomes in R.
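A sketch of big.mark with commas for 7, 8 and 9 digit numbers. Note that format() returns a character string, and that the grouping is always in threes here (the default of the related big.interval argument), not the Indian lakh style.

```r
format(1234567,   big.mark = ",")   # "1,234,567"
format(12345678,  big.mark = ",")   # "12,345,678"
format(123456789, big.mark = ",")   # "123,456,789"
```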
(Refer Slide Time: 25:45)
If you want to use any other symbol, you can also give it within double quotes. For example, suppose I want to separate the groups of digits by blank spaces.
So, I give blank spaces as big.mark, and then you can see that 1 2 3 4 5 6 7 8 9 is printed with blank spaces between the groups of digits. Let me first show you these options on the R console so that you can operate them yourself. You have used this command many times.
This is how the matrix x looks, and if you type print(x), it gives the same outcome. I will clarify the difference, that is, the advantage of using the print command, in the next lecture, when I consider one more command, cat.
Similarly, take this example where you want to use big.mark. You can see the output when big.mark is a comma, and if you want to use some other symbol, such as a star, that can be done too; the choice is up to you. Now suppose I increase the number of digits and use the star: you can see that a star is inserted after every third digit, beginning from the right, and if I make it 9 digits, the output looks like this, right.
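A small sketch, assuming the same format function: big.mark accepts any string, so a blank or a star can take the place of the comma. The number 123456789 is only an example.

```r
x <- 123456789
with_space <- format(x, big.mark = " ")   # groups separated by a blank
with_star  <- format(x, big.mark = "*")   # groups separated by a star
print(with_space)
print(with_star)
```
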
So, this is how the R software works with the print and format commands, ok. We stop here; this was a pretty simple lecture and easy to understand. I have just demonstrated the use of the print command together with format, and format actually has many options. You can see that there are two types of options: some work within the parentheses of print, and others are given within the parentheses of format.
Which option does what, you will not be able to learn until you practice it. I have taken only a few simple examples to motivate you to explore further. This is a very easy job, but how easy it becomes depends on you. So, take some examples, experiment with different combinations and see what happens.
At this stage, until you start writing your own functions and programs, you have to write your formatting statements to get the desired output following the way in which R operates. So, I would request you to look at the utility of all these options, try different combinations, observe the output and see how R works. So, you practice it and I will see you in the next lecture.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Hello, friends. Welcome to the course Foundations of R Software. You can recall that in the last lecture we started a discussion on the print command, because we wanted to learn how to generate a report that needs many types of formatting. So, we discussed the print command and used the format option.
Today I will continue along the same lines and give you one more option, the cat command. I will discuss what happens when you try to do something with the print command, what happens when you use the cat command, and how you can control the output with format. So, let us begin this lecture and understand these concepts through some examples, ok.
Suppose you want to print "The zero occurs at 2 pi radians". You know that pi is the mathematical constant, approximately 22/7 or 3.14. In R, pi is a built-in constant that gives you the value of pi. So, I want R to take the value of pi itself and print something like "The zero occurs at 6.28 radians", since 2 times 3.14 is about 6.28. This is what I want to do, right.
Now, let me try to use the print command and see. Using my knowledge of R, I can divide this sentence into three parts: the first part is a character string, the second is a mathematical expression, and the third is again a character string.
The first and third parts are characters, and twice pi is mathematical, right. From past knowledge I know that characters have to be given inside double quotes and mathematical operations are written as they are. So, I write "The zero occurs at" inside double quotes, then 2*pi, and then "radians" inside double quotes, all separated by commas.
But as soon as you apply the print command to these values, it gives you an error, right. Why is this happening, why is print not working, and what is the way out? That is the question we would like to explore in this lecture. The print function has a significant limitation: it prints only one object at a time. And what are you trying to do here?
You have a character string and a numeric value, and you want to print them together. So, how to do it? The first option is to use a separate print command for each piece, wherever we have a character or a numeric value. So, I can write print for "The zero occurs at".
Then I can write print(2*pi) and then print("radians"). Let us see what happens: I write print("The zero occurs at"), then a semicolon, then print(2*pi), once again a semicolon, and then print("radians"). If you recall, you wanted "The zero occurs at 6.28 radians", but now it gives you "The zero occurs at", then 6.283185, and then "radians".
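A short sketch of the two attempts described above. The comment explaining the failure (the extra pieces are matched to the digits and quote arguments of print.default) is my reading of why the single print call fails; the error is caught with tryCatch only so that the script can continue.

```r
# Attempt 1: one print call with three pieces. print() handles one object,
# so "radians" ends up as the 'quote' argument and R raises an error.
attempt <- tryCatch({
  print("The zero occurs at", 2 * pi, "radians")
  "no error"
}, error = function(e) "error")
print(attempt)

# Attempt 2: a separate print for each piece; the output spans three lines.
print("The zero occurs at"); print(2 * pi); print("radians")
```
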
So, is it in the format you wanted? No. You get this outcome on three different lines, line number 1, line number 2 and line number 3, and in the R console it will look like this. This is not what you want; you want everything to be written on the same line. So, how to get it done?
To solve such issues, we have an alternative to the print function, which is cat. It allows you to combine multiple items into a continuous output. For example, here you wanted to print a character string, a number and then another character string; this can be done with the cat function.
So, the first question is: what is the meaning of cat? Cat is not the cat which says meow meow, right. Cat is short for concatenate. And what does concatenate mean? It means linking things together in the same sequence. So, the function cat links different numbers and characters in the same sequence in which you give them inside the parentheses, that is, in the arguments, and then prints the entire string in the command window, right.
cat is quite useful when we want a user-defined function to produce its output in a user-friendly way, right? What does it actually do? It converts its arguments to character vectors, links them together in the same sequence into a single character vector, and, if you use the option sep, appends the given separator to each element before writing the output.
What does this mean? I will show you with an example. The usual form is cat, and within the parentheses you give what you want to print, followed by various options. If you want to print something into a file, you specify the name of the file in which you want the output; then we have sep, which stands for separator. The separator separates the different strings and numbers that you give as arguments inside the parentheses. Then there is fill, which I will show you shortly, and then labels, append, etc. As the name suggests, append means that whatever you print now is added after what has already been printed, right.
I can also inform you in advance that there is a special character, backslash n, written "\n", which indicates a new line, right. Otherwise, you will see that once the output of cat appears, the prompt symbol on the R console comes just after the output, whereas you want it on the next line. So, if you want to bring the prompt symbol to the next line, you have to use this character.
I will show this once again with one more example. There are many more things about cat, and I would request you to read about them in the help; I have written some important ones here, yeah.
For example, the option fill is a logical (or positive numeric) value that controls how the output is broken into successive lines. Similarly, labels is a character vector of labels for the lines that are printed, and it is ignored if fill is FALSE.
Similarly, append is also a logical variable taking the value TRUE or FALSE. It is used only if the argument file is the name of a file, that is, when you want to save the output into a file. If it is TRUE, the output will be appended to the file; otherwise it will overwrite the contents of the file.
So, if you have already written something to the file and you want every new output, each time you rerun your program, to be written at the end, you have to use TRUE; if you want to overwrite the earlier output, you need to write FALSE.
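The file and append options can be tried with a minimal sketch like the following. The file is a temporary one created with tempfile() purely for illustration, and readLines is used only to look at what was written.

```r
f <- tempfile(fileext = ".txt")                 # a throwaway file for this demo
cat("First run\n", file = f)                    # creates (or overwrites) the file
cat("Second run\n", file = f, append = TRUE)    # added after the existing line
lines_appended <- readLines(f)
cat("Third run\n", file = f, append = FALSE)    # replaces the earlier contents
lines_overwritten <- readLines(f)
print(lines_appended)
print(lines_overwritten)
```
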
There are many such options, and I would request you to look into the help. Now, I will give you a couple of examples to show you the output of cat, and first I would like to show you the difference between the outputs of print and cat.
I take the same example as before, where I wanted to print "The zero occurs at 2 pi radians". When you use a separate print command for each character or numeric value, the output comes on three different lines, but you want it on a single line.
Then you can use the cat command. You simply write cat, and then all character strings are written inside double quotes, all numerical values are written without double quotes, and everything is separated by commas. That is a very simple format for writing what you want to print with the cat command.
For example, you want to print "The zero occurs at", and you know this is a character string, so you write it like this. Then twice pi, that is, 2 multiplied by pi, is a number, so I simply type it as 2*pi. Then "radians" is a character string, so I give it inside double quotes. All these things are separated by commas, and after them I give the symbol backslash n within double quotes.
This changes the line after the command is executed, so the control comes to the next line. If you execute it in the R software, it will look like this: "The zero occurs at 6.283185 radians". This is exactly what you wanted, whereas the print command gave you this type of output. So, this is the difference between print and cat: both display output, but the way in which they operate is different, right.
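A minimal sketch of the cat call described above; the wording of the sentence follows the lecture, and the second call only illustrates what happens without the "\n".

```r
# Character strings in double quotes, numbers without quotes, separated by commas.
cat("The zero occurs at", 2 * pi, "radians", "\n")

# Without "\n" the prompt would stay on the same line as the output,
# so here a separate cat("\n") moves to the next line.
cat("The zero occurs at", 2 * pi, "radians")
cat("\n")
```
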
And this is the screenshot of the same output, right. Let me first show you these things on the R console, and then I will give you some more commands.
(Refer Slide Time: 10:02)
If I use print here, you can see the output comes on three different lines, but if you want it on a single line, you use the command cat, and you can see that it comes on the same line. Now let me show you what happens if you do not use this backslash n.
Earlier, as soon as you executed the command, the output came here and the prompt sign came on the next line; but if you do not use it, you can see that the prompt sign comes right after the output on the same line, right. If I give the earlier command again, the prompt comes on the next line. So, this is the use of backslash n: it gives you a new line, ok.
Now I come to our next example. You have learnt that there is a command in the R software to find the date and time. Suppose you want to write "Today’s date is" followed by the current date and time. These types of things are very useful when you are writing a report.
If you print the date and time along with some statement, people will clearly understand when the report was produced. For example, you write "The report was printed on" and then give the function for the date; then, whenever the report is generated, the date and time on the computer system will be printed there automatically. Here I use a variable d to store the date, obtained with the command we used earlier to find the date and time.
So, I write the string "Today’s date is" inside double quotes, followed by d for today’s date. The two are separated by a comma, but the comma is not printed in the output; after that, as a rule, I write backslash n.
Now, if you execute it, you can see how the output looks: "Today’s date is" comes from the string, followed by the value of d, Wednesday, December 01, 22 hours 52 minutes 48 seconds, 2021. That is the date and time when I prepared the slide, right, and you can see this output in the R software as well.
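A minimal sketch of the date example, assuming the date command from the earlier lecture is base R's date(), which returns the current date and time as a character string (Sys.time() would be an alternative). The printed time naturally depends on when you run it.

```r
d <- date()                      # current date and time, stored once
cat("Today's date is", d, "\n")

# Calling date() directly inside cat gives the time at the moment of printing.
cat("This report was prepared on", date(), "by Shalabh", "\n")
```
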
If I do it at this moment, it gives the date and time when I am recording this lecture, right, and it will be different when you [Laughter] do it on your computer. For example, suppose d is the date, right. You can see that the value of d is Thursday, December 02, 14 hours 57 minutes 29 seconds, 2021, and then I give the statement "Today’s date is", then d, and then backslash n.
Similarly, suppose I want to write "This report was prepared on", followed by the date and time, and after that I can also add "by Shalabh", right, separated by commas. The output comes like this, right; if I reduce the font size, you can see it very clearly.
This is the output: "This report was prepared on", followed by today’s date, like this, right. And if you want to make it more general, then instead of using the stored value d, you can simply use the date command directly inside cat, and this will also work.
You can see the difference. When I executed the earlier command, the stored time was 14:57:29, but with the date given directly, this command shows 14:58:42. So, this gives you the exact date and time when the report was prepared, right. Let us now come back to our slides and see what else we have to learn, right.
Next, I show you how to combine numerical values with a separator. As the name suggests, a separator separates two numbers, two strings, etc. To explain it, let me take an example: x is 1 to 10, that is, 1 2 3 4 5 6 7 8 9 10. Now I want to print these numbers, but you can see that between two numbers there is only a blank. I do not want a blank; I want to print some symbol, or whatever I choose.
To do it, use the cat command: give the data vector in which you have stored the values, then the option sep, which means separator, followed by an equals sign, and then whatever you want in place of these blank spaces, within double quotes.
For example, I have given a blank space, then two plus signs, and then a blank space, because I want a space, plus plus and another space between any two numbers.
If you do it, the output looks like this: 1 ++ 2 ++ 3 ++ 4, and so on, and you can see that this separator is inserted between every pair of numbers, right. And because I am working continuously on the R console, I am also using the command cat with backslash n within double quotes inside the parentheses. This is used to change the line.
Otherwise, the prompt sign will come right here, as I explained in the earlier example. Now suppose I want to change the separator to a slash. So, I give a blank space, then a slash, and then a blank space, and you can see that whatever you have given inside the double quotes is inserted between any two numbers, right.
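The separator examples above can be reproduced with a short sketch like this one; the separators " ++ " and " / " follow the lecture, and the extra cat("\n") calls only move the prompt to a new line.

```r
x <- 1:10
cat(x, sep = " ++ ")   # " ++ " is placed between every pair of numbers
cat("\n")
cat(x, sep = " / ")    # the same numbers separated by " / "
cat("\n")
```
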
So, this is how things work, and this is the screenshot of the same operation which I have shown you. Let me first show you these operations on the R console and then move forward, ok.
Now I take x equal to 1 to 10 and give the command like this. You can see 1 to 10, with the blank and plus signs inserted between every pair of numbers. And after this is executed, the control stays here, right. One option is to give backslash n within double quotes using a separate cat command, so the prompt comes to the next line.
The other option, which I think is better, is to add it to the same cat command, since cat is already there. You just give backslash n inside double quotes as one more argument, and now you will see that the control comes to the next line instead of staying here, right.
Now suppose in this command, instead of plus plus, I give three stars. You can see the output now looks like this, and even if you give only blank spaces, like this, you can see the numbers are now separated by them. So, the sep option is used to insert different types of separators, symbols or blank spaces, between two numbers, right.
Let me take one more example. You have seen this type of example in your mathematics books. I have a number x equal to 7; this number can change and is user dependent. I want to write "The square of" this number "is" followed by the value of its square. So, I write "The square of", and this is a string.
I write it within double quotes, then the number whose square I want, which is x, then "is" within double quotes, and then the square of x, written mathematically as x hat 2, that is, x^2. After this, I want an exclamation sign and then a new line.
If you execute it, you can see what happens: "The square of 7 is 49". Where does 7 come from? From x. And 49 comes from 7 square. This is the screenshot of the same operation, right. So, you can see this is how you can control the output in the way you want, right.
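A minimal sketch of the square example, with x = 7 as in the lecture. Note that cat's default separator also puts a blank before the exclamation sign.

```r
x <- 7
# Strings in quotes, x and x^2 evaluated as numbers; "\n" ends the line.
cat("The square of", x, "is", x^2, "!\n")
```
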
Similarly, suppose I also want to use the format command to get an output like this one. First look at the output: "The square root of 7 is approximately 2.65". To do it, you have to see how to frame your cat command. So, I write cat, and then "The square root of", which is a character string, so I give it inside double quotes.
After this comes the number 7, which comes from x. Then you once again have a character string, "is approximately", which is given within double quotes, and after that you want 2.65. The square root of 7 has many more digits, so your objective is to restrict the output to only 3 significant digits.
So, I write format, and inside the parentheses sqrt(x), the square root of x, and then digits = 3. This gives you this type of output, and obviously backslash n moves to the next line. The output will look like this. You can see that all the commands used here, format, digits, sqrt, etc., we have already learnt, right.
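A sketch of the square root example, assuming x = 7 as in the lecture; format(..., digits = 3) rounds to 3 significant digits before cat joins the pieces.

```r
x <- 7
approx_root <- format(sqrt(x), digits = 3)   # 2.6457513... becomes "2.65"
cat("The square root of", x, "is approximately", approx_root, "\n")
```
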
In both these examples, I have taken the input variable x as a single value, a scalar. Now let me take a data vector of 5 values, 2 4 6 8 10, and call it evenno, for even numbers. Now I want to print it like this. Instead of looking at the command first, look at the output, which is what we want, and then think about how to write the command.
Here, "The first few even numbers are" is a character string, so it is written inside double quotes, and then you want to print 2 4 6 8 10. Where does this come from? From the vector of even numbers, so you just write evenno, followed by a comma. After this, you want to write dot dot dot, which means continued.
This is also a character string, so you write the three dots within double quotes, followed by backslash n to change the line, and you get the output that you can see in the screenshot here. You have seen this type of output many times in mathematics books and different types of reports, and now you can see it is not a very difficult thing to do, right.
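A minimal sketch of the vector example, using the lecture's variable name evenno; cat prints every element of the vector, separated by blanks.

```r
evenno <- c(2, 4, 6, 8, 10)
cat("The first few even numbers are", evenno, "...\n")
```
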
Before I move forward, let me show you how these things work in the R console. Suppose I take x equal to 9. If you run the command that prints the square of x, what do you get? "The square of 9 is 81". Similarly, if you look at the value of the square root of 7, you can see it is 2.645751.
Now, if you take x equal to 7 and use the command with "The square root of", x, "is approximately" and format(sqrt(x), digits = 3), it comes out as "The square root of 7 is approximately 2.65". So, this is the output of the statement, right.
(Refer Slide Time: 21:26)
Similarly, take this example with a data vector: if I say evenno is equal to c(2, 4, 6, 8), like this, and run the command, it gives you "The first few even numbers are 2 4 6 8" followed by dot dot dot, right, ok.
Now I give you one more example: suppose I want to do a very specific type of operation, like this one. First, look at the output. What is happening here? You have the numbers 1 2 3 4 up to 10, and the lowercase letters a b c d up to j. You want to print them like this: a letter inside parentheses, followed by a colon, and then the number, for example (a): 1.
So, how do we write this using the fill option with cat? First, to generate the numbers 1 to 10, I use the command x = 1:10, that is, 1 colon 10, which gives me these 10 values. Next, to generate the lowercase letters, I use the command letters, written in lower case, with 1:10 within square brackets, right.
After this, how are you going to print it? You use cat with what you want to print, x, then fill = 2, and then you have to give the labels, which are built using paste. We have not yet covered the command paste, but we are going to learn about it in the forthcoming lecture; after that you can revise this example once again and you will understand it very clearly, right.
Let us see what I am doing here. This is the parenthesis of the command paste, so no issues there. Whatever has to be printed, I am writing in red. The opening parenthesis that should appear in the output is given inside double quotes. After that you want a letter, for example a, so you write letters[1:10]. After the letter, you want the closing parenthesis and the colon together, so they are given together inside double quotes, like this, right. And then comes the number from x. Now you can see the output will look like this. Let me show it on the R console, and then you will understand it very easily, right.
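A sketch of the fill and labels example as I read it from the description: fill = 2 sets a very narrow line width, so each value goes on its own line, and labels supplies the "(a):", "(b):", ... prefixes. The labels are built with paste(..., sep = ""), which the next lecture covers.

```r
x <- 1:10
# Labels "(a):", "(b):", ..., "(j):" built from the first 10 lowercase letters.
lab <- paste("(", letters[1:10], "):", sep = "")
# A narrow fill width forces a line break after each value; each line gets a label.
cat(x, fill = 2, labels = lab)
```
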
(Refer Slide Time: 23:55)
If I take x equal to 1 to 10 and run the command like this, you can see that you get this type of output. I agree that at this moment, because I have not told you about the paste command, it may not be 100 percent clear, but in the next lecture I am going to talk about it. After that, it will not be difficult for you to understand this command and this example, right.
Now we come to the end of this lecture. It was very simple: I have explained the use of the cat function. With cat you have the advantage that you can join things together, which was not possible with the print command. In cat you can combine different types of data objects, for example a character string and a number, where the character string is some statement and the number can be an input, an output, or the value of some function.
This opens a lot of opportunities for you to create examples and practice. You can look into any book, or any report produced by some software, see how it is generated, and then try to generate the same report in the R software. There is some paid software that generates this type of report automatically: you just click, it does whatever analysis you want, and then it generates a very user-friendly report.
The only difference in the R software is that you need to understand what type of report you want and what you want to write where, and then use print, cat, paste and the different formatting commands to generate exactly the same report, without any cost. But you have to work; you have to write the program. So, you practice it and I will see you in the next lecture.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Hello friends. Welcome to the course Foundations of R Software. You can recall that in the last two lectures we started a discussion on how to control the output in the R software, because we wanted the output in a particular, pre-specified format. Here format does not mean the format command that we used with print and cat, but format in the general sense, right.
So far, we have learnt how to print the output using the print and cat functions. In today’s lecture we are going to talk about a new function, paste, right. We have already used paste a couple of times in earlier lectures, but in this lecture we will formally learn the role of paste and what exactly it does.
Paste means pasting, joining together, etc., so from the simple, literal meaning of paste you can understand what it is going to do. Here I am going to take a couple of options of paste and show you how it can be used in different ways to get different types of output. So, let us begin the lecture, and I will take up a couple of examples to explain.
The function for pasting is paste, and inside the parentheses, as arguments, we give different types of options. The role of the paste function is to concatenate several strings together, right. The cat function also does the same thing: it concatenates several strings together, along with numbers, right.
What really happens with paste is that it creates a new string by joining the given strings end to end, and the advantage is that the result of paste can be assigned to a variable, which is not possible with cat. Although cat also does concatenation, its output cannot be assigned to a variable for further use (cat only writes the output and returns NULL), whereas with paste this is possible.
So, that is the main difference between cat and paste: even though they do a similar job, they are different.
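A minimal sketch of this difference, reusing the earlier sentence: paste returns a character string that can be stored and reused, while cat only writes to the console and returns NULL. The variable names msg and res are illustrative.

```r
msg <- paste("The zero occurs at", 2 * pi, "radians")  # stored as a string
res <- cat("The zero occurs at", 2 * pi, "radians\n")  # printed; nothing useful stored
print(msg)
print(is.null(res))   # cat's return value is NULL
```
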
To use the function paste, we simply write p a s t e, and inside the parentheses we write the different options. There are many options, and I would request you to go to the help and see which options can be used. Here I will explain two important options. One is sep, which, as you know, means separator.
This separator parameter helps in separating the different strings, and it can be controlled in the sense that you can give whatever separator you want inside double quotes, just as we did earlier. The other option is collapse, c o l l a p s e, written in all lowercase letters. So, what do you understand by the word collapse?
Collapse means that when two things collapse, they fall onto each other or come side by side; they stick together. In paste, collapse is an optional character string used to separate the results when they are joined into a single string, right. So, what does paste actually do?
If you do not give any option for the separator, paste inserts a single space between each pair of strings. And if we would like to break the output into lines, we can use backslash n, as in the earlier lectures, so that the output continues on the next line, ok.
The next question is: how does paste actually work? This is very important to understand so that you can compare it with other commands like cat. Do you remember the command as.character, which converts its argument to a character string? For example, if I take the number 9 and write as.character with 9 inside the parentheses, it gives me the output "9", the character string, right.
Now, paste works based on as.character: it converts whatever values are given as its arguments, within the parentheses, using as.character, and then concatenates them, separating them with the string given in sep, or with a blank space by default.
So, essentially, whatever you give within the parentheses of the paste function is automatically converted to character and then concatenated, that is, joined together in exactly the same sequence. If the arguments are vectors, they are concatenated term by term to give a character vector as the result, and shorter vectors are recycled as needed. So, if you take more than one vector, they are also concatenated term by term.
If a value for the option collapse is specified, the elements of this result are further concatenated into a single string, with the elements separated by the value of collapse. Whatever value of collapse you have given is inserted between them, and they are collapsed together. These things will be very clear when I take the examples, right.
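The rules just described (conversion with as.character, term-by-term joining with recycling, and the optional collapse) can be checked with a few lines. This is a sketch; the vectors are illustrative and not taken from the slides.

```r
# Arguments are converted to character and joined term by term
v <- paste(c("a", "b", "c"), c("x", "y", "z"))
print(v)   # [1] "a x" "b y" "c z"

# A length-one argument is recycled against the longer vector
w <- paste("Ex", 1:3)
print(w)   # [1] "Ex 1" "Ex 2" "Ex 3"

# collapse joins the elements of the result into one string
u <- paste(c("a", "b", "c"), c("x", "y", "z"), collapse = " + ")
print(u)   # [1] "a x + b y + c z"
```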
So, in a nutshell, we have learned that the paste function concatenates several strings and creates a new string by joining the given strings end to end. The result of paste can be assigned to a variable, which is not possible with the function cat.
Besides the paste function, there is another function, paste0. Note that this is the digit 0, not the letter o.
So, paste0 with the option collapse is equivalent to paste with an empty separator, sep = "" with no space, together with the same collapse. So, if you want to do such an operation, instead of writing sep = "" every time, you can simply use paste0 with whatever arguments you want, together with the option collapse if you need it. This will become clear when I take an example and show you how the paste and paste0 functions work.
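A minimal sketch of this equivalence, with an illustrative prefix x and the numbers 1 to 3 (both my own choice):

```r
# paste0() puts no separator between its arguments
a <- paste0("x", 1:3, collapse = "+")
# the same result with paste() and an explicit empty separator
b <- paste("x", 1:3, sep = "", collapse = "+")

print(a)                # [1] "x1+x2+x3"
print(identical(a, b))  # [1] TRUE
```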
Now, let us take some examples to illustrate what I have explained. As I said, paste converts its arguments to character strings via as.character and then concatenates them, that is, links them together in the same sequence. Let me give you two things, and you compare them. Suppose I use the function paste with 1:12. The numbers 1:12 are 1, 2, 3, 4, up to 12.
These are numbers, but when you apply paste, the outcome looks like this: every number is now a character, enclosed within double quotes. And if you take only the numbers 1 to 12 and use the command as.character, the numbers 1, 2, 3, 4, up to 12 are converted into characters, and you see the same outcome, right.
All the numbers are enclosed within double quotes; they have become characters. So, this is how the paste command works, and this is the screenshot of the same outcome. Let me first show you this example, so that you are clear about how this concept works.
So, if I write the numbers 1:12 and check the mode of these 12 numbers, it is numeric. But if I use as.character(1:12), you can see what happens: the numbers now appear within double quotes, and their mode comes out to be character.
Now, if I do the same thing with paste and simply write paste(1:12), you get the same output, and if you find the mode of this outcome of paste, it is again character. So, this is how paste works, and this is exactly what I wanted to explain. Now let me take some more examples and show you how it works with different options.
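The console check just described can be written as a short script; this is a sketch, and the variable names n and p are my own.

```r
n <- 1:12
print(mode(n))                     # [1] "numeric"

p <- paste(n)                      # every number becomes a character string
print(p)
print(mode(p))                     # [1] "character"

# paste() on a single vector gives the same as as.character()
print(identical(p, as.character(n)))  # [1] TRUE
```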
Now I would like to give you an example of the option sep, that is, the separator. I will take a statement, use a separator, and compare what really happens. Suppose I take three character strings, "everybody", "loves" and "R programming", and paste them together; the result looks like this.
You can see that these strings are separated by a single space, which is the default that I explained in the beginning. Now I take the same statement, writing exactly the same thing, but I add sep = "*", with the star written within double quotes. As soon as I give this separator, you can see the difference: I still have everybody loves R programming,
but the strings are now separated by a star. Do not get confused about why there is no star between R and programming: you have given "R programming" as a single character string within double quotes, so no separator is inserted inside it.
Similarly, if you want another separator instead of a star, take the same statement and use sep = "===", that is, the equal-to sign three times. It is just a symbol, and you now have everybody loves R programming with these separators between the different strings.
So, this is not a very difficult command to understand, and this is the screenshot of the same operation. Let me first show you this on the R console, and then we will do some more operations.
So, I first take the statement everybody loves R programming without any separator, and then I add the separator sep = "*", with the star within double quotes, and you can see the outcome. The words are now joined by stars, but there is no star between R and programming, because "R programming" is a single string.
Similarly, you can take another separator. Suppose I take three stars; you can see that the outcome is everybody loves R programming with three stars as the separator. You can also write something more in the separator.
For example, if I add two spaces before and after the stars in the earlier command, you can see that the outcome changes accordingly: a space is also taken as part of the separator. So, this is how this command works, and I am sure it is now not difficult for you to understand how the paste command works.
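The separator examples can be reproduced as follows. This is a sketch; the exact separators typed on the console (especially the spacing around the stars) are my assumption.

```r
# default separator: a single space
s1 <- paste("everybody", "loves", "R programming")
# a star as separator; no star appears inside "R programming"
s2 <- paste("everybody", "loves", "R programming", sep = "*")
# three equal-to signs as separator
s3 <- paste("everybody", "loves", "R programming", sep = "===")
# spaces are part of the separator too (two spaces on each side)
s4 <- paste("everybody", "loves", "R programming", sep = "  ***  ")

print(c(s1, s2, s3, s4))
```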
(Refer Slide Time: 11:34)
Now I will give you one more application. Suppose you have a vector of characters containing a couple of names. For example, I write three names: Professor Singh, Mr. Venkat and Dr. Jha.
After each name, I want to write is a good person, and I want to repeat this for every name without using the concept of a loop. So, the first name, Professor Singh, is taken and printed as Professor Singh is a good person; then the next name, Mr. Venkat, is taken and printed as Mr. Venkat is a good person; and the same thing is repeated with Dr. Jha, which is the third name.
This can be done in the R software very easily. I am taking it at a very elementary level; otherwise this is only a one-line command. First, I store all the names in a data vector, say names, n a m e s, and then I use the command paste.
Then I write is, a good and person, followed by a full stop, each inside double quotes. If you wish, you can also write them within single quotes, and that also works, but I am trying to explain this in a very simple way.
Now, what will happen? R picks up the first name, Professor Singh, and combines it with all the other strings, "is", "a good" and "person". After that, it comes to the next name in the vector names, picks up Mr. Venkat and again combines it with is a good person, so it prints Mr. Venkat is a good person.
You can see that Professor Singh is a good person is now a single string, and similarly Mr. Venkat is a good person is a single string. At the end, it picks up the third name, Dr. Jha, and combines Dr. Jha, then is, then a good and then person. These are 4 strings, but they are combined, and you get the outcome Dr. Jha is a good person as a single string. That is the advantage of the paste command: the single strings is, a good and person are recycled for every element of names. As you move further, you can use these things in different ways.
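A sketch of this one-line command. The exact spelling of the names and the trailing full stop are my assumptions, since only the spoken form is available here.

```r
# three names stored in a character vector (spelling assumed)
names <- c("Prof. Singh", "Mr. Venkat", "Dr. Jha")

# the single strings "is", "a good", "person." are recycled for each name
out <- paste(names, "is", "a good", "person.")
print(out)
# [1] "Prof. Singh is a good person." "Mr. Venkat is a good person."
# [3] "Dr. Jha is a good person."
```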
Now I will show you the application of collapse, and then I will show it on the R console. The option collapse, c o l l a p s e, is a parameter that defines a top-level separator and instructs the paste command to concatenate the generated strings using that separator.
For example, I take the same example and use collapse. I take the same data vector, which has 3 values, Professor Singh, Mr. Venkat and Dr. Jha. I again use paste, pick up the names, and use the same strings: "is", "a good" and, as the last string, "person".
But now I add collapse equal to and, written within double quotes. I have marked it with a box on the slide just to indicate where this and appears in the outcome; there is no other reason. Now, let us see how the outcome is formed. paste picks up the first name from names, Professor Singh, combines it into Professor Singh is a good person, and after this the and is added.
In the second step, it picks up the second name, Mr. Venkat, adds is a good person and gives Mr. Venkat is a good person. After that, it again adds the and that was given under collapse, and then it picks up the third name, Dr. Jha, again uses is a good person and gives Dr. Jha is a good person.
Now, there are a couple of things you have to observe. First of all, check whether this outcome is a single string or three strings. In the earlier command, you had a 1st string, Professor Singh is a good person, a 2nd string, Mr. Venkat is a good person, and a 3rd string, Dr. Jha is a good person.
There you could see three strings, each within its own double quotes. But when you work with collapse, there is only one opening and one closing double quote, and all three statements have been joined together. The connective that joins the statements is controlled by the value of collapse given within double quotes; here and plays the role that a comma would otherwise play.
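A sketch of the collapse example. The exact collapse value on the slide is not clear from the transcript, so I assume the word and with one space on each side; the spelling of the names is also assumed.

```r
names <- c("Prof. Singh", "Mr. Venkat", "Dr. Jha")

# collapse joins the three generated strings into one string
one <- paste(names, "is", "a good", "person", collapse = " and ")
print(one)
print(length(one))   # [1] 1
```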
(Refer Slide Time: 16:46)
So, this is the use of collapse, and you can see very clearly that there are only two double quotes, so the whole outcome is a single string. This is the difference between the separator and collapse. Let me show you these examples on the R console first, and then I will move further. First, I create the data vector names, which is used in both examples.
You can see the three names, Professor Singh, Mr. Venkat and Dr. Jha. Then I use the paste command, which picks up the names from this data vector and adds is a good person. You can see: Professor Singh is a good person, Mr. Venkat is a good person, Dr. Jha is a good person; the part is a good person is added by the command.
The names come from the vector: Professor Singh, Mr. Venkat and Dr. Jha. Next, I take one more example where I use the paste command with the option collapse. You can see that these things are now joined together. To see the whole output, I have to reduce the font size, because this is a long statement.
But you can see that this is a single statement; you can try it on your computer also. Now I bring the font size back, otherwise you cannot see it.
I have shown this in the screenshot also. Now I take some more examples so that you can understand the applications of paste. I write paste with two kinds of things: one is a number and the other is a character.
So, I write a number and a character, then again a number and a character, and I repeat this 3 times: the number 1 and then the character string is 1st, then the number 2 and, within double quotes, the character string is 2nd, then the number 3 and, within double quotes, the character string is 3rd. I want to separate them using the hash sign, #.
If you look at the outcome, a hash is inserted between every pair of arguments: 1, hash, is 1st, hash, 2, hash, is 2nd, hash, 3, hash, is 3rd. Now I modify this example slightly and add some blank space within the double quotes of the character strings is 1st, is 2nd and is 3rd.
You can see that I am adding the same blank space in each of them. Now look at the effect on the outcome, with the same separator, hash: there is now a space after the hash, and there is a space before it also.
So, what I want to show you is that when you use a separator, the blank spaces inside the strings also play a role, and if you want to give more space artificially, you can create it in this way without any problem.
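A sketch of the hash-separator example. The character strings are my reconstruction from the spoken description, so treat them as assumptions.

```r
# numbers and character strings alternate, separated by "#"
h1 <- paste(1, "is 1st", 2, "is 2nd", 3, "is 3rd", sep = "#")
print(h1)   # [1] "1#is 1st#2#is 2nd#3#is 3rd"

# blank spaces inside the strings also appear around each "#"
h2 <- paste(1, " is 1st ", 2, " is 2nd ", 3, " is 3rd ", sep = "#")
print(h2)   # [1] "1# is 1st #2# is 2nd #3# is 3rd "
```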
Now I take one more interesting example. I take the character string Ex within double quotes, which we could call a short form of exercise, together with the numbers 1:5, that is, 1, 2, 3, 4 and 5, and then I give the separator sep = "_", the underscore within double quotes.
What will the outcome be? Ex is attached to each of these numbers, separated by the underscore, and this creates one string for each number. The outcome looks like this: Ex, then the separator underscore, then the number 1; then Ex, the separator and the number 2; then Ex, underscore and 3; and the same thing for 4 and 5. So you get Ex_1, Ex_2, Ex_3, Ex_4 and Ex_5.
Each of them is a separate string; you can see that each is within its own double quotes. So, there is a 1st string, 2nd string, 3rd string, 4th string and 5th string, which you can call number 1 to number 5. Now, suppose I want to access a particular string. I store all these outcomes in a variable x, and suppose I want the first string.
What do I have to do? I simply write x and then, within square brackets, the index number. So I write x[1], and it gives me this value. Similarly, if I want the second value in x, I simply write x[2], where 2 gives the location, or index, of the second string, and it gives the second value in the outcome, Ex_2. Similarly, you can use x[3], x[5], etc. This is not a very difficult thing, right.
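The Ex example can be run as a sketch like this; the values follow directly from the command described above.

```r
# "Ex" is recycled and joined to each number with an underscore
x <- paste("Ex", 1:5, sep = "_")
print(x)          # [1] "Ex_1" "Ex_2" "Ex_3" "Ex_4" "Ex_5"
print(length(x))  # [1] 5

# individual strings are accessed by their index
print(x[1])       # [1] "Ex_1"
print(x[2])       # [1] "Ex_2"
print(x[5])       # [1] "Ex_5"
```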
(Refer Slide Time: 21:37)
Now, along the same lines, I consider the same example, but I use collapse also. I take the same command, paste, with Ex within double quotes, the numbers 1 to 5 and the separator underscore, exactly as in the earlier example.
But now I add collapse = "", and note that there is no space here: it is just two double quotes. Now, see what happens. In the earlier outcome, there was a space between the strings, which I am now marking in red: this space, this space, this space and this space.
These spaces are now removed, and there is no space between any of the strings, but all the other operations are the same: paste takes Ex, separates it from the number by the underscore and then takes 1. Similarly, it takes Ex, underscore and 2, then Ex, underscore and 3, and so on.
But now there are only two double quotes, so this is a single string. If you want to access the first string, you can write x[1], but it gives you the same whole outcome. There is no second or third string any more: x[2] or x[3] simply returns NA, because x now has only one element, right.
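A sketch of the same command with collapse = "", showing that the result is one string and that asking for a second element gives NA:

```r
# same as before, but the five strings are now collapsed into one
x <- paste("Ex", 1:5, sep = "_", collapse = "")
print(x)          # [1] "Ex_1Ex_2Ex_3Ex_4Ex_5"
print(length(x))  # [1] 1

print(x[1])       # the whole string
print(x[2])       # [1] NA, since there is no second element
```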
(Refer Slide Time: 22:57)
To see the difference between the two outcomes, I have put both on a single screen. This part of the command is the same in both cases; only collapse is added. Because of it, the space between the strings in the 1st case is removed in the 2nd case, and they are joined together. That is the role of collapse.
Exercise 1, exercise 2, exercise 3, exercise 4 and exercise 5 all collapse together and are joined. Let me first show you these examples, and then I will give you one more result with paste and paste0.
So, if I take this type of operation, you can see the outcome here. Suppose I just remove the blank spaces, and then you compare: in one case there is a space, and in the other case there is no space. Similarly, take the example where we paste Ex with 1 to 5 using the separator underscore.
You can see x here, and if you want the first value of x, you write x[1], with 1 within square brackets, which gives you the 1st value. For the 2nd value, write x[2]; for the 3rd value, x[3]; for the 4th value, x[4]; and for the 5th value, x[5].
So, this is how you can do it very easily. Now, what happens if you also use the parameter collapse? Now I write x = paste(...) with the same arguments as before, but with collapse added. You can see the outcome: the separation between the strings is removed by collapse, and you get only a single string.
Now, if you want the first value in this new x, you get the whole string. But if you try to get the second string, there is none, because all of them are joined into a single string.
After this, I give you a very interesting example of the use of paste and paste0. When you use the paste function, there is an alternative, paste0, written as paste followed by the digit 0. Both of them give the same outcome when they are used over a single vector, and they work in exactly the same way: as I showed you for paste, all the values inside the argument are first converted using the command as.character,
and then they are joined together. The same thing happens in paste0. For example, if you write a single vector, say 1:10, which gives the values one to 10, and use paste or paste0 on it, you can see the outcome when you use paste0.
When you use the paste command over 1:10, you get the same result, and you can see very clearly from the screenshot that there is no difference. There is no difference between paste0 and paste when you work with a single vector such as the values 1, 2, 3, 4 up to 10.
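With a single vector there is nothing to separate, so the two functions agree. A minimal sketch:

```r
a <- paste0(1:10)
b <- paste(1:10)

print(a)                 # "1" "2" ... "10"
print(identical(a, b))   # [1] TRUE
```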
(Refer Slide Time: 26:17)
Now I will show you under what conditions they make a difference. Suppose I take paste0 with the numbers 1 to 10, and I want to concatenate them in a vectorized way with another data vector containing "st", "nd" and "rd", each within double quotes.
What are these things? When we write 1st, we write 1 followed by st; when we write 2nd, it is 2 followed by nd; and for 3rd, it is 3 followed by rd. So, these are the suffixes st, nd and rd, and then I repeat th 7 times, for 4th, 5th, 6th, 7th, 8th, 9th and 10th.
So, I have made clear what I want: I want to print 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th and 10th. For that, the numbers 1, 2, 3, 4, ... come from the first argument, and the suffixes st, nd, etc. come from the other vector. The outcome is as follows: the number 1 is joined with st to give 1st, and the number 2 is joined with nd to give 2nd.
Then 3 is joined with rd to give 3rd, and then 4, 5, 6, 7, 8, 9 and 10, seven numbers in total, are each joined with th. So, you can see how efficiently you can operate with the paste0 command. Now the question is: can you not do the same thing with the paste command? Yes, you can, but if you look at the outcome, the difference becomes clear. Here is paste over 1 to 10 with the same command.
I simply replace paste0 with paste, and you can see the outcome: there is a blank space between each number and its suffix, which is the default single space inserted by paste. Unless you set sep = "", this space will not be removed; collapse does not help here, because it only joins the elements of the result into one string. So, this is the advantage of using the paste0 command, and this is the screenshot.
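A sketch of the ordinal-suffix example, including a check that collapse alone does not remove the space that paste inserts:

```r
suffix <- c("st", "nd", "rd", rep("th", 7))

ord0 <- paste0(1:10, suffix)            # no separator
ord1 <- paste(1:10, suffix)             # default single space
ord2 <- paste(1:10, suffix, sep = "")   # same as paste0
ord3 <- paste(1:10, suffix, collapse = "")  # the inner spaces remain

print(ord0)   # "1st" "2nd" "3rd" "4th" ... "10th"
print(ord1)   # "1 st" "2 nd" "3 rd" "4 th" ... "10 th"
print(ord3)   # "1 st2 nd3 rd4 th5 th6 th7 th8 th9 th10 th"
```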
Let me show you these outcomes on the R console, and then we will finish this lecture.
You can see the outcome here, and if I simply replace paste0 with paste, you get this. So, this is how we can work with paste and paste0. Now we come to the end of this lecture. You have seen that this was a pretty interesting lecture: with the paste command, if you think in different ways, you can do wonders.
Actually, these are not wonders; these are needs and requirements, because when you produce a report, every report has some specific requirements.
To fulfil those requirements, you have two options: either you write a completely new function, or you use these commands intelligently and combine them so that they give you the same output you would obtain after writing a new function.
I am not doubting your capability to write a new function, but this is much more straightforward and easier. Since this is an elementary-level course, I would not recommend, at this moment, writing functions to do things that can be obtained directly by using these commands. That is the advantage of the R software. So, once again, you have ample opportunity to create examples; think about what you can do and what the different possibilities with the R software are.
So, why do you not take some more examples, based on wherever you are working and whatever your needs are? Take some segments of the type of report you always generate, and see how you can do all those things with the R software. With this request, I stop here, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Hello friends. Welcome to the course Foundations of R Software. You may recall that in the last lecture we talked about how you can paste different types of strings. In this lecture we continue on a similar topic, and we will learn how you can split a string. This is a very common operation that you sometimes need in data manipulation and in databases, where you want to split a string at some particular values that you define.
How to do this in the R software is what we are going to learn in this lecture. I will take a couple of examples, and through them I will show you how R can be used to split strings at desired points. So, let us begin our lecture.
If you want to split the elements of a character vector, the command is strsplit, which means string split. The split can be a single character, or it can be a character string; I will explain through some examples what I mean by saying that the split can be a single character or a character string. The command is written as strsplit, all in lowercase letters, and first you give the string or character vector x which you want to split.
After that, you give the option split. This is a character vector containing an expression, and through it you inform the R software of the point at which the string under x has to be split. Then there is one more option, fixed, which can take two possible values, TRUE or FALSE.
If fixed is TRUE, the split is matched exactly as given; otherwise, which is the default, split is treated as a regular expression. I will show you some examples, and then all these statements will become very clear.
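The difference that fixed makes is easiest to see with a character that has a special meaning in regular expressions. In this sketch (the string is my own example), the dot matches any character as a regular expression, but is matched literally with fixed = TRUE.

```r
# as a regular expression, "." matches every character,
# so each piece between matches is an empty string
r1 <- strsplit("a.b.c", split = ".")
print(r1)   # [[1]] [1] "" "" "" "" ""

# with fixed = TRUE the dot is matched exactly as written
r2 <- strsplit("a.b.c", split = ".", fixed = TRUE)
print(r2)   # [[1]] [1] "a" "b" "c"
```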
Now, suppose I take the statement the syntax of paste is available in the online help, and you can see that I have inserted certain characters, including exclamation marks, at these places. So, this is the statement x, and suppose you want to split x at all those places where there is an exclamation mark.
So, it will be split here, here, here, here and so on, till the end.
How do we get it done? This is very simple in R. You write strsplit, then the expression, as I told you, and then the option split = and, within double quotes, the character at which you want to split. As soon as you execute it, you get this type of outcome.
To understand this outcome, compare each piece with the statement. The first piece is the part before the first exclamation mark, and syntax is the 2nd piece. The 3rd piece is of together with the & that follows it, and the 4th piece is paste&, because & is not the split character and so stays attached. After that come the remaining pieces, up to in the online help at the end. So, this is what is happening: the strsplit command splits the statement exactly at those places where you have given the instruction under the option split.
And this is the screenshot of the same operation.
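To try this yourself, here is a sketch with a made-up statement. The exact string on the slide is not reproduced in the transcript, so the string below is my own assumption; it is split at every exclamation mark.

```r
# a hypothetical statement with exclamation marks as split points
x <- "The!syntax!of!paste!is available!in the online help"

pieces <- strsplit(x, split = "!")
print(pieces)
# [[1]]
# [1] "The"                "syntax"             "of"
# [4] "paste"              "is available"       "in the online help"
```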
Now I give you one more example with the same statement I just explained, but I change the character at which I want to split. Earlier I had given only the exclamation mark; now I give the split as & followed by the exclamation mark, within double quotes. If you look at the outcome and the statement you have given, you can understand it very easily. The string is split wherever this combination occurs: here, then here, and then here. What do you think about this place? This is not the symbol. Why? Because you have given & followed by the exclamation mark, whereas here you have the exclamation mark followed by &.
So, it will not be split here, and continuing, this is again the exclamation mark followed by &, so it is not split here either. So, the splits happen after the, after syntax, after of and after paste, and after that the entire remaining string comes in one piece. If you look at the outcome, the first piece is the, because the first split happens here; then comes syntax, split here; and the third piece is of, split here.
765
And then after that you have here paste. So, paste is happening here and after that there is
no more splitting and hold this statement this and here this is coming at one place. So,
you can see here this is how the things are happening, in this when we try to split the
string and this is here the screenshot of the same outcome, which I just shown you here,
right.
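The same hypothetical string split at the two-character pattern "&!" (a sketch; the string is an assumption, not the one on the slide). Only "&" immediately followed by "!" counts, so the "!&" near the end is left alone:

```r
x <- "the&!syntax&!of&!paste&!and!&strsplit!differ"  # hypothetical string

# "&!" matches only & followed by !, so "!&" and the lone "!" do not split
pieces <- strsplit(x, split = "&!")[[1]]
pieces          # "the" "syntax" "of" "paste" "and!&strsplit!differ"
length(pieces)  # 5 pieces instead of 7
```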
So now, similarly, suppose you replace the split option in this command by ! followed by &, that is, split = "!&". Where does ! followed by & occur? Only here and nowhere else; you cannot count this one, or this one, or this one, or even this one, because they are & followed by !, right. So what happens if you take the same contents and split at this pattern?
It is broken at only one place: the first piece is this part, and the second piece is the rest of the string. Just follow my pen to see how I am doing it; this is how the string split works.
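Reversing the order of the two characters changes the result completely. A sketch with the same hypothetical string:

```r
x <- "the&!syntax&!of&!paste&!and!&strsplit!differ"  # hypothetical string

# "!&" occurs only once in x, so there is exactly one split point
pieces <- strsplit(x, split = "!&")[[1]]
pieces          # "the&!syntax&!of&!paste&!and" "strsplit!differ"
length(pieces)  # 2
```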
Now, in this case, if you want to access any particular component, how do you do it? You have already seen this with different types of data objects. Observe that the output first shows [[1]] inside double square brackets, and after that come the values: the first value is at position 1 and the second value is at position 2.
So, suppose you have stored this outcome in an object y and you want to call the first element. You have to write down the complete address: first the 1 in double square brackets, and then, in single square brackets, the position of the element you want to call, right.
This is exactly what I have done here. You write y, then the 1 in double square brackets, and then 1 in single square brackets, that is, y[[1]][1]. This calls the first part of your split. Similarly, if you want to call the second part of the split, you simply write the 1 in double square brackets and then 2, that is, y[[1]][2], and you can see that this is the part with the online help.
And if you verify it, you write y, then [[1]] and then [2], and this 2nd value appears.
And this is the screenshot of the same operation. Now, what happens if you write y[[1]][3], that is, 1 in the double square brackets and then 3 in single square brackets? You get the outcome NA.
Why is it NA? Because this element is not available: there are only two partitions, partition number 1 and partition number 2, and there is no 3rd partition. So it gives you NA, right, and this is the same outcome that you can see in this screenshot. So let us first see these operations on the R console, and then I will give you some more aspects.
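A sketch of the indexing just described, again with the hypothetical string. The double square brackets pick the first (and only) component of the list, and the single square brackets pick a piece inside it:

```r
x <- "the&!syntax&!of&!paste&!and!&strsplit!differ"  # hypothetical string
y <- strsplit(x, split = "!&")   # a list with one component

y[[1]][1]  # first piece:  "the&!syntax&!of&!paste&!and"
y[[1]][2]  # second piece: "strsplit!differ"
y[[1]][3]  # only two pieces exist, so indexing past the end gives NA
```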
So first let me write down x, and you can see what x looks like. Now, if you write the strsplit command with split = "!", then wherever there is an exclamation mark, the string is partitioned, right.
And now, if you write the command with split = "&!", you can see the change in the outcome: earlier there were seven partitions, and now you have only five partitions.
Similarly, let me clear the screen. This is x, and if you now give the exclamation mark followed by &, that is, split = "!&", then you know that there are going to be only two pieces, right. Now, suppose you save this outcome in a variable y and you want to call the first element.
So first you write the 1 in double square brackets and then 1 in single square brackets, right. This gives you the first value, and similarly, if you write 2, it gives you the second value, right. And if you ask for the 3rd piece, there is no such piece, so it gives you NA, right. So this is how you can access any particular partition after splitting, right.
So now we come back to our slides, and I will give you one more example. This example is very interesting, because I will convert some dates into a matrix. This looks very strange: how can you convert dates into a matrix? What I have are these four dates: 24 July 2020, 25 August 2021, 26 September 2022 and 27 October 2023.
Suppose I want to write a matrix like this one: in the first column of the matrix I need the years, in the second column the months, 7, 8, 9 and 10, and in the third column the days, that is, 24 for the 1st value (24 July), 25 for the 2nd value (25 August), 26 for the 3rd value (26 September) and 27 for the 4th value (27 October), right.
So first I split this data vector of four dates wherever there is a hyphen. You can see it will be split here, then here, then here, and so on, right. For that I give the statement strsplit, then, inside the parentheses, the dates and then the hyphen sign within double quotes. Now I have the splitting, but you can see that the problem now is that the pieces are all inside double quotes.
These are essentially characters. In the earlier examples you always took only one string, so there was one partitioning and you only observed [[1]] inside the double brackets. But now you have string number 1, string number 2, string number 3 and string number 4. That is why you have these four values in double square brackets, [[1]], [[2]], [[3]] and [[4]], and after each of them the partitioned date, right.
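A sketch of this first step. It assumes the four dates are stored as year-month-day strings separated by hyphens; that form is inferred from the column order of the target matrix, not copied from the slide:

```r
# Assumed form of the four dates: "year-month-day"
dates <- c("2020-07-24", "2021-08-25", "2022-09-26", "2023-10-27")

datesplit <- strsplit(dates, split = "-")
length(datesplit)  # 4: one list component per date string, [[1]] to [[4]]
datesplit[[2]]     # "2021" "08" "25": still character pieces
mode(datesplit)    # "list"
```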
So now what can you do? This outcome is in the form of a list, so first you would like to use the command unlist, right. As soon as you apply unlist, the data stored in datesplit is freed from the list structure. Then you use the command matrix with nrow = 4, ncol = 3 and byrow = TRUE.
You can see the data are arranged like this, right: the 1st column contains the years, the 2nd column the months and the 3rd column the days. But you have to observe one thing: inside this matrix all the values are still characters only; they are inside double quotes, right.
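A sketch of the unlist() and matrix() step, with the same assumed date strings. Because unlist() returns the pieces date by date, byrow = TRUE puts the three pieces of each date into one row:

```r
dates <- c("2020-07-24", "2021-08-25", "2022-09-26", "2023-10-27")  # assumed form
datesplit <- strsplit(dates, split = "-")

# unlist() flattens the list into one character vector of 12 pieces
datemat <- matrix(unlist(datesplit), nrow = 4, ncol = 3, byrow = TRUE)
datemat[1, ]   # "2020" "07" "24"
mode(datemat)  # "character": the values are still strings
```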
So now suppose you want to get rid of the characters and convert them into numbers. If you want to do any mathematical manipulation on the matrix, all its elements have to be numbers. So you can use as.numeric on the data obtained through unlist of the datesplit vector, right.
Again you use nrow = 4, ncol = 3 and byrow = TRUE, and you get the same arrangement, as you can see. Compare it with the earlier outcome: when you used only unlist, the data were in character format, but when you also apply as.numeric, the result contains numeric values. This is the reason why I took this example, right.
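A sketch of the numeric version. The column names are added here only for readability and are not part of the lecture's command:

```r
dates <- c("2020-07-24", "2021-08-25", "2022-09-26", "2023-10-27")  # assumed form
datesplit <- strsplit(dates, split = "-")

# as.numeric() turns "2020", "07", "24", ... into numbers ("07" becomes 7)
datemat <- matrix(as.numeric(unlist(datesplit)), nrow = 4, ncol = 3, byrow = TRUE)
colnames(datemat) <- c("year", "month", "day")  # illustrative labels only
datemat[, "month"]  # 7 8 9 10
mode(datemat)       # "numeric"

# Arithmetic now works, e.g. years elapsed since the first date
datemat[, "year"] - datemat[1, "year"]  # 0 1 2 3
```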
And this is the screenshot, right. In this screenshot you can see the dates. When I split them, they get split, but everything is in the form of a list, and you can see these addresses, right. Now you want to get rid of these addresses.
So you first use the command unlist, then the command as.numeric over the data obtained, and you get the matrix, right. Now let me first show you this outcome on the R console, and then I will show you what is going to happen, right.
So let me just copy these dates. These are the dates; you can see there are 4 dates, right. Now use the command strsplit over this data and store the result in the variable datesplt, and you can see the result, right.
Now, if you want to see the mode of datesplt, you can see that it is "list", and that is why you first want to make this outcome independent of the list structure. So I use the command unlist over this data vector, and I will show you what happens.
Now look at datemat. You have used the command unlist on the earlier outcome, and you can see the matrix, but it is in the form of characters: if you find the mode of this outcome, you can see very clearly that it is "character". So you want to get rid of this, and you use the command as.numeric.
Once you execute it, you can see that the date matrix comes out like this, and if you find the mode of this date matrix, you can see that now it is "numeric", right. So if you want to do any mathematical operations over this matrix, you can do them.
So the reason why I took this example is this: if I had asked you at the beginning whether you can convert this type of dates into a matrix, it would possibly have been difficult for you to think of a way. But look at the commands we used: if you use them intelligently and in a logical way, you can achieve it. This is the best part of programming: thinking of the logic by which you can execute and get what you want, right.
(Refer Slide Time: 17:07)
So, just as I showed you in the beginning, with strsplit you can split at strings as well as into individual characters. Up to now, in all the examples you have seen, you were splitting at some strings, right. But now I will show you that if you want to split into individual characters, you simply write strsplit and, in the split option, you do not write anything within the double quotes.
Just two double quotes, "", that is all, right. For example, if I write my name, x = "shalabh", which is a string, and split it with this command, you can see that it is split after every letter, and this is the screenshot. So this is not a very difficult command; I just wanted to show you how you can split a string into smaller strings as well as into individual alphabets or characters, right.
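A sketch of splitting into single characters with an empty split pattern, using the string from the lecture:

```r
x <- "shalabh"
chars <- strsplit(x, split = "")[[1]]
chars          # "s" "h" "a" "l" "a" "b" "h"
length(chars)  # 7
```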
(Refer Slide Time: 17:59)
So you can see it works like this, ok. Now we come to the end of this lecture. This was a short lecture, and I have shown you only one command: how you can split a string. But the examples I have shown indicate why I have taken this command. In the beginning it looks as if you are simply going to learn how to split a string, and that is correct.
But how it can be used in various types of applications was not clear in the beginning. Now, at the end of the lecture, you can see how you have split the string and then converted the pieces into numerical values. Similarly, you can think about such applications and then try to execute them.
Now it depends on your capability how you think; as you keep doing it, you will become a very good programmer. So try to practice it and become a wonderful programmer, and I will see you in the next lecture. Till then, good bye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Hello, friends. Welcome to the course Foundations of R Software. You can recall that in the last couple of lectures we have been learning different types of manipulations with strings. We learnt how you can join different strings using the paste function and how you can split them using the strsplit function.
Now, in this lecture, I will continue on the same lines, and we will learn some other small operations related to string manipulation and alphabets. I will take a couple of examples and, through them, explain how these small functions work when we deal with different types of characters and strings. So let us begin the lecture and understand what type of commands we are going to learn in this lecture, ok.
So the first function which I want to explain is nchar. Many times you are writing a statement or a string and you want to count how many characters there are, right. For example, recall that when you are working with a word processor like MS Word and writing something,
then at the bottom left you always see how many characters you have used, how many words you have used, and similar information. So how can you obtain that type of information in the R software? For example, if you are given a statement and want to count how many characters it has, how do you do it, right? For that we use the function nchar, which is simply short for "number of characters", right.
What it does is take a character vector as its argument and return a vector whose elements contain the sizes of the corresponding elements of x, right. Similarly, we have one more function, nzchar. The difference between nchar and nzchar is that nchar gives an output which is numeric, whereas nzchar gives its output in terms of TRUE and FALSE.
What does nzchar do? Whenever you write something, the string can be empty or it can have some characters. The nzchar command helps us find out whether the elements of a character vector are non-empty strings or not, right. It indicates this with the logical output: TRUE for a non-empty string and FALSE for an empty one.
Well, let us take some examples, and I will show you the difference between these two functions. But before that, how do you use nchar? You write nchar, then within the parentheses you give the string in which you want to count the characters, and then you have a couple of options.
For example, there is type. This type is a character string, partially matched to one of the options "bytes", "chars" or "width", right. So you have different types of options. I am not going to take all of them here, because this is a very simple command which you can understand, but I leave it to you to try the different options and see what they give, right, ok.
After this we have the option allowNA, where allow is in lower case and NA is in capital letters. This is also a logical variable, TRUE or FALSE, and as the name suggests, it controls whether NA is allowed in the output: with TRUE, NA is returned instead of an error for invalid multibyte strings. Similarly, you have one more option, keepNA, where keep is in lowercase and NA is in uppercase. It is also a logical variable, and it answers the question: should NA be returned when x is NA, right?
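A small sketch of the type and keepNA options. For plain ASCII letters, characters and bytes coincide, so both counts agree here; keepNA only matters when an element is a missing string:

```r
s <- c("apple", "banana", "cake")
nchar(s)                  # 5 6 4 (type = "chars" is the default)
nchar(s, type = "bytes")  # 5 6 4 as well: each ASCII letter takes one byte

# keepNA decides what nchar() reports for a missing string
nchar(NA_character_)                  # NA (default behaviour for type = "chars")
nchar(NA_character_, keepNA = FALSE)  # 2, the width of the printed "NA"
```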
So I will show you these options and other types of outcomes through examples. Let us consider the string "R course 24.07.2022". This is my string; you can see the double quotes, and I want to count how many characters are used in this string x.
So I write nchar and, inside the parentheses, simply x, and you can see the value it gives. Please try to count how many characters there are. Here is 1; now suppose I am confused whether I should include the blank space or not. At this moment I say, ok, the blank space should not be counted, so I am skipping the blanks.
Later on we will see whether R counts the blank space or not, because, as I always say, in the end you have to understand how R works. So my first character is R, then c is the second, then third, fourth, fifth, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17. And now you can see there are two blank spaces, which also have to be counted. So this gives you the number 19.
So the rule is that nchar also counts the blank spaces. Obviously, they are inside the double quotes, so they are part of the whole string, and a blank space is automatically counted as one of the characters. Similarly, I take one more example and write "Number of participants:" followed by 25. Let us first count how many characters there are; now you know how to count them. So here is 1 2 3 4 5 6, then the blank space is 7, then 8 up to 23, then the blank space is 24, and then 25 and 26. And you can see that this is the outcome of nchar with y inside the parentheses, right. So it simply counts the number of characters in the string, right.
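A sketch of both counts. The strings are reconstructed from the transcript and the spoken character counts (the exact wording of y is inferred), so treat them as assumptions:

```r
x <- "R course 24.07.2022"
y <- "Number of participants: 25"  # wording inferred from the count of 26

nchar(x)  # 19: the two blank spaces are counted as characters
nchar(y)  # 26
```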
(Refer Slide Time: 06:27)
Similarly, let us see what the command nzchar does on the same variables, right. Its output is going to be either logical TRUE or logical FALSE, right. Is anything in this string empty? No: you have already counted that it has 19 characters, and similarly y has 26 characters.
So how can they be empty? When you use the same strings x and y as earlier, the outcome of nzchar(x) and nzchar(y) is TRUE, which means the strings are non-empty. After that I will take an example with an empty string and show you what it gives, right.
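A sketch contrasting the two return types on the same (reconstructed) strings: nchar() gives integer counts, while nzchar() only answers whether each string is non-empty:

```r
x <- "R course 24.07.2022"
y <- "Number of participants: 25"  # reconstructed wording, an assumption

nchar(c(x, y))   # 19 26: an integer vector of counts
nzchar(c(x, y))  # TRUE TRUE: a logical vector, both strings are non-empty
```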
So this is the screenshot of the same outcome. First, let us see how the outcome looks in the R console.
So let me take x like this and y like this. You can see x is this and y is this. Then I use nchar(x), which is 19, and nzchar(x), which is TRUE. Similarly, nchar(y), the number of characters in y, gives you 26, and nzchar(y) gives you TRUE, right.
So you can see that it is not very difficult to use these commands, and many times, in data manipulation and in different types of operations, they actually help us.
Now I will show you how the nzchar command works if there is an empty argument, right. Let me first take an example with three strings, apple, banana and cake, our popular example. These 3 strings are stored in the data vector x, and if you use nzchar(x), it gives you TRUE, TRUE, TRUE.
This is obvious: how many characters does apple have? 1 2 3 4 5. How many does banana have? 1 2 3 4 5 6, and cake has 1 2 3 4. So they are definitely non-empty, and it gives TRUE, TRUE and TRUE for apple, banana and cake respectively, right; that means each of the strings is non-empty.
Now see how I modify the same example. I keep apple and cake, but I drop banana and simply write only two double quotes, with no blank space between them. That is what you have to keep in mind: if you put a blank space between them, nzchar will tell you that the string is non-empty.
Now the outcome is: first apple, then a blank, meaning nothing, and then cake. Since you have given the double quotes, this is still a string, only an empty one with no characters at all. Then, if you use nzchar, it gives you TRUE, FALSE and TRUE.
The first TRUE corresponds to apple, which has 5 characters; FALSE is there because the second string has no characters, that is, 0 characters; and cake has four characters, so it corresponds to TRUE. So this is the utility of nzchar, right. Let me first show you this operation on the R console so that you understand it and feel confident, ok.
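A sketch of the empty-string case, plus the single-blank case mentioned above, to show that only "" counts as empty:

```r
x <- c("apple", "banana", "cake")
y <- c("apple", "", "cake")   # banana replaced by an empty string
z <- c("apple", " ", "cake")  # a single blank space is NOT empty

nzchar(x)  # TRUE TRUE TRUE
nzchar(y)  # TRUE FALSE TRUE
nchar(y)   # 5 0 4
nzchar(z)  # TRUE TRUE TRUE
```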
(Refer Slide Time: 10:02)
So look at it like this. nzchar(x) gives TRUE, TRUE, TRUE, and if you simply use nchar, see what it gives: 5, 6, 4, right. Now I modify this x: I delete all the characters of banana and keep only the two double quotes, right, and I store this as y.
Now, if you find nzchar(y), you can see it gives TRUE, then FALSE and then TRUE. And if you use the command nchar(y), it gives 5 0 4. So this string has 0 characters, and that is why nzchar replies with the logical FALSE; but the more important part is that you understand how to interpret the result, right.
Now I will give you some more examples of how you can count the characters, including in numbers. I have shown you that if I take the strings apple, banana and cake, nchar gives the outcome 5 6 4: 1 2 3 4 5, that is, 5 characters in apple; 1 2 3 4 5 6 characters in banana; c a k e, 4 characters in cake. So the answer comes out as 5 6 4.
Now, similarly, if you take three numbers 2, 4 and 6, what do you expect the command nchar to do on this y? It gives the answer 1 1 1; that means 1 character, 1 character and 1 character. It is not reading the value of the number; it is counting the number of characters.
Similarly, modify this example and take 3 numbers, 11, 222 and 3333, so that 11 has 2 characters, 222 has 3 characters and 3333 has 4 characters. If you apply nchar to this z, it gives you 2 3 4, right: 2 in 11, 3 in 222 and 4 in 3333.
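A sketch of nchar() applied to numbers: R first converts each number to its text form and then counts the characters of that text:

```r
y <- c(2, 4, 6)
z <- c(11, 222, 3333)

nchar(y)  # 1 1 1
nchar(z)  # 2 3 4: digits are counted, not the size of the value
```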
Now, what happens if you take numbers with a decimal point? Remember that the decimal point is also counted as a character. So take this data vector with 3 values. The first value is 1.1: 1, 2 and 3, so the number of characters in 1.1 is 3. The next value is 2.22: 1 2 3 4, so there are 4 characters in 2.22, and similarly 3.333 has 1 2 3 4 5, that is, 5 characters.
So the outcome of nchar over z1, which is this vector, is 3 4 5, and I will show you the same outcome in the R software, right. Before moving forward, let me show you these things in R.
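A sketch with decimal numbers. Printing as.character() shows the text that nchar() is actually counting; the second vector repeats the console example that follows:

```r
z1 <- c(1.1, 2.22, 3.333)
as.character(z1)  # "1.1" "2.22" "3.333"
nchar(z1)         # 3 4 5: the decimal point is one of the characters

y <- c(3.6, 55.76, 77777.89)
nchar(y)          # 3 5 8
```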
(Refer Slide Time: 13:10)
Although I have already shown you, I will still show you nchar over x, which is going to be 5 6 4. Now, if you take a data vector of the numbers 3, 5 and 7 as y, the number of characters in y will be 1 1 1. Now modify it so that the first value is 3, the second value becomes the two-digit number 55, and the third value becomes the five-digit number 77777.
Then nchar(y) gives 1 2 and 5: one character in 3, two characters in 55 and five characters in 77777, right. Now suppose you convert these into numbers with a decimal point, like 3.6, 55.76 and 77777.89.
Then nchar(y) gives 3 5 and 8: there are 1 2 3, then 1 2 3 4 5, and then 1 2 3 4 5 6 7 8 characters. So the outcome is 3 5 8, right. These are very simple operations by which you can count the number of characters in any string or any number.
(Refer Slide Time: 14:21)
Now I will give you one more operation, which is related to the alphabets, and it is a very simple one. Suppose you have some characters or strings based on alphabets, and you want to change lower case alphabets into upper case, or upper case alphabets into lower case.
If you have used software like MS Word, you may have seen that at the top left there is an option shown as something like a small a and a capital A, and it gives you different options: change to lower case, change to upper case, change to sentence case, etc.
A similar type of operation is often needed when we deal with different types of strings, particularly when you are preparing a format for generating a report. For example, take a very simple case: suppose 5 people are preparing a report and you want the headings to be in capital letters.
Now suppose different people have used different formats. Somebody has used sentence case, with the first letter capital and all the remaining ones in lower case; somebody has used only lower case; somebody has used upper case, etc. If you want to generate a report which is uniform for all, you can simply use this type of command.
Whatever characters are there will be converted either into lower case or into upper case. So how do we get it done? I would like to show you this with the commands tolower, t o l o w e r, and toupper, t o u double p e r, both written entirely in lower case alphabets. As the name suggests, tolower means: whatever is there, convert it into lower case alphabets, right.
So what does tolower do? It converts everything into lower case alphabets: if you give any string that has some characters in upper case, all those characters are converted into lower case. Similarly, with the command toupper, if you give a string x which has some lower case alphabets, they are converted into upper case alphabets, right.
And the first question that comes to mind is: what if there are non-alphabetic characters like symbols or numbers? They remain intact; they are not changed.
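A quick sketch confirming that only letters change case; the test string is a made-up example:

```r
s <- "abc 123 #!"          # illustrative string with digits and symbols
toupper(s)                 # "ABC 123 #!": digits, spaces and symbols unchanged
tolower("ABC 123 #!")      # "abc 123 #!"
```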
So let me take some examples and show you the application of these two commands. In the first example I take the string "R course will start from 24.07.2022". You can see that the R is a capital letter, whereas everything else is in lower case alphabets, and after that there are numbers and some special characters.
Now, when I use the command toupper, written all in lower case alphabets, with x inside the parentheses, you can see what happens. R is already a capital letter, so it remains upper case, but all the other letters are converted into upper case, as you can see, and the numbers remain intact as such.
Similarly, I take one more example, a string z in which I write INDIAN INSTITUTE OF TECHNOLOGY, all in upper case alphabets. Now I use the command tolower, t o l o w e r, and inside the parentheses I write the string z, and you can see that everything is converted into lower case alphabets, right. So this is how it actually works, right.
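A sketch of the two slide examples, with the strings as read out in the lecture:

```r
x <- "R course will start from 24.07.2022"
toupper(x)  # "R COURSE WILL START FROM 24.07.2022"

z <- "INDIAN INSTITUTE OF TECHNOLOGY"
tolower(z)  # "indian institute of technology"
```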
So this is the screenshot of the same outcome, and let me first show you these operations on the R console so that you can be more confident.
(Refer Slide Time: 18:12)
So, I try to take here say here x here x is like this and if I try to see here say toupper and
here x you can see here everything is converted into means upper. And in the same if I
try to take it here tolower, then what will happen? You can see here that only R is in the
upper case alphabet. So, this will also be converted into a lower and rest everything was
already in the lower case. So, it will remain as such.
Similarly, take the next example, z, which is written in upper-case letters. If I type
tolower(z), you can see that everything is converted into lower case. And if I now apply
toupper to the result of tolower(z), the string is converted back into upper case.
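Nesting the two calls, as on the console, can be sketched as follows. The second string, x, is a hypothetical mixed-case example added here to show that the round trip gives back the original only when the original was entirely in one case.

```r
z <- "INDIAN INSTITUTE OF TECHNOLOGY"
toupper(tolower(z))   # lower case first, then back to upper case
# [1] "INDIAN INSTITUTE OF TECHNOLOGY"

# Hypothetical mixed-case string: the original case is not recovered
x <- "R course"
toupper(tolower(x))
# [1] "R COURSE"
identical(toupper(tolower(x)), x)
# [1] FALSE
```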
Since we first convert to lower case and then to upper case, this result is expected. We
now come to the end of this lecture. In this lecture we have considered some small
operations which are very useful when you generate different types of reports and need
to make such small changes; they can be done automatically using these simple functions.
Once again, think about the areas in which you can use such commands. I am sure that you
are working somewhere, either as a student or as a professional, and you might need these
types of operations in your work. So, experiment with these operations and practice them
to see what really happens.
And, I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 40
Strings - Display and Formatting
Substitution and Replacement of Strings
Hello friends. Welcome to the course Foundations of R Software. You can recall that in
the last couple of lectures, we were talking about different types of operations on
strings, such as partitioning, pasting, etc. In this lecture we are going to continue with
the same topic, and today we are going to learn about a new aspect of string operations,
that is, find and replace.
You know that whenever you work with any data file or any document, there is usually a
command like Find, or Ctrl+F. What do you do with it? Suppose there is some string,
character, or number, and you want to find where it is inside the document.
You go to the Find option, type the number, string, or whatever you want to search for,
and press OK. It then searches the entire document and shows you where this word,
letter, or string is.
Similarly, you have one more operation, Replace. What do you do there? Suppose there is a
number, character, or string in the document and you want to replace it by something
else. Then there is an option for Replace with two boxes: in one box you write what you
want to replace, and in the other box what you want to replace it with.
So, the first box relates to finding and the second box relates to replacing. That means
you first ask the software to find where the word is, and then ask it to replace this
word by the new word.
The R software works in the same way: we have two types of operations, find and replace.
These operations are very useful when you work with any document, any data file, or
anywhere else.
So, today we are going to learn about these two types of commands, and I will take a
couple of examples through which I will explain how you can find and replace a
particular string within a bigger string, or a character within a string.
So, let us begin our lecture and see how we can do it.
When you want to find a word and replace it in the R software, there are two functions:
one is sub and the other is gsub. Here sub stands for substitution; its three letters are
the first three letters of that word, which indicates the meaning of the function.
So, sub indicates substitution, that is, replacement. The other function, gsub, is also a
function for substitution, but sub and gsub differ in the way they work and give the
outcome. sub is used when you want to replace only the first instance where a match
happens, whereas gsub replaces all the matching strings or characters in the entire text.
You can compare this with what happens in any other document editor.
For example, when you want to replace something, you write the old word and then the
new word by which you want to replace it. After that, the editor gives you different
options, say Replace and Replace All.
What happens in the case of Replace? Only the first match is replaced. But when you
choose Replace All, every match in the entire document is replaced. The roles of sub and
gsub are similar, so you can think of gsub as a global replace, or global substitution.
Now let us try to understand these two commands. As I told you, sub is used to replace
the first instance of a substring and gsub to replace all instances. To use sub, what do
you have to do?
First, you write old, the word that you want to replace. Then, separated by a comma, you
give new, the new word by which you want to replace the old word.
Finally, you give string, the string in which you want to make the replacement, so the
command is sub(old, new, string). The sub function finds the first instance of the old
substring within the string and replaces it with the new substring. The function gsub,
that is, global substitution, does the same thing.
But gsub works like Replace All: it replaces all instances of the substring, not only the
first one. It is used in the same way: you write gsub and, within parentheses, old, new,
and string, that is, gsub(old, new, string). Here old is the old word, new is the new
string, and string is the string in which you want to search and replace.
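A minimal R sketch of the two calls on a hypothetical string s. In R the formal arguments are named pattern, replacement and x, and the first argument is read as a regular expression unless fixed = TRUE is given. On a character vector both functions work element by element, so sub replaces the first match in each element, not only in the first element.

```r
# Hypothetical string with two hyphens
s <- "a-b-c"
sub("-", "+", s)    # only the first "-" is replaced
# [1] "a+b-c"
gsub("-", "+", s)   # every "-" is replaced
# [1] "a+b+c"

# On a vector, sub() acts on each element separately
v <- c("a-b-c", "x-y")
sub("-", "+", v)
# [1] "a+b-c" "x+y"
```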
(Refer Slide Time: 06:11)
Let me take some examples to show you how it works. Take the string y, "Number of
participants: 25", and suppose I want to replace 25 by 30. I can write sub("25", "30", y).
Although 25 is a number, it is written inside double quotes, so it is a string. So, you
are replacing the string "25" by the new string "30", and the replacement is made in y.
If you execute this, the result is "Number of participants: 30": the text is the same,
but 25 is replaced by 30. This is also shown in the screenshot.
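The same call in R. The exact punctuation of y is assumed from the spoken "colon 25".

```r
y <- "Number of participants: 25"
sub("25", "30", y)   # "25" and "30" are strings here, not numbers
# [1] "Number of participants: 30"
```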
I will show you this on the console, but first let me take one more example. Suppose I
take a string with two sentences that both contain Mr. Singh: "Mr. Singh is the smart
one. Mr. Singh is funny too." That is our string y.
Now, I want to replace both occurrences of Mr. Singh by a new word, Professor Jha. First,
I use the command sub and see what happens.
With sub, Mr. Singh is replaced by Professor Jha only at the first instance. The search
starts at the beginning of the string and moves forward, and the first place where it
finds a match with Mr. Singh is where the replacement is made; after that it stops.
When you use the command gsub, the search continues further: it also finds the
occurrence of Mr. Singh at place number 2 and replaces it with Professor Jha as well.
If you look at the outcome of the sub command, you can see that Professor Jha appears at
place number 1, but at place number 2 Mr. Singh is still there; it is not changed. This is
the role of sub, and this is the screenshot of the same operation.
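The sub call described above, written out as an R sketch; the sentence punctuation in y is an assumption, since only the spoken version is available.

```r
y <- "Mr. Singh is the smart one. Mr. Singh is funny too."
sub("Mr. Singh", "Professor Jha", y)   # only the first occurrence changes
# [1] "Professor Jha is the smart one. Mr. Singh is funny too."
```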
(Refer Slide Time: 08:20)
Now, if in this case you use gsub, that is, global substitution, what will happen? Here y
contains Mr. Singh at two places, and you use gsub to replace Mr. Singh by Professor Jha.
So, Professor Jha is substituted at every place in the string y where Mr. Singh is found.
You can see the outcome here. If you recall, with sub the replacement was made only at one
place, and at the second place Mr. Singh was still there; there was no replacement.
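The gsub version as an R sketch. One detail worth knowing: the pattern is a regular expression, so the "." in "Mr. Singh" matches any character, not only a literal dot. Adding fixed = TRUE makes the match literal; the string z is a hypothetical example, not from the lecture, where the two behave differently.

```r
y <- "Mr. Singh is the smart one. Mr. Singh is funny too."
gsub("Mr. Singh", "Professor Jha", y)   # every occurrence changes
# [1] "Professor Jha is the smart one. Professor Jha is funny too."

# Hypothetical string where the "." in the pattern matters
z <- "MrX Singh"
gsub("Mr. Singh", "Professor Jha", z)                # regex: "." matches "X"
# [1] "Professor Jha"
gsub("Mr. Singh", "Professor Jha", z, fixed = TRUE)  # literal text: no match
# [1] "MrX Singh"
```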
That is the difference between the roles of sub and gsub, and this is the screenshot. You
can see Mr. Singh here and Mr. Singh here. With gsub, when you replace Mr. Singh by
Professor Jha, Professor Jha appears at both places. But with sub, Professor Jha appears
only at the first instance; after that, Mr. Singh is not replaced by Professor Jha.
This is how it works. Let me show you these operations on the R console also, so that you
get more confidence.
If I take y and replace 25 by 30, you can see that 25 is replaced by 30 and now you get
"Number of participants: 30". That is pretty simple.
Now take the other example. This is y: Mr. Singh occurs here and Mr. Singh occurs here
also. First use the command sub.
You can see that Mr. Singh at the first instance is replaced by Professor Jha, but at the
second instance Mr. Singh is still there. Now, if you replace sub by gsub, you can see
Professor Jha at both places: at the second place also, Mr. Singh is now replaced by
Professor Jha.
Part of the output is outside the visible range of the screen, but that does not make any
difference. I have shown you the screenshot also.
After this screenshot, we come to one more operation, which is about finding. The first
commands were about replacement and global replacement. Similarly, there are functions
for finding a match.
For example, in a document you may simply want to find at what places a number, a string,
or a character occurs. There is an option for this, something like Find. You write the
characters or the string inside its box and press Enter, and it searches the entire
document and shows you where the word occurs, that is, where the match is.
Here also you have two options. In the first option, the search starts and stops at the
first match, so it gives you only the place where the first match occurs.
In the second option, the entire document is searched and you are informed about all the
places in the document where a match occurs; wherever the match occurs, you are given the
location.
Now, the location can be given in two formats: in the form of the string itself or in the
form of an index. For this, we have two functions, grep and grepl. Sometimes people get
confused by the name grep, because it contains "rep", and they think that rep indicates
replacement. Please do not get confused: grep is not for replacement, it is only for
finding. For replacement you have the sub and gsub commands.
How do you use these commands? Let me show you. They work almost exactly in the same way
as finding something in a document in the usual way. The name grep comes from "global
regular expression print": it searches globally for a regular expression and shows you
the matches on your computer screen.
The way to use it is this: you write grep and, within parentheses, the pattern you want to
search for, then x, the string or vector of strings in which you want to search. After
that there is an important option, ignore.case, written all in lower-case letters:
grep(pattern, x, ignore.case = FALSE).
After reading the name ignore.case, do you think you can understand what it means? It
asks whether you want to distinguish between lower-case and upper-case letters or simply
ignore the difference.
For example, somewhere a capital R may be written and somewhere a small r. Do you want to
differentiate between the capital and the small letter, that is, between upper-case and
lower-case letters, or do you want to consider both simply as R? This is the meaning of
ignore.case.
If ignore.case = FALSE, which is the default, the matching is case sensitive: the
upper-case and lower-case forms of a letter are treated as different. If ignore.case =
TRUE, the case is ignored, so R and r both match. grep then returns an integer vector of
the indices of the elements of x that yielded a match.
What does this mean? When you are finding something in a vector of strings, you have two
options: you can get the matching strings themselves, or you can get the indices of the
positions where those strings are located.
So, as I said, the option ignore.case relates to the case-sensitive behavior of the
characters. There is one more option, value. If value = FALSE, which is the default, the
result is a vector of integers indicating the indices of the matches found by grep.
If value = TRUE, the result contains the whole matching strings themselves. So, with
value = FALSE you get the outcome as indices, that is, numbers, and with value = TRUE you
get the strings in which the match occurs.
Let us see the application of this function through an example; after that I will show
you how to work with grepl.
Here I am going to use the option value = TRUE. I use grep and, within parentheses, I give
the pattern, then the vector of strings in which I want to search for this pattern, and
then value = TRUE.
With value = TRUE, the outcome is in terms of strings. Take this data vector of strings:
the first is "R course", the second is "exercises", and the third is "includes examples of
R language". So, this is place number 1, this is place number 2, and this is place number
3; there are 3 strings in this data vector, and I call it str.
Now, I want to find where "ex" occurs in str. I give ex inside double quotes, then the
vector str in which I want to find it, and since I want the result in terms of strings, I
give value = TRUE.
First, see where ex occurs. You can see that the first ex occurs here and the next ex
occurs here, that is, ex occurs in the second and third strings. There is no ex in the
string at position number 1, so it does not appear in the output.
So, grep gives you all the strings in which ex occurs.
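The call just described, as an R sketch. The variable name str follows the lecture, and the third string is written as "includes examples of R language" (the transcript reads it out slightly differently in different places).

```r
str <- c("R course", "exercises", "includes examples of R language")
# value = TRUE returns the matching strings themselves:
# here "exercises" and "includes examples of R language"
grep("ex", str, value = TRUE)
```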
Now I do one thing: I use value = FALSE. In the earlier case you used value = TRUE. I
consider the same vector str, and let us see.
The answer now is 2, 3. What is this 2, 3? As I said, this is location number 1, this is
location number 2, and this is location number 3: there are 3 strings in the data vector
str, and their locations are indicated by this outcome.
So, 2 indicates that ex occurs in the string placed at the second index, and 3 indicates
that ex also occurs in the string at the third position in the data vector str. This is
the meaning of using value = TRUE or FALSE, and this is the screenshot of the same outcome.
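The value = FALSE case as an R sketch; since FALSE is the default for value, leaving the option out gives the same indices.

```r
str <- c("R course", "exercises", "includes examples of R language")
grep("ex", str, value = FALSE)   # indices of the matching elements
# [1] 2 3
grep("ex", str)                  # value = FALSE is the default
# [1] 2 3
```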
Now I will show you one more example, where I use the option ignore.case, that is,
whether you ignore the distinction between lower-case and upper-case letters.
When you use grep, you write the pattern that you want to search for and x, the vector of
strings in which to search. Then you use the option ignore.case = TRUE. That means: please
ignore the case; I am not bothered whether the match is with a lower-case or an
upper-case letter.
Consider an example with 4 strings: "R course", "exercises", "includes example of r
languages", and "in R software". You can see that in the third string I use a small r, a
lower-case letter, and in the fourth string a capital R; the first string also has a
capital R.
Now, I want to find the capital "R" in this vector str with ignore.case = FALSE. That
means you are not allowed to ignore the case. And value = TRUE means that the output is
given as strings. I want to show you both settings.
The search looks for capital R and finds it in two places, here and here; this one is a
lower-case letter, so it is not matched. So, when you use grep to find capital R in str,
it gives you these two outcomes: "R course" and "in R software". "R course" comes from
here and "in R software" comes from here.
On the same screen I show you one more option: I change the setting to ignore.case = TRUE.
If you run the same command now, it asks to find capital R in the vector str while
ignoring the case, because ignore.case is TRUE. And value = TRUE means that you want the
answer as strings; otherwise only the indices of the positions of the strings are given.
Now you can see the matches: "R course", which contains R; then a small r in "exercises";
then a small r in string number 3; and then R in string number 4.
So, all 4 strings appear in the output when ignore.case = TRUE, whereas with ignore.case =
FALSE there are only two outcomes. This is how you should understand the use of
ignore.case.
In the same command, if you use value = FALSE, you get the outcome as location indices.
When you ignore the case, R or r is present in all the strings, so all the locations 1,
2, 3, 4 are given.
Similarly, in the first case, "R course" and "in R software" occur only at the first and
fourth locations. To summarize all these outcomes from this screenshot: you are finding
R, and with ignore.case = FALSE you get "R course" and "in R software", the two strings
containing a capital R.
If you change value = TRUE to value = FALSE, you get the same thing as the indices 1 and
4. In the second case, with ignore.case = TRUE, value = TRUE gives all four strings, and
value = FALSE gives the same result in terms of numbers, 1, 2, 3, 4.
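The four combinations of ignore.case and value from this example, collected in one R sketch. Note that "exercises" contains a lower-case r, which is why it matches once the case is ignored.

```r
str <- c("R course", "exercises", "includes example of r languages", "in R software")

grep("R", str, ignore.case = FALSE, value = TRUE)   # "R course" "in R software"
grep("R", str, ignore.case = FALSE)                 # 1 4
grep("R", str, ignore.case = TRUE,  value = TRUE)   # all four strings
grep("R", str, ignore.case = TRUE)                  # 1 2 3 4
```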
So, this is not a very difficult thing. Let me first show you these outcomes in the R
software, so that you become more confident, and then I will move on to something else.
I create this vector of strings.
This is your vector of strings. Now, if you search for ex, you can see that ex is here and
ex is here, that is, in the second and third strings, so these two are returned.
And if you give value = FALSE, you can see that it gives you the location numbers 2 and 3.
That is how it works.
Similarly, take this example, where I use the grep command with this vector of strings.
With grep, you search for capital R and find where it occurs.
It gives you these two values: capital R occurs here and capital R occurs here. Since
ignore.case = FALSE, the lower-case r in the other strings is not matched.
Now, in the same example, if you set ignore.case = TRUE, you can see that capital R occurs
in the first string, small r occurs in the second string, r occurs in the third string,
and R occurs in the fourth string.
And if you set value = FALSE, you get 1, 2, 3, 4.
And if you go back to the command with ignore.case = FALSE and use value = FALSE instead
of value = TRUE, you get 1 and 4: the matches are at the first location and at the fourth
location. This is how grep works.
Now let me take one more example, with two strings, x and y: x is "R course 24.07.2021"
and y is "Number of participants: 25". I combine these two strings with c(x, y), so that
the value of x is in the first position and the value of y in the second.
Now, I want to find "our" in c(x, y). Can you find where "our" is? There is no separate
word "our", but if you look carefully, the word course contains o u r, and that is why the
answer is 1. There is no o u r in y. This is how it works.
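The "our" search as an R sketch; the date format in x is taken from the spoken "24 dot 07 dot 2021".

```r
x <- "R course 24.07.2021"
y <- "Number of participants: 25"
grep("our", c(x, y))   # "our" is found inside "course", the first element
# [1] 1
```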
Similarly, if I repeat the same example but now search for another word, "Num" (n u m),
you can see that there is no Num in the first string, but in the second string there is
Num in the word Number, which is at position number 2: "Number of participants: 25".
So, when you write grep with "Num", the answer is 2. This is how it works.
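The "Num" search as an R sketch. The two extra lines are an addition for illustration: they show that the match is case sensitive by default, so "num" in lower case finds nothing because Number starts with a capital N, unless ignore.case = TRUE is set.

```r
x <- "R course 24.07.2021"
y <- "Number of participants: 25"
grep("Num", c(x, y))                      # [1] 2
grep("num", c(x, y))                      # integer(0): no lower-case "num"
grep("num", c(x, y), ignore.case = TRUE)  # [1] 2
```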
Let me show you this operation on the R console also, and then I will give you one more
example. I create x and y.
x is like this and y is like this. First I search for "our", and you can see the answer,
and if you search for "Num", it occurs at 2.
That is pretty straightforward; here I wanted to show you that you can combine two
strings together and then work with them. This is the screenshot of the same operation
that I just showed you.
Now, I would like to give you the details of the command grepl. grepl works like grep,
but the only difference is that the outcome is given as logical TRUE or logical FALSE. In
the case of grep, you get the index or the string in which the pattern is present.
grepl instead gives you the outcome as logical TRUE or FALSE, one value for every string
in the vector in which you want to search. For example, if the data vector has 3
different strings, then TRUE or FALSE is returned 3 times.
The answer TRUE means that the pattern is present, that is, the match was successful, and
FALSE means that the match was not successful.
Let me take one example. Take the same vector str: "R course", "exercises", "includes
examples of R language". Now I use grepl and look for the letter R in str. In string
number 1, R is present, so the answer is TRUE.
Then you come to the second string; there is no R, that is, no capital R, so the answer is
FALSE. In the third string there is an R, so the answer is TRUE. So, the outcome is TRUE,
FALSE, TRUE.
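The grepl call for R as an R sketch; like grep, grepl is case sensitive by default, so the lower-case r in "exercises" does not count.

```r
str <- c("R course", "exercises", "includes examples of R language")
grepl("R", str)   # one logical value per element of str
# [1]  TRUE FALSE  TRUE
```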
After looking at the result TRUE, FALSE, TRUE, you identify where R occurs: wherever the
answer is TRUE, R is present in that string.
Similarly, take the same example, but use grepl with the pattern ex. This is string 1,
this is string 2, and this is string 3.
In string 1 there is no ex, so the answer is FALSE; in string 2 there is an ex, so the
answer is TRUE; and in string 3 there is also an ex, so the answer is TRUE. So, you get
FALSE, TRUE, TRUE.
Looking at these answers, since TRUE appears at these two places, ex is present in strings
number 2 and 3 of the data vector str.
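The ex search with grepl as an R sketch. The last two calls are an addition not shown in the lecture: they illustrate how the two functions relate, since which() turns the logical result of grepl into the same indices that grep returns.

```r
str <- c("R course", "exercises", "includes examples of R language")
grepl("ex", str)
# [1] FALSE  TRUE  TRUE
which(grepl("ex", str))   # positions of the TRUE values
# [1] 2 3
grep("ex", str)           # the same indices directly
# [1] 2 3
```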
Let us do this operation on the R console also. This is your vector of strings, and when
you use grepl on it, you get TRUE, FALSE, TRUE. If you change the pattern from R to ex,
you get FALSE, TRUE, TRUE.
Why? Because R is present in the first and third strings, the answer is TRUE for the first
and third, and there is no R in the second, so the answer is FALSE. Similarly, for ex,
there is no ex in 1, there is ex in 2, and there is ex in 3, so the answer is FALSE, TRUE,
TRUE.
This is the screenshot of the same operation that I showed you.
(Refer Slide Time: 29:47)
Now I come to the end of this lecture. In this lecture we have learned two functions, sub
for substitution and grep for finding, and their two variants, gsub and grepl. You have to
understand how these functions work.
Depending on what you want, whether the result as strings, as numbers, or as TRUE and
FALSE, you can choose the suitable function with suitable options.
As usual, I request you to take some examples and execute these commands. They are very
simple commands, but the main thing is to understand what the output means: if it is in
terms of numbers, what it indicates, and if it is in terms of TRUE and FALSE, what it
indicates about the original strings.
So, practice it, and I will see you in the next lecture.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 41
Data Frames
Hello friends, welcome to the course Foundations of R Software. From this lecture we are
going to begin a new topic, Data Frames. One natural question that crops up is: what is a
data frame? Let us set this question aside for a while, and let me ask you a very simple
question: have you ever worked with a spreadsheet? You may ask me what a spreadsheet is.
For example, I believe that most of us have worked with the software Microsoft Excel.
Usually we create a file that in common language is called an Excel file. If you recall
what an Excel file looks like, there are some values arranged in rows and columns, and the
values need not always be numerical: they can be numbers, characters, names, strings, etc.
With an Excel file you can do many types of operations: you can do mathematical operations
on the columns where such operations make sense, and you can sort, order, and run
different types of queries.
For example, suppose I have some students and I have entered their marks in physics,
chemistry, and maths, and then their total. Then you may want to know how many students
have got more than 70 percent marks in mathematics and less than 30 percent marks in
chemistry.
You can pose these types of queries in such a file and get an answer. Whatever we do in a
spreadsheet, which is the generic name, or in simple words an Excel file, can also be done
in the R software. On what? The counterpart of a spreadsheet or an Excel file in the R
software is called a data frame.
So, a data frame is a format in which the data are arranged in rows and columns, and on which we can do the same types of operations that we do in a spreadsheet or an Excel file. First let us try to understand these concepts, and then I will show you different operations. I will continue with the topic of data frames in today's lecture and in the next couple of lectures.
I will take up these topics one by one. I request you to look at whatever data files you have handled in MS Excel or any other equivalent software, and to recall what operations you were doing there. In this lecture and in the forthcoming lectures I will take up some common operations that you usually do. So, let us begin and try to understand what a data frame is.
The first point is that whenever you handle data, there are different types of data sets, and you are interested in combining the data. To combine data, you have already learned a couple of tools, like c, cbind, vectors and matrices. If you recall the concept of a data vector, you would write values such as 1, 2, 3, 4 and possibly "A", and so on; c() also combines the data.
Similarly, you had cbind and rbind, with which you combined values, and you had vectors, matrices, etc. In all of them there were always some conditions; for example, if you want to do matrix manipulations, then all the values inside the matrix have to be numerical values. Similar to these functions used for combining data, another option is the data frame. In a data frame, we combine variables of equal length, with each row in the data frame containing observations on the same unit.
What does this mean? Suppose I write down the names of three subjects, say physics, mathematics and chemistry, and after that we find the total of these marks.
Now you have student number 1, student number 2 and student number 3. You write down the marks in physics, mathematics and chemistry for student number 1 in the first row, the marks of student number 2 in physics, maths and chemistry in the second row, and the marks of student number 3 in the third row, and then we find their totals.
So, what I am trying to say is this: first, the data are arranged in rows and columns, and every variable, such as physics, has the same number of observations. If you have three students, there are going to be 3 marks in physics, 3 marks in mathematics and 3 marks in chemistry. This is similar to a matrix or the output of cbind, with one difference: in a matrix, if you want to do mathematical operations, all the values have to be of the same type, namely numerical.
But in a data frame you can have different types of variables, numeric as well as character, as I will show you. One big advantage of a data frame is that you can make changes to the data without changing the data file itself.
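As a small sketch of the student example above, the following R code builds a data frame with one character column and three numeric columns, adds a total, and runs the query described earlier. The student names and marks are made up for illustration, and the marks are assumed to be out of 100 so that marks and percentages coincide.

```r
# Hypothetical marks (out of 100) for three students; names and numbers
# are invented purely for illustration.
students <- data.frame(
  name      = c("Student 1", "Student 2", "Student 3"),  # character column
  physics   = c(65, 80, 45),                              # numeric columns
  maths     = c(75, 90, 60),
  chemistry = c(25, 70, 28),
  stringsAsFactors = FALSE                                # keep names as character
)

# Add a column with the total marks of each student.
students$total <- students$physics + students$maths + students$chemistry
print(students)

# Spreadsheet-style query: more than 70 in maths AND less than 30 in chemistry.
selected <- students[students$maths > 70 & students$chemistry < 30, ]
print(selected)
nrow(selected)   # how many students satisfy the query
```

The condition inside the square brackets selects rows, and the empty position after the comma keeps all columns.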
So, you do not change your original data file, and if you want to use only a part of the data, you can create such a data set without affecting the original data file. Another very big advantage is that in a data frame you can combine various types of objects, such as numerical values, character strings and factors, in one structure.
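A minimal sketch of working on a part of the data without touching the original, using the painters data frame from the MASS package, which we load a little later in this lecture. The object names school_a and two_cols are only illustrative. In R, modifying such a subset changes only the new object; the original data frame, and of course any file on disk, is left as it was.

```r
library(MASS)   # provides the painters data frame

# Keep only the painters from school "A"; this creates a new object.
school_a <- painters[painters$School == "A", ]

# Keep only two columns; again a new object.
two_cols <- painters[, c("Composition", "Drawing")]

# Changing the copy does not change the original data frame.
school_a$Composition <- 0

dim(painters)                          # the original keeps all rows and 5 columns
dim(two_cols)                          # the part has only 2 columns
painters["Da Udine", "Composition"]    # still the original value
```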
For example, the commands you learned earlier, cbind and matrix, cannot combine different types of data values without converting them to a common type. The data frame is a special type of R object, designed especially to handle data sets. The data frame format is similar to a spreadsheet: the columns contain variables and the rows contain observations.
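To see the difference concretely, here is a short comparison using two composition scores from the painters table discussed below (the vector names name and score are illustrative). cbind() returns a matrix, which can hold only one type, so the numbers are converted to character strings, whereas data.frame() keeps a separate type for each column.

```r
name  <- c("Da Udine", "Da Vinci")
score <- c(10, 15)

# cbind() builds a matrix; a matrix holds a single type,
# so the numbers are silently converted to character strings.
m <- cbind(name, score)
class(m[, "score"])   # "character"

# data.frame() keeps one type per column.
df <- data.frame(name, score, stringsAsFactors = FALSE)
class(df$score)       # "numeric"
mean(df$score)        # arithmetic still works on the numeric column
```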
Data frames usually contain complete data sets, which are often created with other programs, for example spreadsheet files such as Excel files, or SPSS files.
The variables in a data frame can be numeric or categorical. Numeric variables have numbers as values, whereas categorical variables have characters or factors as values.
Before we move forward, please load the library MASS (M A double S) in your software; it comes with the standard R installation. We are going to use this package MASS.
If you remember, in the beginning we talked about the built-in package MASS. The name MASS comes from the book “Modern Applied Statistics with S”, written by Professors Venables and Ripley to explain statistics with the S-PLUS software. The data sets used in that book are collected in this package MASS.
Now I will explain many operations through examples, which I personally believe makes them easier to learn. In the MASS package there is a data frame named painters, written entirely in lower-case letters.
Many data sets are available in this package; it contains all the data sets that were used in the book. The second thing to keep in mind is that here I am only giving you an idea of what the data look like, and sometimes the data are big.
So, it may not be possible for me to show the entire screenshot or all the details here; sometimes the output even runs to a couple of screens. So, please keep repeating whatever I am doing on your computer for better understanding. First, I load the package MASS, and then, to access the data frame painters, I simply type painters on the R console, and you get this type of data set.
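The two steps just described look like this on the console. head() is used here only to print the first few rows instead of all of them, and dim() reports the number of rows and columns.

```r
library(MASS)       # MASS ships with standard R installations

head(painters, 4)   # first 4 rows of the data frame
dim(painters)       # number of rows and number of columns
```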
Let me briefly explain what this is before moving forward. This data set contains information about some painters. For example, Da Udine, Da Vinci, Del Piombo, Del Sarto, etc. are the names of different painters. Their paintings were analysed by experts, who assigned scores to them, and the variables on which the data were collected are composition, drawing, colour, expression and school.
The school refers to the different schools in which people learn an art such as painting or music. For example, Da Udine has a score of 10 for composition, 8 for drawing, 16 for colour and 3 for expression, and this painter comes from school A.
Similarly, for Da Vinci, the value of composition is 15, drawing is 16, colour is 4, expression is 14, and the school is A. I would like to honestly admit that I do not know much about drawing and painting, so kindly excuse me if I misinterpret something. I will try my best not to, but please do not expect me to know everything about painting. That is my very honest confession.
Notice that the painters' names work as row identifiers. If somebody asks me for the value of drawing for the painter Del Sarto, I go to the row Del Sarto and read the value in the Drawing column. Every row carries this type of information.
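Because the painters' names are row names, they can be used directly inside square brackets, which is covered in more detail in the next lecture. A brief sketch that answers the Del Sarto question above and prints the full Da Vinci row:

```r
library(MASS)

# Row "Del Sarto", column "Drawing": the value asked for above.
painters["Del Sarto", "Drawing"]

# The whole row for one painter (all five variables).
painters["Da Vinci", ]
```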
On the R console, this looks like the output shown here.
Before moving forward, let me show you these things on the R console.
First I type library(MASS), and the package is loaded. Then, if I type painters, the data appear like this. One thing to keep in mind: I have decreased the font size so that I can show you most of the operations on a single screen.
So, I request that as I move forward, you also do the same operations on your computer.
Now I take up different types of operations that you can do in a usual spreadsheet. The situation is that you get the file from some external source and you do not know its contents, so you would like to extract different types of information from this data frame using R functions. For example, suppose you want to find the row names of the data frame painters; you have seen that the names of the painters are given, and they serve as identification marks.
For that you have the command rownames (row, followed by names), and within the parentheses you write the name of the data frame: rownames(painters). You get this type of output: the first names are Da Udine, Da Vinci, Del Piombo, Del Sarto, etc. There are many names, which is why I have indicated that the output continues; as I said, it is not possible for me to show you all the values here.
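A quick sketch of rownames() on this data frame; head() limits the printout to the first few names, and length() confirms that there is one name per row.

```r
library(MASS)

painter_names <- rownames(painters)
head(painter_names)     # first few names, starting with "Da Udine", "Da Vinci"
length(painter_names)   # one name for every row of the data frame
```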
This is the screenshot of the same operation. So, if you want to find the row names of any data frame, you simply use the command rownames.
Now, there are different types of variables here. The variable Composition contains numbers, whereas School contains letters. But imagine that this file is not in front of you: it is inside some storage, and you cannot just open it to determine the structure of the file, its behaviour, the type of each variable, etc.
Suppose I want to know the type of each variable. As I showed you, the variables Composition, Drawing, Colour and Expression contain numerical data, but School contains data such as A, B, C, D, etc. So there are four variables, Composition, Drawing, Colour and Expression, with numerical values, and one variable, School, which is a factor variable. But if I cannot see the file, I do not know this in advance.
So, first I would like to understand how to find this out. I am going to tell you two things: one thing you already know, and the second I will explain in detail in a forthcoming lecture, but I am going to use it here. You have a data frame named painters which has several variables, such as Composition, Drawing, etc. Now, if you want to access only the data of one particular variable, how do you do it?
There are different ways, but one simple option is to write the name of the data frame, then a special sign, and then the name of the variable. The data corresponding to this particular variable in the data frame are then extracted, and you can do different types of operations on those values. For example, in a spreadsheet, suppose one column contains the marks in maths, and you want to access only the marks in mathematics. How do you do it?
For that, you write the name of the data frame in which all these values are stored and the name of the variable, and these two are joined together with the dollar sign $. It looks like the letter S with vertical strokes through it, and it is available on your keyboard.
Suppose I want to check whether the variable School in the data frame painters is numeric. I write the command is.numeric, and within the parentheses I write the name of the data frame and the variable name joined with the dollar sign: is.numeric(painters$School). This comes out to be FALSE. Similarly, if you want to know whether the data on drawing are numeric, you write is.numeric(painters$Drawing), and it comes out to be TRUE.
A very important point to keep in mind is that you have to give the exact name of the variable. For example, in the variable name School the S is a capital letter, and in Drawing the D is a capital letter. Since School has turned out to be non-numeric, you would now like to check whether it is a factor.
If you write is.factor(painters$School), it comes out to be TRUE.
Similarly, you have seen that is.numeric(painters$Drawing) is TRUE. If you now ask is.factor(painters$Drawing), that is, whether the data values of the variable Drawing form a factor, the answer comes out to be FALSE.
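The checks discussed above can be run together as follows. The last line is an extra illustration, not from the lecture slides: because variable names are case-sensitive, writing painters$school with a lower-case s does not find the column and simply returns NULL.

```r
library(MASS)

is.numeric(painters$School)    # FALSE
is.factor(painters$School)     # TRUE
is.numeric(painters$Drawing)   # TRUE
is.factor(painters$Drawing)    # FALSE

# Names are case-sensitive: "school" does not match the column "School".
is.null(painters$school)       # TRUE
```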
Let me show you these things on the R console. But before that: just as you learned about row names, there is a similar function for finding the column names, colnames.
If you run colnames(painters), you get Composition, Drawing, Colour, Expression and School. On the R console it looks like this: these are the names of the columns, and these are the names of the rows.
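A sketch of colnames() on painters. Note that the colour variable is spelt Colour in this data set, which is exactly the kind of detail worth copying from the output rather than retyping.

```r
library(MASS)

colnames(painters)      # the exact variable names
painters$Colour[1:3]    # British spelling "Colour", copied from the output
```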
First I find the row names with rownames(painters); you can see them here.
Similarly, to find the column names, I use colnames, all in lower-case letters, and you can see the output here. Now I would like to explain a simple way to avoid mistakes when you address a particular variable. Suppose I want to apply is.numeric to the variable School. There is always a possibility that you make a mistake in writing School, for example by beginning it with a lower-case letter.
The best option, which I suggest, is to copy the name of the data frame so that there is no typing mistake, then put the dollar sign, and then copy the exact name of the variable from the colnames output, leaving out the double quotes. This ensures that you do not make any mistake in typing or reproducing the name of the variable.
You can see that is.numeric(painters$School) comes out to be FALSE, and is.factor(painters$School) comes out to be TRUE. Similarly, for the variable Expression: is.factor(painters$Expression) is FALSE, is.character(painters$Expression) is also FALSE, and is.numeric(painters$Expression) is TRUE.
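Testing each variable with is.numeric() or is.factor() works, but you can also ask for the class of every column in one call. This sketch uses sapply(), which is not covered in this lecture, so treat it as an optional shortcut.

```r
library(MASS)

# Apply class() to every column of the data frame at once.
sapply(painters, class)

# The individual checks from the text, for comparison.
is.character(painters$Expression)   # FALSE
is.numeric(painters$Expression)     # TRUE
```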
In the same way, you can do different types of such operations, which will give you more insight into the structure of the data frame.
Now I want to give you one example of how operations on data frames help us. We are going to learn this concept at a later stage: the function summary (s u m m a r y). This function gives a quick overview of the minimum value, 1st quartile, median (2nd quartile), mean, 3rd quartile and maximum value.
If you use summary and write the name of the data frame within the parentheses, summary(painters), then information about all the variables is available in a single command: this is for Composition, this for Drawing, this for Colour, this for Expression, and so on. Note that for the factor School, the output shows the counts for the categories A, D, E, G, B and C; there are more categories, but the remaining ones are grouped together as (Other).
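The summary() calls described here, for the whole data frame and for single variables, can be tried as follows. For a factor such as School, summary() of that single variable returns the number of painters in each category.

```r
library(MASS)

summary(painters)              # all five variables in one call
summary(painters$Expression)   # one numeric variable: Min., quartiles, Mean, Max.
summary(painters$School)       # one factor: counts for every school
```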
At this moment you do not have to think about these details. The main point I want to show is that if you want information, or any statistical or mathematical operation, over all the variables (where it is applicable), the concept of a data frame helps you get a quick answer. This is the screenshot of the same thing, and if you want to do it for individual variables, you can do that too.
Let me show it on the R console. You can refer to each of these variables by names such as painters$Expression.
First I look at the columns of painters with colnames(painters); that helps. Now, to apply the summary command to Expression, I write summary(painters$Expression), press Enter, and it gives me the summary values of that variable.
Similarly, I can use it for School. I can do it variable by variable like this, but if I want it over the whole data frame, I simply write summary with the name of the data frame within the parentheses, and it gives all the values together.
This is what I meant. So, let me stop here. My idea in this lecture was simply to introduce you to the concept of a data frame, so that it settles in your mind, and to try these very small operations. In the next lecture I will give you more commands, which will really help you perform different types of operations in the R software.
So, please look at the concept of a data frame and practice these simple commands so that you are comfortable when we meet in the next lecture. Believe me, being comfortable with them will help, because in the next lecture I am going to use these commands again. So, practice, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 42
Data Frames: Creation and Operations
Hello friends, welcome to the course Foundations of R Software. You may recall that in the last lecture we began a discussion on data frames and learned what a data frame is. In short, the type of file that you create with Microsoft Excel, popularly called an MS Excel file or a spreadsheet, is equivalent to the concept of a data frame in the R software.
As you know, in a spreadsheet you make different types of operations, and our objective is to learn how to do the same types of operations in R. So, in today's lecture we continue with the topic of data frames: first we learn how to create a data frame, and then how to do different types of operations on it.
We used the package MASS and the data set painters in the last lecture, and we will continue to use the same data set in this lecture to illustrate different types of operations.
So, let us begin. You may recall that we used library(MASS) to load the package MASS (M A double S), and after that we used the data set painters (p a i n t e r s), which has the column names Composition, Drawing, Colour, Expression and School, and whose row names are the names of different painters.
This is only a small excerpt of that data set, and it looks like this on the R console. Now, the first question: whenever you get a file from some external source, you do not know whether it is a data frame or something else.
To test whether the object you have obtained is a data frame, we use the command is.data.frame (i s dot d a t a dot f r a m e). Note that there are two dots (full stops) here, one after is and another after data. It is similar to other commands you have used in the past, such as is.numeric, is.character, etc.
To use it, you write this command and, inside the parentheses, the name of the object. For example, to test whether the data set painters is a data frame, I write is.data.frame(painters), with painters all in lower-case letters, and it comes out to be TRUE.
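A short sketch contrasting a data frame with a matrix. as.data.frame(), used in the last line, is an extra function not discussed in the lecture; it converts the matrix into a data frame.

```r
library(MASS)

is.data.frame(painters)           # TRUE
m <- matrix(1:16, nrow = 4)
is.data.frame(m)                  # FALSE: a matrix is not a data frame
is.data.frame(as.data.frame(m))   # TRUE after conversion
```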
This means that this data set is in the format of a data frame. Now, before I explain different functions related to data frames, the first question is: how can you create a data frame?
Here I am taking examples based on the built-in data frame, but suppose you have a different data set and you want to create a data frame from it. How can you do it? For that we have the command data.frame (d a t a dot f r a m e). This function creates a data frame by combining column vectors. Suppose I take 3 different types of objects.
I take the numbers 1 to 16 and store them in the variable x; then I arrange these 16 values in a 4 by 4 matrix, with 4 rows and 4 columns, and call it y; then I take the lower-case letters from a up to p, indexed from 1 to 16, and store them in the variable z. So, x consists of the numbers 1 to 16.
Then y is this matrix, and z contains the 16 letters. Now I create a data frame of x, y and z: I write data.frame and, inside the parentheses, all the variables separated by commas, data.frame(x, y, z). Now observe what is happening. This is x, this is the matrix y, and this is z.
If you look at x, it is the set of numbers from 1 to 16, exactly what you have given. Similarly, z is the set of lower-case letters from a to p, the same set we stored in z.
Now look at the matrix y. The numbers from 1 to 16 are arranged column-wise: 1 2 3 4 in the first column, then 5 6 7 8 in the second, then 9 10 11 12 in the third, and then 13 14 15 16 in the fourth.
Now, what is happening in this data frame? If I mark a block here, this is your matrix y. But after this block, the matrix is repeated, 4 times in all, because in a data frame every column must have the same number of values. The matrix y has only 4 rows, whereas x and z have 16 observations each, so R has repeated the rows of y 4 times to make 16 rows. So, you have to be watchful when you create data frames.
In a data frame there is a condition that the number of elements in every column should be the same, so that every row has a value for every column. Otherwise, R tries to adjust it automatically by recycling the shorter columns, and we have to understand how it does this. So, you have to be extremely careful when you create a data frame. This is the screenshot of the same outcome.
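The construction described above can be reproduced with the following code. The column names X1 to X4 are names R assigns automatically to the unnamed matrix columns. The last part is an added illustration: recycling happens only when the shorter column length divides the longer one; otherwise data.frame() stops with an error, which tryCatch() captures here so that the script can continue.

```r
x <- 1:16                               # 16 numbers
y <- matrix(1:16, nrow = 4, ncol = 4)   # 4 x 4 matrix, filled column-wise
z <- letters[1:16]                      # "a" to "p"

datafr <- data.frame(x, y, z)
dim(datafr)          # 16 rows, 6 columns: x, the 4 columns of y, and z
names(datafr)        # the matrix columns are named X1 ... X4
head(datafr$X1, 8)   # the 4 rows of y are recycled: 1 2 3 4 1 2 3 4

# 16 is not a multiple of 5, so these columns cannot be recycled to match.
res <- tryCatch(data.frame(a = 1:16, b = 1:5),
                error = function(e) conditionMessage(e))
print(res)           # the error message returned by data.frame()
```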
And this is the data frame that we have just created. Let me first show you these operations in the R software, and then I will move forward. I create these 3 variables: you can see x, y and z here.
Then I create a data frame and store it under the name datafr, using data.frame(x, y, z). If you look at its value, it is like this, exactly what I showed you on the screenshot.
Next, I come to another aspect and consider some more operations that can be done on a data frame.
For that I use the data set painters. The first question I address is how you can see the structure of a data frame. For example, if you want to know what types of variables there are, how many observations there are, etc., in a given data frame, the command is str, which is short for structure. Inside the parentheses you write painters, str(painters), and you get this type of output.
What is it telling you? The first line indicates that painters is a data frame with 54 observations on 5 variables. Then it gives the names of those variables, Composition, Drawing, Colour, Expression and School, and after that the type of each variable. For example, it is written int, which means integer.
So, the data on Composition are integers, and the data on Drawing, Colour and Expression are also integers, but the data on School form a factor. This is exactly what we learned in the last lecture. After that, it briefly shows some values of each variable so that you can understand what type of data are available. So, the command str gives you the structure of a data frame.
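Running str() on painters, together with a few direct checks of the column types it reports:

```r
library(MASS)

str(painters)   # first line reports a data.frame of 54 obs. of 5 variables

# The same type information, checked directly:
is.integer(painters$Composition)   # TRUE: shown as "int" by str()
is.factor(painters$School)         # TRUE: shown as "Factor" by str()
```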
This is the screenshot of the same operation.
Next, how do you extract a variable from a data set? You may recall that we briefly discussed this operation in the last lecture, but now I would like to give a formal explanation. The rule is very simple: whenever you want to extract a particular variable from a data frame, you write the name of the data frame, then the dollar operator $, and after that the variable name. That is all, as simple as that. For example, if you want to extract the variable School from the data frame painters, you simply write the name of the data frame, painters, then the dollar operator, and then School: painters$School. Keep in mind that the variable name has to be written exactly as it appears in the data frame.
If you want to know the exact spelling, it is best to first use the function colnames (c o l n a m e s) to find the column names, and then use these names here. You can see here the values stored in this variable, and this is the screenshot of the same operation.
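A sketch of extracting the School column with the dollar operator and then using the result as an ordinary object. The double-bracket form painters[["School"]] in the last line is an extra, equivalent way of extracting a column that the lecture does not cover.

```r
library(MASS)

school <- painters$School   # extract one column as its own object
head(school)
levels(school)              # the distinct schools, stored as factor levels
table(school)               # how many painters belong to each school

# painters[["School"]] extracts the same column.
identical(school, painters[["School"]])
```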
Now, if you want to extract a particular value from a data frame, how do you do it?
This operation is similar to what you did with matrices, when you wanted to access a particular element located at a particular row and a particular column. Similarly, in a data frame you write square brackets, and within the square brackets you give the address of the row and the address of the column, separated by a comma.
When you press Enter, R gives you the value located at the intersection of that row and that column. Suppose you want to extract the information on the first painter, Da Udine, and see the value of Composition. Da Udine is a row name, and Composition is a column name.
So, you write the name of the data frame, painters, and inside the square brackets you first write the row name. Since this is a name, not a number, you write it within double quotes, "Da Udine", exactly as it appears in the data set. Then you write the name of the column, "Composition": painters["Da Udine", "Composition"].
You can see that it comes out to be 10. You can also check this in the screenshot: in the row Da Udine and the column Composition, the value is 10. That is exactly what I have shown you here.
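The same cell can be reached by names or by positions. In the sketch below, painters[1, 1] works because Da Udine is the first row and Composition is the first column; the last line shows that several rows and columns can be selected at once.

```r
library(MASS)

painters["Da Udine", "Composition"]     # by row name and column name: 10
painters[1, 1]                          # the same cell by position
painters[1:3, c("Drawing", "Colour")]   # several rows and columns at once
```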
Here, however, I have shown it with R commands rather than through the screenshot. The advantage of extracting such variables is that you can use them just like any other variable. For example, you can create plots and graphics from them, such as a bar plot. We have not discussed graphics yet, but we will discuss different types of graphics in the forthcoming lectures.
For a bar plot, you first use the command table, and inside the parentheses you write the name of the variable, here painters$School. Passing this table to barplot, as in barplot(table(painters$School)), creates the bar plot. Similarly, you can create a pie chart; we will discuss it in more detail in a forthcoming lecture.
For that, you use the command pie with the same table of the variable, pie(table(painters$School)), and you get the pie chart as well. That is the advantage of using such names. But before going into more detail, let me first show you these operations on the R console.
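A short sketch of the table, bar plot and pie chart commands, assuming MASS is installed. The intermediate name school_counts is only an illustrative choice: table() counts how many painters fall in each school, and barplot() and pie() draw those counts.

```r
library(MASS)

# Frequency table of the factor variable School (one count per school)
school_counts <- table(painters$School)
school_counts

# Bar plot and pie chart of the same frequencies
barplot(school_counts)
pie(school_counts)
```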
First of all, you have to load the package, so I use library(MASS). Then, if you look at painters, you can see the row Da Udine and the variable Composition, and its value is 10, which I just showed you.
But suppose you do not know what this data frame is, and you want to know the structure of painters. The command str(painters) tells you that this is a data frame with 54 observations on 5 variables, along with further information about each variable.
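A minimal sketch of inspecting the structure, assuming MASS is available. str() is the base R function that prints this kind of compact summary, and dim() returns the numbers of rows and columns.

```r
library(MASS)

# Class, dimensions, and the type of every column
str(painters)

# Number of rows and columns
dim(painters)
```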
Similarly, suppose you want to extract School. As I said, the best approach is to first look at the column names; otherwise it is easy to make a mistake. To see the data on any variable, you simply write the name of the data frame, painters, followed by the dollar sign.
Then you write the name of the variable, say Composition: painters$Composition shows the data on composition. Similarly, if you write painters$School, you get the data on the school.
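A sketch of checking the column names before using the dollar sign, assuming MASS is available. Because R is case-sensitive, painters$composition (lower case) returns NULL instead of the data, which is why looking at names() first helps to avoid mistakes.

```r
library(MASS)

# Exact column names; note the capital letters and the spelling "Colour"
names(painters)

# Whole columns accessed with the dollar sign
painters$Composition
painters$School
```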
Similarly, suppose you want to create some diagrams from it. I use the command barplot on the table of schools, barplot(table(painters$School)), and you get this type of bar plot. Similarly, you can use the pie chart.
(Refer Slide Time: 18:17)
You get this type of graphic. So, you can see that doing such operations is not very difficult. Similarly, you can obtain any particular observation from the data frame.
Suppose I want the value for Da Udine and Composition; it comes out like this. Similarly, if you want to know the value of School for Da Udine, painters["Da Udine", "School"] gives you this output. And if you look at this value, you can see the school of Da Udine here, and the composition value here is 2.
Similarly, suppose you want to know the Drawing value of the painter Guilio Romano (spelled this way in the data set). It is 16. So, I write the name of the data frame, then the row name and the column name inside the square brackets, painters["Guilio Romano", "Drawing"], and the value comes out as 16, which is what you saw in the data.
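The same indexing works for any painter and any column. In this sketch, which assumes MASS is available, the extracted value is stored under the illustrative name drawing_gr and then used in ordinary arithmetic, comparing it with the mean drawing score of all painters.

```r
library(MASS)

# Row "Guilio Romano" (spelled as in the data set), column "Drawing"
drawing_gr <- painters["Guilio Romano", "Drawing"]
drawing_gr

# An extracted value is an ordinary number, so arithmetic works directly
drawing_gr - mean(painters$Drawing)
```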
So, obtaining this type of information is not very difficult. Similarly, you can do different types of calculations on these extracted variables. Now we come to the end of this lecture. This was also a short lecture, and I have covered only a small number of commands, because I want to give you some time so that these concepts can settle in your mind.
You also need some time to practice these commands, and I will gradually build up more commands in the next lectures. I would request you to take any data set that you have created in a spreadsheet, such as MS Excel, and see how you can get the same information in R. Practice, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 43
Data Frames: Some More Operation
Hello friends, welcome to the course Foundations of R Software. You may recall that in the last couple of lectures we talked about data frames, and in this lecture we are going to continue with the same topic. In the last lecture we learnt some basic operations to extract different types of information from a data frame.
Similarly, in this lecture we will learn some more commands for common operations on data frames. For that, we will once again use the package MASS, which contains a data frame called painters. Let us take some examples and try to understand how to execute these operations. So, let us begin our lecture.
Recall that we were using the data set painters. Please load the library with the command library(MASS). The data frame painters has five variables, Composition, Drawing, Colour, Expression and School, and its rows are the names of different painters. It looks like this, and by now you are familiar with this data set.
One operation we learnt in the last lecture was how to extract a particular variable from a data frame. For that, we write the name of the data frame, then the dollar sign, and then the variable name. The advantage is that the result works just like a single variable, and after that you can perform any type of operation on it.
For example, you can directly find the mean, the sum or any other quantity from the numerical values stored in a variable of a data frame. We had used the command summary, which gives the minimum, first quartile, median, mean, third quartile and maximum for quantitative variables, and a frequency distribution for a categorical (factor) variable.
So, to use this command on a variable, you write summary(painters$School). Similarly, if you want the sum of a numeric variable, you write sum with painters$ followed by the name of the variable inside the parentheses, for example sum(painters$Composition).
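A sketch of applying functions to a single variable extracted with the dollar sign, assuming MASS is available. For a factor such as School, summary() returns the count in each level; for a numeric column it returns the minimum, quartiles, median, mean and maximum.

```r
library(MASS)

# Factor variable: frequency of each school
summary(painters$School)

# Numeric variable: five-number summary plus the mean
summary(painters$Composition)

# Any other function can be applied in the same way
sum(painters$Composition)
mean(painters$Drawing)
```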
In this process, what you essentially want is the summary or the mean of a particular variable. To access that variable, you first have to write the name of the data frame and then the name of the variable, joined by a dollar sign. R offers an alternative in which you do not need to write the name of the data frame again and again.
For that, we use the concept of attaching a data frame. Once you attach a data frame, all the variables it contains can be accessed directly by their names. The function for attaching is attach, written all in lower case, and inside the parentheses you write the name of the data frame: attach(painters).
After that, whenever you execute a command, you do not need to write the name of the data frame followed by the dollar sign. So, once you attach painters, you can access its variables directly by their names, Composition, Drawing, Colour, Expression and School, without writing painters$.
Earlier, to access the data on School or Expression, you would write painters$School or painters$Expression.
After attaching, you can find the summary of School without writing painters$, just by typing summary(School), and you get the same output. Similarly, summary(Composition) works without painters$. When you work with a single data set for a long time, such commands are really useful.
I am not saying that you cannot write the name of the data frame and the dollar sign; the result is the same either way. It is just more convenient.
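A minimal sketch of attach() in use, assuming MASS is available. The results are stored under the illustrative names s1 and m1 so they can be compared with the painters$ versions after detaching. One design caution: a variable of the same name in your workspace is found before an attached column, so attach is most convenient in a clean session.

```r
library(MASS)

attach(painters)        # columns of painters are now visible by name alone
s1 <- summary(School)   # same as summary(painters$School)
m1 <- mean(Drawing)     # same as mean(painters$Drawing)
detach(painters)        # undo the attach once you are done

s1
m1
```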
Once you have finished your work, just as you used the command attach, you should use the command detach, with the name of the data frame inside the parentheses: detach(painters). After that, R goes back to the earlier state, and to call a variable you once again have to write painters$ followed by the variable name.
For example, if I execute detach(painters) and then try summary(School) again, R reports an error saying that the object School is not found, whereas just before that you were computing the summary without any problem. Let me first show you these operations on the R console so that you become more confident.
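A sketch of the detach step, assuming MASS is available. tryCatch() is used here only so that the script keeps running and the error message can be printed; typing summary(School) directly at the console after detaching would simply display the error.

```r
library(MASS)

attach(painters)
print(summary(School))   # works while painters is attached
detach(painters)

# After detach the bare name is no longer found; capture the error message
result <- tryCatch(summary(School),
                   error = function(e) conditionMessage(e))
print(result)
```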
(Refer Slide Time: 07:41)
First, I load the package MASS, and you can see the data frame painters. It has different variables: Composition, Drawing and so on.
(Refer Slide Time: 08:01)
Now, if I execute summary(painters$School), you get this output, and the data in painters$School look like this. Let me take one more example.
The data in Drawing look like this. Suppose I want to find the arithmetic mean of these data. I write mean, but inside the parentheses I have to write painters$Drawing, and this gives the value. Now I use the command attach, and you can see what happens. Let me repeat the previous commands as well, so that you can compare them easily. I execute attach(painters). Now I find the summary of School just by writing summary(School), with only the name of the variable inside the parentheses. You get the same outcome as before. Similarly, to find the mean of Drawing, I simply write mean(Drawing), and you get the same value that we obtained earlier. That is the advantage.
Before moving on to more operations, let me also show you detach. I execute detach(painters). Now, in the same session, if you try mean(Drawing), R reports an error, and if you try summary(School), it also reports an error.
Just before that it was working, and you can still find the mean by writing mean(painters$Drawing) or the summary by writing summary(painters$School). That is what is happening.
(Refer Slide Time: 10:43)
Now, let me show you one more operation: how to extract a subset from a data frame. Suppose the data are arranged in rows and columns, and you want to select only some particular part of the whole data frame. A subset of a data frame can be obtained with the command subset. Inside the parentheses, you write the name of the data frame and then the condition that the rows you want must satisfy.
For example, you have seen that School has different categories, A, B, C, D, E, F and so on. Suppose you want the subset of painters who come from school F. How do we get it done? Let me first show you how these data look in R, so that you can understand it better.
If I write painters$School, you can see that there are four painters whose school is categorized as F.
These are the four painters whose school is F, as you can see in the last column. Out of this bigger data set, you want to extract only the part where the school is equal to F. Using your earlier knowledge, the first question is how to indicate that the school must be equal to F. School is a factor variable.
F is a character value. So, I write the name of the variable, School, then the double equality sign ==, which you know is the logical equality, and then, within quotes, the value, capital F. I use the command subset, write the name of the data frame painters, and then this condition: subset(painters, School == "F"). The condition can be anything; for example, you may want the painters whose Drawing values are more than 10.
This is the type of data you want to extract from the bigger data frame painters. Once you execute the command, you see this type of outcome: the four painters from school F, the same data that you saw on the R console.
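A minimal sketch of the subset command for school F, assuming MASS is available. The result is stored under the illustrative name school_f.

```r
library(MASS)

# Keep only the rows where School equals "F"; all five columns are retained
school_f <- subset(painters, School == "F")
school_f

# Row names of the selected painters
rownames(school_f)
```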
There is another approach. Recall that you had used a command like x == "F". Its outcome is a vector of TRUE and FALSE values, showing which values of x are equal to F. Then, if you write x followed by this condition inside square brackets, the values of x for which the condition is TRUE are reported.
You can use this type of command with a data frame too. The alternative command is painters[painters$School == "F", ]: inside the square brackets you write the condition that School is logically equal to "F", with F in double quotes, followed by a comma. The rows of painters for which the condition is TRUE are returned, so you get the same outcome.
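A sketch of the logical-indexing alternative, assuming MASS is available. The comparison painters$School == "F" produces a TRUE/FALSE vector, and using it as the row index, followed by a comma and an empty column index, keeps those rows and all columns. One difference worth knowing: subset() treats a missing (NA) condition as FALSE, whereas square-bracket indexing returns a row of NAs for it; painters has no missing schools, so both give the same result here.

```r
library(MASS)

# Logical vector: TRUE where the painter's school is F
is_f <- painters$School == "F"
head(is_f)

# Use it as the row index; the empty column index keeps every column
school_f2 <- painters[is_f, ]

# The subset() version, for comparison
school_f1 <- subset(painters, School == "F")
school_f2
```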
(Refer Slide Time: 15:15)
Similarly, suppose you want the subset of painters whose Composition values are less than or equal to 6: subset(painters, Composition <= 6). All painters whose Composition values are smaller than or equal to 6 are reported. In this way you can extract different types of data sets.
Let us first execute these operations on the R console. First, I use the command subset and see how it operates. With the subset command we extract the data on Durer, Holbein, Pourbus and Van Leyden.
I simply write subset(painters, School == "F"), and I get the same data: Durer, Holbein, Pourbus and Van Leyden. Similarly, if you want the painters whose school is A, you write subset(painters, School == "A"). And if you use the alternative that I showed you, you get the same outcome.
Similarly, if you take school A in the alternative form, you get the same outcome. So, these are different ways. In programming, there are usually several different ways to do the same thing.
Similarly, to get the subset in which Composition is less than or equal to 6, you can do it like this. And if you want the painters whose Drawing is more than 10, you change the condition to Drawing > 10.
You get all these rows, and in every one of them Drawing is more than 10.
Similarly, all the painters whose Drawing value is less than 10 are obtained like this. So, you can perform many different operations of this kind. Now, let me come back to our slides and let us understand one more command.
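A sketch of subsets defined by numeric conditions, assuming MASS is available; the variable names low_comp, high_draw and low_draw are illustrative.

```r
library(MASS)

low_comp  <- subset(painters, Composition <= 6)  # composition at most 6
high_draw <- subset(painters, Drawing > 10)      # drawing strictly above 10
low_draw  <- subset(painters, Drawing < 10)      # drawing strictly below 10

# How many painters fall in each subset
nrow(low_comp)
nrow(high_draw)
nrow(low_draw)
```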
Now I explain one more operation on data frames. Suppose a data frame has several columns, and you want to create a subset in which some of the columns are eliminated. For that, subset has an option called select. For example, suppose you want the subset of painters whose school is F. Earlier, this subset contained all the columns: Composition, Drawing, Colour, Expression and School.
Composition is column number 1, Drawing is column 2, Colour is column 3, Expression is column 4 and School is column 5. Suppose I want to eliminate columns 3 and 5. In the option select, written all in lower case, I give the column numbers with a minus sign.
The minus sign tells R that these columns have to be eliminated from the subset. For example, I give the vector c(-3, -5), so the command is subset(painters, School == "F", select = c(-3, -5)). In the outcome, the third and fifth columns are eliminated, and you get only Composition, Drawing and Expression, as you can see in the screenshot. Let me show you this operation on the R console also.
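A sketch of the select option, assuming MASS is available. Negative column numbers drop columns; the same columns can also be kept by listing their names, which is less fragile if the column order ever changes. The names f_small and f_small2 are illustrative.

```r
library(MASS)

# Drop column 3 (Colour) and column 5 (School) from the school-F subset
f_small <- subset(painters, School == "F", select = c(-3, -5))
f_small

# Equivalent: name the columns to keep
f_small2 <- subset(painters, School == "F",
                   select = c(Composition, Drawing, Expression))
```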
When I execute it, you can see that the third and fifth columns are eliminated. The original columns are Composition, Drawing, Colour, Expression and School, and now we have only Composition, Drawing and Expression; column 3 (Colour) and column 5 (School) have been eliminated.
These data can also be stored in a new variable, and then you can perform different types of operations on that data frame. Now, let me come back to our slides and understand one more command.
Let me take the same example and first show it on the R console itself, so that you can understand it better.
In the data set painters, look at School. These values correspond to school A, and these values, highlighted on the screen, correspond to school B.
(Refer Slide Time: 21:39)
Then come the remaining schools, up to G and finally H. Suppose you want to split the data at all the points where the school changes; in general, you can split the data by the values of any variable. How to split the data is the question we are going to answer. To split a data set by the values of a specific variable, we use the command split, and we preferably split by a factor variable.
For example, School is a factor variable with the values A, B, C, D, E, F, G and H. Let us see what the output looks like, because it continues over a couple of screens.
I use the command split and store the outcome in a variable named splitted: splitted <- split(painters, painters$School). Inside split, I write the name of the data frame, painters, then a comma, and then the variable by which I want to partition the data.
If you execute this command, you get this type of outcome, which continues over a couple of screens. I have copied it here, and I will also show it on the R console. The first component of the output is labelled $A, indicating the first level of the variable used for partitioning. In this component all the painters have school A, and that is indicated by $A.
The entire data corresponding to school A are stored here, and after that comes $B, where the data corresponding to school B are stored.
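A minimal sketch of the split command, assuming MASS is available. split() returns a list with one component per level of the factor; sapply() is used here only as a convenient way to count the rows in each component.

```r
library(MASS)

# One data frame per level of School, collected in a named list
splitted <- split(painters, painters$School)

# Component names are the school labels
names(splitted)

# Number of painters in each partition
sapply(splitted, nrow)
```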
(Refer Slide Time: 24:01)
Continuing, this is school C, and its data are indicated by $C. Similarly, we have $D, containing the entire data corresponding to school D.
Similarly, we have schools E and F, indicated by $E and $F, with the data for E and the data for F.
(Refer Slide Time: 24:35)
Similarly for G and H: these are the data corresponding to G, and these correspond to H. Note that if you have attached the data frame, you can simply use the variable name, as in split(painters, School); if you have not, you have to refer to the variable through the data frame, as in painters$School.
This is how the screenshot looks: these are A, B, C, D, E, F, G and H. If I compiled everything on a single screen, the font size would become too small. So, I request you to do the same operation yourself on your computer and see how the values look. Now, the next question is: given the outcome of this split, how do you access only one of the partitions?
You simply write the name of the stored outcome, followed by the dollar sign and then the level of the factor variable by which the data have been split. For example, if you want the data for school A, you write splitted$A.
That gives the data for school A. You may now ask whether the partitions splitted$A, splitted$B and so on up to splitted$H, obtained by splitting the data frame with respect to School A to H, are themselves data frames.
To check this, I use the command is.data.frame with splitted$A inside the parentheses, and the answer is TRUE. That means each partition you obtain by splitting is also a data frame. Let me run this example on the R console and show you how you get this type of outcome.
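A sketch of accessing one partition and checking its type, assuming MASS is available.

```r
library(MASS)

splitted <- split(painters, painters$School)

splitted$A                  # rows of painters whose School is A
is.data.frame(splitted$A)   # each partition is itself a data frame
is.numeric(splitted$A)      # ...and not a numeric vector
```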
As soon as I execute the command split, all in lower case, with the name of the data frame and the variable painters$School, you get this output, which continues over a couple of screens. Here is $A.
This is $B.
Then come $C and $D.
(Refer Slide Time: 27:53)
(Refer Slide Time: 28:01)
And then $G and $H. All these components, $A to $H, are the subsets of the data frame where School is equal to A, B, C, D, E, F, G and H respectively.
To see the data for school A, you simply write splitted$A, and you get these data. This is how you access the different partitions, for example splitted$B.
(Refer Slide Time: 28:27)
This gives the data stored in splitted$B. Similarly, splitted$C works in the same way, where splitted is the name of the variable in which we stored the outcome of split.
To check whether these partitions, A, B or C, are data frames, you write is.data.frame(splitted$C) and see TRUE; whether it is A, B or C, the answer is TRUE. On the other hand, a check such as is.numeric(splitted$A) returns FALSE.
So, whatever partitions you obtain by splitting the data set with respect to School are also data frames. Since they are data frames, you can execute on them all the commands you have learnt for data frames, and for that you can save them in different variables.
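A sketch of saving a partition in its own variable and working with it, assuming MASS is available. The lecture stores school C in a variable called c; this sketch uses the more descriptive, illustrative name school_c, and checks that the result agrees with the subset() approach.

```r
library(MASS)

splitted <- split(painters, painters$School)

# Store one partition in its own variable and use it like any data frame
school_c <- splitted$C
summary(school_c)
mean(school_c$Composition)
```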
For example, suppose I want to save the data of school C. I write c <- splitted$C, and now you can see the value of c. After that, you can perform whatever operations you want on it. Now we come to the end of this lecture. This was also a fairly simple lecture, and I have tried my best to explain how things work when you deal with data frames.
This is the most important part of any programming: you have to understand how the software works, how it gives you the output, and what the nature and behavior of the output are. That will help you write correct programs for what you want to do. So, once again, I request you to take a data frame, or look into some of your older Excel files or spreadsheets, and try to do in R whatever operations you used to do there.
You may ask how to read data from an Excel file or other software into R. I will tell you that after one more lecture on data frames. Practice and try to understand how these commands work, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 44
Data frames: Combining and Merging
Hello friends, welcome to the course Foundations of R Software. In this lecture we once again consider data frames, and we are going to learn how to combine different data frames into a single data frame. For example, you may get data frames from different sources and need to merge them together.
This combining or merging can be done in different ways. The first option is that data frame 1 and data frame 2 are merged horizontally, side by side. The second option is that data frame 1 and data frame 2 are merged vertically, one over the other.
The third option applies when the two data frames have some variable in common. When you combine them, you would not like that variable to appear twice, once from each data frame.
You would like it to appear only once, and if there is more than one such variable, you would like to control with respect to which variable the data sets are combined. These are the types of questions that arise when you combine different data frames, and we are going to understand them in today's lecture.
After this, I will stop with the topic of data frames. As I said towards the end of the last lecture, there are many, many operations on data frames, and it is not practically possible for me to cover all of them here. I have chosen some common operations to give you an idea of how spreadsheets can be handled in R; from now on, success in using data frames and their related commands depends completely on you.
(Refer Slide Time: 02:31)
So, let us begin our lecture and take some examples to explain how you can combine data. When you want to combine data frames, there are three possible ways, and for them we have three different commands. The first is to combine the two data frames with respect to their columns, so that they sit side by side, like this. The second applies when both data frames have a variable in common; then you would like to join them like this.
See, whatever is the common variable here that comes over here, right. So, the common
variable does not occur in two data frame, but it appears only once. And third option is
that you can simply stack them together over each other, right and such that one is
appending other. So, it is like this one, right. So, now, how to get it done and what are
the commands that is what we have to understand.
When you want to combine two data frames side by side horizontally, the command is
cbind, c b i n d, all in lower case, and within the parentheses we give the names of the
data frames along with some other options. Similarly, when you want to combine two data
frames using a common column, the command is merge, m e r g e, all in lower case, and
within the parentheses you write the names of the data frames followed by other options.
Similarly, when you want to combine the data frames so that they are stacked over each
other in the vertical direction, the command is rbind, r b i n d, all in lower case.
So, now we take some examples, and through those examples I will explain the application
of these three commands. First I take the command cbind, which merges two data frames
horizontally, side by side. I am taking an example of two data frames, but you can
actually combine more than two.
We simply create two data frames so that we can understand what is really happening. I
create a data frame whose name is df1, where df stands for data frame and 1 is its number.
In it I take two variables, or two columns: one is state and the other is population size,
popnsize, p o p n s i z e.
So, this data set lists the states and their population sizes. The states are "UP", "MP",
"AP" and "JK", that is, Uttar Pradesh, Madhya Pradesh, Andhra Pradesh and Jammu and
Kashmir, and their corresponding populations are 1000, 2000, 3000 and 4000 respectively.
Similarly, I create one more data frame using three variables, one of which is common
with df1. This is df2, in which the states are again "UP", "MP", "AP", "JK". So, it has three
columns: the first is state, the second is sample size, and the third is whether the survey
was completed.
You can imagine that these data sets are related to a survey in these four states, where
df1 specifies the population size and df2 the sample size. The states are "UP", "MP", "AP",
"JK", the sample sizes from the states are 100, 200, 300, 400, and the status is a string,
"YES" or "NO": "YES", "NO", "YES", "NO". If you look here, there is a common variable,
state.
So, df1 and df2 will look like this, and you can see that the column state is common to
both.
Now we apply the different operations. First I take cbind: you simply write cbind(df1,
df2). You can see that whatever was in df1 comes here like this, followed by df2. So, this
part is your df1 and this part is your df2, and you can have a look here.
Both data frames are combined horizontally, and the cbind operation does not take into
account that the column state is present in both data frames. It does not check whether
state should appear once or twice; it simply copies and pastes the columns side by side,
so state appears twice.
If you look at the screenshot of this outcome, it looks like this: this is your df1, this is
your df2, and in the result this part comes from df1 and this part from df2. So, both data
frames are joined horizontally. Now let us do these operations in the R software so that I
can show you. I create both data frames; you can see that df1 looks like this and df2
looks like this. When I combine them using the function cbind, writing cbind(df1, df2),
the result comes out like this, which is the same operation I showed you in the
screenshot.
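The side-by-side combination can be reproduced with the following minimal R sketch. The values are the ones used in the lecture; the column names popnsize, samplesize and survey are assumed spellings of the names on the slides. Because cbind() does not look at column names, the column state simply appears twice in the result.

```r
# df1: states and their population sizes (values as in the lecture;
# the column names are assumed spellings)
df1 <- data.frame(state = c("UP", "MP", "AP", "JK"),
                  popnsize = c(1000, 2000, 3000, 4000))

# df2: the same states with survey sample sizes and completion status
df2 <- data.frame(state = c("UP", "MP", "AP", "JK"),
                  samplesize = c(100, 200, 300, 400),
                  survey = c("YES", "NO", "YES", "NO"))

# cbind() pastes the columns side by side without checking for common columns
combined <- cbind(df1, df2)
print(combined)
names(combined)  # "state" "popnsize" "state" "samplesize" "survey"
```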
(Refer Slide Time: 09:38)
Now I take the second command, merge. The advantage of merge is that it also joins data
frames horizontally, just like cbind, but it takes care of the common columns or row
names. So, merge combines the two data sets horizontally using the common column or
row names. For example, the column name state is common to df1 and df2, the same data
frames which we have just created.
Now let us see how you can merge them. First you write merge, m e r g e, which is the
command for merging two data sets in the R software. Inside the parentheses you write the
two data frames separated by a comma, and after that you specify which column of the data
frames has to be used for merging.
This is done with the options by, by.x and by.y: by gives the name of the common column,
while by.x and by.y can be used when the key column has a different name in each data
frame. After that, you can give the option sort, that is, whether the result should be sorted
on the by columns or not; it takes the logical value TRUE or FALSE. Similarly, there is
another option, no.dups, n o dot d u p s. This is also a logical value, indicating that
suffixes are appended in more cases to avoid duplicated column names in the result, so it
takes care of duplication. Anyway, I will try to keep this presentation and lecture simple.
So, I will not use many options; I leave it to you to first understand how the merging is
done and then to experiment with the other options. Here is df1, which we have just
created, and df2, in which state is the common column name.
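The by.x and by.y options matter only when the common column is named differently in the two data frames. A minimal sketch, assuming a hypothetical data frame df3 whose key column is called region instead of state (df1 uses the same assumed column names as before):

```r
df1 <- data.frame(state = c("UP", "MP", "AP", "JK"),
                  popnsize = c(1000, 2000, 3000, 4000))

# Hypothetical data frame: same states, but the key column is named "region"
df3 <- data.frame(region = c("UP", "MP", "AP", "JK"),
                  samplesize = c(100, 200, 300, 400))

# Match df1$state against df3$region; the key column keeps the name from df1
res <- merge(df1, df3, by.x = "state", by.y = "region")
print(res)
```

The result has the key column first, followed by the remaining columns of df1 and then those of df3.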
Now I use the command merge, m e r g e, and write df1, df2, and after that I use the option
by to specify the name of the column with respect to which we want to merge. I give the
name "state", s t a t e, within double quotes, exactly as it is written in the data frames.
Now you will see what happens.
Now, state comes only once, then the population size from df1, and after that the sample
size and the outcome of the survey, whether it was completed or not, from df2. This is
how the merging has been done, whereas in the earlier case with cbind, the column state
was repeated from both data frames.
So, this is the advantage. If you look at it on the console, this is your df1, this is your
df2, and this is the outcome of the merge command, where state appears only once,
followed by the columns of df1 and df2 which are not common. Let us have a look at this
operation in the R software as well.
Let me copy this command. You already have df1 and df2. Now when I merge them, you can
see that state appears because it was common, and after it come the population size from
df1 and the sample size and survey completed from df2. You can see that it is not a very
difficult operation; the only thing is that you have to understand how it works.
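The merge shown above can be written as the following sketch, with the same assumed column names as before. Note that merge() sorts the result on the by column by default (sort = TRUE), so the rows come out in alphabetical order of state rather than in the original order.

```r
df1 <- data.frame(state = c("UP", "MP", "AP", "JK"),
                  popnsize = c(1000, 2000, 3000, 4000))
df2 <- data.frame(state = c("UP", "MP", "AP", "JK"),
                  samplesize = c(100, 200, 300, 400),
                  survey = c("YES", "NO", "YES", "NO"))

# Merge on the common column; "state" appears only once in the result
merged <- merge(df1, df2, by = "state")
print(merged)
# With the default sort = TRUE the rows are ordered AP, JK, MP, UP
```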
Now I take the last option, rbind. rbind is used to join data frames vertically; that
means it stacks the two data frames on top of each other, appending one to the other, just
like this. Here we consider two new data frames, different from the earlier ones; I have
created them artificially so that I can explain this concept easily.
I create the first data frame using two variables: the states "UP", "MP", "AP" and "JK",
with population sizes 1000, 2000, 3000, 4000, and I call this data frame df11. After that I
create one more data frame, df22, with the states "Bihar", "Delhi" and "Punjab" and
population sizes 100, 200, 300.
The difference from the earlier case is that earlier, data frame 1 had fewer columns than
data frame 2, but now I want both to have the same columns, that is, the same number of
columns with the same column names, so that they can be joined vertically without any
error. That is why I have created these two data frames separately.
(Refer Slide Time: 15:33)
These two data frames look like this, and if you use the command rbind, you can see that
they are joined together.
It is like this: this is your df11 and this is your df22, and in the result this part comes
from df11 and this part from df22. So, both data frames are joined together, and this is a
new data frame on which you can perform different types of operations.
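A minimal sketch of this vertical stacking, using the lecture's values (column names assumed as before). rbind() matches data frame columns by name, so the sketch also shows, with a hypothetical data frame df33 whose column is named region, that stacking fails when the names differ.

```r
df11 <- data.frame(state = c("UP", "MP", "AP", "JK"),
                   popnsize = c(1000, 2000, 3000, 4000))
df22 <- data.frame(state = c("Bihar", "Delhi", "Punjab"),
                   popnsize = c(100, 200, 300))

# Stack df22 underneath df11: 4 + 3 = 7 rows
stacked <- rbind(df11, df22)
print(stacked)

# Hypothetical data frame with a differently named first column
df33 <- data.frame(region = "Goa", popnsize = 50)
msg <- tryCatch(rbind(df11, df33), error = function(e) conditionMessage(e))
print(msg)  # an error message, because the column names do not match
```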
(Refer Slide Time: 16:37)
Let me show you this example on the R console as well, so that you can understand what
is really happening. This is your df11 and this is your df22, and I combine them using the
command rbind. You can see that the first four rows come from df11, and the three rows of
df22 come at the end.
So, they are stacked together. If you change the order, you will see what happens: now I
write df22 first and df11 second. You can see that the order has been reversed: earlier
Bihar, Delhi and Punjab were coming at the bottom, and now they come at the top. That is
pretty straightforward.
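A short sketch of the reversed order (same assumed column names): rbind() stacks its arguments in the order they are given.

```r
df11 <- data.frame(state = c("UP", "MP", "AP", "JK"),
                   popnsize = c(1000, 2000, 3000, 4000))
df22 <- data.frame(state = c("Bihar", "Delhi", "Punjab"),
                   popnsize = c(100, 200, 300))

# Reversing the arguments puts the rows of df22 on top
reversed <- rbind(df22, df11)
print(reversed)
```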
So, we come to the end of this lecture. As you can see, it was a pretty small lecture, and
we have learnt a very simple operation: how to merge two different data frames together.
Now it is your turn: take more than two data frames, make some columns common, and see
how you can handle them. Similarly, look into the help, where various options are given
which can be used to handle more complicated situations.
My objective was very simple: I wanted to give you an idea that merging is possible in the
R software when you are dealing with data frames, and that you can do it very easily.
So, with this lecture I will also stop with the topic of data frames. As I said in the
beginning, many things are left, but I leave it up to you how much you want to learn;
depending on your needs, you can choose which commands are going to be useful for you
and practice them. I will see you in the next lecture; till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 45
Data Handling - Importing and Reading CSV and Tabular Data Files
Hello friends, welcome to the course Foundations of R Software. From this lecture we are
going to begin a new topic. You know that whenever you handle data sets in the R
software, the data is usually prepared by someone else using different software.
Somebody prepares the data set in Microsoft Excel, somebody stores the data in a tabular
format as a .txt file.
Somebody brings the data from other software such as SPSS, MATLAB, etc. So, the
question is: when you want to handle such data in the R software, you cannot handle it
directly, because a data file created in software such as Microsoft Excel has an internal
format which R may not be able to read directly. So, we need to read the data file and
import the data set into the R software.
After that, whatever data handling rules and commands we have learned can be applied in
the R software. So, in this lecture and in the next lecture, I will consider a couple of
popular file formats which we would like to read in the R software.
Once you can read them, there is no issue: you have learnt many data handling commands
and you can perform the suitable operations. So, today I am going to talk about two types
of files: one is the CSV file, that is, the comma-separated values file, and the other is the
tabular data file. What are CSV files and tabular data files? CSV files are those files in
which the data values are separated by commas.
The data is written on different rows, or different lines. Similarly, in a tabular data file,
the data values are separated either by blank space or by some other character, and every
line, or row, contains one complete record. For example, take a very simple example of
students and their marks: suppose I have three subjects, physics, chemistry and
mathematics, and 5 students, and I want to write down their marks.
Then each row will contain one student, student number 1, 2, 3, 4, 5, followed by the
marks in physics, chemistry and maths: for student number 1, then for student number 2
once again the marks in physics, chemistry and maths, and so on. So, every row contains
the same number of fields; in this case, besides the student, there are three fields,
physics, chemistry and maths.
So, every student has marks in those three fields; this is your tabular data file. In this
lecture we will understand how we can bring data in the .csv format and the .txt format
into the R software. So, let us begin and take some examples so that I can explain how
these operations can be done.
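To make the idea of a tabular data file concrete, here is a minimal sketch: a small whitespace-separated table with a header line and one student per row, read with read.table(). The student labels and marks are made up for illustration, and the text = argument is used only so the example needs no file on disk.

```r
# Hypothetical tabular data: fields separated by blank space,
# a header line, and one complete record per row (marks are made up)
marks_txt <- "student physics chemistry maths
S1 60 70 80
S2 55 65 75
S3 90 85 95"

# sep = "" (the default) means any run of white space separates fields
marks <- read.table(text = marks_txt, header = TRUE)
print(marks)
str(marks)
```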
First of all, I have prepared some very simple examples and kept all those files in a folder
named R course on my C drive. So, my first job is to set the directory where the files
which I want to import into the R software are located as the current working directory.
We have already discussed that the command to set the working directory is setwd, and
inside it you give the location of the folder along with its path. Here I have written,
within double quotes, C colon backslash R course, or you can also write it like this.
Remember that inside an R string a single backslash is an escape character, so a Windows
path has to be written either with doubled backslashes or with forward slashes. If you want
to know what your current working directory is, we have already learned the command
getwd, that is, get working directory.
This tells you the current working directory. Now, my request is that you also create a
working directory wherever you want; I have created it on the C drive, and you can create
it on any drive, so that you can keep all the files and their results in the same folder. I
will be using this folder in this course.
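A minimal sketch of setting and checking the working directory. So that it runs on any machine, it uses a hypothetical folder inside tempdir() as a stand-in for the lecture's folder on the C drive, and restores the original directory at the end.

```r
old_wd <- getwd()                      # remember the current working directory

# Hypothetical stand-in for the lecture's "R course" folder on the C drive
course_dir <- file.path(tempdir(), "Rcourse")
dir.create(course_dir, showWarnings = FALSE)

setwd(course_dir)                      # make it the working directory
print(getwd())

# On Windows a path is written with forward slashes or doubled backslashes,
# e.g. setwd("C:/Rcourse") or setwd("C:\\Rcourse")  (folder name assumed)

setwd(old_wd)                          # restore the original directory
```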
Now, suppose we have some data on our computer and we want to import it into the R
software. As we have discussed, data can be saved in different file formats, and those
formats can be used in the R software: for example, comma-separated values files,
popularly known as CSV files; tabular data files, such as table files or .txt files; and
spreadsheets, for example files created in MS Excel with extensions like .xlsx or .xls.
Similarly, you have HTML files, where the data is given in HTML format, and there can be
files from other software such as SPSS, Minitab, etc.; these are statistical software
packages. Such software also has a storage facility, and you can read and store data there,
but in its own format.
Before going forward, I want to show you that when you are reading data, the data does
not have to be on your hard disk or your computer; it can also be read from an internet
site. For example, I have uploaded a file to my home page. The address of my home page
is home.iitk.ac.in/~shalab, and there I have created a directory, also R course, in which
there is a file munichdata.asc.
This file contains data on house rents in the city of Munich in Germany. But my objective
is only to show you how data that somebody has uploaded to a website can be read directly
into the R software. After that, you can bring the file into your R session and perform the
manipulations you need.
So, how do we get it done? I have not yet explained how to read such a data file, but I will
tell you here and explain it in this lecture: to read tabular data we have the command
read.table, and inside it you give the file name along with its address. Here the file name
is given with this address, and after that I give one option, header = TRUE.
What is the meaning of read.table, header, etc.? That we are going to discuss very soon.
But you can see that this command can be used to read this data file directly into the R
software.
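The call has the form read.table(file, header = TRUE), where file may be a local path or a complete URL. Since the full address is not reproduced here, the following sketch writes a small whitespace-separated file with made-up values to a temporary location and reads it the same way.

```r
# Write a small whitespace-separated file with a header line (made-up values)
f <- file.path(tempdir(), "sample.txt")
writeLines(c("a b c",
             "1 10 100",
             "2 20 200"), f)

# header = TRUE: the first line supplies the column names
dat <- read.table(f, header = TRUE)
print(dat)

# The same call accepts a URL in place of f:
# read.table("<URL of the file>", header = TRUE)
```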
First, I will show you this location on the internet site. I type this address, and you can
see the data here; it is a long file. I can increase the font size like this, and if you want
to see the entire file, it is here like this. You can also look at this data there. My
objective was not to handle the data, but to show you that the data is on my home page.
Now I bring this data file into my R software and give it the name data munich. Then I
come to the R software and copy and paste this command, and you can see that the data,
data munich, is here.
I am not concerned with this data itself; I just wanted to show you that the data is on an
internet site, but I can bring it directly into my R console. Now the data is available as
data munich, and if you want to do anything with it, you can just do it. So, this is the
data; I clear the screen.
So, I have told you that in the R software it is possible to read data from your local
directory on your computer as well as from an internet site. This has a very big
advantage: if several people are working at different locations, the data can be shared
among them, and they can access it remotely from their own places.
After this, I take one example. Before going further, let me show you this folder, R
course; you can see the name here. In it I have two files, example 1 and example 3, which
I will use in today's lecture. This is my example 1, and if you open it, it looks like this.
There are only 5 rows and 3 columns. Just for the sake of easy understanding, I have taken
the values 1, 2, 3, 4, 5 in the first column, 10, 20, 30, 40, 50 in the second column and
100, 200, 300, 400, 500 in the third column.
So, the number of digits indicates the column: the first column has 1-digit numbers, the
second 2-digit numbers and the third 3-digit numbers. Similarly, the rows are 1, 10, 100;
then 2, 20, 200; 3, 30, 300; 4, 40, 400; and 5, 50, 500. So, the leading digit indicates the
row number; for example, 2 is in the second row and first column, and 20 is in the second
row and second column. This is just for easy understanding, so that you can compare what
you are doing with what you are getting.
I want to read this file, and this file is actually a comma-separated values file; that means
the file has been created such that the values are separated by commas. So, how can you
read CSV files, that is, comma-separated values files? For that we have the command
read.csv, and within the parentheses we give the name of the file as filename.csv,
followed by a couple of options which are used depending on the nature of the file.
This file is located in my directory, and I have already set my working directory; I will
do the same on my R console as well. After that, I read this file using the command
read.csv, giving the name example1.csv within the parentheses, and I store the result in an
object named data, d a t a.
Now, one question comes here: when there is already a file name, example1.csv, why are
you giving a new name, data? You see, example1.csv is a comma-separated values file with
its own internal structure, which can be opened in suitable software. But I want to bring
the data values and related information inside the R software.
When I import these values, they are converted into a structure that the R software can
work with, namely a data frame, and this object is given the name data. So, once I have
read the file, I will not be bothered about the file example1.csv any more; I will work only
with the object data.
Just for your review, the example1.csv file which I showed you looks like this. When it is
opened in the spreadsheet, the columns are labelled A, B, C and the rows 1, 2, 3, 4, 5;
these labels are shown by the spreadsheet program and are not part of the data.
Now we read this file in the R software. First I will show you through a screenshot what is
going to happen, and then I will do it inside the R software. Once you read this file and
name it data, it looks like this. This is the screenshot, and this is the example1.csv file.
Now, can you compare them and see what is really happening? Can you see some changes
in the structure? What is this X1 X10 X100? You can see it in the screenshot too, but there
was no X1 X10 X100 in the original file. What is happening here? Second, there is another
difference: in the original file your data values start from the row 1, 10, 100, but here the
values start from the row 2, 20, 200 and continue with rows 3, 4 and 5.
So, the first question is how X1 X10 X100 entered automatically when you have not given
any such command. The second question is why the data begins from 2 instead of 1.
Before I move forward, let me draw your attention to this: what is 1, what is 10, and what
is 100 here? Don't you think that when this file is imported into the R software, the first
row 1, 10, 100 is turned into X1 X10 X100? And this happened automatically.
So, now we try to understand why this is happening, even though you have not done it. If
you look more closely, in the original data the columns are labelled A, B, C and the data
starts from 1, 10, 100 in the first row, but when you read it inside the R software it
becomes like this. Let us first check whether this really happens in the R software. First I
need to set my working directory.
Now you can see that my working directory is C colon backslash R course; that is not a
very big deal for you to do either. Then I read example 1 and store it as data. If you look
at data, it gives you X1 X10 X100 as the column names, and the data values start from 2,
20, 200 rather than 1, 10, 100.
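The behaviour just described can be reproduced with a small sketch. It recreates a file like example1.csv (the same values, no header line) in a temporary directory instead of the lecture's folder, and reads it with read.csv() using its defaults.

```r
# Recreate example1.csv (no header line) in a temporary directory
f <- file.path(tempdir(), "example1.csv")
writeLines(c("1,10,100", "2,20,200", "3,30,300", "4,40,400", "5,50,500"), f)

# The default read.csv() assumes header = TRUE, so the first row becomes the names
data <- read.csv(f)
print(data)
print(names(data))  # "X1" "X10" "X100"
print(nrow(data))   # 4: the row 1, 10, 100 is no longer part of the data
```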
Now let us find the answers one by one. When you are handling data files, there is
something called a header. A header is like this: I write down the data values, and above
them I write the subjects, say physics, maths and chemistry. Then I write the values below
them: 50, 60, 90 in one row and 70, 80, 20 in the next.
When you give the data values, these two rows are your data values. The first line is only a
header: it contains only the names, not the data, and it has to be read only as names.
What is happening in our example is that R reads the first row, 1, 10, 100, as the header,
and it does so automatically.
Then it gives these headers names like X1, X10 and X100. Essentially, it puts the letter X
in front of 1, 10 and 100, because a name that starts with a digit is not a syntactically valid
name in R. So, it is as if physics became X1, maths became X10 and chemistry became
X100. You have not given this header, but you also have not informed the R software that
there is no header. The default in read.csv is that the data file has a header, and that is why
the first row is treated as the header.
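The X prefix comes from the check.names = TRUE default of read.csv(), which passes the header through make.names(). A short sketch (again recreating the example file in a temporary directory) shows both the name conversion and how check.names = FALSE keeps the header as written.

```r
# make.names() turns invalid names (starting with a digit) into valid ones
print(make.names(c("1", "10", "100")))   # "X1" "X10" "X100"

# With check.names = FALSE, read.csv() keeps the header exactly as written
f <- file.path(tempdir(), "example1.csv")
writeLines(c("1,10,100", "2,20,200", "3,30,300", "4,40,400", "5,50,500"), f)
raw_names <- names(read.csv(f, check.names = FALSE))
print(raw_names)                          # "1" "10" "100"
```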
Now, in case there is no header, we need an option which informs the R software that
there is no header in this file. This also gives you control: if you are reading a file
without a header, you should inform the R software that there is no header, and if there is
a header, you must inform the R software that there is a header.
So, the presence or absence of a header in the file has to be notified to the R software.
For that, we add the option header, h e a d e r, all in lower case, which takes a logical
value, TRUE or FALSE. As soon as you say header = FALSE, it means there is no header,
so the data will be read from the first row.
If you say header = TRUE, it means there is a header, so the first row will not be read as
data. Gradually we can add many more options as required, but I will try to keep this
explanation simple, so I will consider only a couple of important options in the whole
lecture.
So, the new command is read.csv with the file name inside double quotes, followed by the
header option, where you specify whether a header is present or absent using the logical
value TRUE or FALSE.
Now, in the data which you have taken here, have you specified a header? The answer is
no. That is why I would like to add header = FALSE. So, my command becomes read.csv,
and within double quotes I give the same file name, example1.csv, but now I add header =
FALSE.
Now you get this outcome. Whatever was your data is now intact, and R has given the
columns names: V1, V2, V3. Actually, R has created the header itself and named the
columns automatically as V1, V2 and V3. So, V1, V2, V3 are the names of the columns,
which have been given by the R software automatically.
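A minimal sketch of the same read with header = FALSE, again using a recreated copy of the example file in a temporary directory: all five rows are kept and the columns get the automatic names V1, V2, V3.

```r
f <- file.path(tempdir(), "example1.csv")
writeLines(c("1,10,100", "2,20,200", "3,30,300", "4,40,400", "5,50,500"), f)

# header = FALSE: the first row is data, and R names the columns itself
data <- read.csv(f, header = FALSE)
print(data)
print(names(data))  # "V1" "V2" "V3"
```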
Since R gives these names automatically, you may like to give names which make sense or
which are needed according to your data.
In this screenshot, this was your data, and this is the data read into R, which now
matches. The data is read correctly, and now there is a header as well.
Now, suppose you want to change the names of the columns. You have already learned the
command names, n a m e s: inside the parentheses you give the name of the data frame,
which is data, d a t a, and then you assign the names that you want to give the columns;
suppose I want to give Column1, Column2 and Column3.
So, I try to create a data vector of these three values and all these names are going to be
written inside the double quotes because they are the characters. And, if you try to now
use this command here names and inside the parenthesis data, then the names of these
columns are going to be changed and earlier you had the names here like as V 1, V 2, V
3. Now, they will become here Column 1, Column 2, Column 3.
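A small sketch of renaming the columns with names(). It assumes the names are written without spaces ("Column1" and so on), because a name containing a space would later need quotes or backticks after the dollar sign. The data frame is built directly to stand in for the result of read.csv(..., header = FALSE).

```r
# Stand-in for the data read with header = FALSE (columns V1, V2, V3)
data <- data.frame(V1 = 1:5, V2 = seq(10, 50, by = 10), V3 = seq(100, 500, by = 100))
names(data)

# Replace the automatic names with a character vector of new names
names(data) <- c("Column1", "Column2", "Column3")
names(data)
data
```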
And you can see that the first column consists of the values 1 2 3 4 5, the second column of the values 10 20 30 40 50 and the third column of the values 100 200 300 400 500. The rows are correspondingly the same, so exactly the same data has been obtained.
And this is the screenshot of the same operation: first the names are V1 V2 V3, and then the changed names are Column 1 Column 2 Column 3. So, now, let me show you this operation on the R console also.
(Refer Slide Time: 21:31)
So, now you can see that this was your data from example 1, read without any option. If you now give header = FALSE, that means there is no header, and the data look like this.
Suppose you give header = TRUE; then you can see what is really happening. This is the default, and it is what was happening when you did not use the option header: R was automatically taking header = TRUE. That is why the column names were X1, X10, X100, and your data values were starting from 2, 20, 200.
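A short sketch of the default behaviour described above, again with a temporary stand-in for example1.csv. With the default header = TRUE, the first line becomes the column names. Because names such as 1 or 10 are not valid R names, read.csv() (with its default check.names = TRUE) prefixes them with X.

```r
csv_file <- tempfile(fileext = ".csv")
writeLines(c("1,10,100", "2,20,200", "3,30,300", "4,40,400", "5,50,500"), csv_file)

# No header option given: read.csv() assumes header = TRUE
data_default <- read.csv(csv_file)
names(data_default)   # the first line has been used as names: X1, X10, X100
nrow(data_default)    # only 4 data rows remain
data_default[1, ]     # the data now start at 2, 20, 200
```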
But anyway, let me set it back to FALSE, so that the data look like this, with the names V1 V2 V3. Now I would like to change the names, so I use names() on the data object like this. Once I execute it, you can see that the column names become Column 1 Column 2 Column 3. So, this is how you can read these variables.
And now, if you write the following, what do you get? Data dollar Column 1 gives you 1 2 3 4 5; data dollar Column 2 gives you 10 20 30 40 50; and data dollar Column 3 gives you 100 200 300 400 500. So, what are you getting? You have already learned that if you want to read a particular column of a data set, the rule is: the name of the data object, here data, then the dollar sign, and then the name of the variable. And you can see that the names of your variables in the data are Column 1, Column 2 and Column 3.
Now you can even access the individual values, and after that you can do whatever operations you want. For example, if you want to find the mean of the values in column number 3, you can simply write it like this. So, look at what you have done: the data were in CSV format, in an external file outside the R software. Now you have brought that data object into the R software, and the file can be read within R.
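A minimal sketch of accessing columns with the dollar operator and computing a mean, assuming the renamed columns are called Column1, Column2 and Column3 (no spaces). The data frame is created directly as a stand-in for the file read above.

```r
data <- data.frame(Column1 = 1:5,
                   Column2 = seq(10, 50, by = 10),
                   Column3 = seq(100, 500, by = 100))

data$Column1                     # the whole first column
data$Column3[2]                  # a single value from the third column
col3_mean <- mean(data$Column3)  # mean of the third column
col3_mean
```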
All the values can now be accessed in R, and after that you can do all sorts of data manipulation with the commands that you have learned and handle the data very easily. That was my objective in this lecture.
Now, in this first case I have explained everything in quite some detail. After this you will see that similar operations are used when I consider different types of files. In the comma-separated value file there is one more option: instead of the comma you can also use another separator, and this is controlled by the option sep.
For example, you have also heard about tab-delimited files, in which the data are separated by tabs. In that case you give the option sep, s e p, that is, the separator, equal to backslash t within double quotes. I have taken here only the CSV file, but if the values in the file are separated by tabs, you use the same command read.csv and give the name of the file inside double quotes; do not forget to write the complete name, including the extension.
For example, you write the data file name with .csv and then use the option sep; if this is a tab-delimited file, you give "\t". Similarly, there are many other options. For example, if the data values are separated by a blank space, you simply use sep equal to a blank space within double quotes. Along with this, you can also use the option header = TRUE or FALSE, and gradually you can increase the complexity.
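A sketch of the sep option with two temporary files: one tab-delimited and one separated by single blank spaces. Both are read with read.csv() as described above, so only the separator changes. The file contents are illustrative assumptions.

```r
# Tab-delimited file: "\t" inside an R string is a tab character
tab_file <- tempfile(fileext = ".csv")
writeLines(c("1\t10\t100", "2\t20\t200", "3\t30\t300"), tab_file)
tab_data <- read.csv(tab_file, sep = "\t", header = FALSE)

# File whose values are separated by a single blank space
space_file <- tempfile(fileext = ".csv")
writeLines(c("1 10 100", "2 20 200", "3 30 300"), space_file)
space_data <- read.csv(space_file, sep = " ", header = FALSE)

tab_data
space_data   # same values, only the separator in the file was different
```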
So, now, after the CSV file, let us understand how we can read a tabular data file and import it into the R software. Tabular data files are simply text files. For example, they can be created and opened in Notepad, and they have a very simple format: each line contains one record, and within each record the fields are separated by a one-character delimiter such as a space, tab, colon or comma.
For example, if I have to write down the marks of my students, I can write 50, 60, 70 in the first row, then 30, 10 and 20 in the second, and then 15, 25, 35 in the third. These labels would not be in the file, but the columns are physics, chemistry and maths, and the rows are student 1, student 2 and student 3. So, every row is one record, and each record contains the same number of fields.
Every student has values in three different fields, that is, physics, chemistry and maths. Now we want to read such a text file that contains a table of data. The command for this is read.table, r e a d dot t a b l e. Whatever you have learned for read.csv is valid here also, so you simply write read.table and, inside the parentheses, the file name within double quotes.
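A sketch of read.table() on the marks example above, written to a temporary text file. The column names (physics, chemistry, maths) and row names (student1 to student3) are supplied with the col.names and row.names arguments because, as in the lecture, the labels are not part of the file.

```r
marks_file <- tempfile(fileext = ".txt")
writeLines(c("50 60 70", "30 10 20", "15 25 35"), marks_file)

# read.table() splits on white space by default; the labels come from the arguments
marks <- read.table(marks_file,
                    col.names = c("physics", "chemistry", "maths"),
                    row.names = c("student1", "student2", "student3"))
marks
marks$maths
```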
So, for example, I have created a small file in the same folder, whose name is example 3. You may ask why I have not taken example 2. Actually, example 2 is something else which I will show you later; at the time of preparing the presentation I thought it was better to explain this first.
So, this is example 3, and if you open it, this is the file. It is opened in Notepad, and you can see the values. To keep the illustration simple, I have once again taken the same values: the first column is 1, 2, 3, 4, 5; the second column 10, 20, 30, 40, 50; and the third column 100, 200, 300, 400, 500, just as in example 1. That is just to make it easy to understand. And you can see that the values are separated by a single blank space.
They could also be separated by a comma or by some other character. I have given you the screenshot also, so that you can understand it easily; this is the same file which I just showed you, with the values separated by blank spaces. Now I read this data from example3.txt using the command read.table, with example3.txt inside double quotes; you have to give the complete name. Because these values are separated by a blank space, I write sep equal to a blank space within double quotes. Once you read it, you get this type of outcome. Now you will see what is really happening: the names V1 V2 V3 at the top are not in the file. Since read.table assumes by default that there is no header, R has supplied these column names itself.
And you can see the screenshot here. After that, you can change the column names as well.
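A sketch of reading a space-separated text file like example3.txt, using a temporary stand-in. Unlike read.csv(), read.table() uses header = FALSE by default, which is why the names V1, V2, V3 appear without any option.

```r
txt_file <- tempfile(fileext = ".txt")
writeLines(c("1 10 100", "2 20 200", "3 30 300", "4 40 400", "5 50 500"), txt_file)

# sep = " " as in the lecture; header is not given, so it defaults to FALSE
data <- read.table(txt_file, sep = " ")
data
names(data)
mean(data$V2)
```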
So, let me show you these operations in the R software also. You can see the data look like this. If you want to change the names, that also you can do, and they become Column 1 Column 2 Column 3, exactly in the same way as you did with the earlier .csv file. Similarly, if you now write data dollar Column 1,
you get the values 1 2 3 4 5. Similarly, if you take Column 2, you get the values in the second column, and data dollar Column 3 gives the values in the third column. And if you want to find the mean of the values in Column 2, you simply write mean and, inside the parentheses, data dollar Column 2, and this comes out to be 30.
So, you can see that the way you handle this file has a similar structure, the commands are the same, most of them you have already used in the earlier lectures, and the rules are the same. Now it is up to you how you want to use them. So, we come to the end of this lecture. It was a pretty interesting lecture, at least to me: you have learned how to read two different structures of files and how to bring them inside the R software.
And the best part is that you have now learned many things, so once I tell you how to read a file, you can handle all the commands very easily. As we are moving towards the end of this course, your responsibility is increasing day by day. Now it is up to you how much you want to think, how much you want to execute and how much you want to experiment with the R software.
So, now this is your turn. You know how to read a CSV file and a txt file, so why do not you take some file and try to read it? And do not stop there: try to do the operations which you were doing on CSV or txt files before learning the R software. Remember that an Excel file can be saved as a .txt or .csv file, so you can get a CSV or txt file directly from your Excel file also.
Well, in the next lecture, I will show you how to read the Excel file also. After that, there is no problem: you can read any file and handle any data set in the R software. So, you practice it, and I will see you in the next lecture; till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Data Handling
Lecture - 46
Data Handling - Importing and reading EXCEL and other data files
Hello friends, welcome to the course Foundations of R Software. You can recall that in the last lecture we studied how you can import different types of data files into the R software and how you can do different types of operations on them. We learned how to import CSV and tabular data files into R, and for that you learned a couple of options also, so that R can read the files correctly and bring them inside the R software.
So, in today's lecture we are going to learn one more type of file which we want to import into R, and this file is the spreadsheet created in the Microsoft Excel software. You know that Microsoft Excel is a software in which spreadsheets can be created, and the files can be stored in TXT, CSV and various other formats.
Our very modest objective in this lecture is to learn how to read those Excel files inside the R software. Just for your information, Microsoft Excel offers two options to save a file, .XLS and .XLSX, and earlier there used to be different packages for reading these files; with the help of those packages we used to read the files in the R software.
But now, with the latest developments, there is only one package which is used to read both types of files created in Excel, and yes, these things may change. The TXT and CSV file formats, in contrast, are handled by facilities built into the R software.
So, if the R software gets updated, those commands will still be there. But for formats that depend on an add-on package, you have to be very careful when you read such files from external sources, and you always have to cross-check whether the package which you used earlier is still continuing or not. So, now, with this objective, we begin our lecture and try to understand how we can import a data file which is created in the MS Excel software.
And you have to keep in mind that we are not going to do something new today. We have already learned the different options which can be used in reading such files, and similar options will be used here. I will keep the discussion at an elementary level so that you can understand.
Beyond that, there are many other options available for reading Excel files. I would request you to look into the help menu and try to understand those options. So, let us begin our lecture.
As we discussed in the last lecture, we had set up our working directory as the folder RCourse on the C: drive, and we will continue with the same in this lecture. You also need to first set your working directory where you are going to store your file, because, as in the last lecture, today also I have created a very small Excel file from which I will read the data.
Before we move forward, let me inform you that in order to read a spreadsheet created in Excel we need to install a special package, readxl, all in lower case letters. After that we use the command read_excel.
Now, you have to be careful with my pronunciation, because when I say "excel" it is difficult to tell whether I mean xl or e x c e l. So, just focus on the slide where I am marking with my pen. I repeat once again: we need the readxl package to read the spreadsheets created in Microsoft Excel, and for that we use the command read_excel, spelled e x c e l, and inside the parentheses we give the file name and some options.
You know that within the same Excel file you can have more than one sheet. Those sheets can be identified by a number, that is, an index, or they can also be given a name. When you use the command read_excel without any further option, the first sheet of the spreadsheet is read, and you have to give some option to read a particular sheet of an Excel file.
So, first I would request you to install this package with the command install.packages, i n s t a l l dot p a c k a g e s, writing readxl within double quotes, and after installing it, load it using the command library, l i b r a r y, with readxl inside the parentheses. I have already done this on my computer.
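A sketch of installing and loading readxl. Since the lecture's own Excel file is not available here, the example uses readxl_example("datasets.xlsx"), a workbook that ships with the readxl package, and lists its sheets with excel_sheets() to show that one file can hold several named sheets.

```r
# Install readxl only if it is not already available, then load it
if (!requireNamespace("readxl", quietly = TRUE)) {
  install.packages("readxl")
}
library(readxl)

# Example workbook bundled with readxl (stand-in for the lecture's file)
path <- readxl_example("datasets.xlsx")
excel_sheets(path)   # names of the sheets in this workbook
```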
Suppose there is a file whose name is datafile. In Excel there are two options: you can have a file datafile with the extension .xlsx and another file datafile with the extension .xls. In order to read both types of files, we have the same command, read_excel; you have to keep that in mind.
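A sketch showing that read_excel() reads both formats with the same call. It assumes readxl's bundled example files datasets.xlsx and datasets.xls, which hold the same data, in place of the hypothetical datafile.xlsx and datafile.xls. Without a sheet option, the first sheet is read.

```r
library(readxl)

# Same command for the newer .xlsx and the older .xls format
from_xlsx <- read_excel(readxl_example("datasets.xlsx"))
from_xls  <- read_excel(readxl_example("datasets.xls"))

dim(from_xlsx)
dim(from_xls)
```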
And in case you want to read a particular sheet, by its number or by its name, that can also be given. You use the same command read_excel and give the file name, and then the option sheet: if you are using the sheet number, you write sheet equal to that number, and if you are using the name of the sheet, you write sheet equal to that name within double quotes.
So, I will show you an example so that you can understand it more clearly.
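A sketch of the sheet option, using readxl's bundled datasets.xlsx as an assumed stand-in. In that workbook the second sheet is named "mtcars", so asking for sheet = 2 and sheet = "mtcars" should read the same data.

```r
library(readxl)
path <- readxl_example("datasets.xlsx")

by_number <- read_excel(path, sheet = 2)         # sheet given by its index
by_name   <- read_excel(path, sheet = "mtcars")  # sheet given by its name

dim(by_number)
dim(by_name)
```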
So, now you can see that I have created a file whose name is spexcel. It is similar to the files which I used for the CSV and text examples: it has 5 rows and 3 columns, and the values are 1 2 3 4 5 in the first column, 10 20 30 40 50 in the second column and 100 200 300 400 500 in the third column, arranged in rows and columns exactly as in those files.
And now I have done one more thing: I have given names to these columns, which are Variable 1, Variable 2 and Variable 3.
And just to show you, this file is located in my RCourse folder; you can see it here.
If you open it, it looks like this; I have just taken a screenshot to explain it. You can see that in my first row I have written Variable 1, Variable 2, Variable 3, and I want to indicate to the R software that this is going to be the header.
Do you remember that in the last lecture we considered the option of whether there is a header in the file or not? Then this is my first column, this is the second column and this is my third column. Anyway, let us come to our slide and understand it from there.
I explained this in the last lecture also, but I will tell you here once again: once you read the file, you use the standard commands to do the usual operations. For example, if you want to access a particular variable from this file, you use the same approach: write the name of the object in which you have stored the file, then the dollar operator, and then the variable name, whatever you have given in the file.
Now, in this file which I just showed you, there is only one sheet, and the name of the file is spexcel, with the extension .xlsx. So, I use the command read_excel, then the name of the file exactly in the same way as you did earlier, and then comma sheet = 1. I store the outcome in an object whose name is dataspexcel, indicating that whatever data are in the file spexcel are stored in dataspexcel.
Now you will see this type of outcome. It shows the header, Variable 1, Variable 2, Variable 3, and after that the values in the first, second and third columns, exactly in the same way as earlier. It also shows you that it is a table of order 5 by 3, meaning 5 rows and 3 columns.
Now, suppose you want to extract the data under Variable 1. You have to be a little watchful about how you do it. In your Excel file the variable name was simply Variable and then 1, with a space in between, so here you have to put the name within quotes. So, I write the object name dataspexcel, then dollar, and then this variable name within quotes, and you will see that this gives you 1 2 3 4 5.
Similarly, if you want the values in Variable 2, you write the data object name dataspexcel, dollar, and within quotes Variable 2. You will get the data stored under it, 10 20 30 40 50. And you can see that this is sheet number 1.
So, if the data are in sheet number 2 or 3, you have to use the appropriate number accordingly. After that, if you want to find the mean of the values stored in Variable 1 of dataspexcel, you simply write mean and, inside the parentheses, the variable name, and you can do similar operations.
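A sketch of accessing a column whose name contains a space. read_excel() keeps names such as "Variable 1" as they are, so after the dollar sign the name has to be quoted, either with double quotes as in the lecture or with backticks. Because spexcel.xlsx is not available here, a data frame with the same names and values stands in for dataspexcel.

```r
# Stand-in for dataspexcel <- read_excel("spexcel.xlsx", sheet = 1)
dataspexcel <- data.frame(`Variable 1` = 1:5,
                          `Variable 2` = seq(10, 50, by = 10),
                          `Variable 3` = seq(100, 500, by = 100),
                          check.names = FALSE)  # keep the spaces in the names

dataspexcel$"Variable 1"          # name within double quotes
dataspexcel$`Variable 2`          # backticks work as well
mean(dataspexcel$"Variable 1")
sum(dataspexcel$`Variable 2`)
```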
And you can see the screenshot of all these operations, but let me show you these things on the R console so that you become more confident.
So, the first thing is to set my working directory. You can see that it is now C:/RCourse, and after that I have to load the package. I have already installed it on my computer, but you need to install it.
So, the package is loaded, and then I read the file and store it in dataspexcel; you can see that as soon as I press Enter, the data are stored in dataspexcel.
If you type dataspexcel, you can see the variables like this. And if you really want to access any particular variable, my advice is once again the same: copy the variable name from here, or use a command like colnames (Refer Time: 13:01) on dataspexcel, and then you can see the values like this.
For example, if you want to see the values in column number 2, they are like this. And if you want to find the mean of the values in this data, you can simply do it like this. Similarly, if you want to find the sum of the values stored in Variable 2, you can find it out; this is 150.
So, you can see that it is not very difficult to get all these operations done just by extracting a particular variable.
(Refer Slide Time: 13:42)
Now I will show you one more thing. Suppose I create a sheet number 2 which contains some values, and I want to read the values from sheet number 2. All I have to do is use read_excel, give the file name, and then write sheet = 2.
In this case you will get the values like this: the first column is 6 7 8 9 10, the second column is 110, 120, 130, 140, 150, and the third column, Variable 6, is 110, 210, 310, 410, 510.
So, you know that it is not a very difficult job. This is the screenshot of sheet number 2, and you can see very clearly that sheet number 2 has these values. If you use the same operation, the data object name and then the variable name, such as Variable 4, it gives you the same observations.
Similarly, the second column, 110, 120 etc., can be accessed with the same command just by replacing Variable 4 with Variable 5, and similarly you can access the third variable by this command: the name of the data object and then Variable 6 inside the quotes.
And then, if you want to find the mean of these values, just use mean on them. These are very simple operations, and that is what I wanted to show you.
And you can see the screenshot of the same operation.
(Refer Slide Time: 15:29)
Now, after this, I will show you some more commands and then come back to the R software. These are pretty simple commands. In case you want to limit the number of data rows to be read, for example, you want to read at most three rows, you use the same command read_excel with the name of the file and the option n_max, for example n_max = 3.
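A sketch of n_max, again assuming readxl's bundled datasets.xlsx in place of spexcel.xlsx. The header row is used for the names, and at most three data rows are read.

```r
library(readxl)
path <- readxl_example("datasets.xlsx")

# Read at most 3 data rows from the "mtcars" sheet
first_three <- read_excel(path, sheet = "mtcars", n_max = 3)
first_three
nrow(first_three)
```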
Similarly, if you want to read only a particular range of the data file, you use the same command with a particular option. The command is read_excel, the name of the data file, and now the option range, r a n g e, all in lower case letters, and you give the cell address within double quotes.
This is the address of a cell. In MS Excel the rows are numbered 1, 2, 3 and the columns are labelled A, B, C, etc. So, the address C1 refers to the cell in column C and row number 1.
So, "C1:E7" reads the range from C1 to E7. The second option is the row-column notation, such as "R1C2:R2C5"; here R1 is the 1st row and C2 is the 2nd column, and similarly R2 is the 2nd row and C5 is the 5th column. For example, R2C3 means the cell in the 2nd row and the 3rd column. This is how the range is read.
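A sketch of the range option in both notations, assuming readxl's bundled datasets.xlsx (sheet "iris") as the workbook. "C1:E7" and "R1C3:R7C5" describe the same block of cells: rows 1 to 7 and columns C to E. With the default col_names = TRUE, row 1 becomes the header, leaving 6 data rows.

```r
library(readxl)
path <- readxl_example("datasets.xlsx")

a1_style <- read_excel(path, sheet = "iris", range = "C1:E7")      # Excel A1 notation
rc_style <- read_excel(path, sheet = "iris", range = "R1C3:R7C5")  # row-column notation

dim(a1_style)
names(a1_style)
```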
And you can do it very easily. For example, if I want to read only 3 rows, I use read_excel with the file name spexcel.xlsx and give the option n_max, all in lower case letters, equal to 3; then it reads only the first three rows of this data file.
And you can see here this is the screenshot of the same operation, right.
Similarly, if you want to read a particular range of the data set in the file, I give the file name and then range = "A2:B3" and sheet = 1. Or, if you want to give the range in sheet number 2, you write the range "A2:B3" within double quotes and sheet = 2, and so on, and you can see the values obtained here.
So, let me show you these operations on the R console first, and then I will move ahead. Let me read the data from sheet number 2.
And I will show you the data in sheet number 2 which I can read. You can see at the bottom, where I am moving my cursor, that this is sheet number 2, and sheet number 1 was like this.
So, now you can see this data is there. I simply use this command, and it stores the result in dataspexcel12; you can see the 12 here. So, the data are here. Here I had given a wrong name, so you have to be very careful: when you give a wrong name, it will not read it.
Now, if you want to read a particular variable, you can simply copy the variable name, write the object name, dollar, and then the variable name. You can see the values here, and similarly for the values of Variable 6. And if you want to find the mean of these values, you can very easily find it exactly in the same way.
After this, I come to the next execution, where I read only a limited number of rows. You can see I am using n_max = 3, and if you look at dataspexcel4, it is like this: it reads only the first three rows of the data.
Next, suppose you want to use the option range = "A2:B3". I simply modify my earlier command and write range equal to "C2:B3" within double quotes, and then sheet number 1.
You can see that this gives an error. Why? Because you are giving the range C2 to B3 in the opposite direction. But if you make it B2 to C3, it makes sense, and now it reads the data.
(Refer Slide Time: 21:41)
And if you read it from sheet number 2, you can see it gives you these values.
So, now let us come back to our slides and understand the other things. Once you have learned how to read Excel files, txt files and CSV files, there is no end. So, I will very quickly show you how you can read data files which are generated by other software.
For example, there is a package foreign, f o r e i g n, which you need if you want to read files generated in the SPSS software. After that you use the command read.spss with the file name, such as datafile.sav, and it reads the file and works exactly in the same way as you have done with the xls file.
Similarly, if you want to read HTML tables, that is, data files in the form of HTML, you use the package XML and then the command readHTMLTable. Note the case: r e a d is in lower case, H T M L and T are in upper case, and a b l e is in lower case. So, you install the package, and after that the command is similar: write this command and then the file name within double quotes.
Similarly, if you want to read Octave text data files, the command is read.octave. If you want to read files generated in the SYSTAT software, the command is read.systat. If you want to read files in the SAS XPORT format, the command is read.xport, r e a d dot x p o r t.
And if you want to read data files generated in the Stata software, the command is read.dta, after which you give the file name. These functions also come from the foreign package. Octave, MATLAB, SYSTAT, SAS, Stata etc. are different software which are used in statistics, so people get data from there and use those files over here.
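A self-contained sketch of read.dta() from the foreign package. Since no Stata file is available here, the code first writes a small data frame to a temporary .dta file with foreign's write.dta() and then reads it back. The data frame and its column names are illustrative assumptions.

```r
library(foreign)

# Create a small Stata file to read (stand-in for a file produced by Stata)
data <- data.frame(Column1 = 1:5, Column2 = seq(10, 50, by = 10))
dta_file <- tempfile(fileext = ".dta")
write.dta(data, dta_file)

# Read it back, as you would read a file generated in Stata
from_stata <- read.dta(dta_file)
from_stata
mean(from_stata$Column2)
```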
Now we come to the end of this lecture, and I will also stop with data handling here. I am pretty confident that you can now understand very easily how to read different types of data files which are generated by external software. But now more responsibility is coming onto your shoulders as we move further, because you have learned how to read different types of files.
After that, you can read any data and any values, and you know lots of data handling functions and commands. You are now more mature and more experienced, and you have done a lot of practice. So, if you take the things which you were doing with data files before learning the R software and do them in the R software, that will possibly make you more confident and a better programmer.
So, you try to take some example, try to practice it and I will see you in the next lecture
with some more commands on the data handling, till then goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Data Handling
Lecture - 47
Data Handling - Introduction, Frequency and Partition Values
Saving and Writing Data Files
Hello friends, welcome to the course Foundations of R Software, and in this lecture we are going to take up the last topic of Data Handling. Well, I am not saying that this is the last topic in the sense that there are no more topics related to data handling, but in this course, today, we are going to take up the last topic of this broader topic of data handling.
Now, up to now you have created different types of files, and you have learned how to read files and how to do different types of operations. Now, at the end, the last question comes: how are you going to save the file? For example, whenever you do some programming, there is going to be an outcome, there is going to be output. Up to now, you have been seeing the outcome on the screen, right, using print, cat, etcetera.
Now, my objective is to save this output to an external file, or in simple words, I want to save the output in some external file. How to get it done is a very simple topic, and in R there are a couple of ways in which you can do it. But to keep life simple, to keep this presentation simple, and because this is a very elementary course, we will consider here some elementary operations through which you can save the outcome of your program in a file.
So, whenever you do it, you first have to specify the location of the folder in which you want to save the file; otherwise, it will be saved in the default folder of the R software, that is, the current working directory. So, it is up to you where you want to save your file, right. So, let us begin this lecture and see how we can save the output of a command into a file, ok.
(Refer Slide Time: 02:20)
So, we have a function here, write, w r i t e. As the name suggests, the job of this function is to write data to a file, right. So, suppose I want to write some data x into a file. With this write function, the data in x is usually in the form of a matrix or a vector, right.
And in case you want to make some changes, you have to do it accordingly. For example, if x is a two-dimensional matrix, you need to transpose it so that the columns in the file match the columns of the matrix, right. This is because R stores a matrix column by column, and write puts the values into the file in that internal order. So, I will say very simply: you have to look at how your data is arranged in the outcome and how it is saved, that is, you have to look into the behavior of the R software and how it saves the data. And then, if you want it another way, you have to use proper logic and convert it into the form you want. It is as simple as that, right.
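The transpose advice can be checked with a small sketch. It assumes a hypothetical 2 x 3 matrix and a temporary file; write() sends the values in the order R stores them (column by column), 5 per line by default, so transposing with t() and setting ncolumns to the number of columns makes each line of the file one row of the matrix.

```r
# A 2 x 3 matrix; R stores it column by column: 1 2 | 3 4 | 5 6
m <- matrix(1:6, nrow = 2)

f <- tempfile()

# Without transposing: values come out in column order, 5 per line (the default)
write(m, file = f)
readLines(f)   # "1 2 3 4 5" "6"

# Transposing and setting ncolumns makes each line of the file one row of m
write(t(m), file = f, ncolumns = ncol(m))
readLines(f)   # "1 3 5" "2 4 6"
```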
(Refer Slide Time: 03:34)
And when you use this function, right, there are many, many options which can help you save the file in the format you want.
For example, x is the data which is to be written, usually a matrix or an atomic vector, and after this I have to give the name of the file. So, I write f i l e, all in lower-case letters, is equal to, and within double quotes you give the name of the file, right.
So, this file argument is either a connection or a character string naming the file to write to, right, and if you give only empty double quotes, "", the output is printed to the standard output connection, that is, on the console. Similarly, if you want to control how many columns there should be in the outcome, you can do that with the option n c o l u m n s; by default it is 5 columns for numeric data and 1 column for character data. After that there is the option append. What does append mean? For example, you execute a program and some output is written to the file; now you execute the program again.
So, now there will be the next outcome. So, what do you want? Do you want to write it after the earlier outcome, that is, do you want to add or append it? Or should the newer outcome simply overwrite your earlier outcome? So, append controls whether you want to append or overwrite, right.
So, it takes the values TRUE and FALSE, and if you say append is equal to FALSE, which is the default, the data is going to be overwritten, right. Now, similarly, you have another option here, sep, that is, the separator, which says how the values are going to be separated; within double quotes you give the symbol, or a blank space, whatever you want, right.
And in case you use sep is equal to backslash t, then it will give you tab-delimited output; that means the values are separated by a tab, right. So, similarly, you can see here the options ncolumns and append, right.
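A minimal sketch of the options just described, with hypothetical data: file = "" prints to the console instead of a file, ncolumns sets how many values go on each line, and sep = "\t" gives tab-delimited output.

```r
x <- 1:6

# file = "" sends the output to the console (standard output) instead of a file
write(x, file = "", ncolumns = 3)
# 1 2 3
# 4 5 6

# sep = "\t" gives tab-delimited output
f <- tempfile(fileext = ".txt")
write(x, file = f, ncolumns = 3, sep = "\t")
readLines(f)   # "1\t2\t3" "4\t5\t6"
```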
So, let me show you an example, and I will show you how the outcome is diverted to a file, right. Just for the sake of simplicity, I take the numbers from 1 to 100, which are stored in the variable x. Now, I want to write this x in a file whose name is shalabh, right. So, what I do is write down write, then x, and then the file name, s h a l a b h, right.
So, let us see what happens on the R console. First, I have to change my directory, so that I can show you where the file is going to be. Then I write down this data value x, and then I execute the command, right. And then you will see that I can open the file in Notepad; it will be opened after this.
So, this is going to be step 1, and this is going to be step 2, where you will see that all these 100 values which you have generated in x are stored here, right. So, let me show you this operation on the R console, so that you become more confident, ok.
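The two steps can be sketched in R as follows. In the lecture the working directory is set to the folder C:/R course; to keep this sketch runnable on any computer, it assumes a temporary folder from tempdir() instead, and it reads the file back with readLines() rather than opening it in Notepad.

```r
# Assumption: use a temporary folder instead of "C:/R course"
old_wd <- setwd(tempdir())

x <- 1:100
write(x, file = "shalabh")   # nothing is printed on the console

# Look at the file from R instead of Notepad
lines <- readLines("shalabh")
length(lines)   # 20: numeric data is written 5 values per line by default
lines[1]        # "1 2 3 4 5"

setwd(old_wd)   # go back to the previous working directory
```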
(Refer Slide Time: 07:12)
So, first, as usual, I change my working directory to C colon R course. I have created a folder R course on the C drive of my computer. So, I set it here, and you can see that this is the folder R course, and you can see very clearly that this folder is completely empty.
You can see that it is written here that the folder is empty, right, so that you can believe me that whatever appears in this folder is the outcome generated by the commands I execute now.
So, now I come to the R console and generate x is equal to 1 to 100, like this, so you can see that x looks like this, right. And then I write down write, x, and the file name shalabh. So, I execute it here, and there is no outcome on the console.
That is because the outcome has gone to this folder, R course: if you come here, you can see that just before this there was no file and now there is a file, and if you open it, you can see here the file, right.
So, you can see what happens when I execute it. Now, what is the difference between this and, say, print x? This print x simply shows you the output on the R console only, right, but write saves the output in the file, ok. Now, let me show you one more thing.
Now, suppose I generate one more x, say x is equal to c of 101 to 120. So, x looks like this, right. Now I want to show you what happens when I write this outcome to the same file and give append is equal to TRUE, right.
So, let me close the file first and then execute the command. Now I have executed it, so let us come to this file.
(Refer Slide Time: 09:50)
And if you look at this file, you can see that now the values 101 to 120 have been added at the end, right.
And similarly, if you want more confidence, I take, say, y is equal to letters, l e t t e r s, elements 1 to 20, right. And if you look at y now, it is like this, the alphabet, and I write this y to the same file whose name was shalabh, and I say append is equal to TRUE, right.
So, after I execute it, let us see what happens to the file shalabh. Look at what has happened at the end of the file.
You can see that all the letters have come here, each on its own line, right. And similarly, for one more example, say z is equal to matrix, with nrow is equal to, suppose, 3, ncol is equal to 3 and data is equal to 1 to 9, right.
So, now if you write the same z to this file, what happens? I open this file shalabh, and you can see that after the 20th letter all these values have come like this, right. And if you look at how z looks, it contains the numbers 1 to 9, arranged column-wise. And in this file you can see that 1 2 3 4 5 6 7 8 9 have simply been written one after another in this way, five values to a line, right.
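The sequence of appends in this demonstration can be reproduced as a sketch. It assumes a file named shalabh in a temporary folder instead of C:/R course, and checks the result with readLines(): numeric data goes 5 values to a line, character data 1 value to a line, and the matrix z is written in column order.

```r
f <- file.path(tempdir(), "shalabh")

write(1:100, file = f)                         # overwrites: append = FALSE is the default
write(101:120, file = f, append = TRUE)        # four more lines of 5 numbers
write(letters[1:20], file = f, append = TRUE)  # character data: 1 value per line by default

z <- matrix(nrow = 3, ncol = 3, data = 1:9)
write(z, file = f, append = TRUE)              # written in column order, 5 per line

out <- readLines(f)
length(out)   # 20 + 4 + 20 + 2 = 46 lines
tail(out, 4)  # "s" "t" "1 2 3 4 5" "6 7 8 9"
```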
And similarly, suppose I take x is equal to 1 to 100 once again, and this is your x. And now I set ncolumns is equal to 10; that means I want to store this data in 10 columns. And then I create another file, shalabh1, right, and write this x into it. So, let us see what happened.
(Refer Slide Time: 12:28)
Now, a new file shalabh1 has been created here, and you can see that the data 1 to 100 is arranged here in 10 columns, right. Now, similarly, let me generate one more y, with the data from 101 to 200. Then I write this y to the same file shalabh1, but now suppose I increase my number of columns to 15, with append is equal to TRUE; let us see what happens.
So, now I open this file once again, and you can see that the values 1 to 100 were stored in 10 columns, but from 101 onwards they have been stored in 15 columns, right.
You can see that this is the first row of the new data, which has 15 values, this is the second row, which also has 15 values, and so on, and the last row has the remaining 10 values. So, this is how we can proceed and save these files in different layouts, right.
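The change of ncolumns between two writes to the file shalabh1 can be sketched the same way, again assuming a temporary folder: 100 values in 10 columns give 10 lines, and the next 100 values in 15 columns give 6 full lines plus a last line of 10.

```r
f1 <- file.path(tempdir(), "shalabh1")

write(1:100, file = f1, ncolumns = 10)                   # 10 lines of 10 values
write(101:200, file = f1, ncolumns = 15, append = TRUE)  # 6 lines of 15, then 1 line of 10

out1 <- readLines(f1)
length(out1)   # 17
out1[11]       # first line of the appended block: 101 to 115
out1[17]       # last line: 191 to 200
```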
So, I have given you one example in detail, and after this I believe that if you want to save other types of files, you simply have to look for the correct command, right, and after that the functioning remains the same, right, ok. So, suppose I want to know how I can save tabular or CSV data files.
So, now you have to look into the R software and find this function. The function you will find is write dot csv; w r i t e dot c s v. This function writes tabular data to a text file in the CSV format, and the way it writes the data is that each row of the data creates one line in the file, and the data items are separated by commas.
So, for that also you have a simple command, write dot csv: inside the parentheses you give x and then the file name. But note one difference from write: write dot csv is deliberately inflexible, so if you try to set append is equal to TRUE or FALSE, R ignores it and gives a warning. If you want to append rows to a CSV file, you can use write dot table with sep equal to a comma and append equal to TRUE instead. And after that you will see there are many more options; I have just given you here the idea of how you can use them.
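A minimal sketch of write.csv with a hypothetical data frame of names and marks, written to a temporary file. row.names = FALSE is used only to keep the row numbers out of the file; the second call shows that an append argument is ignored with a warning.

```r
f_csv <- tempfile(fileext = ".csv")
marks <- data.frame(name = c("A", "B"), score = c(40, 70))   # hypothetical data

# One line per row, items separated by commas
write.csv(marks, file = f_csv, row.names = FALSE)
readLines(f_csv)
# "\"name\",\"score\"" "\"A\",40" "\"B\",70"

# write.csv does not let you change append: R ignores it and warns
msg <- tryCatch(write.csv(marks, file = f_csv, append = TRUE, row.names = FALSE),
                warning = function(w) conditionMessage(w))
msg   # a warning message that mentions 'append'
```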
(Refer Slide Time: 14:40)
And similarly, if you want to write a tabular data file, you have the command write dot table, w r i t e dot t a b l e, and the format is the same: write dot table, and within the parentheses you write down the content, that is, the outcome, then the name of the file, and then you can use append is equal to TRUE or FALSE.
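A sketch of appending with write.table, assuming two small hypothetical data frames of daily pizza orders. When appending, col.names = FALSE stops the header line from being written a second time.

```r
f_tab <- tempfile(fileext = ".txt")
day1 <- data.frame(branch = c(1, 2), orders = c(12, 15))   # hypothetical data
day2 <- data.frame(branch = 3, orders = 9)

write.table(day1, file = f_tab, row.names = FALSE)   # header + 2 rows
write.table(day2, file = f_tab, append = TRUE,
            row.names = FALSE, col.names = FALSE)    # 1 more row, no header

readLines(f_tab)
# "\"branch\" \"orders\"" "1 12" "2 15" "3 9"
```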
(Refer Slide Time: 15:15)
And after that, similar to write dot csv, we have many more options. I have already explained x, I have already explained file, I have already explained append, and after that you can see options such as quote, sep, eol, na, etc., right; they can be used here. For example, you know what sep, the separator, is, right.
And if you want to specify the character or characters printed at the end of each line, that is, each row, then you use the option eol, and if you want to control how missing values are written, you have the option na, right, and so on. The complete details and examples are in the help of this function, right.
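The options quote, sep, eol and na can be tried together in one sketch with a hypothetical data frame that contains a missing value; readLines() shows exactly what ended up in the temporary file.

```r
f_opt <- tempfile(fileext = ".txt")
d <- data.frame(x = c(1.5, NA), y = c("p", "q"))   # hypothetical data with a missing value

write.table(d, file = f_opt,
            sep = "\t",      # tab-separated columns
            quote = FALSE,   # no quotes around character values or names
            na = "-",        # write missing values as "-"
            eol = "\n",      # character(s) printed at the end of each line (the default)
            row.names = FALSE)

readLines(f_opt)
# "x\ty" "1.5\tp" "-\tq"
```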
So, now I come to the end of this lecture, and you can see that it was a pretty simple lecture. I have shown you examples related to write, and I have taken a couple of examples so that I can convince you that the write function can work in different ways and you can save the file in the different ways you want, right.
Similar are the commands to write CSV files and tabular data files, and there are many more such structures, but I will leave them up to you. I have given you the basic idea; now you simply have to look for the correct function or command to execute those actions.
And after that, please do not forget to look into the help and read about the different options, because whenever you generate something, you always need it in a particular format, so that the file can be used for further execution, right. It is possible that the outcome file which you have created is a data file that is going to be used as input in some other program.
So, be careful about what format you want; this format should be compatible with the next job. So, now it is once again your turn: take some examples, experiment with them, write down the names of the files, change them, use different options, and after that look into the file to make sure that the changes you wanted have actually been made by R in that file and that those effects are present in the file.
And I will see you in the next lecture with a new topic. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 48
Introduction to Statistical Functions - Introduction, Frequency and Partition
Values
Hello friends, welcome to the course Foundations of R Software. From this lecture we are going to begin a new topic. You can see that, right from the beginning of this course, we were talking about different types of commands, different types of control structures, different ways of data handling, etc. So, now the question comes: how are you going to use them? Where are you going to use them? What are the different applications of this R software, and why did R become so popular?
So, one of the reasons why R became so popular is that it is very useful in statistical programming. Well, in this course we are not going to talk about statistical programming, but I would definitely like to give you some basic operations in statistics which I believe you have learnt in elementary classes. So, it should not be a problem for you to understand anything related to statistics here. You have already done topics like frequency tables, counting values, partition values, quantiles, quartiles etc. So, my objective from this lecture is to give you some idea of how whatever you have learnt earlier can be executed in the R software.
So, after giving you some elementary concepts of frequency tables, quantiles etcetera, I will take up the topic of graphics, because in graphics I need these concepts. For example, if you want to create a histogram or a bar plot, then you need to know from which values you are going to create the graphics.
So, that is why I will consider some very basic elementary statistical topics in the lecture today. And after that, I will take up the topic of graphics and show you how you can successfully produce graphics in the R software. So, let us begin our lecture and try to understand some basic elementary statistical concepts, ok.
(Refer Slide Time: 02:40)
So, you know that whenever you get some data, the first job you are going to do is to employ the tools of descriptive statistics, right. Descriptive statistics means you want an idea about the central tendency of the data, the variation in the data, the structure and shape of the data, and relationships, and in order to understand these we use graphical as well as analytical tools, right.
So, we are not going into these details in this course, but I will simply show you some very basic elementary tools of descriptive statistics, with which we compile the data in a statistical way. So, first let us understand the concept of absolute and relative frequencies, right. This is a very elementary concept, and I believe that you have been taught it in elementary classes, possibly in class 6, 7, 8, 9, 10 and so on, right.
So, let me take an example and explain what absolute frequency is and what relative frequency is. Suppose there are 10 persons, and they have been coded into two categories, male or female. Male is indicated by M and female by F. Now, a person is selected and turns out to be male, the second person turns out to be female, the third person turns out to be male, and so on, and the data is recorded like this: M, F, M, F etc.
So now, you would like to know how many people are in each of these two categories, male and female. So, let us count how many are male and how many are female. Let me count: 1, 2, 3, 4, 5, 6, 7, so there are 7 males, and then 1, 2, 3, so there are 3 females, right. Now, these two categories, male and female, can in general be indicated by, say, a1 and a2. For example, a1 can be the male category and a2 the female category.
And the numbers of people in these two categories, male and female, are indicated by n1 is equal to 7 and n2 is equal to 3, right; that means n1 is the number of people in the category a1, and n2 is the number of people in the category a2, here n2 equal to 3, right. The number of observations in a particular category is called its absolute frequency, right.
So, for example, here n1 equal to 7 and n2 equal to 3 are the absolute frequencies of the classes a1 and a2, which indicate the two categories male and female, right.
(Refer Slide Time: 05:23)
Similarly, if you consider the proportion or percentage of these frequencies, that is called the relative frequency. For example, we have two classes a1 and a2. The relative frequency of class a1 is
$$f_1 = \frac{n_1}{n_1 + n_2} = \frac{7}{10} = 0.7 = 70\%,$$
and similarly the relative frequency of class a2 is
$$f_2 = \frac{n_2}{n_1 + n_2} = \frac{3}{10} = 0.3,$$
which is equivalent to 30 percent. So, I can say that in these values there are 70 percent males and 30 percent females, right. So, this relative frequency gives us information about the proportion of males and females in the data set.
So, now the question in which we are interested is how to find such absolute and relative frequencies in the R software, right. In order to do this, first we need to create a table, and this table gives us information about the absolute frequencies of the data set which is stored in the named variable. The command is table, t a b l e, and inside the parentheses you give the name of the variable for which you want to find the absolute frequencies, right.
So obviously, if you store the data as x and write table with x inside the parentheses, this will give you the absolute frequencies. And if you divide table x by the total number of observations in x, which can be obtained by the command l e n g t h, length of x, that is, how many observations there are in the data vector x, then it gives you the relative frequencies, right.
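The two commands can be tried on any small vector. This sketch uses a hypothetical character vector; prop.table(), a base R function not mentioned in the lecture, gives the same relative frequencies directly from a table.

```r
x <- c("a", "b", "a", "c", "a")   # hypothetical data

abs_freq <- table(x)              # absolute frequencies: a = 3, b = 1, c = 1
rel_freq <- table(x) / length(x)  # relative frequencies: 0.6, 0.2, 0.2

# prop.table() gives the same relative frequencies from a table
all.equal(rel_freq, prop.table(abs_freq))   # TRUE
```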
So, for example, take the same example: we have the data M, F, M, F etc. R can actually tabulate such character values directly as well, but here we convert them into numerical codes, so that R can work with them as numbers and do the required operations. So, what we do is indicate the categories male and female by the numbers 1 and 2: male is indicated by 1 and female is indicated by 2.
So, now the same data set, which was in the format of male and female, is indicated in terms of the numbers 1 and 2, and I store this data in a variable named gender, g e n d e r, as a data vector created with the function c, right. So, this is the data, yeah.
And if you do not want to do this coding manually, you can also do it through a factor, but I will leave that up to you; my objective here is simply to show how to create the table and find the absolute and relative frequencies.
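For the factor route mentioned above, here is a sketch. The lecture shows only the start of the data (M, F, M, F etc.) and the counts, so the order below is a hypothetical one with 7 males and 3 females; the levels are given so that M gets code 1 and F code 2, matching the manual coding.

```r
# Hypothetical order of the 10 recorded values, chosen to give 7 M and 3 F
sex <- c("M", "F", "M", "F", "M", "M", "F", "M", "M", "M")

# A factor keeps the labels, so no manual coding into 1 and 2 is needed
sex_f <- factor(sex, levels = c("M", "F"))
table(sex_f)            # M: 7, F: 3

# The internal codes follow the order of the levels: M -> 1, F -> 2
as.integer(sex_f)[1:4]  # 1 2 1 2
```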
So, after that, if you simply say table of gender, it gives you this type of outcome. Here, this is the name of the variable, g e n d e r, and these are the 2 categories, right, that is, the categories 1 and 2. And the second row gives the values of the absolute frequencies. So, it tells you that in category 1 there are 7 people, that is, the absolute frequency is 7.
And in category 2 the total number of people is 3, right. So, this gives you the absolute frequencies, this is how they are displayed, and here you can see the screenshot also.
And then, if you want to find the relative frequencies, this is the table divided by the length of the gender variable, that is, the total number of observations in the data vector, right. Now it gives you the name of the variable, gender, then the categories 1 and 2, and then the values of the relative frequencies.
This one is obtained as 7 upon 10, and this one as 3 upon 10. So, it indicates that in category 1 there are 70 percent of the people and in category 2 there are 30 percent, right. So, that is how it gives you the values of the relative frequencies, ok.
So, now let me show you these things on the R console, so that you get a fair idea of how things work, and after that I will show you some more examples, yeah.
And please keep these examples in mind, because we are going to use them in further lectures also. So, the gender data is like this, and if you create a table of this gender variable, it comes out like this, and if you divide it by the length of gender, it gives you the relative frequencies like this. So, you can see that it is very straightforward and simple to obtain absolute and relative frequencies in the R software, right.
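The console steps can be reproduced with a sketch. The lecture gives the counts (7 codes of 1 and 3 codes of 2) but not the full order, so this gender vector uses a hypothetical order; the frequencies do not depend on the order.

```r
# Hypothetical order; only the counts (7 ones, 3 twos) come from the example
gender <- c(1, 2, 1, 2, 1, 1, 2, 1, 1, 1)

table(gender)                    # absolute frequencies: 1 -> 7, 2 -> 3
table(gender) / length(gender)   # relative frequencies: 1 -> 0.7, 2 -> 0.3

# The same relative frequencies expressed in percent
100 * table(gender) / length(gender)   # 70 and 30
```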
So, now let me give you one more example. Suppose there is a restaurant which delivers pizza at home, right; they have a home delivery system, and suppose this restaurant has three branches: one in the east of the city, another in the west of the city, and the third in the center of the city, that is, in the central part of the city, right.
So, the branch which is located in the east part of the city is coded as 1, the branch located in the west part of the city is coded as 2, and the branch located in the central part of the city is coded as 3, right. And suppose they get orders every day and deliver them to different places, right.
So, they receive the orders at a central place, and depending on the location where the order has to be delivered, they choose the appropriate branch from which the delivery can be made faster. The branches which made the deliveries, coded as 1, 2 and 3, are recorded like this.
That means we simply want the distribution of the coded branches 1, 2 and 3, that is, how many deliveries were made from each of these branches. So, this data has been stored in terms of 1, 2 and 3, like this, in the variable d i r e c t i o n, direction, as a data vector.
(Refer Slide Time: 12:11)
Now, let us see what the distribution of these values is. I can find the absolute and relative frequencies. The absolute frequencies are obtained by t a b l e, table, and within the parentheses I write the variable name direction, and then you can see that there are 3 directions, 1, 2 and 3.
Direction 1 has made 28 deliveries, direction number 2 has made 43 deliveries, and direction number 3 has made the remaining 29 deliveries. And if you want their relative frequencies, these frequencies are divided by the total number of observations, which is here 100.
So, if you write table of direction divided by length of direction, that gives you the relative frequencies. The outcome comes out like this: direction, and then the codes 1, 2 and 3, that is, east, west and central. So, you can see that the eastern branch has delivered 28 percent of the orders, the western branch 43 percent of the orders, and the central branch 29 percent of the orders.
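The pizza example can be reproduced in a sketch. The 100 recorded codes are not listed in this text, so the stand-in vector below only reproduces the counts 28, 43 and 29 with rep(); table() does not depend on the order, and the branch names attached with factor() are an optional addition.

```r
# Stand-in data with the same counts as the example (the actual order is not given)
direction <- rep(c(1, 2, 3), times = c(28, 43, 29))

table(direction)                       # 28 43 29
table(direction) / length(direction)   # 0.28 0.43 0.29

# Optional: attach branch names to the codes 1, 2, 3
branch <- factor(direction, levels = 1:3, labels = c("east", "west", "central"))
table(branch)
```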
So, this is what you can get directly from the R software. And the difference between this example and the first example is that in the first example there are only 10 values, so you might think that you can do this calculation manually also. But as soon as you have 100 values, you simply cannot count them by eye; you have to count them carefully. And think of a big set of values consisting of, say, thousands or millions of values: you cannot do such calculations manually, and in such cases this command in the R software helps you.
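To see that the same commands scale to large data, this sketch simulates one million hypothetical branch codes with sample(); the data is made up, and set.seed() only makes it reproducible.

```r
set.seed(1)   # makes the simulated data reproducible
big <- sample(1:3, size = 1e6, replace = TRUE)   # one million simulated codes

freq <- table(big)
freq / length(big)   # each relative frequency is close to 1/3
sum(freq)            # 1e+06: every value is counted exactly once
```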
So, you can see here the screenshot of the same outcome. Let me show you this operation on the R console.
So, let me first create the data set. You can see that the data set direction is like this, and if you want the absolute frequencies, you execute the command table, and you will see that it comes out like this, right: 28 orders have been delivered by the eastern branch, 43 orders by the western branch and 29 orders by the central branch of the pizza restaurant. And if you want their relative frequencies, you divide each of these absolute frequencies by the total number of observations, and you get them like this, right. So, that is the same outcome which I just explained, now on the R console.
So, now let me give you one more concept, right. Many times you have heard that in some examinations there is a condition that in order to qualify, you have to be, for example, in the top 20 percent, or, as it is said, above the 80th percentile. What does this mean? And once you talk about values like percentiles, then you also have values like quartiles, deciles etc. So, what are these values?
So, let me give you a very simple example; it is only a hypothetical example to explain the idea. Suppose you hear that there are two examination boards: one is doing hard marking, so called, quote unquote, hard marking, and the other board is doing soft marking, quote unquote soft marking. Marking is always marking and you cannot question it, but sometimes we feel that there is a board which is very strict in giving marks if a candidate makes a mistake, and there is another board which is not so strict.
So, this is what we mean by soft marking and hard marking, right. Now, suppose both boards give marks out of 100, so the minimum is 0 and the maximum is 100. Suppose we call the board giving hard marks board 1 and the board giving soft marks board 2.
Now, suppose a large number of candidates appear in the examinations of board 1 and board 2. Their marks come to us, and suppose I run a college and want to admit students who have appeared in the examinations under board 1 and board 2. Now, my problem is this: one student has got, say, 40 out of 100 from the board which does hard marking, and another student has got, say, 70 out of 100 from the board which does soft marking. How do I compare them?
I am sure you will agree with me that these marks are not really comparable; you cannot directly compare 40 marks from a board doing tough marking with 70 marks from a board doing loose, or soft, marking. So, how do we compare them? One option is to create a scale on which I mark the minimum and maximum marks, right.
So, this is board 1, and then I have board 2, and I do the same thing for it: minimum and maximum, right. Now, this minimum and maximum can be different for board 1 and board 2: they are the marks of the students who scored the lowest and the highest among all the students in board 1, and similarly in board 2.
So, for example, suppose 2000 students appear in board 1 and 5000 students appear in board 2. The minimum mark among those 2000 students is here, and the maximum mark among those 2000 is here. And similarly, in board 2, out of these 5000 students, the minimum mark is here and the maximum mark is here, right.
So, suppose board 1 does hard marking and board 2 does soft marking. Then it is possible that in board 1 the minimum marks are, say, 20 and the maximum marks are, say, 60. And in board 2, because it gives soft marking, the minimum marks may be, say, 40 and the maximum marks, say, 90.
So now, if a student has got a particular mark, it becomes very difficult to get the correct information about the capability of the student. What we can do is arrange the marks of each board in increasing order, from the minimum to the maximum, and divide them into equal parts, so that each part contains the same proportion of students. The number of equal parts can be 4, 10, 100 or anything you want. For example, suppose I make 100 partitions: 1, 2, 3, 4, up to 100.
Similarly, for board 2 also I make 100 partitions, right. In general, the values at which such partitions are made are called quantiles, and when we make 100 partitions, each of them is called a percentile. That is why many examinations have a condition that they will consider only those students who are in the top 20 percentile of their board examinations.
What does this mean? They look at the top 20 percent counted from the maximum; that means everyone above the 80th percentile. The 80th percentile of board 1 and the 80th percentile of board 2 are usually expected to be different marks.
So, it is possible that in board 1 the student at the 80th percentile has got only, say, 50 marks, and in board 2 the student at the 80th percentile has got, say, 80 marks. Then they make a very simple rule: all those students whose marks are more than the 80th percentile of their respective board are eligible to apply for admission. That means, if a student has appeared in board 1 and has got more than 50 marks, then that student is eligible, and if a student from board 2 has got more than 80 marks, then that student is eligible, right.
So, this is how you can compare them; this is one application of quantiles. I have taken this example just to explain what partition values are. Similarly, if you make only 4 partitions instead of 100, then the partition values are called quartiles: the first quartile, the second quartile (which is the median), the third quartile, etc.
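The admission rule above can be sketched in R. This is only an illustration under stated assumptions: the two vectors of marks are hypothetical (chosen so that board 1 runs from 20 to 60 and board 2 from 40 to 90), "top 20 percent" is taken to mean "above the 80th percentile", and eligible is a hypothetical helper, not an R function.

```r
# Hypothetical marks (out of 100); not real data from any board
board1 <- seq(20, 60, by = 4)   # hard marking: 20, 24, ..., 60
board2 <- seq(40, 90, by = 5)   # soft marking: 40, 45, ..., 90

# "Top 20 percent" cut-off = 80th percentile of each board's own marks
cut1 <- quantile(board1, probs = 0.8)   # 52 for these data
cut2 <- quantile(board2, probs = 0.8)   # 80 for these data

# Hypothetical helper: is a mark above the cut-off of its own board?
eligible <- function(mark, cutoff) unname(mark > cutoff)

eligible(55, cut1)   # TRUE:  55 is above board 1's cut-off
eligible(75, cut2)   # FALSE: 75 is below board 2's cut-off
```

So, with these made-up marks, a 55 from the hard-marking board qualifies while a 75 from the soft-marking board does not, which is exactly why raw marks are not compared directly.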
And similarly, when you work in the R software, you can make as many partitions as you wish. This is the topic I am now going to explain. I have already explained the concept, and now my job is to show you how to execute these things in the R software, right.
So, for partition values, the total frequency is divided into the required number of partitions. You have the data, you create the frequency table, and then you divide it into the required number of partitions. If you divide the total frequency into 4 equal parts, each partition value is called a quartile; if you divide it into 10 equal parts, each is called a decile; and if you divide the total frequency into 100 equal parts, each is called a percentile, alright.
So now the question is, how do we obtain such partition values in the R software? In R, we call them in general quantiles, q u a n t i l e, and the function that computes such partitions is the quantile function, q u a n t i l e. This function computes the quantiles corresponding to the given probabilities, right. This works because you can define the position of a partition by a value between 0 and 1, and a probability also lies between 0 and 1.
That is why the point at which we want to partition the data is indicated by a probability value. How do we get it? I will show you through an example, right. The smallest observation corresponds to probability 0 and the largest observation corresponds to probability 1, right. That is how this partitioning is done.
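As a small check of this correspondence, the sketch below (with a hypothetical five-value vector x) asks quantile() for the probabilities 0, 0.5 and 1 and compares the result with min(), median() and max().

```r
# Hypothetical data vector, used only for illustration
x <- c(12, 7, 3, 9, 15)

# Probability 0 -> smallest value, 0.5 -> median, 1 -> largest value
q <- quantile(x, probs = c(0, 0.5, 1))
print(q)                       # values 3, 9 and 15, labelled 0%, 50%, 100%
c(min(x), median(x), max(x))   # the same three numbers
```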
So, now we have the function quantile, q u a n t i l e, and inside the parentheses we write the data vector x. If you do not specify any choice, the default is the quartiles, that is, it divides the total frequency into 4 equal parts. But if you want to do it according to your wish, you have the option probs, p r o b s, which is short for probabilities, right. These probabilities lie between 0 and 1.
So, you can choose them the way you want: you can give them as a data vector, or you can give them as a sequence; this is up to you. For example, if you write seq(0, 1, 0.25), then you understand that 0 is the "from" value, 1 is the "to" value and 0.25 is the "by" value, right. After that there are many more options. I would request you to please go to the help of quantile and try to understand the other options, right.
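The sequence and the probs argument can be tried directly. The sketch below writes seq() with named arguments to make the from, to and by roles explicit; the data vector x is hypothetical.

```r
# seq(from, to, by): from = 0, to = 1, by = 0.25
p <- seq(from = 0, to = 1, by = 0.25)
print(p)   # the five probabilities 0, 0.25, 0.5, 0.75, 1

# Hypothetical data vector; quantiles at these five probabilities
x <- c(4, 8, 15, 16, 23, 42)
quantile(x, probs = p)
```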
But anyway, I will take some examples so that I can show you what these things look like and how you can interpret them. For example, take a very simple example: we have the marks of 15 students out of 100, and these marks are stored in the data vector marks as 68, 82, 63 and so on. Now I want to find the quantiles of these marks. So, I simply give the command quantile, q u a n t i l e, all in lower case alphabets, on the R console, and within the parentheses I write the data vector marks.
And you get this type of outcome. Now you have to understand what it is saying. You can see there are 5 values, labelled 0%, 25%, 50%, 75% and 100%. The first value, at 0 percent, is the minimum value, and here it is 29. If you look at the data, 29 is indeed the minimum value.
So, that is what appears here, right. After this comes 25 percent, which is the 1st quartile, right, and its value is 46.5. Similarly, the next value is at 50 percent; this is the 2nd quartile and its value is 63. And the next value is at 75 percent, and its value is 79.5.
So, this is the value of the 3rd quartile. After that, the last value is at 100 percent, and its value is 96. If you look at the data, 96 is also the maximum value. So, as I said, the value at 100 percent is the maximum value, and the value at 0 percent is the minimum value.
So, the data between the minimum (0 percent) and the maximum (100 percent) have been divided into 4 equal parts, and the partitions occur at 46.5, 63 and 79.5, right. This is how we read this outcome and understand the values of these quantiles, which here are the quartiles. This is the outcome when I do not use the option probs.
But if you use the option probs and specify the values 0, 0.25, 0.5, 0.75 and 1, that is, values increasing by 0.25, then you get the same outcome as before. So, you can see that the default outcome of the quantile function is the quartiles, right.
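The lecture's 15 marks are not fully listed, so the sketch below uses a hypothetical set of 15 marks to show the same two points: the labels 0% to 100% on the output, and that quantile() without probs gives the same result as probs = seq(0, 1, 0.25). The values quoted in the comments assume R's default quantile method (type 7).

```r
# Hypothetical marks of 15 students (NOT the lecture's data set)
marks <- c(73, 55, 90, 35, 84, 61, 97, 42, 78, 50, 88, 64, 93, 70, 81)

q_default  <- quantile(marks)                            # no probs given
q_explicit <- quantile(marks, probs = seq(0, 1, 0.25))   # quartiles asked for
print(q_default)
# 0%: 35 (minimum), 25%: 58, 50%: 73 (median), 75%: 86, 100%: 97 (maximum)

identical(q_default, q_explicit)   # TRUE: the default output is the quartiles
```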
(Refer Slide Time: 27:29)
Similarly, suppose you want some other values. For example, you want the quantiles at 0 percent, 20 percent, 40 percent, 60 percent, 80 percent and 100 percent. Then you specify them through probs, p r o b s: you give these values inside a data vector, and you get these values, right. So, 0 corresponds to 0 percent, 0.2 corresponds to 20 percent, 0.4 corresponds to 40 percent and 0.6 corresponds to 60 percent.
After this, 0.8 corresponds to 80 percent and 1 corresponds to 100 percent, and these are the values. So, the value at the 20 percent partition is 41.8 and the value at the 80 percent partition is 82.8. This is how probs controls the partitions. And if you take probs equal to 0, 0.25, 0.5, 0.75 and 1, you get the quartiles, and you can compare: these values are quite different from the values obtained with the 20 percent partitions, right.
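The comparison between 20 percent partitions and quartiles can be reproduced as follows. The marks are hypothetical rather than the lecture's data, so the numbers will not match 41.8 and 82.8.

```r
# Hypothetical marks of 15 students (NOT the lecture's data set)
marks <- c(73, 55, 90, 35, 84, 61, 97, 42, 78, 50, 88, 64, 93, 70, 81)

q20 <- quantile(marks, probs = seq(0, 1, by = 0.2))    # 0%, 20%, ..., 100%: 6 values
q25 <- quantile(marks, probs = seq(0, 1, by = 0.25))   # quartiles: 5 values
print(q20)
print(q25)
# Both start at the minimum and end at the maximum; the inner cut points differ
```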
(Refer Slide Time: 28:42)
So, this is how the quantile function works, and these are the screenshots. You can see the quartiles, and the quantiles partitioned at 20 percent, 40 percent, 60 percent, 80 percent and 100 percent; these values are quite different from each other, right. Now let me show you these operations on the R console.
This way you can understand them very easily. The reason I took up this topic is that the quantile is very useful in real life; you will need it many, many times. You will also see that we use these values, the frequency table, etc., when we create graphics, right. So, this is the data vector marks, and if you find the quantiles of marks, it gives you the quartiles. If you use the option probs with partitions at every 25 percent, you get the same values, as you can see, right.
And if you use some other partition, right, you can see that here the partitioning is at every 20 percent. Similarly, if you want to obtain the deciles, then I can give the partitioning as a sequence from 0 to 1 with an increment of 0.1, right. You can see the outcome. Now you have 10 partitions: 10 percent, 20 percent, 30 percent, etc.
And similarly, if you want the percentiles, it is very simple: you want to make 100 partitions, so you give the sequence from 0 to 1 with increments of 0.01.
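Deciles and percentiles only change the step in seq(). The sketch below, again with hypothetical marks, checks how many values come back: a step of 0.1 gives 11 probabilities (0 percent to 100 percent) and a step of 0.01 gives 101.

```r
# Hypothetical marks of 15 students (NOT the lecture's data set)
marks <- c(73, 55, 90, 35, 84, 61, 97, 42, 78, 50, 88, 64, 93, 70, 81)

deciles     <- quantile(marks, probs = seq(0, 1, by = 0.1))    # 0%, 10%, ..., 100%
percentiles <- quantile(marks, probs = seq(0, 1, by = 0.01))   # 0%, 1%, ..., 100%

length(deciles)       # 11
length(percentiles)   # 101
head(percentiles)     # the first few percentiles
```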
(Refer Slide Time: 30:27)
You can see these are 101 values: 0 percent, 1 percent, 2 percent and so on, up to 99 percent and 100 percent, right. So now I stop this lecture. You can see that this was a pretty simple and straightforward lecture. My idea was not really to teach you statistics; my idea was to consider some topics that I am going to use in the further lectures.
I took them because when I create graphics, the graphics are created from some data, and in order to understand the data, as I said, you have two options. One is analytical, where you find these values, frequency tables, etc.; but frequency tables, etc., can also be represented by graphics.
So, that was the reason I took these topics in this lecture. Why don't you take some data set and try to create such frequency tables? And remember one thing: I considered only a categorical variable for creating the table. For quantiles, you have to take continuous data; the values should not be categorical.
The values should be like height, weight, etc., which can take any real value, right. For example, height can be 1.57 meters, 1.5 meters, 1.2 meters, 1 meter and so on, right.
So, consider such continuous data and then create quantiles; that will make sense, right. Why don't you practice these things and try to understand what they indicate, because that is going to be useful when I consider the topics in the next lecture. So, you try to revise, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 49
Graphics- Scatter Plots and Bar Plots
Hello friends, welcome to the course Foundations of R Software. From this lecture onwards, we are going to talk about a new aspect of the R software: how to create graphics. You know that whenever we get data and want to dig out some information from it, we can use analytical tools, for example the mean, etc., or we can use a graphical view. Both of these methods have their own advantages.
And the idea is that we should actually use analytical and graphical tools together. Anyway, the question now is how to create graphics in the R software. I can promise you at the very beginning that the range of possibilities, the number of options and the amount of control you have in creating a graphic in the R software are excellent. The only thing is that when you want to control something, you have to work on the individual parameters.
It is something like this. You know that when we want to cook some curry or some dish, there are ready-made spice mixes available in the market. You just bring home the mix, which is a combination of, say, 10 or 12 different spices in different quantities. You put it in the vegetable and it gives some flavour. But if you really want to change that flavour, that is not possible with the ready-made mix; you have to mix those 10 or 12 spices manually, in quantities of your choice.
That will give you a different flavour, and the biggest advantage is that the taste of the curry is under your full control. That is the story with the R software too, compared with software where you click, click, click and get the graphics. In the R software there are many options, and there are many packages which can be used for creating beautiful graphics.
My objective here is to give you some idea of how you can begin, how you can think, how you can control the different parameters and how you can manage graphics in the R software. Surely I cannot cover all possible types of graphics available here. So, I will try to initiate a thought process in you about how to create graphics and how to control different types of parameters.
Finally, the objective is that, for whatever type of graphic you want, you know how to give the values to create it. Definitely, I am going to consider only some selected common graphics. My idea is first to consider those graphics which you already know, like the bar plot, histogram and pie chart, which you have been drawing since your school days.
That will give you some confidence about how to create in the R software the graphics that you have possibly created in other software also. Then you can learn how to manage the different parameters of those graphics to make them more informative and more beautiful. That is how we are going to proceed in learning graphics. So, we begin our lecture and try to understand it with a couple of examples.
Right, ok. The first question is: why should I use graphics? We know that a graphic summarizes the information contained in the data. For example, if you want to indicate whether someone is happy, normal or sad, you can very easily use these types of smileys nowadays. Even if I do not write "happy" here, just by looking at this face you can say that it indicates that the person is happy.
And even if I remove the word "sad" written over this one, you can still look at this face, this smiley, and say that the person is not happy. So, these are graphics. Graphics convey the same information as the data, and the advantage of using graphics is that they convey the information hidden inside the data more compactly, right.
If you want to explain in words whether a person is happy, I am sure you will have to write and explain very nicely and clearly. But with a graphic, you simply draw a smiley like this one, that is all, right. If you want to show the person as even happier, you can draw one like this.
So, that is the advantage. Now, another thing I sometimes hear from different people is the belief that if you use more graphics or more plots, the analysis becomes better and provides better inferences. That is not correct. You simply have to use the appropriate number of graphics, and that gives you good quality inferences. It is just like a doctor: giving more medicine than required does not mean the doctor is good. A good doctor is one who gives only the required amount of medicine, right.
So, in R software, or in statistics in general, there are various types of graphics available: 2-dimensional plots, 3-dimensional plots, scatter diagrams, pie diagrams, histograms, bar plots, stem-and-leaf plots, box plots, Venn diagrams and many, many more. What we always have to keep in mind is that an appropriate number and choice of plots in the analysis provides better inferences; that is our keyword throughout this analysis.
So, now the question is, which types of graphics can be created in R? Actually, there is a long list, but some of the popular graphics, which you have already learnt in the past, are the bar plot, pie chart, box plot, grouped box plot, scatter plot, histogram, various types of 3-dimensional plots, etc. And that is, I will say again, a very long list.
But anyway, we are trying to learn here how these things can be done, and I will take some selected commands and show you how to create the graphics. Yes, once you create the graphics, you have to analyze them and dig out different types of information, which I am not doing here. My main objective is to show you how graphics can be created, ok.
Depending on whether we have 1, 2 or 3 variables, we can have univariate graphics, bivariate graphics, trivariate or 3-dimensional graphics, etc. But at this moment, since we are at the beginning, I simply assume that we have only 1 variable on which we have collected the data, and the data are stored in x.
So, if you simply want to make a scatter plot, which plots the data points on the x and y axes, right, this can be obtained by the command plot, p l o t, and inside the parentheses you write x.
After that there are many options, and I would request you once again to look into the help menu and try to understand them. I will show you some of the popular, common options here. So, let me take an example, and with this example I will create different types of graphics and use different options to show you how the options change the look of the graphics.
So, suppose I have collected data on the heights of 50 persons, recorded in centimeters. Their heights are like 166 centimeters, 125 centimeters and so on, and these data are stored in the data vector height, right.
So, now if you plot this height, you can see that it plots these values on the x-y plane against their index. You can see the data values obtained here. This scatter diagram gives information; for example, if you have two scatter plots on the same scale of x and y, one plot may look like this and another plot may look like this.
Then, if this is plot number 1 and this is plot number 2, it indicates that the variability in the data in plot number 1 is less than the variability in plot number 2. So, different types of such statistical outcomes are obtained as a first step through the plot command. This is how you can use the plot command to create a scatter plot, right.
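A minimal sketch of the same idea, assuming a short hypothetical height vector (only the first two values, 166 and 125 cm, come from the lecture, which uses 50 persons). With a single vector, plot() puts the index on the x axis, which the second call spells out explicitly.

```r
# Hypothetical heights in cm (first two values from the lecture, rest made up)
height <- c(166, 125, 130, 142, 171, 158, 149, 162, 137, 155)

plot(height)   # x axis: Index 1, 2, ..., 10; y axis: height

# The same scatter plot with the index written out explicitly
plot(seq_along(height), height, xlab = "Index", ylab = "height")
```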
Now, you have different options. You can see here that the x axis shows the index and the y axis shows height. You can change these names, you can change the ticks on the x axis and y axis, you can change the color of these dots, and you can manage many more things. So, what I am going to do is take a couple of examples and, through those examples, show you how you can control these aspects.
In this lecture I am going to explain things in quite some detail. From the next lecture onwards, the same commands are going to be repeated, so I will not need to explain them again, right.
So, for example, suppose you simply want to change the color of these dots. Earlier these dots were in the default black color, but now in this plot they are in red. How do we get it done? You simply use the plot command with height and give the option col, and then you write red, r e d, within double quotes, yeah.
This is how you specify colors in the R software, right. For example, "red" means red, and it changes all the dots to red, as you can see here.
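The color change can be sketched as below, using the same kind of hypothetical height vector. Color names such as "red" and "blue" are among the names listed by colors().

```r
# Hypothetical heights in cm (first two values from the lecture, rest made up)
height <- c(166, 125, 130, 142, 171, 158, 149, 162, 137, 155)

plot(height)                 # default: black points
plot(height, col = "red")    # same plot, points drawn in red
plot(height, col = "blue")   # same plot, points drawn in blue

c("red", "blue") %in% colors()   # both are built-in color names
```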
So, the graph is the same, but the color has changed. This is how you can add more options, and they change the view of the graphics. You can add a title, such as the main title, and you can indicate what the different legends represent, etc. But anyway, let us learn them one by one. Before I move forward, let me show you on the R console how it looks and how I am going to manage it.
(Refer Slide Time: 10:40)
So that we can understand the outcome, you can see that I have already reduced the width of my window. This is my data; the data are the same. Now I create a plot.
So, I write plot and then the name of the variable, here height. Now you can see that another window opens, and in it you can see the graphic. If you want to store this graphic, you can export it; there are options such as resize, etc. The simplest way to save this graphic is to right-click with your mouse; you can see different options such as Copy as metafile, Copy as bitmap, Print, etc., right.
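Right-clicking works in the Windows R console. A scripted alternative, sketched here as an assumption rather than the lecture's method, is to open a file-based graphics device such as pdf(), draw the plot and close the device with dev.off(). The file name and height values are hypothetical.

```r
# Hypothetical heights in cm (first two values from the lecture, rest made up)
height <- c(166, 125, 130, 142, 171, 158, 149, 162, 137, 155)

out_file <- file.path(tempdir(), "height_scatter.pdf")  # hypothetical file name
pdf(out_file)               # send the next plot to a PDF file
plot(height, col = "red")
dev.off()                   # close the device so the file is written

file.exists(out_file)       # TRUE once the device is closed
```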
So, these things you can do very easily. Now you can see the way I am going to work: in the left window I write my commands, and the outcomes are shown in the window on the right-hand side, right. Now, if I add a color by using the option col and write red, you can see what happens.
As soon as I execute the command, that is, as soon as I press Enter, you can see that the color becomes red. Similarly, if you want to make it, say, blue, you simply write col = "blue", and you can see it change on the right-hand side. That is the way I am going to present all the graphics here, right.
So, let me take one more very interesting example, through which I will explain many, many things, right. We are now going to consider bar plots. The first question is: what are bar plots? Bar plots are used to visualize the relative or absolute frequencies of the observed values of a variable, right, and they are used for categorical variables.
Now, what are relative and absolute frequencies? That was the reason I explained these concepts in the last lecture; now you know how to compute the relative and absolute frequencies and what they mean. So, it will not be difficult for you to understand what bar plots indicate. There will be a one-to-one correspondence between an analytical tool and a graphical tool, right.
What happens is this: when you found the absolute or relative frequencies, there were some categories, and you found the absolute or relative frequencies in those categories using the command table. Now, those frequencies are going to be plotted against the categories, right, with one bar for each category. For example, recall the example where the variable was gender, with the categories female and male.
Their frequencies were 3 and 7, so they will be indicated like this. Similarly, we had taken one more example on direction, with three categories of direction for a restaurant, and we had created the frequency table in terms of absolute and relative frequencies. Now we are going to plot those frequencies against those categories in a bar plot.
And the rule in a bar plot is that the height of each bar is determined by either the absolute frequency or the relative frequency of the respective category. This is shown on the y axis: the taller the bar, the higher the frequency. For example, you can see that in this example the frequency here is 7 and in the first case it is 3, and this is indicated by the heights of these bars, right, ok.
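The rule "one bar per category, bar height equal to its frequency" can be checked in R. The direction values below are hypothetical; barplot() invisibly returns the midpoints of the bars it draws, so its length equals the number of bars.

```r
# Hypothetical categorical data
direction <- c("North", "East", "South", "North", "North", "East", "South", "North")

tab <- table(direction)   # East 2, North 4, South 2 (categories sorted alphabetically)
print(tab)

mp <- barplot(tab)        # one bar per category, height = frequency
length(mp)                # 3 bars for 3 categories
```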
(Refer Slide Time: 14:19)
So, the next question is how to create such bar plots in the R software. We have both options: we can use either the relative frequencies or the absolute frequencies of the observed values of a variable. We have the command barplot, b a r p l o t, all in lower case alphabets, and after that we write the data. Now you have to be very careful: the data have to be in tabular format.
So, first you have to create the frequencies, because barplot plots the frequencies, not the data. So, first you obtain the frequencies, and those frequencies are used as the input x. After this there are many options. The first option is width, with which you can control the width of the bars; then space, with which you can control the spacing between the bars, right; and so on. I will take up some common operations through these options one by one.
So, right, the first question is how to create a bar plot with the absolute frequencies. For that you simply use the command barplot, and inside the parentheses, if your data are in x, you write table(x). table(x) is the command that gives you the frequencies, that is, the absolute frequencies of the data in x, right.
And similarly, if you want to create a bar plot with the relative frequencies, you know that you simply have to divide the absolute frequencies by the total number of observations, that is, the length of x. So, you use the same command that you used to find the relative frequencies, which was table(x)/length(x), and then you apply barplot to it, right. This is how you are going to do it, ok.
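Both forms can be sketched side by side with a hypothetical categorical vector x. Dividing by length(x) gives the same proportions as prop.table(table(x)), a base R function that divides a table by its total.

```r
x <- c("A", "B", "A", "C", "A", "B")   # hypothetical categorical data

abs_freq <- table(x)                # absolute frequencies: A 3, B 2, C 1
rel_freq <- table(x) / length(x)    # relative frequencies: 0.5, 0.333..., 0.166...

barplot(abs_freq)   # bar heights 3, 2, 1
barplot(rel_freq)   # same shape, heights between 0 and 1
```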
Now, my very sincere request to you all is that you must look into the help of barplot. You can see how many things are possible, far more than I can show you here, and at the end I have also written three dots, which means the list continues. You can give the data, you can control the width and you can control the space.
You can give the names (names.arg), and you can control the legend (legend.text), beside, horizontal bars (horiz), density, angle, color (col), border, the main title (main) and the subtitle (sub), the label on the x axis (xlab), the label on the y axis (ylab), the limits on the x axis (xlim), the limits on the y axis (ylim), xpd, axisnames, cex.names, plot, etc. If I tried to explain each and everything in this lecture, this would possibly become the longest lecture.
So, I request you to go through it at least once. I am not asking you to memorize it, but at least you must know the different possibilities, so that whenever you create graphics and want to compare a graphic with another graphic, you know how to create the same graphic in the R software.
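A few of the options listed in the help page can be combined in one call, as sketched below. The counts, labels, colors and titles are hypothetical choices for illustration; the help page (?barplot) gives the full argument list and defaults.

```r
counts <- table(c("A", "B", "A", "C", "A", "B"))   # hypothetical data: A 3, B 2, C 1

mp <- barplot(counts,
              width     = 0.5,                 # width of each bar
              space     = 0.8,                 # gap before each bar, as a fraction of bar width
              names.arg = c("Group A", "Group B", "Group C"),  # bar labels
              col       = "skyblue",           # fill color of the bars
              border    = "black",             # color of the bar borders
              main      = "Hypothetical counts",  # main title
              sub       = "Illustrative data",    # subtitle
              xlab      = "Category",          # x axis label
              ylab      = "Frequency",         # y axis label
              ylim      = c(0, 4))             # y axis limits
```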
(Refer Slide Time: 17:07)
Now I take a very simple example that I took in the last lecture, and I will create bar plots on the same data set. I will try my best to keep the examples to a minimum, so that you can feel the connection between the analytical and graphical tools.
You can recall that we had considered the data on 10 people, coded as male and female. The male category was indicated by 1 and the female category by 2. We had the data on the gender of these persons, coded like this, and this variable was stored in gender.
Now I create a bar plot. I simply give barplot(gender) and I get this. Well, my question to you is: do you want this? Think about it. There are only two categories here, male and female, and if you recall, the frequency table also had only these two categories. Then why is it giving 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, that is, 10 bars, when we have just said that each bar indicates one category?
So, what is happening? Well, you have made a mistake, or rather I will say I have made a mistake: we have created the bar plot on gender, which has 10 values, whereas you have to create the bar plot on the frequencies. If you replace gender by table(gender), it will give you the correct plot. I have done this intentionally, because I know that in many software packages the command simply takes the variable name inside the parentheses,
and the frequency table is created automatically, but that does not happen in the R software. That is why I have taken this example: to show you that I have made this mistake, so that you please do not make it.
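The difference between the two calls can be seen from barplot()'s return value, which holds one midpoint per bar drawn. The ordering of the 10 coded values below is hypothetical; only the counts (7 males coded 1, 3 females coded 2) come from the lecture.

```r
# Hypothetical ordering of the lecture's coded data: 1 = male, 2 = female
gender <- c(1, 2, 1, 1, 1, 1, 2, 1, 2, 1)

mp_wrong <- barplot(gender)          # mistake: one bar per observation
mp_right <- barplot(table(gender))   # correct: one bar per category

length(mp_wrong)   # 10 bars
length(mp_right)   # 2 bars, of heights 7 and 3
```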
Now, if you apply the command table to gender, you see two categories, 1 and 2, with
frequencies 7 and 3. If you then use barplot(table(gender)), you get this plot. You can
see that the bar heights are exactly 7 and 3: the first bar is category 1, which is male,
and the second bar is category 2, which is female. This is the most basic bar plot; after
that you have many options to change the names and the colors and to put titles on the
graph and on the axes, which I will show you very soon.
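A minimal R sketch of the mistake and its fix. The transcript gives only the counts (7 males coded 1, 3 females coded 2), not the order of the 10 observations, so the ordering below is hypothetical.

```r
# Hypothetical ordering of the 10 gender codes: 1 = male, 2 = female (7 and 3)
gender <- c(1, 2, 1, 1, 2, 1, 1, 2, 1, 1)

# Mistake: barplot() on the raw vector draws one bar per observation (10 bars)
barplot(gender)

# Correct: build the frequency table first, then plot it (2 bars, heights 7 and 3)
freq <- table(gender)
print(freq)
barplot(freq)
```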
Now suppose you want to create this bar plot with respect to the relative frequency
instead of the absolute frequency, which is what we plotted above. The command is very
simple. First you compute the relative frequencies with table(gender)/length(gender), as
you already did in the last lecture.
Then you simply apply barplot to this outcome. The outcome has two categories with
relative frequencies 0.7 and 0.3. The y-axis now shows values like 0, 0.2 and so on,
because relative frequencies are proportions and always lie between 0 and 1; the first
bar reaches 0.7.
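A short R sketch of the relative-frequency bar plot, again using a hypothetical ordering of the 10 codes with the stated counts. The base function prop.table() is shown only as an equivalent way to get the same proportions; the lecture itself uses table()/length().

```r
gender <- c(1, 2, 1, 1, 2, 1, 1, 2, 1, 1)  # hypothetical ordering, 7 ones and 3 twos

# Relative frequencies: counts divided by the number of observations
rel_freq <- table(gender) / length(gender)
print(rel_freq)   # proportions 0.7 and 0.3

# Same bar plot, now on the proportion scale (heights between 0 and 1)
barplot(rel_freq)

# prop.table() divides the counts by their total and gives the same result
same <- isTRUE(all.equal(rel_freq, prop.table(table(gender))))
```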
So, for category 1 the value is 0.7 and for category 2 it is 0.3. This is one of the most
basic, fundamental plots. I will show you more options in the next example, but before
that let us first create this example on the R console.
Let me clear the screen and create the data, gender. Now, if I apply barplot directly to
gender, I am deliberately making the mistake again, and the plot looks like this: it shows
the individual values 1, 2, 1, 2 and so on, which is not what you need.
What you have to do instead is give the data as a table, using the table command. As
soon as you press Enter, you get this graph on the right-hand side with only 2 bars: the
first bar indicates category 1 (male) and the second bar indicates category 2 (female).
Now, if you create the bar plot with respect to the relative frequency, you can see that on
the y-axis the absolute frequencies are converted into relative frequencies. You get the
same bar plot, but now on the relative-frequency scale.
(Refer Slide Time: 21:53)
Now I take one more example, and through it I will show you some details related to the
options and how you can control the graphics the way you want. I consider the same data
as in the last lecture: a restaurant has three branches in the city, one in the east, one in
the west and one in the central part.
The restaurant delivers pizza at home. The orders are received at a central place, and the
branch closest to the place of delivery is then contacted. The data consist of 100
deliveries, recording which branch delivered the food: 1, 1, 2, 1, 2 and so on. I have
stored this data vector in direction.
This is the same example as in the last lecture, where we created the frequency table and
obtained the absolute and relative frequencies. Now I want to create the bar plot. First I
make the mistake again and use the command barplot(direction). You get 100 bars, one
for each value, which is not what you want. By now you know why: the data has to be
given as a table of frequencies.
So I write barplot(table(direction)), and now you get a bar plot similar to the one in the
last example: there are 3 directions, 1, 2 and 3, with their frequencies. In this example I
am going to explain how you can add different features.
Please follow carefully. I will first explain with the help of screenshots: I will make some
changes in the command and explain each change with a screenshot. After that I will
quickly show how these things work on the R console.
(Refer Slide Time: 23:44)
Now, if you want this bar plot with respect to the relative frequency, you simply use
barplot(table(direction)/length(direction)), and the values are now shown as relative
frequencies. You have now learnt this type of bar plot; this is the basic, default bar plot
provided by the R software.
Now suppose I have different needs and requirements. Suppose I want to change the
colors of the bars. How do I do it? My command remains the same,
barplot(table(direction)), but after the data I add one option to do the job.
If you want to add colors to the bars, the option is col, and the colors are given in the
form of a data vector. Suppose I want the first bar to be red: I write "red" within double
quotes, then a comma. The second bar should be green, so I write "green", and the third
bar should be blue, so I write "blue".
One very important point: the options I am going to explain here in detail are more or
less valid in other graphics commands as well. There may be some changes, but usually
they work across the common graphics. So you have to understand two things: how you
can make a change in the graphic by giving values to an option, and what effect those
values have on the graphic.
As soon as I add this option, you can see that the first bar becomes red, the second
green and the third blue. That is a very simple option. Now I will keep adding options,
one at a time, to add different features.
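A minimal R sketch of the col option. The 100 delivery codes used in the lecture are not reproduced in the transcript, so the vector below is a hypothetical stand-in (1 = east, 2 = west, 3 = central) built with rep().

```r
# Hypothetical stand-in for the 100 delivery codes (not the lecture's actual data)
direction <- rep(c(1, 2, 3), times = c(30, 45, 25))

# One colour per bar, applied in the order of the table's categories 1, 2, 3
barplot(table(direction), col = c("red", "green", "blue"))
```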
Observe how I add each option and how it affects the outcome of the graphic. I have
copied the earlier command up to this point and added one option, main.
The option main adds the title of the graph. Suppose you want "Directions of Food
Delivery" as the main title of the graph; you can see it is added here. So the rule is: to
add the main title of the graph, use the option main, and give whatever you want to add
inside double quotes.
After this, suppose you want to add a legend, because at the moment a reader does not
know what the different bars indicate. Here the red bar indicates direction 1, the green
bar direction 2 and the blue bar direction 3. To add this information, you create a graphic
in which dir1 is written next to the red color, dir2 next to the green color and dir3 next to
the blue color.
For that you add the option legend.text and give, within a data vector, the legend labels
in the same order as the bars. I have given "dir1", "dir2" and "dir3" within double quotes,
and the same labels now appear in the legend of the graphic. So you can add legends
very easily.
Now suppose you want to add a subtitle to this graph; here I have written "three
directions". To do this I use the option sub, and the subtitle is produced below the plot.
Similarly, if you want to add titles on the axes, you have the options xlab and ylab. In the
graphic I have added "Food Delivery Directions" on the x-axis and "Number of
Deliveries" on the y-axis. To do this I use xlab and, within double quotes, write "Food
Delivery Directions", and then ylab = "Number of Deliveries". So, to write a title on the x
and y axes, use xlab and ylab respectively. That is how we add these different options to
the graph. Now I will take the earlier graph and show you how these commands work in
the R software.
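The full set of options discussed above can be combined in one barplot() call. This is a sketch on hypothetical delivery data (the lecture's 100 values are not reproduced); the option names col, main, legend.text, sub, xlab and ylab are the ones described in the text.

```r
# Hypothetical stand-in for the 100 delivery codes (1 = east, 2 = west, 3 = central)
direction <- rep(c(1, 2, 3), times = c(30, 45, 25))

mids <- barplot(table(direction),
                col = c("red", "green", "blue"),          # bar colours
                main = "Directions of Food Delivery",     # main title
                legend.text = c("dir1", "dir2", "dir3"),  # legend, same order as bars
                sub = "three directions",                 # subtitle below the plot
                xlab = "Food Delivery Directions",        # x-axis title
                ylab = "Number of Deliveries")            # y-axis title
```

barplot() also returns the bar midpoints (stored in mids here), which can be handy for placing text. The legend position can be changed by passing, for example, args.legend = list(x = "topleft").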
I clear the screen, and you can see that the colors have been added. In the next
command I add the main title. Just observe what happens in the graphic as soon as I
execute it: the main title, "Directions of Food Delivery", is added. Then I use the
command for adding the legend.
You can see that the legend is added as soon as I execute it, and its position can also be
adjusted without any problem. After that I add the subtitle; I simply copy and paste the
command to avoid any mistake.
As soon as I execute it, you can see that the subtitle "three directions" is added.
Similarly, if you want to mark something on the x and y axes, you write it like this:
"Number of Deliveries" is added on the y-axis and "Food Delivery Directions" on the
x-axis.
This is the same command that I showed you on my slides. There are many more
options: what I covered in this lecture is not the end of the opportunities but the
beginning. I have taken only a few selected options that I believe most of you already
know, because I thought you would like to start by creating a graphic that you know very
well.
That is why I have chosen common options that are present in most graphics, so that you
are convinced it is not difficult to create such graphics in the R software. The only thing
is that the first time you have to understand the options and add them to your command.
But think of the advantage: sitting in one central office, you create this command only
once.
You circulate it to all your offices across the country or around the world, and from
every place you get graphics with the same colors, the same titles and so on. Comparing
those graphics then becomes very simple. So it is an investment in the short run with
long-run advantages, and besides this you can control each and every parameter of the
graphic.
You can decide the tick marks on the x and y axes, the location of the legend, and many
other things. Now it is your turn: please look into the help menu, pick one of the options,
create a very simple graphic, and keep adding one option at a time. Then check at the
end whether what you get matches what is written in the command.
This will make you a better programmer, and I am sure you will learn how to create
beautiful graphics. Next time I will continue with some more graphics, but I will not go
into this much detail. So please revise this lecture and check which command does what,
because similar commands will be used in the next lecture. Practice, and I will see you in
the next lecture.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 50
Graphics: Sub-divided Bar Plots and Pie Diagrams
Welcome to the course Foundations of R Software. You can recall that in the last lecture
we started a discussion on how to create different types of graphics in the R software,
and we saw how to create the bar plot and the scatter diagram. When I talked about the
bar diagram, I explained that there are different options available in these graphics that
can be used to enhance the information and the look of the graphic in the way you want.
Continuing along the same lines, in this lecture we consider one more graphic, the
sub-divided bar plot, which is an extension of the bar plot, and after that I will give you
some information about the pie diagram. You have seen both of these graphics many
times in books, magazines and so on. So my objective in this lecture is to show how you
can create them in the R software. Let us begin.
The first question is: what is a sub-divided or component bar diagram? A sub-divided or
component bar diagram divides the total magnitude of a variable into various parts.
Let me take an example to show what a sub-divided or component bar diagram means and
how you can create it.
Suppose there are 3 shops, shop 1, shop 2 and shop 3, and different numbers of
customers come to the shops on different days. The number of customers arriving on 4
days is recorded: on day 1, shop 1 had 2 customers, shop 2 had 20 and shop 3 had 30; on
day 2, shop 1 had 26 customers, shop 2 had 53 and shop 3 had 40; and so on.
The records for day 3 and day 4 are similar. All these customers visit the shops between
10 and 11 AM on 4 consecutive days. Now I want to create a sub-divided or component
bar diagram. To create this diagram, the first condition is that the input data has to be
given in the form of a matrix.
So look at this data as a matrix: rows 1 to 4 correspond to the days and columns 1 to 3
correspond to the shops. You have to write all these values in the form of a matrix, and
that matrix works as the input of your sub-divided bar diagram.
I use the command matrix to create a matrix of order 4 × 3, where the data is given in
this order and arranged by row. Finally you get a matrix like this, and if you compare it
with the table of data, both are the same.
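A sketch of entering this kind of data with matrix(). Only days 1 and 2 are fully given in the transcript (the day 3 and day 4 values for shops 2 and 3 are not), so this example uses just those two rows rather than the lecture's full 4 × 3 matrix.

```r
# Customer counts for days 1 and 2 from the example (rows = days, columns = shops)
cust <- matrix(c( 2, 20, 30,    # day 1: shop 1, shop 2, shop 3
                 26, 53, 40),   # day 2
               nrow = 2, ncol = 3, byrow = TRUE)
print(cust)

# Without byrow = TRUE, matrix() fills column by column, which would scramble the table
cust_colwise <- matrix(c(2, 20, 30, 26, 53, 40), nrow = 2, ncol = 3)
```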
Now you know how to enter the data into a matrix. Then I use the same command barplot
that we used earlier, but now I give the variable in matrix format. This creates a
sub-divided or component bar diagram in which each column of the matrix is shown as a
bar, and the sections inside the bar indicate the values in cumulative form. What does this
mean? I will show you with this example.
(Refer Slide Time: 03:44)
Here is the sub-divided or component bar diagram; now we have to understand what it
means and how it shows different types of information. This was the matrix cust, and you
are using the command barplot(cust).
This is the data in the matrix. Let us look at the cumulative values for shop 1. On day 1
the cumulative value is 2. On day 2, the total number of customers who arrived in the
shop on days 1 and 2 is 2 + 26 = 28. On day 3, the cumulative number is the customers
who visited on days 1, 2 and 3, which is 2 + 26 + 42 = 70. Similarly, on day 4 the
cumulative value is day 1 + day 2 + day 3 + day 4, the sum of all the values:
2 + 26 + 42 + 30 = 100.
These values are plotted in the bars, and the sections inside the bars are divided
according to the days. This is what I have written here: the cumulative totals for day 1,
day 1 + day 2, day 1 + day 2 + day 3, and day 1 + day 2 + day 3 + day 4.
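The segment boundaries are running totals, which cumsum() computes directly. This sketch uses the shop 1 values given above (2, 26, 42, 30) and, for the column-wise version, the two fully known days of the example.

```r
# Shop 1: customers on days 1 to 4, from the example
shop1 <- c(2, 26, 42, 30)
bounds <- cumsum(shop1)
print(bounds)   # 2 28 70 100: the heights of the segment boundaries in shop 1's bar

# The same running totals for every shop at once, using the known days 1 and 2
cust <- matrix(c(2, 20, 30, 26, 53, 40), nrow = 2, byrow = TRUE)
running <- apply(cust, 2, cumsum)   # one column of running totals per shop
```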
So in this bar, the full height is the maximum value, which is the total of all the
observations: there are 100 customers in shop 1, so the height is 100. How these
customers are distributed among the days is indicated by the sections.
The first boundary is at 2 for day 1, the next at 28 for day 1 + day 2, then 70 for day 1 +
day 2 + day 3, and finally 100 for all four days. So the bar is divided into different
components. In this case the first bar corresponds to shop 1, the second to shop 2 and the
third to shop 3. Looking at these heights, one can see how many customers visited the
shop on day 1, on days 1 and 2, on days 1, 2 and 3, and so on.
This is the sub-divided bar plot. Now, if you want to make some cosmetic changes, for
example write names for these bars, you can use the option names.arg and write "shop 1",
"shop 2", "shop 3"; these names are printed below the bars.
Similarly, if you want to write "shops" on the x-axis you use xlab, and if you want to
write "days" on the y-axis you use ylab. To change the colors of the subdivisions, you
give them under the option col, with each color within double quotes and the colors
separated by commas.
For example, here I have given red, green, orange and brown: the first subdivision is red,
then green, orange and brown. This is how you can add labels and colors to the
sub-divided bar diagram. Now let us execute this example on the R console and see what
we get. First let me create the data.
This is my data matrix, and if I type barplot(cust), this bar plot appears. If you want to
change the labels, colors and so on, just add these options, and you get this graph. So it
is not a very difficult job to construct such graphs in the R software.
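A sketch of the labelled sub-divided bar plot. Because only days 1 and 2 are fully given in the transcript, the matrix has two rows, so two colours are supplied (one per day) instead of the lecture's four; the labels follow the lecture.

```r
# Known days 1 and 2 only (rows = days, columns = shops)
cust <- matrix(c( 2, 20, 30,
                 26, 53, 40),
               nrow = 2, byrow = TRUE)

# Each column (shop) becomes one bar; the rows (days) are stacked inside it
mids <- barplot(cust,
                names.arg = c("shop 1", "shop 2", "shop 3"),  # labels under the bars
                xlab = "shops", ylab = "days",
                col = c("red", "green"))                      # one colour per row (day)
```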
(Refer Slide Time: 07:51)
Next we consider another very popular graphic, the pie diagram. Pie charts or pie
diagrams also visualize the absolute and relative frequencies, just like bar plots. The only
difference is that in a bar diagram we draw bars, whereas a pie diagram is a circle
partitioned into segments, where each segment represents a category, just as each bar
represents a category in the bar plot.
So in this case you have a circle divided into segments, each of which indicates a
category. The size of each segment depends on the relative frequency: its angle is
relative frequency × 360 degrees.
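The angle rule can be checked numerically. This sketch uses a hypothetical ordering of the 10 gender codes with the counts from the earlier example (7 males, 3 females).

```r
gender <- c(1, 2, 1, 1, 2, 1, 1, 2, 1, 1)  # hypothetical ordering, 7 males and 3 females

# Angle of each segment = relative frequency x 360 degrees
angles <- prop.table(table(gender)) * 360
print(angles)        # about 252 and 108 degrees
total <- sum(angles) # the segments together fill the full circle
```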
So you have to be careful when interpreting it. I am sure you all know about the pie
diagram, so I will not go into more details. In the R software you create a pie diagram as
follows: use the command pie, write the data inside the parentheses, and then give labels,
where you can supply the names of the categories.
(Refer Slide Time: 09:05)
Let us take an example to understand how pie diagrams are created. I could take another
example, but my idea is simple: when you create graphics on the same data set in
different ways, you can easily understand how these different types of graphics differ in
the way they present the information.
So I consider the same data on the gender of 10 persons, where males are indicated by 1
and females by 2, and I collect the data in the variable gender. Now I type pie(gender).
You get 10 partitions, one for each of the 10 observations. Do you want this? No.
Once again I have made a mistake: instead of giving the frequencies of the data, I am
giving the original data, and that is why we get this issue. We have to give the
frequencies.
(Refer Slide Time: 10:15)
To use the frequencies, exactly as in the bar plot, we use table(gender).

If you now use pie(table(gender)), you get this graph: very clearly, this segment is
category 1, which is male, and this one is category 2, which is female. You can see it is
not difficult to create such a graphic.
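A sketch of the same mistake and fix for pie(), using a hypothetical ordering of the 10 codes. The labels argument supplies readable category names in the order of the table.

```r
gender <- c(1, 2, 1, 1, 2, 1, 1, 2, 1, 1)  # hypothetical ordering, 7 males and 3 females

pie(gender)                       # mistake: 10 slices, one per observation

freq <- table(gender)             # correct: 2 slices from the frequency table
pie(freq, labels = c("Male", "Female"))
```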
Now let me take one more example, the same pizza home delivery data: there are 3
branches of a restaurant, located in the east, west and central parts of the city, indicated
by the codes 1, 2 and 3 respectively. The coded values are collected as follows and
stored in the data vector direction.
The data indicate that 100 people order food. Their calls are received at a central place,
where it is decided which branch can deliver the food most quickly, and that branch
delivers the food in that part of the city. So the branches serve as the east branch, the
west branch and the central branch.
(Refer Slide Time: 11:42)
Now, if you create the pie diagram of this data, you write pie(table(direction)), and it
looks like this: this is the east branch (category 1), this is the west branch (category 2)
and this is the central branch (category 3).
Looking at this, you can see that the western branch supplies most of the pizzas in the
city. After this, it is up to you how informative and attractive you want to make the
graphic; for example, you can add colors, titles and so on. To add colors to the same pie
diagram, I use the same option, col, with the data vector "red", "green" and "blue".
This changes the colors of the segments in the same order in which the colors are
written under the option col: first red, so category 1 is red; then green, so the second
category is green; and the third category is blue. After that, the main title is given under
the option main, and it appears here.
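A sketch of the coloured, titled pie diagram. The delivery vector is a hypothetical stand-in for the lecture's 100 values, and the labels East, West and Central are assumptions matching the codes 1, 2 and 3.

```r
# Hypothetical stand-in for the 100 delivery codes (1 = east, 2 = west, 3 = central)
direction <- rep(c(1, 2, 3), times = c(30, 45, 25))
freq <- table(direction)

pie(freq,
    labels = c("East", "West", "Central"),   # category names, in table order
    col = c("red", "green", "blue"),         # segment colours, same order
    main = "Directions of Food Delivery")    # main title
```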
(Refer Slide Time: 12:30)
You can see it is not a very difficult job, but let me show you these pie diagrams on the
console so that you can be confident about it.

Let me clear the screen and create my data, gender. If I type pie(gender), it surely makes
the same mistake, so you have to write table: pie(table(gender)) gives this graph.
Similarly, for the direction data: let me create the data in direction, and if I create the pie
diagram with table(direction), it looks like this.
If you want to make these types of changes, you can see they are made very easily: the
main title "Direction of Food Delivery" is added, the segments are given different colors,
and if you want to label them with names, that is also very easy.
After these two graphics, let me show you a very important operation that is needed
when you create such graphics. Many times you want to put several graphics on the same
page; for example, one graphic here and another graphic here.
Similarly, you can also arrange four graphics, one in each quarter of the page. To put
multiple graphics in a single plot, we use the command par and, inside the parentheses,
the option mfrow, all in lowercase letters.
This adjusts the graphical parameters so that we can create the graphics on the same
page. We give mfrow a data vector c(p, q), that is, we write par(mfrow = c(p, q)). This
creates the plotting area as a p × q array, which is filled row by row.
Let me take an example to show you how this works. We recently used the data in
direction. I want to first create a bar plot of this data and then put the pie diagram of the
same data on the same page, next to it. So I want the two graphics side by side, that is,
1 row with 2 columns.
You can arrange them any way you want; I just want to show you how it works. I
consider the same data set, direction, which I have just used, and then give the option
par(mfrow = c(1, 2)). This creates a plotting area of a 1 × 2 array.
This is step 1. In step 2 I create the first plot, the bar plot of the data in direction, and in
step 3 I create the pie chart. If you follow this sequence, you get this type of graphic.
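The three steps can be sketched in R as follows, on hypothetical delivery data. Saving the value returned by par() and passing it back afterwards is a common way to restore the previous layout; that restore step is an addition, not part of the lecture's commands.

```r
# Hypothetical stand-in for the 100 delivery codes (1 = east, 2 = west, 3 = central)
direction <- rep(c(1, 2, 3), times = c(30, 45, 25))

# Step 1: split the device into 1 row x 2 columns, keeping the old setting
old_par <- par(mfrow = c(1, 2))

# Step 2: the first plot fills the left panel
barplot(table(direction))

# Step 3: the second plot fills the right panel
pie(table(direction))

# With par(mfrow = c(2, 1)) the same two plots would be stacked vertically
par(old_par)   # restore the previous layout
```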
(Refer Slide Time: 16:01)
You get this type of graphic, and later I will show you that if you use mfrow = c(2, 1),
the same bar plot and pie chart are arranged one below the other. Let me show you these
things on the R console, which is more interesting.
(Refer Slide Time: 16:22)
The direction data is already here, so I do not need to create it again; I can show it to
you. I will clean this graphics window so that it looks completely fresh, and now I
execute step 1 like this.
You can see it has created a screen that is completely blank. Now I take the first
command, barplot, and as soon as I execute it the bar plot is created here.
If you want to create the pie chart, give the command for the pie chart next, and you can
see the pie chart is created here. Now I close this window and redo things, changing the
parameter mfrow from c(1, 2) to c(2, 1). You can see this creates a blank space for the
graphics.
(Refer Slide Time: 17:35)
Now I give the barplot command, and as soon as you press Enter you get the bar plot like
this. Let me clear the screen.
Then, if you give the command for the pie chart, the pie chart appears here. You can also
control the size of these graphics very easily; I have reduced the size of this graphics
window, otherwise it would look like this. Now I come to the end of this lecture. This
was a fairly simple lecture in which I have given you 2 types of graphics.
One is the sub-divided bar plot and the other is the pie chart; both are quite popular and
quite useful. After that I have shown you a very important aspect: how you can combine
different graphics on a single sheet. The command is the same each time; the more
important part is the order in which you execute the commands. The first graphic you
execute goes into the first position, and the second graphic you execute goes into the
second position.
Here I have put only two graphics on a page, but you can use any layout, 2 × 2, 2 × 3
and so on. Try to create more graphics, put them on a single page, and adjust the size of
the graphics; these options are entirely in your control. Practice, take some examples and
learn these graphics, and I will see you in the next lecture with more graphics. Till then,
goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 51
Graphics: Histogram
Hello friends, welcome to the course Foundations of R Software. You can recall that in the last two lectures we were talking about various types of graphics, and we had considered graphics which are essentially meant for categorical variables.
We continue along the same lines in this lecture, and today we consider a very popular graphic, the histogram. A histogram looks like a bar plot, but the main difference is that bar plots and, equivalently, pie diagrams are used for categorical data, that is, when the data are in the form of categories.
A histogram is used when your data are continuous, that is, non-categorical, for example height, weight or age. If you take the actual values of height, weight and age in, say, centimeters, kilograms or years, they are continuous data, the data from continuous variables.
So, today we are going to learn how to create a histogram in the R software. What a histogram is, I know you know better than me, so let us begin and learn how to create one.
The first question is how a histogram is constructed and how it differs from a bar plot. If you look, a bar plot looks like this and a histogram looks like this; that means, in a histogram the bars are placed next to each other without gaps, right. The basic idea of a histogram is that the data can be grouped into different classes, their frequencies can be counted, and a bar with a certain height can be drawn for each class, right.
But please do not think that in a histogram the height of the bar is proportional to the frequency; that happens only in the case of a bar plot. In a bar plot the height of the bar is proportional to the absolute or relative frequency of the data, but that is not the case in a histogram. Histograms are constructed when the data are continuous, and in a histogram it is the area of the bar that matters.
That means the height of the bar times the width of the bar, that is, the area, is proportional to the frequency or the relative frequency. Now, what really happens? A histogram can have one bar like this and another bar like this, right; as such that is correct, but it is not convenient in practice, because by looking at these two bars you cannot easily tell which one has the larger frequency.
That is why, in practice, people keep the widths of the bars in a histogram the same. If the widths are the same, then the difference between the bars comes only from their heights, and that is why most histograms you see have bars of equal width, right. But these widths need not always be the same; that is what you have to keep in mind.
In the R software, the command to create a histogram is very simple: hist. If you use hist, it shows the absolute frequencies on the y axis. If you use the option freq, which is short for frequency, and set it to FALSE, that is freq = FALSE, then the histogram is drawn on the density scale: each bar's height is the relative frequency divided by the class width, so the total area of the bars is one. With equal class widths, the bar heights are therefore proportional to the relative frequencies.
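A sketch comparing the two scales, assuming 50 simulated heights (set.seed and rnorm stand in for the lecture's height data, which is not reproduced here). The returned hist objects show that on the freq = FALSE scale, bar area equals relative frequency.

```r
set.seed(1)                                     # hypothetical heights in cm
height <- round(rnorm(50, mean = 165, sd = 8))

h_freq <- hist(height)                          # y axis: absolute frequencies (counts)
h_dens <- hist(height, freq = FALSE)            # y axis: density

widths   <- diff(h_dens$breaks)                 # class widths
rel_freq <- h_dens$counts / sum(h_dens$counts)  # relative frequencies

# Bar area (density * width) equals the relative frequency of each class
all.equal(h_dens$density * widths, rel_freq)    # TRUE
sum(h_dens$density * widths)                    # total area of the bars (one)
```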
One thing I would like to mention here: how to create frequency tables from continuous data is an entirely different topic, which I have not covered in this course. But I believe you have all done it in elementary classes. If you recall, you used to write down the class intervals of the frequency table, count the number of values lying in each interval to get its frequency, take the midpoints of the class intervals, and then draw all such graphics. I am not going in that direction here; for that, I request you to go through an elementary book and see how frequency tables are created.
In the case of the histogram, I am talking about the frequencies from the frequency table of continuous data, right. In the case of discrete data, you had seen that you simply counted; for example, for gender you simply counted the number of males and females, and that gave you the frequencies, right.
As with all the other graphics, hist also has many options: main gives the title of the chart, col chooses the colours of the bars, and xlab sets the title of the x axis.
Similarly, xlim and ylim specify the range of values on the x and y axes, that is, the x limits and y limits, and so on. Once again I strongly recommend you to look into the help on hist for more details.
Here I will take a very small example and show you how these things can be created. I consider the data used earlier, in which we had collected the heights of fifty persons in centimeters, stored in the data vector height like this, right.
(Refer Slide Time: 05:51)
Now I will create the histogram of this height data. Inside the parentheses of hist I simply write height, that is, hist(height), and you can see that this is different from the bar plot. In the case of the bar plot you first converted the data into frequencies using the command table.
Then barplot was used, but in the case of the histogram you simply use the original variable with the command hist. That is what you have to keep in mind, and that is why I always say you have to see how R works and choose your inputs accordingly, right. So, you can see the result here, and a couple of things are done automatically.
For example, it takes the label of the x axis directly from the name of the variable and writes the title "Histogram of height" automatically, and the y axis shows Frequency, right. So, the height data have been classified into a frequency table, and the frequencies of those class intervals have been plotted here, right.
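Under the same assumption of simulated heights, this sketch shows that hist() returns the frequency table it built (class limits in breaks, frequencies in counts), and that the same counts can be reproduced with cut() and table(), using hist()'s default right-closed classes.

```r
set.seed(1)                                    # hypothetical heights in cm
height <- round(rnorm(50, mean = 165, sd = 8))

# Title "Histogram of height" and x label "height" come from the variable name
h <- hist(height)

h$breaks                                       # class limits chosen automatically
h$counts                                       # frequency of each class

# Rebuild the frequency table by hand: classes (a, b], first class closed on both ends
manual <- table(cut(height, breaks = h$breaks, include.lowest = TRUE))
manual
```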
(Refer Slide Time: 06:49)
Now, if you want to make some alterations to this histogram, that is very simple: you use the same type of options which you have learned in the earlier lectures. For example, if you want the title to be "Heights of persons", this is controlled by the option main, and you simply write main = "Heights of persons". Similarly, suppose you want the colour of this graphic to be green.
I have used the option col = "green", and you can see the colour of this graphic becomes green. Similarly, if you want to write Heights on the x axis, you use the option xlab = "Heights".
You have to be a little bit careful here. In the earlier graphic R also used height on the x axis, but that was "height", all in lower case, taken from the variable name. Here I have given "Heights", where H is a capital letter and an s is added. So, do not get confused that this is coming automatically, right. After this, if you want to give a title on the y axis, that is given by ylab, right. So, you can see it is not very difficult to add these options to the histogram.
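The options from this part of the lecture put together, again on simulated heights (hypothetical data); the ylab text is an illustrative choice.

```r
set.seed(1)                                    # hypothetical heights in cm
height <- round(rnorm(50, mean = 165, sd = 8))

h <- hist(height,
          main = "Heights of persons",         # overall title
          col  = "green",                      # fill colour of the bars
          xlab = "Heights",                    # typed by us, not taken from the variable
          ylab = "Number of persons")          # y-axis title
```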
(Refer Slide Time: 08:01)
Now I will explain some more options available in hist, and you have to understand what is really happening. For example, there is an option called density.
This option adds the kind of shading lines you can see here inside the bars; the value of density controls the shading, that is, how many lines per inch there should be, right. If you increase the value of density, there will be more lines per inch. All the other options are used just as earlier.
But now I am using col = "red". Here col controls the colour of the histogram, that is, of these shading lines, right. I use xlab = "Heights" and ylab = "Number of persons", and only two changes are made: I add the option density = 2, and the colour is red.
The outcome looks like this; you can see that these shading lines have been created, right. Now let me do one more thing, just to show you how to interpret this option.
(Refer Slide Time: 09:16)
To see how density changes the plot, I simply increase its value from 2 to 8, and you can see that the lines have now come closer to each other, right.
Along with density, I can also use one more option, angle, right. The angle option controls the slope of the shading lines, measured in degrees in the counter-clockwise direction. The rest of the command is the same as before; you can see the inclination of the lines here, and now it is different, because the lines are drawn at an angle of 100 degrees.
So, the lines now look like this, whereas earlier they looked like this. Where you use this is up to you, but my job was to convince you that you can do such things very easily in the R software, and that is what I am trying to do here, right.
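To compare the effect of density = 8 with and without angle = 100, the two versions can be drawn side by side. This again assumes the simulated heights and relies on hist()'s default shading angle of 45 degrees for the left panel.

```r
set.seed(1)                                    # hypothetical heights in cm
height <- round(rnorm(50, mean = 165, sd = 8))

old_par <- par(mfrow = c(1, 2))                # two histograms side by side
h1 <- hist(height, density = 8, col = "red",
           main = "density = 8")               # default angle: 45 degrees
h2 <- hist(height, density = 8, angle = 100, col = "red",
           main = "density = 8, angle = 100")  # lines rotated counter-clockwise
par(old_par)
```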
Besides this, I am trying to make you comfortable with, and confident about, the fact that R can do many of the things that other software can do. Now let me show you these results in the R console. I have prepared my height data here, and if I use the command hist(height), you can see the histogram which I showed you in the beginning, right. Now I make different types of changes so that you can see that these things work.
(Refer Slide Time: 10:44)
You can see that I have now changed the colour to green.
(Refer Slide Time: 10:49)
If you want to change this colour, for example, to pink, it looks like this; you can see I have not changed any other part of the command, right. After that, suppose you want these shading lines using the option density.
(Refer Slide Time: 11:07)
I change the command like this, and you can see the shading here, right. Now I simply change the value of density from 2 to 8; I can show you what I am doing.
(Refer Slide Time: 11:20)
Then you will see that the lines become denser. After that, if you want to use angle as well, I can show you that too. I just add the option angle = 100, and you can see that the direction of the lines has now changed, right.
(Refer Slide Time: 11:34)
So, that was a pretty simple lecture, and you can see that creating a histogram is an easy job, one for which you used to borrow software from your friends. Now it is in your hands, and you can do it very easily. Not only that, you can control each and every aspect of the histogram and make it as beautiful and as informative as you want; that depends on how much you have practiced and how much you have learnt about the hist command.
So, I make a very simple request: please take some data and try the options listed in the help for hist. See what happens when you use a particular option in the hist function. At this moment some of the options in the help may not be really useful to you, but try to identify the options which are helpful and see whether the type of histogram you used to create earlier can now be made more informative using the R software. So, practice it, and I will see you in the next lecture. Till then, goodbye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 52
Graphics: Bivariate and Three-dimensional Plots
Hello friends. Welcome to the course Foundations of R Software. You can recall that in the last couple of lectures we talked about different types of graphics, and I took some very simple examples and explained how you can make them beautiful and impressive.
All those graphics were univariate; that means, you had only one variable for which you wanted to create the graphics. When you increase the number of variables, the way the graphics have to be prepared also changes. For example, if you have two variables, you would like to see the joint behavior of the two variables.
For example, take the height and weight of small children: as the height increases the weight also increases, and as the weight increases the height also increases. In this case, if you have paired data, that is, the height of child 1 and weight of child 1, height of child 2 and weight of child 2, and so on, then you would like a bivariate plot so that you can see the joint variation of the two variables and how they affect each other.
Similarly, if you have more than two variables, you would like to create a graphic in three dimensions. We can see in three dimensions, but we cannot draw a graphic in three dimensions on paper or on a screen. What we do instead is plot in two dimensions in such a way that the graphic gives the effect of three dimensions. These things are available in a very nice way in the R software.
Covering everything about graphics is very difficult for me; possibly we could have a full course on graphics in the R software. So, once again I will take some representative graphics in today's lecture, and then I will stop with the graphics. We are now coming very close to the end of the course also.
After that, I would expect you to look into the help, the course material, books etc., see what the different possibilities are, and practice yourself. With this request, let us begin this lecture and understand two-dimensional and three-dimensional graphics in the R software, right.
Whenever we have two variables, from the statistics point of view we are always interested in finding out their association. For example, does the number of hours of study affect the marks obtained in the examination? As teachers we always say that if you study more you will get more marks. Similarly, when the outside temperature rises, the power consumption also increases.
For example, in summer people use air conditioners, coolers etc., whereas in pleasant weather, when it is neither too cold nor too hot, people do not use this equipment and the consumption of electricity decreases. Similarly, the weight of small children increases as their height increases under normal circumstances, and vice versa.
(Refer Slide Time: 03:30)
So, you are always interested in learning how to study the association of two variables. In statistics this is a very big job, and for it we have both graphical procedures and quantitative procedures. In this setting, the observations on the two variables are related to each other.
The first question is: how do we know whether the variables are related or not? Sometimes you also want to know the degree of relationship between the two variables, whether it is strong, moderate or weak. To answer these questions, statistics offers two types of procedures: graphical procedures and quantitative procedures.
(Refer Slide Time: 04:34)
Our main question now is how to judge, or graphically summarize, the association of two variables. Suppose we have two variables, which I denote as X and Y, and we have a small number n of paired observations on X and Y, denoted (x1, y1), (x2, y2), ..., (xn, yn). For example, suppose I have data on the height and weight of 5 children. Then x1 is the height of child 1 and y1 is the weight of child 1.
Similarly, for child 2, x2 is the height and y2 is the weight, and so on. These are called paired observations, right. We have such observations and we want to plot them.
A plot of paired observations is called a scatter plot. You have already learned in earlier lectures that you can create a scatter plot using the command plot, but there it was for a single variable, right. Such a scatter plot of paired data can reveal the nature and trend of a possible relationship, and this relationship can be linear or non-linear.
For example, if a scatter plot looks like this or like this, then from plot number 1 you can clearly conclude that the relationship between X and Y is approximately linear, whereas in plot number 2 you can say that the relationship between X and Y is non-linear, right.
Similarly, the strength and trend of a relationship can also be studied from scatter diagrams. Let me call these pictures 1, 2, 3 and 4, right. If you look at picture 1 and picture 2, what do you see? In picture 1 the points are very close to the line, and in picture 2 the points are quite far away from the line which I have drawn.
So, in case 1 you can say that the relationship between X and Y is quite strong, whereas in case 2 the relationship between X and Y is not so strong. The situation is similar in pictures 3 and 4; there is a similar type of relationship, but with one difference. In pictures 1 and 2 the line was going like this, but now the line is going like this, right.
The line slopes downward here, so the relationship between X and Y is decreasing, whereas in cases 1 and 2 it was increasing. As for the degree of relationship, in picture 3 the points are very close to the line drawn here, and in picture 4 the points are not so close compared to picture 3.
So, we can say that in case 3 you have a strong negative linear relationship, and in case 4 you have a moderate negative linear relationship. This is how we can judge the strength and trend of a relationship by looking at scatter diagrams.
If there is no pattern between X and Y, like in this one, then you can say that there is no clear relationship. So, I have explained how you can draw a conclusion about the relationship from the scatter diagram. In practice, people look at both the direction and the degree of the linear relationship, and then combine the graphical and quantitative aspects together to take a final call.
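The four patterns can be reproduced with simulated data. This is an illustrative sketch, not the lecture's pictures: y is a straight-line function of x plus random noise, and a smaller noise standard deviation gives points closer to the line, that is, a stronger relationship.

```r
set.seed(123)                        # simulated data, purely for illustration
x <- runif(40, min = 0, max = 10)
strong_pos <-  2 * x + rnorm(40, sd = 1)   # points close to an increasing line
weak_pos   <-  2 * x + rnorm(40, sd = 6)   # same trend, more scatter
strong_neg <- -2 * x + rnorm(40, sd = 1)   # points close to a decreasing line
weak_neg   <- -2 * x + rnorm(40, sd = 6)   # decreasing trend, more scatter

old_par <- par(mfrow = c(2, 2))      # pictures 1 to 4 on one page
plot(x, strong_pos, main = "1: strong positive")
plot(x, weak_pos,   main = "2: weaker positive")
plot(x, strong_neg, main = "3: strong negative")
plot(x, weak_neg,   main = "4: moderate negative")
par(old_par)
```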
(Refer Slide Time: 07:49)
That is the job we do as statisticians, but here our main objective is how to create such bivariate plots, which can help us in taking this type of call. Bivariate plots give us first-hand visual information about the nature and degree of the relationship between two variables, right. This relationship can be linear or non-linear, and now we will discuss this through an example and create the bivariate plots.
Earlier you used the command plot. The same plot command can also be used for making bivariate plots, and the options you learned for plot earlier can be used here without any problem.
You write plot, and within the parentheses you write the data vectors x and y separated by a comma, that is, plot(x, y). After that you have many options.
For example, there is an option type. If you give type = "p", which is the default, you get points. If you give type = "l", you get lines. With type = "b" you get both points and lines. With type = "c" you get only the line part of the "both" option, that is, the lines with gaps where the points would be. With type = "o" the lines are overplotted on the points, "s" is for stair steps, and "h" is for histogram-like, or high-density, vertical lines. What do these look like? I will show you these options on the same data, right.
Besides type, you have other options also, like main for giving an overall title, sub for a subtitle, xlab for the title of the x axis, ylab for the title of the y axis, asp to control the aspect ratio, and so on.
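A sketch that draws every type value discussed above on one page, plus one panel using main, sub, xlab and ylab. The x and y values are hypothetical.

```r
set.seed(42)                                   # hypothetical data
x <- 1:10
y <- round(runif(10, min = 20, max = 40))

types <- c("p", "l", "b", "c", "o", "s", "h")
old_par <- par(mfrow = c(2, 4))                # 8 panels on one page
for (tp in types) {
  plot(x, y, type = tp, main = paste0('type = "', tp, '"'))
}
# Last panel: the title-related options
plot(x, y, main = "Overall title", sub = "A subtitle",
     xlab = "Title of x axis", ylab = "Title of y axis")
par(old_par)
```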
(Refer Slide Time: 09:36)
Let me take a very simple example and show you these graphics on this data, so that you can understand them very easily.
Suppose 20 students have appeared in an examination and their marks are out of 500. Each of these 20 students was asked how many hours they studied in a week. We want to see whether the number of hours of study in a week and the marks obtained in the examination are interrelated or not. It is a very common statement by our teachers and family members that if you study more you will get more marks.
We want to see, on the basis of this data, whether this holds true or not. You can see the data here on the marks and the number of hours of study in a week. For example, the 1st student studied 23 hours and got 337 marks, the 2nd student studied 25 hours in a week and got 316 marks, and so on, right.
(Refer Slide Time: 10:34)
All this data is here, and I have stored it in two different data vectors, one for the marks and one for the hours. The correspondence is very simple: the value at the first position in marks and the value at the first position in hours both belong to student number 1, and so on.
Now I have two data vectors, and I use the command plot. The general form is plot(x, y), so here I use plot(hours, marks). You get this type of graph: hours are on the x axis, marks are on the y axis, and each student is a dot, right. You know that you can change the colour of these dots etc. very easily, but now let me show you what you can conclude.
Suppose, hypothetically, that you draw a line passing through the points like this. Then you can judge how close the points are to this line, and that will give you an idea of the degree of linear relationship, or degree of association, between hours and marks on the basis of the given set of data, right. Later on I will show you how you can obtain the equation of such a line and how you can plot it.
Now let me change the options, right. Suppose I take the option type = "l", where l is for line. All the points in the earlier graph are still here, but now they are joined by lines like this, right. This is what happens with type = "l".
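A sketch of the two plots, with one assumption: only the first two students' values (23 hours, 337 marks; 25 hours, 316 marks) are quoted in the lecture, so the 20 pairs below are simulated stand-ins, not the lecture's data.

```r
set.seed(2024)                                 # simulated stand-in data
hours <- round(runif(20, min = 10, max = 40))  # weekly study hours
marks <- round(120 + 7 * hours + rnorm(20, sd = 25))  # marks out of 500

plot(hours, marks)                             # type = "p" (points) is the default
plot(hours, marks, type = "l")                 # the same points joined by lines
```

type = "l" joins the points in the order they are stored in the vectors, not in increasing order of hours, which is why such a line plot can zigzag back and forth. Sorting first, for example o <- order(hours) followed by plot(hours[o], marks[o], type = "l"), gives a line that runs from left to right.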
(Refer Slide Time: 12:04)
Similarly, if you take type = "b", you can see that points and lines appear together; this is the meaning of "b". And if you change the option to type = "o", you can see that the lines are overplotted on the points; this is the meaning of "o".
(Refer Slide Time: 12:25)
Similarly, type = "h" means histogram-like, or high-density, vertical lines: every point gives a vertical line dropping down to the x axis, right. And with type = "s", all the points are joined like a staircase, right. So, s means stair steps, like the steps of the stairs in your home on which you walk.
(Refer Slide Time: 12:51)
These are the different types, and I will show them in the R console. If you want to add some options, you already know what to do. For example, to write "Number of weekly hours" on the x axis, you give the option xlab, and to write a title on the y axis you use the option ylab. To add the title of the plot, "Marks obtained versus number of hours per week", you use the option main and give the title inside double quotes, and so on. You can also change the colour etc. These are very simple things which you can do very easily after learning so much about graphics, right.
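With simulated stand-in data (not the lecture's 20 students), the labelled version of the plot might look like this; the colour is an arbitrary choice for illustration.

```r
set.seed(2024)                                 # simulated stand-in data
hours <- round(runif(20, min = 10, max = 40))
marks <- round(120 + 7 * hours + rnorm(20, sd = 25))

plot(hours, marks,
     xlab = "Number of weekly hours",          # x-axis title
     ylab = "Marks obtained",                  # y-axis title
     main = "Marks obtained versus number of hours per week",
     col  = "blue")                            # colour of the points
```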
So, let me first show you these operations in the R console. First I enter the data on marks and hours, right.
Here is the data on marks and here is the data on hours, right. Now I use plot(hours, marks), and you get this type of plot, right, with hours on the x axis and marks on the y axis. But if you use plot(marks, hours), you will see that the axes are interchanged, like this.
Let me clear the screen and create the graph with plot again. Now I add the option type; I set it to "l" for line, and you can see how things change. With this line option, let me reduce the size of the graphics window so that I can show you everything, right; you can adjust all these things without any problem. Now, instead of "l", if you take "b" for both, you can see that both points and lines are there, right.
Similarly, instead of both, if you take the overplotted type, type = "o", you can see that the graph changes like this. If you take the option "h", which means high-density lines, you get this type of graph, and if you take the option "s", that is stair steps, you get this type of graphic, right. So, this is not very difficult to understand, and after this you can add some titles etc.
(Refer Slide Time: 15:14)
If you simply execute it, you get this type of output, right. I am managing both screens on the same computer, but you can see it very clearly when you enlarge it.
Now I come back to my slides and give you one more command. Sometimes you have several variables and you would like to see their joint variation by creating all possible pairwise plots in a single graphic.
This is called a matrix scatter plot, and it is obtained using the command pairs. The rule is very simple: inside pairs you write cbind and, inside that, the variables, for example pairs(cbind(hours, marks)). If you have more than two variables, you join them separated by commas, say cbind(x, y, z) etc. Since we have taken two variables here, you get a plot like this one, a 2 by 2 arrangement that looks like a matrix; that is why it is called a matrix scatter plot. Along the y axis you have hours and marks, and along the x axis also you have hours and marks; you can see it here, right.
The diagonal positions would be a plot of hours against hours and of marks against marks, which carry no information, so R just shows the variable names there. This panel is a plot of marks against hours, and you can see this type of trend, and this other panel is a plot between the same two variables with the roles of the x and y axes interchanged.
If you take, say, 4 variables, it will look like this, with all the pairwise graphics present. Then it will be your job and your challenge to decide what you really want to conclude, right. My job was to explain that you can do these things in the R software.
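A sketch of pairs() for two variables and for three, reusing simulated stand-in data; the third variable, attendance, is purely hypothetical and only shows how extra columns enter cbind().

```r
set.seed(2024)                                 # simulated stand-in data
hours <- round(runif(20, min = 10, max = 40))
marks <- round(120 + 7 * hours + rnorm(20, sd = 25))
attendance <- round(runif(20, min = 50, max = 100))  # hypothetical third variable (%)

pairs(cbind(hours, marks))                     # 2 by 2 matrix scatter plot
pairs(cbind(hours, marks, attendance))         # 3 by 3: every pair of variables
```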
For example, when I obtain a matrix scatter plot, I may want labels of my choice. I can use the option labels, say "Study hours" and "Marks obtained", and I can set the colour to red. You can see that the labels are controlled by the option labels, and the colour of the dots is changed by the option col. I do not think it will be very difficult for you, but let me show you how to obtain it in the R console.
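The labelled, red version could be written as follows, again with simulated stand-in data; labels replaces the column names shown on the diagonal, and col is passed on to the points in each panel.

```r
set.seed(2024)                                 # simulated stand-in data
hours <- round(runif(20, min = 10, max = 40))
marks <- round(120 + 7 * hours + rnorm(20, sd = 25))

pairs(cbind(hours, marks),
      labels = c("Study hours", "Marks obtained"),  # text on the diagonal panels
      col = "red")                                  # colour of the points
```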
Here is your matrix plot, right. Similarly, if you use the other options, changing the colour and adding the labels, it looks like this. You can see that the labels "Study hours" and "Marks obtained" are added and the colour is now red. So, it is not very difficult, right, and I am sure you are now confident that you can do it very easily.
Now, whenever you make a scatter diagram like this one, I always ask you to imagine a hypothetical line through the points and see whether the points are close to the line or not. We have statistical methods by which we can find the equation of such a line.
So, what happens when we want to plot such a line? The statistical tools are used inside the program, the equation of the line is obtained, and then this line is plotted over the scatter plot.
Now we want to obtain a scatter plot with a smooth curve. A smooth curve may look like this one, whereas if you fit only a straight line to the same data, it may look like this. So now we are looking for a smooth curve.
So, suppose there are two variables which are related.
So, what do we do? A scatter plot is created with a fitted line, which provides additional information on the trend and the type of relationship between the variables. For this we have the command scatter.smooth, written entirely in lowercase letters. It produces a scatter plot and adds a smooth curve to it.
I will show you how. Just for your information, it is based on the concept of LOESS, which is locally weighted scatterplot smoothing. It is used for local polynomial regression fitting: it fits a polynomial surface determined by one or more numerical predictors, using local fitting.
I am not going into these details; these are things we study in statistics, and here we are simply using the method. If you want more information about scatter.smooth, you can look into the help, for example with ?scatter.smooth.
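A minimal sketch contrasting the smooth curve with a straight line on the same data. The data are hypothetical, and overlaying a least-squares line with `lm()` and `abline()` is an addition for comparison, not a step shown in the lecture.

```r
# Hypothetical data (illustrative only)
hours <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
marks <- c(30, 35, 41, 44, 52, 55, 63, 66, 72, 78)

# Scatter plot with a LOESS smooth curve added on top
scatter.smooth(hours, marks,
               xlab = "Study hours", ylab = "Marks obtained")

# For comparison: a least-squares straight line on the same axes (dashed)
line_fit <- lm(marks ~ hours)
abline(line_fit, lty = 2)
```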
(Refer Slide Time: 19:47)
You will see that there are many options. The simplest usage is to write scatter.smooth and give the data on x and y. You can then control the span, the family by which the curve is estimated, the degree, xlab, ylab, ylim, etc.
These arguments work as follows. x and y provide the data; span controls the smoothness parameter; degree is the degree of the local polynomial used; family selects the fitting method, either "symmetric" (a robust fit, the default) or "gaussian" (least squares); xlab is the label for the x axis; ylab is the label for the y axis; and ylim gives the limits on the y axis.
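The arguments listed above can be spelled out explicitly, as in the sketch below (hypothetical data; `span = 2/3` and `degree = 1` are the documented defaults of `scatter.smooth()`). The companion function `loess.smooth()` computes the same kind of curve without plotting it, which is handy for inspecting the fitted values.

```r
# Hypothetical data (illustrative only)
hours <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
marks <- c(30, 35, 41, 44, 52, 55, 63, 66, 72, 78)

scatter.smooth(hours, marks,
               span = 2/3,            # smoothness: larger values give a smoother curve
               degree = 1,            # degree of the local polynomial
               family = "gaussian",   # least-squares fit ("symmetric" is the robust default)
               xlab = "Study hours", ylab = "Marks obtained",
               ylim = c(0, 100))      # limits of the y axis

# Points of the smooth curve, evaluated at 50 equally spaced x values
curve_pts <- loess.smooth(hours, marks, span = 2/3, degree = 1,
                          family = "gaussian", evaluation = 50)
head(cbind(x = curve_pts$x, y = curve_pts$y))
```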
I have simply repeated this example so that you can see it is the same one.
In this example, we have the marks and the number of hours, which I have stored in two data vectors, marks and hours.
So, I use the same data set and create a scatter smooth plot. I use the command scatter.smooth with hours and marks inside the parentheses.
You get this type of graph. Earlier you obtained only the dots, but now there is a line as well. Looking at this line, you can make a better judgment of whether the points are close to the line or not. That is the advantage.
Similarly, if you want to modify the line, you can use different options.
For example, lpars, which is a list (you now know what a list is), with col equal to "red", lwd equal to 3 and lty equal to 3. These control the width of the line (lwd) and its type, that is, the dash pattern (lty), because this line can also be drawn in different styles like this.
To understand these options, I suggest you simply take different values, plot them in the R software and see how they affect the line. Then you will have an idea of how these things change.
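Following the suggestion to try different values, here is a sketch that draws the same smooth plot with three `lpars` settings side by side. The first setting is the one mentioned in the lecture; the other two and the data are hypothetical.

```r
# Hypothetical data (illustrative only)
hours <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
marks <- c(30, 35, 41, 44, 52, 55, 63, 66, 72, 78)

# Each element is a list of arguments handed to lines() for the smooth curve
line_styles <- list(
  list(col = "red",       lwd = 3, lty = 3),  # settings from the lecture
  list(col = "blue",      lwd = 1, lty = 1),  # hypothetical alternative
  list(col = "darkgreen", lwd = 2, lty = 2)   # hypothetical alternative
)

old_par <- par(mfrow = c(1, 3))   # three plots in one row
for (lp in line_styles) {
  scatter.smooth(hours, marks, lpars = lp,
                 main = paste0("lwd = ", lp$lwd, ", lty = ", lp$lty))
}
par(old_par)                      # restore the previous layout
```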
(Refer Slide Time: 21:51)
So, let us first make this scatter smooth plot in the R software so that you become more confident. You can see that this is the graph we wanted. Now it is very convenient to see the hypothetical line and how close the points are to it.
Similarly, if you want to make these types of changes, you can use this command, and you can see that the line changes. These are cosmetic changes, and which ones you use depends on your need.
So, now I take one more topic: three-dimensional scatter plots. You know that three-dimensional scatter plots cannot truly be drawn in three dimensions on a flat screen; what we do is make a two-dimensional plot that looks like a three-dimensional plot.
The command is scatterplot3d, spelled s c a t t e r p l o t 3 d, and scatterplot3d(x, y, z) draws a three-dimensional scatter plot. It is provided by the package scatterplot3d, so you need to install this package and then load it. I will take one very simple example to show you how these things can be done very easily.
This is a very simple hypothetical example which I have created for explanation. I have considered 5 persons. You know that height, weight and age are interrelated: as age increases, under normal circumstances height and weight also increase.
Now we would like to create a three-dimensional figure with three axes x, y and z, taking the variables height, weight and age on these three axes, and see how we can explore the data further.
Person number 1 has a height of 100 centimetres, a weight of 30 kg and an age of 10 years. Person number 2 has a height of 125 centimetres, a weight of 35 kg and an age of 15 years, and so on.
So I create the data vectors for height, weight and age, as you can see here.
Now, you need to use the command scatterplot3d, and you have to be careful with the spelling: scatterplot is in lowercase letters, then comes the number 3, and then the lowercase letter d.
Inside the parentheses you give the values. You first have to install the package scatterplot3d and load it. I have already done this on my computer, but please do it on yours too. Then I create the data vectors on height, weight and age.
As soon as you plot it, it looks like this. These are the data points, and I have chosen the values so that you can see them very clearly.
There are many options in scatterplot3d.
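A minimal sketch of the basic call. Only persons 1 and 2 come from the lecture; the values for persons 3 to 5 are hypothetical placeholders. The plotting call is wrapped in `requireNamespace()` so the script still runs if the scatterplot3d package has not been installed yet.

```r
# Persons 1 and 2 are from the lecture; persons 3 to 5 are hypothetical values
height <- c(100, 125, 140, 155, 170)   # centimetres
weight <- c(30, 35, 45, 55, 65)        # kilograms
age    <- c(10, 15, 20, 25, 30)        # years

# install.packages("scatterplot3d")    # run once if the package is missing
if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  library(scatterplot3d)               # load the package
  scatterplot3d(height, weight, age,
                xlab = "Height (cm)", ylab = "Weight (kg)", zlab = "Age (years)")
}
```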
For example, there is an option angle, say angle = 120. You can see that the inclination of the box changes: the axes, which appeared as age, weight and height, now appear as weight, height and age. So the entire box is rotated, and you can control this rotation too. Now, if you have this option and you write a simple loop that changes the angle from, say, 0 to 360 degrees and then run the program, don't you think this picture will start moving like this?
In this way you can rotate the picture and see where each point is located. This is a very simple logic to view a three-dimensional picture in a two-dimensional way. What you do with it depends on your capability.
Similarly, if you want to change the colour, that is very simple: you can use the option color, and you can see that the colour is now changed to red.
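The angle option, the color option and the rotation loop described above can be sketched as follows. The data for persons 3 to 5 are hypothetical, and the choice of 12 angles spaced 30 degrees apart and the short pause between frames are arbitrary assumptions, not settings from the lecture.

```r
# Persons 1 and 2 are from the lecture; persons 3 to 5 are hypothetical
height <- c(100, 125, 140, 155, 170)
weight <- c(30, 35, 45, 55, 65)
age    <- c(10, 15, 20, 25, 30)

angles <- seq(15, 345, by = 30)   # 12 viewing angles, 30 degrees apart

if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  library(scatterplot3d)
  # One static view: rotated box and red points
  scatterplot3d(height, weight, age, angle = 120, color = "red")

  # A simple loop: redraw with a new angle each time, so the box appears to turn
  for (a in angles) {
    scatterplot3d(height, weight, age, angle = a, color = "red",
                  main = paste("angle =", a))
    Sys.sleep(0.2)                # short pause so each frame is visible
  }
}
```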
(Refer Slide Time: 25:51)
These are the things you can do here. Besides these, we have many more plots: contour plots (contour), dot charts (dotchart), image plots (image), mosaic plots (mosaicplot), perspective plots (persp), etc. It is very difficult for me to cover all these plots, but I will surely show you these commands.
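As a quick illustration of four of the base R functions just named, here is a sketch on hypothetical inputs: a bell-shaped surface for `contour()` and `image()`, invented marks for `dotchart()`, and the built-in `HairEyeColor` table for `mosaicplot()`.

```r
# Hypothetical grid and bell-shaped surface for contour() and image()
gx <- seq(-3, 3, length.out = 40)
gy <- seq(-3, 3, length.out = 40)
gz <- outer(gx, gy, function(a, b) exp(-(a^2 + b^2) / 2))

contour(gx, gy, gz)   # contour lines of the surface
image(gx, gy, gz)     # the same surface as a colour map

# Hypothetical marks of five students for a dot chart
student_marks <- c(A = 62, B = 75, C = 48, D = 90, E = 55)
dotchart(student_marks, xlab = "Marks")

# mosaicplot() needs a contingency table; HairEyeColor ships with R
mosaicplot(HairEyeColor[, , "Female"], main = "Hair and eye colour (females)")
```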
So I use them on the R console. First I load the library, then I store the data on height, weight and age, and then I create the scatterplot3d. As soon as I do it, you can see the plot is created, and if you make the picture larger, you get a better view.
Then, if you change the angle, you can see that the angle actually changes: this was the earlier view, and this is the new one. Similarly, if you want to add a colour, you can see that the colour changes like this. So it is not at all a difficult thing to do.
After this, I just want to give you a very quick example of how beautiful the graphics you can make are; this is an example of perspective plots. I simply take these values. I am not going into the details of what these values are; I just want to give you a demonstration.
Then I use this command, and you get this type of graphic. If you change some parameter values, you get this other type of graphic. Just as an advertisement for the R software, I would like to show you these things on the R console, and then I will stop this lecture.
I have taken these values. I am not going to explain them in more detail, and I leave it to you how far you want to go, but you can see the surface here.
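The lecture's own values for the perspective plot are not reproduced here, so this sketch uses a hypothetical surface, sin(r)/r. It shows how `persp()` draws a surface and how changing the viewing angles `theta` and `phi` and the vertical scaling `expand` changes the picture.

```r
# Hypothetical surface (not the lecture's values): z = sin(r) / r
x <- seq(-10, 10, length.out = 30)
y <- x
r <- sqrt(outer(x^2, y^2, "+"))   # distance from the origin at each grid point
z <- ifelse(r == 0, 1, sin(r) / r)

# First view
persp(x, y, z, theta = 30, phi = 30, col = "lightblue")

# Same surface from another angle, squashed vertically
persp(x, y, z, theta = 120, phi = 15, col = "lightblue", expand = 0.5)
```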
And if you change the perspective like this, you can see the surface changes and now looks like this. You can also increase or decrease its size. So now I come to the end of this lecture, in which I have tried my best to give you a quick review of bivariate plots and a small glimpse of three-dimensional plots.
But, as I said earlier, this is only the beginning of the graphics you can learn in the R software. There are many commands and many types of graphics, and there are packages such as ggplot2 that can create very beautiful, very impressive and very informative graphics. But, as I always say, if I tried to cover all of them in this course, it would become a course only on graphics.
That was not my objective either. My objective was simply to take the fear out of your heart. I feel that I have been successful, and if you have followed the lectures, I am sure you will not be afraid to create any type of complicated graphics in the R software. I will see you in the next lecture; till then, good bye.
Foundations of R Software
Prof. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology, Kanpur
Lecture - 53
Some Examples of R Programming
Hello friends. Welcome to the course Foundations of R Software, and welcome to the last lecture of the course. We have learnt many aspects of the R software: basic commands, basic functions, how to create graphics, how to write functions, etcetera. In this lecture I am not going to take up any new topic; my objective is to show you some salient features of the R software through programmes.
I would like to explain how you can write a programme in the R software, and for that I will take a couple of examples and use them to show you the different features of R programming.
Now, some of the candidates may not know anything about programming, while others may know it very well. So, in the first couple of minutes I am going to give a very brief introduction to programming: what we expect from it and why we do it. After that I will take a couple of examples to explain the different applications and features of R programming.
So let us begin the lecture by understanding the very basic fundamentals of programming and why we need it.
(Refer Slide Time: 01:46)
The first question is: what is a programme, and what are the different steps to write one? A programme is simply a set of instructions or commands written in the sequence of operations: whatever you want to do first is written first, whatever you want to do in the second step is written second, and so on. All the instructions are given in a sequence.
The objective of a programme is to obtain a predefined outcome based on some input variables and some given mathematical function. We could do such calculations and manipulations manually, but we take the help of the computer and instruct it to perform the defined task.
The reason is that the computer is an obedient worker and gets our work done fast. The only issue is that the computer does not understand our languages, such as Hindi or English; it has its own language.
Our problem is that we do not understand the computer's language, and the computer does not understand ours either. So what is the solution? The solution comes through software. The software works like an interpreter between us and the computer: whatever we type in the programme, using its English-like commands, is translated so that the computer can understand it, and the computer does the work.
The result is then translated back into a language we understand. So we say something in the software's language, the software passes it on to the computer, the computer does the task and reports back to the software, and the software re-translates the outcome into our language and informs us.
We have already discussed that programmes in the R software are written in the form of functions. To write a function or programme, we first have to define the objective, and we should know what we really want to obtain as its outcome.
Then we translate it into the language of R, that is, we decide how to communicate with R using commands and functions. To do this, we first identify the input and output variables and their nature: whether they are numeric, strings, factors, matrices, etcetera.
You know all of these by now. The input and output variables can be a single value, a vector, a matrix, or even a function itself.
The input variables of the programme are the arguments of the function, and they are listed within the parentheses of the command function.
Keep in mind that the outcome of one function can also be the input to another function. That is a big advantage of the R software: you can call a function inside another function, and its result works as an input.
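A tiny sketch of the last point, with two hypothetical helper functions: the output of the first is passed directly as the input of the second.

```r
# Hypothetical helper: squares each element of a vector
square_each <- function(v) v^2

# Hypothetical helper: adds up the elements of a vector
add_up <- function(v) sum(v)

# The output of square_each() is the input of add_up()
total <- add_up(square_each(c(1, 2, 3)))
total   # 1 + 4 + 9 = 14
```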
The outcome can be formatted as per the need and requirement: it can be printed, saved as a soft copy or produced as a hard copy.
Here are some very basic and fundamental tips for programmers who are going to work with big data. It has usually been observed that loops slow programmes down and consume more memory, so people prefer to work with vectors and matrices (vectorised operations), which make programmes considerably faster.
At this moment we are handling very small programmes, in which the speed does not make much difference. But when you write big programmes dealing with big data sets, you will see that a programme can sometimes take a long time to execute. In my experience with simulations, this time can range from a couple of seconds to a couple of days.
Always use the hash symbol # to write comments inside the programme, so that you can understand the syntax at some future point.
Use variable names that are easy to understand. For example, if you have two variables, height and weight, it is better to use height for height and weight for weight. If you denote the height by the variable weight, or vice versa, it will be very confusing.
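The tips above can be seen together in a small sketch: comments with `#`, descriptive names, and a loop compared with the equivalent vectorised call. The simulated heights are hypothetical, and the timings will differ from machine to machine, so none are quoted here.

```r
# Hypothetical data: one million simulated heights in cm
set.seed(1)
height <- rnorm(1e6, mean = 165, sd = 10)

# Loop version: accumulate the sum of squares one element at a time
loop_time <- system.time({
  sum_sq_loop <- 0
  for (i in seq_along(height)) {
    sum_sq_loop <- sum_sq_loop + height[i]^2
  }
})

# Vectorised version: one call on the whole vector
vec_time <- system.time(sum_sq_vec <- sum(height^2))

print(loop_time)   # usually noticeably slower
print(vec_time)
```

Both versions give the same sum of squares up to floating-point rounding; only the speed differs.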
Do not forget to initialize the variables, and remove old objects from memory before you begin. Otherwise, some earlier variables or earlier values will take over without our knowing it, and the outcome will be very different.
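A short sketch of why clearing and initializing matter. The leftover value of 100 stands in for a hypothetical object from an earlier session; `rm(list = ls())` removes every object in the workspace, and the accumulator is then explicitly set to 0 before the loop.

```r
running_total <- 100              # hypothetical leftover from an earlier session

rm(list = ls())                   # remove every object from the workspace
was_cleared <- !exists("running_total")

running_total <- 0                # initialise the accumulator before the loop
for (value in c(4, 5, 6)) {
  running_total <- running_total + value
}
running_total                     # 15; without the reset it would have been 115
```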
So, now I take a couple of examples. One thing I should clarify before beginning: my objective here is not really to teach you programming or to explain how you should write a particular programme.
My objective is very simple: I want to show you the different aspects you have to keep in mind whenever you write a programme, and how you have to proceed. For example, I want to write a programme to compute
$$\frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} \quad\text{and}\quad \sum_{i=1}^{n} \left(\frac{x_i}{y_i}\right)^2 .$$
I will show you that this can be written in very simple terms: if the values of $x$ and $y$ are stored in two data vectors x and y, the first quantity can be written simply as sum(x^2)/sum(y^2), whereas the second can be written as sum((x/y)^2).
But my objective is not just to compute these values; I want to show you the steps. There will be a few steps, but at the end I will show you that the programme is very short. I am explaining it in a detailed way, so it will carry over a couple of slides, but that does not mean the programme is long or complicated.
So, suppose I want to compute these two quantities. I denote
$$g = \frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} \quad\text{and}\quad h = \sum_{i=1}^{n} \left(\frac{x_i}{y_i}\right)^2 .$$
Obviously, we need to store the values of $x_i$ in a data vector x and the values of $y_i$ in a data vector y, and $n$ is the number of observations.
In this case, I have taken both summations over the same number of observations. If the numbers of observations are different, you can use $n_1$ and $n_2$ in place of $n$ in the numerator and denominator of $g$ without any problem.
Now, let me explain how I am going to do this, because I wish to show you the application of a loop.
(Refer Slide Time: 08:03)
That is why I am writing it in a more detailed way, so that at least those candidates who do not have a strong programming background can also understand; for those who have a good programming background, this is a very easy example.
As a first rule of programming, remove all the data from your memory, so that you know exactly what is going to be used later on.
Definitely, when you want to execute the programme you need some input values. So I define x and y like this.
Each takes 3 values: x takes 10, 20, 30 and y takes 1, 2, 3. You can take anything, and the number of values can be anything. Now I want to write a programme to find g and h, so I give it a name, say example1. This is the name of the programme.
After that I write the command function, and within the parentheses you have to give the input vectors. Here we have x and y; the value of n I am going to compute from the number of observations in x and y.
So, now I begin the body of the function. For that I write an opening curly bracket, and after that I give all the commands for the computation.
The first value we need that has not been defined is n. Instead of supplying it from outside, I compute it inside using the command length, as length(x). It is always a better idea to write commands in which such quantities are calculated automatically; otherwise there is always a chance of a mistake.
Since I want to use a loop, consider how to find $\sum x_i^2$. First I find $x_1$ and then $x_1^2$; then I find $x_2$ and add $x_2^2$ to get $x_1^2 + x_2^2$; then I find $x_3$ and add $x_3^2$ to the sum obtained earlier, and so on.
To begin the loop, I need some initial values. I define one initial value for $\sum x_i^2$, one for $\sum y_i^2$, and one for $\sum (x_i/y_i)^2$, which I call $\sum z_i^2$. That is why I have defined x1, y1 and z1, which all take the value 0.
Now I begin the loop, and I choose a for loop. For this you write for and then, within parentheses, i in 1:n. Inside the loop, I fill x1, y1 and z1 so that the individual values $x_i^2$, $y_i^2$ and $(x_i/y_i)^2$ can be stored.
I define them element-wise. You know that x1[i], with i inside the square brackets, refers to the i-th element of x1. So I write x1[i] equal to x[i] squared, that is, x1[i] <- x[i]^2.
Whatever is the i-th value of the data vector x is squared and stored in the variable x1 at the i-th position. Similarly, $y_i^2$ is computed by y[i]^2 and stored in the variable y1 at the i-th position.
Similarly, I define one more quantity, $z_i = x_i/y_i$. Then $z_i^2$ is x[i] divided by y[i], whole squared, that is, (x[i]/y[i])^2, and this value is stored in the new variable z1 at the i-th position.
This completes my loop, and I close the bracket. So the body of the loop is within these two curly brackets.
Now I have to obtain the sums of all the values in x1, y1 and z1. I define a variable sum_square_x and, using the function sum, I find $\sum_{i=1}^{n} x_i^2$ as sum(x1).
Similarly, I find $\sum_{i=1}^{n} y_i^2$ with sum(y1) and store it in sum_square_y, and I find $\sum_{i=1}^{n} (x_i/y_i)^2$ with sum(z1) and store it in a new variable sum_square_z.
So now I have obtained $\sum x_i^2$, $\sum y_i^2$ and $\sum (x_i/y_i)^2$, and I need to compute g and h. Since
$$g = \frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2},$$
I define g as sum_square_x divided by sum_square_y. And h is simply sum_square_z, which has already been obtained.
Now I want the output in a format in which a string is printed, then the value of g, then another string, then the value of h, and then a new line. You already know these commands: I use the cat function to get a formatted outcome, and then I get the outcome in the way I want.
If you look at the whole function on a single slide, you can see that it is not a very difficult programme. I have given the input and the initial values, then the loop, then the sums of squares, then the definitions of g and h, and finally the outcome. That is all.
If you look at it in the R console, it will look like this. When you type it inside the R console, a plus sign appears at the start of each continuation line; you already know the meaning of this plus sign.
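Putting the steps together, here is a sketch of the complete loop-based programme as described. The exact wording of the `cat()` message is an assumption, and the final `invisible()` line is an addition (not in the lecture) so that g and h can be reused after printing. Note that the loop assumes x and y have the same length.

```r
rm(list = ls())                     # first rule: start from a clean workspace

example1 <- function(x, y) {
  n <- length(x)                    # number of observations, computed from the data
  x1 <- 0                           # will hold x_i^2
  y1 <- 0                           # will hold y_i^2
  z1 <- 0                           # will hold (x_i / y_i)^2
  for (i in 1:n) {
    x1[i] <- x[i]^2
    y1[i] <- y[i]^2
    z1[i] <- (x[i] / y[i])^2
  }
  sum_square_x <- sum(x1)
  sum_square_y <- sum(y1)
  sum_square_z <- sum(z1)
  g <- sum_square_x / sum_square_y
  h <- sum_square_z
  cat("The values of g and h are ", g, " and ", h, ", respectively.\n", sep = "")
  invisible(c(g = g, h = h))        # added: return the results without printing them again
}

x <- c(10, 20, 30)
y <- c(1, 2, 3)
result <- example1(x, y)
# prints: The values of g and h are 100 and 300, respectively.

# Run it again with other inputs from the lecture
example1(c(10, 12, 14), c(34, 76, 12))
```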
(Refer Slide Time: 13:41)
Now, since you have taken x to be 10, 20, 30 and y to be 1, 2, 3, you write example1 and, within parentheses, x, y. On executing it, the outcome looks like this: the values of g and h are 100 and 300, respectively. You now know how this outcome is obtained.
Similarly, if you take another example with different values of x and y and execute it, the same kind of output appears, in which the new values of g and h are reported.
So another advantage is that, just by supplying values of x and y, you can execute the programme as many times as you want and obtain the corresponding values of g and h. This is the screenshot.
Before I move forward, let me show you this example on the R console. I copy the commands onto the R console, and you can see the programme example1 here.
Now suppose I define x as 10, 12, 14 and y as c(34, 76, 12); it can be anything.
(Refer Slide Time: 15:16)
Then if I write example1(x, y), the outcome looks like this: these are the values of g and h.
If I change the values of x and y, for example by adding some more values so that each now has 5 values in place of 3, and then use example1(x, y) again, I get the new values of g and h, as you can see here.
So you can see it is not a very difficult job to write such a programme and to execute it.
Now, if you ask me how to write this programme efficiently, I can write each of g and h in a single line, like this: g is sum(x^2)/sum(y^2), that is, the sum of $x_i^2$ divided by the sum of $y_i^2$, and h is sum((x/y)^2).
So, just by using the commands of the R software, which are defined in a very mathematics-friendly way, you can write such functions in a single line. That is what I have been telling you from the beginning: R programming and its built-in functions help you a lot in writing programmes compactly and efficiently.
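A sketch of the compact vectorised version. The function name `example1_short` is hypothetical, and it returns the two values instead of printing a message.

```r
# Hypothetical name; same g and h as the loop version, without any loop
example1_short <- function(x, y) {
  g <- sum(x^2) / sum(y^2)   # sum of x_i^2 divided by sum of y_i^2
  h <- sum((x / y)^2)        # sum of (x_i / y_i)^2
  c(g = g, h = h)
}

res <- example1_short(c(10, 20, 30), c(1, 2, 3))
res   # g = 100, h = 300, the same values as the loop version
```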
Now let me take one more example, in which I want to compute a function $f(x, y)$.
If you look at this function carefully, you can see that the expression $\frac{x + \ln y}{y}$ is used in 3 places. So I define
$$g(x, y) = \frac{x + \ln y}{y}$$
and rewrite $f(x, y)$ in terms of $g(x, y)$: it is built from $\left(g(x, y)\right)^2$, $5 + \left(g(x, y)\right)^3$ and $\exp\left(g(x, y)^{2/3}\right)$.
(Refer Slide Time: 16:58)
Now I want to show you something. In this case there are two input variables, x and y, and the output is the value of the function $f(x, y)$.
But $f(x, y)$ depends on the value of $g(x, y)$. So what we can do is compute $g(x, y)$ and use it as an input inside $f(x, y)$. How do we get it done? The advantage is that a complicated programme can be broken into simple components, different people can write these components, and then they can be joined together without any difficulty.
As usual, I first remove all the data and then define the input variables; that is the first step.
In the second step, I define $g(x, y)$ and $f(x, y)$. Writing $g(x, y)$ is very simple: define g as function with x, y within the parentheses, and the body is simply x plus log(y), divided by y, that is, (x + log(y))/y. You have already learned that log() computes the natural logarithm in R, and this completes the function.
Similarly, to define f, since g has already been defined, you bring g directly into it: you can see g(x, y) whole squared here, then 5 plus g(x, y) cubed here, and then the exponential of g(x, y) raised to the power 2/3 here. So I have now written this programme.
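A sketch of the two functions. The definition of g follows the lecture exactly. The operators joining the three pieces of f could not be recovered from the slide, so joining them with `+` is an assumption made only for illustration; the point is that f calls g.

```r
rm(list = ls())

g <- function(x, y) {
  (x + log(y)) / y          # log() is the natural logarithm in R
}

# Hypothetical f: the three pieces named in the lecture, joined by "+" (assumption)
f <- function(x, y) {
  g(x, y)^2 + (5 + g(x, y)^3) + exp(g(x, y)^(2/3))
}

x <- 10
y <- 20
f_value <- f(x, y)
f_value
```

If g(x, y) is negative, `g(x, y)^(2/3)` returns `NaN` in R, because a negative base with a non-integer exponent is not defined for real numbers.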
(Refer Slide Time: 18:29)
Now you can see briefly on a single slide how this programme is written: this is g(x, y) and this is f(x, y).
Notice one very peculiar characteristic of R programming. Here you are using g(x, y) inside f, which means that when you execute the function f, R looks for the function g, and g(x, y) is computed outside f. So the function g must also be available in the workspace where you define and call the function f.
So this is how I write it.
Now, observe what I am going to do; this is very interesting. I simply define x and y, I execute f(x, y), and I get this value.
Now, think and tell me: where is your g(x, y)? You have not computed the value of g(x, y); you have only computed the value of f(x, y). What happens is that when you compute f(x, y), the programme goes outside f and computes the value of g(x, y). Let me explain this using this screenshot.
When you compute f(x, y), control reaches this line and finds that there is a function g(x, y) here. So, R goes outside this programme and searches the same workspace for g. It passes the same input values x and y to the function g, computes g(x, y), and brings the result back into f.
So, you can see that control automatically jumps out of one function into the other function, computes the values there, and brings the numerical values back as an input to the first function, right. This is what is happening here.
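The lookup described above is R's lexical scoping: when a function meets a name that is not defined inside it, R searches the environment in which the function was created (here, the global workspace), and it does this only when the function is called, not when it is defined. The sketch below uses the hypothetical names `outer_fun` and `inner_fun` to show this.

```r
# Hypothetical names for illustration: outer_fun calls inner_fun
outer_fun <- function(x, y) {
  inner_fun(x, y) * 2          # inner_fun is NOT defined inside outer_fun
}

# Calling outer_fun before inner_fun exists fails, because R cannot find it
msg <- tryCatch(outer_fun(1, 2), error = function(e) conditionMessage(e))
print(msg)

# Once inner_fun exists in the workspace, the same call works
inner_fun <- function(x, y) x + y
print(outer_fun(1, 2))   # (1 + 2) * 2 = 6
```

Defining `outer_fun` first is allowed; only the call needs `inner_fun` to be available.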
Similarly, if you change the values of x and y, you get new values, and there is still no need to calculate g(x, y) separately. Now, extend this concept to a bigger example: suppose there is a very complicated function, and as a programmer you divide the programming into different components.
You give these components to different people in your organization, and each of them writes a programme for one small section. Then you combine all of their programmes by calling them from a single programme, right.
So, let me show you these two functions in R. I have copied the functions g and f; you can see g here and f here. Now I choose some values, say x equal to 10 and y equal to 20.
Now, see what I do: I simply execute f(x, y). I am not computing g(x, y) in front of you, right. As soon as I press Enter, it gives me this value.
So, to compute f(x, y), R used the value of g(x, y): it went outside this function, brought back the numerical value of g(x, y), and used it.
Similarly, if you take any other value of x, like this, and repeat the command, you get the new value of f(x, y), right.
You have not used the value of g(x, y) anywhere yourself, yet it is computed automatically. This is a very strong feature of R programming, and it is one reason why the R software gained its popularity, ok.
(Refer Slide Time: 22:00)
Now, we consider one more example. You have seen this type of function in mathematics, and our objective is to make a plot with x on the x axis and f(x) on the y axis, right. What is this function saying? It divides the range of x into 3 parts: when x is greater than 0, the function is given by this expression; when x is equal to 0, the value of the function is 10; and if x is negative, the value of the function is obtained by $\frac{2 + x^3}{x}$. So, I take some values of x on the x axis, and corresponding to them we need to compute the values of f(x) first and then make the plot. We choose the values of x between minus 1 and 5, increasing by 0.2, so they are minus 1.0, minus 0.8, and so on up to 5. I take these values on the x axis and the values of f(x) on the y axis, and then I make the plot.
So, by looking at this function, you can see very easily that we can use the if-else condition. First, we do the basic operations: remove all the data and define the input variable.
Now, to use the if-else condition: you know that if the first condition is true, then its expression is executed; if x is equal to 0, then f(x) is 10; and if neither of them is true, then whatever is left, the third expression, is executed.
So, I use the if-else-if construct that you learnt earlier, and you can see that this programme can be written very easily. First, I take the first condition: if x is greater than 0, then f(x) is computed as exponential of x plus log of 1 plus x cube, upon x square.
So, I write if x is greater than 0, and then within curly brackets I write this expression; you already know how to write this expression in the R language. Then I give the second option: if x is exactly equal to 0, the value of f(x) is 10. I write this under else if, testing x == 0.
And finally, the third option: if the first two conditions are not true, then obviously the third one holds, so it comes under else, and f(x) is computed as $\frac{2 + x^3}{x}$. All these things are enclosed in a function named f: I write f equal to function, then the variable x within parentheses, and then all these 3 commands inside curly brackets.
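The function f described above can be sketched as follows. The grouping of the x > 0 branch, (exp(x) + log(1 + x^3)) / x^2, is an assumed reading of the spoken formula; the other two branches follow the text.

```r
# Piecewise function; the x > 0 branch grouping is an assumption
f <- function(x) {
  if (x > 0) {
    (exp(x) + log(1 + x^3)) / x^2
  } else if (x == 0) {
    10
  } else {
    (2 + x^3) / x
  }
}

f(0)    # 10
f(-1)   # (2 - 1) / (-1) = -1
f(1)    # exp(1) + log(2), about 3.41
```

Because `if` needs a single TRUE or FALSE, this f handles one value at a time; since R 4.2.0, passing a vector such as `c(1, 2, 3)` directly to it gives an error, so apply it element by element instead.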
So far, we have the programme to compute f(x). Now we need to write a programme that creates the curve between x and f(x); for that, we define the values of x and then use the plot command.
So, I define a new programme h. First I have to generate the values of x, so I use the sequence command, seq, from minus 1 to 5 by 0.2. Then I denote the values of f(x) by y and initialize it as y equal to 0.
Then I have to compute the value of f(x), that is, y, for each x[i]; in other words, I compute y[i] for each value x[i], right. For that I write a loop. You can also do it directly, without a loop, but here I am writing a loop. Inside it I write y[i] = f(x[i]). That means, for the ith value of x, the value of y is computed using the function f.
Notice that f is not defined anywhere inside h. What is f? The programme h does not know, because f has been defined externally; you can see that it is not a part of this programme. So, once again, as in the earlier example, R will go outside the programme, compute the value of f, bring it back inside the programme, and continue executing it.
Then I use the simple plot command, plot(x, y, type = "l"), where type "l" means line, right. So, this is my programme.
You can see that this programme is very simple: this is f and this is h, that is all.
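A sketch of the programme h described above follows. The piecewise f is repeated so that the block runs on its own, again with the assumed grouping of the x > 0 branch. Returning the computed values invisibly is an addition for checking, not part of the lecture's programme.

```r
# Piecewise f (assumed grouping of the x > 0 branch), repeated so this runs alone
f <- function(x) {
  if (x > 0) (exp(x) + log(1 + x^3)) / x^2
  else if (x == 0) 10
  else (2 + x^3) / x
}

h <- function() {
  x <- seq(-1, 5, by = 0.2)        # -1.0, -0.8, ..., 5.0 (31 values)
  y <- numeric(length(x))          # initialise y (the lecture starts from y = 0)
  for (i in seq_along(x)) {
    y[i] <- f(x[i])                # f is not defined in h; R finds it outside
  }
  plot(x, y, type = "l")           # type = "l" joins the points with lines
  invisible(data.frame(x = x, y = y))
}

res <- h()
```

Pre-allocating y with `numeric(length(x))` avoids growing the vector inside the loop; starting from `y <- 0` as in the lecture also works.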
(Refer Slide Time: 26:13)
If you now execute it on the R console, this is the screenshot of the programme.
(Refer Slide Time: 26:18)
Now, we do not need it, but just for the sake of illustration, if you take x equal to 1, 2, 3, then the values of f are like this. If you take x equal to minus 1, 2, 3, then the values of f(x) are like this. These are the values of y, right.
Similarly, you can see that f(0) is 10, as it should be, because the function is equal to 10 at x equal to 0. Then at x equal to 8 the value of f is like this, at minus 4 it is like this, and so on, right.
So, now if I execute my function h, what will it do? It will generate the values of x, go outside the function h to compute the value of f(x[i]), bring it back inside the programme, and use the plot command to create the plot, which will look like this. You can see that it is not a very difficult job, and you can very easily do it on the R console, right.
So, let me execute this programme in the R software and see. I simply copy both functions, and you can see here the functions f and h.
This is f and this is h, right. Now we are going to create the plot. Let me adjust my screen.
Now, notice that I am not calling the function f myself. I am simply calling h, and as soon as you press Enter, you will get this type of curve, right.
Similarly, if you change the values of x, or the function f(x), according to your choice, you can create such graphics and plot such functions in the R software without any problem.
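To change the range of x or the function being plotted without editing h each time, one option is to pass both in as arguments. The helper `plot_fun` and the example function `sign_sq` below are hypothetical names. `vapply()` is used here as one possible replacement for the explicit loop: it calls the function once per x value, so it also works for functions that use `if` and accept only a single value.

```r
# Hypothetical helper: plot any one-argument function over a chosen grid
plot_fun <- function(fun, from, to, by) {
  x <- seq(from, to, by = by)
  y <- vapply(x, fun, numeric(1))   # one call per x value, each must return one number
  plot(x, y, type = "l")
  invisible(y)
}

# Hypothetical scalar-only function that uses if/else
sign_sq <- function(x) if (x < 0) -x^2 else x^2

out <- plot_fun(sign_sq, from = -2, to = 2, by = 0.5)
out   # -4.00 -2.25 -1.00 -0.25 0.00 0.25 1.00 2.25 4.00
```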
So, in this lecture I have taken 3 examples, and my objective was essentially to show you that, when programming in R, the input of a programme can also be another function defined outside that programme. This is a very strong feature of the R software, and it has helped make R programming very popular.
You have also seen that within an R programme you can use the R programming language as well as the built-in functions, and you have combined both to find the sum and the mean.
You do not need to write a separate programme for these; you can simply use functions like sum, mean, etc., and all these functions, like very obedient friends, will give you the correct value.
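One way to build the confidence described here is to check a built-in function against a hand computation on small data. The sketch below uses made-up values and compares an explicit loop with `sum()` and `mean()`.

```r
x <- c(2, 4, 6, 8)   # small made-up data that can be checked by hand

# Sum and mean with an explicit loop
total <- 0
for (v in x) total <- total + v
manual_mean <- total / length(x)

# Built-in functions give the same values
sum(x)                  # 20
mean(x)                 # 5
c(total, manual_mean)   # 20 5
```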
Throughout the entire course, I have deliberately taken very simple examples to convince you that, just because R is free software, it does not mean that it will give you wrong values.
Once you are convinced by these small examples, which you can compute manually with your own hand, you can be confident that for bigger data sets and more complicated computations R is very dependable and will give you the correct value, unless someone has made a mistake in the programming.
So, with this lecture I come to the end not only of this lecture but of the course as well. The course finishes with this lecture, but your journey with the R software begins today, after this lecture.
During this entire course, I have tried my best to give you a selection of commands and syntax. But, as I have said many times, this is not the complete list of commands and functions available in the R software; there are many more functions and many more possibilities, and in the last two decades R has made very good progress.
R has now diversified into many areas where people use it, but surely you will agree with me that covering all these topics in one course is not possible. And, as I said at the very beginning, my aim was not to teach you the R software; my aim was to take the fear of learning R out of your heart, that is all. I believe that now there is no more fear in your heart, and you can start flying; the sky is the limit for you in your life, in your career and in your use of the R software.
So, I wish you all the best, and I pray to God that you achieve a good life and more success. I will see you sometime, somewhere, in some other course or perhaps in person. Till then, may God bless you. Goodbye.
THIS BOOK
IS NOT FOR
SALE
NOR COMMERCIAL USE