
Interpretation of Universities Using Multidimensional Scaling and Principal Component Analysis




Multidimensional scaling (MDS) is a type of multivariate analysis that visualizes the similarity
or dissimilarity of data points by displaying them in a two-dimensional plot. There are two
types of multidimensional scaling: classical (metric) multidimensional scaling and non-metric
multidimensional scaling. The data provided is first loaded and the first six rows are viewed
using the head() function. The data contains seven columns: six are numerical, while the
University column contains character names. The data has no missing values or duplicated
rows. The summary() function gives a brief statistical summary of the data.
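The loading and inspection steps above can be sketched as follows; the column names and values here are assumptions modeled on the report's description, since the original data set is not shown:

```r
# Hypothetical miniature of the data set described in the report:
# a University column plus six numeric columns.
uni <- data.frame(
  University = c("Harvard", "Yale", "MIT", "PennState"),
  SAT     = c(14.00, 13.75, 13.80, 10.81),
  Top10   = c(91, 95, 94, 38),
  Accept  = c(14, 19, 30, 54),
  SFRatio = c(11, 11, 10, 18),
  Expense = c(39.5, 43.5, 34.9, 10.2),
  Grad    = c(97, 96, 95, 80)
)

head(uni)             # view the first six rows
summary(uni)          # brief statistical summary of each column
anyNA(uni)            # FALSE: no missing values
sum(duplicated(uni))  # 0: no duplicated rows
```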

In order to perform multidimensional scaling in R, the data has to be converted into a matrix.
This involves first removing the University column from the data set and assigning its values
as the row names. The Euclidean distance is then calculated between the various universities;
it measures the dissimilarity between pairs of observations. The Euclidean distances are
displayed in a heat map to illustrate the similarity structure. Based on the heatmap, the
universities Harvard, Yale, Stanford and MIT are strongly similar in terms of the variables
provided, while there is a large dissimilarity between Harvard and TexasA&M and a smaller
dissimilarity between Georgetown and Yale.
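A minimal sketch of the distance and heatmap steps, assuming a data frame like the one described (the numbers here are made up):

```r
uni <- data.frame(
  University = c("Harvard", "Yale", "Georgetown", "TexasA&M"),
  SAT   = c(14.00, 13.75, 12.25, 10.75),
  Top10 = c(91, 95, 74, 49),
  Grad  = c(97, 96, 92, 67)
)

m <- uni[, -1]                  # drop the University column ...
rownames(m) <- uni$University   # ... and use it as the row names

d <- dist(m, method = "euclidean")  # pairwise Euclidean distances
heatmap(as.matrix(d), symm = TRUE)  # cell shading shows (dis)similarity
```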
For classical multidimensional scaling, the cmdscale() function from the stats package is used
to compute classical (metric) multidimensional scaling. This works by preserving the original
distances between points. The output contains the preserved distances, the eigenvalues, the
doubly centred distance matrix, the additive constant and the numeric vectors. From the plot,
the distances between Harvard, Stanford, Yale, MIT, Duke and Columbia are small, which shows
their similarity. The distances from these universities to PennState, UMichigan and TexasA&M
are large, showing the dissimilarity between the two groups. CalTech and JohnsHopkins have
strongly negative coordinates, showing their dissimilarity to the other universities.
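The classical MDS step can be sketched as below, using cmdscale() from the stats package; the data is again a made-up stand-in:

```r
m <- matrix(c(14.00, 91, 97,
              13.75, 95, 96,
              10.81, 38, 80,
              10.75, 49, 67),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("Harvard", "Yale", "PennState", "TexasA&M"),
                            c("SAT", "Top10", "Grad")))

mds <- cmdscale(dist(m), k = 2, eig = TRUE)  # classical (metric) MDS
mds$points  # 2-D coordinates preserving the original distances
mds$eig     # eigenvalues returned alongside the configuration

plot(mds$points, type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(mds$points, labels = rownames(m))  # similar universities plot close together
```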

Kruskal’s non-metric multidimensional scaling gives a stress value of 4.48976. Stress measures
the goodness of fit of the configuration based on the sum of squared differences between the
fitted and observed dissimilarities. By Kruskal’s guidelines a stress below 5% indicates an
excellent fit, so the two-dimensional configuration represents the dissimilarities among the
universities well.
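Non-metric MDS is available as isoMDS() in the MASS package. A sketch on random stand-in data (the reported stress of 4.48976 comes from the actual data set, not from this example):

```r
library(MASS)  # isoMDS() implements Kruskal's non-metric MDS

set.seed(42)   # make the random stand-in data reproducible
m <- matrix(runif(60), nrow = 10,
            dimnames = list(paste0("Uni", 1:10), NULL))

nm <- isoMDS(dist(m), k = 2)  # returns $points (coordinates) and $stress
nm$stress  # stress is in percent: < 5 excellent, < 10 fair, > 20 poor
```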

Finally, principal component analysis is applied to the data using the prcomp() function on the
numerical columns, with the factoextra package for visualisation. The scree plot displays a
graph of inertia (explained variance) against the principal components. Principal components
are created in order of the amount of variation they cover: PC1 covers the most variation, PC2
the second-most, and PC6 the least. fviz_eig() displays an elbow-shaped curve; the elbow is the
ideal cut-off point, after which the curve flattens out.
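A sketch of the PCA and scree-plot steps; the built-in USArrests data stands in for the report's six numeric columns, and base R's screeplot() stands in for factoextra's fviz_eig():

```r
pca <- prcomp(USArrests, scale. = TRUE)  # PCA on scaled numeric columns

summary(pca)  # proportion of variance explained, in decreasing order
screeplot(pca, type = "lines")  # elbow-shaped curve; cut off where it flattens

# With factoextra installed, the equivalent plot would be:
# factoextra::fviz_eig(pca)
```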

PC1 explains 76.8% of the total variance, which means that over three-quarters of the variation
in the six variables can be encapsulated by just that one principal component. Grad, Top10,
SAT and Expense all contribute strongly to PC1. From the biplot of individuals and variables,
CalTech and JohnsHopkins are observed at a distance along PC1 compared to Harvard, MIT, Duke
and Yale, which shows their dissimilarity.
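The loading and biplot inspection can be sketched the same way (USArrests again stands in for the universities data):

```r
pca <- prcomp(USArrests, scale. = TRUE)

pca$rotation[, "PC1"]  # PC1 loadings: which variables drive the first component
biplot(pca)            # individuals and variables together, as in fviz_pca_biplot()
```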
Some of the advantages of multidimensional scaling over principal component analysis include:
● MDS focuses on the relations among the scaled objects, while PCA focuses on finding the
dimensions that maximize explained variance.
● MDS projects data points into a two-dimensional space such that similar objects are closer
together, while PCA projects a multidimensional space onto the directions of maximum
variability, using the correlation (or covariance) matrix to analyze the relationships
between variables.
