Keywords
QS World University Ranking, Hybrid machine learning algorithms, Data analysis, Educational competitiveness, Prediction accuracy.
Quality education is one of the primary requirements for a successful life, and pursuing higher education at a highly reputed institution makes a substantial difference in shaping an individual's career. While national ranking and accreditation boards for higher education institutions, such as NAAC, are prevalent, world rankings distinguish institutional reputation globally. The QS World University Rankings are a vital gauge for learners, educators, and institutions worldwide, allowing them to analyze and compare the quality and reputation of higher education. Predicting these rankings is difficult owing to data availability concerns and QS’s frequent methodology revisions. Subjectivity and narrow criteria in the rankings further hamper the assessment of university excellence. Machine learning, data scraping, model adaptability, algorithm reversal, and short-term prediction are some existing ways of dealing with these difficulties. In this research, a prediction model for assessing institutional performance in the QS World University Rankings is designed using hybrid machine learning algorithms and optimization techniques.
According to the analysis, two algorithms surpass the others in forecasting ranks. These hybrid models improve the prediction accuracy of the QS world rankings by integrating data analysis with model optimization using Particle Swarm Optimization and the Tabu Search method.
Higher education institutions, such as universities and colleges, are assessed and compared using lists or other methods that consider a variety of criteria and aspects. These rankings are invaluable resources to help prospective students and their families make well-informed decisions about where to pursue higher education. Rankings help students find colleges that best fit their academic and professional objectives by offering insights into the schools’ general caliber, standing, and effectiveness. Additionally, they can assist establishments in identifying areas in need of development and formulating plans to elevate their profile in the international higher education arena. University rankings are vital for several reasons and benefit many stakeholders in the higher education system. First, they help students and their families make well-informed decisions by offering a prompt evaluation of the caliber and standing of an institution and by assisting in the selection of colleges that complement academic and professional objectives. Rankings also function as a type of quality control, highlighting that universities continually provide resources, research, and instruction of the highest caliber. Rankings assist scholars in identifying colleges that are well-known in particular subjects, which facilitates the search for partners and specialized resources.
Furthermore, elite professors, researchers, and students are drawn to highly regarded universities, which enhances the learning environment and promotes intellectual diversity. Internationally recognized rankings offer global visibility and can help improve an institution’s reputation and appeal to donors, which can aid fundraising efforts. For universities, self-evaluation and benchmarking are essential tools that support their efforts to improve quality and pinpoint opportunities for development. Rankings help government agencies and legislators make decisions on how to allocate resources and advance transparency and public accountability in higher education. Owing to its ability to analyze vast datasets, discover patterns, and make informed forecasts, machine learning plays a critical role in prediction across a wide range of domains, including forecasting stock prices, currency exchange rates, and market trends; assisting investors and traders; recommendation systems; autonomous vehicle decision-making; medical diagnosis; outcome prediction; and disease risk assessment. Its use is widespread and growing. Machine learning plays an important role in predicting university rankings because it allows institutions to combine historical and other data sources to create more accurate forecasts of their future rankings. These forecasts enable institutions to take proactive steps to improve their academic and research capabilities, faculty qualifications, and overall appeal to students and researchers, thereby improving their global standing.
In this study, we used machine learning methods to predict the QS World University Rankings. We start with cleaned, preprocessed data and divide it into two parts: 70% for training and 30% for testing. To construct a prediction model, we applied two groups of algorithms. The hybrid machine learning optimization algorithms comprise i) Ridge Regression, Long Short-Term Memory (LSTM), and LightGBM; ii) Particle Swarm Optimization (PSO) and Tabu Search (TS) with base models; and iii) PSO and TS with hybrid models. The hybrid machine learning non-optimization algorithms comprise i) XGBoost (XGB) and Neural Network (NN); ii) Gradient Boosting (GB) and k-nearest neighbors (KNN); iii) Random Forest (RF) and Support Vector Machine (SVM); and iv) SVM, NN, and GB. Metrics including the Coefficient of Determination (R-squared) Score, Mean Square Error (MSE), Root Mean Square Error (RMSE), and Accuracy Percentage (%) were then used to compare the outcomes. Finally, based on these metrics, we determined which model is most effective for predicting university rankings.
University rankings have evolved as a significant means of measuring the quality of higher education institutions in response to global demands and the demand for transparency and efficiency in public organizations. In Ref. 1, Stefan Wilbers and Jelena Brankovic delved into the historical-sociological pillars of university rankings, concentrating on the United States. This study highlights the shift in the perception of organizational success that occurred during the postwar period, a shift that was influenced by functionalism’s ascent as the prevailing theory. An indication of this shift is the grading system for graduate programs, which was established by the American Council on Education (ACE) and the National Science Foundation between 1966 and 1970. In the 1970s, social scientists began to examine the criteria and utility of categorizing institutions of higher learning, and they continue to do so today. Dembereldorj2 assessed the impact of worldwide university rankings on institutions around the world, underlining their significance in both developed and developing countries. While universities in developing economies place more emphasis on research intensity, institutions in advanced economies prioritize competitive competency to achieve top rankings. This study makes the case that resource constraints and the demand for institutional or competitive competency drive worldwide rankings, which in turn actively affect higher education.
Vernon et al.3 insist on the necessity of making real efforts to raise the caliber of research, focusing on developing novel standards that organizations can use to evaluate and enhance their social contributions. Prioritizing quality above quantity is essential for confirming efforts to boost research output, which leads to advancements in science, economic expansion, and public health. The study highlights the deficiencies of current standards and recommends evaluating research outcomes in three dimensions (scientific influence, economic effects, and public health consequences) to provide a comprehensive assessment of research performance within the context of an academic institution. In Ref. 4, Leah Dowsett focused on how Australian universities developed long-term strategies for worldwide rankings over fifteen years by assessing four institutions and discovered that these rankings greatly influenced their research programs. Institutions’ market positioning and rankings have improved as a result of their proactive engagement with the rankings and efforts to influence and react to them. Murat Perit Çakır et al. compared and contrasted worldwide and national university ranking systems5 to highlight divergent views. Global rankings concentrate on research achievement with fewer parameters, whereas national rankings include a wider range of indicators, such as institutional and educational components. The data show little association between national and global rankings, with a few exceptions. In some countries, such as Brazil, Chile, and Poland, national rankings are predicted by global rankings, suggesting a relationship between research success and educational criteria. Owing to the challenge of acquiring accurate per capita data for global rankings, size-related bibliometric criteria have gained increasing attention. National rankings are becoming increasingly prevalent, especially in developing nations.
This creates opportunities for benchmarking and comparative studies that can enhance international ranking systems and provide information on higher education.
In Ref. 6, Adina-Petruta Pavel compares and contrasts the methodology, standards, and stakeholder effects of three influential worldwide university rankings: Quacquarelli Symonds (QS), The Times Higher Education World University Rankings, and the Academic Ranking of World Universities (ARWU), commonly referred to as the Shanghai Ranking. Notably, the global rankings place research ahead of instruction. The findings inspire institutions to improve their operations and place greater emphasis on industry collaboration and innovation. Friso Selten et al. analyzed the techniques of the three significant global university rankings stated above in Ref. 7 and discovered that, while these rankings are somewhat stable over time, there are differences between them. Applying factor analysis with principal component analysis, the study indicates that these rankings fundamentally measure two factors: university prestige and research performance. By correlating and visualizing these factors, it is possible to discern between the rankings. The paper responds to common criticisms of ranking methodology by noting that the variables may not accurately represent the concepts they are supposed to measure, emphasizing how difficult and imprecise it is to use rankings to judge a university’s progress.
In Ref. 8, Moskovkin et al. explored the global battle for university reputation, which began in 2003, focusing on quantitative comparisons of the World University Rankings by ARWU, QS, and THE from 2014 to 2018. The study discusses the remaining challenges with ranking consistency, as well as the approach of aggregating the number of institutions and Overall Scores by nation. It examines differences in university rankings to identify which rankings are most and least consistent within the top 100, and reports the most favorable aggregated indicator values for the US and the United Kingdom. Shehatta and Mahmood collected, examined, and analyzed the top 100 institutions qualitatively and quantitatively according to six significant global rankings published in 2015.9 The six global rankings selected were the Academic Ranking of World Universities (ARWU), Quacquarelli Symonds World University Rankings (QS), Times Higher Education World University Rankings (THE), National Taiwan University Ranking (NTU), US News & World Report Best Global University Rankings (USNWR), and University Ranking by Academic Performance (URAP). For comparison, the number of overlapping universities and Pearson’s and Spearman’s correlation coefficients between every pair of the six global rankings under investigation were used. They found similar traits or independent factors used to determine rank in all the ranking systems studied. Bublyk et al. explored the elements impacting global university rankings in Ref. 10 using data from the QS World University Rankings. Employing statistical and correlation analyses, their study focuses on how the ranking of Lviv Polytechnic National University has changed over time. This research identifies trends, benefits, and drawbacks and establishes a framework for worldwide university growth plans. It also assesses information security compliance and provides comprehensive strategic recommendations for future progress.
Hasan and Abuelrub11 compared the usability of Jordan’s top three university websites based on the outcomes of the heuristic assessment method and Eduroute, one of the main ranking systems. The findings suggest that information regarding the general usability of university websites can be obtained from Eduroute’s rankings. The heuristic evaluation technique used in this study also revealed common usability problems found on university websites. Mahesh provided an overview of several machine learning techniques that might be applied to predict and categorize the data provided in Ref. 12, which serves as a foundation for subsequent predictions of university rankings. Liu et al.13 examined the institutions and their rankings in each country and discovered a correlation between six indicators: academic credibility, employer credibility, staff/student ratio, citations per staff, global staff ratio, and global student ratio. Three ML algorithms were used: XGBoost, random forest, and linear regression. The mean absolute error, mean squared error, and root mean squared error were used to gauge the accuracy of the models. The results demonstrate that XGBoost yields the lowest error values; the lower the error, the more accurate the prediction. In Ref. 14, Gadi Himaja et al. compared and contrasted various regression techniques to recommend a rank prediction system for a national institute. They identified the most accurate technique using evaluation metrics such as R2, MAE, MSE, and RMSE, and they chose Random Forest using threshold value comparison and z-score calculation.
In Ref. 15, Vaibhav Singh et al. used a standardized database of world university ratings by Times Higher Education. The dataset was split into test data from 2016 and training data from 2011 to 2015. Using linear regression, they calculated the expected rank score for teaching, research, citations, and international orientation. Finally, they used the expected total rank score to rank the universities globally. Tabassum et al.16 developed a global university rank prediction algorithm using The Times Higher Education World University Ranking, which was established in the United Kingdom in 2010. Data analysis of prior university rankings by country was conducted to discover the most influential elements or indicators for prediction. Outlier detection algorithms were utilized to generate predicted scores, whereas the rank_score_calculate algorithm was used to predict the feature scores, from which the universities were ranked worldwide. The suggested model was evaluated using the number of matched ranks versus rank deviation, recall, the ROC curve, and accuracy versus deviation. Li17 discovered that utilizing linear regression to evaluate various indicators can better predict the comprehensive score of colleges based on many features. They used a radar chart to compare the indicators examined. Following the investigation, faculty quality was found to be a key determinant of institution ranking. In Ref. 18, Dr. Prakash Kumar Udupi et al. created machine learning models to anticipate global rankings and examined the Quacquarelli Symonds (QS) method of analyzing university rankings worldwide. This study analyzes the information using exploratory data analysis and then evaluates machine learning algorithms using regression approaches to predict worldwide rankings. The QS system rankings are divided into three categories: worldwide overall rating, regional ranking, and global ranking by subject.
Key performance metrics, including teaching, research, employability, university mission, and internationalization, form the foundation of QS global rankings. Boosting regression makes use of the Gaussian loss function to improve the predicted outcomes. The same evaluation metrics as those in Ref. 13 were used.
In Ref. 19, Estrada-Real et al. undertook a feature selection exercise to assess the validity of the six indicators using the Recursive Feature Elimination (RFE) method, the rationale for using the QS technique being the availability of the data on its website. Using supervised machine learning techniques, including multiple regression with panel data, logistic regression, decision trees, random forests, and support vector machines, they created a prediction model with categorical response variables based on the generated training data. Test data and statistical measurements, including R2, p-value, accuracy, sensitivity, specificity, confusion matrix, receiver operating characteristic (ROC), and Area Under the Curve (AUC), were used to evaluate the model’s output. Both logistic regression and random forest analyses yielded comparable results. In Ref. 20, Yan-yan SONG and Ying LU investigated how decision trees are commonly employed in data mining to construct classification systems and forecasting models. In Ref. 21, Nishi Doshi et al. used exploratory data analysis (EDA) to analyze ranking data using correlation heatmaps and box plots, and introduced a novel method for extracting decision paths for rank improvement using Decision Tree (DT) algorithms and data visualization. The approach, which has been refined with Laplace correction, provides institutions with a quantitative means to analyze development potential, plan long-term actions, and design distinctive success roadmaps. Sziklai applied the Weighted Top Candidate (WTC) approach in Ref. 22 to rank smaller academic institutions in specific research areas. Jajo and Harrison provided an overview of the available ranking systems in Ref. 23, including the founder, year, range, and measures. They employed partial least squares path modelling to create an achievement index that can be used to compare university performance across several ranking systems simultaneously.
Roba Elbawab utilized unsupervised machine learning to group institutions.24 The results divided the universities into four categories. Mohamed El Mohadab et al. evaluated a variety of classification strategies,25 including supervised, semi-supervised, and unsupervised learning. The study used performance metrics, such as F-Measure, Precision, GMAP, NDCG, MAP, and Recall, to determine relevant characteristics within the research setting.
Furthermore, ensembled models constructed using SVM, KNN, MLP, decision tree, random forest, and logistic regression algorithms with fast Fourier transforms were used in Ref. 26 for predicting the international ranks of HEIs utilizing the Shanghai world ranking dataset. Wardley et al. used different machine learning models to rank universities in Canada across different categories and recommended gradient boosting as the best performer.27 The predictive power of ensembled models was further improved by trimming the low-performing models and optimizing through heuristics when predicting international rankings.28 The power of machine learning models was further leveraged for predicting research performance indicators,29 job satisfaction analysis,30 and financial stability planning31 in HEIs. A few attempts have also been made to rank HEIs using multicriteria decision-making models with machine learning and NLP.32–35 These models were constructed based on subjective and objective measures to strengthen the results.
Our survey of research papers is summarized in Table 1 (Extended data), capturing essential aspects including Dataset Diversity, Ranking Level, Number of Algorithms employed, Highest Accuracy attained, identification of the Best Algorithm, and an overview of limitations. This structured presentation offers a concise and comprehensive snapshot of the methodologies and outcomes prevalent in this research landscape.
1.4.1 Methods
Figure 1 shows the proposed methodology.
1. Data collection:
Data collection is a methodical process of obtaining and compiling pertinent information from various sources, including text documents, databases, sensor networks, and websites, to create an extensive and organized QS World University ranking dataset for further analysis and knowledge discovery. The data must be extracted, transformed, and loaded into an appropriate repository. This process yields an enormous amount of raw data.
2. Data preparation:
Data preparation is a two-step process that is essential for managing the data gathered from various sources. The first step is data inspection: carefully reviewing and evaluating a dataset to obtain a general idea of its quality, features, and organization. The second step is data preprocessing. Data loading comes first: compiling data from many sources and storing them in a single repository. To guarantee data quality, data cleaning entails locating and fixing mistakes, inconsistencies, and outliers. While imputation fills in missing values to make a dataset more complete, transformation reworks the data to satisfy the needs of the analysis. Labeling is used to classify or tag data points for supervised learning. Understanding the distribution and trends of the data through visualization helps to make well-informed judgments regarding preprocessing. By guaranteeing that data features are on a consistent scale, normalization helps to avoid biases in later analysis. Finally, splitting divides the dataset into training, validation, and test sets, producing a clean, well-organized dataset that is ready for data mining or machine-learning tasks.
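As an illustration of the cleaning and imputation steps described above, the following stdlib-only Python sketch removes duplicate records and fills a missing numeric value with the column mean. The column names and values are invented for demonstration and do not reflect the actual QS dataset schema.

```python
# Minimal cleaning/imputation sketch (stdlib only); rows are illustrative.
from statistics import mean

rows = [
    {"institution": "Univ A", "citations": 85.0, "rank": 12},
    {"institution": "Univ B", "citations": None, "rank": 47},   # missing value
    {"institution": "Univ B", "citations": None, "rank": 47},   # duplicate row
    {"institution": "Univ C", "citations": 61.5, "rank": 95},
]

# 1) Drop exact duplicates while preserving order.
seen, deduped = set(), []
for r in rows:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# 2) Impute missing numeric values with the column mean of observed values.
observed = [r["citations"] for r in deduped if r["citations"] is not None]
fill = mean(observed)
for r in deduped:
    if r["citations"] is None:
        r["citations"] = fill

print(len(deduped), fill)  # 3 rows remain; mean of the observed citations
```

In a real pipeline these steps would typically be done with pandas (`drop_duplicates`, `fillna`), but the logic is the same.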
Feature engineering:
Feature engineering uses cleaned data. It includes both feature selection and data transformation to produce machine learning model input variables that are both optimal and informative. Feature selection lowers the dimensionality and boosts the model effectiveness by selecting a subset of the most pertinent and discriminative characteristics from the original dataset. However, data transformation includes methods, such as addressing missing values.
• Encoding categorical variables: The majority of machine-learning methods require numerical input data formats. Numerical values must be assigned to categorical variables that represent the labels or categories. There are a few ways to accomplish this, such as label encoding and one-hot encoding.
One-Hot Encoding: In this method, each category in a categorical variable has a binary column. Categorical data are converted into a numerical representation by each binary column, which indicates whether a category is present or absent.
Label Encoding: Using this technique, every category is given a distinct number. Although it simplifies the data, not all algorithms are compatible with it.
To convert categorical columns to numerical columns, label encoding was applied here, assigning a unique integer to each category. In this dataset, the columns ‘Institution Name’, ‘Country Code’, ‘Country’, ‘SIZE’, ‘FOCUS’, ‘RES’, and ‘STATUS’ were converted into numerical values (see Figure 3).
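The label-encoding step can be sketched with plain Python dictionaries (a production pipeline would more likely use scikit-learn's `LabelEncoder`); the country values below are illustrative only.

```python
# Label encoding sketch: assign each distinct category a unique integer,
# in order of first appearance. Values are illustrative.
countries = ["India", "UK", "India", "USA", "UK"]

codes = {}
encoded = []
for c in countries:
    if c not in codes:
        codes[c] = len(codes)
    encoded.append(codes[c])

print(codes)    # {'India': 0, 'UK': 1, 'USA': 2}
print(encoded)  # [0, 1, 0, 2, 1]
```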
• Scaling numerical features: The size of the numerical characteristics affects many machine-learning techniques. By guaranteeing that every numerical characteristic has the same scale, scaling prevents certain features from predominating over others. This is a common preprocessing step to ensure that all features have a similar scale. Typical scaling methods include the following.
Standardization (Z-score normalization): The features are scaled to have a mean of 0 and a standard deviation of 1.
Min-Max Scaling: This transforms features into a specified range, typically [0, 1] or [-1, 1].
Robust Scaling: This scales features based on the median and interquartile range, making it robust to outliers.
To ensure that all features have a similar scale, min–max scaling was applied here; a consistent scale can be essential for certain machine learning algorithms, as can constructing new characteristics through mathematical operations. These methods ultimately refine the data for use in the models. By retaining important information and eliminating noise, both facets of feature engineering seek to improve the predictive ability of the model while ensuring that the data are properly organized and ready for the model to learn from.
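The two most common scaling schemes named above can be sketched in a few lines of stdlib Python; the score values are invented for illustration.

```python
from statistics import mean, stdev

scores = [45.0, 60.0, 75.0, 90.0]  # illustrative feature column

# Min-max scaling to [0, 1]
lo, hi = min(scores), max(scores)
minmax = [(x - lo) / (hi - lo) for x in scores]

# Z-score standardization (zero mean, unit sample standard deviation)
m, s = mean(scores), stdev(scores)
zscored = [(x - m) / s for x in scores]

print(minmax)  # first value 0.0, last value 1.0
```

scikit-learn's `MinMaxScaler` and `StandardScaler` implement the same transformations with fit/transform semantics.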
3. Data splitting:
Relevant features were extracted from the dataset. Data splitting normally divides a dataset into two subsets: a training set and a test set. Typically, 70% of the dataset is used for training and 30% for testing. By exposing the machine learning model to historical data, the training set helps it learn patterns and relationships within the dataset. The testing set functions as an independent dataset for evaluating the model’s performance and its capacity to generalize to new, unseen data, because it is not observed by the model during training. A crucial indicator of a model’s efficacy is its ability to predict outcomes from data it has never seen before, which this segmentation helps to assess.
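The 70/30 split described above can be sketched as follows (stdlib only; the data list is a stand-in for the preprocessed QS records, and the fixed seed is only for reproducibility of the illustration):

```python
import random

data = list(range(100))  # stand-in for 100 preprocessed records

random.seed(42)          # fixed seed so the split is reproducible
random.shuffle(data)     # shuffle before splitting to avoid ordering bias

split = int(0.7 * len(data))
train, test = data[:split], data[split:]

print(len(train), len(test))  # 70 30
```

scikit-learn's `train_test_split(X, y, test_size=0.3)` performs the same operation on feature/target arrays.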
4. Model construction:
Model construction is a complex process that involves creating, adjusting, and assessing predictive models to address certain issues or tasks. It encompasses a variety of methods, including optimization and non-optimization algorithms, each with specific abilities and purposes. To improve model performance, accuracy, and robustness, hybrid machine learning optimization techniques integrate different machine learning models to capitalize on their unique advantages. Conversely, hybrid machine learning non-optimization algorithms combine many machine learning techniques without requiring explicit optimization methods. Experimentation with various combinations, adjusting hyperparameters, and evaluating performance using metrics and validation methods are all part of the model creation process. The ultimate objective is to develop a reliable and robust model that can produce precise predictions of novel, unproven data.
In the framework of the research study under discussion, a variety of machine-learning models were employed for simulation purposes in both classification and regression tasks. The system models used were as follows:
• Ridge Regression
• Long Short-Term Memory (LSTM)
• Light Gradient Boosting Machine (LightGBM)
• XGBoost (Extreme Gradient Boosting)
• Ridge Regression with hyperparameters optimized using Particle Swarm Optimization (PSO)
• Ridge Regression with hyperparameters optimized using Tabu Search (TS)
• K-Nearest Neighbors (KNN)
• Support Vector Machine (SVM)
• Random Forest
• Gradient Boosting (classifier and regressor)
• Linear Regression
These models were used to carry out tasks such as regression and classification on the provided dataset, and each model has unique strengths and applicability based on the particulars of the data and the specific task at hand.
Hybrid Models (Optimization Algorithms):
1. Ridge Regression, Long Short-Term Memory and LightGBM model (HRRM+LSTM+LightGBM) (Hybrid Model 1)
➢ Ridge Regression (HRRM): This technique is used because it is easy to understand and straightforward, but it might not be able to identify the intricate trends seen in the data.
➢ Long Short-Term Memory (LSTM): LSTM is used to capture temporal dependencies and sequences in the ranking data.
➢ LightGBM: LightGBM includes gradient-boosting capabilities and is efficient in handling large datasets with numerous features. This combination guarantees that the ranking prediction considers both structural and temporal variables, making it a complete solution.
2. PSO and Tabu search with base models (PSO + TS - base models): (Hybrid Model 2)
➢ The base model optimization techniques are PSO and TS. To improve the model ensemble, basic models were merged with PSO and Tabu Search. By determining the ideal weights for each model, these algorithms optimize the combination of basic models and increase prediction accuracy. The total performance was improved by this hybrid technique, which successfully balanced the contributions of each base model.
3. PSO and Tabu search with hybrid models (PSO+TS - Hybrid models): (Hybrid Model 3)
➢ Hybrid models, which integrate a variety of machine learning methods, are optimized using PSO and Tabu Search, similar to the base models. PSO and TS improve the overall accuracy of the hybrid model by choosing appropriate weights for each component model. This method effectively integrates the capabilities of many algorithms while considering them.
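As a rough illustration of how PSO can tune a hyperparameter (as used in Hybrid Models 2 and 3, and for Ridge Regression above), the sketch below optimizes the regularization strength alpha of a toy one-feature ridge regressor against held-out validation MSE. Everything here is simplified and assumed: the data are invented, the ridge model is a one-dimensional closed form, and the inertia/cognitive/social coefficients (0.7, 1.5, 1.5) are common textbook defaults, not the values used in this study.

```python
import random

# Toy 1-D ridge: w = sum(x*y) / (sum(x^2) + alpha). The PSO objective is
# validation MSE as a function of alpha; a real pipeline would wrap a full
# Ridge model instead. All data values are illustrative.
x_tr = [1.0, 2.0, 3.0, 4.0]
y_tr = [2.1, 3.9, 6.2, 8.1]
x_va = [5.0, 6.0]
y_va = [9.8, 12.3]

def val_mse(alpha):
    w = sum(a * b for a, b in zip(x_tr, y_tr)) / (sum(a * a for a in x_tr) + alpha)
    return sum((w * a - b) ** 2 for a, b in zip(x_va, y_va)) / len(x_va)

def pso(obj, lo=0.0, hi=10.0, n=8, iters=30, seed=0):
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n)]   # particle positions
    vel = [0.0] * n                                 # particle velocities
    pbest = pos[:]                                  # per-particle best
    gbest = min(pos, key=obj)                       # swarm-wide best
    for _ in range(iters):
        for i in range(n):
            # Standard update: inertia + cognitive + social terms.
            vel[i] = (0.7 * vel[i]
                      + 1.5 * rng.random() * (pbest[i] - pos[i])
                      + 1.5 * rng.random() * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # clamp to bounds
            if obj(pos[i]) < obj(pbest[i]):
                pbest[i] = pos[i]
            if obj(pos[i]) < obj(gbest):
                gbest = pos[i]
    return gbest

best_alpha = pso(val_mse)
print(best_alpha, val_mse(best_alpha))
```

The same loop applies unchanged when the objective evaluates a full model (or an ensemble's component weights) via cross-validation.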
Hybrid Models (Non-Optimization Algorithms):
1. XGBoost and Neural Network (XGB + NN): (Hybrid Model 4)
➢ XGBoost (XGB): XGBoost has a solid track record in forecasting outcomes, particularly with structured data. This provides the hybrid model with the benefit of gradient boosting.
➢ Neural Network (NN): Complex patterns can be effectively captured by neural networks. Neural networks and XGBoost offer a balance between organized and unstructured data.
2. Gradient Boosting and k-nearest neighbors (GB+KNN): (Hybrid Model 5)
➢ Gradient Boosting (GB): Combining the predictive power of several models, Gradient Boosting works well in ensemble learning.
➢ The k-nearest neighbor (KNN) algorithm helps identify certain patterns within the data. By combining GB and KNN, one may use their complementary abilities to increase prediction accuracy.
3. Random Forest and Support Vector Machine (RF+SVM): (Hybrid Model 6)
4. Support Vector Machine, Neural Network and Gradient Boosting model (SVM+NN+GB): (Hybrid Model 7)
The idea behind hybrid models is to exploit the advantages of many algorithms while mitigating the shortcomings of each separately. To improve the forecast accuracy and offer more reliable solutions for university ranking predictions, these hybrid models integrate complementary methodologies. To enhance the overall performance, they handle a variety of data characteristics, such as complicated patterns, temporal relationships, and organized and unstructured data. A well-rounded strategy is ensured via hybridization, which improves performance on the difficult task of accurately predicting university rankings.
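To make the weight-combination idea concrete, here is a hedged, stdlib-only sketch of a tiny tabu search over the blend weight between two base models' predictions, minimizing held-out MSE (the mechanism described for Hybrid Models 2 and 3). The prediction vectors and targets are invented, and the step size, neighborhood, and tabu tenure are assumptions for illustration, not the paper's actual settings.

```python
# Tabu search over the blend weight w of two prediction vectors; values
# below are illustrative stand-ins for real model output.
y_true = [10.0, 20.0, 30.0, 40.0]
pred_a = [12.0, 19.0, 33.0, 37.0]   # e.g. a Ridge model's predictions
pred_b = [ 9.0, 22.0, 28.0, 43.0]   # e.g. a LightGBM model's predictions

def mse(w):
    blend = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
    return sum((p - t) ** 2 for p, t in zip(blend, y_true)) / len(y_true)

# Walk w on a coarse grid; the tabu list bars recently visited weights so
# the search can escape local minima instead of cycling.
step, w, best_w = 0.05, 0.5, 0.5
tabu = []
for _ in range(40):
    neighbors = [round(w + d, 2) for d in (-step, step) if 0 <= w + d <= 1]
    candidates = [n for n in neighbors if n not in tabu] or neighbors
    w = min(candidates, key=mse)      # move to the best admissible neighbor
    tabu.append(w)
    tabu = tabu[-5:]                  # short-term memory of length 5
    if mse(w) < mse(best_w):
        best_w = w

print(best_w, mse(best_w))
```

With more models, w becomes a weight vector and the neighborhood perturbs one coordinate at a time; the tabu mechanics are unchanged.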
5. Comparison of models:
To determine which model is most appropriate for a certain task or problem, it is necessary to compare and evaluate multiple models by methodically analyzing their performance, predictive ability, and applicability. Numerous evaluation indicators are typically used in this comparison. The final objective is to choose the model that offers the optimum balance between deployment-related practical factors and predictive performance, resulting in a dependable and efficient solution to the issue at hand.
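For reference, the evaluation metrics used in this comparison can be computed as follows (stdlib-only sketch on invented values; note that "Accuracy %" has no single standard definition, so the MAPE-based version below is an assumption for illustration):

```python
import math

y_true = [10.0, 20.0, 30.0, 40.0]   # illustrative targets
y_pred = [11.0, 19.0, 32.0, 38.0]   # illustrative predictions

n = len(y_true)
mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n
rmse = math.sqrt(mse)

# R-squared: 1 minus residual sum of squares over total sum of squares.
mean_t = sum(y_true) / n
ss_res = sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
ss_tot = sum((t - mean_t) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

# One common "accuracy %" convention: 100 * (1 - mean absolute % error).
acc_pct = 100 * (1 - sum(abs(p - t) / t for p, t in zip(y_pred, y_true)) / n)

print(mse, rmse, r2, acc_pct)
```

scikit-learn provides `mean_squared_error` and `r2_score` with the same definitions.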
6. Choosing the best model:
A methodical procedure for creating assessment measures, training and cross-validating several models, comparing their performance, and considering real-world implications such as resource needs and interpretability is necessary to choose the optimal model. The final decision should be based on domain-specific knowledge and business requirements, and the documentation of the entire process is essential for reproducibility and transparency. This ensures that the selected model is in line with the project’s specific goals and provides the best balance between practicality and predictive performance.
7. Accurate QS rankings:
Accurately predicting university rankings, such as the QS World University Rankings, is a difficult endeavor that depends on several variables, including international diversity, faculty qualifications, research output, and academic reputation. The model chosen for this study was utilized for this purpose. The creation of an intricate predictive model that considers various pertinent aspects and data sources is necessary to achieve high prediction accuracy.
Comparison:
Our work focuses specifically on the QS World University Rankings, employing a hybrid technique that combines optimization and non-optimization machine-learning algorithms to forecast university rankings. It uses Ridge Regression, Long Short-Term Memory (LSTM), LightGBM, Particle Swarm Optimization (PSO), and Tabu Search (TS), as well as hybrid models that combine these methods. To guarantee accurate predictions, this study highlights the need for quality assurance and data pre-treatment. The model's performance was evaluated using measures such as RMSE, MAE, R-squared, and accuracy percentage, as tabulated in Table 3. Several literature-based comparison studies also explore university ranking predictions, mainly through the application of machine learning techniques. Although they use comparable data preparation techniques and assessment measures, they vary in the particular models they apply, their feature-selection techniques, and the focus of their analysis. Some concentrate on particular methods, such as decision trees and clustering, or on ranking systems other than QS. The hybrid model approach we introduce, and the insights we provide into enhancing ranking accuracy and decision making in higher education establishments, make our work noteworthy. This aligns with the overall objective of these comparative studies: to improve the caliber of university-ranking systems. In our comparative analysis of the research articles, we examined important factors including dataset size, applied optimizations, accuracy measures, and related remarks. This review offers a nuanced understanding of the methods utilized, enabling a critical analysis of the study strategies concerning these pivotal elements, which are shown in Table 7.
Using the scatter plots shown in Figure 4 and Figure 5, our study provides a visual depiction that clarifies the effectiveness of the hybrid strategy as well as the complex interactions between characteristics. These charts offer a sophisticated viewpoint that makes it easier to see how well the hybrid model performs and to identify trends in the attributes of the dataset. Following rigorous comparisons between different algorithms and calculating relevant metrics, we combined the data into tables, as shown in Table 5 and Table 6, and an extensive bar plot (see Figure 7 and Figure 8). This graphic depiction provides a clear summary of the relative performance and important information about the effectiveness of various algorithms within our framework.
Hybrid ML Optimization Algorithms | RMSE | MAE | R-Squared Score | Accuracy (%) |
---|---|---|---|---|
Hybrid Model 1 | 3.25 | 1.98 | 0.29 | 90.23% |
Hybrid Model 2 | 3.22 | 1.99 | 0.31 | 90.33% |
Hybrid Model 3 | 3.20 | 1.94 | 0.32 | 89.97% |
Ref. | Dataset size | Optimization | Accuracy | Remarks |
---|---|---|---|---|
1 | X | X | X | Historical-sociological account |
2 | X | X | X | Institutional Competence |
3 | Generic Higher Education Data | X | High | Systematic review |
4 | Australian Institutional Performance Data | X | Moderate | Case study of Australian performance |
5 | Global and National University Ranking Data | X | Moderate | Comparative analysis |
6 | Global University Ranking Data | X | High | Comparative analysis |
7 | Research Paper Ranking Data | Supervised learning | High | Rank prediction |
8 | World University Ranking Data | Quantitative analysis | Moderate | Comparative analysis |
9 | Global University Ranking Data | X | Moderate | Policy implications |
10 | QS World University Ranking Data | X | High | Strategic analysis |
11 | University Website Usability Data | X | Low | Usability prediction |
12 | X | X | X | Review of machine learning algorithms |
13 | QS World University Ranking Data | Data mining | High | Rank prediction using data mining |
14 | National Institute Ranking Data | Machine learning | High | Rank prediction for national institutes |
15 | X | X | X | Rank prediction system |
16 | Global Performance Indicator Data | X | X | Analysis of global performance indicators |
17 | University Comprehensive Score Data | Regression analysis | X | Comprehensive score prediction |
18 | Global University Ranking Data | Machine learning regression | High | Global ranking prediction |
19 | QS World University Ranking Data | Data Analytics | X | Competitiveness analysis |
20 | X | Decision tree methods | X | Decision tree applications |
21 | University Ranking Improvement Data | Data-driven strategy | Moderate | Rank improvement using decision trees |
22 | X | X | X | Academic excellence ranking |
23 | X | Partial least squares path modeling | X | Alternative ranking approach |
24 | X | Cluster analysis | X | Goals and cluster analysis |
25 | Research Paper Ranking Data | Supervised learning | Moderate | Rank prediction using supervised learning |
The dataset used in this study stems from the esteemed QS World University Rankings for 2024, a comprehensive assessment comprising 1,500 institutions across 104 global locations. With 1,499 rows and 29 columns, it encompasses vital attributes, such as 2024 and 2023 ranks, institution details, size, focus, research metrics, reputation scores, and newly introduced metrics for sustainability, employment outcomes, and international research networks. These additions reflect a methodological evolution, enabling a nuanced evaluation of universities' commitment to social responsibility, employability, and global research collaboration. Notably, the dataset provides a holistic view, including academic reputation, employer perception, faculty-student ratios, citation impact, and internationalization indices. Widely recognized and respected, the QS World University Rankings influence global perspectives on higher education, making this dataset a valuable resource for researchers, policymakers, academics, and prospective students, offering multifaceted insights for education and higher education policy analysis. The dataset contains no irrelevant attributes, though several of its features are correlated, as shown in the correlation matrix in Figure 2.
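Assuming the dataset is loaded into a pandas DataFrame, the correlation structure mentioned above can be inspected with `DataFrame.corr`; the column names below are hypothetical stand-ins for the actual 29 attributes:

```python
import pandas as pd

# Hypothetical stand-in columns; the actual dataset has 29 columns,
# including 2024/2023 ranks and reputation scores.
df = pd.DataFrame({
    "rank_2024": [1, 2, 3, 4, 5],
    "rank_2023": [1, 3, 2, 5, 4],
    "academic_reputation": [100.0, 98.5, 98.1, 97.0, 96.5],
})

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr(numeric_only=True)
print(corr.round(2))
```

A heatmap of this matrix (e.g., with seaborn) would reproduce the kind of plot shown in Figure 2.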
2. Implementation requirements:
This research project was implemented in Python, leveraging various frameworks and libraries to address different aspects of the study. The project relied extensively on popular machine learning and deep learning frameworks, including scikit-learn, XGBoost, LightGBM, and TensorFlow, complemented by the essential numerical computing and data manipulation libraries NumPy and Pandas. To streamline the setup and ensure replicability, a requirements.txt file was provided, specifying the necessary dependencies and their versions: scikit-learn 0.24.2, xgboost 1.5.0, lightgbm 3.2.1, TensorFlow 2.7.0, numpy 1.21.2, pandas 1.3.3, and pyswarm 0.6.1. This file facilitates straightforward installation of the required packages, ensuring compatibility across different computing environments. The evaluation metrics were tailored to the nature of the implemented machine-learning models: classification tasks were assessed with Accuracy, Precision, Recall, and F1 Score, providing a comprehensive assessment of performance, while regression models were evaluated using RMSE (Root Mean Squared Error), MSE (Mean Squared Error), and R2 Score. The experimental setup specified hardware configurations to ensure the execution of experiments under standardized conditions: a minimum of 4 GB RAM, a processor with a minimum clock speed of 2.26 GHz, and at least 512 MB of storage. The experiments were conducted in a cloud environment, with Google Colaboratory and Jupyter Notebook (via Anaconda) serving as the primary platforms for model development and evaluation; these platforms offer the computational resources and collaborative features conducive to the research workflow.
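Based on the dependency versions listed above, the requirements.txt file would read approximately as follows:

```text
scikit-learn==0.24.2
xgboost==1.5.0
lightgbm==3.2.1
tensorflow==2.7.0
numpy==1.21.2
pandas==1.3.3
pyswarm==0.6.1
```

Installing with `pip install -r requirements.txt` pins these versions and reproduces the environment.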
3. Data preparation
In our initial research phase, we worked with raw data: a dataset that included various college parameters, such as reputation ratings and rankings. Missing values presented analytical difficulties and required correction, which is typical for raw data. Categorical data, such as institution names, were numerically encoded for machine learning model compatibility, and scaling techniques were used to handle the different scales of numerical variables (see Figure 3). Defects and outliers were identified and corrected to mitigate their possible influence on model performance. After this thorough preparation, we performed a transformational refinement of our dataset, which resulted in decreased errors, controlled outliers, and transformed categorical variables. Standardization and normalization ensured a homogeneous scale across numerical features. The dataset was enhanced with new characteristics derived via mathematical operations, making it ready for a variety of analytical uses such as machine learning. Dimensionality reduction techniques were applied, which streamlined the dataset without compromising important information and accelerated the modelling process. After pre-processing, the dataset is well suited for model training and assessment, providing improved interoperability with different machine learning techniques.
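A minimal sketch of the preprocessing steps described above — imputing missing values, encoding institution names, and standardizing numeric features — using hypothetical column names and scikit-learn:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy frame exhibiting the issues described: a categorical column
# (institution name) and a numeric column with a missing value.
df = pd.DataFrame({
    "institution": ["MIT", "Cambridge", "Oxford", "Harvard"],
    "academic_reputation": [100.0, np.nan, 98.1, 97.0],
})

# Encode categorical institution names as integers.
df["institution"] = LabelEncoder().fit_transform(df["institution"])

# Impute missing numeric values with the column mean, then standardize
# to zero mean and unit variance.
imputer = SimpleImputer(strategy="mean")
scaler = StandardScaler()
df[["academic_reputation"]] = scaler.fit_transform(
    imputer.fit_transform(df[["academic_reputation"]])
)
print(df)
```

The same transformers, fitted on the training split only, would then be applied unchanged to the test split to avoid leakage.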
Initially, we preprocessed the dataset in the model generation procedure: addressing missing data, encoding categorical variables, scaling numerical features, and generating newly derived characteristics. Next, the dataset was divided into training and testing sets. We built hybrid prediction models both with and without optimization. To evaluate the prediction performance of both kinds of model, we computed several evaluation measures, including the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R2) score. We also added a statistic called "accuracy percentage" to measure how often ranking predictions fall within a given threshold. The results are presented in Tables 3 and 4 and were synthesized using a comprehensive bar plot. A further simulation is shown in Figure 6, and the execution using the hyperparameter configurations is presented in Table 2. The proposed approach performed better in terms of convergence than the other algorithms. The performance of the proposed method relative to the other algorithms, in terms of total RMSE, MSE, and R2 score, is displayed in Figure 9.
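The "accuracy percentage" statistic is described as the share of predictions falling within a given rank threshold; one plausible implementation (the specific threshold value here is an assumption, not taken from the paper) is:

```python
import numpy as np

def accuracy_percentage(y_true, y_pred, threshold=5):
    """Share (in %) of predictions within `threshold` rank positions of the truth."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    within = np.abs(y_true - y_pred) <= threshold
    return 100.0 * within.mean()

true_ranks = [1, 10, 25, 50, 100]
pred_ranks = [3, 12, 40, 52, 96]
print(accuracy_percentage(true_ranks, pred_ranks))  # 80.0: four of five within 5
```

Unlike RMSE, this metric is robust to a few large misses and directly answers "how often is the predicted rank close enough to be useful?"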
5. Comparative analysis:
Figure 6(a) shows the performance metrics for the three hybrid models listed in Table 3. The bar chart clearly shows that Hybrid Model 2 performs better across most metrics, confirming the conclusions from Table 3.
Figure 6(b) shows the performance metrics for the four hybrid models listed in Table 4. Hybrid Models 5 and 6 seem to perform better than the other two models, aligning with the conclusions drawn from Table 4.
Figure 7(a) shows the individual metrics listed in Table 5. The subplots confirm that LightGBM performs well in terms of RMSE, MSE, and R-squared score, whereas LSTM excels in Precision, Recall, and F1 Score.
Figure 7(b) provides a combined visualization of all evaluation metrics for the four algorithms from Table 5. LightGBM and LSTM seem to outperform HRRM and PSO across most metrics, which is consistent with the conclusions drawn from Table 5.
Table 6 compares the performance of the seven non-optimization machine-learning algorithms. SVM has an RMSE of 3.82 and the highest accuracy (86.00%), suggesting that it may be the best-performing algorithm among the seven. LR and GB also performed reasonably well across most metrics.
The RMSE plot shows that the Random Forest (RF) model has the lowest RMSE of 3.06, indicating the best performance in terms of prediction accuracy among the models tested. Conversely, the Neural Network (NN) model had the highest RMSE of 4.73, suggesting the poorest performance. Other models such as XGBoost (XGB), Gradient Boosting (GB), k-nearest neighbors (kNN), Support Vector Machine (SVM), and Linear Regression (LR), have RMSE values ranging from 3.35 to 3.82, with XGBoost (3.35) and k-nearest neighbors (3.62) also performing relatively well.
In the MSE plot, the Random Forest (RF) model again demonstrated superior performance, with the lowest MSE of 9.42. The Neural Network (NN) model had the highest MSE of 22.38, reflecting its poor performance in predicting outcomes accurately. The XGBoost (XGB) model has an MSE of 11.22, showing good performance. Other models, such as Gradient Boosting (GB), k-nearest neighbors (kNN), Support Vector Machine (SVM), and Linear Regression (LR), have MSE values between 13.13 and 14.63, indicating moderate performance.
The R-squared plot reveals that the Random Forest (RF) model has the highest R-squared value of 0.37, indicating the best fit to the data. The Neural Network (NN) model shows a significantly negative R-squared value of -0.47, indicating that it performs worse than a horizontal line (mean prediction). Other models such as XGBoost (XGB), Gradient Boosting (GB), k-Nearest Neighbors (kNN), Support Vector Machine (SVM), and Linear Regression (LR) exhibit R-squared values ranging from 0.03 to 0.25, with XGBoost (0.25) performing comparatively better.
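A negative R-squared simply means the model's squared error exceeds that of always predicting the mean of the targets, as a quick check with scikit-learn shows:

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0, 4.0]
good = [1.1, 1.9, 3.2, 3.8]  # tracks the data: positive R-squared
bad = [4.0, 1.0, 4.0, 1.0]   # worse than predicting the mean: negative R-squared

print(r2_score(y_true, good))  # close to 1
print(r2_score(y_true, bad))   # negative
```

This is why the NN model's R-squared of -0.47 indicates performance below the trivial mean-prediction baseline.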
This plot shows the precision values for the different models, which cluster around 0.9. The XGB model achieved the highest precision of 0.94, whereas the LR model had the lowest at 0.88.
The recall values displayed in this plot were consistently high across all models, with the SVM model achieving a perfect recall score of 1.0. The XGB, NN, GB, kNN, and LR models had recall scores ranging from 0.94 to 0.96, indicating excellent recall performance.
This plot presents the F1 scores, with the XGB model having the highest score of 0.94, whereas the remaining models exhibited scores clustered around 0.9.
The accuracy percentages shown in this plot ranged from 75.0% to 86.0%, with the SVM model achieving the highest accuracy of 86.0%. The NN model had the lowest accuracy of 75.0%, while most other models fell within the range of 75.67% to 78.33%.
This comprehensive plot compares multiple evaluation metrics across different models or methods, allowing for a side-by-side comparison of the performance. The metrics displayed include the RMSE, MSE, R-Squared, Precision, Recall, F1 Score, and Accuracy (%). This plot provides a holistic view of how models or methods are applied across various evaluation criteria.
In Figure 9(a), the combination of Ridge Regression, Long Short-Term Memory, and LightGBM (HRRM+LSTM+LightGBM) achieves a significantly lower Root Mean Squared Error (RMSE) compared to each individual algorithm used alone. This indicates a superior fit and potentially more accurate predictions.
In Figure 9(b), the hybrid models optimized with PSO and Tabu Search (PSO+TS-Hybrid Models) outperform models that combine these optimization algorithms with base models (PSO+TS-Base Models) across all metrics, including accuracy. This suggests that combining various machine learning methods within the hybrid models themselves leads to even better results.
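The study applies PSO (e.g., via the pyswarm package) to model optimization; as a self-contained illustration of the mechanics, the following minimal NumPy implementation minimizes a simple test function over box bounds. The parameter values are illustrative, not those used in the study:

```python
import numpy as np

def pso_minimize(f, lb, ub, n_particles=30, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer over box bounds [lb, ub]."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x = rng.uniform(lb, ub, size=(n_particles, dim))  # particle positions
    v = np.zeros_like(x)                              # particle velocities
    pbest = x.copy()                                  # per-particle best positions
    pbest_val = np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()              # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + cognitive pull toward pbest + social pull toward g.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, f(g)

# Sphere function: global minimum 0 at the origin.
best_x, best_val = pso_minimize(lambda p: float(np.sum(p ** 2)),
                                lb=[-5, -5], ub=[5, 5])
print(best_val)  # close to 0
```

In a ranking-prediction setting, `f` would instead be the cross-validated RMSE of a model as a function of its hyperparameters, and Tabu Search could refine the solution PSO returns.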
Figures 9c through 9f compare the proposed hybrid approach against groups of baseline models: XGB (Extreme Gradient Boosting) and NN (Neural Network) in Figure 9c; GB (Gradient Boosting) and kNN (k-Nearest Neighbors) in Figure 9d; RF (Random Forest) and SVM (Support Vector Machine) in Figure 9e; and SVM, NN, and GB in Figure 9f. In each plot, the y-axis represents the evaluation metrics RMSE (Root Mean Squared Error), MSE (Mean Squared Error), R-squared, and accuracy (%), and the bars allow a direct comparison of performance. In every case, the proposed approach, which is a hybrid of the individual algorithms being compared, outperforms the baseline models across all metrics.
Finally, by proposing and analyzing hybrid algorithms, we aim to improve the accuracy of predicting the QS World University Rankings. With the lowest RMSE, the highest R2 score, and commendable accuracy, Hybrid Model 3, which uses Particle Swarm Optimization and Tabu Search, emerged as a top performer. Furthermore, Hybrid Model 5, which combines Gradient Boosting and k-nearest neighbors, outperformed the competition. Future directions for ranking prediction include expanding deep learning techniques, such as tailored neural network designs, and utilizing pretrained language models. Integrating multimodal data, stressing explainability, and addressing ethical problems such as fairness and transparency are critical.
Recognizing limitations, such as dataset specificity and the necessity for external validation, researchers should investigate qualitative characteristics while keeping in mind the changing landscape of higher education. Ethical issues, data quality improvements, and continual deep-learning breakthroughs are critical for refining ranking forecasts and ensuring their relevance and applicability in the changing sector of university assessments.
The QS rank dataset used to carry out this project was obtained from the Kaggle repository and can be downloaded from: https://www.kaggle.com/datasets/joebeachcapital/qs-world-university-rankings-2024
Zenodo: Hybrid prediction models for assessing the higher education institutions performance in QS World Institution rankings: https://doi.org/10.5281/zenodo.14101002
The project contains the following data:
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Source code available from: https://github.com/VishwanthCheruku/hybrid-machine-learning-algorithms
Source code is available under the MIT License.
Archived software available from: https://doi.org/10.5281/zenodo.14000400
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Is the work clearly and accurately presented and does it cite the current literature?
No
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
No
If applicable, is the statistical analysis and its interpretation appropriate?
No
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
No
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Educational Data Mining, Artificial Intelligence, Reinforcement Learning, Intelligent Systems
Alongside their report, reviewers assign a status to the article.
Invited Reviewers: 1
Version 1 (17 Dec 24): read