Keywords
QS World University Ranking, Hybrid machine learning algorithms, Data analysis, Educational competitiveness, Prediction accuracy.
Quality education is one of the primary requirements for a successful life, and pursuing higher education at a highly reputed institution makes a substantial difference in shaping an individual's career. While national ranking and accreditation boards for higher education institutions, such as NAAC, are prevalent, world rankings distinguish institutional reputation globally. The QS World University Rankings are a vital gauge for learners, educators, and institutions worldwide, allowing them to analyze and compare the quality and reputation of higher education. Predicting these rankings is difficult owing to data availability concerns and QS’s frequent methodology revisions. Subjectivity and narrow criteria in the rankings further hamper the assessment of university excellence. Machine learning, data scraping, model adaptability, algorithm reversal, and short-term prediction are some existing ways of dealing with these difficulties. In this research, a prediction model for assessing institutional performance in the QS World University Rankings is designed using hybrid machine learning algorithms and optimization techniques.
According to the analysis, two algorithms surpass the others in forecasting ranks. These hybrid models improve the prediction accuracy of the QS world rankings by integrating data analysis with model optimization using Particle Swarm Optimization and the Tabu Search method.
Higher education institutions, such as universities and colleges, are assessed and compared using lists or other methods that consider a variety of criteria and aspects. These rankings are invaluable resources to help prospective students and their families make well-informed decisions about where to pursue higher education. Rankings help students find colleges that best fit their academic and professional objectives by offering insights into the schools’ general caliber, standing, and effectiveness. Additionally, they can assist establishments in identifying areas in need of development and formulating plans to elevate their profile in the international higher education arena. University rankings are vital for several reasons and benefit many stakeholders in the higher education system. First, they help students and their families make well-informed decisions by offering a prompt evaluation of the caliber and standing of an institution and by assisting in the selection of colleges that complement academic and professional objectives. Rankings also function as a type of quality control, highlighting that universities continually provide resources, research, and instruction of the highest caliber. Rankings assist scholars in identifying colleges that are well-known in particular subjects, which facilitates the search for partners and specialized resources.
Furthermore, elite professors, researchers, and students are drawn to highly regarded universities, which enhances the learning environment and promotes intellectual diversity. Internationally recognized rankings offer global visibility and can help improve an institution’s reputation and appeal to donors, which can aid fundraising efforts. For universities, self-evaluation and benchmarking are essential tools that support their efforts to improve quality and pinpoint opportunities for development. Rankings help government agencies and legislators make decisions on how to allocate resources and advance transparency and public accountability in higher education. Owing to its ability to analyze vast datasets, discover patterns, and make informed forecasts, machine learning plays a critical role in prediction across a wide range of domains, including forecasting stock prices, currency exchange rates, and market trends; assisting investors and traders; recommendation systems; autonomous vehicle decision-making; medical diagnosis; outcome prediction; and disease risk assessment. Its use is widespread and growing. Machine learning plays an important role in predicting university rankings because it allows institutions to combine historical and other data sources to create more accurate forecasts of their future rankings. These forecasts enable institutions to take proactive steps to improve their academic and research capabilities, faculty qualifications, and overall appeal to students and researchers, thereby improving their global standing.
In this study, we used machine learning methods to predict the QS World University Rankings. We start with cleaned, preprocessed data and divide it into two parts: 70% for training and 30% for testing. To construct a prediction model, we applied two groups of algorithms. The hybrid machine learning optimization algorithms comprise i) Ridge Regression, Long Short-Term Memory (LSTM), and LightGBM; ii) Particle Swarm Optimization (PSO) and Tabu Search (TS) with base models; and iii) PSO and TS with hybrid models. The hybrid machine learning non-optimization algorithms comprise i) XGBoost (XGB) and Neural Network (NN); ii) Gradient Boosting (GB) and k-nearest neighbors (KNN); iii) Random Forest (RF) and Support Vector Machine (SVM); and iv) SVM, NN, and GB. Metrics including the Coefficient of Determination (R-squared) Score, Mean Square Error (MSE), Root Mean Square Error (RMSE), and Accuracy Percentage (%) were then used to compare the outcomes. Finally, based on these metrics, we determined which model is most effective for predicting university rankings.
University rankings have evolved as a significant means of measuring the quality of higher education institutions in response to global demands and the demand for transparency and efficiency in public organizations. In Ref. 1, Stefan Wilbers and Jelena Brankovic delved into the historical-sociological pillars of university rankings, concentrating on the United States. This study highlights the shift in the perception of organizational success that occurred during the postwar period, a shift that was influenced by functionalism’s ascent as the prevailing theory. An indication of this shift is the grading system for graduate programs, which was established by the American Council on Education (ACE) and the National Science Foundation between 1966 and 1970. In the 1970s, social scientists began to examine the criteria and utility of categorizing institutions of higher learning, and they continue to do so today. Dembereldorj2 assessed the impact of worldwide university rankings on institutions around the world, underlining their significance in both developed and developing countries. While universities in developing economies place more emphasis on research intensity, institutions in advanced economies prioritize competitive competency to achieve top rankings. This study makes the case that resource constraints and the demand for institutional or competitive competency drive worldwide rankings, which in turn actively affect higher education.
Vernon et al.3 insist on the necessity of making real efforts to raise the caliber of research, focusing on developing novel standards that organizations can use to evaluate and enhance their social contributions. Prioritizing quality above quantity is essential for confirming efforts to boost research output, which leads to advancements in science, economic expansion, and public health. The study highlights the deficiencies of current standards and recommends evaluating research outcomes in three dimensions (scientific influence, economic effects, and public health consequences) to provide a comprehensive assessment of research performance within the context of an academic institution. In Ref. 4, Leah Dowsett focused on how Australian universities developed long-term strategies for worldwide rankings over fifteen years by assessing four institutions and discovered that these rankings greatly influenced their research programs. Institutions’ market positioning and rankings have improved as a result of their proactive engagement with the rankings and efforts to influence and react to them. Murat Perit Çakır et al. compared and contrasted worldwide and national university ranking systems5 to highlight divergent views. Global rankings concentrate on research achievement with fewer parameters, whereas national rankings include a wider range of indicators, such as institutional and educational components. The data show little association between national and global rankings, with a few exceptions. In some countries, such as Brazil, Chile, and Poland, national rankings are predicted by global rankings, suggesting a relationship between research success and educational criteria. Owing to the challenge of acquiring accurate per capita data for global rankings, size-related bibliometric criteria have gained increasing attention. National rankings are becoming increasingly prevalent, especially in developing nations.
This creates opportunities for benchmarking and comparative studies that can enhance international ranking systems and provide information on higher education.
In Ref. 6, Adina-Petruta Pavel compares and contrasts the methodology, standards, and stakeholder effects of three influential worldwide university rankings: Quacquarelli Symonds (QS), The Times Higher Education World University Rankings, and the Academic Ranking of World Universities (ARWU), commonly referred to as the Shanghai Ranking. Notably, the global rankings place research ahead of instruction. The findings inspire institutions to improve their operations and place greater emphasis on industry collaboration and innovation. Friso Selten et al. analyzed the techniques of the three significant global university rankings stated above in Ref. 7 and discovered that, while these rankings are somewhat stable over time, there are differences between them. Applying factor analysis with principal component analysis, the study indicates that these rankings fundamentally measure two factors: university prestige and research performance. By correlating and visualizing these factors, it is possible to discern between the rankings. The paper responds to common criticisms of ranking methodology by noting that the variables may not accurately represent the concepts they are supposed to measure, emphasizing how difficult and imprecise it is to use rankings to judge a university’s progress.
In Ref. 8, Moskovkin et al. explored the global battle for university reputation, which began in 2003, focusing on quantitative comparisons of the World University Rankings by ARWU, QS, and THE from 2014 to 2018. The study discusses the remaining challenges with ranking consistency, as well as the approach of aggregating the number of institutions and Overall Scores by nation. It examines differences in university rankings to identify which rankings are most and least consistent within the top 100, and reports the most favorable aggregated indicator values for the US and the United Kingdom. Shehatta and Mahmood collected, examined, and analyzed the top 100 institutions qualitatively and quantitatively according to six significant global rankings published in 2015.9 The six global rankings selected were the Academic Ranking of World Universities (ARWU), Quacquarelli Symonds World University Rankings (QS), Times Higher Education World University Rankings (THE), National Taiwan University Ranking (NTU), US News & World Report Best Global University Rankings (USNWR), and University Ranking by Academic Performance (URAP). For comparison, the number of overlapping universities and Pearson’s and Spearman’s correlation coefficients between every pair of the six global rankings under investigation were used. They found similar traits or independent factors used to determine rank in all the ranking systems studied. Bublyk et al. explored the elements impacting global university rankings in Ref. 10 using data from the QS World University Rankings. Employing statistical and correlation analyses, their study focuses on how the ranking of Lviv Polytechnic National University has changed over time. This research identifies trends, benefits, and drawbacks and establishes a framework for worldwide university growth plans. It also assesses information security compliance and provides comprehensive strategic recommendations for future progress.
Hasan and Abuelrub11 compared the usability of Jordan’s top three university websites based on the outcomes of the heuristic assessment method and Eduroute, one of the main ranking systems. The findings suggest that information regarding the general usability of university websites can be obtained from Eduroute’s rankings. The heuristic evaluation technique used in this study also revealed common usability problems found on university websites. Mahesh provided an overview of several machine learning techniques that might be applied to predict and categorize the data provided in Ref. 12, which serves as a foundation for subsequent predictions of university rankings. Liu et al.13 examined the institutions and their rankings in each country and discovered a correlation between six indicators: academic credibility, employer credibility, staff/student ratio, citations per staff, global staff ratio, and global student ratio. Three ML algorithms were used: XGBoost, random forest, and linear regression. The mean absolute error, mean squared error, and root mean squared error were used to gauge the accuracy of the models. The results demonstrate that XGBoost yields the lowest error values; the lower the error, the more accurate the prediction. In Ref. 14, Gadi Himaja et al. compared and contrasted various regression techniques to recommend a rank prediction system for a national institute. They identified the most accurate technique using evaluation metrics such as R2, MAE, MSE, and RMSE, and they chose Random Forest using threshold value comparison and z-score calculation.
In Ref. 15, Vaibhav Singh et al. used a standardized database of world university ratings by Times Higher Education. The dataset was split into test data from 2016 and training data from 2011 to 2015. Using linear regression, they calculated the expected rank score for teaching, research, citations, and international orientation. Finally, they used the expected total rank score to rank the universities globally. Tabassum et al.16 developed a global university rank prediction algorithm using The Times Higher Education World University Ranking, which was established in the United Kingdom in 2010. Data analysis of prior university rankings by country was conducted to discover the most influential elements or indicators for prediction. Outlier detection algorithms were utilized to generate predicted scores, whereas the rank_score_calculate algorithm was used to predict the feature scores, from which the universities were ranked worldwide. The suggested model was evaluated using the number of matched ranks versus rank deviation, recall, the ROC curve, and accuracy versus deviation. Li17 discovered that utilizing linear regression to evaluate various indicators can better predict the comprehensive score of colleges based on many features. They used a radar chart to compare the indicators examined. Following the investigation, faculty quality was found to be a key determinant of institution ranking. In Ref. 18, Dr. Prakash Kumar Udupi et al. created machine learning models to anticipate global rankings and examined the Quacquarelli Symonds (QS) method of analyzing university rankings worldwide. This study analyzes the information using exploratory data analysis and then evaluates machine learning algorithms using regression approaches to predict worldwide rankings. The QS system rankings are divided into three categories: worldwide overall rating, regional ranking, and global ranking by subject.
Key performance metrics, including teaching, research, employability, university mission, and internationalization, form the foundation of QS global rankings. Boosting regression makes use of the Gaussian loss function to improve the predicted outcomes. The same evaluation metrics as those in Ref. 13 were used.
In Ref. 19, Estrada-Real et al. undertook a feature selection exercise to assess the validity of the six indicators using the Recursive Feature Elimination (RFE) method, the rationale for using the QS technique being the availability of the data on its website. Using supervised machine learning techniques, including multiple regression with panel data, logistic regression, decision trees, random forests, and support vector machines, they created a prediction model with categorical response variables based on the generated training data. Test data and statistical measurements, including R2, p-value, accuracy, sensitivity, specificity, confusion matrix, receiver operating characteristic (ROC), and Area Under the Curve (AUC), were used to evaluate the model’s output. Both logistic regression and random forest analyses yielded comparable results. In Ref. 20, Yan-yan SONG and Ying LU investigated how decision trees are commonly employed in data mining to construct classification systems and forecasting models. In Ref. 21, Nishi Doshi et al. used exploratory data analysis (EDA) to analyze ranking data using correlation heatmaps and box plots, and introduced a novel method for extracting decision paths for rank improvement using Decision Tree (DT) algorithms and data visualization. The approach, which has been refined with Laplace correction, provides institutions with a quantitative means to analyze development potential, plan long-term actions, and design distinctive success roadmaps. Sziklai applied the Weighted Top Candidate (WTC) approach in Ref. 22 to rank smaller academic institutions in specific research areas. Jajo and Harrison provided an overview of the available ranking systems in Ref. 23, including the founder, year, range, and measures. They employed partial least squares path modelling to create an achievement index that can be used to compare university performance across several ranking systems simultaneously.
Roba Elbawab utilized unsupervised machine learning to group institutions.24 The results divided the universities into four categories. Mohamed El Mohadab et al. evaluated a variety of classification strategies,25 including supervised, semi-supervised, and unsupervised learning. The study used performance metrics, such as F-Measure, Precision, GMAP, NDCG, MAP, and Recall, to determine relevant characteristics within the research setting.
Furthermore, ensembled models constructed using SVM, KNN, MLP, decision tree, random forest, and logistic regression algorithms with fast Fourier transforms were used in Ref. 26 for predicting the international ranks of HEIs utilizing the Shanghai world ranking dataset. Wardley et al. used different machine learning models to rank universities in Canada across different categories and recommended gradient boosting as the best performer.27 The predictive power of ensembled models was further improved by trimming the low-performing models and optimizing through heuristics when predicting international rankings.28 The power of machine learning models was further leveraged for predicting research performance indicators,29 job satisfaction analysis,30 and financial stability planning31 in HEIs. A few attempts have also been made to rank HEIs using multicriteria decision-making models with machine learning and NLP.32–35 These models were constructed based on subjective and objective measures to strengthen the results.
Our survey of research papers is summarized in Table 1 (Extended data), capturing essential aspects including Dataset Diversity, Ranking Level, Number of Algorithms employed, Highest Accuracy attained, identification of the Best Algorithm, and an overview of limitations. This structured presentation offers a concise and comprehensive snapshot of the methodologies and outcomes prevalent in this research landscape.
1.4.1 Methods
Figure 1 shows the proposed methodology.
1. Data collection:
Data collection is a methodical process of obtaining and compiling pertinent information from various sources, including text documents, databases, sensor networks, and websites, to create an extensive and organized QS World University ranking dataset for further analysis and knowledge discovery. The data must be extracted, transformed, and loaded into an appropriate repository. This process yields an enormous amount of raw data.
2. Data preparation:
Data preparation is a two-step process that is essential for managing the data gathered from various sources. The first step is data inspection: carefully reviewing and evaluating a dataset to obtain a general idea of its quality, features, and organization. The second step is data preprocessing. Data loading comes first: compiling data from many sources and storing them in a single repository. To guarantee data quality, data cleaning entails locating and fixing mistakes, inconsistencies, and outliers. While imputation fills in missing values to make a dataset more complete, transformation reworks the data to satisfy the needs of the analysis. Labeling is used to classify or tag data points for supervised learning. Understanding the distribution and trends of the data through visualization helps to make well-informed judgments regarding preprocessing. By guaranteeing that data features are on a consistent scale, normalization helps to avoid biases in later analysis. Finally, splitting divides the dataset into training, validation, and test sets, producing a clean, well-organized dataset that is ready for data mining or machine-learning tasks.
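As an illustration of the cleaning and imputation steps described above, the following stdlib-only Python sketch removes duplicate records and fills a missing numeric value with the column mean. The column names and values are invented for demonstration and do not reflect the actual QS dataset schema.

```python
# Minimal cleaning/imputation sketch (stdlib only); rows are illustrative.
from statistics import mean

rows = [
    {"institution": "Univ A", "citations": 85.0, "rank": 12},
    {"institution": "Univ B", "citations": None, "rank": 47},   # missing value
    {"institution": "Univ B", "citations": None, "rank": 47},   # duplicate row
    {"institution": "Univ C", "citations": 61.5, "rank": 95},
]

# 1) Drop exact duplicates while preserving order.
seen, deduped = set(), []
for r in rows:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# 2) Impute missing numeric values with the column mean of observed values.
observed = [r["citations"] for r in deduped if r["citations"] is not None]
fill = mean(observed)
for r in deduped:
    if r["citations"] is None:
        r["citations"] = fill

print(len(deduped), fill)  # 3 rows remain; mean of the observed citations
```

In a real pipeline these steps would typically be done with pandas (`drop_duplicates`, `fillna`), but the logic is the same.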
Feature engineering:
Feature engineering uses cleaned data. It includes both feature selection and data transformation to produce machine learning model input variables that are both optimal and informative. Feature selection lowers the dimensionality and boosts the model effectiveness by selecting a subset of the most pertinent and discriminative characteristics from the original dataset. However, data transformation includes methods, such as addressing missing values.
• Encoding categorical variables: The majority of machine-learning methods require numerical input data formats. Numerical values must be assigned to categorical variables that represent the labels or categories. There are a few ways to accomplish this, such as label encoding and one-hot encoding.
One-Hot Encoding: In this method, each category in a categorical variable has a binary column. Categorical data are converted into a numerical representation by each binary column, which indicates whether a category is present or absent.
Label Encoding: Using this technique, every category is given a distinct number. Although it simplifies the data, not all algorithms are compatible with it.
To convert categorical columns to numerical columns, label encoding was applied here, assigning a unique integer to each category. In this dataset, the columns ‘Institution Name’, ‘Country Code’, ‘Country’, ‘SIZE’, ‘FOCUS’, ‘RES’, and ‘STATUS’ were converted into numerical values (see Figure 3).
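The label-encoding step can be sketched with plain Python dictionaries (a production pipeline would more likely use scikit-learn's `LabelEncoder`); the country values below are illustrative only.

```python
# Label encoding sketch: assign each distinct category a unique integer,
# in order of first appearance. Values are illustrative.
countries = ["India", "UK", "India", "USA", "UK"]

codes = {}
encoded = []
for c in countries:
    if c not in codes:
        codes[c] = len(codes)
    encoded.append(codes[c])

print(codes)    # {'India': 0, 'UK': 1, 'USA': 2}
print(encoded)  # [0, 1, 0, 2, 1]
```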
• Scaling numerical features: The size of the numerical characteristics affects many machine-learning techniques. By guaranteeing that every numerical characteristic has the same scale, scaling prevents certain features from predominating over others. This is a common preprocessing step to ensure that all features have a similar scale. Typical scaling methods include the following.
Standardization (Z-score normalization): The features are scaled to have a mean of 0 and a standard deviation of 1.
Min-Max Scaling: This transforms features into a specified range, typically [0, 1] or [-1, 1].
Robust Scaling: This scales features based on the median and interquartile range, making it robust to outliers.
To ensure that all features have a similar scale, min–max scaling was applied here; a consistent scale can be essential for certain machine learning algorithms, as can constructing new characteristics through mathematical operations. These methods ultimately refine the data for use in the models. By retaining important information and eliminating noise, both facets of feature engineering seek to improve the predictive ability of the model while ensuring that the data are properly organized and ready for the model to learn from.
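The two most common scaling schemes named above can be sketched in a few lines of stdlib Python; the score values are invented for illustration.

```python
from statistics import mean, stdev

scores = [45.0, 60.0, 75.0, 90.0]  # illustrative feature column

# Min-max scaling to [0, 1]
lo, hi = min(scores), max(scores)
minmax = [(x - lo) / (hi - lo) for x in scores]

# Z-score standardization (zero mean, unit sample standard deviation)
m, s = mean(scores), stdev(scores)
zscored = [(x - m) / s for x in scores]

print(minmax)  # first value 0.0, last value 1.0
```

scikit-learn's `MinMaxScaler` and `StandardScaler` implement the same transformations with fit/transform semantics.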
3. Data splitting:
Relevant features were extracted from the dataset. Data splitting normally divides a dataset into two subsets: a training set and a test set. Typically, 70% of the dataset is used for training and 30% for testing. By exposing the machine learning model to historical data, the training set helps it learn patterns and relationships within the dataset. The testing set functions as an independent dataset for evaluating the model’s performance and its capacity to generalize to new, unseen data, because it is not observed by the model during training. A crucial indicator of a model’s efficacy is its ability to predict outcomes from data it has never seen before, which this segmentation helps to assess.
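The 70/30 split described above can be sketched as follows (stdlib only; the data list is a stand-in for the preprocessed QS records, and the fixed seed is only for reproducibility of the illustration):

```python
import random

data = list(range(100))  # stand-in for 100 preprocessed records

random.seed(42)          # fixed seed so the split is reproducible
random.shuffle(data)     # shuffle before splitting to avoid ordering bias

split = int(0.7 * len(data))
train, test = data[:split], data[split:]

print(len(train), len(test))  # 70 30
```

scikit-learn's `train_test_split(X, y, test_size=0.3)` performs the same operation on feature/target arrays.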
4. Model construction:
Model construction is a complex process that involves creating, adjusting, and assessing predictive models to address certain issues or tasks. It encompasses a variety of methods, including optimization and non-optimization algorithms, each with specific abilities and purposes. To improve model performance, accuracy, and robustness, hybrid machine learning optimization techniques integrate different machine learning models to capitalize on their unique advantages. Conversely, hybrid machine learning non-optimization algorithms combine many machine learning techniques without requiring explicit optimization methods. Experimentation with various combinations, adjusting hyperparameters, and evaluating performance using metrics and validation methods are all part of the model creation process. The ultimate objective is to develop a reliable and robust model that can produce precise predictions of novel, unproven data.
In the framework of the research study under discussion, a variety of machine-learning models were employed for simulation purposes in both classification and regression tasks. The system models used were as follows:
• Ridge Regression
• Long Short-Term Memory (LSTM)
• Light Gradient Boosting Machine (LightGBM)
• XGBoost (Extreme Gradient Boosting)
• Ridge Regression with hyperparameters optimized using Particle Swarm Optimization (PSO)
• Ridge Regression with hyperparameters optimized using Tabu Search (TS)
• K-Nearest Neighbors (KNN)
• Support Vector Machine (SVM)
• Random Forest
• Gradient Boosting (classifier and regressor)
• Linear Regression
These models were used to carry out tasks such as regression and classification on the provided dataset, and each model has unique strengths and applicability based on the particulars of the data and the specific task at hand.
Hybrid Models (Optimization Algorithms):
1. Ridge Regression, Long Short-Term Memory and LightGBM model (HRRM+LSTM+LightGBM) (Hybrid Model 1)
➢ Ridge Regression (HRRM): This technique is used because it is easy to understand and straightforward, but it might not be able to identify the intricate trends seen in the data.
➢ Long Short-Term Memory (LSTM): LSTM is used to capture temporal dependencies and sequences in the ranking data.
➢ LightGBM: LightGBM includes gradient-boosting capabilities and is efficient in handling large datasets with numerous features. This combination guarantees that the ranking prediction considers both structural and temporal variables, making it a complete solution.
2. PSO and Tabu search with base models (PSO + TS - base models): (Hybrid Model 2)
➢ The base model optimization techniques are PSO and TS. To improve the model ensemble, basic models were merged with PSO and Tabu Search. By determining the ideal weights for each model, these algorithms optimize the combination of basic models and increase prediction accuracy. The total performance was improved by this hybrid technique, which successfully balanced the contributions of each base model.
3. PSO and Tabu search with hybrid models (PSO+TS - Hybrid models): (Hybrid Model 3)
➢ Hybrid models, which integrate a variety of machine learning methods, are optimized using PSO and Tabu Search, similar to the base models. PSO and TS improve the overall accuracy of the hybrid model by choosing appropriate weights for each component model. This method effectively integrates the capabilities of many algorithms while considering them.
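As a rough illustration of how PSO can tune a hyperparameter (as used in Hybrid Models 2 and 3, and for Ridge Regression above), the sketch below optimizes the regularization strength alpha of a toy one-feature ridge regressor against held-out validation MSE. Everything here is simplified and assumed: the data are invented, the ridge model is a one-dimensional closed form, and the inertia/cognitive/social coefficients (0.7, 1.5, 1.5) are common textbook defaults, not the values used in this study.

```python
import random

# Toy 1-D ridge: w = sum(x*y) / (sum(x^2) + alpha). The PSO objective is
# validation MSE as a function of alpha; a real pipeline would wrap a full
# Ridge model instead. All data values are illustrative.
x_tr = [1.0, 2.0, 3.0, 4.0]
y_tr = [2.1, 3.9, 6.2, 8.1]
x_va = [5.0, 6.0]
y_va = [9.8, 12.3]

def val_mse(alpha):
    w = sum(a * b for a, b in zip(x_tr, y_tr)) / (sum(a * a for a in x_tr) + alpha)
    return sum((w * a - b) ** 2 for a, b in zip(x_va, y_va)) / len(x_va)

def pso(obj, lo=0.0, hi=10.0, n=8, iters=30, seed=0):
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n)]   # particle positions
    vel = [0.0] * n                                 # particle velocities
    pbest = pos[:]                                  # per-particle best
    gbest = min(pos, key=obj)                       # swarm-wide best
    for _ in range(iters):
        for i in range(n):
            # Standard update: inertia + cognitive + social terms.
            vel[i] = (0.7 * vel[i]
                      + 1.5 * rng.random() * (pbest[i] - pos[i])
                      + 1.5 * rng.random() * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # clamp to bounds
            if obj(pos[i]) < obj(pbest[i]):
                pbest[i] = pos[i]
            if obj(pos[i]) < obj(gbest):
                gbest = pos[i]
    return gbest

best_alpha = pso(val_mse)
print(best_alpha, val_mse(best_alpha))
```

The same loop applies unchanged when the objective evaluates a full model (or an ensemble's component weights) via cross-validation.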
Hybrid Models (Non-Optimization Algorithms):
1. XGBoost and Neural Network (XGB + NN): (Hybrid Model 4)
➢ XGBoost (XGB): XGBoost has a solid track record in forecasting outcomes, particularly with structured data. This provides the hybrid model with the benefit of gradient boosting.
➢ Neural Network (NN): Complex patterns can be effectively captured by neural networks. Neural networks and XGBoost offer a balance between organized and unstructured data.
2. Gradient Boosting and k-nearest neighbors (GB+KNN): (Hybrid Model 5)
➢ Gradient Boosting (GB): Combining the predictive power of several models, Gradient Boosting works well in ensemble learning.
➢ The k-nearest neighbor (KNN) algorithm helps identify certain patterns within the data. By combining GB and KNN, one may use their complementary abilities to increase prediction accuracy.
3. Random Forest and Support Vector Machine (RF+SVM): (Hybrid Model 6)
4. Support Vector Machine, Neural Network and Gradient Boosting model (SVM+NN+GB): (Hybrid Model 7)
The idea behind hybrid models is to exploit the advantages of many algorithms while mitigating the shortcomings of each separately. To improve the forecast accuracy and offer more reliable solutions for university ranking predictions, these hybrid models integrate complementary methodologies. To enhance the overall performance, they handle a variety of data characteristics, such as complicated patterns, temporal relationships, and organized and unstructured data. A well-rounded strategy is ensured via hybridization, which improves performance on the difficult task of accurately predicting university rankings.
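To make the weight-combination idea concrete, here is a hedged, stdlib-only sketch of a tiny tabu search over the blend weight between two base models' predictions, minimizing held-out MSE (the mechanism described for Hybrid Models 2 and 3). The prediction vectors and targets are invented, and the step size, neighborhood, and tabu tenure are assumptions for illustration, not the paper's actual settings.

```python
# Tabu search over the blend weight w of two prediction vectors; values
# below are illustrative stand-ins for real model output.
y_true = [10.0, 20.0, 30.0, 40.0]
pred_a = [12.0, 19.0, 33.0, 37.0]   # e.g. a Ridge model's predictions
pred_b = [ 9.0, 22.0, 28.0, 43.0]   # e.g. a LightGBM model's predictions

def mse(w):
    blend = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
    return sum((p - t) ** 2 for p, t in zip(blend, y_true)) / len(y_true)

# Walk w on a coarse grid; the tabu list bars recently visited weights so
# the search can escape local minima instead of cycling.
step, w, best_w = 0.05, 0.5, 0.5
tabu = []
for _ in range(40):
    neighbors = [round(w + d, 2) for d in (-step, step) if 0 <= w + d <= 1]
    candidates = [n for n in neighbors if n not in tabu] or neighbors
    w = min(candidates, key=mse)      # move to the best admissible neighbor
    tabu.append(w)
    tabu = tabu[-5:]                  # short-term memory of length 5
    if mse(w) < mse(best_w):
        best_w = w

print(best_w, mse(best_w))
```

With more models, w becomes a weight vector and the neighborhood perturbs one coordinate at a time; the tabu mechanics are unchanged.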
5. Comparison of models:
To determine which model is most appropriate for a certain task or problem, it is necessary to compare and evaluate multiple models by methodically analyzing their performance, predictive ability, and applicability. Numerous evaluation indicators are typically used in this comparison. The final objective is to choose the model that offers the optimum balance between deployment-related practical factors and predictive performance, resulting in a dependable and efficient solution to the issue at hand.
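For reference, the evaluation metrics used in this comparison can be computed as follows (stdlib-only sketch on invented values; note that "Accuracy %" has no single standard definition, so the MAPE-based version below is an assumption for illustration):

```python
import math

y_true = [10.0, 20.0, 30.0, 40.0]   # illustrative targets
y_pred = [11.0, 19.0, 32.0, 38.0]   # illustrative predictions

n = len(y_true)
mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n
rmse = math.sqrt(mse)

# R-squared: 1 minus residual sum of squares over total sum of squares.
mean_t = sum(y_true) / n
ss_res = sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
ss_tot = sum((t - mean_t) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

# One common "accuracy %" convention: 100 * (1 - mean absolute % error).
acc_pct = 100 * (1 - sum(abs(p - t) / t for p, t in zip(y_pred, y_true)) / n)

print(mse, rmse, r2, acc_pct)
```

scikit-learn provides `mean_squared_error` and `r2_score` with the same definitions.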
6. Choosing the best model:
A methodical procedure for creating assessment measures, training and cross-validating several models, comparing their performance, and considering real-world implications such as resource needs and interpretability is necessary to choose the optimal model. The final decision should be based on domain-specific knowledge and business requirements, and the documentation of the entire process is essential for reproducibility and transparency. This ensures that the selected model is in line with the project’s specific goals and provides the best balance between practicality and predictive performance.
7. Accurate QS rankings:
Accurately predicting university rankings, such as the QS World University Rankings, is a difficult endeavor that depends on several variables, including international diversity, faculty qualifications, research output, and academic reputation. The model chosen for this study was utilized for this purpose. The creation of an intricate predictive model that considers various pertinent aspects and data sources is necessary to achieve high prediction accuracy.
Comparison:
Our work focuses specifically on the QS World University Rankings, employing a hybrid technique that combines optimization and non-optimization machine-learning algorithms to forecast university rankings. It uses Ridge Regression, Long Short-Term Memory (LSTM), LightGBM, Particle Swarm Optimization (PSO), and Tabu Search (TS), as well as hybrid models that combine these methods. To guarantee accurate predictions, this study highlights the need for quality assurance and data pre-treatment. The model's performance was evaluated using measures such as RMSE, MAE, R-squared, and accuracy percentage, as tabulated in Table 3. Several literature-based comparison studies also explore university ranking predictions, mainly through the application of machine learning techniques. Although they use comparable data preparation techniques and assessment measures, they vary in the particular models they apply, their feature-selection techniques, and the focus of their analysis. Some concentrate on particular methods, such as decision trees and clustering, or on ranking systems other than QS. The hybrid model approach we introduce, and the insights we provide into enhancing ranking accuracy and decision making in higher education establishments, make our work noteworthy. This aligns with the overall objective of these comparative studies: to improve the caliber of university-ranking systems. In our comparative analysis of the research articles, we examined important factors including dataset size, applied optimizations, accuracy measures, and related remarks. This review offers a nuanced understanding of the methods utilized, enabling a critical analysis of the study strategies concerning these pivotal elements, which are shown in Table 7.
Using the scatter plots shown in Figure 4 and Figure 5, our study provides a visual depiction that clarifies the effectiveness of the hybrid strategy as well as the complex interactions between characteristics. These charts offer a sophisticated viewpoint that makes it easier to see how well the hybrid model performs and to identify trends in the attributes of the dataset. Following rigorous comparisons between different algorithms and calculating relevant metrics, we combined the data into tables, as shown in Table 5 and Table 6, and an extensive bar plot (see Figure 7 and Figure 8). This graphic depiction provides a clear summary of the relative performance and important information about the effectiveness of various algorithms within our framework.
Hybrid ML Optimization Algorithms | RMSE | MAE | R-Squared Score | Accuracy (%) |
---|---|---|---|---|
Hybrid Model 1 | 3.25 | 1.98 | 0.29 | 90.23% |
Hybrid Model 2 | 3.22 | 1.99 | 0.31 | 90.33% |
Hybrid Model 3 | 3.20 | 1.94 | 0.32 | 89.97% |
Ref. | Dataset size | Optimization | Accuracy | Remarks |
---|---|---|---|---|
1 | X | X | X | Historical-sociological account |
2 | X | X | X | Institutional Competence |
3 | Generic Higher Education Data | X | High | Systematic review |
4 | Australian Institutional Performance Data | X | Moderate | Case study of Australian performance |
5 | Global and National University Ranking Data | X | Moderate | Comparative analysis |
6 | Global University Ranking Data | X | High | Comparative analysis |
7 | Research Paper Ranking Data | Supervised learning | High | Rank prediction |
8 | World University Ranking Data | Quantitative analysis | Moderate | Comparative analysis |
9 | Global University Ranking Data | X | Moderate | Policy implications |
10 | QS World University Ranking Data | X | High | Strategic analysis |
11 | University Website Usability Data | X | Low | Usability prediction |
12 | X | X | X | Review of machine learning algorithms |
13 | QS World University Ranking Data | Data mining | High | Rank prediction using data mining |
14 | National Institute Ranking Data | Machine learning | High | Rank prediction for national institutes |
15 | X | X | X | Rank prediction system |
16 | Global Performance Indicator Data | X | X | Analysis of global performance indicators |
17 | University Comprehensive Score Data | Regression analysis | X | Comprehensive score prediction |
18 | Global University Ranking Data | Machine learning regression | High | Global ranking prediction |
19 | QS World University Ranking Data | Data Analytics | X | Competitiveness analysis |
20 | X | Decision tree methods | X | Decision tree applications |
21 | University Ranking Improvement Data | Data-driven strategy | Moderate | Rank improvement using decision trees |
22 | X | X | X | Academic excellence ranking |
23 | X | Partial least squares path modeling | X | Alternative ranking approach |
24 | X | Cluster analysis | X | Goals and cluster analysis |
25 | Research Paper Ranking Data | Supervised learning | Moderate | Rank prediction using supervised learning |
The dataset used in this study stems from the esteemed QS World University Rankings for 2024, a comprehensive assessment comprising 1,500 institutions across 104 global locations. With 1,499 rows and 29 columns, it encompasses vital attributes, such as 2024 and 2023 ranks, institution details, size, focus, research metrics, reputation scores, and newly introduced metrics for sustainability, employment outcomes, and international research networks. These additions reflect a methodological evolution, enabling a nuanced evaluation of universities' commitment to social responsibility, employability, and global research collaboration. Notably, the dataset provides a holistic view, including academic reputation, employer perception, faculty-student ratios, citation impact, and internationalization indices. Widely recognized and respected, the QS World University Rankings influence global perspectives on higher education, making this dataset a valuable resource for researchers, policymakers, academics, and prospective students, offering multifaceted insights for education and higher education policy analysis. The dataset contains no irrelevant attributes, though several of its features are correlated, as shown in the correlation matrix in Figure 2.
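Assuming the dataset is loaded into a pandas DataFrame, the correlation structure mentioned above can be inspected with `DataFrame.corr`; the column names below are hypothetical stand-ins for the actual 29 attributes:

```python
import pandas as pd

# Hypothetical stand-in columns; the actual dataset has 29 columns,
# including 2024/2023 ranks and reputation scores.
df = pd.DataFrame({
    "rank_2024": [1, 2, 3, 4, 5],
    "rank_2023": [1, 3, 2, 5, 4],
    "academic_reputation": [100.0, 98.5, 98.1, 97.0, 96.5],
})

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr(numeric_only=True)
print(corr.round(2))
```

A heatmap of this matrix (e.g., with seaborn) would reproduce the kind of plot shown in Figure 2.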
2. Implementation requirements:
This research project was implemented in Python, leveraging various frameworks and libraries to address different aspects of the study. The project relied extensively on popular machine learning and deep learning frameworks, including scikit-learn, XGBoost, LightGBM, and TensorFlow, complemented by the essential numerical computing and data manipulation libraries NumPy and Pandas. To streamline the setup and ensure replicability, a requirements.txt file was provided, specifying the necessary dependencies and their versions: scikit-learn 0.24.2, xgboost 1.5.0, lightgbm 3.2.1, TensorFlow 2.7.0, numpy 1.21.2, pandas 1.3.3, and pyswarm 0.6.1. This file facilitates straightforward installation of the required packages, ensuring compatibility across different computing environments. The evaluation metrics were tailored to the nature of the implemented machine-learning models: classification tasks were assessed with Accuracy, Precision, Recall, and F1 Score, providing a comprehensive assessment of performance, while regression models were evaluated using RMSE (Root Mean Squared Error), MSE (Mean Squared Error), and R2 Score. The experimental setup specified hardware configurations to ensure the execution of experiments under standardized conditions: a minimum of 4 GB RAM, a processor with a minimum clock speed of 2.26 GHz, and at least 512 MB of storage. The experiments were conducted in a cloud environment, with Google Colaboratory and Jupyter Notebook (via Anaconda) serving as the primary platforms for model development and evaluation; these platforms offer the computational resources and collaborative features conducive to the research workflow.
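Based on the dependency versions listed above, the requirements.txt file would read approximately as follows:

```text
scikit-learn==0.24.2
xgboost==1.5.0
lightgbm==3.2.1
tensorflow==2.7.0
numpy==1.21.2
pandas==1.3.3
pyswarm==0.6.1
```

Installing with `pip install -r requirements.txt` pins these versions and reproduces the environment.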
3. Data preparation
In our initial research phase, we worked with raw data: a dataset that included various college parameters, such as reputation ratings and rankings. Missing values presented analytical difficulties and required correction, which is typical for raw data. Categorical data, such as institution names, were numerically encoded for machine learning model compatibility, and scaling techniques were used to handle the different scales of numerical variables (see Figure 3). Defects and outliers were identified and corrected to mitigate their possible influence on model performance. After this thorough preparation, we performed a transformational refinement of our dataset, which resulted in decreased errors, controlled outliers, and transformed categorical variables. Standardization and normalization ensured a homogeneous scale across numerical features. The dataset was enhanced with new characteristics derived via mathematical operations, making it ready for a variety of analytical uses such as machine learning. Dimensionality reduction techniques were applied, which streamlined the dataset without compromising important information and accelerated the modelling process. After pre-processing, the dataset is well suited for model training and assessment, providing improved interoperability with different machine learning techniques.
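A minimal sketch of the preprocessing steps described above — imputing missing values, encoding institution names, and standardizing numeric features — using hypothetical column names and scikit-learn:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy frame exhibiting the issues described: a categorical column
# (institution name) and a numeric column with a missing value.
df = pd.DataFrame({
    "institution": ["MIT", "Cambridge", "Oxford", "Harvard"],
    "academic_reputation": [100.0, np.nan, 98.1, 97.0],
})

# Encode categorical institution names as integers.
df["institution"] = LabelEncoder().fit_transform(df["institution"])

# Impute missing numeric values with the column mean, then standardize
# to zero mean and unit variance.
imputer = SimpleImputer(strategy="mean")
scaler = StandardScaler()
df[["academic_reputation"]] = scaler.fit_transform(
    imputer.fit_transform(df[["academic_reputation"]])
)
print(df)
```

The same transformers, fitted on the training split only, would then be applied unchanged to the test split to avoid leakage.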
Initially, we preprocessed the dataset in the model generation procedure: addressing missing data, encoding categorical variables, scaling numerical features, and generating newly derived characteristics. Next, the dataset was divided into training and testing sets. We built hybrid prediction models both with and without optimization. To evaluate the prediction performance of both kinds of model, we computed several evaluation measures, including the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R2) score. We also added a statistic called "accuracy percentage" to measure how often ranking predictions fall within a given threshold. The results are presented in Tables 3 and 4 and were synthesized using a comprehensive bar plot. A further simulation is shown in Figure 6, and the execution using the hyperparameter configurations is presented in Table 2. The proposed approach performed better in terms of convergence than the other algorithms. The performance of the proposed method relative to the other algorithms, in terms of total RMSE, MSE, and R2 score, is displayed in Figure 9.
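The "accuracy percentage" statistic is described as the share of predictions falling within a given rank threshold; one plausible implementation (the specific threshold value here is an assumption, not taken from the paper) is:

```python
import numpy as np

def accuracy_percentage(y_true, y_pred, threshold=5):
    """Share (in %) of predictions within `threshold` rank positions of the truth."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    within = np.abs(y_true - y_pred) <= threshold
    return 100.0 * within.mean()

true_ranks = [1, 10, 25, 50, 100]
pred_ranks = [3, 12, 40, 52, 96]
print(accuracy_percentage(true_ranks, pred_ranks))  # 80.0: four of five within 5
```

Unlike RMSE, this metric is robust to a few large misses and directly answers "how often is the predicted rank close enough to be useful?"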
5. Comparative analysis:
Figure 6(a) shows the performance metrics for the three hybrid models listed in Table 3. The bar chart clearly shows that Hybrid Model 2 performs better across most metrics, confirming the conclusions from Table 3.
Figure 6(b) shows the performance metrics for the four hybrid models listed in Table 4. Hybrid Models 5 and 6 seem to perform better than the other two models, aligning with the conclusions drawn from Table 4.
Figure 7(a) shows the individual metrics listed in Table 5. The subplots confirm that LightGBM performs well in terms of RMSE, MSE, and R-squared score, whereas LSTM excels in Precision, Recall, and F1 Score.
Figure 7(b) provides a combined visualization of all evaluation metrics for the four algorithms from Table 5. LightGBM and LSTM seem to outperform HRRM and PSO across most metrics, which is consistent with the conclusions drawn from Table 5.
Table 6 compares the performance of the seven non-optimization machine-learning algorithms. SVM has an RMSE of 3.82 and the highest accuracy (86.00%), suggesting that it may be the best-performing algorithm among the seven. LR and GB also performed reasonably well across most metrics.
The RMSE plot shows that the Random Forest (RF) model has the lowest RMSE of 3.06, indicating the best performance in terms of prediction accuracy among the models tested. Conversely, the Neural Network (NN) model had the highest RMSE of 4.73, suggesting the poorest performance. Other models such as XGBoost (XGB), Gradient Boosting (GB), k-nearest neighbors (kNN), Support Vector Machine (SVM), and Linear Regression (LR), have RMSE values ranging from 3.35 to 3.82, with XGBoost (3.35) and k-nearest neighbors (3.62) also performing relatively well.
In the MSE plot, the Random Forest (RF) model again demonstrated superior performance, with the lowest MSE of 9.42. The Neural Network (NN) model had the highest MSE of 22.38, reflecting its poor performance in predicting outcomes accurately. The XGBoost (XGB) model has an MSE of 11.22, showing good performance. Other models, such as Gradient Boosting (GB), k-nearest neighbors (kNN), Support Vector Machine (SVM), and Linear Regression (LR), have MSE values between 13.13 and 14.63, indicating moderate performance.
The R-squared plot reveals that the Random Forest (RF) model has the highest R-squared value of 0.37, indicating the best fit to the data. The Neural Network (NN) model shows a significantly negative R-squared value of -0.47, indicating that it performs worse than a horizontal line (mean prediction). Other models such as XGBoost (XGB), Gradient Boosting (GB), k-Nearest Neighbors (kNN), Support Vector Machine (SVM), and Linear Regression (LR) exhibit R-squared values ranging from 0.03 to 0.25, with XGBoost (0.25) performing comparatively better.
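A negative R-squared simply means the model's squared error exceeds that of always predicting the mean of the targets, as a quick check with scikit-learn shows:

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0, 4.0]
good = [1.1, 1.9, 3.2, 3.8]  # tracks the data: positive R-squared
bad = [4.0, 1.0, 4.0, 1.0]   # worse than predicting the mean: negative R-squared

print(r2_score(y_true, good))  # close to 1
print(r2_score(y_true, bad))   # negative
```

This is why the NN model's R-squared of -0.47 indicates performance below the trivial mean-prediction baseline.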
This plot shows the precision values for the different models, which cluster around 0.9. The XGB model achieved the highest precision of 0.94, whereas the LR model had the lowest at 0.88.
The recall values displayed in this plot were consistently high across all models, with the SVM model achieving a perfect recall score of 1.0. The XGB, NN, GB, kNN, and LR models had recall scores ranging from 0.94 to 0.96, indicating excellent recall performance.
This plot presents the F1 scores, with the XGB model having the highest score of 0.94, whereas the remaining models exhibited scores clustered around 0.9.
The accuracy percentages shown in this plot ranged from 75.0% to 86.0%, with the SVM model achieving the highest accuracy of 86.0%. The NN model had the lowest accuracy of 75.0%, while most other models fell within the range of 75.67% to 78.33%.
This comprehensive plot compares multiple evaluation metrics across different models or methods, allowing for a side-by-side comparison of the performance. The metrics displayed include the RMSE, MSE, R-Squared, Precision, Recall, F1 Score, and Accuracy (%). This plot provides a holistic view of how models or methods are applied across various evaluation criteria.
In Figure 9(a), the combination of Ridge Regression, Long Short-Term Memory, and LightGBM (HRRM+LSTM+LightGBM) achieves a significantly lower Root Mean Squared Error (RMSE) compared to each individual algorithm used alone. This indicates a superior fit and potentially more accurate predictions.
In Figure 9(b), the hybrid models optimized with PSO and Tabu Search (PSO+TS-Hybrid Models) outperform models that combine these optimization algorithms with base models (PSO+TS-Base Models) across all metrics, including accuracy. This suggests that combining various machine learning methods within the hybrid models themselves leads to even better results.
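The study applies PSO (e.g., via the pyswarm package) to model optimization; as a self-contained illustration of the mechanics, the following minimal NumPy implementation minimizes a simple test function over box bounds. The parameter values are illustrative, not those used in the study:

```python
import numpy as np

def pso_minimize(f, lb, ub, n_particles=30, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer over box bounds [lb, ub]."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x = rng.uniform(lb, ub, size=(n_particles, dim))  # particle positions
    v = np.zeros_like(x)                              # particle velocities
    pbest = x.copy()                                  # per-particle best positions
    pbest_val = np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()              # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + cognitive pull toward pbest + social pull toward g.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, f(g)

# Sphere function: global minimum 0 at the origin.
best_x, best_val = pso_minimize(lambda p: float(np.sum(p ** 2)),
                                lb=[-5, -5], ub=[5, 5])
print(best_val)  # close to 0
```

In a ranking-prediction setting, `f` would instead be the cross-validated RMSE of a model as a function of its hyperparameters, and Tabu Search could refine the solution PSO returns.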
Figures 9c through 9f compare the proposed hybrid approach against groups of baseline models: XGB (Extreme Gradient Boosting) and NN (Neural Network) in Figure 9c; GB (Gradient Boosting) and kNN (k-Nearest Neighbors) in Figure 9d; RF (Random Forest) and SVM (Support Vector Machine) in Figure 9e; and SVM, NN, and GB in Figure 9f. In each plot, the y-axis represents the evaluation metrics RMSE (Root Mean Squared Error), MSE (Mean Squared Error), R-squared, and accuracy (%), and the bars allow a direct comparison of performance. In every case, the proposed approach, which is a hybrid of the individual algorithms being compared, outperforms the baseline models across all metrics.
Finally, by proposing and analyzing hybrid algorithms, we aim to improve the accuracy of predicting the QS World University Rankings. With the lowest RMSE, the highest R2 score, and commendable accuracy, Hybrid Model 3, which uses Particle Swarm Optimization and Tabu Search, emerged as a top performer. Furthermore, Hybrid Model 5, which combines Gradient Boosting and k-nearest neighbors, outperformed the competition. Future directions for ranking prediction include expanding deep learning techniques, such as tailored neural network designs, and utilizing pretrained language models. Integrating multimodal data, stressing explainability, and addressing ethical problems such as fairness and transparency are critical.
Recognizing limitations, such as dataset specificity and the necessity for external validation, researchers should investigate qualitative characteristics while keeping in mind the changing landscape of higher education. Ethical issues, data quality improvements, and continual deep-learning breakthroughs are critical for refining ranking forecasts and ensuring their relevance and applicability in the changing sector of university assessments.
The QS rank dataset used to carry out this project was obtained from the Kaggle repository and can be downloaded from: https://www.kaggle.com/datasets/joebeachcapital/qs-world-university-rankings-2024
Zenodo: Hybrid prediction models for assessing the higher education institutions performance in QS World Institution rankings: https://doi.org/10.5281/zenodo.14101002
The project contains the following data:
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Source code available from: https://github.com/VishwanthCheruku/hybrid-machine-learning-algorithms
Source code is available under the MIT License.
Archived software available from: https://doi.org/10.5281/zenodo.14000400
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Is the work clearly and accurately presented and does it cite the current literature?
No
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
No
If applicable, is the statistical analysis and its interpretation appropriate?
No
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
No
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Educational Data Mining, Artificial Intelligence, Reinforcement Learning, Intelligent Systems
Alongside their report, reviewers assign a status to the article.
Invited Reviewers: 1
Version 1 (17 Dec 24): read