1 Introduction
The
World Health Organization (WHO) has released a report stating that the COVID-19 pandemic has disrupted mental health services in 93% of the world's countries.
1 The need for psychiatric treatment has increased as a result of lockdowns implemented in affected areas as a preventive measure. According to [
36], physiological anxiety factors such as worry about becoming ill and concern about the future increase with the duration of a lockdown. Depression is one of the most common mental health disorders, and it significantly affects the daily lives of those affected [
4]. It can cause a variety of mental and physical problems that limit a person's actions and can seriously affect the person's environment, career, or schooling, and even the most fundamental human needs such as sleeping and eating. Although depression is treatable, the condition often goes undiagnosed. This is due to several reasons, including patients' inability to recognise the problem themselves and the social stigma associated with mental disorders. In recent years, the use of social media has increased, which has opened new doors to the diagnosis of depression. Digital media allow users to share and express their thoughts without restriction [
4]. Moreover, people suffering from depression often turn to digital media for information about their condition or to communicate with others about their concerns and symptoms.
In the field of natural language processing,
recurrent neural networks (RNNs) have been widely used. RNNs can achieve a level of performance that is considered state-of-the-art by exploiting contextual information in a “memory” mechanism modelled via hidden/cell states [
3]. Despite its advantages, this memory mechanism makes the model's decisions difficult to interpret: as the hidden states are transmitted over time, different pieces of information are interwoven across timesteps, naturally giving RNN models the appearance of a “black box.” Attention weights can sometimes be revealing, but they are not always understandable; they often amount to a collection of values with little meaningful interpretation. Moreover, analysing attention weights requires an understanding of the theoretical underpinnings of how RNNs work. A first-time user may therefore find such models difficult to understand, which limits their widespread use in the real world. A prototypical technique instead seeks out examples, or prototypes, from the dataset for a given sequence and then derives a decision. This process is similar to the way people, such as doctors and judges, decide a new case by referring to similar previous cases. Such prototypes provide intuitive clues and evidence about how the model reached a conclusion in a way that a layperson can understand, which matters from an interpretability perspective. However, existing prototype-based methods locate prototypes at the paragraph level, which makes it difficult to break the analysis down to the sentence level, for example the connections and flow among the individual sentences of a paragraph.
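The prototype idea can be made concrete with a short sketch. The following is a minimal, hypothetical illustration (the function and variable names are ours, not from any cited implementation): a new input is classified by its nearest stored prototype, and that prototype is returned as the human-readable evidence for the decision.

```python
import numpy as np

def prototype_decision(query_vec, prototypes, labels):
    """Classify an input embedding by its nearest prototype and return
    the prototype index as human-readable evidence for the decision."""
    dists = np.linalg.norm(prototypes - query_vec, axis=1)  # distance to each stored case
    nearest = int(dists.argmin())                           # closest previous "case"
    return labels[nearest], nearest

# Toy usage: three stored prototype embeddings with class labels.
prototypes = np.random.randn(3, 16)
labels = ["depressed", "control", "depressed"]
label, idx = prototype_decision(np.random.randn(16), prototypes, labels)
print(f"predicted '{label}' because the input resembles prototype #{idx}")
```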
Personality traits have been explored in the context of online texts and the Internet in general, and the results show a strong relationship between these traits and online behaviour [
2]. In addition, several predictive models have been presented that extract the psychological background of users from traces of their online behaviour and link it to personality traits. However, relationship features such as attachment orientations and other forms of bonding have been largely ignored in the online context, despite the fact that user activity correlates strongly with aspects of social behaviour. For this reason, examining a relationship profile from an application standpoint is extremely important and provides a wealth of information about individuals' social profiles.
Conventional methods have viewed mental illness as a supervised text classification task [
4]. Depression-based training instances are used to create diverse datasets. Although annotated clinical practice data are of high quality, their development is often expensive and time-consuming. Due to privacy regulations and ethical constraints imposed by institutions, their dissemination is limited in several ways. Data from digital and social media have been used to avoid institutional restrictions and study people in clinical experiments. According to Mukhiya et al. [
25], depression is caused by a combination of current and long-term conditions. Under difficult conditions, it is not always easy to determine the cause [
16]. It is important to recognise early signs and symptoms of depression to get help as soon as possible. Many Internet forums and social media now allow people to contact each other anonymously and talk about distress, grief, and possible treatment options [
17]. People from all over the world can openly express thoughts and feelings, according to Muhleck et al. [
24]. Online monitoring can be a proactive and promising technique for discovering high-risk concerns. It can promote meditation and increase overall well-being [
29].
According to WHO,
2 anxiety is one of the most disabling diseases in the world, affecting approximately 264 million people worldwide [
10]. According to Mazza et al. [
20], untreated depression can worsen and cause lifelong anguish. In the worst cases, anxiety can lead to suicidal thoughts. According to the World Health Organization, more than 800,000 people die by suicide each year, and suicide is the second leading cause of death among 15- to 29-year-olds. One contributing factor is that between 76% and 85% of people with mental disorders in low- and middle-income countries receive no treatment. Lack of financial assistance and support, a shortage of qualified physicians, misperceptions, and social stigma against the mentally ill are all barriers to effective treatment [
10]. The biggest barriers that prevent people from seeking treatment are negative attitudes, shyness, and fear of disclosure. People often feel ashamed, humiliated, and afraid of having their mental anguish thoroughly explored [
26]. For these reasons, people are often reluctant to admit that they are unhappy or to seek psychological care and counselling. The prevention and treatment of mental health problems have become a global priority in health systems.
Text classification is typically treated as a sequence-based methodology and mostly relies on attention [
4]. An attention mechanism lets a model with variable-size inputs focus on the most relevant correlated features; when the attention is computed within a single sequence, it is called self-attention (a minimal sketch is given after this paragraph). An attention mechanism is often used to compute a representation of a given sequence. For text classification and embedding creation, self-attention is advantageous in bidirectional RNNs [
43]. Based on previous studies [
2,
4,
25], when users disclose personal information, they tend to use expressions and words that could indicate psychological problems such as depression. We hypothesise that people with the same illness, as opposed to healthy individuals, have comparable topic interests in personal statements. For example, the following is a personal statement from a user diagnosed with depression: “I am prescribed medication to treat my depression. I cannot calm my anxiety.” Highlighting the importance of such terms through feature selection and phrase weighting could therefore help distinguish depressed individuals from non-depressed individuals.
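To make the self-attention computation above concrete, here is a minimal sketch in plain NumPy. It is an unparameterised version (no learned projections) for illustration only; real implementations add trainable query/key/value matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Unparameterised self-attention over one token sequence.
    X: (seq_len, d) array, one embedding per token."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)       # pairwise relevance between tokens
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ X                  # contextualised token representations

# Toy usage: five tokens with 8-dimensional embeddings.
contextual = self_attention(np.random.randn(5, 8))
print(contextual.shape)  # (5, 8)
```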
Recent years have seen rapid growth in the importance of incorporating digital healthcare and remote health monitoring technology into the overall healthcare industry. However, those who work in healthcare are acutely aware that a positive public perception of integrated healthcare systems results from a complex interaction among a number of factors, and the relationship between users and mobile and wearable health technologies needs strengthening. In light of this, medical experts might find it useful to investigate patients' emotional responses to remotely administered therapies. Finding acceptable solutions will keep e-health applications competitive and lead to new tools and approaches that shape future healthcare applications.
Artificial intelligence (AI) here refers to a collection of algorithms that can identify human emotions by analysing facial expressions; in this sense, AI can rapidly “read” the feelings a person is experiencing. Monitoring a patient's emotional state enables medical professionals to determine swiftly whether a patient is cooperating with the diagnostic plan and to assist patients in emergency situations. Additionally, identifying emotions via mobile health monitoring applications and other forms of intelligent healthcare technology would be a significant advance, because emotionally intelligent and technically advanced algorithms can instantly identify stressed patients and help them maintain healthy lifestyle patterns. Emotionally aware and intelligent systems have become an essential component of modern medical practice because of their role in determining the most effective course of treatment for individual patients. However, significant work remains before the technology can properly capture human emotions in the medical field. It is necessary to understand the risks associated with collecting and using patients' emotional data, particularly in the context of early disease identification, efficient patient communication, and emergency assistance, and to draw on the vast body of research in this field to handle the risks and problems connected with emotion-aware intelligent technology in healthcare.
The need for mental health services has increased significantly as a direct result of the unpredictability of lockdown zones. The probability of severe health problems rises when a person is subjected to physiological stresses such as anxiety about sickness and uncertainty about their health, and isolation and a lack of physical activity can make stress worse. In addition, those who work in healthcare frequently experience anxiety, a lack of protective equipment, and a highly stressful atmosphere, although it can be difficult to establish the exact source of these conditions; this can lead to anxiety and melancholy. Depression is among the most distressing diseases found anywhere in the world, with roughly 264 million people around the globe suffering from depressive illnesses. The majority of mental illnesses are misdiagnosed because there is insufficient verbal interaction and a lack of trust. Depression is the underlying cause of death by suicide for more than 800,000 people each year, and because the issue is not being addressed, suicide is the second leading cause of death among those aged 15 to 29. Between 76 and 85 percent of people with mental disorders in low- and middle-income countries receive no treatment. Aside from personal concerns, other challenges for early detection include a lack of resources, untrained healthcare providers, social stigma, and the need for a speedy response. Because anxiety and shyness can make it difficult to maintain a steady state of mind, some people feel embarrassed about their inability to do so, and this remains a concern even after an exhaustive evaluation of a patient's psychological anguish. As a consequence, people afflicted with depression may be unable to take part in additional treatment to address their existing medical issues.
This study identifies people who show indicators of depression; early diagnosis increases the chances of appropriate treatment. Unlike methods based on word frequency alone, the proposed method focuses on frequent terms used as nodes in user contexts, which enables early diagnosis of this disorder. We propose an attention-based architecture for node classification of graph-structured data, inspired by recent works. The goal is to compute the hidden representation of each node in a graph by attending over its neighbours using a self-attention mechanism (a minimal sketch of this computation is given after the contribution list). The attention architecture has several intriguing properties: (1) the operation is efficient, since it can be parallelised across node-neighbour pairs; (2) different weights can be assigned to neighbours, emphasising the important ones; and (3) the method supports inductive learning (without prior data), so the model can generalise to unseen graphs, which matters in our case because digital media continually introduce new words. In particular, this article makes the following contributions:
(1)
We have proposed an attention-based word weighting scheme that is used to select the significant depression terms.
(2)
We have implemented Graph Attention-based inductive learning for depression symptom classification in mental health interventions.
(3)
We have performed extensive experiments to evaluate the generalisation of the learning system by reducing data annotation, and the results indicated that the designed model outperforms the state-of-the-art models.
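As promised above, the following is a minimal sketch of the graph-attention computation in the standard single-head GAT formulation; it is illustrative rather than our exact implementation, and all tensor shapes are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def gat_layer(h, adj, W, a):
    """Single-head graph-attention layer in the standard GAT form:
    e_ij = LeakyReLU(a^T [W h_i || W h_j]), softmax-normalised over
    each node's neighbourhood, then used to mix neighbour features.
    h: (N, F_in) node features; adj: (N, N) adjacency with self-loops;
    W: (F_in, F_out) projection; a: (2 * F_out,) attention parameters."""
    z = h @ W                                   # project node features
    f_out = z.size(1)
    src = z @ a[:f_out]                         # per-node source term
    dst = z @ a[f_out:]                         # per-node target term
    e = F.leaky_relu(src.unsqueeze(1) + dst.unsqueeze(0))  # (N, N) logits
    e = e.masked_fill(adj == 0, float("-inf"))  # attend to neighbours only
    alpha = torch.softmax(e, dim=1)             # attention coefficients
    return alpha @ z                            # updated node representations

# Toy usage: 4 nodes, 16 input features, 8 output features.
N, f_in, f_out = 4, 16, 8
adj = ((torch.rand(N, N) > 0.5).float() + torch.eye(N)).clamp(max=1)
out = gat_layer(torch.randn(N, f_in), adj,
                torch.randn(f_in, f_out), torch.randn(2 * f_out))
print(out.shape)  # torch.Size([4, 8])
```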
The rest of the article is organised as follows: Section
2 summarises related work. Section
3 describes the primary technique used to conduct the experiment, collect data, and build the model. Section
4 discusses the results and findings. Finally, Section
5 summarises the results and makes suggestions for further research.
2 Related Work
Social media data have been used to analyse issues related to public health. Numerous health topics (including allergies, migraines, and obesity) have been explored using Twitter data [
32]. Methods from computational linguistics have also been used to study mental illnesses such as attention deficit hyperactivity disorder. Research on depression has been conducted in a variety of fields, including psychology, medicine, and linguistics, exploring the reasons, symptoms, and causes underlying the diagnosis of depression. Interactions between users leave traces that reflect their personality traits, and the study of these traces is essential to a variety of disciplines, including social science, psychology, and marketing. Although studies of personality prediction based on online behaviour have increased substantially, the focus has been on individual personality traits rather than relational components. As a direct result, information provided by users on social media platforms has emerged as an interesting area of study for topics related to depression [
14].
Tweets can be analysed to determine how strongly content indicates depression. Research has shown that individuals commonly publish information about their depression in generic terms, such as symptoms of depression, therapy, and other related topics. Researchers have drawn a number of distinctions between Twitter users who suffer from depression and those who do not: people who are depressed are less likely to engage in social activities, are more likely to express themselves negatively, and are more concerned with medical and religious issues. Nadeem et al. collected messages on Twitter, over the course of a year, from individuals who indicated that they had been diagnosed with depression [
27]. They conducted experiments combining bag-of-words features with traditional classifiers.
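As a rough illustration of that setup, the snippet below pairs bag-of-words features with a conventional classifier using scikit-learn; the two example tweets and labels are placeholders, not data from [27].

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder corpus; in the cited work these would be a year of tweets.
tweets = ["I can't sleep and nothing feels worth doing",
          "great run this morning, feeling energised"]
labels = [1, 0]  # 1 = self-reported depression diagnosis, 0 = control

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(tweets, labels)
print(clf.predict(["feeling hopeless again today"]))
```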
A new model has been developed known as
Time-delayed Multifaceted Factorizing Personalized Markov Chains (Time-delayed Multifaceted-FPMC) [
15]. This model accounts for many different types of historical user actions, such as clicks, collections, and add-to-carts, in addition to past sales, and also considers the diminishing influence of previous actions. The parameters of the proposed model are learned within a unified framework called Bayesian Sparse Factorization Machines, in which the concepts of classical factorisation machines are applied to a more flexible learning framework and the time-delayed multifaceted FPMC is trained using the Markov Chain Monte Carlo method. Extensive evaluations on a number of real-world datasets have shown the technique to be significantly more effective than a number of existing purchase recommendation systems.
The study used a dataset of 1.7 billion tweets and geolocated each tweet with respect to data collected from 59 countries [
19], then provided a list of 22
Online Values Inquiries (OVI), each containing a unique set of questions from the World Values Survey. These questions cover a variety of topics, including religion, science, and abortion. The method has shown that the solution is quite capable of capturing human values online for a variety of countries and topics of interest, and that certain offline values, particularly those related to religion, are significantly associated (up to c = 0.69, p = 0.05) with the corresponding online values in the digital domain. The approach is general: specialists in the social sciences, such as demographers and sociologists, can use their existing expertise to create their own online values studies, enabling them to study human values in an online environment.
One line of research addresses this deficiency by developing a predictive model for a holistic personality profile in
online social networks (OSNs). The model will combine socio-relational traits (attachment orientations) with traditional personality traits [
11]. Specifically, a feature engineering process has been developed that extracts a variety of features (taking into account behaviour, language, and emotions) from users’ OSN accounts. These features are then used to create a profile of the person. They have developed a
Machine Learning (ML) model that predicts users' trait values based on the features extracted from user profiles. The proposed model architecture is informed by psychological theory; more specifically, it exploits the interrelationships that exist between the different parts of a person's personality and, as a result, achieves higher accuracy than current state-of-the-art methods. To demonstrate the usefulness of this method, the authors applied the model to two different datasets (typical online social network users and social media opinion leaders) and compared the psychological profiles of the two groups. A clear distinction can be made between them by focusing on both Big Five personality traits and attachment orientations. This research points to a potentially fruitful line of inquiry for further studies that categorise and characterise OSN users.
In addition, approaches based on
Deep Learning (DL) have been used to predict depression. The authors examined
Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs) to detect Twitter users who are depressed [
30] and obtained competitive results. Written texts containing possible depressive signs are likely to be highly subjective, and some authors have effectively exploited such material to detect depressed users. Using sentiment analysis to determine the polarity of tweets, they found that individuals suffering from depression sent longer emotional tweets. The importance of emotions in posts about despair was also used to diagnose depression on Twitter and Reddit [
5]. Recently, another work developed a one-class classification strategy to detect sadness on Reddit [
1]. Other studies have examined the network topology of interactions to determine the presence of depression [
34]. They tested a probabilistic model that accounted for a variety of variables, including emotional expressiveness, language style, and social network aspects.
Another study used public Twitter data to examine psychological issues [
7]. They quickly collected data on anxiety, bipolar disorder, and seasonal affective disorder, among other mental illnesses. The classifier was developed to distinguish each group from the control group based on textual information [
21]. The classifier also detects correlations and extracts insights from quantitative and meaningful psychological Twitter signals. Lin et al. used a
Deep Neural Network (DNN) to detect stress. The authors used data from four microblogs to evaluate their proposed four-layer DNNs and compared them against traditional ML algorithms such as Random Forest, SVM, and Naive Bayes. Neuman et al. [
28] developed a different approach, “Pedesis,” which uses NLP dependency parsing to decompose web pages containing fears and extract extended conceptual domains from metaphorical connections. The domain knowledge then defines the phrases used as depression metaphors, and this vocabulary is used to independently evaluate the degree of depression in a text.
The predictive power of
Neural Networks (NNs) is determined by the hidden layers of the network as well as the architecture. Network tuning requires careful selection among the various available layers, topologies, and hyperparameters. During training of the optimised network, the input features can yield a higher-order representation of the vector [
4]. This more accurate feature representation is learned so that the network generalises better and improves its predictive ability. In research on modern NNs, the network with both the lowest computational complexity and the highest predictive capacity is often selected. In the past two decades, the number of architectural concepts has increased; the main differences between hidden layers, types of layers, shapes, and links between layers have been identified by Vijayakumar and colleagues [
35].
Using ML approaches, Wainberg et al. [
39] have shown how higher-dimensional features can be extracted from tabular data. CNNs accumulate patterns from visual pixels; the information contained in the pixels and the diversity between pixels enhance the network's ability to learn and predict, which the network achieves by exploiting translation invariance across pixels. In natural language processing, the RNN architecture has been developed and used for sequential data in tasks such as machine translation, speech synthesis, and time series analysis [
41]. The RNN model typically consists of an encoder and a decoder: the encoder compresses the input sequence into a fixed-length vector, from which the decoder generates the output. To process the input features, the model uses multiple gates, each of which depends on the loss function. The alignment of the input and output vectors is another problem that must be overcome when designing an encoder and decoder for an RNN; the order is determined by the values of the immediately adjacent elements. Another extension of the RNN is the attention mechanism [
4]. It uses attention over the input vector to selectively assign weights to the many different inputs, thereby improving accuracy. The decoder can make use of the context vector and the weights linked with it to represent the features accurately, in accordance with the relative relevance and position of the information at hand. The weights of the RNN model can be trained either through the architecture or through the feature representation to make predictions; this learning process takes into account both the attention weights and the context vector. Such an attention mechanism can take several network forms, such as soft, hard, or global attention. Soft attention was proposed to condense contextual information: the expected (weighted-average) hidden state is used to construct the context vector, which clarifies how the input characteristics are embedded and reduces information loss.
Other differences introduced by Luong et al. [
18] are local and global attention. Local attention is a middle ground between soft and hard attention. The model chooses the focus point for each input batch, which helps speed up convergence; in the local attention model, the position of the attention window can be learned with the help of a prediction function, so the technique predicts where attention will be focused. At both the local and global levels, domain-specific data analysis is necessary to achieve computational efficiency in these two distinct forms of attention.
When it comes to the world of medical imaging, the importance of image security cannot be overstated [
8]. A number of research projects have examined medical imaging, where the confidentiality of images must be preserved with no loss of data when encrypted. Due to the complexity, redundancy, and volume of the data, traditional encryption techniques cannot easily be applied to e-health data, especially when patient data are transmitted over open channels. Patients risk having their data compromised, because images differ from text in terms of data loss and confidentiality. Researchers have identified such vulnerabilities and responded by developing a variety of strategies for encrypting images. According to the cited study's findings, currently available methods have application-specific security flaws, and its results provide the healthcare industry with an encryption method that is both lightweight and effective: a lightweight scheme that protects medical images using two different permutation techniques. The security level of the technique and its execution time were evaluated against more conventional encryption methods using test images, and a series of tests showed the proposed image cryptosystem to be more effective.
The demand for devices capable of performing tasks automatically and communicating with each other has increased as the world has progressed [
9]. To meet this demand, the concept of the
Internet of Things (IoT) was developed. The IoT allows connected smart devices to communicate with each other over a network to perform a variety of tasks, including automating work and making intelligent judgements. It has enabled people to delegate some of their tasks to machines, which monitor their surroundings and adjust their behaviour accordingly, saving people time and energy. These machines use sensors to collect important data, which is then analysed at a computing node; based on the results, the node determines how the devices should operate intelligently. IoT security has been enhanced by DL-based techniques to defend against attacks, but IoT networks still pose a significant obstacle in industries with complex infrastructures, since such systems require long training times to process large datasets drawn from the network's past data flow. Traditional ML methods include decision trees, logistic regression, and SVMs. The cited work experimentally characterises cryptographic algorithms as asymmetric or symmetric and studies their attack rates in real-time DL and complex IoT applications. The speed of encryption and decryption of certain algorithms was evaluated using simulations: the tests encrypt and decrypt the same plaintext five times and compare the average time, the maximum key size of each encryption algorithm is given in bytes, and the average time each device takes to compute the algorithm is compared. In the simulation, a collection of plaintexts (a password and a paragraph-sized text) achieves the targeted results compared with existing techniques in real-time DL networks for IoT applications.
Some other recent related works are summarised next. In Reference [
40], the authors use a polymorphic graph attention network for establishing good practice in Chinese
Named Entity Recognition (NER). The methodology can easily be combined with both pre-trained and sequence encodings. The results support multi-head attention and show strong inference speeds. In Reference [
44], the authors propose TrajGAT, an embedded map graph attention network for vehicle trajectory data. Their results indicate strong performance against other state-of-the-art methods. While focused on vehicular technology, their work, as the authors note, may be applicable in other domains. In Reference [
13], the authors propose GANLDA, a graph attention network used for disease prediction. The authors combine heterogeneous data on
long non-coding RNA (lncRNA) to help connect lncRNAs to diseases. Their novel work was shown to outperform many other state-of-the-art methods in a 10-fold cross-validation procedure.
Text-based techniques for diagnosing depression in the context of mental illness can be found in Table
1 [
1,
4,
31,
42]. Multiple datasets (eRisk 2017/2018, Twitter, Amazon Mechanical Turk-based labelling, and Reddit) were used to evaluate the effectiveness of these methods, which discover a wide range of features capturing users' verbal, behavioural, and emotional expressions. The motivation for this approach stems from personality theory in psychology, which holds that personality traits are reflected in a variety of online behaviours and actions. Since the proposed feature engineering method considers a larger number of traits than previous research, it enables psychological profiling that is more broadly applicable. Using a crowdsourcing platform, a labelled dataset was acquired to which the methodology could be applied and tested, creating a baseline dataset tagged with the respective psychological profiles. The Twitter
application programming interface (API) was used extensively to collect tweets from the recruited participants' Twitter accounts and to apply the proposed feature engineering technique. To improve the predictions for individual traits, the methodology uses a large set of the collected (psychologically related) features, carefully selects the subsets with the strongest predictive power for each trait, and exploits correlations between personality and relationship behavioural (i.e., social) features. In this way, the approach not only predicts the social components of a psychological profile along with the personality facets (which are not captured by existing personality prediction models), but also uses the different attributes for more accurate holistic profile prediction. This work shows how node- and edge-level hypergraphs can support the training approaches and extends the trainable instances using a semantic expansion strategy [
3]. The proposed model aims to reduce the data annotation overhead. Thus, the technique contributes to the generalisation of the system. Semantic vectors are classified based on hypergraph information obtained from the context in which they occur. Based on the semantic information, the resulting word embedding selects a subset of the unlabelled text. This method finds instances using unlabelled text extracted from the hypergraph learning process. The approach integrates the additional training points into the model training.
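A hedged sketch of this selection step is shown below: unlabelled sentence embeddings are compared against the centroid of the labelled data, and only sufficiently similar instances are passed to the next training cycle. The function name and the similarity threshold are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def select_unlabelled(labelled_vecs, unlabelled_vecs, threshold=0.7):
    """Pick unlabelled instances whose embeddings are semantically close
    to the labelled data; both inputs are (n, d) sentence embeddings."""
    centroid = labelled_vecs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    unit = unlabelled_vecs / np.linalg.norm(unlabelled_vecs, axis=1, keepdims=True)
    sims = unit @ centroid                  # cosine similarity to the centroid
    return np.where(sims >= threshold)[0]   # indices to add to the next cycle

picked = select_unlabelled(np.random.randn(20, 300), np.random.randn(50, 300))
print(len(picked), "instances selected")
```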
The vast body of NLP literature contains a variety of techniques proposed for emotion detection. Emotional knowledge-based (EKB) systems, however, have not yet been thoroughly researched. The EKB has two components: a word-sense lexicon and embeddings learned in a variety of contexts. We first propose an embedding that integrates context-dependent word meanings with the word-sense-based depression lexicon and with emotional information from Internet forums. Words that convey context and sentiment are the building blocks of emotional intelligence, and the embedded sentence structure represents what was learned. The retrieved embedding helps capture the semantic composition of the text in a more meaningful way. Co-occurrence frequencies of the vectorised words are computed from linguistic patterns. The learned model represents each unique word of the author as a fixed-length vector, with very similar words lying close together in the vector space. Most pre-trained embeddings target general-purpose communication and therefore cannot be applied directly to emotional analysis. The bespoke mental health model is instead trained with the help of the word-sense model and transfer learning, which lets us expand the corpus. The primary reason is that most embeddings are trained on open-source data, such as Wikipedia texts and Twitter data used for sentiment. For example, the words “sad” and “pleased” both convey feelings, yet they denote very distinct mental states. Because of this, it is necessary to extend the embedding by utilising word sense.
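Training such a domain-specific embedding can be sketched with gensim's FastText, which also handles subword information for unseen terms; the two toy sentences stand in for the Internet-forum corpus, and the hyperparameters are illustrative.

```python
from gensim.models import FastText

# Toy forum posts; in practice these would be the mental health corpus.
corpus = [["prescribed", "medication", "to", "treat", "my", "depression"],
          ["cannot", "calm", "my", "anxiety", "today"]]

model = FastText(vector_size=300, window=5, min_count=1, epochs=10)
model.build_vocab(corpus)
model.train(corpus, total_examples=len(corpus), epochs=model.epochs)

print(model.wv.most_similar("depression", topn=3))
```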
4 Experimental Result and Analysis
In this study, we used the processing settings listed in Table
3. After first performing the preprocessing described in Section
3.2, we created the custom depression embedding and then performed the graph attention-based node feature extraction. The embedding is used to categorise the data from Twitter.
In this research, we used processing configurations as given in Table
5. The results of the
fasttext-base model are shown in Figure
4. The trend plot illustrates that the model performs better as the precision curves for the training, development, and test sets converge towards the upper left corner. To assess the algorithm, we computed the ROC curve, precision, recall, and F-measure. When comparing multiple classifiers, it can be useful to summarise each classifier's performance in a single metric. A typical choice is the area under the ROC curve (AUC): it equals the probability that a randomly chosen positive observation is ranked higher than a randomly chosen negative one, i.e., it is equivalent to the two-sample Wilcoxon rank-sum statistic. A classifier with a high AUC may occasionally perform worse than one with a lower AUC at a particular operating point; nevertheless, AUC is an excellent overall indicator of the accuracy of a predictive model. Because the number of instances for each symptom is not the same, we report both macro and micro averages. The difference is significant: macro averaging weights each class equally, whereas micro averaging weights each sample equally, and the two coincide only when every class has the same number of samples. The model used a graphical word representation regardless of the context in which a word appeared, yet words used consecutively can clearly take on different meanings, and combining words with lexicons into a single representation may distort a term's meaning. To succeed, the embedding approach must accept and capture multiple meanings of the same word, even when the resources occur in an imbalanced training corpus.
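The macro/micro distinction (and the AUC computation) can be reproduced directly with scikit-learn; the label arrays below are illustrative.

```python
from sklearn.metrics import f1_score, roc_auc_score

y_true = [0, 1, 1, 0, 1, 2, 2, 0]  # symptom classes (illustrative)
y_pred = [0, 1, 0, 0, 1, 2, 1, 0]

print(f1_score(y_true, y_pred, average="macro"))  # every class weighted equally
print(f1_score(y_true, y_pred, average="micro"))  # every sample weighted equally

# AUC needs scores rather than hard labels; shown here for a binary case.
print(roc_auc_score([0, 1, 1, 0], [0.2, 0.9, 0.4, 0.1]))
```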
An emotional lexicon consists of a collection of words with emotional connotations. A prediction based on unreliable data will be less accurate than one based on reliable data; to estimate and reduce this noise, we use a Bi-LSTM network. LSTM networks are generally used to predict a time series of data; Bi-LSTM networks, unlike LSTMs, are bidirectional. The forward LSTM learns from past data, reading the sequence forwards and using the input values at each time, the gate weights of the LSTM cells, and the forward/backward output. The backward LSTM likewise learns from future values, reading the sequence in reverse with the same inputs, gate weights, and outputs. The output gate stores the data related to the two directional passes.
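In PyTorch, the bidirectional behaviour described above comes from a single flag; the sizes below are illustrative, not our tuned configuration.

```python
import torch
import torch.nn as nn

# Forward direction reads left-to-right, backward reads right-to-left,
# and the two hidden states are concatenated at every timestep.
bilstm = nn.LSTM(input_size=300, hidden_size=128,
                 batch_first=True, bidirectional=True)

x = torch.randn(8, 40, 300)   # batch of 8 sequences, 40 tokens each
outputs, (h_n, c_n) = bilstm(x)
print(outputs.shape)          # (8, 40, 256): forward and backward states
```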
A description of the hyper-tune architecture setting can be found in Table
6. Each text tensor has an embedding dimension of 300, and the matrices have the same number of columns regardless of the number of words being processed simultaneously. The max-pooling layer produces output of the same size as its input: the max operation is applied across the outputs created by the various filters of the same size, and all outputs are merged into a single feature. Using the max operation guarantees that the most important aspect of the sentence or document is recorded. The primary benefits of pooling are a significantly reduced number of parameters, and hence a lower computational cost and less overfitting. The final feature map is produced by concatenating the feature vectors independently derived from the text tensors; it contains the most essential and conspicuous characteristics of the text-based feature vector extraction. We used four dense layers to achieve a gentler dimensionality reduction; in addition to the first and second dense layers, there is a third dense layer containing 64 neurons, and the neurons of the final dense layer enable binary classification. A dropout probability of 0.5 is applied to each dense layer. The first three dense layers use the ReLU activation function, while the final dense layer is handled by the Softmax function.
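A sketch of this classification head is given below. Only the 64-neuron third layer, the 0.5 dropout, and the ReLU/Softmax placement are taken from the description above; the remaining layer widths (including the 256-dimensional concatenated input) are assumptions for illustration.

```python
import torch.nn as nn

head = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.5),  # first two layer widths
    nn.Linear(128, 96), nn.ReLU(), nn.Dropout(0.5),   # are assumed
    nn.Linear(96, 64), nn.ReLU(), nn.Dropout(0.5),    # 64 neurons, as stated
    nn.Linear(64, 2), nn.Softmax(dim=-1),             # softmax output layer
)
```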
The content of online communications defines particular emotion keywords, which is why the filters are applied first to the text part of the communication. Feature vectors derived from both branches, which belong to the same text modality, are then concatenated, and features are extracted from the concatenated vectors. Finally, to converge the framework into a multi-label classification system, multiple dense layers are added with a gradually decreasing number of neurons for a softer reduction in dimensionality, ensuring that all of the distinguishing features used for classification are preserved. By combining a semi-supervised attention neural network with a self-assembling architecture, optimal feature engineering can be performed on both labelled and unlabelled corpora alongside a supervised neural network. In this way, we obtained the dimensions of the neural network layers for the network's final configuration.
In this study, a dense layer was used to generate a vector of selected features provided in parallel with the Bi-LSTM. An activation function and a regulariser complete the normalisation process, with the regulariser helping to reduce overfitting. The next layer merges the vectors to build the network, and the number of hidden neurons in each Bi-LSTM unit must be chosen accordingly. The attention layer has two tasks. First, the output of the upstream layer must be integrated into the downstream layer, which is relatively straightforward: in the parallel structure, a layer assembles the output data in channel order, since the input data for the parallel structure already contain the channel-dependency information. Second, because the different representations come from a number of inputs, the representations essential for detection must be filtered by redistributing their weights according to a scoring mechanism. Given inputs from the multiple channels of the Bi-LSTM networks, the attention mechanism computes attention score vectors from the most recent hidden state. After the score values of the representations are determined with the dot product (dot-product attention is the more time-saving choice here), each score is normalised using a Softmax function. A context vector is then formed by aligning and summing the weighted vectors into the final vector.
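The dot-product attention step described above reduces, in sketch form, to scoring each hidden state against the most recent one, normalising with a softmax, and summing; the shapes are illustrative.

```python
import torch

def attention_pool(hidden_states, query):
    """Dot-product attention over Bi-LSTM hidden states.
    hidden_states: (T, d); query: (d,), e.g. the last hidden state."""
    scores = hidden_states @ query           # (T,) dot-product scores
    weights = torch.softmax(scores, dim=0)   # normalised attention scores
    return weights @ hidden_states           # (d,) context vector

states = torch.randn(40, 256)                # e.g., Bi-LSTM outputs
context = attention_pool(states, states[-1]) # query with the last state
print(context.shape)                         # torch.Size([256])
```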
When using RNNs with GRUs, experiments have shown that long-term memory cells perform well on sequential tasks. The output layer of a bidirectional LSTM architecture can preserve the hidden state of the last step. We used a bidirectional LSTM architecture that also reads input token lists from the end and sets a parameter for the forward unrolled LSTMs. Consequently, each token position has two input states, which are concatenated to produce the output state and can also be passed on to the attention layer. The dropout ratio was set to 0.5 to prevent overfitting of the LSTM layer.
As in Figure
5, the
Bidirectional LSTM achieved a high precision-recall curve value of 0.81. The bidirectional technique makes it possible to read the sequence forwards and backwards while capturing the context at the same time. Through the hidden states, the relational meaning of the lexicon and graph structure is retained and stored; consequently, this method produces very few false positives. The model's embedding feature, which incorporates verbal sentiment, has improved performance; however, the regularisation and averaging of the model are affected by long-range text relations. In some batches the model is unable to differentiate between symptoms and lower classes, and when the text grows longer and sentiments overlap within each sentence, the model does not perform optimally.
The semantic vectors are arranged according to the semantic information obtained from the context in which they occur. Using the available semantic information, the constructed similarity metrics clarify the selection of an unlabelled text subset. The unlabelled text, which was previously separated out and ignored, is filtered with the proposed approach and then included in the subsequent cycle of the active learning process. This strategy improves the training of the model by incorporating the newly obtained training points, and the cycle continues until the best solution is found, at which point all the unlabelled text has been converted into the training set that underlies the algorithm. The expanded lexicon of emotions helps reduce overfitting and improves generalisation. The attention-based system structure supports a vector representation with nodes and an edge-based hypergraph: a node's attention contributes to the context of the text, while an edge's attention contributes to the emotional trigger represented in the text. As an online adaptive intervention, this technology can be used in virtual sessions with mental health patients to facilitate the recovery process.
The
Bidirectional LSTM with attention outperformed
fasttext and
Bidirectional without attention in Figure
6. The chosen batch weights and the attention mechanism make the relevant words easier to learn, and segmenting sentences according to the importance of each word contributes to the improved results. The results illustrate that attention, when equipped with a location prediction function, can capture the temporal fluctuations associated with emotional phrases. Given the lower error on the training and development sets relative to the test set, the model should generalise more effectively. The high true-positive and false-negative rates on the training data indicate that global attention contributes to the expansion of the training dataset. It can therefore be concluded that local word location plays a major role in learning emotional segments, while sentence attention plays a major role in context learning.
A text written by a patient describes the patient as a whole, using characteristic keywords. In a first step, the filters are applied in parallel to the text. The extracted features are then concatenated by early fusion, since the features extracted from both branches correspond to the same text modality. Successive dense layers with a decreasing number of neurons are added for a finer dimensionality reduction, converging the architecture to a categorical classification system in which all specific information used for classification is preserved. A semi-supervised graph attention neural network coupled with a temporal self-assembling architecture enables optimal feature extraction from both labelled and unlabelled corpora.
As can be seen in Figures
7 and
11, the outcomes are improved by the attention method, which uses the predictive capacity of the words' locations to improve accuracy. Incorporating lexicon-based neighbours also contributes to an overall improvement. When the attention technique is applied, uncertainty sampling can help identify unlabelled instances that lie at the decision boundary, and a classification can be made based on the model's level of confidence. Psychiatrists can use the model to determine how a patient's emotional experiences relate to their mental symptoms. To assist in identifying symptoms and classifying phrases, the graph-based approach makes use of attention nodes and lexicons. It can be implemented as a computer-based method for Internet-based therapies and gives the psychiatrist the assistance needed to evaluate the patient's notes during therapy.
In this way, we were able to set up a neural network with the required number and size of layers. Thanks to the proposed semi-supervised architecture, ensemble predictions for unknown labels are as accurate as the actual labels themselves, which is to be expected. Consequently, training the model with a relatively modest number of labelled samples yields good results, and increasing the amount of labelled data does not significantly change the model's performance. Experiments with the proposed framework on our datasets show that the model can categorise text and achieve excellent performance even with a small amount of training data. Although the proposed system has demonstrated its superiority on a number of measures, it still has limitations that need to be addressed. We optimised our recognition algorithm to handle patient-authored text data, but future studies could apply it to text data from other sources. The system does not consider the trustworthiness or authenticity of the source, which can also play a critical role in detecting underlying mental health problems. Our approach uses a multi-category analysis system to classify texts into nine different identification symptoms; the solution could be further improved by classifying text on a rating or scale to make it more robust. To make the system useful in the real world, it could be developed into a standalone application or a browser plugin. Using web scrapes of multiple versions of a patient's story, a deeper understanding of the story's purpose could be gained by analysing the timeline.
Since the dataset is not very large, the preprocessing steps—such as tokenisation, sentence splitting, dependency parsing, and negative example selection—can have a significant impact on the performance of the models. The best F1 score we can obtain on the test set is 90%, as shown in Figures
8 and
9. To apply the model to the preprocessed data version, it must be re-implemented. Since both models use the same method to partition the data, the significant performance differences can be attributed to changes in the preprocessing of the data. This again highlights the importance of evaluating the efficiency of different models on the same preprocessed data.
The effect of lexicon expansion, shown in Figures
8 and
9, is assessed by comparing results on the unlabelled data with and without expansion. The results show how the learning cycle proceeds under semi-supervised conditions and how it is affected by its environment. Given the limited context available and the enormous number of emotion types to be recognised, we found emotion recognition difficult: emotion-bearing phrases usually consist of very long words with specific terminology built into them. It was also difficult to determine the sense of an emotion word, because such a word occurs in a limited number of contexts yet covers a wide range of applications; in more than half of the cases, the term in question refers to a context rather than its literal meaning. This presented quite a challenge in completing the work.
The time elapsed since an event clearly influences the number of error years recorded for events that occurred within a narrower period after the event. Several other factors also matter, notably the intensity and duration of the emotional event itself. In summary, the temporal expressions detected in the texts written by patients were arranged in descending order of the probability score determined for each expression; the lexicon score of an expression determines its position in the text, with a higher score indicating a more prominent placement. By considering the temporal expressions, an estimate can be made of the time required to attend to the document. The methods used to divide the documents affect the precision with which the duration of emotional concentration can be calculated; however, the precision values are not significantly influenced by the interval between the time of the appointment and the time of occurrence. The second evaluation approach, the average error over the years, needs further study; it is a measure calculated from the difference between the estimated and classified emotions.
The diversity and consistency aspects of lexicons, however, help to focus attention on the most informative words by assigning higher relevance values to these words, as mentioned in Table
7 and shown graphically in Figure
10. For this reason, the process of aggregating vector representations can learn more effectively: even when the phrases that trigger an alarm are interwoven with a number of other words, the most informative terms in the emotion lexicon are still highlighted through their higher relevance values. The representation-aggregation approach is thus able to learn additional true hidden vectors.
The largest gain comes from the expansion of the emotion lexicon. The emotion monitoring model assumes that any sentence containing all the essential emotion expressions and the corresponding trigger words is likely to convey the meaning of the event, and there is a high probability that the emotions expressed in that line play a similar role in that event. To begin, we carefully evaluate the accuracy of the automatically labelled data: we randomly selected 500 samples from the automatically labelled information. The selected samples are statements with highlighted triggers, labelled arguments, and the relevant event types and argument role assignments for each statement. Adding automatically generated and labelled data is one way to extend the training sets; we then determine whether the performance of an event extractor trained with these extended training sets improves as a result. In this way, the effectiveness of the proposed method can be demonstrated. In this study, emotion expansion was used to filter out distracting word triggers and to expand nominal triggers. The fact that event identification results are higher when the feature set is used than when it is not suggests that the feature set is also beneficial when expressive neural methods are used to identify emotional information. Given the large differences between arguments, the features outlined for extending the argument language are clearly beneficial as well.
The test achieves a high true-positive rate. These findings indicate that certain key phrases contribute more than others to the classification of depression symptoms and are crucial for understanding them. By focusing on the words that contribute most, the network also reduces the amount of computation required. This is because the model is aware of the target word in the task and has learned both the positive and negative connotations associated with the subject matter. Given the complexity of mental-health data, grammatical and lexical variations may still affect performance.
As shown in Figure 11, the attention-based probability and the triggered word rules in a sentence may assist in treatment. The visualisation lets the psychiatrist see the classification reasoning as words, i.e., depression, heaven, and peace, for the diagnostic decision. The highlighted weights represent two symptoms, S2 (feeling down, depressed, or hopeless) and S3. Notably, the model successfully recognises the triggering rules and highlights the words that help psychiatrists take notes and make diagnoses. The visualisation makes the context behind symptom extraction easier to see, resulting in a more understandable and less complex attention model. The proposed model thus allows a better understanding of the reasoning behind a prediction: when a psychiatrist detects an irregularity, the word-level attention indicates the evidence supporting the model's decision.
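A toy sketch of such word-level highlighting, assuming attention weights are already available per token; the helper and the example sentence are illustrative only.

```python
def highlight_attention(tokens, weights, top_k=2):
    """Mark the top_k tokens by attention weight so a clinician can see
    which words drove the prediction (e.g. 'depression', 'peace')."""
    top = sorted(range(len(tokens)), key=lambda i: weights[i], reverse=True)[:top_k]
    return " ".join(f"[{t}]" if i in top else t for i, t in enumerate(tokens))

tokens = "i feel like depression has taken my peace away".split()
weights = [0.02, 0.05, 0.03, 0.40, 0.04, 0.05, 0.06, 0.30, 0.05]
print(highlight_attention(tokens, weights))
# i feel like [depression] has taken my [peace] away
```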
Table 8 shows the comparison with state-of-the-art models [2, 3, 4]. The goal of this article is to investigate how depression symptoms can be extracted from mental-health interventions using Natural Language Processing (NLP) and attention-based active learning with high-entropy sampling. To this end, we propose an approach based on synonym expansion using semantic vectors. The semantic vectors are clustered according to the semantic information obtained from the contexts in which they occur. This semantic information feeds similarity metrics that select a subset of as-yet-unlabelled text; the selected text is then labelled and fed into the next cycle of the active learning mechanism, which retrains the model on the additional training points. The cycle repeats until all unlabelled text has been moved into the training set or an optimal solution is found. To increase the number of trainable instances, we use a semantic clustering approach. This strategy helps generalise the learning system and reduces the annotation workload during learning. Our model achieves a lexicon-expansion F1 of 0.90 while retaining high explainability, in contrast to other models that achieve relatively high accuracy alone.
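One plausible reading of this cycle is a standard entropy-based pool-sampling loop, sketched below with scikit-learn's LogisticRegression as a stand-in classifier; the actual system layers semantic clustering and similarity metrics on top of this, so treat it as a simplified skeleton.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def entropy(probs):
    """Predictive entropy per example; higher = more uncertain/informative."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def active_learning(X_lab, y_lab, X_pool, y_pool, rounds=5, batch=20):
    """Entropy-based pool sampling: query the most uncertain texts each
    round, move them into the training set, and retrain."""
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        model.fit(X_lab, y_lab)
        if len(X_pool) == 0:
            break
        idx = np.argsort(-entropy(model.predict_proba(X_pool)))[:batch]
        X_lab = np.vstack([X_lab, X_pool[idx]])
        y_lab = np.concatenate([y_lab, y_pool[idx]])  # y_pool stands in for the human annotator
        keep = np.setdiff1d(np.arange(len(X_pool)), idx)
        X_pool, y_pool = X_pool[keep], y_pool[keep]
    return model
```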
A semantic clustering technique [4] is used to increase the number of trainable cases. For this purpose, the author proposes a strategy based on synonym expansion using semantic vectors. The linguistic vectors are clustered based on contextual information extracted from the environments in which they occur, and the resulting semantic similarity metrics guide the selection of unlabelled texts. Using the linguistic features of real texts written by patients, a further study proposes a fuzzy, deep-attention-based classification model that amplifies emotion lexicons [2]. Over time, active learning approaches can augment the learned dataset and the fuzzy rules.
As a result, the approach may simplify labelling efforts for mental-health applications. The proposed model can address issues related to per-class language length, data sources, and development techniques, and provides a benchmark for each performance level. By presenting weighted terms, this article also provides explainability. Another work constructs graph attention networks (GATs) that use masked self-attention layers to overcome the problem of text classification in the context of depression [3]. These networks assign a weight to each node in a neighbourhood based on the characteristics and emotions of its neighbours, without requiring costly global structures such as similarity matrices or prior architectural knowledge. In that study, hypernyms are used to expand the emotional vocabulary, but the architecture is not compared against competitors. In our experiments, combining an emotion lexicon with an attention network yields an F1 of 0.87 while remaining interpretable and transparent.
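For orientation, a single masked self-attention (GAT) layer can be sketched in a few lines of NumPy. This is a generic single-head formulation, not the cited architecture, and it assumes the adjacency matrix `A` already includes self-loops.

```python
import numpy as np

def gat_layer(H, A, W, a, leaky=0.2):
    """One masked self-attention (GAT) layer: each node attends only to its
    neighbours (mask from adjacency A, self-loops included), with no global
    similarity matrix required."""
    Z = H @ W                                   # (n, d') projected node features
    n = Z.shape[0]
    # pairwise attention logits e_ij = LeakyReLU(a^T [z_i || z_j])
    e = np.array([[np.concatenate([Z[i], Z[j]]) @ a for j in range(n)]
                  for i in range(n)])
    e = np.where(e > 0, e, leaky * e)           # LeakyReLU
    e = np.where(A > 0, e, -1e9)                # mask out non-neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)   # row-wise softmax over neighbours
    return np.tanh(alpha @ Z)                   # aggregated node representations
```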
4.1 Discussion
Openness is the tendency to have an open mind and imaginative ideas. When we focused on individual traits and their relationship to emotional attributes, we found that leaders scored higher than normal users. Openness to new experiences is another trait typically associated with individuals who have achieved high levels of social influence, so these results were expected. Despite the highly social environment, individuals in our sample scored relatively poorly on symptoms 8–9, suggesting that the actions associated with these traits are not easily reflected online and therefore not easily detected. For the other traits, we find that ratings are high not only for anxiety but also for avoidance. One possible explanation is that the individuals in our sample do not intend to cultivate meaningful connections with others on the platform, instead using it solely to publicise their own efforts and accomplishments and to interact with users whose content contains interesting cognitive material.
Psychiatrists can create and prescribe an appropriate treatment programme using symptom-based visualisation and symptom-based probabilities. NLP provides an elegant technique for adapting and delivering imagery, while IDPT systems are helpful in computerised exercises for psycho-education. Both the LSTM and the attention model help achieve high accuracy in symptom prediction; the bidirectional LSTM with an output-attention layer performed particularly well at multi-label symptom classification. The active learning model can increase its knowledge level over time. Our model achieved a ROC score of 0.85 and supports attention-based visualisation that helps in recommending symptoms. The presented method adapts IDPT systems to perform psycho-educational activities, learning automatically from texts written by patients; the adapted intervention generates individualised feedback on the recommended activities. In future work, we will incorporate a character-level text classifier; in addition, stricter regularisation could improve performance while reducing overfitting. These results also point to an interesting research direction where our methodology could be valuable, namely in-depth analyses of differences between how people behave in the real world and in a therapeutic setting. Such analyses could reveal groups or clusters of people whose persona differs markedly from their actual personality, as well as the variables that give rise to this phenomenon. While these initial results illustrate the utility of our work in this direction (a more comprehensive study is beyond the scope of this article), we believe they may stimulate further research, for example by considering alternative user groups (such as artists or politicians) or other classification methods.
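A minimal sketch of such a bidirectional LSTM with an output-attention layer, written in PyTorch under our own assumptions (embedding and hidden sizes, and nine symptom labels); it returns per-symptom probabilities together with the attention weights used for visualisation.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Sketch: bidirectional LSTM with an output-attention layer for
    multi-label symptom classification (one sigmoid per symptom)."""
    def __init__(self, vocab_size, emb_dim=128, hidden=64, n_symptoms=9):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.att = nn.Linear(2 * hidden, 1)        # scores each timestep
        self.out = nn.Linear(2 * hidden, n_symptoms)

    def forward(self, x):                          # x: (batch, seq_len) token ids
        h, _ = self.lstm(self.emb(x))              # (batch, seq_len, 2*hidden)
        alpha = torch.softmax(self.att(h), dim=1)  # attention over timesteps
        context = (alpha * h).sum(dim=1)           # attention-weighted summary
        return torch.sigmoid(self.out(context)), alpha.squeeze(-1)

# Usage: per-symptom probabilities plus attention weights for visualisation.
model = BiLSTMAttention(vocab_size=5000)
probs, attn = model(torch.randint(1, 5000, (2, 30)))
```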
4.2 Limitations
Our framework imposes some constraints, which we now address. The probability technique may magnify the instability of the pre-trained network, since re-training large models is itself unstable; the model should therefore be trained multiple times with a variety of initialisations and selected based on its performance on development data. Chunking text also fragments its context: a sentence severed in the middle loses much of its meaning. In other words, sentence boundaries are not taken into account when the text is broken up, so semantic-aware techniques should be used for segmentation.
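A simple sentence-aware alternative to fixed-size chunking is sketched below; the regex-based sentence splitter and the token budget are assumptions for illustration.

```python
import re

def sentence_aware_chunks(text, max_tokens=128):
    """Split text at sentence boundaries and pack whole sentences into
    chunks, so no sentence is severed mid-way (unlike fixed-size slicing)."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], []
    for sent in sentences:
        if current and len(" ".join(current + [sent]).split()) > max_tokens:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks
```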
4.3 Omitted Design Elements
The model by itself is not interpretable, and it requires a vast amount of data and powerful computational resources. The vocabulary and lexicon expansion should be carefully selected with reference to the problem. In the absence of domain knowledge and domain relevance, probability-based pseudo-labelling did not produce good results. Uncertainty quantification (UQ) methods play a pivotal role in minimising the impact of uncertainty during both optimisation and decision-making, and UQ is crucial for successful model training, validation, and reproducibility. Finally, a semantically aware statistical framework is essential for comparing model output with observed data in a global parameter search.
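As one example of a lightweight UQ method, Monte-Carlo dropout repeats stochastic forward passes and treats the spread as an uncertainty estimate, e.g. for filtering pseudo-labels; the sketch assumes a PyTorch model with dropout layers that returns a probability tensor.

```python
import torch

def mc_dropout_predict(model, x, passes=30):
    """Monte-Carlo dropout: keep dropout active at inference and average
    several stochastic forward passes; the standard deviation serves as an
    uncertainty estimate for each prediction."""
    model.train()                       # keeps dropout layers stochastic
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(passes)])
    return preds.mean(0), preds.std(0)  # predictive mean and uncertainty
```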