Search Results (17)

Search Parameters:
Keywords = big blog data

15 pages, 888 KiB  
Article
Exploring Sentiment Analysis and Visitor Satisfaction along Urban Liner Trails: A Case of the Seoul Trail, South Korea
by Sumin Lee, Won Ji Chung and Chul Jeong
Land 2024, 13(9), 1349; https://doi.org/10.3390/land13091349 - 24 Aug 2024
Cited by 1 | Viewed by 1018
Abstract
Increasing public health awareness has stressed the significance of the mental and physical benefits of outdoor activities. Government involvement and support for urban redevelopment projects in Korea, such as Seoul Dulle-gil, connected previously disconnected green spaces. Despite the ecological and cultural importance of urban spaces, understanding of their impact on residents and tourists and of their role in exploring the city’s dynamics remains limited. This study aims to evaluate how green space activities engage in sustainable land management and offer insights into surrounding communities. A quantitative big data research method was employed, analyzing 3995 online blog post reviews using Python code, with sentiment analysis conducted using pandas and KoNLPy’s Okt library. The results indicated that sentiment scores were generally higher in sections located south of the Han River. Among the eight trail courses, courses 6, 3, 4, and 5, located south of the Han River, exhibited higher sentiment scores compared to courses 7, 8, 2, and 1, located north of the Han River, which showed lower satisfaction levels. Among the 16 characteristics influencing visitor satisfaction, the study emphasized the importance of potential space maintenance to enhance trail user safety and community well-being, contributing to sustainable land management.
(This article belongs to the Special Issue Public Spaces: Socioeconomic Challenges)
Figure 1
<p>Satisfaction for each course of the Seoul Trail based on sentiment scores. Note: Red indicates high sentiment scores (high satisfaction) and blue indicates low sentiment scores (low satisfaction). 1: Course 1 through Suraksan and Buramsan, 2: Course 2 through Yongmasan and Achasan, 3: Course 3 through Godeol and Iljasan, 4: Course 4 through Daemosan and Umyeonsan, 5: Course 5 through Gwanaksan Mountain, 6: Course 6 through the Anyangcheon stream, 7: Course 7 through Bongsan and Aengbongsan and 8: Course 8 through Bukhansan. Source: Authors’ recreation.</p>
16 pages, 2955 KiB  
Article
Analyzing Trends in Digital Transformation Korean Social Media Data: A Semantic Network Analysis
by Jong-Hwi Song and Byung-Suk Seo
Big Data Cogn. Comput. 2024, 8(6), 61; https://doi.org/10.3390/bdcc8060061 - 4 Jun 2024
Viewed by 825
Abstract
This study explores the impact of digital transformation on Korean society by analyzing Korean social media data, focusing on the societal and economic effects triggered by advancements in digital technology. Utilizing text mining techniques and semantic network analysis, we extracted key terms and their relationships from online news and blogs, identifying major themes related to digital transformation. Our analysis, based on data collected from major Korean portals using various related search terms, provides deep insights into how digital evolution influences individuals, businesses, and government sectors. The findings offer a comprehensive view of the technological and social trends emerging from digital transformation, including its policy, economic, and educational implications. This research not only sheds light on the understanding and strategic approaches to digital transformation in Korea but also demonstrates the potential of social media data in analyzing the societal impact of technological advancements, offering valuable resources for future research in effectively navigating the era of digital change.
Figure 1
<p>Data collection and analysis process for digital transformation.</p>
Figure 2
<p>CONCOR analysis of news network of the digital transformation.</p>
Figure 3
<p>CONCOR analysis of blog network of digital transformation.</p>
16 pages, 1770 KiB  
Article
A Study on MBTI Perceptions in South Korea: Big Data Analysis from the Perspective of Applying MBTI to Contribute to the Sustainable Growth of Communities
by Hyejin Lee and Yoojin Shin
Sustainability 2024, 16(10), 4152; https://doi.org/10.3390/su16104152 - 15 May 2024
Viewed by 2509
Abstract
This study aimed to assess the potential contributions of the Myers–Briggs Type Indicator (MBTI) to the sustainable growth of communities by conducting a comprehensive analysis of social perceptions of the MBTI in South Korea through big data analysis. The investigation encompasses three primary stages: data collection, preprocessing, and analysis, involving text mining, network analysis, CONCOR analysis, and sentiment analysis. A total of 31,308 text data pieces (13.73 MB) from various sources, including news, blogs, and other sections of Naver and Google, over the past three years, were collected and analyzed using the keyword “MBTI”. Tools, such as Textom SV, UCINET, and NetDraw, were employed for data collection and analysis. The study’s key findings include the identification, through term frequency (TF) and TF-inverse document frequency analyses, of top-ranking terms, such as 16Types, 4Indicators, Test, Myself, OthersMBTI, Situation, and Contents. The CONCOR analysis further revealed six clusters, encompassing themes like interest in MBTI personality tests, application of 16 types in daily life, MZ’s MBTI consumption patterns, trending of MBTI characters, extension to K-Test, and professional use of MBTI. Moreover, sentiment analysis indicated that 68.5% of individuals in South Korea expressed a positive sentiment towards MBTI, while 31.5% conveyed a negative sentiment. The specific emotions identified included liking (Good Feeling), disgust, and interest, in order of prominence. In light of these findings, this study delineates a spectrum of perceptions regarding MBTI in South Korea, encompassing both positive interests and negative concerns. To ensure the responsible use of MBTI, it is imperative to implement reliable scientific testing and education, mitigate the potential harm of stereotyping, and reshape social perceptions surrounding MBTI usage. 
Only through these measures can MBTI genuinely contribute to the sustainable growth of communities without being confined to limiting stereotypes.
Figure 1
<p>TF word cloud.</p>
Figure 2
<p>Results of MBTI network visualization.</p>
Figure 3
<p>CONCOR analysis results.</p>
Figure 4
<p>Results of sentiment analysis.</p>
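The term frequency (TF) and TF-inverse document frequency ranking mentioned in the MBTI abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Textom workflow: the toy corpus, the function name `tf_idf`, and the specific weighting (raw-count TF times log(N/df)) are all hypothetical choices.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumes raw-count TF and idf = log(N / df); other variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores

# Hypothetical toy corpus, already tokenized.
docs = [
    ["mbti", "test", "myself"],
    ["mbti", "16types", "test"],
    ["mbti", "situation"],
]
scores = tf_idf(docs)
```

With this weighting, a term that occurs in every document (here `mbti`) scores zero, which is why a TF-IDF ranking can differ from a plain TF ranking.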
31 pages, 32120 KiB  
Article
The Impact of Personality and Demographic Variables in Collaborative Filtering of User Interest on Social Media
by Marwa M. Alrehili, Wael M. S. Yafooz, Abdullah Alsaeedi, Abdel-Hamid M. Emara, Aldosary Saad and Hussain Al Aqrabi
Appl. Sci. 2022, 12(4), 2157; https://doi.org/10.3390/app12042157 - 18 Feb 2022
Cited by 1 | Viewed by 2550
Abstract
The advent of social networks and micro-blogging sites online has led to an abundance of user-generated content. As a result, many social media users view much of this content as inappropriate or unimportant. Therefore, there is a need to use personalization to select information related to users’ interests or searches on social media platforms, and in recent years user interest mining has become a prominent research area. However, almost all of the emerging research suffers from significant gaps and drawbacks. Firstly, it focuses on users’ explicit content to determine their interests while neglecting other factors, such as users’ personality and demographic data, that may be a valuable source of influence on those interests. Secondly, existing work represents users by their topics of interest without considering the cluster-based semantic similarity between topics that could reveal users’ implicit interests. This paper aims to propose a novel user interest mining approach and model based on demographic data, Big Five personality traits, and cluster-based similarity between topics. To demonstrate the benefit of incorporating user personality traits and demographic data into interest investigation, various experiments were conducted on the collected data. The experimental results showed that considering personality and demographic data gives more accurate results in mining systems, increases utility, and can help address cold start problems for new users. Moreover, the results showed that topics of interest were the dominant factor. The results also showed that current users’ implicit interests can be predicted through clusters based on similar topics. Finally, the hybrid graph-based model facilitates the study of the patterns of interaction between users and topics. This model can be beneficial for researchers, people on social media, and for certain research in related fields.
(This article belongs to the Section Computing and Artificial Intelligence)
Figure 1
<p>General System Architecture for Mining User Interest from Social Media.</p>
Figure 2
<p>Proposed Approach to Find Similar Users and Topics.</p>
Figure 3
<p>Proposed Model.</p>
Figure 4
<p>Users-Topic Heterogeneous Graph.</p>
Figure 5
<p>The Proposed Clustering Idea.</p>
Figure 6
<p>The Proposed Schema of Heterogeneous Graph.</p>
Figure 7
<p>Link Types in Heterogeneous Graph.</p>
Figure 8
<p>The Extracted LDA topics.</p>
Figure 9
<p>Lengths of Interest Topics.</p>
Figure 10
<p>Optimal k Value based on Elbow Method.</p>
Figure 11
<p>The sample of Heterogeneous Graph in Plotly Library.</p>
Figure 12
<p>Graph with Functionality.</p>
Figure 13
<p>Graph-Types Scenario.</p>
Figure 14
<p>Interest Topic Scenario.</p>
Figure 15
<p>User Interests Scenario.</p>
Figure 16
<p>Relationships between Users Scenarios.</p>
16 pages, 2307 KiB  
Article
A Network Approach to Revealing Dynamic Succession Processes of Urban Land Use and User Experience
by Minjin Lee, Hangil Kim and SangHyun Cheon
Sustainability 2021, 13(21), 11955; https://doi.org/10.3390/su132111955 - 29 Oct 2021
Cited by 1 | Viewed by 2478
Abstract
One significant challenge to understanding the mechanisms of urban retail areas’ transition is limited data to trace a dynamic perspective of influential actors’ experience in an extended urban area. We overcome this gap by employing text mining to collect big text data from online blogs and propose a methodology to explore the dynamic spatial transformations and interactions across multiple adjacent retail areas. We study five retail areas that currently function as a major commercial hub in Seoul—the Hongdae area and its neighboring districts. We create co-occurrence networks of the text data to capture representative place images and user experiences. Our blog-word networks systematically capture the “invasion-succession” process in land-use transition during the commercialization of Hongdae’s neighboring districts. The process mirrors the history of spatial change in the areas, which once formed a small-scale, bohemian hip neighborhood that incubated indie culture and has now fully commercialized as a global tourist attraction. The commercial transition triggered by Hongdae’s cultural capital peaked with consumer experiences of “food and eating” dominating the whole area. Finally, the text networks signal gentrification in each commercial district near Hongdae, contributing to the current discourse on commercial gentrification by adding consumers’ perspectives.
(This article belongs to the Section Sustainable Urban and Rural Development)
Figure 1
<p>Map of Hongdae and neighboring districts.</p>
Figure 2
<p>Temporal semantic categories of Hongdae, Seoul, and Hongdae-neighboring areas.</p>
Figure 3
<p>Rank distribution with normalized degrees for five districts and five years. To compare the trends of networks of different sizes, the degree was normalized with the total number of nodes in a network. The normalized degree was $\tilde{k} = k/(N-1)$, where $N$ was the number of nodes. Each graph shows networks of different years (<b>A</b>) year 2004, (<b>B</b>) year 2007, (<b>C</b>) year 2010, (<b>D</b>) year 2013, (<b>E</b>) year 2016. In (<b>D</b>) and (<b>E</b>), (1) and (2) boxes highlight the high-degree regime and low-degree regime respectively.</p>
Figure 4
<p>Network visualization of community separation for Hongdae (<b>a</b>), Sangsu (<b>b</b>), Hapjeong (<b>c</b>), Yeonnam (<b>d</b>), and Mangwon (<b>e</b>). Each community is shown in different colors, and high-degree nodes are labeled so that we can infer the key topic of each community. We use similar colors in communities where many words are the same over time to visualize them more easily. For instance, food-related communities, which have been the major communities in recent years, are shown in red colors, while the communities of other topics are blue or green. Words in a group tend to co-occur more often than words in different groups.</p>
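The normalization described in the Figure 3 caption above, degree divided by N − 1 with N the number of nodes, can be illustrated on a small word co-occurrence network. The sketch below is an assumption-laden example rather than the authors' pipeline: the toy posts, the rule that two words are linked when they appear in the same post, and the function names are all hypothetical.

```python
from itertools import combinations

def cooccurrence_graph(posts):
    """Build an undirected co-occurrence graph as {word: set(neighbors)}.

    Two words are linked if they appear together in at least one post.
    """
    graph = {}
    for words in posts:
        unique = sorted(set(words))
        for word in unique:
            graph.setdefault(word, set())  # keep isolated words as nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def normalized_degree(graph):
    """Return k / (N - 1) for every node, where N is the number of nodes."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}

# Hypothetical tokenized blog posts.
posts = [
    ["hongdae", "cafe", "food"],
    ["hongdae", "club"],
    ["food", "cafe"],
]
k_norm = normalized_degree(cooccurrence_graph(posts))
```

Dividing by N − 1 maps every degree into the range [0, 1], so networks with different numbers of nodes can be compared on the same scale.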
19 pages, 3476 KiB  
Article
A Method for Identifying Geospatial Data Sharing Websites by Combining Multi-Source Semantic Information and Machine Learning
by Quanying Cheng, Yunqiang Zhu, Hongyun Zeng, Jia Song, Shu Wang, Jinqu Zhang, Lang Qian and Yanmin Qi
Appl. Sci. 2021, 11(18), 8705; https://doi.org/10.3390/app11188705 - 18 Sep 2021
Cited by 6 | Viewed by 2221
Abstract
Geospatial data sharing is an inevitable requirement for scientific and technological innovation and economic and social development decisions in the era of big data. With the development of modern information technology, especially Web 2.0, a large number of geospatial data sharing websites (GDSW) have been developed on the Internet. GDSW is a point of access to geospatial data, which is able to provide a geospatial data inventory. How to precisely identify these data websites is the foundation and prerequisite of sharing and utilizing web geospatial data and is also the main challenge of data sharing at this stage. GDSW identification can be regarded as a binary website classification problem, which can be solved by the current popular machine learning method. However, the websites obtained from the Internet contain a large number of blogs, companies, institutions, etc. If GDSW is directly used as the sample data of machine learning, it will greatly affect the classification precision. For this reason, this paper proposes a method to precisely identify GDSW by combining multi-source semantic information and machine learning. Firstly, based on the keyword set, we used the Baidu search engine to find the websites that may be related to geospatial data in the open web environment. Then, we used the multi-source semantic information of geospatial data content, morphology, sources, and shared websites to filter out a large number of websites that contained geospatial keywords but were not related to geospatial data in the search results through the calculation of comprehensive similarity. Finally, the filtered geospatial data websites were used as the sample data of machine learning, and the GDSWs were identified and evaluated. 
In this paper, training sets are extracted from the original search data and from the data filtered by multi-source semantics; the two datasets are used to train machine learning classification algorithms (KNN, LR, RF, and SVM), and predictions are made on the same test datasets. The results show that: (1) among the four classification algorithms, the classification precision of RF and SVM on the original data is higher than that of the other two algorithms. (2) Taking the data filtered by multi-source semantic information as the sample data for machine learning, the precision of all classification algorithms has been greatly improved. The SVM algorithm has the highest precision among the four classification algorithms. (3) In order to verify the robustness of this method, different initial sample data mentioned above are selected for classification using the same method. The results show that, among the four classification algorithms, the classification precision of SVM is still the highest, which shows that the proposed method is robust and scalable. Therefore, taking the data filtered by multi-source semantic information as the sample data to train through machine learning can effectively improve the classification precision of GDSW, and comparing the four classification algorithms, SVM has the best classification effect. In addition, this method has good robustness, which is of great significance to promote and facilitate the sharing and utilization of open geospatial data.
(This article belongs to the Special Issue Machine Learning Techniques Applied to Geospatial Big Data)
Figure 1
<p>GDSW identification method.</p>
Figure 2
<p>Keyword-based GDSW acquisition and pre-processing process.</p>
Figure 3
<p>The basic flow of website list generation.</p>
Figure 4
<p>Flow chart of website filtering based on multi-source semantic information.</p>
Figure 5
<p>Precision–recall curves of KNN, LR, RF, and SVM on sample data after being filtered by multi-source semantic information.</p>
Figure 6
<p>ROC curves of KNN, LR, RF, and SVM on sample data after being filtered by multi-source semantic information.</p>
Figure 7
<p>Precision–recall curves of KNN, LR, RF, and SVM on robust data after being filtered by multi-source semantic information.</p>
Figure 8
<p>ROC curves of KNN, LR, RF, and SVM on robust data after being filtered by multi-source semantic information.</p>
14 pages, 454 KiB  
Article
Determination of Motivating Factors of Urban Forest Visitors through Latent Dirichlet Allocation Topic Modeling
by Doo-San Kim, Byeong-Cheol Lee and Kwang-Hi Park
Int. J. Environ. Res. Public Health 2021, 18(18), 9649; https://doi.org/10.3390/ijerph18189649 - 13 Sep 2021
Cited by 6 | Viewed by 2383
Abstract
Despite the unique characteristics of urban forests, the motivating factors of urban forest visitors have not been clearly differentiated from other types of the forest resource. This study aims to identify the motivating factors of urban forest visitors, using latent Dirichlet allocation (LDA) topic modeling based on social big data. A total of 57,449 cases of social text data from social blogs containing the keyword “urban forest” were collected from Naver and Daum, the major search engines in South Korea. Then, 17,229 cases were excluded using morpheme analysis and stop word elimination; 40,110 cases were analyzed to identify the motivating factors of urban forest visitors through LDA topic modeling. Seven motivating factors—“Cafe-related Walk”, “Healing Trip”, “Daily Leisure”, “Family Trip”, “Wonderful View”, “Clean Space”, and “Exhibition and Photography”—were extracted; each contained five keywords. This study elucidates the role of forests as a place for healing, leisure, and daily exercise. The results suggest that efforts should be made toward developing various programs regarding the basic functionality of urban forests as a natural resource and a unique place to support a diversity of leisure and cultural activities.
(This article belongs to the Special Issue Forest for Human Health and Welfare)
Figure 1
<p>LDA topic generation.</p>
15 pages, 2039 KiB  
Article
Perceptions Related to Nursing and Nursing Staff in Long-Term Care Settings during the COVID-19 Pandemic Era: Using Social Networking Service
by Juhhyun Shin, Sunok Jung, Hyeonyoung Park, Yaena Lee and Yukyeong Son
Int. J. Environ. Res. Public Health 2021, 18(14), 7398; https://doi.org/10.3390/ijerph18147398 - 11 Jul 2021
Cited by 3 | Viewed by 3059
Abstract
Purpose: The purpose of this study was to investigate what opinions and perceptions people have about nursing and the role of nursing staff in nursing homes (NHs) on Social Networking Service (SNS) by analyzing large-scale data through social big-data analysis. Methods: This study investigated changes in perception related to nursing and nursing staff in NHs during the COVID-19 pandemic era using target channels (blogs, cafes, Instagram, communities, Twitter, etc.). Data were collected on the channel from 12 September 2019 to 11 September 2020, 6 months before and after 12 March 2020 when the COVID-19 pandemic was declared. Selected keywords included “nursing,” “nurse,” and “nursing staff,” and included words were “long-term care settings,” “geriatric hospital,” and “nursing home.” Text mining, opinion mining, and social network analysis were conducted. Results: After the COVID-19 pandemic, the frequency of keywords increased about 1.5 times compared to before. In March 2020 when the COVID-19 pandemic was declared, the negative phrase “be infected” ranked number one, resulting in a sharp 8% rise in the percentage of negative words in that month. The related words that have risen in rank significantly, or were newly ranked in the Top 30 after the pandemic, were related with COVID-19. Conclusion: The public began to realize the role of nursing staff in the prevention and management of mass infection in NHs and the importance of nursing staff after the pandemic. Further studies should examine the perceptions of those who have received nursing services and include a wide range of foreign channels.
(This article belongs to the Special Issue To Be Healthy for the Elderly: Long Term Care Issues around the World)
Figure 1
<p>Frequency by channel and week related to keyword.</p>
Figure 2
<p>Emotional word rates before and after the COVID-19 pandemic.</p>
Figure 3
<p>Emotional word rates by month.</p>
Figure 4
<p>Keyword-related words before (<b>left</b>) and after (<b>right</b>) the COVID-19 pandemic.</p>
Figure 5
<p>Word cloud for keyword-related words before (<b>left</b>) and after (<b>right</b>) the COVID-19 pandemic.</p>
15 pages, 921 KiB  
Perspective
From the Digital Data Revolution toward a Digital Society: Pervasiveness of Artificial Intelligence
by Frank Emmert-Streib
Mach. Learn. Knowl. Extr. 2021, 3(1), 284-298; https://doi.org/10.3390/make3010014 - 4 Mar 2021
Cited by 20 | Viewed by 6006
Abstract
Technological progress has led to powerful computers and communication technologies that now penetrate all areas of science, industry and our private lives. As a consequence, all these areas are generating digital traces of data amounting to big data resources. This opens unprecedented opportunities but also challenges toward the analysis, management, interpretation and responsible usage of such data. In this paper, we discuss these developments and the fields that have been particularly affected by the digital revolution. Our discussion is AI-centered, showing domain-specific prospects but also intricacies for method development in artificial intelligence. For instance, we discuss recent breakthroughs in deep learning algorithms and artificial intelligence as well as advances in text mining and natural language processing, e.g., word-embedding methods that enable the processing of large amounts of text data from diverse sources such as governmental reports, blog entries in social media or clinical health records of patients. Furthermore, we discuss the necessity of further improving general artificial intelligence approaches and of utilizing advanced learning paradigms. This leads to arguments for the establishment of statistical artificial intelligence. Finally, we provide an outlook on important aspects of future challenges that are of crucial importance for the development of all fields, including ethical AI and the influence of bias on AI systems. As a potential end-point of this development, we define digital society as the asymptotic limiting state of the digital economy that emerges from fully connected information and communication technologies enabling the pervasiveness of AI. Overall, our discussion provides a perspective on the elaborate relatedness of digital data and AI systems.
(This article belongs to the Section Data)
Figure 1
<p>A simplified view of the hierarchical organization of society. The digitalization of the shown fields progresses from the inside toward the outside. This leads eventually to a full penetration of society with artificial intelligence.</p>
17 pages, 350 KiB  
Article
The Use of Big Data and Its Effects in a Diffusion Forecasting Model for Korean Reverse Mortgage Subscribers
by Jinah Yang, Daiki Min and Jeenyoung Kim
Sustainability 2020, 12(3), 979; https://doi.org/10.3390/su12030979 - 29 Jan 2020
Cited by 3 | Viewed by 3063
Abstract
In recent years, big data has been widely used to understand consumers’ behavior and opinions. With this paper, we consider the use of big data and its effects in the problem of projecting the number of reverse mortgage subscribers in Korea. We analyzed web-news, blog post, and search traffic volumes associated with Korean reverse mortgages and integrated them into a Generalized Bass Model (GBM) as a part of the exogenous variables representing marketing effort. We particularly consider web-news volume as a proxy for marketer-generated content (MGC) and blog post and search traffic volumes as proxies for user-generated content (UGC). Empirical analysis provides some interesting findings: First, the GBM by incorporating big data is helpful for forecasting the sales of Korean reverse mortgages, and second, the UGC as an exogenous variable is more useful for predicting sales volume than the MGC. The UGC can explain consumers’ interest relatively well. Additional sensitivity analysis supports that the UGC is important for increasing sales volume. Finally, prediction performance is different between blog posts and search traffic volumes.
(This article belongs to the Special Issue Social Media Influence on Consumer Behaviour)
Figure 1
<p>Trends in sales volume and big data.</p>
Figure 2
<p>Test results: Baseline scenario and Test scenario 1.</p>
Figure 3
<p>Test results: Baseline scenario and Test scenario 3.</p>
Figure 4
<p>Test results: Baseline scenario and Test scenario 2.</p>
Figure 5
<p>Test results: Test scenarios 4 and 5 (PS4, OS4, PS5, OS5).</p>
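To make the Generalized Bass Model mentioned in the abstract above concrete, the sketch below evaluates its standard closed form, F(t) = (1 − e^{−(p+q)X(t)}) / (1 + (q/p) e^{−(p+q)X(t)}), where X(t) is cumulative marketing effort. It is a hedged illustration only: the parameter values, the effort series, and the way an exogenous variable such as blog or search volume would be mapped to the effort x(t) are assumptions, not the specification used in the paper.

```python
import math

def gbm_cumulative_adoption(p, q, effort):
    """Cumulative adoption fraction F(t) under the Generalized Bass Model.

    p: coefficient of innovation, q: coefficient of imitation.
    effort: per-period marketing effort x(t); x(t) = 1 in every period
    reduces the model to the plain Bass model.
    Cumulative effort X(t) is approximated by a running sum of x(t).
    """
    fractions = []
    cumulative_effort = 0.0
    for x in effort:
        cumulative_effort += x
        decay = math.exp(-(p + q) * cumulative_effort)
        fractions.append((1.0 - decay) / (1.0 + (q / p) * decay))
    return fractions

# Hypothetical parameters and effort series, for illustration only.
p, q = 0.03, 0.38
baseline = gbm_cumulative_adoption(p, q, [1.0] * 10)
boosted = gbm_cumulative_adoption(p, q, [1.0] * 5 + [1.5] * 5)
```

Multiplying each fraction by an assumed market potential m gives a projected number of adopters. Raising the effort in later periods, as in `boosted`, pulls adoption forward relative to `baseline`.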
34 pages, 637 KiB  
Article
Text Mining in Big Data Analytics
by Hossein Hassani, Christina Beneki, Stephan Unger, Maedeh Taj Mazinani and Mohammad Reza Yeganegi
Big Data Cogn. Comput. 2020, 4(1), 1; https://doi.org/10.3390/bdcc4010001 - 16 Jan 2020
Cited by 161 | Viewed by 27476
Abstract
Text mining in big data analytics is emerging as a powerful tool for harnessing the power of unstructured textual data by analyzing it to extract new knowledge and to identify significant patterns and correlations hidden in the data. This study seeks to determine the state of text mining research by examining the developments within published literature over past years and provide valuable insights for practitioners and researchers on the predominant trends, methods, and applications of text mining research. In accordance with this, more than 200 academic journal articles on the subject are included and discussed in this review; the state-of-the-art text mining approaches and techniques used for analyzing transcripts and speeches, meeting transcripts, and academic journal articles, as well as websites, emails, blogs, and social media platforms, across a broad range of application areas are also investigated. Additionally, the benefits and challenges related to text mining are also briefly outlined.
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
Figure 1. The methods and application discussed in this study.
17 pages, 2534 KiB  
Article
Social Big-Data Analysis of Particulate Matter, Health, and Society
by Juyoung Song and Tae Min Song
Int. J. Environ. Res. Public Health 2019, 16(19), 3607; https://doi.org/10.3390/ijerph16193607 - 26 Sep 2019
Cited by 9 | Viewed by 2732
Abstract
The study collected particulate matter (PM)-related documents in Korea and classified the main keywords related to particulate matter, health, and social problems using text and opinion mining. The study attempted to present a prediction model for important causes related to particulate matter by using social big-data analysis. Topics related to particulate matter were collected from online sources (news sites, blogs, cafés, social network services, and bulletin boards) from 1 January 2015 to 31 May 2016, and 226,977 text documents were included in the analysis. The present study applied a machine-learning analysis technique to forecast the risk of particulate matter. Emotions related to particulate matter were found to be 65.4% negative, 7.7% neutral, and 27.0% positive. Intelligent services that can detect early and prevent unknown crisis situations of particulate matter may be possible if risk factors of particulate matter are predicted through the linkage of the machine-learning prediction model.
(This article belongs to the Section Environmental Health)
Figure 1. Random Forest Model of Cause-and-Disease Factor.
Figure 2. Decision-Tree Model of Cause-and-Disease Factor.
Figure 3. Random Forest Model of Cause-and-Disease Factor.
Figure 4. Particulate-Matter Cause-and-Disease Risk Multilayer Neural Network Prediction Model.
Figure 5. Disease Multilayer Neural Network Prediction Model for Causes of Particulate Matter.
Figure 6. Performance of Machine Learning for Predicting Disease Causes of Particulate Matter.
Figure 6 (cont.). Performance of Machine Learning for Predicting Disease Causes of Particulate Matter.
19 pages, 968 KiB  
Article
Managing Marketing Decision-Making with Sentiment Analysis: An Evaluation of the Main Product Features Using Text Data Mining
by Erick Kauffmann, Jesús Peral, David Gil, Antonio Ferrández, Ricardo Sellers and Higinio Mora
Sustainability 2019, 11(15), 4235; https://doi.org/10.3390/su11154235 - 5 Aug 2019
Cited by 65 | Viewed by 13646
Abstract
Companies have realized the importance of “big data” in creating a sustainable competitive advantage, and user-generated content (UGC) represents one of big data’s most important sources. From blogs to social media and online reviews, consumers generate a huge amount of brand-related information that has a decisive potential business value for marketing purposes. Particularly, we focus on online reviews that could have an influence on brand image and positioning. Within this context, and using the usual quantitative star score ratings, a recent stream of research has employed sentiment analysis (SA) tools to examine the textual content of reviews and categorize buyer opinions. Although many SA tools split comments into negative or positive, a review can contain phrases with different polarities because the user can have different sentiments about each feature of the product. Finding the polarity of each feature can be interesting for product managers and brand management. In this paper, we present a general framework that uses natural language processing (NLP) techniques, including sentiment analysis, text data mining, and clustering techniques, to obtain new scores based on consumer sentiments for different product features. The main contribution of our proposal is the combination of price and the aforementioned scores to define a new global score for the product, which allows us to obtain a ranking according to product features. Furthermore, the products can be classified according to their positive, neutral, or negative features (visualized on dashboards), helping consumers with their sustainable purchasing behavior. We proved the validity of our approach in a case study using big data extracted from Amazon online reviews (specifically cell phones), obtaining satisfactory and promising results. After the experimentation, we could conclude that our work is able to improve recommender systems by using positive, neutral, and negative customer opinions and by classifying customers based on their comments.
Figure 1. The proposed architecture using sentiment analysis (SA) and text data mining to identify the main positive/negative product features.
Figure 2. Classification of reviews and products by star score and review sentiment score (RSS).
Figure 3. Comparative score graph: RSS versus feature sentiment score (FSS) of the products.
Figure 4. Positive (left) and negative (right) word clouds (features) in the dashboard.
13 pages, 1031 KiB  
Article
Estimation of Economic Indicator Announced by Government From Social Big Data
by Kenta Yamada, Hideki Takayasu and Misako Takayasu
Entropy 2018, 20(11), 852; https://doi.org/10.3390/e20110852 - 6 Nov 2018
Cited by 6 | Viewed by 3935
Abstract
We introduce a systematic method to estimate an economic indicator from the Japanese government by analyzing big Japanese blog data. Explanatory variables are monthly word frequencies. As candidate words to describe the economic index, we adopt 1352 words from the economics and industry section of the Nikkei thesaurus. From this large volume of words, our method automatically selects the words that correlate strongly with the economic indicator and resolves statistical difficulties such as spurious correlation and overfitting. As a result, our model reasonably reproduces the real economy index. The announcement of an economic index from the government usually has a time lag, while our proposed method can operate in real time.
(This article belongs to the Special Issue Economic Fitness and Complexity)
Figure 1. Flowchart of estimating an economic index from many word frequencies based on comprehensive search.
Figure 2. The upper chart shows the Composite Index (CI) coincidence and the lower chart is monthly frequency of “hukeiki” (recession) from January 2007 to December 2014. Broken lines represent the median.
Figure 3. Two typical examples of grouped words: (a) recession and economic recovery policy; and (b) foreign banks and capital market.
Figure 4. The relationships between variables: (i) economic index and words; and (ii) representative words and subordinate words as a result of analyses: extraction of words by one-body correlation, grouping and round robin (detection of spurious correlation).
Figure 5. Regression results using the following word frequencies: (a) recession (one word), (b) recession, foreign banks and primary balance (three words), and (c) recession, foreign banks, primary balance, allowance, medical expenses, long-term prime rate and home sales (seven words). Training period was from January 2007 to December 2014 (green line) and test period was from January 2015 to October 2015 (blue line).
Figure 6. Regression results of CI by seven random walks. In the case of (a), $R^2 = 0.35$, and in the case of (b), $R^2 = 0.83$.
Figure 7. Probability density function of $R^2$ for 10,000 samples by the regression of seven random walks. The red line shows the result in Figure 5c.
Figure 8. The relationship between the coefficient of determination ($R^2$) and the number of explanatory variables for the proposed model (black) and random walk model with Top 5 and 10 percentiles (red and blue).
Figure 9. (a) Daily index, $I(t_{\mathrm{day}})$, calculated by Equation (25) and optimal moving average defined by Equation (26); and (b) estimated optimal weight $w_k$ in Equation (26).
15 pages, 2310 KiB  
Article
Research on Big Data Digging of Hot Topics about Recycled Water Use on Micro-Blog Based on Particle Swarm Optimization
by Hanliang Fu, Zhaoxing Li, Zhijian Liu and Zelin Wang
Sustainability 2018, 10(7), 2488; https://doi.org/10.3390/su10072488 - 16 Jul 2018
Cited by 107 | Viewed by 5787
Abstract
The public’s acceptance level of recycled water use is a key factor that affects the popularization of this technology; therefore, it is critical to know the public’s attitude in order to make guiding policies effectively and scientifically. To examine the major focuses and hot topics among the public about recycled water use, one of the major platforms for social opinion in China, the micro-blog, is used as a source to obtain data related to the topic. Through the “follow-be followed” and “forward-dialogue” behaviors, a network of discussion of recycled water use among micro-blog users has been constructed. Improved particle swarm optimization has been used to allow deep digging for key words. Ultimately, key words about the topic have been clustered into three categories, namely, the popularization status of recycled water use, the main application, and the public’s attitude. The conclusion accurately describes the concerns of Chinese citizens regarding recycled water use, and has important significance for the popularization of this technology.
Figure 1. Parallel optimization model of algorithm.
Figure 2. Graphical illustration of the particle representation scheme unique interest.
Figure 3. A graphical illustration of the operations of the update rules.
Figure 4. Detected community structure of the complex network regarding the topic of recycled water use on micro-blog.
Figure 5. Average NMI values for the complex network regarding the topic of recycled water use on micro-blog.