1 Introduction

Digital learning-support environments such as learning management systems (LMS), electronic portfolio systems, and digital textbooks are increasingly used in various schools and online courses [45]. They can facilitate learning activities in and outside classrooms, and their exhaust data can be used to perform learning analytics, which is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs. Such a data-centric approach also allows for a potential use of early prediction about learners [17].

Although digital learning environments can facilitate and enrich learning experiences as exemplified in various learning analytics projects in countries including Japan, it is very difficult for people without Internet connectivity to use such environments [28]. Older adults who do not use the Internet cannot access digital learning environments easily. The same is true for people living without reliable and affordable Internet infrastructure in developing regions including some Tanzanian communities. In rural marginal villages with most inhabitants being senior citizens, it may be extremely difficult for them to find someone who can help solve their technical problems with digital technologies and the Internet. In addition, people without access to digital learning environments would not be able to get evidence-based feedback from a system or an instructor due to the lack of required data for learning analytics. Because of these issues, the advances in digital learning-support technologies arguably widen the educational gap between the connected and the not connected.

Researchers have proposed technical solutions to deliver digital learning contents for teachers and learners in developing regions. However, existing proposals without intelligent and efficient data collection mechanisms cannot easily support evidence-based pedagogical approaches involving learning analytics. In this paper, we discuss a novel learning-support platform that integrates delay-tolerant networking (DTN) mechanisms and model-driven crowdsensing techniques to deliver learning materials and collect educational data (see Fig. 1). Staff members or volunteers at each regional facility serve as crowdsensing workers who collect educational data from learners by using a web form, a camera, etc. on their mobile devices.

Fig. 1. Overview of the proposed platform. The illustration shows a school that has a main campus and three regional facilities. The main campus is connected to one of the regional facilities via the Internet. The other two regional facilities are not connected to the Internet. Learning contents and educational data are transmitted to and from these two regional facilities via delay-tolerant networking involving vehicles and pedestrians as “slow transporters of data.”

The platform we propose could support various offline learners including people in developing communities and older adults. We thus look into the case of the super-aged society of Japan as well as the case of developing communities in Tanzania. Our platform employs active learning to collect useful educational data in an intelligent and efficient manner. The data are shared across different regional facilities and the main campus via DTN, thereby enabling a variety of analysis and visualization. Instructors, students, and other stakeholders can thus explore actionable insights for improving teaching and learning based on data.

2 Technology-Enhanced Learning: Current Status, Opportunities and Challenges

We next discuss current status, opportunities and challenges of technology-enhanced learning in two different countries, focusing on critical sociotechnical issues around ICT infrastructures.

The technology we design could support various kinds of offline learners including people living in developing communities as well as older adults in rural areas. We thus look into the case of the super-aged society of Japan as well as the case of developing communities in Tanzania.

2.1 Technology-Enhanced Learning for Older Adults in Japan

Current Status and Opportunities

Technology-enhanced lifelong and recurrent learning is increasingly perceived as important since it can help older adults acquire knowledge and skills that improve their quality of life. Indeed, many Japanese older adults in their 60s are interested in lifelong learning. Despite such interest, according to a public opinion poll in Japan [7], the reasons for not engaging in lifelong learning activities include not only busyness and cost but also a lack of relevant opportunities, facilities, places, information, and friends. Internet-based digital learning environments could reduce these perceived obstacles substantially.

It is a relatively new idea to use learning analytics to understand and support older learners in lifelong learning scenarios [22]. Analyzing activities of older learners can be useful for improving learning materials and tools as well as providing relevant feedback to learners, instructors, and other stakeholders.

Challenges

Older and economically deprived people in rural areas cannot easily use Internet-based digital learning environments. Only about 68%, 47%, and 20% of people in their 60s, 70s, and 80s, respectively, use the Internet according to a survey conducted by the Japanese government in 2017. Other statistics show that the percentage of individuals using the Internet is much smaller for low-income families and in rural areas.

Although digital learning environments and learning analytics can open up unexplored opportunities for older adults to improve their quality of life, the lack of Internet connectivity for parts of the relevant population as well as the scarcity of “learning log” data about older learners makes it difficult to pave the way for a future in which everyone benefits from digital learning technologies.

2.2 Technology-Enhanced Learning in Tanzania

Current Status and Opportunities

The need to deploy technology for improving the quality of education has been a prioritized agenda at both governmental and institutional levels in Tanzania. The government recognizes that quality education is fundamental for the attainment of national sustainable development, and it has been making substantial efforts to ensure that this target is realized. One way of supporting this was the formulation of the National ICT Policy (United Republic of Tanzania, 2016), which, through its objectives, clearly stipulates the intent to use ICT to improve the quality of education delivery in all fields.

E-learning is a permanent agenda in the country and in response to this, the education sector, from primary to tertiary level, has been embarking on finding avenues of integrating ICT in teaching and learning activities. In the case of higher education sector, universities have put in place ICT infrastructure, corresponding mediums and developed virtual learning environments to enhance learning experiences and access to content. The rise of mobile technology is also a significant contributor to the demands pertaining to e-learning development.

The developments in e-learning in Tanzania have so far brought a plethora of advantages including improved access to learning materials, self-paced and flexible learning, reduced educational costs and improved teaching and learning practices. E-learning in the Tanzanian higher education sector is currently in its early stages, with ample room for development. There has been an increasing deployment of virtual learning environments to engage in online teaching and learning at both distance and on-campus institutions. Universities are acquiring learning management systems and investing in supplementary technologies such as mobile learning to ensure that students get access to learning materials. However, further developments such as learning analytics and game-based learning are underexploited.

Information services in Tanzania are operated over the national fibre optic cable network named the National ICT Broadband Backbone (NICTBB) and two submarine cables, namely the Eastern Africa Submarine Cable System (EASSY) and the Southern and Eastern Africa Communication Network (SEACOM). The backbone provides leased connections to mobile network operators, Internet service providers, local television and radio stations, and data service providers [31].

Challenges

Despite the policies and initiatives to establish the ICT infrastructure, the e-learning sector in Tanzania still faces problems. As in other African countries, the key challenges facing e-learning implementation include poor ICT infrastructure, lack of facilities and lack of Internet connectivity. According to the World Bank, as of 2017 only 16% of the total Tanzanian population had access to the Internet, compared with 91% in Japan. There is a huge gap between urban and rural access to ICT services. In rural areas, problems such as the lack of reliable electricity and limited access to computers further intensify these issues. This problem has been evident in the case of the Open University of Tanzania, which provides service through 30 regional centers across the country. Most students from remote areas do not have access to the Internet and hence cannot access resources in Moodle [30]. Such students have instead been supported by interactive CDs that contain learning materials. Furthermore, the lack of skilled human resources, technical skills and motivation amongst academic staff has been identified as a challenge as well.

3 Related Works

3.1 Technologies for Supporting Education in Developing Regions and Rural Areas

Researchers have proposed technologies for supporting education in developing regions. Brewer et al. discussed applications of intermittent, delay-tolerant networking (DTN) technologies for the education of children in developing regions. The applications they propose include a local content repository where students and teachers can store and retrieve digital stories, games, and other digital content that they create by using easy-to-use authoring tools. This application requires only intermittent networking for each school. Parikh and Lazowska discuss mobile applications that suit the requirements of rural regions. In particular, they propose a framework called CAM, which supports paper-based navigation and offline interaction using barcodes and mobile cameras [34]. They describe several applications of CAM, ranging from microfinance and healthcare to a type of local knowledge repository. The Digital Study Hall system [50] exploits intermittent networking technology, thin-client displays, and an educational content repository to support usage scenarios such as lecture capture and replay, homework collection and feedback, and question-answer sessions in resource-starved village schools in rural India. However, these proposals do not support the collection of learning activity data, which makes it difficult to enable evidence-based pedagogical approaches in developing regions.

The rise of mobile technology in developing regions creates opportunities to develop novel e-learning environments. Mobile phones allow for bidirectional communication with learners, and thus we can collect data from learners by using interactive mechanisms such as mobile crowdsourcing. Several projects have deployed mobile crowdsourcing environments in developing regions. For example, Singh et al. developed an SMS-based prototype that provides real-time notification and information gathering capabilities [47]. Gupta et al. developed a mobile crowdsourcing platform that exploits SMS, making it accessible from a low-end mobile phone [13]. Their platform allows participation by people who do not own high-end mobile phones, thereby offering employment opportunities to low-income workers. There are also many smartphone users in developing regions. Open-source tools such as Open Data Kit (ODK) [5] enable flexible and powerful data collection using web forms, cameras, microphones, sensors, GPS, and databases. ODK allows for an asynchronous means of data transfer, thereby supporting pervasive crowdsensing regardless of the availability of Internet connectivity. These mobile crowdsourcing technologies could be extended and integrated with digital learning environments. However, existing systems often impose too much burden on crowdworkers because they lack intelligent mechanisms to minimize human labor, which makes it difficult to realize successful crowd-powered, evidence-based digital learning environments.

3.2 The Uses of Educational Data and Machine Learning for the Improvement of Learning

In recent years, advances in machine learning have had a large impact in many fields, and some researchers have applied it in education to extract hidden information behind learning behavior. For example, behavioral data of students can be used for dropout prediction, performance prediction, and course recommendation. Historical data of students can be collected along certain dimensions, and we can predict dropout risks so that teachers can assist students in time, or predict students’ performance to identify their weaknesses and suggest methods for improvement.

If we can predict a student’s dropout risk, teachers can promptly take measures to help students with a high risk of dropping out continue their studies. Nicolae-Bogdan combined data from the MaCom Lectio system with a public online dataset and applied machine learning methods to predict dropout risks [39]. Random forest-based prediction achieved an accuracy of 93.47% in their experiments, showing that machine learning can effectively predict high-school students’ risk of dropout. Rovira et al. have also demonstrated the power of machine learning techniques in dropout prediction [36]. Sansone has shown that teachers can get more information from students’ high-dimensional data and could use machine learning algorithms to effectively predict the risk of high-school dropout [38]. Dekker et al. have found techniques to improve the prediction of dropout without additional data about the students [9]. Lee and Chung used the synthetic minority oversampling technique (SMOTE) and ensemble approaches to obtain balanced samples and then evaluated the performance of random forest, boosted decision tree, random forest with SMOTE, and boosted decision tree with SMOTE, indicating that the boosted decision tree performed best [24]. Santana et al. used four classifiers to predict the dropout risk of online education students and found that the SVM algorithm performed best, a result that Tan and Shao also reported [37, 48]. Mduma et al. reviewed dropout prediction methods and suggested that, in developing countries, dropout prediction should also take school-level factors into consideration [29].

There have been many studies on the prediction of students’ performance. Sekeroglu et al. utilized two datasets to predict and classify the performance of students, respectively [42]. They used five machine learning techniques, and their preliminary experimental results show that students’ performance can be predicted and classified. Similarly, Pojon used logistic regression, decision tree, and naïve Bayes to predict the performance of students using two public datasets, suggesting that machine learning can effectively predict students’ performance [35]. Belachew applied a neural network, naïve Bayes, and SVM to the collected transcript data (i.e., final GPA and grades in all courses) of students at Wolkite University, and suggested that naïve Bayes has higher performance [2]. Khan et al. used a decision tree algorithm to predict the final score of each student in a programming course [21]. Elbadrawy applied a linear regression model to student performance prediction [10]. Qazdar et al. proposed two machine learning-based models at H.E.K high school in Morocco to predict student performance in the next semester and the national exam results [32]. Zohair has shown that students’ performance can be predicted with a small dataset, and has also shown the efficiency of SVM when the size of a dataset is small [1]. Kaur et al. used machine learning classification methods to predict student academic performance [20]. Livieris et al. used two semi-supervised learning approaches to predict students’ performance and suggested that semi-supervised approaches can significantly improve the classification accuracy in the final examinations [26]. Kalles applied machine learning algorithms to predict the final exam performance of students in distance learning [18].

Despite the developments in modern learning-support techniques such as predictive analytics, little work has been done on the integration of such techniques with the technological and social contexts in developing regions and rural areas.

4 Crowdsensing Educational Data

Educational data about learners, teachers, learning contents and contextual factors can be used to visualize, analyze and predict patterns of educational successes and failures. Systems can provide resources for reflecting on the past, present and future of educational environments to induce actionable insights. They can also recommend relevant information or provide pertinent advice for learners and teachers automatically or with some help from human experts.

Employing visualization, prediction, and recommendation techniques based on students’ activity log data can help reduce dropouts as well as improve students’ performance and satisfaction. For instance, effective intervention based on learning analytics can cut dropout rates, change students’ behaviors, and improve students’ performance [40].

Our experiences with learning analytics in university environments [33] suggest some of the attributes that can be used to analyze and predict dropouts and academic performance:

  • Demographic information

  • Attendance

  • Scores of quizzes

  • Submissions of assignments

  • Access logs of learning materials

  • Learning journals

  • Responses to surveys (e.g., course evaluation surveys)

  • Grades

Educational data can be collected at different places and at different times as learners may engage in self-paced learning as well as classroom-based learning. Thus, we can also consider contextual attributes such as location and time.

These types of information can be collected easily if all students use personal computers, high-speed Internet, and digital learning environments. In learning environments where students may use analog and/or offline media such as standalone personal computers, mobile phones without broadband networking, and even sheets of paper, the above types of information could still be collected if we can leverage the power of human computation effectively. Instructors, assistants and volunteers can use their mobile devices to digitize the information on analog and/or offline media by using web forms and mobile cameras, and inject the digitized data into a delay-tolerant communication infrastructure. This can be understood as a type of crowdsensing that involves instructors, assistants and volunteers as crowdworkers. Clearly, making this approach successful requires minimizing the workload of crowdworkers.

5 Minimizing the Workload for Data Collection

To collect educational data successfully based on crowdsensing, it is critical to minimize the workload of crowdworkers. In this section, we propose an approach to urge crowdworkers to collect data from appropriate samples (i.e., learners), which can minimize the collective workload of all crowdworkers.

5.1 Model-Driven Crowdsensing

To minimize the workload, we build on the general idea of reducing the costs of data collection, including human labor and mobile battery consumption, by optimizing the collective behaviors of crowdworkers based on a model of the target environment. In our previous project called cooperative Human Probes [49], we introduced a mechanism that minimizes battery consumption of mobile phones in urban crowdsensing scenarios by reducing sensing frequency in a cooperative manner. In addition, we proposed a crowd replication technique [14] that allows a small number of volunteers to collect human activity data efficiently based on a sampling strategy that is aware of an existing model of the target space. The model triggers dynamic instructions on crowdworkers’ smartphones, which can drive the behaviors of crowdworkers. Crowd replication can be considered a type of model-driven crowdsensing. It relies on cluster sampling, where the physical entry points of the space are interpreted as clusters and random sampling is applied within each entry point to minimize bias in the distribution of user demographics. This approach can mitigate possible biases that are often difficult to avoid and understand with crowdsensing methods while minimizing crowdworkers’ workload. In order to employ a similar approach for educational data, we devise an intelligent sampling mechanism based on active learning.
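The cluster sampling idea described above can be illustrated with a minimal sketch. It assumes that each entry point is given as a list of candidate IDs; the function name `cluster_sample`, the entry-point names, and the IDs are hypothetical and are not taken from the crowd replication system [14].

```python
import random
from typing import Dict, List


def cluster_sample(clusters: Dict[str, List[str]],
                   per_cluster: int,
                   seed: int = 0) -> Dict[str, List[str]]:
    """Draw a simple random sample inside every cluster (entry point)."""
    rng = random.Random(seed)  # seeded so that the selection is reproducible
    return {
        name: rng.sample(members, min(per_cluster, len(members)))
        for name, members in clusters.items()
    }


# Hypothetical entry points with the people observed at each of them.
entry_points = {
    "north_gate": ["p1", "p2", "p3", "p4", "p5"],
    "south_gate": ["p6", "p7"],
}
print(cluster_sample(entry_points, per_cluster=3))
```

Sampling at random within every cluster, rather than letting crowdworkers pick whom to observe, is what limits selection bias in this sketch.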

5.2 Active Learning

Active learning is a machine learning method that aims to reduce the number of labeled samples required while keeping the accuracy of the resulting model as high as possible. The key hypothesis of active learning is that a learning algorithm can perform better with less training data if it is allowed to actively choose the most informative unlabeled data. An active learner queries only a small number of valuable unlabeled instances to be labeled by an oracle (or annotator), thereby enlarging the labeled dataset in an intelligent manner [43].

Three main active learning scenarios have been studied: membership query synthesis, stream-based selective sampling, and pool-based sampling [19, 43, 52]:

  • In membership query synthesis, the active learner can query any unlabeled instance, including instances generated by the model itself, although such instances may have no practical meaning and may be impossible for a human annotator to label. The other two scenarios do not have this problem because the learner queries instances that it considers important from the actual input data.

  • In stream-based selective sampling, unlabeled instances are presented to the learner sequentially [25], and the learner decides for each instance whether or not to request its annotation.

  • In pool-based sampling, a large pool of unlabeled instances is assumed to be available. The learner ranks all instances in the pool according to an informativeness measure and then queries the most informative one [25]. The main difference between the two is that pool-based sampling evaluates all unlabeled data before selecting a query, whereas stream-based selective sampling examines the instances one at a time in sequence [44].
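The difference between the stream-based and pool-based scenarios can be sketched in a few lines. This is only an illustration under stated assumptions: the informativeness function `closeness_to_half`, the threshold, and the predicted pass probabilities are all hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def stream_based_queries(stream: Iterable[T],
                         informativeness: Callable[[T], float],
                         threshold: float) -> List[T]:
    """Stream-based selective sampling: decide on each instance as it arrives."""
    return [x for x in stream if informativeness(x) >= threshold]


def pool_based_query(pool: Sequence[T],
                     informativeness: Callable[[T], float]) -> Optional[T]:
    """Pool-based sampling: score the whole pool, then query the top instance."""
    return max(pool, key=informativeness) if pool else None


def closeness_to_half(p: float) -> float:
    """Hypothetical measure: a pass probability near 0.5 is most informative."""
    return 1.0 - abs(p - 0.5) * 2.0


predicted = [0.95, 0.48, 0.10, 0.60]  # hypothetical model outputs per learner
print(stream_based_queries(predicted, closeness_to_half, threshold=0.7))
print(pool_based_query(predicted, closeness_to_half))
```

The stream-based variant never needs the full set of candidates at once, which may suit crowdworkers who meet learners one at a time, whereas the pool-based variant needs a list of all candidate learners before it can rank them.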

The informativeness measure is vital in all active learning scenarios, and existing measures can be divided into four groups [3, 19, 46]:

  • Uncertainty selection, which queries the instance about which the current model’s prediction is most uncertain.

  • Query by committee, which queries the instance on which the predictions of the committee members disagree the most. Each committee member is a different model trained on the current training set.

  • Expected objective change, which queries the instance that would have the greatest impact on the objective, for example by maximizing the model change, the reduction of the generalization error, or the reduction of the output variance.

  • Data-centered methods, which query the most representative instance among the most informative ones.
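As a rough illustration of the first two groups, the sketch below computes three common uncertainty scores from predicted class probabilities and a vote-entropy score for a committee. The probabilities, student IDs, and committee votes are made-up values, and the function names are our own.

```python
import math
from collections import Counter
from typing import Callable, Dict, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty as one minus the probability of the most likely class."""
    return 1.0 - max(probs)


def margin_uncertainty(probs: Sequence[float]) -> float:
    """Uncertainty as one minus the gap between the two most likely classes."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def vote_entropy(votes: Sequence[str]) -> float:
    """Query by committee: entropy of the labels voted by committee members."""
    counts = Counter(votes)
    return entropy([c / len(votes) for c in counts.values()])


def most_uncertain(pool: Dict[str, Sequence[float]],
                   measure: Callable[[Sequence[float]], float]) -> str:
    """Return the ID of the instance with the highest uncertainty score."""
    return max(pool, key=lambda sid: measure(pool[sid]))


# Hypothetical pass/fail probabilities predicted for three students.
pool = {"s1": [0.9, 0.1], "s2": [0.55, 0.45], "s3": [0.7, 0.3]}
print(most_uncertain(pool, entropy))
print(vote_entropy(["pass", "pass", "fail"]))
```

For two classes the three uncertainty scores rank instances identically; they can differ once there are three or more classes.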

5.3 Active Learning-Based Crowdsensing

Active learning and crowdsourcing are key technologies for optimizing data collection and processing, and many researchers have studied ways to integrate them [12]. Although existing studies have made progress on classification tasks, regression-based prediction tasks are rarely considered. In addition, there are fewer studies on crowdsourced data acquisition tasks than on labeling tasks.

Lease suggests that crowdsourcing combined with active learning may provide new insights for better focusing annotation effort on the examples that will be most informative to the learner, thereby accelerating model training and reducing annotation cost compared with traditional annotators [23]. Costa et al. proposed two methods of combining crowdsourcing and active learning. The two methods were tested with the Jester dataset, a text humor classification benchmark, and the results show promising improvements [8].

Crowdsourcing systems sometimes assign a task to multiple crowdworkers to control the quality of accomplished tasks. Techniques to deal with such redundancy can impact the collective workload of crowdworkers.

Many studies on active learning in crowdsourcing scenarios focus on settings with multiple annotators. Because of carelessness and limited knowledge of the target domain, the annotations generated by multiple crowdworkers can be noisy. In spite of the noise, Hsueh et al. have demonstrated that models built on such annotations perform better than those built on the annotations of a single non-expert annotator [16]. Therefore, in active learning-based crowdsourcing, researchers have studied many strategies to estimate and select the most suitable annotator as the oracle without experts, and have proposed many noise-robust schemes for unknown ground truth that iteratively evaluate the label quality and choose the most informative instance to query [11, 27, 51, 53].
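A simple baseline for handling redundant, noisy annotations is majority voting, sketched below. It is a generic illustration rather than one of the noise-robust schemes cited above, and the labels are made up.

```python
from collections import Counter
from typing import Sequence, Tuple


def aggregate_labels(labels: Sequence[str]) -> Tuple[str, float]:
    """Return the majority label and the share of annotators who chose it.

    Ties are resolved in favor of the label that appears first in `labels`.
    """
    if not labels:
        raise ValueError("at least one annotation is required")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Three hypothetical crowdworkers digitize the same handwritten answer.
label, agreement = aggregate_labels(["pass", "pass", "fail"])
print(label, round(agreement, 2))
```

A low agreement share could serve as a signal that an item needs another annotation, at the price of extra workload for crowdworkers.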

Again, conventional approaches rarely consider regression-based prediction tasks and data acquisition, which are important to minimize the workload for collecting and predicting educational information. We next describe our active learning-based sampling strategy for collecting and predicting educational information.

5.4 Sampling Strategy for Collecting and Predicting Educational Information

For the prediction of student performance in developed countries, plenty of students’ learning data can easily be obtained through various Internet-based methods. However, in developing regions and rural areas without reliable network connections, such data cannot be collected easily. As we discussed earlier, interactive CDs are mailed physically to learners across Tanzania, but there is no systematic environment for collecting educational data from learners in remote areas. Collecting data from remote learners would enable prediction of their performance and thereby the provision of appropriate learning contents and feedback.

Distance learning students are located in different regions (the Open University of Tanzania provides learning services in 30 regional centers across the country). Crowdworkers including instructors, assistants and volunteers may use a mobile data collection tool to capture students’ learning log data. However, it is prohibitively costly to collect data from all students. When we opt for collecting samples rather than data from the entire population, the question is which samples to select to ensure the quality of educational data and the effectiveness of the relevant performance prediction.

Active learning can find the most informative data for improving the performance of a prediction model. Therefore, we apply active learning to student data collection in developing regions and rural areas to address the problems raised above. We assume that a certain amount of data has been collected and a preliminary model of student performance prediction has been generated. We denote by \( s^{*} \) the students whose information will most effectively improve the performance of the model. We propose the following process for finding \( s^{*} \):

  1. The existing dataset \( \mathcal{D} \) is clustered, yielding the categories \( \mathcal{C} = \{ c_{1}, c_{2}, \ldots, c_{n} \} \). Let \( S_{c_{i}} \) \( (i \in [1,n]) \) be the students who produced the data of category \( c_{i} \).

  2. Let \( \mathcal{D}^{c_{i}} \) denote the dataset of category \( c_{i} \). From each \( \mathcal{D}^{c_{i}} \), \( k \) samples are extracted according to the data density and denoted \( \mathcal{D}_{j}^{c_{i}} \) \( (j \in [1,k]) \). Gaussian noise \( \lambda_{j}^{c_{i}} \) is added to each sample, \( \mathcal{D}_{j}^{c_{i}\prime} = \mathcal{D}_{j}^{c_{i}} + \lambda_{j}^{c_{i}} \), and the perturbed samples make up the new dataset \( \mathcal{D}^{c_{i}\prime} \).

  3. Each new dataset \( \mathcal{D}^{c_{i}\prime} \) is added to \( \mathcal{D} \) to form \( \mathcal{D} \cup \mathcal{D}^{c_{i}\prime} \), and a new prediction model is retrained on \( \mathcal{D} \cup \mathcal{D}^{c_{i}\prime} \).

  4. The retrained models are compared with the original model to find the category \( c^{*} \) that has the most influence on the model. We then obtain \( s^{*} = S_{c^{*}} \).
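The four steps can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: each record is a single (feature, target) pair, clustering is a tiny 1-D k-means, the prediction model is a least-squares line, drawing uniformly from a cluster stands in for density-based extraction, and "influence" is measured as the absolute change in the fitted coefficients. All function names are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (feature, target), e.g. (quiz score, final grade)


def fit_line(data: List[Point]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x + b (stand-in prediction model)."""
    mx = mean(x for x, _ in data)
    my = mean(y for _, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def kmeans_1d(data: List[Point], n_clusters: int,
              iters: int = 20) -> Dict[int, List[Point]]:
    """Step 1: cluster the dataset on the feature value."""
    xs = sorted(x for x, _ in data)
    # Spread the initial centers across the sorted feature values.
    centers = [xs[(2 * i + 1) * len(xs) // (2 * n_clusters)]
               for i in range(n_clusters)]
    for _ in range(iters):
        groups: Dict[int, List[Point]] = {i: [] for i in range(n_clusters)}
        for p in data:
            nearest = min(range(n_clusters),
                          key=lambda i: abs(p[0] - centers[i]))
            groups[nearest].append(p)
        centers = [mean(x for x, _ in g) if g else centers[i]
                   for i, g in groups.items()]
    return groups


def most_influential_cluster(data: List[Point], n_clusters: int = 3,
                             k: int = 5, sigma: float = 1.0,
                             seed: int = 0) -> int:
    """Steps 2 to 4: perturb each cluster, retrain, and compare the models."""
    rng = random.Random(seed)
    base_a, base_b = fit_line(data)
    influence: Dict[int, float] = {}
    for c, members in kmeans_1d(data, n_clusters).items():
        if not members:
            continue
        # Step 2: draw k samples from the cluster and add Gaussian noise.
        drawn = [rng.choice(members) for _ in range(k)]
        noisy = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                 for x, y in drawn]
        # Step 3: retrain on the union of the original and perturbed data.
        a, b = fit_line(data + noisy)
        # Step 4: assumed influence score, the change in model coefficients.
        influence[c] = abs(a - base_a) + abs(b - base_b)
    return max(influence, key=influence.get)


# Hypothetical dataset: 30 learners with a noiseless linear relation.
dataset = [(float(x), 2.0 * x + 1.0) for x in range(30)]
print(most_influential_cluster(dataset))
```

The returned index identifies the category whose learners would be prioritized for data collection; a different influence score, such as the change in validation error, could be swapped in without altering the overall loop.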

6 Preliminary Data Collection Form

Figure 2 shows our preliminary data collection form that exploits Open Data Kit [5] to upload and visualize educational data on a cloud server. It records contextual information (location and time) as well as students’ demographic information (ID, gender, age) and feedback (perceived difficulty and satisfaction, and comments). It can also capture students’ handwritten learning journal entries. All of the data can be captured offline without requiring Internet connectivity. The captured data will be uploaded to a server when the device comes online.

Fig. 2. Preliminary data collection form that exploits Open Data Kit to upload and visualize educational data on a cloud server.
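The capture-offline, upload-later behavior of the form can be sketched generically as a store-and-forward queue. The sketch does not use the Open Data Kit API: the `FormRecord` fields mirror the items listed above, while the class names, the rating scales, and the `upload` callable are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass
class FormRecord:
    """One form submission; field names are illustrative assumptions."""
    student_id: str
    gender: str
    age: int
    latitude: float
    longitude: float
    timestamp: str     # ISO 8601 time of capture
    difficulty: int    # perceived difficulty, assumed 1 to 5 scale
    satisfaction: int  # perceived satisfaction, assumed 1 to 5 scale
    comments: str = ""


class OfflineQueue:
    """Keeps records on the device until they can be uploaded."""

    def __init__(self) -> None:
        self._pending: List[FormRecord] = []

    def capture(self, record: FormRecord) -> None:
        self._pending.append(record)

    def flush(self, is_online: bool, upload: Callable[[Dict], bool]) -> int:
        """Try to upload every pending record; return how many succeeded."""
        if not is_online:
            return 0
        remaining: List[FormRecord] = []
        sent = 0
        for record in self._pending:
            if upload(asdict(record)):
                sent += 1
            else:
                remaining.append(record)  # keep failed uploads for a retry
        self._pending = remaining
        return sent

    def __len__(self) -> int:
        return len(self._pending)


queue = OfflineQueue()
queue.capture(FormRecord("s001", "F", 67, -6.8, 39.3,
                         "2019-01-15T10:00:00", 3, 4, "hypothetical entry"))
print(queue.flush(is_online=False, upload=lambda payload: True))
print(queue.flush(is_online=True, upload=lambda payload: True))
```

Records that fail to upload stay in the queue, so a later connection, or a delay-tolerant carrier, can retry them.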

7 Conclusion and Future Work

Despite the continuous growth of global Internet users, almost 4 billion people do not use the Internet. In this context, we have discussed a learning-support platform for learners without an easy, reliable, and affordable means to access digital learning environments on the Internet.

Our proposed platform integrates delay-tolerant networking (DTN) mechanisms and active learning-based model-driven crowdsensing techniques to deliver learning materials and collect educational data efficiently.

Our future work includes the formative evaluation of the platform, including its preliminary mechanisms and tools, based on a user-centered approach involving stakeholders in Tanzania and Japan. We also intend to consider mechanisms for remedying cold start problems in predictive analytics.