The Thirty-Third International
FLAIRS Conference (FLAIRS-33)
Detecting Trait versus Performance Student Behavioral Patterns
Using Discriminative Non-Negative Matrix Factorization
Mehrdad Mirzaei,1 Shaghayegh Sahebi,1 Peter Brusilovsky2
1 Department of Computer Science, University at Albany - SUNY, Albany, New York 12203
2 School of Computing and Information, University of Pittsburgh, Pittsburgh, Pennsylvania 15260
{mmirzaei, ssahebi}@albany.edu, peterb@pitt.edu
Abstract

Recent studies have shown that students follow stable behavioral patterns while learning in online educational systems. These behavioral patterns can further be used to group the students into different clusters. However, as these clusters include both high- and low-performing students, the relation between the behavioral patterns and student performance is yet to be clarified. In this work, we study the relationship between students' learning behaviors and their performance in a self-organized online learning system that allows them to freely practice with various problems and worked examples. We represent each student's behavior as a vector of high-support sequential micro-patterns. Then, we discover both the prevalent behavioral patterns in each group and the shared patterns across groups using discriminative non-negative matrix factorization. Our experiments show that we can successfully detect common and specific patterns in students' behavior that can be further interpreted as student learning behavior trait patterns and performance patterns.

Introduction

In many online learning environments, students have the freedom to access learning materials repeatedly, in any order, and at their own pace. With fewer restrictions, a variety of interaction sequences emerge as learners work with such systems. For example, in an interaction session, a student may start by studying some reading material for a while, then move on to solving relevant problems, and eventually take a quiz before leaving the system. Recent studies on extracting behavioral patterns from these sequences have shown that students follow stable behavioral patterns while working with these systems (Guerra et al. 2014; Mirzaei, Sahebi, and Brusilovsky 2019; Gitinabard et al. 2019; Wen et al. 2019). For example, some students tend to study the reading materials, while others are more interested in learning by solving problems (Mirzaei, Sahebi, and Brusilovsky 2019). In addition to learning patterns, some studies have discovered inefficient learning behaviors in student sequences. For example, Guerra et al. found that some students tend to repeat the same problems and concepts, even after mastering them, rather than moving on to learn new and more complex concepts (Guerra et al. 2014). One may expect to see an association between these inefficient learning behaviors and low performance in students. However, the same studies showed that using all behavioral patterns, one cannot easily separate high- and low-performing students. Studying the stability of these patterns over time suggests that many of them are representative of student behavioral traits rather than student performance. Specifically, both high- and low-performing learners may demonstrate some inefficient behavioral patterns in their sequences.

In this context, a natural question is whether we can differentiate between the trait behavioral patterns and the performance behavioral patterns. In other words, which of the behavioral patterns are associated with student behavioral traits, and which are indicators of students' high or low performance? Answering these questions will help to better detect inefficiencies in students' sequences while they interact with online learning systems, and to guide them towards more productive learning behavior.

In this work, we mine the trait versus performance behavioral patterns in students by summarizing student sequences as frequent micro-pattern vectors, grouping the students according to their performance, and discovering the latent factors that represent each group using discriminative non-negative matrix factorization. We experiment on a real-world dataset of sequences from students interacting with an online programming tutoring system with two different learning material types: problems and worked examples. Our experiments show the discriminative power of our method between different types of behavioral patterns. Also, by clustering these patterns according to their discovered latent factors, we reveal interesting associations between them.
Related Work
With the amount of information in students' interaction logs from online educational systems growing rapidly, researchers are increasingly utilizing this data to study and improve the learning process. Such data can be used to model students' behavior while they interact with online courses. Students' behavior extracted from log data has been used to predict student performance (Xing et al. 2015), either to intervene with the student and avoid failure or to encourage them to pursue productive behaviors (Chunqiao, Xiaoning, and
Qingyou 2017). Another usage is to predict dropout in online open-access courses (Boyer and Veeramachaneni 2015; Whitehill et al. 2015; Ameri et al. 2016).

Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Sequence mining has been widely used in educational research to study students' activities in online systems. Exploratory sequence analysis of students' actions can unveil learning strategies in flipped classes (Jovanović et al. 2017). This method helps instructors design courses and scaffolds, and students can also take advantage of the approach to improve their learning behaviors. Analyzing the sequences of transitions between online platforms in (Gitinabard et al. 2019) has revealed meaningful patterns that are helpful for both instructors and students. Mining students' sequential patterns of actions is used in (Maldonado et al. 2010) to extract students' behavioral patterns while they interact around a tabletop; the authors used the patterns to distinguish between high-achievement and low-achievement groups. Previous research has shown that students' behaviors can impact their performance, since behavior can be productive or non-productive. In (Guerra et al. 2014) the patterns are extracted using sequential pattern mining methods from interactions with exercises, and in (Mirzaei, Sahebi, and Brusilovsky 2019) the patterns are extracted from interactions with multiple learning materials. In that research, distinctive patterns are recognized for each group; however, some patterns that are common among all students should also be taken into account.
Matrix factorization methods were introduced in recommender systems (Koren, Bell, and Volinsky 2009) and are widely used in other areas such as document clustering (Kim et al. 2015; Xu, Liu, and Gong 2003; Shahnaz et al. 2006; Pauca et al. 2004). In (Mouri et al. 2019), Non-negative Matrix Factorization (NMF) is used to detect high-performing learners' browsing patterns from collected log data in order to improve students' thinking skills. The DICS algorithm (Zhang et al. 2018) exploits the relationships among different views to build a classifier; this approach uses joint NMF to explore discriminative and non-discriminative information existing in common and specific sections among multiple views. Another way of representing students' behaviors is by using tensors: tensor-based methods have been used to model students' behavior and predict their performance (Sahebi, Lin, and Brusilovsky 2016). In (Wen et al. 2019), multi-way interactions are considered as behavior, and common and discriminative patterns are discovered with an iterative discriminant factorization framework.

Joint discriminative non-negative matrix factorization has previously been used in (Kim et al. 2015) to discover common and distinctive topics in documents; their topic modeling method simultaneously finds common and distinct topics from multiple datasets. We apply this approach to detect common and distinct patterns extracted from the sequential behaviors of students with different performance levels.
Dataset

Our dataset is collected from an online tutoring system that includes programming problems and worked examples. Students are free to choose the problems they would like to work on, and the examples they would like to study, in any order. Each programming problem is a multiple-choice or a short-answer question that presents a code snippet to students and asks for the result of executing that code. Students can answer the same problem multiple times; however, each time, simple code parameters, such as variable values, change, and as a result the correct answer to the problem changes. The annotated examples are code snippets that include natural-language explanations for different lines of code. Our collected data includes every student's sequence of activities (in the form of problem or example identifiers), whether the student's answer to each problem is correct (success) or incorrect (failure), and the time the student spends on each activity. Each problem or worked example in the dataset is assigned to a specific course topic. Additionally, students' prior knowledge of the material (as pre-test scores) and knowledge at the end of the course (as post-test scores) are available in the dataset. The dataset includes 83 student activity sequences on 103 problems and 42 examples. Student sequence length in each session varies between 1 and 30, with an average of 2.33 activities. 61.2% of activities are on problems, and 38.8% are on examples. The average student success rate on problems is 68%.

Discriminative Learning of Student Behavior

In this section, we describe the process of extracting patterns from student learning behaviors. An illustration of our framework is presented in Figure 1. In summary, our framework follows these steps:
1. coding student activities and constructing student sequences;
2. building student pattern matrices; and
3. finding discriminative vs. common patterns between high- and low-performing students.
In the following sections, we describe each of these steps.

Constructing Student Activity Sequences

In this part, we follow the work of Mirzaei et al. to code student activity sessions based on each attempt's type, outcome, and duration (Mirzaei, Sahebi, and Brusilovsky 2019). Table 1 shows a short description of all attempt labels.

Attempt Type. Since students can work with various types of learning material (in our case, problems and worked examples), we code activities based on the learning material type. Specifically, for worked examples we use the letters "e" or "E", and for problems we use the letters "s", "S", "f", or "F", according to outcome and duration.

Attempt Outcome. Attempting to solve problems can have different outcomes; in our case, students can give a correct (success) or incorrect (failure) answer. We code each kind of feedback with different letters: a student's successful outcome is represented by "s" or "S", and an unsuccessful one by "f" or "F".

Attempt Duration. We code the time spent on each attempt for each learning material as a short (represented by lowercase letters, like "s") or long (represented by capital letters, like "S") attempt. To determine if an activity should be categorized as short vs. long, we compare the time taken on the
Figure 1: The most frequent patterns are extracted from sequences by CM-SPAM. These patterns are the rows of matrix X, and students are the columns. We split matrix X based on the performance of the students into X1 and X2. Then, with discriminative non-negative matrix factorization, common and distinct patterns are extracted.
Label   Attempt         Label   Attempt
S       Long Success    s       Short Success
F       Long Failure    f       Short Failure
E       Long Example    e       Short Example

Table 1: Attempt coding labels

each micro-pattern in their complete coded sequence. The normalization is done such that the sum of the micro-pattern values for each student equals one. This normalization compensates for students having sequences of various lengths and puts the student vectors on the same scale. We can then build a pattern matrix that represents all students' behaviors by concatenating their normalized micro-pattern vectors.
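The frequency counting and normalization just described can be sketched as follows. This is a simplification: overlapping substring matching stands in for CM-SPAM's sequential-pattern containment, and the pattern list here is a toy assumption rather than the paper's 77 mined micro-patterns.

```python
def pattern_vector(coded_sequence, micro_patterns):
    """Build one student's normalized micro-pattern frequency vector."""
    counts = []
    for p in micro_patterns:
        # Count overlapping occurrences of the micro-pattern.
        n, start = 0, 0
        while True:
            i = coded_sequence.find(p, start)
            if i < 0:
                break
            n += 1
            start = i + 1
        counts.append(n)
    total = sum(counts)
    # Normalize so each student's values sum to one, compensating for
    # students having coded sequences of different lengths.
    return [c / total if total else 0.0 for c in counts]

patterns = ["Ss", "ss", "_S"]
print(pattern_vector("_Sss_Ss_", patterns))  # -> [0.4, 0.2, 0.4]
```

Stacking one such vector per student as a column yields the pattern-student matrix X used below.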
activity in this attempt with the median time taken on this activity across all attempts of all students. If this attempt takes longer than the median time, it is coded as a long attempt; otherwise, it is coded as a short one.
Each student can attempt learning materials from various topics in any order. Using the assigned learning material topic, we separate student activity sequences into multiple topic sessions. A new topic session starts when the student moves to a different topic, meaning that all student activities within a session focus on the same topic. To indicate the start and the end of each session, we use a special symbol "_". For instance, "_Fse_" is a student session that starts with working on a problem for a long time and failing at it, then working on the problem again for a short time and succeeding at it, and finally moving on to studying an example for a short time.
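As a sketch of this coding scheme (the paper's preprocessing code is not published, so the record field names and session layout below are illustrative assumptions):

```python
def code_attempt(kind, correct, duration, median_duration):
    """Map one attempt to its single-letter label.

    Worked examples are coded 'e'/'E'; problem attempts are coded by
    outcome ('s'/'S' for success, 'f'/'F' for failure); capital letters
    mark attempts longer than the activity's median duration."""
    if kind == "example":
        letter = "e"
    else:
        letter = "s" if correct else "f"
    return letter.upper() if duration > median_duration else letter

def code_session(attempts, medians):
    """Code one topic session, delimited by '_' on both ends."""
    body = "".join(
        code_attempt(a["kind"], a.get("correct", False), a["duration"],
                     medians[a["activity_id"]])
        for a in attempts
    )
    return "_" + body + "_"

# Toy session: a long failed problem attempt, then a short success on
# the same problem, then a short look at an example.
medians = {"p1": 60.0, "e1": 40.0}
session = [
    {"activity_id": "p1", "kind": "problem", "correct": False, "duration": 120},
    {"activity_id": "p1", "kind": "problem", "correct": True, "duration": 30},
    {"activity_id": "e1", "kind": "example", "duration": 20},
]
print(code_session(session, medians))  # -> _Fse_
```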
Discriminative Non-negative Factorization of Patterns

Our main goal in this work is to distinguish between micro-patterns that can represent students' learning behavior traits and those that can be indicators of student performance. To measure the performance of student s, we use the student's normalized learning gain:

normalized-learning-gain_s = (post-test_s − pre-test_s) / (max-post-test − min-pre-test)

in which max-post-test and min-pre-test are the maximum and minimum possible scores in the post-test and pre-test, respectively. We group the top 40% (n = 29) of students with the highest normalized learning gain as high-performing students, and the bottom 40% (n = 26) as low-performing students. We leave out the middle 20% of students to achieve better discrimination between student performances in the two high and low groups.
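A minimal sketch of this grouping step, with made-up gain values; the `nlg` field name and the rounding of group sizes are assumptions, not the paper's exact procedure:

```python
def normalized_learning_gain(post, pre, max_post, min_pre):
    """Normalized learning gain as defined above."""
    return (post - pre) / (max_post - min_pre)

def split_groups(students, top=0.4, bottom=0.4):
    """Top 40% by gain -> high performers, bottom 40% -> low performers;
    the middle 20% is left out for sharper discrimination."""
    ranked = sorted(students, key=lambda s: s["nlg"], reverse=True)
    n = len(ranked)
    k_hi, k_lo = round(n * top), round(n * bottom)
    return ranked[:k_hi], ranked[n - k_lo:]

students = [{"id": i, "nlg": g} for i, g in enumerate([0.9, 0.1, 0.5, 0.8, 0.2])]
high, low = split_groups(students)
print([s["id"] for s in high], [s["id"] for s in low])  # -> [0, 3] [4, 1]
```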
Our assumption is that the micro-patterns that are representative of learning behavior traits are independent of student performance; as a result, they can be shared across both high- and low-performing students. On the other hand, we assume that the micro-patterns that discriminate high-performing students from low-performing ones appear predominantly in one of these two groups. According to these assumptions, we expect to see three sets of micro-patterns in high- and low-performing students' pattern vectors: i) a set that is common across the student groups and has similar importance in both groups' pattern vectors; ii) a set that is frequently seen in high-performing students' sequences, and not in low-performing ones'; and iii) a set that is specific to low-performing students.
To verify this distinction between the different sets of patterns, we apply discriminative non-negative matrix factorization (Kim et al. 2015), which was proposed for discriminative topic modeling in documents. To do this, we split the pattern-student matrix X, built in the previous section, based on the students' performance into matrix X1 for low-performing students and X2 for high-performing ones. Each column in these matrices represents the micro-patterns of one student, and each row represents the presence of one micro-pattern across all students' sequences.

Building the Pattern Matrix

Following the work of Guerra et al., in this part we use the coded student sequences to build students' micro-pattern vectors (Guerra et al. 2014). More specifically, we extract highly frequent micro-patterns from the coded sequences, and then build student pattern vectors based on those frequent micro-patterns.

For the first step, we use CM-SPAM (Fournier-Viger et al. 2014), a sequential pattern mining algorithm, to find the frequent micro-patterns with a minimum support of 5.4%. We choose this minimum support to keep the most important patterns while maintaining adequate statistical power for the experiments. Then, we discard the short patterns, i.e., those with length less than two, as they do not convey a sequential notion. This leads to 77 different frequent micro-patterns. For the second step, we use these 77 most frequent micro-patterns as features to build student pattern vectors. For each student, we calculate the normalized frequency of

Using simple non-negative matrix factorization, each of these two matrices can be decomposed into the multiplication of two lower-dimensional matrices W and H with k latent factors. These latent factors can summarize the association between behavioral micro-patterns and students using a shared latent space (X1 ≈ W1 H1^T, X2 ≈ W2 H2^T). To learn the W and H matrices, an optimization algorithm (such as gradient descent) can be used to minimize the following objective function with respect to these parameters:
L = ||X1 − W1 H1^T||_F^2 + ||X2 − W2 H2^T||_F^2    (1)

However, this factorization does not discriminate between common and distinctive patterns. To enforce our assumptions and further group the micro-patterns into the above-mentioned three sets, we use their latent representations. To find the micro-patterns that belong to set i, we restrict the discovered latent representations of some of the micro-patterns to be as similar as possible across the two groups of students. To find the micro-patterns that belong to sets ii and iii, we impose the discovered latent representations of the other micro-patterns to be as different as possible across the two groups of students. To do so, we assume W and H can be split into two sub-matrices, each holding either common or discriminative patterns, with kc and kd latent factors, respectively:

W1 = [W1,c  W1,d],  W2 = [W2,c  W2,d]
H1 = [H1,c  H1,d],  H2 = [H2,c  H2,d]    (2)

Here, W1,c and W2,c contain the common patterns, W1,d and W2,d contain the distinct ones, and k = kc + kd. To impose similarity between common patterns (setting W1,c ≈ W2,c) and dissimilarity between distinct patterns (setting W1,d^T W2,d ≈ 0), we add two regularization terms, fc(·) and fd(·), to the objective function. fc(·) penalizes the difference between common patterns, and fd(·) penalizes the similarity between distinct patterns. For the difference between common patterns, the Euclidean distance is used, and for the similarity between distinct ones, the dot product between vectors. As a result, these two functions are defined as in Equation (3):

fc(W1,c, W2,c) = ||W1,c − W2,c||_F^2
fd(W1,d, W2,d) = ||W1,d^T W2,d||_F^2    (3)

Eventually, considering regularization on W and H for generalizability purposes, we minimize the objective function in Equation (4) with respect to W and H, constraining them to be non-negative, using gradient descent (GD):

L = ||X1 − W1 H1^T||_F^2 + ||X2 − W2 H2^T||_F^2 + α||W1,c − W2,c||_F^2 + β||W1,d^T W2,d||_F^2 + γ(||W||_F^2 + ||H||_F^2)    (4)

Experiments

Finding Pattern Latent Vectors

Using the GD algorithm and performing a grid search to find the best numbers of common and distinct latent factors (kc and kd), we find each pattern's latent vectors. To evaluate the goodness of fit, we use the reconstruction error (root mean square error, RMSE) on matrices X1 and X2. We vary k between 2 and 20 and, for each k, we search over kc between 0 and k, such that kd = k − kc. The least reconstruction error happens when k = 15, kc = 10, and kd = 5. In Figure 2, we show the convergence of the GD algorithm in reconstructing X1 and X2 over the first 500 iterations.

Figure 2: Reconstruction error (RMSE) of ||X1 − W1 H1^T|| and ||X2 − W2 H2^T|| over 500 iterations of the gradient descent (GD) algorithm.

The discovered latent factors for each pattern are shown in Figure 3. The left 10 columns show an average of the common latent factors in W1,c and W2,c, the middle 5 are the discriminative latent factors for low-performing students (W1,d), and the last 5 are the factors for high-performing students (W2,d). The darker the color, the more a latent factor is weighted for a pattern. Looking at the heatmap, we can see that a big group of micro-patterns in the bottom rows have similar, and lower, weights in both the common and distinctive latent factors. These are the patterns that appear in student sequences from any group (so they are associated with learning behavior traits), but they are not very strong in showing the kind of learning trait. Another group of patterns that are common between students are the ones that show predominantly example-related activities (e.g., the 'ee_' and '_ee' micro-patterns). For these patterns, we see lower discriminatory weights for the performance latent factors, but high weights for the common latent factors. This shows that not only are these patterns indicative of learning behavior traits, but they also represent a specific kind of trait: they show the group of students who are interested in studying the worked examples more than others. This finding is in accordance with having a "readers" vs. other student cluster in previous literature (Mirzaei, Sahebi, and Brusilovsky 2019).

The rest of the patterns are performance patterns: if they have a high weight in low-performing latent factors, they
will not have a high weight in high-performing latent factors, and vice versa. For example, the first group of patterns, mostly long successful attempts repeated only once or twice with shorter successes, have higher weights in high-performance factors and very low weights in low-performance factors. This means that observing these patterns in a student's behavior can be indicative of their high performance. On the other hand, the sets of patterns with many repeated successful but short attempts (like 'sss' and 'ssssss') have high weights in low-performance factors and almost zero weights in high-performance factors. This means that students who repeatedly succeed in solving problems of the same topic, but do not spend time on them, are more likely to be low-performing students.
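The objective in Equation (4) can be sketched with a simple projected gradient descent in which the first kc columns of W1 and W2 are tied together (the α term) and the last kd columns are pushed toward orthogonality (the β term). This is an illustrative re-implementation, not the authors' code; α, β, γ, the learning rate, and the iteration count are arbitrary choices.

```python
import numpy as np

def dnmf(X1, X2, kc, kd, alpha=1.0, beta=1.0, gamma=0.01,
         lr=1e-3, iters=2000, seed=0):
    """Sketch of discriminative NMF via projected gradient descent."""
    rng = np.random.default_rng(seed)
    m = X1.shape[0]
    k = kc + kd
    W1, W2 = rng.random((m, k)), rng.random((m, k))
    H1, H2 = rng.random((X1.shape[1], k)), rng.random((X2.shape[1], k))
    for _ in range(iters):
        R1 = X1 - W1 @ H1.T            # reconstruction residuals
        R2 = X2 - W2 @ H2.T
        gW1 = -2 * R1 @ H1 + 2 * gamma * W1
        gW2 = -2 * R2 @ H2 + 2 * gamma * W2
        # alpha pulls the first kc ("common") columns together ...
        diff = W1[:, :kc] - W2[:, :kc]
        gW1[:, :kc] += 2 * alpha * diff
        gW2[:, :kc] -= 2 * alpha * diff
        # ... while beta pushes the last kd ("distinct") columns apart
        gW1[:, kc:] += 2 * beta * W2[:, kc:] @ (W2[:, kc:].T @ W1[:, kc:])
        gW2[:, kc:] += 2 * beta * W1[:, kc:] @ (W1[:, kc:].T @ W2[:, kc:])
        gH1 = -2 * R1.T @ W1 + 2 * gamma * H1
        gH2 = -2 * R2.T @ W2 + 2 * gamma * H2
        # projected gradient step keeps every factor non-negative
        W1 = np.maximum(W1 - lr * gW1, 0.0)
        W2 = np.maximum(W2 - lr * gW2, 0.0)
        H1 = np.maximum(H1 - lr * gH1, 0.0)
        H2 = np.maximum(H2 - lr * gH2, 0.0)
    return W1, W2, H1, H2

def rmse(X, W, H):
    """Reconstruction error of X against W @ H.T."""
    return float(np.sqrt(np.mean((X - W @ H.T) ** 2)))
```

Tracking rmse(X1, W1, H1) per iteration produces the kind of convergence curve reported in Figure 2.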
Clustering Patterns

To further understand the students' learning behavior trait and performance patterns, we cluster these patterns, according to the discovered latent factors, into different groups using the spectral clustering algorithm, finding 6 different clusters. The horizontal bars in Figure 3 divide the patterns into the discovered clusters. The results illustrate the division of patterns based on a combination of learning trait and performance factors. First, we see that trait vs. performance patterns fall into separate clusters. For example, patterns containing reading examples (as a trait), such as 'ee_' and '_ee', fall into the same cluster, while patterns with long successes (as a performance indicator), such as '_Sss' and '_SS', are together in another cluster. Second, we see that high- and low-performance patterns fall into separate clusters. For example, patterns with long successes (as a high-performance indicator), such as '_SS', and patterns with short repetitive successes (as a low-performance indicator), such as 'sssss', belong to different clusters. Third, we observe a trait-related separation between different performance clusters. For example, both the first group of patterns (with long successes, followed by a few short successes) and the fifth group of patterns (with long failures, mostly followed by long successes) are indicators of high-performing students. However, the first group shows the students who like to repeat their success a few times after spending the time to get a problem right, while the second shows the group of students who move on to other problem topics as soon as they have a long-thought success after a long failure. This result is in accordance with grouping the students into "confirmers" and "non-confirmers" by Guerra et al. (Guerra et al. 2014). We see similar trait-based clusters within low-performance patterns: the second and fourth sets of patterns in Figure 3.

Figure 3: Heatmap showing the distribution of latent factors over the common part, the discriminative part for low-performing students, and the discriminative part for high-performing students.

To analyze the clusters further and find the most discriminating patterns within each cluster, we compute the average of the latent factor values in each cluster. These results are plotted in Figure 4. The error bars represent the 95% confidence interval, showing whether the weight of a latent factor in a cluster is significantly different from the weight of the same latent factor in the other clusters. We observe that the second common latent factor is the most prominent in cluster 3 (the example-studying patterns). Cluster 4's (low-performance patterns indicating a short success after a long failure) most prominent latent factor is the fourth low-discriminative factor, and cluster 2's (the low-performing sequences of repeated short successes) most weighted factor is the last of the discriminative ones. These results show the discriminative power of the latent factors, especially in indicating "example studying" behavior and in finding low-performing patterns. Using these observations, we can use the same latent factors to predict students' performance in our future work.

Figure 4: Average common, low-performance discriminative, and high-performance discriminative latent factor weights for the 6 clusters and their respective patterns.

Conclusions

In this paper, we proposed a framework to discriminate students' learning behavior trait patterns from their performance-indicator patterns, based on student sequences in an online learning environment. In our analyses, we have shown that we can discover meaningful pattern clusters based on the latent factors that we find using discriminative non-negative matrix factorization. These patterns demonstrate that high-performing students either repeat their success when they have achieved it by spending a longer time, or try to reinforce what they have learned after a long failure by spending the time to get the problem right again. Low-performing students either hastily repeat their successful attempts over and over again without spending enough time, or leave the problem with just one short success after a long failure, likely failing to learn from it. In the future, we would like to study the predictive power of the discovered latent factors.
References
Ameri, S.; Fard, M. J.; Chinnam, R. B.; and Reddy, C. K.
2016. Survival analysis based framework for early prediction of student dropouts. In the 25th ACM International
on Conference on Information and Knowledge Management,
903–912. ACM.
Boyer, S., and Veeramachaneni, K. 2015. Transfer learning for predictive models in massive open online courses. In
International conference on artificial intelligence in education, 54–63. Springer.
Chunqiao, M.; Xiaoning, P.; and Qingyou, D. 2017. An
artificial neural network approach to student study failure
risk early warning prediction based on tensorflow. In International Conference on Advanced Hybrid Information Processing, 326–333. Springer.
Fournier-Viger, P.; Gomariz, A.; Campos, M.; and Thomas,
R. 2014. Fast vertical mining of sequential patterns using
co-occurrence information. In Advances in Knowledge Discovery and Data Mining, 40–52. Springer.
Gitinabard, N.; Heckman, S.; Barnes, T.; and Lynch, C. F.
2019. What will you do next? a sequence analysis on
the student transitions between online platforms in blended
courses. In the 12th International Conference on Educational Data Mining, 59–68.
Guerra, J.; Sahebi, S.; Lin, Y.-R.; and Brusilovsky, P. 2014.
The problem solving genome: Analyzing sequential patterns
of student work with parameterized exercises. 153–160.
Jovanović, J.; Gašević, D.; Dawson, S.; Pardo, A.; and Mirriahi, N. 2017. Learning analytics to unveil learning strategies
in a flipped classroom. The Internet and Higher Education
33(4):74–85.
Kim, H.; Choo, J.; Kim, J.; Reddy, C. K.; and Park, H.
2015. Simultaneous discovery of common and discriminative topics via joint nonnegative matrix factorization. In
the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 567–576. ACM.
Koren, Y.; Bell, R.; and Volinsky, C. 2009. Matrix factorization techniques for recommender systems. Computer
(8):30–37.
Maldonado, R. M.; Yacef, K.; Kay, J.; Kharrufa, A.; and Al-Qaraghuli, A. 2010. Analysing frequent sequential patterns
of collaborative learning activity around an interactive tabletop. In Educational Data Mining 2011.
Mirzaei, M.; Sahebi, S.; and Brusilovsky, P. 2019. Annotated examples and parameterized exercises: Analyzing students’ behavior patterns. In International Conference on Artificial Intelligence in Education, 308–319. Springer.
Mouri, K.; Suzuki, F.; Shimada, A.; Uosaki, N.; Yin, C.;
Kaneko, K.; and Ogata, H. 2019. Educational data mining
for discovering hidden browsing patterns using non-negative
matrix factorization. Interactive Learning Environments.
Pauca, V. P.; Shahnaz, F.; Berry, M. W.; and Plemmons, R. J.
2004. Text mining using non-negative matrix factorizations.
In SIAM International Conference on Data Mining, 452–
456.
Sahebi, S.; Lin, Y.-R.; and Brusilovsky, P. 2016. Tensor factorization for student modeling and performance prediction
in unstructured domain. In the 9th International Conference
on Educational Data Mining, 502–505.
Shahnaz, F.; Berry, M. W.; Pauca, V. P.; and Plemmons, R. J.
2006. Document clustering using nonnegative matrix factorization. Information Processing & Management 42(2):373–
386.
Wen, X.; Lin, Y.-R.; Liu, X.; Brusilovsky, P.; and
Barría Pineda, J. 2019. Iterative discriminant tensor factorization for behavior comparison in massive open online
courses. In The World Wide Web Conference, 2068–2079.
Whitehill, J.; Williams, J.; Lopez, G.; Coleman, C.; and Reich, J. 2015. Beyond prediction: First steps toward automatic
intervention in mooc student stopout. In the 8th International Conference on Educational Data Mining.
Xing, W.; Guo, R.; Petakovic, E.; and Goggins, S. 2015.
Participation-based student final performance prediction
model through interpretable genetic programming: Integrating learning analytics, educational data mining and theory.
Computers in Human Behavior 47:168–181.
Xu, W.; Liu, X.; and Gong, Y. 2003. Document clustering
based on non-negative matrix factorization. In the 26th annual international ACM SIGIR conference on Research and
development in information retrieval, 267–273. ACM.
Zhang, Z.; Qin, Z.; Li, P.; Yang, Q.; and Shao, J. 2018. Multi-view discriminative learning via joint non-negative matrix factorization. In International Conference on Database Systems for Advanced Applications, 542–557. Springer.