Article

Data Mining Using Association Rules for Intuitionistic Fuzzy Data

Frederick Petry 1,* and Ronald Yager 2
1 Cognitive Geospatial Systems, Naval Research Laboratory, Stennis Space Center, MS 39529, USA
2 Machine Intelligence Institute, Iona College, New Rochelle, NY 10804, USA
* Author to whom correspondence should be addressed.
Information 2023, 14(7), 372; https://doi.org/10.3390/info14070372
Submission received: 15 May 2023 / Revised: 26 June 2023 / Accepted: 27 June 2023 / Published: 29 June 2023

Abstract

This paper considers approaches to the computation of association rules for intuitionistic fuzzy data. Association rules can provide guidance for assessing the significant relationships that can be determined while analyzing data. The approach uses the cardinality of intuitionistic fuzzy sets, which provides a minimum and maximum range for the support and confidence metrics. A new notation is used to enable the representation of the fuzzy metrics. A running example of queries about the desirable features of vacation locations is used to illustrate the approach.

1. Introduction

Knowledge discovery and data mining involve a number of approaches from the areas of database processing, pattern recognition, and machine learning. These techniques are used to find data patterns or associations of value for decision-making. To deal with the uncertainty often found in real-world data, there has been significant work on data mining involving uncertain and fuzzy data [1,2,3,4]. The use of fuzzy sets is well established for managing real-world applications in which uncertainty is commonly involved. More specifically for our interest here in association rules, a number of approaches have been considered for fuzzy data [5,6,7]. While fuzzy sets provide some capabilities, a simple membership function has limitations [8]. This motivates our use of intuitionistic fuzzy sets, as they provide more flexibility for human interactions under uncertainty in association rule data mining. Some concepts are more easily approached by separately envisaging positive and negative instances. In preference modeling involving various interrelated factors, it can be very difficult to specify a single simple membership function [9]. For example, in formulating preferences for a family vacation location, there are usually multiple criteria, such as distance, costs, and location desirability, which influence the evaluation. It can then be very difficult to formulate a single degree of suitability, but by specifying membership and non-membership corresponding to preferences, the choices can be more easily modeled.
This paper considers the modifications required for association rule computations when intuitionistic fuzzy valued information is involved. The goal is to determine useful associations and patterns from large data sources. Association rules are used to develop valuable insights by determining significant correlations in the current environment of large data sets, such as those found in various databases or the cloud [10]. The extension using intuitionistic fuzzy sets is significant, as it captures both positive and negative evaluations; this can then provide contrasting association rules to better inform decision-making. Additionally, this provides a complementary capability to the approaches in which positive and negative association rules are generated [11,12].
There have been a large number of specific data mining and knowledge discovery algorithms that have been designed and implemented [13]. Patterns and relationships that can be discovered must be assessed based on interestingness measures in order to provide pruning of the combinatorial complexity of such relationships. Since this sort of information often integrates with decision-making systems, effective user interfaces must be developed with visualizations or other representations for presenting the knowledge discovered.
For this paper, association rule development is our main focus [14]. Associations correspond to correlations among data items, represented in the form of rules composed of attribute–value conditions that occur with some significant correlation in a set of data [15]. Association rules have the form A → C, with antecedent A and consequent C, where data entries satisfying the antecedent A commonly occur in conjunction with data values for C. The term "common occurrence" has a probabilistic implication; it is not a functional dependency as used in databases.
Finally, we provide an outline of the paper:
Section 2: Fuzzy set representations to be used are defined. A review of previous research on fuzzy data mining, especially research on association rule data mining using fuzzy sets, is presented.
Section 3: The metrics used in association data mining are defined. Specifics of basic association rule data mining are given and a running example illustrates this. A case analysis of the bounds of the basic measures, support and confidence, and the interestingness metrics, lift and conviction, are provided.
Section 4: The basics of the Apriori algorithm are illustrated using the running example.
Section 5: The extensions of association mining needed for intuitionistic fuzzy sets are described, in particular the required set cardinality used for association rule generation. The running example extended to fuzzy intuitionistic data is then used to illustrate the approach using the various metrics.

2. Background

In this section we provide an overview of the fuzzy set representations that are relevant to this paper. Then we discuss data mining in general and specifically fuzzy data mining. In particular, we review other research for fuzzy association rules.

2.1. Uncertainty Representations

In this section we briefly overview common uncertainty representations [16], including fuzzy sets and intuitionistic fuzzy sets for approaches to data mining.

2.1.1. Fuzzy Set Theory

Fuzzy set representations [17,18] provide the membership degrees of data values in a set, as opposed to crisp sets. For domain D, a fuzzy set, FS, is
FS(D) = {<ai, m(ai)> | 0 ≤ m(ai) ≤ 1}, ai ∈ D, i = 1…n
where ai is a data value and m(ai) is the membership of the data value.

2.1.2. Intuitionistic Fuzzy Sets

Intuitionistic fuzzy set theory extends ordinary fuzzy set theory by allowing both positive and negative memberships to be specified. Recall that an ordinary fuzzy set FS (D) = {<ai,m(ai)>} has only one membership value for a data element ai. An intuitionistic fuzzy set IFS(D), [19] allows both positive, mS(ai), and negative membership values, mS*(ai).
IFS(D) = {<ai, mS(ai), mS*(ai)> | ai ∈ D}, where mS(ai), mS*(ai) ∈ [0, 1].
Specifically, the sum of the membership, mS(ai), and the non-membership, mS*(ai), is not necessarily one; rather, 0 ≤ mS(ai) + mS*(ai) ≤ 1. Additionally, the hesitation hS(ai)
hS (ai) = 1 − (mS(ai) + m*S(ai))
is the degree of indeterminacy (hesitation).
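To make the representation concrete, the following is a minimal Python sketch (ours, not the paper's); the class and field names are illustrative assumptions.

```python
# Illustrative sketch (ours): an intuitionistic fuzzy element with its
# hesitation degree; the class and field names are assumptions.
from dataclasses import dataclass

@dataclass
class IFSElement:
    value: str
    m: float        # membership m_S(a_i)
    m_star: float   # non-membership m_S*(a_i)

    def __post_init__(self):
        # Intuitionistic constraint: 0 <= m_S(a_i) + m_S*(a_i) <= 1
        assert 0.0 <= self.m <= 1.0 and 0.0 <= self.m_star <= 1.0
        assert self.m + self.m_star <= 1.0

    @property
    def hesitation(self) -> float:
        # h_S(a_i) = 1 - (m_S(a_i) + m_S*(a_i))
        return 1.0 - (self.m + self.m_star)

e = IFSElement("R1", m=0.6, m_star=0.3)
print(e.hesitation)  # ≈ 0.1 (up to floating point)
```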

2.1.3. Interval-Valued Fuzzy Sets

Interval values are used in many areas to capture the imprecision and uncertainty of data. We first provide the formalisms for interval arithmetic [20,21] as needed for interval-valued fuzzy sets. We let D be the domain; an interval I(ai) for a data value ai ∈ D is represented by its lower bound, lb(ai), and its upper bound, ub(ai):
I(ai) = [lb(ai), ub(ai)] = {z ∈ D | lb(ai) ≤ z ≤ ub(ai)}
Now, in an interval-valued fuzzy set, IVF(D), the representation is based on upper, mu(ai), and lower, ml(ai), bounds on the fuzzy memberships:
IVF(D) = {<ai, I(ai)> | I(ai) = [ml(ai), mu(ai)]}
For an interval I(ai), the size or length of the interval, IW, is just the difference of the lower and upper bounds,
IW(I(ai)) = |ml(ai) − mu(ai)|.
IW is often used as a representation of the uncertainty of a data value ai in an IVF as an information measure [22].
We note that IVF and IFS sets are equivalent as generalizations of fuzzy sets [23]. In particular, we have
mS(ai) = ml(ai) and mS*(ai) = 1 − mu(ai)
So,
IVF(D) = {<ai, I(ai)> | I(ai) = [mS(ai), 1 − mS*(ai)]}
This can be used for the set cardinality of IVF(D), as related to the set cardinality of IFS(D) developed for the calculation of support and confidence of fuzzy association rules.
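As a small illustration of this equivalence, the following Python sketch (ours; the function names are assumptions) converts between the intuitionistic pair (m, m*) and the membership interval [ml, mu].

```python
# Illustrative sketch (ours) of the IFS/IVF equivalence: the pair (m, m*)
# corresponds to the membership interval [m_l, m_u] = [m, 1 - m*].
def ifs_to_interval(m: float, m_star: float) -> tuple:
    return (m, 1.0 - m_star)       # m_l = m, m_u = 1 - m*

def interval_to_ifs(m_l: float, m_u: float) -> tuple:
    return (m_l, 1.0 - m_u)        # m = m_l, m* = 1 - m_u

print(ifs_to_interval(0.6, 0.3))   # (0.6, 0.7)
print(interval_to_ifs(0.6, 0.7))   # ≈ (0.6, 0.3), up to floating point
```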

2.2. Data Mining Approaches

We are primarily concerned with some of the particular algorithms used for knowledge discovery, but we first overview the complete processes involved in data mining. The initial steps involve data preparation [24]. First is a data cleaning stage, which includes the resolution of missing data [25] or data errors. Additionally, the integration of data from multiple, possibly heterogeneous, sources is performed [26,27,28]. Next are the steps needed to prepare for actual data mining, which include the selection of the specific data relevant to the task and the transformation of this data into the format required by the data mining approach. These steps are often the same ones involved in developing data warehouses, which provide an organization and formatting of the data that facilitates data mining approaches.
Knowledge discovery or data mining algorithms can generally be classified into two categories: predictive and descriptive data mining. The descriptive category includes association rules, classification, and class characterization. Typically, data generalizations or characterizations are provided for class descriptions, such as data summarizations [29]. Additionally, data class comparisons allow discrimination of classes to be developed.
Lastly, a classification approach uses data with known class memberships and builds class models from the features extracted from the training data. Often, this is output as classification rules or a type of decision tree to support the predictive classification of new data. Another well-developed knowledge discovery technique is predictive analysis with clustering approaches, in which collections of similar data items are discovered by cluster analysis. A distance function is often developed by experts to provide an effective similarity metric. An appropriate clustering algorithm yields high intra-cluster similarity and low inter-cluster similarity. To determine potential values that may be missing, or the distribution of attribute values, various prediction techniques may be used, including machine learning, genetic algorithms, and regression and correlation analysis.

2.3. Fuzzy Data Mining

Many early efforts in knowledge discovery appear in pattern recognition, especially in the form of fuzzy clustering [30]. There has been significant work on data mining involving uncertain and fuzzy data using neural network and genetic algorithm approaches [2,3]. More specifically for our interest here in association rules, a number of approaches have been considered for fuzzy data [7,31,32].

2.3.1. Fuzzy Association Rules

In one approach, fuzzy association rule data mining was developed that can use both fuzzy transactional and relational data [5]. Relational data enable the extraction of multidimensional association rules, and transactional data make the discovery of patterns more reliable. In another approach, based on crisp sets of fuzzy transactions, a general model was developed to discover association rules [33]; the model could be specialized to specific patterns or application data. Additionally, association rules can be formed using generalized implicit knowledge based on quantitative transaction data [34]. This technique uses a taxonomy, and items in the rules can be based on any level of the taxonomy. Another paper [35] proposed an automated method for autonomous mining of fuzzy association rules: the authors first find fuzzy sets using an efficient clustering algorithm, then determine their membership functions, and use these to find interesting fuzzy association rules. Two papers have proposed using type-2 fuzzy sets. In one, the quantitative values in transactions are treated as type-2 values, which can be reduced to ordinary fuzzy data using split points, after which association data mining is carried out on this data [36]. The other [37] proposes a fuzzy frequent pattern-mining algorithm based on type-2 fuzzy set theory for data streams; the stream is partitioned with a sliding window, which is then used to extract fuzzy association rules.

2.3.2. Fuzzy Spatial Association Rules

Some interesting approaches have focused on spatial data. Spatial association rules can be implemented by permitting spatial application features to be represented by fuzzy spatial objects and topological relationships [6]; association rules are then extracted using application-related spatial objects of interest and the fuzzy spatial features. Another approach to extracting association rules on spatially related data includes uncertain geographic and geologic information [38]. This makes use of fuzzy set cardinality to compute the support and confidence metrics for rule evaluation.
None of these approaches have considered alternative uncertainty representations. In this paper we have extended the association rule approaches to consider the more flexible representation of uncertainty using intuitionistic fuzzy sets.

3. Association Rules

This section will describe the basis for the computation of association rules and provide examples demonstrating this. The extensions to the use of intuitionistic fuzzy sets will then be developed and the examples used to provide illustrations of the approach. A common example motivating association rules is the “market basket” of grocery items often purchased in the same transaction [14]. This sort of relationship can provide guidance on the marketing and placement of such items.

3.1. Association Rules Metrics

There are two main metrics, support and confidence, used for association rule data mining. Others, called interestingness metrics, are used to evaluate discovered rules.
Let E = {e1, …, en} be a set of data items of interest. We will later consider the possibility that some of these items involve uncertainty and are represented by intuitionistic fuzzy values. For the data in E we examine a set of results, R, arising from interactions such as transactions consisting of items related to a purchase order, or queries finding vacation locations for which desirable features appear. The interactions are specified by the resulting subsets of E, R = {R1, R2, …, Ri, …}, where Ri = {ej, …, ek} ⊆ E.
Specifically, we are concerned with the possible relationships among the data items that occur in R. Let Sj, Sk ⊆ E; then we are interested in rules of the form Fjk
Fjk: Sj ⇒ Sk
where Sj is the antecedent and Sk the consequent term of the rule. Such a rule means that items in Sj co-occur with items in Sk. We assess the value of such rules using two common metrics, support and confidence, which quantify the relationships. We first define two counts, Nsp and Nant, used in the computation of the support and confidence metrics.
Nsp = |{Ri ∈ R | Sj ∪ Sk ⊆ Ri}|
Nant = |{Ri ∈ R | Sj ⊆ Ri}|
So Nsp counts the result entries in which a rule of the form Fjk holds, and Nant counts the occurrences of the rule antecedent Sj.

3.1.1. Support Metric: Msp

The support of a rule is a measure of how frequently the antecedent, Sj, and consequent, Sk, of a rule appear in the same entry Ri of the result set R. Using the count Nsp, we have
Support: Msp = Nsp/|R|

3.1.2. Confidence Metric: Mcf

The confidence factor indicates how strongly a relationship is represented in the set of results R; that is, how often, when Sj ⊆ Ri, it is also the case that Sk ⊆ Ri. So we use the count Nant here.
Confidence: Mcf = Nsp/Nant
Whether Mcf is high enough will determine if the rule is of value for decision-making.
The support and confidence can be interpreted as probabilities where we have
Msp: Prob (Sj ∪ Sk); Mcf: Prob (Sk|Sj)
Support and confidence metrics are distinct: confidence measures a rule's strength, while support measures its statistical significance. In other words, we want a rule with support above a threshold, or else it is not representative enough of the data. Rules with support above a threshold are termed frequent.

3.2. Examples of Rule Support and Confidence

We can examine a simple example of query responses to illustrate the formation of association rules Fjk. Consider a website that can be queried to locate desirable eco-tourism vacation locations. Table 1 shows responses to queries on vacation preferences related to the various available activities that constitute the items of interest, such as camping and fishing:
E = {S1: camp; S2: fish; S3: hike; S4: raft; S5: ski}.
Analyzing these query results, we can obtain association rules and the corresponding support and confidence values. Table 2 shows an example of some representative potential rules relating activities to be evaluated. Note that, in general, all possible combinations of the items in E must be considered, which is then a potentially large space. We discuss the Apriori algorithm in the next section to help prune computation of these combinations.
Consider the first line in Table 2 for the rule F12: "camp → fish", where camp is the rule antecedent and fish the consequent. This rule shows that available locations with camping often occurred in conjunction with locations in which fishing was allowed. For this rule, the support is 57%, since camp and fish co-occur in four entries of Table 1 (R1, R2, R3, R7) out of the seven query responses. Furthermore, the confidence is 0.66, since the feature camp occurs in a total of six entries of Table 1. Additionally, the second rule, F21, has 80% confidence, since the feature S2 (fish) appears in five of the responses. Note that not all potential rules have support; for example, F54 has none, since the items ski and raft do not co-occur in any of the responses.
In considering the rules in Table 2, we can see a range of support values. Most often we are interested in rules with support above a minimum threshold Thsp. If the support were not high enough, the rule would not be significant enough to be considered. So if the threshold were Thsp = 40%, then the first three rules would be of interest. These three also have good confidence values and so are strong rules worth consideration. We also see that rules with high confidence are not necessarily of interest, as their support may be low (for example, F52 or F5{12}).
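The support and confidence computations above can be reproduced with a short Python sketch over the Table 1 responses; this is our illustrative code, not an implementation from the paper, and the helper names are assumptions.

```python
# Illustrative sketch (ours): crisp support and confidence over Table 1.
R = [
    {"camp", "fish", "ski"},   # R1
    {"camp", "fish", "raft"},  # R2
    {"camp", "fish", "raft"},  # R3
    {"camp", "hike"},          # R4
    {"camp", "hike", "raft"},  # R5
    {"fish", "hike", "ski"},   # R6
    {"camp", "fish", "hike"},  # R7
]

def support(ant: set, con: set) -> float:
    # M_sp = N_sp / |R|, with N_sp = |{R_i in R : S_j ∪ S_k ⊆ R_i}|
    return sum(1 for r in R if ant | con <= r) / len(R)

def confidence(ant: set, con: set) -> float:
    # M_cf = N_sp / N_ant, with N_ant = |{R_i in R : S_j ⊆ R_i}|
    n_sp = sum(1 for r in R if ant | con <= r)
    n_ant = sum(1 for r in R if ant <= r)
    return n_sp / n_ant

print(support({"camp"}, {"fish"}))     # 4/7 ≈ 0.57 (rule F12 in Table 2)
print(confidence({"camp"}, {"fish"}))  # 4/6 ≈ 0.66
```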

3.3. Interestingness Metrics

Support and confidence are the most common metrics for association rules; however we will examine others that can provide alternative analyses for rules. The commonly used lift measure relates the rule confidence and the expected confidence [39]. Strong rules with high support and confidence can be uninteresting depending on how much the rule antecedent and consequent are related. Lift is used to determine the statistical relationship between the antecedent and consequent, which relates to the interestingness or usefulness of the rule. Another metric is conviction, which relies on the expected support that the antecedent of the rule appears without the consequent. For these metrics, we obtain evaluations as to the prediction significance of the rule.
Although interestingness metrics are useful in evaluating rules after they are determined, the rules themselves depend on the thresholds chosen for confidence and support. As with decisions on fuzzy set memberships, these choices can be somewhat arbitrary and user-dependent, making consistency of the rule discovery process a problem. One interesting approach addressing this [40] can be applied if rule antecedents follow a determined distribution; the minimum confidence and support thresholds can then be set using a one-sided statistical confidence interval with hypothesis testing.

3.3.1. Lift Metric

First we consider lift given as:
Lift = confidence/expected confidence
We must first specify the support of the consequent of Fjk by defining
Ncon = |{Ri ∈ R | Sk ⊆ Ri}|;
and so the support of the consequent is
Consequent Support: Mcon = Ncon/|R|
Now the expected confidence for a rule is the product of the rule support and consequent support divided by the rule support:
(Msp ∗ Mcon)/Msp = Mcon.
Then finally we obtain:
Lift = Mcf/Mcon = (Nsp/Nant) ∗ (|R|/Ncon)
For the rule Sj ⇒ Sk, lift is interpreted as the correlation of Sj and Sk.
(a) Lift > 1: positive correlation.
(b) Lift < 1: negative correlation.
(c) Lift = 1: Sj and Sk are statistically independent.
So lift determines how much more frequently Sj and Sk appear concurrently than would be expected if they were independent statistically.
In our example rules in Table 3, four rules have a lift less than 1. For example, in rules F12 and F21 there is not a significant correlation between the items S2: fish and S1: camp, as camp occurs without fish twice and fish occurs without camp once. Now consider rules F25 and F52, for which the lift is 1.4. There is a positive correlation, as there is a strong prediction of S2: fish when S5: ski occurs, since in the only two occurrences of ski, fish also appears in the response.

3.3.2. Conviction Metric

Next we examine the conviction metric
Conviction = (1 − Mcon)/(1 − Mcf)
Conviction is sensitive to the direction of rules, Sj ⇒ Sk versus Sk ⇒ Sj. So, we see rule 1 (F12) and rule 2 (F21) differ in their direction, and conviction for rule 1 is 1.26 but for rule 2 it is 2.15. This can be attributed to the lower confidence (0.66) of rule 1, and so rule 2 has greater interest by this measure, leading to the stronger conviction. For logical implications where confidence is 1, the value is unbounded (+∞), as is true for rule 5 (F52). That is, S5 (ski) uniquely appears with S2 (fish) and not vice-versa.
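Both interestingness metrics can be computed from the same counts; the following Python sketch (ours, with illustrative helper names) evaluates lift and conviction on the Table 1 data, reproducing the lift values of Table 3 and the unbounded conviction of rule F52.

```python
# Illustrative sketch (ours): lift and conviction over the Table 1 data.
R = [{"camp", "fish", "ski"}, {"camp", "fish", "raft"}, {"camp", "fish", "raft"},
     {"camp", "hike"}, {"camp", "hike", "raft"}, {"fish", "hike", "ski"},
     {"camp", "fish", "hike"}]

def counts(ant: set, con: set):
    n_sp = sum(1 for r in R if ant | con <= r)   # rule co-occurrences
    n_ant = sum(1 for r in R if ant <= r)        # antecedent occurrences
    n_con = sum(1 for r in R if con <= r)        # consequent occurrences
    return n_sp, n_ant, n_con

def lift(ant: set, con: set) -> float:
    # Lift = M_cf / M_con = (N_sp / N_ant) * (|R| / N_con)
    n_sp, n_ant, n_con = counts(ant, con)
    return (n_sp / n_ant) * (len(R) / n_con)

def conviction(ant: set, con: set) -> float:
    # Conviction = (1 - M_con) / (1 - M_cf); unbounded when M_cf = 1
    n_sp, n_ant, n_con = counts(ant, con)
    m_cf, m_con = n_sp / n_ant, n_con / len(R)
    return float("inf") if m_cf == 1 else (1 - m_con) / (1 - m_cf)

print(lift({"camp"}, {"fish"}))       # ≈ 0.93 (F12, Table 3)
print(lift({"ski"}, {"fish"}))        # 1.4 (F52, Table 3)
print(conviction({"ski"}, {"fish"}))  # inf: F52 has confidence 1
```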

3.4. Case Analysis of Metrics

We can examine the range of values of the metrics discussed by an analysis of the possible extreme values. Let |R| = Z and consider a supported rule of interest Fjk. Then for Fjk, {Sj ∪ Sk} must appear at least once but can, in the extreme, however unlikely, appear as many as Z times. The range of counts for such rules is then
1 ≤ Nsp ≤ Z

3.4.1. Support and Confidence Analysis

So, the range for support, Msp = Nsp/Z, is
1/Z ≤ Msp ≤ Z/Z = 1
Since, for each occurrence of a rule Fjk, the antecedent Sj must appear, its range is similarly
1 ≤ Nant ≤ Z
Recall that confidence is Mcf = Nsp/Nant. We analyze the possible cases in Table 4.
Hence, the range of Mcf is:
1/Z ≤ Mcf ≤ 1/1 = 1

3.4.2. Lift Analysis

Now, we can perform a similar extreme case analysis for lift. Consider the relationship of Nant and Ncon. For Nsp = 1, the antecedent, Sj, and consequent, Sk, must both appear at least once, and independently can each appear up to the extreme of Z times. For Nsp = Z, they must both appear Z times. The analysis of the cases for the range of lift is in Table 5.
From Table 5, the range for lift is:
1/Z ≤ Lift ≤ Z

3.4.3. Conviction Analysis

Finally, we examine conviction
Conviction = (1 − Mcon)/(1 − Mcf)
As for the antecedent, 1 ≤ Ncon ≤ Z. Then, since Mcon = Ncon/Z
1/Z ≤ Mcon ≤ 1
Since 1/Z ≤ Mcf ≤ 1 and 1/Z ≤ Mcon ≤ 1, the ranges for both the numerator and the denominator are [1 − 1, 1 − 1/Z] = [0, (Z − 1)/Z]. However, Mcon and Mcf cannot both be 1, so a 0/0 case does not arise. Then, for conviction, the range is:
0 ≤ Conviction ≤ +∞

4. Apriori Procedure

As discussed, a major issue for finding association rules is the combinatorial complexity of computing all potential appropriate combinations of data items. This issue has been addressed by developing algorithms using the Apriori property [41]. Examples of the approach are described in this section.
An overview of the overall approach for discovering strong association rules can be formulated in three stages:
A. Computing the frequent item sets: This is performed using the support metric for evaluation and utilizing the Apriori property to simplify the search.
B. Determining strong association rules: From the frequent item sets of the first stage, the confidence metric is used to determine strong rules.
C. Evaluating the effectiveness of the resulting strong rules: Interestingness metrics, such as lift and conviction, are used to select the most useful strong rules.
We will illustrate some possible frequent item sets using an example from the recreation locations application. A major issue is that there is a combinatorially large number of possible frequent item sets, and the Apriori algorithm is commonly used to reduce this computational complexity. It makes use of prior knowledge of the support frequency to reduce the generation of frequent sets. The Apriori property states that all subsets of a frequent set are frequent and, conversely, that if a set is not frequent, none of its supersets are frequent. In effect, the Apriori property treats the support measure as anti-monotonic.

Apriori Example of Frequent Set Generation

Finding frequent item sets is an iterative process that determines the next candidate frequent set from the previous one. The search generates the candidate (k + 1)-level item sets from a join of the k-level item sets. The Apriori property is then used to prune the search: since all supersets of a non-frequent item set are themselves non-frequent, they can be eliminated from the generated set. The process terminates when no new frequent sets are generated.
We illustrate this with data similar to the previous discussion, with the same item set,
E = {S1: camp; S2: fish; S3: hike; S4: raft; S5: ski},
but with data suited to illustrating the approach for computing frequent sets (Table 6).
To start the algorithm, we determine the support for each of the five features (Table 7), as their support values determine the frequent feature sets.
For simplicity, we will use the first letter of the features for the rest of this example.
The iterative process starts using Table 7 as the candidate set C1 and generates the next possible set C2 by a join, (C1 ⊕ C1), of the items for the next set of candidates:
{CF, CH, CS, CR, FH, FS, FR, HS, HR, SR}
Then we apply the Apriori condition using Table 7, where a set is pruned if any of its subsets is not above the support count threshold. For example, the set CH has C and H as subsets, both with support 0.66 above the threshold, so CH is not pruned. However, with a support threshold Thsp of 0.3, any set with S as a subset will be pruned.
Accordingly, the sets with S as a subset, {CS, FS, HS, SR}, are pruned, since S is not frequent, and we obtain the next candidate set, C2 (Table 8).
From this candidate set we generate the next stage by joining C2 and C2 giving:
{CFH, CFR, CHR, FHR}
Now, these must be checked to see if all of their subsets are frequent. For example, the subsets of {CFH} are {CF, CH, FH}, all of which are frequent. For {FHR}, the subsets are {FH, FR, HR}; HR is not frequent, so FHR is pruned. Similarly, CHR is pruned. So C3 = {CFH, CFR}, and the join of these is {CFHR}. Since it has FHR as a non-frequent subset, we go no further, and the final result is C3. A sketch of this process appears below.
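The join-and-prune iteration just described can be sketched compactly in Python over the Table 6 data. This is our illustration, not the paper's implementation; note that after candidate generation it also applies the support threshold, under which the candidate CFH (support 2/9) would not be frequent while CFR (support 3/9) is.

```python
# Illustrative sketch (ours): Apriori frequent set generation on Table 6.
from itertools import combinations

R = [set("CFR"), set("FS"), set("FH"), set("CH"), set("CFRS"),
     set("FH"), set("CH"), set("CFHR"), set("CFH")]
THRESHOLD = 0.3  # support threshold Th_sp from the text

def support(itemset) -> float:
    return sum(1 for r in R if itemset <= r) / len(R)

# Level 1: frequent single items; S (ski, support 2/9) is pruned.
frequent = {frozenset({x}) for x in "CFHRS"
            if support(frozenset({x})) >= THRESHOLD}
k = 1
while frequent:
    print(sorted("".join(sorted(s)) for s in frequent))
    # Join step: combine level-k frequent sets into (k+1)-item candidates.
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k + 1}
    # Apriori prune: drop candidates having a non-frequent k-subset,
    # then keep only candidates meeting the support threshold.
    frequent = {c for c in candidates
                if all(frozenset(s) in frequent for s in combinations(c, k))
                and support(c) >= THRESHOLD}
    k += 1
# Prints ['C', 'F', 'H', 'R'], then ['CF', 'CH', 'CR', 'FH', 'FR'],
# then ['CFR'].
```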

5. Uncertainty Querying

In this section extensions to the use of intuitionistic fuzzy sets for association rules are developed based on extensions of the metrics. The examples used to provide illustrations of the approach are based on the running example of the querying of vacation locations.
We first examine more closely our example of querying of vacation locations described in Table 1. As well as the specific features for each site, there are usually criteria such as distance, costs, and location desirability influencing the user evaluation. With these subjective criteria specified by linguistic values such as lower cost, nearness, or pleasantness of location, we would then obtain a degree of the suitability of each location retrieved on this basis. We can use the term “desirable” for the combination of the factors and capture this using a fuzzy set representation for membership in the set of desirable locations. Basic fuzzy sets were used in previous research [38], and here we want to extend this to more flexible representations, including intuitionistic-based fuzzy memberships.

5.1. Fuzzy Intuitionistic Measures for Support and Confidence

Next we must consider how to deal with the uncertainty using intuitionistic fuzzy set representations of the subjective results of an operation such as a query. For a query response, Ri, we consider an intuitionistic fuzzy membership in R for the query:
{<Ri = {ej, …, ek}, mR(Ri), mR*(Ri)>}
To determine frequent feature sets, it is necessary to calculate the number of responses Ri that support the rule Fjk. For crisp operations, the support count Nsp is simply the size or cardinality of the corresponding set of responses. However, since we have a fuzzy membership, we must adapt the support count. For the case of a query with a simple fuzzy membership, this was modeled using the set cardinality for fuzzy sets [42], the sigma count, which is simply the sum of the membership values:
Card(FS(D)) = Σi=1…n m(ai), ai ∈ D

5.1.1. Cardinality of Intuitionistic Fuzzy Sets

The cardinality of intuitionistic fuzzy sets (IFS) is needed in the procedure for determining association rules [43,44]. The development of the cardinality of an IFS follows a geometrical framework [45]. Specifically, two bounds are defined, the least cardinality:
Min Card(IFS(D)) = Σi=1…n mS(ai)
and the maximum cardinality:
Max Card(IFS(D)) = Σi=1…n (mS(ai) + hS(ai)) = Σi=1…n (mS(ai) + 1 − (mS(ai) + mS*(ai))) = Σi=1…n (1 − mS*(ai))
Then, the overall cardinality is the interval of the least and maximum values:
Card(IFS(D)) = [Min Card (IFS(D)), Max Card (IFS(D))]
An interval may not be as convenient to use in calculations, so the average possibility cardinality AVCard can also be used:
AVCard(IFS(D)) = ½ Σi=1…n (mS(ai) + 1 − mS*(ai)) = ½ Σi=1…n (mS(ai) + mS(ai) + hS(ai)) = Σi=1…n (mS(ai) + hS(ai)/2)
It is noted that this is the midpoint of the interval of the Min–Max cardinalities for the IFS.
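As a quick check of these definitions, the following Python sketch (ours) computes the Min, Max, and average cardinalities for the (m, m*) memberships that appear later in Table 9; the resulting 4.9 and 5.4 reappear in Section 5.2 as Min|R| and Max|R|.

```python
# Illustrative sketch (ours): Min, Max, and average IFS cardinalities for
# the (m, m*) memberships of Table 9.
pairs = [(0.6, 0.3), (0.5, 0.3), (0.8, 0.2), (0.6, 0.4),
         (0.9, 0.1), (0.8, 0.1), (0.7, 0.2)]  # (m, m*) for R1..R7

min_card = sum(m for m, _ in pairs)         # Min Card = Σ m_S(a_i)
max_card = sum(1 - ms for _, ms in pairs)   # Max Card = Σ (1 - m_S*(a_i))
avg_card = (min_card + max_card) / 2        # AVCard: midpoint of the interval

print(min_card, max_card, avg_card)  # ≈ 4.9, 5.4, 5.15
```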

5.1.2. Intuitionistic Metrics

Now we can develop the support and confidence measures for fuzzy intuitionistic memberships using these cardinality approaches. First, adapting the counts FNsp, FNant, and FNcon for a rule Fjk, we must define an index set for each count:
I(sp) = {i|Sj ∪ Sk ⊆ Ri}; I(ant) = {i|Sj ⊆ Ri}; I(con) = {i|Sk ⊆ Ri};
Since the cardinality we use here is an interval, there are two measures for the Min and Max cardinality, respectively, in the counts:
FNsp = |{<Ri, mR(Ri), mR*(Ri)> | Sj ∪ Sk ⊆ Ri}|
MinFNsp = Min|{<Ri, mR(Ri), mR*(Ri)>}| = Σi∈I(sp) mR(Ri)
MaxFNsp = Max|{<Ri, mR(Ri), mR*(Ri)>}| = Σi∈I(sp) (mR(Ri) + hR(Ri)) = Σi∈I(sp) (1 − mR*(Ri))
Similarly, for antecedent and consequent counts:
Min FNant = Σi∈I(ant) mR(Ri); Max FNant = Σi∈I(ant) (1 − mR*(Ri))
Min FNcon = Σi∈I(con) mR(Ri); Max FNcon = Σi∈I(con) (1 − mR*(Ri))
Now the measures become two values for support and confidence each. First for fuzzy supports:
MinFMsp = MinFNsp/|R|
MaxFMsp = MaxFNsp/|R|
Following a similar approach for the antecedent counts, we have a min and a max count. Then, to determine fuzzy confidence, we use:
MinFMcf = MinFNsp/MinFNant
MaxFMcf = MaxFNsp/MaxFNant
However, to proceed, we must consider that there are two possible interpretations of |R|: the issue is whether or not the size, or cardinality, of R should be based on the membership values. So we will use two interpretations of |R|:
1. Min|R| = Σi=1…n mR(Ri)    2. Max|R| = Σi=1…n (1 − mR*(Ri))
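Putting the pieces together, the following Python sketch (ours; the helper names are assumptions) computes the intuitionistic support and confidence bounds for rule F12 over the Table 9 responses, matching the first row of Table 10 up to rounding.

```python
# Illustrative sketch (ours): intuitionistic support and confidence bounds
# for rule F12 (camp → fish) over the Table 9 responses.
R = [
    ({"camp", "fish", "ski"},  0.6, 0.3),  # R1
    ({"camp", "fish", "raft"}, 0.5, 0.3),  # R2
    ({"camp", "fish", "raft"}, 0.8, 0.2),  # R3
    ({"camp", "hike"},         0.6, 0.4),  # R4
    ({"camp", "hike", "raft"}, 0.9, 0.1),  # R5
    ({"fish", "hike", "ski"},  0.8, 0.1),  # R6
    ({"camp", "fish", "hike"}, 0.7, 0.2),  # R7
]

def fuzzy_counts(itemset: set):
    # Min/Max fuzzy counts over {R_i : itemset ⊆ R_i}
    lo = sum(m for items, m, ms in R if itemset <= items)
    hi = sum(1 - ms for items, m, ms in R if itemset <= items)
    return lo, hi

min_R = sum(m for _, m, _ in R)        # Min |R| = 4.9
max_R = sum(1 - ms for _, _, ms in R)  # Max |R| = 5.4

ant, con = {"camp"}, {"fish"}
min_sp, max_sp = fuzzy_counts(ant | con)  # MinFN_sp = 2.6, MaxFN_sp = 3.0
min_ant, max_ant = fuzzy_counts(ant)      # MinFN_ant = 4.1, MaxFN_ant = 4.5

print(min_sp / min_R, max_sp / max_R)      # ≈ 0.53, 0.56 (cf. Table 10, row 1)
print(min_sp / min_ant, max_sp / max_ant)  # ≈ 0.63, 0.67 (MinFM_cf, MaxFM_cf)
```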

5.2. Fuzzy Query Example

We first consider the query results of Table 1 extended with example intuitionistic fuzzy memberships, shown in Table 9. Then, using the data in Table 9, we evaluate support and confidence for the same example rules as in Table 2 using the extended measures we have developed.

5.3. Discussion of Results

It is clear that the most consistent use of |R| is to pair the min value of |R| with the min support and the max value with the max support. These results for Min/Min and Max/Max produce values bracketing the support values for the crisp cases in Table 2, and averaging them produces values about 1% different from the crisp support values. Additionally, the confidence values are compatible with the previous values. This shows that we can use intuitionistic fuzzy sets to provide flexibility in capturing uncertainty when analyzing data and still obtain useful results (Table 10).

5.3.1. Effect of Negative Memberships

In our example of intuitionistic data in Table 9, we chose to focus only on stronger positive memberships to allow a consistent analysis of rules. Now we examine the effect if a few data values have higher negative memberships. Specifically, we will use a membership of <0.3, 0.6> for R6 and then for R1 to compare the effects on rule support. For the case of R6, we consider rules F25 and F52, as there is only one other entry, R1, involved. We expect less significant changes for rules in which more data entries are involved, where the effect of a single change is diluted.
For rules F25 and F52, the change affects |R| and FNsp:
Min|R| = 4.4; Max |R| = 4.9; Min FNsp = 0.9; Max FNsp = 1.1
So,
Min FMsp = 0.9/4.4 = 0.205; Max FMsp = 1.1/4.9 = 0.224
Then the support for both rules has decreased, by roughly 28% for MinFMsp and 25% for MaxFMsp.
Next, we replace the membership of R1 with <0.3, 0.6> and compare the effect on F12. Here,
Min|R| = 4.6; Max |R| = 5.1; Min FNsp = 2.3; Max FNsp = 2.7
Then,
Min FMsp = 2.3/4.6 = 0.5; Max FMsp = 2.7/5.1 = 0.529
As expected, the changes in the support are now much smaller, roughly 9.5% for both supports.

5.3.2. Lift Metric

Now, we examine the lift metric with the data of Table 9. Again, the results are consistent with the lift results of Table 3, as the same three rules, F25, F52, and F{12}4, have lift > 1 (Table 11).
Finally, to be complete, we examine the same change in negative membership for R6 and R1 above. For F25 and F52, we see:
Lift Min: 1.52 (a 6% increase); Lift Max: 1.44 (a 3.5% increase).
For F12, the lift value changes are marginal:
Lift Min: 0.90 (a 1% decrease); Lift Max: 0.91 (a 1% decrease).
These results are still consistent with previous values, since F25 and F52 have lift >1, and F12 has lift <1. The increases for F25 and F52 reflect the change in the co-occurrence of the antecedents and consequents due to the increased negative membership as shown in Section 5.3.1 above. Additionally, for F12 the change is negligible, again consistent with the smaller changes in its support for similar reasons.

6. Conclusions

We have shown that it is quite feasible to discover association rules appropriate for applications in which the data involves intuitionistic fuzzy set descriptions.
The extensions required were applications of the set cardinality for intuitionistic fuzzy sets for determining the counts used in the metrics. This produced consistent results for the support and confidence of rules. We also examined the effects of data with higher negative memberships in some cases. The results implied that the approaches were still valid, depending on the percentage of data with such memberships. Additionally, to provide insight into the metrics used in the approach, the ranges of support and confidence as well as lift and conviction were determined.
Since interval-valued fuzzy sets have similar formulations, the use of such uncertainty representations can be considered next for association rules. Entropy measures [22] can be used to evaluate the IFS and IVFS in the association rule computations.

Author Contributions

Conceptualization: F.P. and R.Y.; Methodology: F.P. and R.Y.; Writing: F.P. and R.Y. All authors have read and agreed to the published version of the manuscript.

Funding

Fred Petry would like to acknowledge the Naval Research Laboratory’s Base Program for sponsoring this research.

Data Availability Statement

No new data were created for this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Aggarwal, C.; Li, Y.; Wang, J.; Wang, J. Frequent pattern mining with uncertain data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 29–38.
2. Hirota, K.; Pedrycz, W. Fuzzy computing for data mining. Proc. IEEE 1999, 87, 1575–1599.
3. Mirzakhanov, V. Value of fuzzy logic for data mining and machine learning: A case study. Expert Syst. Appl. 2020, 162, 113781.
4. Petry, F.; Yager, R. Intuitionistic and interval-valued fuzzy set representations for data mining. Algorithms 2022, 15, 249.
5. Au, W.; Chan, K. Mining fuzzy association rules in a bank-account database. IEEE Trans. Fuzzy Syst. 2003, 11, 238–248.
6. da Silva, H.; Felix, T.; de Venâncio, P.; Carniel, A. Discovery of spatial association rules from fuzzy spatial data. In Proceedings of Conceptual Modeling: 41st International Conference, ER 2022, Hyderabad, India, 17–20 October 2022; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2022; Volume 13607.
7. Lin, C.; Li, T.; Fournier-Viger, P.; Hong, T. A fast algorithm for mining fuzzy frequent itemsets. J. Intell. Fuzzy Syst. 2015, 29, 2373–2379.
8. Szmidt, E.; Kacprzyk, J. Medical diagnostic reasoning using a similarity measure for intuitionistic fuzzy sets. In Proceedings of the Eighth International Conference on IFSs, Varna, Bulgaria, 20–21 June 2004; pp. 61–69.
9. Dubois, D.; Gottwald, S.; Hajek, P.; Kacprzyk, J.; Prade, H. Terminological difficulties in fuzzy set theory—The case of intuitionistic fuzzy sets. Fuzzy Sets Syst. 2005, 156, 485–492.
10. Solanki, S.; Patel, J. A survey on association rule mining. In Proceedings of the 2015 Fifth International Conference on Advanced Computing & Communication Technologies, Haryana, India, 21–22 February 2015; pp. 212–216.
11. Antonie, M.; Zaïane, O. Mining positive and negative association rules: An approach for confined rules. In Proceedings of Knowledge Discovery in Databases: PKDD 2004, 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, 20–24 September 2004; pp. 27–38.
12. Dong, X.; Hao, F.; Zhao, L.; Zu, T. An efficient method for pruning redundant negative and positive association rules. Neurocomputing 2020, 393, 245–258.
13. Han, J.; Pei, J.; Tong, H. Data Mining: Concepts and Techniques, 4th ed.; Morgan Kaufmann: San Francisco, CA, USA, 2023.
14. Ceglar, A.; Roddick, J. Association mining. ACM Comput. Surv. 2006, 38, 5:1–5:42.
15. Hipp, J.; Güntzer, U.; Nakhaeizadeh, G. Algorithms for association rule mining—A general survey and comparison. SIGKDD Explor. 2000, 2, 58–64.
16. Kruse, R.; Mostaghim, S.; Borgelt, C.; Braune, C.; Steinbrecher, M. Computational Intelligence: A Methodological Introduction, 3rd ed.; Springer Nature: Cham, Switzerland, 2022.
17. Klir, G.; St. Clair, U.; Yuan, B. Fuzzy Set Theory: Foundations and Applications; Prentice Hall: Hoboken, NJ, USA, 1997.
18. Zadeh, L. Fuzzy sets. Inf. Control 1965, 8, 338–353.
19. Atanassov, K. Intuitionistic fuzzy sets. Fuzzy Sets Syst. 1986, 20, 87–96.
20. Deschrijver, G. Arithmetic operators in interval-valued fuzzy set theory. Inf. Sci. 2007, 177, 2906–2924.
21. Moore, R.; Kearfott, B.; Cloud, M. Introduction to Interval Analysis; SIAM: Philadelphia, PA, USA, 2009.
22. Burillo, P.; Bustince, H. Entropy on intuitionistic fuzzy sets and on interval-valued fuzzy sets. Fuzzy Sets Syst. 1996, 78, 305–316.
23. Atanassov, K.; Gargov, G. Interval valued intuitionistic fuzzy sets. Fuzzy Sets Syst. 1989, 31, 343–349.
24. Rajagopalan, B.; Isken, M. Exploiting data preparation to enhance mining and knowledge discovery. IEEE Trans. Syst. Man Cybern. 2001, 31, 460–467.
25. Islam, M.; Anderson, D.; Petry, F.; Smith, D.; Elmore, P. The fuzzy integral for missing data. In Proceedings of the 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Naples, Italy, 9–12 July 2017.
26. Elmore, P.; Anderson, D.; Petry, F. Evaluation of heterogeneous uncertain information fusion. J. Ambient Intell. Humaniz. Comput. 2020, 11, 799–811.
27. Petry, F.; Elmore, P.; Yager, R. Combining uncertain information of differing modalities. Inf. Sci. 2015, 322, 237–256.
28. Yager, R.; Petry, F. An intelligent quality based approach to fusing multi-source probabilistic information. Inf. Fusion 2016, 31, 127–136.
29. Ahmed, M. Data summarization: A survey. Knowl. Inf. Syst. 2019, 58, 249–273.
30. Bezdek, J. Pattern Recognition with Fuzzy Objective Function Algorithms; Plenum Press: New York, NY, USA, 1981.
31. Alam, M.; Ahmed, C.; Samiullah, M.; Leung, C. Mining frequent patterns from hypergraph databases. In Proceedings of Advances in Knowledge Discovery and Data Mining: 25th Pacific-Asia Conference, PAKDD 2021, Virtual Event, 11–14 May 2021; Part II, pp. 3–15.
32. de Graaf, J.; Kosters, W.; Witteman, J. Interesting fuzzy association rules in quantitative databases. In Principles of Data Mining and Knowledge Discovery; LNAI 2168; Springer: Berlin, Germany, 2001; pp. 140–151.
33. Delgado, M.; Marin, N.; Sanchez, D.; Vila, M. Fuzzy association rules: General model and applications. IEEE Trans. Fuzzy Syst. 2003, 11, 214–225.
34. Hong, T.; Lin, K.; Wang, S. Fuzzy data mining for interesting generalized association rules. Fuzzy Sets Syst. 2003, 138, 255–269.
35. Kaya, M.; Alhajj, R.; Polat, F.; Arslan, A. Efficient automated mining of fuzzy association rules. In Proceedings of Database and Expert Systems Applications: 13th International Conference, DEXA 2002, Aix-en-Provence, France, 2–6 September 2002; pp. 133–142.
36. Chen, C.; Hong, T.; Li, Y. Fuzzy association rule mining with type-2 membership functions. In Proceedings of the Asian Conference on Intelligent Information and Database Systems, Bali, Indonesia, 23–25 March 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 128–134.
37. Chen, J.; Li, P.; Fang, W.; Zhou, N.; Yin, Y.; Xu, H.; Zheng, H. Fuzzy association rules mining based on type-2 fuzzy sets over data stream. Procedia Comput. Sci. 2022, 199, 456–462.
38. Ladner, R.; Petry, F.; Cobb, M. Fuzzy set approaches to spatial data mining of association rules. Trans. GIS 2003, 7, 123–138.
39. Sael, N.; Alashqur, A.; Sowan, B. Using the interestingness measure lift to generate association rules. J. Adv. Comput. Sci. Technol. 2015, 4, 156–162.
40. Chen, S.; Tsai, T.; Chung, C.; Li, W. Dynamic association rules for gene expression data analysis. BMC Genom. 2015, 16, 786.
41. Agrawal, R.; Imielinski, T.; Swami, A. Mining association rules between sets of items in large databases. In Proceedings of the ACM-SIGMOD International Conference on Management of Data, Washington, DC, USA, 25–28 May 1993; ACM Press: New York, NY, USA, 1993; pp. 207–216.
42. Yen, J.; Langari, R. Fuzzy Logic: Intelligence, Control and Information; Prentice Hall: Upper Saddle River, NJ, USA, 1999.
43. Vlachos, I.K.; Sergiadis, G.D. Subsethood, entropy, and cardinality for interval-valued fuzzy sets—An algebraic derivation. Fuzzy Sets Syst. 2007, 158, 1384–1396.
44. Tripathy, B.; Jena, S.; Ghosh, S. An intuitionistic fuzzy count and cardinality of intuitionistic fuzzy sets. Malaya J. Mat. 2013, 4, 123–133.
45. Szmidt, E.; Kacprzyk, J. Entropy for intuitionistic fuzzy sets. Fuzzy Sets Syst. 2001, 118, 467–477.
Table 1. Vacation Features Items.
Query Responses | Location Features
R1 | {S1, S2, S5}: camp, fish, ski
R2 | {S1, S2, S4}: camp, fish, raft
R3 | {S1, S2, S4}: camp, fish, raft
R4 | {S1, S3}: camp, hike
R5 | {S1, S3, S4}: camp, hike, raft
R6 | {S2, S3, S5}: fish, hike, ski
R7 | {S1, S2, S3}: camp, fish, hike
Table 2. Rules for Results of Table 1.
Rules: Fjk | Msp (Support) | Mcf (Confidence)
1. F12: camp → fish | 4/7 = 0.57 | 4/6 = 0.66
2. F21: fish → camp | 4/7 = 0.57 | 4/5 = 0.8
3. F31: hike → camp | 3/7 = 0.43 | 3/4 = 0.75
4. F25: fish → ski | 2/7 = 0.28 | 2/5 = 0.4
5. F52: ski → fish | 2/7 = 0.28 | 2/2 = 1
6. F54: ski → raft | 0/7 = 0 | 0/2 = 0
7. F{12}4: camp, fish → raft | 2/7 = 0.28 | 2/4 = 0.5
8. F5{12}: ski → camp, fish | 1/7 = 0.14 | 1/2 = 0.5
Table 3. Lift Values.
Lift < 1 | Lift > 1
F12: 0.93 | F25: 1.4
F21: 0.93 | F52: 1.4
F31: 0.87 | F{12}4: 1.12
F5{12}: 0.88 |
Table 4. Confidence Ranges.
 | Nsp = 1 | Nsp = Z
Nant = 1 | Mcf = 1 | ⌀ (not possible)
Nant = Z | Mcf = 1/Z | Mcf = Z/Z = 1
Table 5. Range of values for Lift: (Nsp/Nant) ∗ (|R|/Ncon).
Nant | Ncon | Lift, Nsp = 1 | Lift, Nsp = Z
1 | 1 | 1·Z/(1·1) = Z | ⌀ (not possible)
1 | Z | 1·Z/(1·Z) = 1 | ⌀ (not possible)
Z | 1 | 1·Z/(Z·1) = 1 | ⌀ (not possible)
Z | Z | 1·Z/(Z·Z) = 1/Z | Z·Z/(Z·Z) = 1
Table 6. Example results for Apriori process.
Query Responses | Location Features
R1 | {S1, S2, S4}: camp, fish, raft
R2 | {S2, S5}: fish, ski
R3 | {S2, S3}: fish, hike
R4 | {S1, S3}: camp, hike
R5 | {S1, S2, S4, S5}: camp, fish, raft, ski
R6 | {S2, S3}: fish, hike
R7 | {S1, S3}: camp, hike
R8 | {S1, S2, S3, S4}: camp, fish, hike, raft
R9 | {S1, S2, S3}: camp, fish, hike
Table 7. Example features.
Feature | Support
S1: Camp | 6/9 = 0.66
S2: Fish | 7/9 = 0.77
S3: Hike | 6/9 = 0.66
S4: Raft | 3/9 = 0.33
S5: Ski | 2/9 = 0.22
Table 8. Candidate set C2.
Item Set | Support
S1 ⊕ S2: CF | 4/9 = 0.44
S1 ⊕ S3: CH | 4/9 = 0.44
S1 ⊕ S4: CR | 3/9 = 0.33
S2 ⊕ S3: FH | 4/9 = 0.44
S2 ⊕ S4: FR | 3/9 = 0.33
S3 ⊕ S4: HR | 1/9 = 0.11
Table 9. Query responses with intuitionistic uncertainty.
Query Responses | Location Features | Intuitionistic Membership (m, m*)
R1 | {S1, S2, S5}: camp, fish, ski | <0.6, 0.3>
R2 | {S1, S2, S4}: camp, fish, raft | <0.5, 0.3>
R3 | {S1, S2, S4}: camp, fish, raft | <0.8, 0.2>
R4 | {S1, S3}: camp, hike | <0.6, 0.4>
R5 | {S1, S3, S4}: camp, hike, raft | <0.9, 0.1>
R6 | {S2, S3, S5}: fish, hike, ski | <0.8, 0.1>
R7 | {S1, S2, S3}: camp, fish, hike | <0.7, 0.2>
Table 10. Support and Confidence.
Rules: Fjk | MinFMsp, Min|R| | MinFMsp, Max|R| | MaxFMsp, Min|R| | MaxFMsp, Max|R| | MinFMcf | MaxFMcf
1. F12: camp → fish | 0.529 | 0.479 | 0.615 | 0.557 | 0.63 | 0.67
2. F21: fish → camp | 0.529 | 0.479 | 0.600 | 0.557 | 0.76 | 0.77
3. F31: hike → camp | 0.449 | 0.402 | 0.472 | 0.428 | 0.76 | 0.74
4. F25: fish → ski | 0.286 | 0.259 | 0.329 | 0.298 | 0.41 | 0.41
5. F52: ski → fish | 0.286 | 0.259 | 0.329 | 0.298 | 1.0 | 1.0
6. F54: ski → raft | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
7. F{12}4: camp, fish → raft | 0.271 | 0.246 | 0.30 | 0.272 | 1.0 | 1.0
8. F5{12}: ski → camp, fish | 0.129 | 0.117 | 0.143 | 0.129 | 0.43 | 0.44
Table 11. Lift values for Intuitionistic Memberships.
Rules: Fjk | Lift Min | Lift Max
1. F12: camp → fish | 0.91 | 0.92
2. F21: fish → camp | 0.9 | 0.92
3. F31: hike → camp | 0.9 | 0.89
4. F25: fish → ski | 1.43 | 1.39
5. F52: ski → fish | 1.44 | 1.38
6. F54: ski → raft | 0 | 0
7. F{12}4: camp, fish → raft | 1.1 | 1.2
8. F5{12}: ski → camp, fish | 0.81 | 0.79
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
