


Let’s consider an example:

Transaction ID   Items
1                { bread, milk }
2                { bread, diapers, beer, eggs }
3                { bread, diapers, beer, cola }
4                { bread, milk, diapers, beer }
5                { bread, milk, diapers, cola }

Step 1: DCARM first computes the support counts of 1-itemsets at each site, in the same manner as the sequential Apriori.

Step 2: Each site broadcasts those itemsets to the other sites, and the global frequent 1-itemsets are discovered.

Step 3: Subsequently, each site generates candidate 2-itemsets and computes their support counts.
The Distributed Count Association Rule Mining (DCARM) algorithm is based on a parallelization of Apriori that iteratively generates and tests candidate itemsets from length 1 to length k until no more frequent itemsets are found.
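
As a minimal sketch of Steps 1 and 2, assuming the site partitions are available as in-memory lists of transactions and using a simple merge in place of the broadcast (the function name, its parameters, and the >= minsup test are illustrative assumptions, not part of the published algorithm):

    from collections import Counter

    def global_frequent_one_itemsets(site_partitions, minsup):
        # Step 1: each site counts items over its own transactions.
        local_counts = [Counter(item for txn in part for item in txn)
                        for part in site_partitions]
        # Step 2: merging the local counts stands in for the broadcast,
        # yielding the global support count of every item.
        global_counts = sum(local_counts, Counter())
        return {item for item, count in global_counts.items()
                if count >= minsup}

On the five transactions above, with a minimum support count of 3, the global counts are bread: 5, diapers: 4, milk: 3, beer: 3, cola: 2, eggs: 1, so the global frequent 1-itemsets are {bread}, {diapers}, {milk}, and {beer}.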

The Count Distribution method follows a data-parallel strategy: it statically partitions the database into horizontal partitions, which each process scans independently to obtain the local counts of all candidate itemsets.

At the end of each iteration, the local counts are summed across all processes into global counts, from which the frequent itemsets are found.

ALGORITHM

(1) Divide the database evenly into horizontal partitions among all processes;
(2) Each process scans its local database partition to collect the local count of each item;
(3) All processes exchange and sum up the local counts to get the global counts of all items and find the frequent 1-itemsets;
(4) Set level k = 2;
(5) All processes generate candidate k-itemsets from the mined frequent (k-1)-itemsets;
(6) Each process scans its local database partition to collect the local count of each candidate k-itemset. At the same time, DCARM also eliminates all globally infrequent (k-1)-itemsets from every transaction and inserts the pruned transaction (that is, the transaction without its infrequent (k-1)-itemsets) into memory;
(7) All processes exchange and sum up the local counts into the global counts of all candidate k-itemsets and find the frequent k-itemsets among them;
(8) Repeat steps (5)-(7) with k = k + 1 until no more frequent itemsets are found.
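
The following single-machine Python sketch simulates the whole loop. The per-process partitions are modeled as plain lists, the exchange-and-sum steps as in-memory Counter sums, and the names count_distribution, partitions, and minsup are illustrative assumptions; the transaction-pruning part of step (6) is omitted for brevity.

    from collections import Counter
    from itertools import combinations

    def count_distribution(partitions, minsup):
        # Steps (1)-(3): local 1-item counts, summed into global counts.
        global_counts = sum((Counter(i for txn in part for i in txn)
                             for part in partitions), Counter())
        frequent = {frozenset([i]) for i, c in global_counts.items()
                    if c >= minsup}
        result = set(frequent)
        k = 2                                      # step (4)
        while frequent:
            # Step (5): candidate k-itemsets whose (k-1)-subsets
            # are all frequent (Apriori pruning).
            items = sorted({i for s in frequent for i in s})
            candidates = [frozenset(c) for c in combinations(items, k)
                          if all(frozenset(sub) in frequent
                                 for sub in combinations(c, k - 1))]
            # Step (6): each "process" counts candidates in its partition.
            local = (Counter(c for txn in part for c in candidates
                             if c.issubset(txn)) for part in partitions)
            # Step (7): exchange-and-sum, then keep frequent k-itemsets.
            global_counts = sum(local, Counter())
            frequent = {c for c in candidates if global_counts[c] >= minsup}
            result |= frequent
            k += 1                                 # step (8)
        return result

Run on the five example transactions split across two simulated processes, it finds all frequent itemsets up to {beer, bread, diapers} (support 3):

    partitions = [
        [{"bread", "milk"}, {"bread", "diapers", "beer", "eggs"}],
        [{"bread", "diapers", "beer", "cola"},
         {"bread", "milk", "diapers", "beer"},
         {"bread", "milk", "diapers", "cola"}],
    ]
    print(count_distribution(partitions, minsup=3))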

Glossary

Antecedent:
The “if” part of an association rule.

Consequent:
The “then” part of an association rule.

Itemset:
A set of items. In an association rule, the antecedent and consequent are disjoint itemsets (they have no items in common).

k-itemset:
An itemset containing k items.

Support:
The support is simply the number of transactions that include all items in the antecedent and consequent parts of the rule. (The support is sometimes expressed as a percentage of the total number of records in the database.)
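
For instance, in the table above the rule {diapers} => {beer} has support 3, since diapers and beer appear together in transactions 2, 3, and 4 (equivalently, 3/5 = 60% of the database).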

Confidence:
Confidence is the ratio of the number of transactions that include all items in the consequent as well as the antecedent (namely, the support) to the number of transactions that include all items in the antecedent, i.e.

Confidence = Support / (number of transactions that include all items in the antecedent)
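
Using the table above, the rule {diapers} => {beer} has support 3, and diapers appears in 4 transactions (2, 3, 4, and 5), so its confidence is 3/4 = 75%.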

Support count (s):
Frequency of occurrence of an itemset.
E.g., s({diapers, bread, milk}) = 2

Frequent itemset:
An itemset whose support is greater than a minimum threshold.

Association mining:
Association mining does identify items that are connected to each other, but it does not reveal the nature of the connection!

Association rule:
Association rules express an implication of the form X => Y, where X and Y are itemsets.
E.g., {bread, butter} => {jam}

Minimum Support:
To make the problem tractable, we introduce the concept of minimum support. The user has to specify this parameter; let us call it minsupport. Then a rule i1, i2, ..., in => j1, j2, ..., jm needs to be considered only if the set of all items in the rule, { i1, i2, ..., in, j1, j2, ..., jm }, has support greater than minsupport.
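
For example, with minsupport = 3 on the table above, the rule bread => diapers is considered because {bread, diapers} has support 4, while beer => cola is dropped because {beer, cola} has support only 1.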
