1 Introduction

Mining high utility itemsets (HUIs) over transaction datasets has been an important research issue in the field of data mining, and a number of algorithms have been proposed to deal with this problem in terms of execution time, memory usage, and number of patterns [1,2,3]. However, when the transaction data are labeled, for example, when each transaction is assigned a binary class label, it is not enough to pay attention to utility alone, because a pattern with high utility may not be statistically significant. Therefore, we should highlight the correlation of HUIs to one of the labels. A statistical significance test tells us whether the mining results are meaningful or accidental. Statistically significant pattern mining (SSPM) [4] is the main method for handling such problems. It finds patterns that occur statistically more often in the data for one label than for another via a hypothesis test based on the p value [4] of each pattern. However, there has hitherto been little research on mining high utility and significant patterns (HUSPs), although it could be widely used in many applications, such as market analysis and healthcare. For example, managers may identify itemsets whose purchases are more profitable for one group of customers than for another (e.g., married people vs. singles). Similarly, among the cured and uncured cases (two classes) of an unknown disease, doctors pay more attention to curable treatment combinations with low cost.

In this study, we focus on mining HUSPs over transaction datasets under the control of the family-wise error rate (FWER) [10]. This is far more meaningful than only mining HUIs, which ignores the significance of the patterns. A simple solution is to first mine all HUIs with one of the many existing HUI mining algorithms and then test their significance. However, HUI mining algorithms tend to find itemsets containing many items, since such itemsets are more likely to have high utility. The supports of such patterns are usually small, so they may not even be testable under the LAMP strategy [5], which imposes a minimal support requirement on each pattern. Many HUIs should be excluded from testing from the beginning, since testing them only inflates the number of tests and the running time. More importantly, if the number of tests is large, the test level under traditional Bonferroni's control of the FWER [10] becomes quite small, so that few patterns may pass the test and become HUSPs.

These drawbacks mean that we should remove redundant patterns during the mining process and maximize the number of true discoveries while controlling the number of false discoveries. This paper proposes an efficient HUSP mining algorithm named HUSP-Mining-Test. It consists of two parts, HUSP-Mining and HUSP-Test. Two main challenges must be addressed to achieve this goal.

The first is mining HUSP candidates. As mentioned previously, we cannot take all HUIs produced by commonly used HUI mining algorithms as HUSP candidates, because the number of tests would be too large. Moreover, applying the LAMP strategy alone in the mining process is not enough: the lower bound on support given by LAMP is not large, so most insignificant HUIs often cannot be removed in time and mining may still be time-consuming. Considering that high utility patterns usually have large length but small support, and aiming to return HUSPs in a short time, our HUSP-Mining algorithm introduces two length parameters, MinLength and MaxLength, to overcome this problem. The utility of the HUSP candidates has a maximal value under the length constraints, and HUSP candidates can be found efficiently according to the utility requirements. With the combined effect of the length constraints and the LAMP strategy, the number of HUSP candidates and the running time can be greatly reduced.

The other is testing HUSPs. Existing FWER-based SSPM algorithms often use Bonferroni's control: if there are m patterns waiting to be tested and the test threshold is α, the test level is set to α/m for each pattern, and a pattern is flagged as significant when its p value is below this level. This method is too strict when m is large: the significance level becomes so small that only a few patterns can be reported as significant. Here, we use an iterative multiple testing procedure named HUSP-Test to obtain more patterns based on a dynamic test level, which allows the significance budget to be focused on higher utility patterns. In each iteration, if the p value of a pattern is greater than the critical test level, we delete from the candidates those patterns with less utility than the tested pattern. This approach retains significant patterns with high utility, and the whole process keeps the FWER below α.

In detail, our contributions are the following:

  • (1) HUSP-Mining-Test returns HUSPs with strong statistical guarantees based on their p values. It controls the FWER at level α, and we prove that the proposed method always achieves this via a two-step procedure dealing with two sub-problems: (a) HUSP-Mining mines HUSP candidates, and (b) HUSP-Test returns HUSPs by hypothesis testing.

  • (2) HUSP-Mining introduces length and support constraints on pattern mining and abandons insignificant patterns early. The number of patterns and the running time are greatly reduced compared with state-of-the-art HUI mining methods. We highlight the practical interest of these constraints for better controlling the number and utility of the returned patterns.

  • (3) We propose an iterative multiple testing procedure, HUSP-Test, that can alternately reject a hypothesis and safely ignore the hypotheses that are less useful than the rejected one. This correction allows the significance budget to be focused on HUSPs, and it can find more HUSPs than Bonferroni's control, the standard SSPM correction method.

  • (4) We evaluate the proposed algorithm on real datasets. The experiments demonstrate its effectiveness: the results show that it can produce HUSPs in a short time and has high statistical power, as measured by the number of true discoveries.

The organization of this paper is as follows: We introduce related work in Sect. 2. The fundamental concepts and definitions are in Sect. 3. Section 4 presents Fisher's test and the LAMP strategy. Section 5 introduces the two corresponding algorithms for mining HUSPs. The theoretical analysis is given in Sect. 6. Section 7 shows the experimental results under different scenarios, and Sect. 8 provides conclusions.

2 Related Work

SSPM removes redundant patterns from the perspective of statistical testing. Hämäläinen [6] regards significant pattern mining as a multiple hypothesis testing problem and first proposes a mining method based on Fisher's test. He et al. mine frequent and significant patterns [7] from transaction datasets. Zhao et al. extract significant sub-graphs from networks [8]. LTC [9] finds significant itemsets in data streams. Bonferroni's control and its improvements [10, 11] are the common corrections of the test level for multiple hypothesis testing; they limit the test level of a single pattern according to the number of tests. LAMP was proposed to overcome the excessively low significance threshold produced by Bonferroni's control when the dataset is large. It is an important strategy designed for finding more patterns under the FWER [5, 12].

The test method is an important component of SSPM. Fisher's test [12, 13], a conditional exact test, is the main test type. It can be understood as fixing the row and column totals of the contingency table, thereby removing nuisance parameters. To simplify the calculation, an upper bound on its p value can be found in [14]. In addition, Pellegrina et al. propose a significance level adjustment method based on Barnard's test [15], which is an unconditional test: it does not condition on the observed value of the statistic associated with the nuisance parameter, but instead lets the nuisance parameter range over its parameter space. Unlike Fisher's test or Barnard's test, the permutation test [16] is another useful method for mining significant patterns from small data samples. Wu et al. introduce the permutation test to mine statistically significant association rules [17], and Pellegrina et al. [18] use the permutation test to mine top-k significant sequential patterns.

There is also a wide body of literature focused on mining significant patterns in applications. For example, Zihayat et al. [19] extract significant patterns in the gene regulation area. Tonon et al. [20] mine significant patterns over sequential data such as transaction flows. Fournier-Viger et al. [21] extract significant structure combinations in network graphs, and Seyfi et al. [22] find significant patterns on fast-growing big data in time windows. However, to the best of our knowledge, no SSPM studies have focused on high utility patterns.

In pattern mining, algorithms are often designed around tree structures [23, 24], linked lists [25, 26], or optimization algorithms [27, 28] for mining HUIs directly and efficiently. Recently, Song et al. proposed SFUI_UF [29] for mining skyline frequent-utility itemsets with high efficiency and low memory usage based on the state-of-the-art utility-list (UL) structure. However, no studies on HUI mining have offered a statistical guarantee for the discovered patterns.

3 Problem Statement and Definitions

D = {t1,t2, …, tn} is defined as a dataset where ti is the ith transaction. I = {x1, x2,…, xm} is a finite set of items, and a set \(X \subseteq I\) is called an itemset, or a pattern. ti consists of items from I, and each item xj in transaction ti has two utility values, q(xj,ti) and p(xj): q(xj,ti) is the internal utility (quantity) of xj in ti, while p(xj) is the external profit value of xj. Suppose there are two labels G0 and G1, and each transaction is labeled by one of them. An example can be found in Tables 1 and 2.

Table 1 Internal utility table
Table 2 External profit table

Definition 1. For itemset X in transaction t, we define the indicator function as

$$S_{t} (X) = \begin{cases} 1 & X \subseteq t \\ 0 & \text{otherwise} \end{cases}.$$
(1)

The support [1] (the frequency of itemset X existing in the dataset D) can be obtained as

$$S(X) = \sum\limits_{t \in D} {S_{t} (X)} .$$
(2)

Definition 2. The utility of itemset X in transaction t is set as u(X,t), and it is defined [2] as

$$u(X,t) = \sum\limits_{x \in X} {q(x,t) * p(x)} .$$
(3)

For example, with Tables 1 and 2, u({BE},t1) = 2*2 + 2*2 = 8.

Definition 3. The utility of itemset X in the dataset D is denoted as u(X). We have

$$u(X) = \sum\limits_{t \in D} {u(X,t)} .$$
(4)

For example, u({BE}) = u({BE},t1) + u({BE}, t2) + u({BE},t3) = 8 + 6 + 10 = 24.

Definition 4. The remaining utility of all the items in transaction t except those in X is denoted r(X, t) and defined as

$$r(X,t) = \sum\limits_{i \in (t/X)} {u(i,t)} .$$
(5)

t/X is the set of items in t excluding those in X; for example, r({BE},t2) = 8 + 3 + 24 + 4 = 39.

Definition 5. The utility of transaction t is denoted as stu(t)

$$stu(t) = \sum\limits_{x \in t} {u(x,t)} .$$
(6)

For example, stu(t3) = 2*2 + 4*5 + 6*2 = 36.

Definition 6. The utility of the dataset D is denoted as swu(D), and we have

$$swu(D) = \sum\limits_{t \in D} {stu(t)} .$$
(7)

Based on Tables 1 and 2, swu(D) = 15 + 45 + 36 + 38 + 29 + 49 + 39 = 251.

Definition 7. Given a user-specified threshold \(\lambda\) (0 < \(\lambda\) < 1), the minimum utility threshold [2, 3] (MinUti) is defined by

$$MinUti = \lambda * swu(D).$$
(8)

An itemset X is an HUI if its utility is no less than MinUti. For the example in Definition 3, if \(\lambda\) = 0.05, then MinUti = 0.05*251 = 12.55, and {BE} is an HUI as its utility is 24.
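As a concrete illustration, Definitions 2–7 can be sketched in a few lines of Python. The quantities q and profits p below are hypothetical (the paper's Tables 1 and 2 are not reproduced here), so the numbers differ from the running example:

```python
# Sketch of Definitions 2-7 on a small hypothetical dataset.
# q[t][x] is the internal utility (quantity) of item x in transaction t,
# and p[x] is the external profit of item x.
q = {
    "t1": {"A": 1, "B": 2, "E": 2},
    "t2": {"B": 1, "E": 2, "C": 3},
    "t3": {"B": 2, "D": 4, "E": 1},
}
p = {"A": 3, "B": 2, "C": 1, "D": 5, "E": 2}

def u_xt(X, t):
    """Utility of itemset X in transaction t (formula (3)); 0 if X not in t."""
    if not set(X) <= set(q[t]):
        return 0
    return sum(q[t][x] * p[x] for x in X)

def u_x(X):
    """Utility of X over the whole dataset (formula (4))."""
    return sum(u_xt(X, t) for t in q)

def stu(t):
    """Transaction utility (formula (6))."""
    return sum(q[t][x] * p[x] for x in q[t])

def swu():
    """Dataset utility (formula (7))."""
    return sum(stu(t) for t in q)

lam = 0.05
min_uti = lam * swu()  # minimum utility threshold, formula (8)
```

An itemset is an HUI on this toy data exactly when `u_x(X) >= min_uti`, mirroring Definition 7.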

Definition 8. If X is an HUI, it may appear in transactions of either class. Based on Definitions 1–2, a contingency table of X can be constructed as in Table 3.

Table 3 2 × 2 Contingency table for X

S1(X) and S0(X) are the supports of X in classes G1 and G0, respectively. S(X) is the total support in the dataset, n1 and n0 are the numbers of transactions in each class, and n is the total number of transactions in the dataset. Based on the contingency table, the p value of pattern X can be calculated by a chosen testing method.

Definition 9. Given a user-specified threshold α as the test level, an HUI X is a high utility and significant pattern (HUSP), if its p value is less than α.

To return HUSPs in a short time, we propose an efficient approach, HUSP-Mining-Test, which returns more HUSPs while controlling the FWER below the test level.

4 FWER and LAMP

HUSP mining is based on hypothesis testing. If there is a single HUSP candidate waiting to be tested, it is easy to test its significance based on its p value. However, for different values of MinUti there may be thousands of candidates, which makes this a multiple hypothesis testing problem.

Following [3,4,5], we calculate the p value based on Fisher's test, which fixes the nuisance parameter at its observed value as a condition so as to obtain the conditional distribution of the statistic to be estimated. For a test pattern X, based on Table 3, P(S1(X)) can be calculated as

$$P(S_{1} (X)) = \frac{{\left( {\begin{array}{*{20}c} {n_{1} } \\ {S_{1} (X)} \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {n_{0} } \\ {S_{0} (X)} \\ \end{array} } \right)}}{{\left( {\begin{array}{*{20}c} n \\ {S(X)} \\ \end{array} } \right)}}.$$
(9)

The possible contingency tables for X correspond to the different values of S1(X) in the range [max(0, S(X) − n0), min(S(X), n1)]. Based on (9), the final p value for pattern X is established as

$$P_{S} (X) = \sum\limits_{P(x) \le P(S_{1} (X))} {P(x)} .$$
(10)

The p value is obtained according to this calculation procedure. However, when the value of n in Table 3 is very large, computing the exact p value may take a long time. We therefore use a close upper bound proposed by Hämäläinen [14], which is fast to calculate and approximates Fisher's p value accurately. The upper bound is calculated as

$$P_{S} (X) \le P(S_{1} (X))*\left( {\frac{{1 - q_{1}^{k + 1} }}{{1 - q_{1} }}} \right),$$
(11)

where \(q_{1} = (S_{0} (X)(n_{1} - S_{1} (X)))/((S_{1} (X) + 1)(n_{0} - S_{0} (X) + 1))\) and k = min(S0(X), n1 − S1(X)). The absolute error of this upper bound is bounded by \(P(S_{1} (X)) \cdot q_{1}^{2} /(1 - q_{1} )\).
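For reference, formulas (9)–(10) can be implemented directly with the hypergeometric probability. The sketch below uses only the standard library and enumerates all feasible tables, summing the probabilities no larger than the observed one; for large n, the upper bound (11) would be used instead:

```python
from math import comb

def fisher_p(s1, s0, n1, n0):
    """Fisher p value per formulas (9)-(10): sum P(x) over all
    contingency tables no more probable than the observed one."""
    s, n = s1 + s0, n1 + n0

    def prob(x):
        # hypergeometric probability of S1(X) = x, formula (9)
        return comb(n1, x) * comb(n0, s - x) / comb(n, s)

    observed = prob(s1)
    lo, hi = max(0, s - n0), min(s, n1)  # feasible values of S1(X)
    # a small tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed + 1e-12)
```

For instance, with n1 = n0 = 2 and a pattern occurring twice, only in class G1, the two extreme tables each have probability 1/6, giving a p value of 1/3.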

We control the FWER, i.e., the probability of reporting at least one false discovery. Given a user-specified significance threshold α, if there are k HUSP candidates, Bonferroni's correction is a common way to constrain the FWER of the k candidates: α/k is used as the test level for every pattern to guarantee FWER ≤ α. However, its disadvantages are obvious. When MinUti becomes small, k grows large, and the test threshold α/k may become so small that almost no pattern passes the test and is reported as significant. Sophisticated techniques have been developed to obtain higher statistical power while keeping the FWER at most α. The LAMP strategy [5] is an excellent method to overcome this problem. For x in [max(0, S(X) − n0), min(S(X), n1)], Pmin(X) is defined as

$$P_{min} (X) = min\{ P(x)\} .$$
(12)

Assuming w.l.o.g. that n1 ≤ n0, there exists a minimal support value that a testable pattern must satisfy; otherwise, Pmin(X) exceeds α and the pattern is not considered an HUSP candidate. From the requirement on Pmin(X), the testable support value xs is obtained from

$$P_{min} (X) = \frac{{\left( {\begin{array}{*{20}c} {n_{1} } \\ {x_{s} } \\ \end{array} } \right)}}{{\left( {\begin{array}{*{20}c} n \\ {x_{s} } \\ \end{array} } \right)}} = \alpha .$$
(13)

Based on formula (13), each pattern X must satisfy S(X) > xs to have a chance of being tested. Since the number of hypotheses decreases with LAMP's trick, a larger rejection threshold can be used to obtain more discoveries from the candidates.

5 HUSP-Mining-Test

5.1 Two-Step Procedure

Our HUSP-Mining-Test includes two parts, HUSP-Mining and HUSP-Test. HUSP-Mining mines HUSP candidates, and HUSP-Test tests their significance. As previously mentioned, the higher a pattern's utility, the longer it tends to be. In fact, HUIs of long length are rare in the dataset; they do not satisfy LAMP's support qualification and can be removed during the mining process. Therefore, HUSP-Mining does not directly output all HUIs found by existing HUI algorithms as HUSP candidates. For its time and memory advantages, HUSP-Mining adopts the state-of-the-art UL structure of the recent algorithm SFUI_UF [29] and mines HUSP candidates based on LAMP and the length constraints with two useful pruning strategies (see the next section). HUSP-Test aims to discover the most preferred patterns with high utility under the constraint of statistical significance. We propose an iterative multiple testing procedure that filters out patterns with low utility and focuses on high utility candidates; it corrects the test level of each HUSP candidate and can draw more HUSPs than the standard SSPM correction of Bonferroni's control while keeping the FWER below a user-defined threshold.

5.2 HUSP-Mining

HUSP-Mining performs a depth-first search to discover HUSPs based on the length and support constraints. First, an order (e.g., lexicographic order) of the mining items is imposed and denoted as \(\tau\). For an itemset X, its UL structure is denoted X.UL. Unlike in SFUI_UF, it consists of three parts (id, u, M): id is the identifier of a transaction containing X, u is the utility of X in that transaction as given by formula (3), and M is the maximal remaining utility in that transaction. Two parameters, MinLength and MaxLength, are introduced to impose length constraints on the itemsets, and the maximal remaining utility M(X,td) in transaction td (where d is a unique identifier) is defined as follows.

Definition 10. Given two lengths MinLength and MaxLength, M(X,td) is the maximal remaining utility w.r.t. an itemset X in a transaction td; it represents the maximal utility that an extension of X respecting the MinLength and MaxLength constraints could have.

For example, consider the running example above with MinLength = 3 and MaxLength = 4. r({AB},t4) = 5*4 + 2*1 + 2*2 = 26. There are three items C, D, E after {AB}. However, since MaxLength is 4, we can only choose the two items with the largest utilities in {CDE}, so the maximal utility is M({AB},t4) = 5*4 + 2*2 = 24. If MinLength is 4, M({B},t5) = 0, as the total length is 3, which does not meet the MinLength requirement.

Based on the UL structure and the length constraints, the maximal transaction utility (MTU) of an itemset X is calculated as

$$MTU(X) = \sum\nolimits_{d \in X.ids} {\left( {u(X,t_{d} ) + M(X,t_{d} )} \right)}$$
(14)

X.ids is the set of identifiers of the transactions containing X. MTU is a tighter upper bound than the transaction weighted utility (TWU) [17, 18] used in most traditional HUI algorithms, and it is easy to prove that MTU(X) ≤ TWU(X): TWU accumulates the utility of all items in the transactions containing X, while MTU only counts the part that meets the length requirements.
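Definition 10 and formula (14) can be sketched as follows. The helpers below use a simplified representation chosen for brevity: the utilities of the items after X in a transaction are passed in as a plain list rather than read from a UL structure:

```python
def max_remaining(remaining_utils, len_X, min_len, max_len):
    """M(X, t_d) of Definition 10: the largest utility an extension
    of X could add in this transaction under the length constraints."""
    if len_X + len(remaining_utils) < min_len:
        return 0  # no extension here can ever reach MinLength
    k = max_len - len_X  # at most this many items may still be added
    return sum(sorted(remaining_utils, reverse=True)[:k])

def mtu(per_transaction):
    """MTU(X) per formula (14): per_transaction lists the pairs
    (u(X, t_d), M(X, t_d)) over the transactions containing X."""
    return sum(u + m for u, m in per_transaction)
```

On the running example, the items C, D, E after {AB} in t4 contribute utilities 20, 2, and 4, so max_remaining([20, 2, 4], 2, 3, 4) recovers M({AB},t4) = 24.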

We have so far introduced this novel, tighter upper bound on the utility of patterns, with the goal of using the MinLength and MaxLength constraints to reduce the search space. To draw HUSPs more efficiently, two pruning strategies are proposed to optimize the mining process; they are used in the initial stage and the extension stage of HUSP-Mining, respectively.

Theorem 1. For an itemset X, if MTU(X) is less than MinUti, its extensions will not be considered as HUSP candidates.

Proof: Let Xe be an extension of X in transaction td. The items remaining after Xe are a subset of those remaining after X; thus u(Xe,td) + M(Xe,td) ≤ u(X,td) + M(X,td), and summing over the transactions gives MTU(Xe) ≤ MTU(X). When MTU(X) < MinUti, Xe cannot be an HUSP candidate. Hence MTU(X) satisfies the closure principle: every non-empty subset of an HUSP candidate is an HUSP candidate, and every superset of a non-candidate is a non-candidate.

Theorem 2. For an itemset X, if its support S(X) is less than the testable support value xs given by LAMP, its extensions will not be considered as HUSP candidates.

Proof: Let Xe be an extension of X; then X is a subset of Xe. Therefore, every transaction containing Xe must also contain X, and thus S(Xe) ≤ S(X). When S(X) < xs, Xe cannot be an HUSP candidate, and the closure property is likewise satisfied.

Algorithm 1 in Fig. 1 describes the proposed HUSP-Mining based on the new UL structure and these two closure properties. The transaction database is first scanned to calculate the testable support value xs and MTU1-itemset (Line 1); note that for a single item X in transaction td, MTU1-itemset is equal to r(X, td) under the MaxLength control. According to Theorems 1 and 2, Lines 2 and 3 prune unpromising items using MTU1-itemset and xs. The sorted item list L is then obtained in MTU1-itemset ascending order based on \(\tau\) (Line 4). The items in each transaction are sorted in the ascending order of MTU1-itemset, and the UL of each item is built (Lines 5–10). For each item in L, the procedure HUSP-Miner (described in Algorithm 2), which determines all HUSP candidates, is called in Lines 12–14. Finally, Line 15 outputs all the discovered HUSP candidates.

Fig. 1
figure 1

Procedure of HUSP-Mining

Algorithm 2 in Fig. 2 generates HUSP candidates by extending the enumerated itemset via a depth-first traversal of the search space. If the current itemset meets the length and support requirements of an HUSP candidate, it is added to the set Candidates (Lines 1–3). The function tail(X) returns the set of items that follow all items of X in the order of L from Algorithm 1 (Line 4). A new itemset formed by appending an item after X is processed only if its MTU is no lower than MinUti and its support satisfies the testable requirement. The HUSP-Miner procedure is called recursively to add new candidates to HUSP_Candidates (Lines 5–10).
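The recursion of Algorithm 2 can be sketched as below. This is a simplified skeleton, not the UL-based implementation: mtu and support are caller-supplied functions standing in for the upper-bound and support computations, and the two conditions in the loop correspond to the Theorem 1 and Theorem 2 prunes:

```python
def husp_miner(X, tail, mtu, support, min_uti, xs, min_len, max_len, out):
    """Depth-first candidate generation in the spirit of Algorithm 2."""
    if min_len <= len(X) <= max_len and support(X) > xs:
        out.append(X)  # X is a testable HUSP candidate
    for i, item in enumerate(tail):
        Y = X + (item,)
        # extend only past the MTU bound (Theorem 1) and support (Theorem 2)
        if mtu(Y) >= min_uti and support(Y) > xs:
            husp_miner(Y, tail[i + 1:], mtu, support,
                       min_uti, xs, min_len, max_len, out)
```

With toy stand-in functions whose bounds shrink with itemset length, the skeleton enumerates exactly the extensions that pass both prunes:

```python
out = []
husp_miner(("a",), ("b", "c", "d"),
           mtu=lambda Y: 100 - 10 * len(Y),
           support=lambda Y: 5 - len(Y),
           min_uti=70, xs=2, min_len=2, max_len=3, out=out)
# out now holds the 2-itemsets; every 3-itemset fails the support prune
```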

We use an example to illustrate the mining process based on the data in Tables 1 and 2, with n1 = 4, n0 = 3, MinLength = 3, MaxLength = 4, α = 0.05, and MinUti = 70. According to (13), we first calculate the testable support xs = 3, since 1/35 ≈ 0.029 < 0.05. MTU1-itemset(G) = 40 < 70, so G is deleted from the database; MTU1-itemset(A) = 122, MTU1-itemset(B) = 156, MTU1-itemset(C) = 246, MTU1-itemset(D) = 203, MTU1-itemset(E) = 128, and MTU1-itemset(F) = 162. According to the MTU-ascending order, the remaining six items are sorted as A → E → B → F → D → C. Based on this order, the dataset is reorganized, and Fig. 3 shows the UL structures of all remaining 1-itemsets.

Fig. 2
figure 2

Procedure of HUSP-Miner

Take A as an example; the UL structure of {A} can be seen in Fig. 4, with S(A) = 3. Adding E to A, MTU({AE}) = 6 + 4 + 26 + 9 + 2 + 36 = 83 > MinUti, so we compute the extensions of {AE}. Adding B to {AE} gives MTU({AEB}) = 36 < MinUti, so {AEB} cannot be an HUSP candidate. The MTUs of the other extensions of {AE} ({AEF}, {AED}, and {AEC}) also fail the requirements, so we discard {AE} and then add B to A; the extensions of {AB} still do not satisfy the support and utility requirements, so {AB} is removed, and likewise {AF}, until D is added to A and MTU({AD}) exceeds MinUti. Only item C remains after D, and we can verify that {ADC} meets the requirements, so it becomes an HUSP candidate. We perform these operations recursively until all items are processed.

Fig. 3
figure 3

UL structures of 1-itemset

Fig. 4
figure 4

UL structures of item {A}

Based on the procedure of HUSP-Mining, the HUSP candidates are obtained after this mining process, and the final results are shown in Table 4.

Table 4 Discovered HUSP candidates

The search space of the running example is shown in Fig. 5. The circle nodes are HUIs, which would be produced by a traditional HUI algorithm such as SFUI_UF; our HUSP candidates are the circles with a blue background. This shows that HUSP-Mining is indeed different from traditional HUI mining algorithms and that it removes some patterns that may be insignificant.

Fig. 5
figure 5

Search space of HUSP candidates and HUIs

5.3 HUSP-Test

HUSP-Mining returns the testable HUSP candidates according to the length and support constraints. HUSP-Test performs the significance tests and keeps the FWER below the given threshold.

For a pattern X among the HUSP candidates, a null hypothesis H0 is established stating that X does not occur statistically more often for one class than for the other. \(\pi (X,G^{0} )\) and \(\pi (X,G^{1} )\) are the probabilities that X occurs with labels G0 and G1, respectively. The null hypothesis is H0: \(\pi (X,G^{0} ){ = }\pi (X,G^{1} )\), and the key task is to assess the statistical significance of X to evaluate whether the data support H0. Once a hypothesis H0 is accepted, meaning the pattern is not statistically biased toward a particular label, we can ignore all patterns whose utilities are less than that of X. We propose an iterative multiple testing procedure that conducts this process while controlling the FWER below the test level. The algorithm is given in Fig. 6. REAL_HUSPs starts as an empty set for collecting HUSPs (Line 1), and the p value set Ps is calculated by Fisher's test according to Sect. 4 (Lines 2–4). Lines 5–7 retrieve the hypothesis, the pattern, and the size of the candidate set together with the p value being tested, and Line 8 sets the new test level based on the current size Lt of the testing pattern set. It can be proved that the FWER remains under control with this new test level, which is looser than standard Bonferroni's control. The proof and derivation of this correction are shown in (15):

$$FWER = \Pr \left\{ {\bigcup\limits_{i = 1}^{{m_{0} }} {\left( {p_{i} \le \frac{\alpha }{{L_{i} * (\ln (L_{1} ) + 1)}}} \right)} } \right\} \le \sum\limits_{i = 1}^{{m_{0} }} {\Pr \left( {p_{i} \le \frac{\alpha }{{L_{i} * (\ln (L_{1} ) + 1)}}} \right)} \le \frac{\alpha }{{\ln (L_{1} ) + 1}}\sum\limits_{i = 1}^{{m_{0} }} {\frac{1}{{L_{i} }}} \le \alpha ;$$
(15)

m0 is the number of true null hypotheses among those tested, and \(\sum\limits_{i = 1}^{{m_{0} }} {\frac{1}{{L_{i} }}} \le \sum\limits_{i = 1}^{{L_{1} }} \frac{1}{i} \le \ln (L_{1} ) + 1\) holds by the standard bound on the harmonic series (related to Euler's constant) [30].
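The harmonic-series bound invoked here is easy to check numerically:

```python
from math import log

# Verify sum_{i=1}^{L1} 1/i <= ln(L1) + 1 for several sizes of L1.
for L1 in (1, 10, 1000):
    harmonic = sum(1 / i for i in range(1, L1 + 1))
    assert harmonic <= log(L1) + 1
```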

When the p value of a pattern exceeds the test threshold (Line 9), we accept the hypothesis (Line 10) and discard the patterns with less utility than the tested one (Line 11). Many p values may thus be removed from Ps, saving the significance budget for the high utility patterns. Otherwise, we reject the hypothesis and add the pattern to the final result (Lines 13–14). Our method retains the significant patterns with high utility, with a strong mathematical guarantee.
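The loop can be sketched as follows. This is a simplified reading of Fig. 6, assuming (as described in Sect. 6) that each round tests the smallest remaining p value; the candidate triples are hypothetical inputs:

```python
from math import log

def husp_test(candidates, alpha):
    """Iterative multiple testing with a dynamic level
    alpha / (L_t * (ln(L_1) + 1)), per formula (15).
    candidates is a list of (pattern, utility, p_value) triples."""
    L1 = len(candidates)
    remaining = list(candidates)
    husps = []
    while remaining:
        level = alpha / (len(remaining) * (log(L1) + 1))
        pattern, utility, p = min(remaining, key=lambda c: c[2])
        if p <= level:   # reject H0: the pattern is significant
            husps.append(pattern)
            remaining.remove((pattern, utility, p))
        else:            # accept H0: drop it and all lower-utility patterns
            remaining = [c for c in remaining if c[1] > utility]
    return husps
```

As the set shrinks, the level alpha / (L_t * (ln(L_1) + 1)) grows, so later (higher utility) survivors face a looser threshold than under a fixed Bonferroni level.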

From Fig. 6, we can see that the p value set changes constantly: the size of the set shrinks, so the test threshold may grow. This is very different from the conventional Bonferroni's control, which uses the same test level throughout, and it may yield more HUSPs than Bonferroni's control under the FWER framework. These conclusions are verified in the experimental section.

Fig. 6
figure 6

Procedure of HUSP-Test

6 Theoretical Analysis

First, we study the complexity of HUSP-Mining. The TWU strategy [29] in SFUI_UF is considered a useful technique for reducing mining time. TWU is the sum of the utilities stu(t), given by formula (6), of the transactions containing a pattern. Once the TWU of a pattern is less than MinUti, the pattern is no longer regarded as an HUI; this strategy speeds up HUI mining. However, TWU accounts for every item in the transactions, which makes it time-consuming when the dataset is large. HUSP-Mining does not use a loose utility upper bound like TWU; it removes patterns that do not satisfy the requirements using the MTU upper bound. The number of items counted by MTU in each transaction is no more than that counted by TWU, so the calculation time is reduced. HUSP-Mining runs in time O(|D|2 \(\cdot\) L \(\cdot\) H \(\cdot\) MaxLength), where D is the dataset, L is the length of the 1-itemset sequence in Algorithm 1, and H is the maximum length of a transaction. With the TWU strategy, by contrast, the complexity is O(|D|2 \(\cdot\) L \(\cdot\) H2). MaxLength is at most H and often far smaller. In addition, with the LAMP strategy, HUI candidates and their extensions that do not meet the support requirement are also discarded in time, so HUSP-Mining has a running time advantage.

Next, we show that HUSP-Test controls the FWER more effectively than Bonferroni's control. Figure 6 shows that its framework is an iterative procedure. At each iteration, we extract the minimum p value of all remaining HUSP candidates and its hypothesis. The FWER is evaluated by considering the probability that this minimal p value rejects a true hypothesis at each iteration; in this view, the FWER is bounded by the sum of these probabilities over all possible iterations. For k tested patterns, Bonferroni's control limits the test level to α/k, whereas we use the time-varying denominator \(L_{t} * (ln(L_{1} ) + 1)\) in place of k, with L1 = k. Lt becomes smaller as more patterns are removed. When t is large enough, \(L_{t} * (ln(L_{1} ) + 1)\) becomes less than k, the test level grows, and more HUSP patterns are found while keeping FWER ≤ α. Our experimental results in the next section support these conclusions.

7 Experiments

7.1 Datasets

In this section, we evaluate the performance of the proposed algorithm on the real datasets Covtype, Phishing, A1a, Dna, Mushrooms, and W1a, taken from LibSVM [31] and SPMF [32]. The details of these datasets are provided in Table 5, where |I| indicates the number of distinct items, |S| the number of transactions, and Avglength the average transaction length; all datasets have two classes. The default test level is 0.05, and p values are calculated by (11). MinLength and MaxLength can take any value between 1 and H (the maximum transaction length, as in Sect. 6); to mine more HUSP candidates, we use the lower and upper quartile values of H: for each dataset, two variables A and B take random values in [1, H/4] and [3H/4, H], and MinLength and MaxLength are the averages of these variables over 100 draws. All algorithms are programmed in Python, and the experiment platform is configured as follows: Windows 10 operating system, 2 GB memory, Intel(R) Core(TM) i3-2310 CPU @ 2.10 GHz.

Table 5 Data characteristics

7.2 Evaluate Mining HUSP Candidates by HUSP-Mining

First, we verify the efficiency of HUSP-Mining compared with the recent advanced HUI mining algorithms SFUI_UF [29], TKUS [25], and HUIM-SPSO [28]. Figure 7 shows the running times of the algorithms for different minimum utility thresholds.

Fig. 7
figure 7

Running time comparisons of the algorithms

From Fig. 7, we can see that HUSP-Mining spends the least time of the five algorithms: SFUI_UF takes less time than TKUS and HUIM-SPSO, and the running time of HUSP-Mining is less than that of SFUI_UF thanks to the closure properties of Theorems 1 and 2. As the minimum utility threshold increases, the number of HUSP candidates decreases, and HUSP-Mining always shows the best time efficiency of the five algorithms. The running time of HUSP-Mining-Test is more than that of HUSP-Mining, since it includes the HUSP-Test phase, but benefitting from HUSP-Mining, its running time is far less than that of SFUI_UF, so HUSP-Mining-Test can obtain HUSPs in a short time.

Figure 8 shows the memory consumption of the five algorithms. The memory consumption value is extracted as follows: during the run of an algorithm, several checkpoints are set, the amount of memory in use is recorded at each checkpoint, and the maximum over all checkpoints is taken as the algorithm's memory consumption. The memory consumption of the five algorithms decreases roughly linearly as the utility threshold increases. Overall, SFUI_UF uses less memory than TKUS and HUIM-SPSO, while HUSP-Mining and HUSP-Mining-Test consume less memory than the other three algorithms, so HUSPs can be obtained with a low memory footprint.
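A minimal sketch of this checkpoint-based measurement, assuming each algorithm phase can be wrapped as a callable; `run_with_memory_checkpoints` and `stages` are illustrative names, and the paper does not specify which measurement tool was used (we use `tracemalloc` here):

```python
import tracemalloc

def run_with_memory_checkpoints(stages):
    """Run each phase of an algorithm, record the memory in use at a
    checkpoint after each phase, and report the maximum checkpoint
    value as the algorithm's memory consumption."""
    tracemalloc.start()
    samples = []
    for stage in stages:          # each stage is one phase of the algorithm
        stage()
        current, _peak = tracemalloc.get_traced_memory()
        samples.append(current)   # checkpoint: bytes currently allocated
    tracemalloc.stop()
    return max(samples)           # report the largest checkpoint
```

In the experiments' terms, each `stage` would correspond to a phase of the mining algorithm, and the maximum over the checkpoints is the reported memory figure.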

Fig. 8

Memory consumption comparisons of the algorithms

Table 6 shows the number of HUSPs produced by the five algorithms. SFUI_UF and TKUS compute the exact number of patterns in the dataset and return the same count, whereas HUIM-SPSO outputs an approximation. The numbers returned by HUSP-Mining and HUSP-Mining-Test are smaller than those of the other three algorithms because we consider the significance of the patterns: redundant, non-significant patterns are removed during HUSP-Mining. For example, on Mushrooms with utility threshold 0.008, SFUI_UF produces 2047 patterns, while HUSP-Mining returns 555 HUSP candidates. The counts of HUSP-Mining-Test are close to those of HUSP-Mining, and on Mushrooms and Covtype the two coincide. The poor results of SFUI_UF combined with HUSP-Test or with traditional Bonferroni's correction, reported in the next section, confirm that many patterns produced by SFUI_UF are not significant and that HUSP-Mining discards them early in the mining process.

Table 6 Pattern numbers

7.3 Evaluate HUSP-Mining-Test with Length Constraints and New Correction

To evaluate the overall performance of HUSP-Mining-Test, we compare it with the standard SSPM correction method under Bonferroni's control. Since SFUI_UF performs better than TKUS and HUIM-SPSO, we denote by SFUI_UF*-Test and SFUI_UF*-BFC the combinations of SFUI_UF with HUSP-Test and with Bonferroni's control, respectively, and by HUSP-Mining-BFC the combination of HUSP-Mining with Bonferroni's control. Figure 9 compares the numbers of HUSPs on the six datasets under different utility thresholds.

Fig. 9

Number comparisons of HUSPs

From Fig. 9, we focus on the numbers of HUSPs produced by the four algorithms. The number found by HUSP-Mining-Test is never less than that returned by the other three algorithms. Under the same test-level correction, the two algorithms based on HUSP-Mining return far more HUSPs than those based on SFUI_UF, which verifies that SFUI_UF produces many redundant HUSP candidates and that HUSP-Mining-Test removes them during mining. With the new test-level correction, HUSP-Mining-Test always finds more results than HUSP-Mining-BFC with its stricter test level. We also observe that SFUI_UF*-Test returns more patterns than SFUI_UF*-BFC on every dataset under each minimum utility threshold, while FWER remains controlled; this shows that HUSP-Test with the new test-level correction always finds more patterns than the standard SSPM method with Bonferroni's control. Analyzing the composition of the results, we find that every HUSP generated by the other three algorithms also appears in the output of HUSP-Mining-Test, which further supports the correctness of HUSP-Mining-Test.

HUSP-Mining-Test thus returns more HUSPs from fewer HUSP candidates. We also evaluate the algorithms under different data sizes on the two larger datasets, Covtype and Phishing. On Covtype, the minimum utility threshold is set uniformly from 9 to 12%, and the sizes range from 500 to 100,000 transactions; on Phishing, the utility threshold is set to 1%, and the sizes range from 100 to 10,000. Figure 10 shows the numbers of HUSPs returned by the four algorithms on the two datasets: HUSP-Mining-Test produces the largest number of patterns, and SFUI_UF*-Test returns more than SFUI_UF*-BFC. This again confirms that HUSP-Mining-Test consistently finds more HUSPs than Bonferroni's control at the same test level.

Fig. 10

Number comparisons of HUSPs on different data size

Figure 11 shows the time consumption of the four algorithms. HUSP-Mining-Test has the smallest running time on all datasets, owing to the length and support requirements in HUSP-Mining and the low-utility-pattern removal strategy in HUSP-Test. On the smaller datasets W1a, A1a, Dna, and Mushrooms, all four algorithms finish within seconds. Without the length and support constraints of HUSP-Mining, SFUI_UF searches a larger space and needs more time to find HUSP candidates. SFUI_UF*-BFC, which has no length constraint, has the largest running time of the four algorithms, and it also takes longer than SFUI_UF*-Test on all datasets. Our algorithm is therefore not limited by running time: it delivers more significant patterns in a shorter time than the standard SSPM methods.

Fig. 11

Running time comparisons of HUSPs

Figure 12 shows the memory consumption of the four algorithms. Compared with the other three algorithms, HUSP-Mining-Test also has an advantage in memory efficiency. When the minimum threshold is reduced, the number of significant high utility patterns in the dataset increases, so the memory occupied by the UL structure grows and the number of HUSP candidates also increases; HUSP-Mining-Test therefore consumes more memory as the minimum threshold decreases. Even so, it still outperforms the other three algorithms with lower memory consumption.

Fig. 12

Memory consumption comparisons of HUSPs

It is worth noting that, besides time and memory efficiency, utility is also an important observation point. Figure 13 reports the average utility returned by the four algorithms under different utility thresholds. HUSP-Mining-Test returns patterns that are not only significant but also of high utility on most datasets. For example, on Covtype, as the utility threshold increases, the number of HUSPs decreases but the average utility rises from 4.3e8 to 4.85e8. This is credited to HUSP-Test, which retains higher-utility patterns, even though HUSP-Mining-Test returns the largest total number of HUSPs among the four algorithms.

Fig. 13

Average utility comparisons of HUSPs

To test the sensitivity of HUSP-Mining-Test to the significance level α, we examine how α influences the results. Figure 14 shows the number of patterns reported by the four algorithms under fixed utility thresholds as α increases from 0.02 to 0.1, and Fig. 15 shows the p value distribution at α = 0.05 under the same utility thresholds on the six datasets. As Fig. 14 shows, the number of reported patterns tends to increase with α. For example, on Phishing, the output of SFUI_UF*-Test grows from 563 to 782 patterns, while that of SFUI_UF*-BFC grows from 451 to 537. This is because a larger α admits more patterns into the final test set, so more patterns have the opportunity to reject the null hypothesis and be reported as HUSPs. Note, however, that the counts produced by the algorithms with length constraints change very little: on Phishing, HUSP-Mining-Test always reports 68 patterns, and HUSP-Mining-BFC varies only from 63 to 67, indicating that HUSP-Mining-Test is insensitive to this parameter. Overall, the proposed HUSP-Mining-Test efficiently returns HUSPs, and its advantage is stable with respect to the statistical test parameters.
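As a concrete illustration of the underlying test, a two-sided Fisher's exact p value for a pattern's 2×2 contingency table (occurrences vs. class label) can be computed from the hypergeometric distribution, and Bonferroni's control then compares it against α/m for m tests. This is a generic sketch of Fisher's test and Bonferroni's correction, not the paper's Eq. (11) or its new correction; the table values and m are made up for illustration.

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    def prob(x):  # P(first cell = x) under fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Bonferroni's control of FWER at level alpha over m hypothesis tests:
# a pattern is reported only if p <= alpha / m.
alpha, m = 0.05, 1000
corrected_level = alpha / m          # 5e-5
significant = fisher_exact_p(8, 2, 1, 5) <= corrected_level
```

The example shows why a large test count m hurts Bonferroni's control: the pattern's raw p value (about 0.035) would pass at α = 0.05 but fails the corrected level α/m, which motivates reducing the number of tested patterns as HUSP-Mining does.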

Fig. 14

The effect of the significance level on the number of patterns

Fig. 15

The p value distribution by HUSP-Mining-Test

8 Conclusions and Limitations

A large portion of the results of traditional HUI mining algorithms may lack statistical significance; that is, many HUIs occur just by chance. In this paper, to mine high utility and significant patterns, we propose a novel SSPM algorithm called HUSP-Mining-Test, which integrates multiple hypothesis test corrections into the task of mining HUSPs with length and support constraints. The first step, HUSP-Mining, removes redundant HUI candidates, and the second step, HUSP-Test, discards patterns with low utility; the algorithm yields satisfactory results with regard to pattern number, running time, memory, and pattern utility. The FWER of the generated HUSPs is controlled by the predefined test threshold. Experimental results on several real-world datasets demonstrate the effectiveness of the new algorithm, which returns HUSPs more efficiently than the standard SSPM method with Bonferroni's correction.

It must be admitted that our method still has some shortcomings and limitations. First, we do not consider the order of items within transactions, although high utility itemset mining is often accompanied by sequence information. Second, Fisher's test is not applicable to every type of transaction dataset; when the sample size is small, it may not yield an accurate p value. Third, on datasets with long transactions, our algorithm still needs considerable time to produce results. We would like to extend our approach to such settings and interestingness measures in the future.