Incremental high average-utility itemset mining: survey and challenges

The High Average Utility Itemset Mining (HAUIM) technique, a variation of High Utility Itemset Mining (HUIM), uses the average utility of itemsets. Historically, most HAUIM algorithms were designed for static databases. However, practical applications like market basket analysis and business decision-making require the database to be updated regularly with new transactions. As a result, researchers have developed incremental HAUIM (iHAUIM) algorithms to identify HAUIs in a dynamically updated database. In contrast to conventional methods that restart from scratch, iHAUIM algorithms process updates incrementally, thereby reducing the cost of discovery. This paper provides a comprehensive review of the state-of-the-art iHAUIM algorithms, analyzing their unique characteristics and advantages. First, we explain the concept of iHAUIM, providing formulas and real-world examples for a more in-depth understanding. Subsequently, we categorize and discuss the key techniques used by the various types of iHAUIM algorithms, encompassing Apriori-based, tree-based, and utility-list-based approaches. Moreover, we critically analyze the advantages and disadvantages of each mining method. In conclusion, we explore potential future directions, research opportunities, and various extensions of the iHAUIM algorithm.

The structure of this article is as follows: "Preliminaries and problem statement of iHAUIM" section provides an overview of the fundamental concepts and definitions related to iHAUIM. "State-of-the-art algorithms for iHAUIM" section classifies and explains iHAUIM approaches for dynamic datasets, evaluating their advantages and disadvantages. "Summary and discussion" section presents a thorough overview and evaluation of the latest iHAUIM techniques and highlights potential research directions and opportunities for future advancements in iHAUIM. Lastly, "Conclusion" section concludes the survey, summarizing the key findings and contributions of the article.

Preliminaries and problem statement of iHAUIM
In this section, we lay the foundation by presenting essential preliminaries and a formal definition of the iHAUIM problem. We also introduce the symbols used throughout the rest of this paper, as shown in Table 1; these symbols are explained in the subsequent sections. Examples of the original database and the item utility table are presented as Table 2 and Table 3, respectively. The original database comprises five transactions, each identified by a transaction identifier (TID) and containing non-redundant items. The internal utility of each item is specified after a colon. Table 3 displays the seven items present in the original database, represented as I = {a, b, c, d, e, f, g}, together with the external utility of each item.

Table 1. Notation used throughout this paper.

I — A set of m items, I = {i1, i2, …, im}, where each item ij has a profit value pj
DB — An original quantitative database, DB = {T1, T2, …, Tn}, in which each transaction is a subset of I, with purchase quantities for each item
DBn+ — A set of new transactions, DBn+ = {t1, t2, …, tq}, in which each transaction includes a subset of items, with purchase quantities
TID — Each transaction Tn ∈ DB has a unique transaction identifier (TID)
X — A k-itemset containing k distinct items {i1, i2, …, ik}
u(ij, Tp) — The utility of an item ij in a transaction Tp
u(Tp) — The sum of the utilities of the items in a transaction Tp
tuDB — The total utility of a database DB
au(X, Tp) — The average utility of X in Tp
au(X) — The average utility of X in DB
HAUI — High average-utility itemset
mu(Tp) — The maximum utility of transaction Tp
auub(ij) — The average-utility upper bound (AUUB) of item ij
HAUUBIDB — High average-utility upper-bound itemset in DB
PAUUBIDB — Pre-large average-utility upper-bound itemset in DB
HAUIUDB — An itemset X classified as an HAUI in the updated (DB + DBn+) database

Table 2. An example database (DB). Table 3. Unit profits of items. The minimum high average-utility upper-bound threshold δ and the lower-bound threshold δL are set based on the user's preference. Below are commonly used definitions for incremental high average-utility pattern mining 44,59,60, the sliding window model 47,49,61, and the damped window model 52,55, derived from the original database and item utility table above.
Definition 1 Item utility 62. The utility of an item ij in a transaction Tp is represented as u(ij, Tp) and is computed as the product of its internal utility in transaction Tp, denoted iu(ij, Tp) 62, and its external utility eu(ij): u(ij, Tp) = iu(ij, Tp) × eu(ij).
For instance, in Table 2, the item utility of 'a' in T 1 is calculated as u(a, T 1 ) = 3 × 4 = 12.
Definition 2 Transaction utility 63. The transaction utility of Tp, denoted u(Tp), is the sum of the utilities of all items in Tp 52: u(Tp) = Σij∈Tp u(ij, Tp).
Definition 3 Total utility 64. The total utility tuDB of a database DB is the sum of the transaction utilities of all transactions in DB: tuDB = ΣTp∈DB u(Tp). As an example, the total utility of the database in Table 2 is computed as tuDB = 33 + 69 + 40 + 41 + 49 = 232.

Definition 4 Average utility 62. The average utility of an itemset X in a transaction Tp, denoted au(X, Tp), is calculated by dividing the sum of the utilities of the items of X in Tp by the length of X 61, |X|: au(X, Tp) = (Σij∈X u(ij, Tp)) / |X|.

Definition 5 Itemset average utility. The average utility of X in the database, au(X), is determined by summing the average utilities of X over all transactions of DB that contain X 61: au(X) = ΣTp⊇X au(X, Tp).
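Definitions 1-5 can be sketched directly in code. The toy database below is illustrative only and does not reproduce the full contents of Tables 2 and 3; it is chosen so that u(a, T1) = 3 × 4 = 12 matches the example in the text.

```python
# Minimal sketch of Definitions 1-5 (item utility, transaction utility,
# total utility, average utility). Toy data, not the paper's Tables 2-3.

profit = {'a': 4, 'b': 1, 'c': 5}            # external utility eu(i)
db = {                                        # TID -> {item: internal utility iu(i, T)}
    'T1': {'a': 3, 'c': 2},
    'T2': {'a': 1, 'b': 5, 'c': 4},
}

def u(item, tid):
    """Definition 1: u(i, T) = iu(i, T) * eu(i)."""
    return db[tid][item] * profit[item]

def tu(tid):
    """Definition 2: transaction utility = sum of item utilities in T."""
    return sum(u(i, tid) for i in db[tid])

def total_utility():
    """Definition 3: tuDB = sum of transaction utilities over DB."""
    return sum(tu(t) for t in db)

def au(itemset, tid):
    """Definition 4: average utility of X in T = sum of utilities / |X|."""
    if not set(itemset) <= set(db[tid]):
        return 0
    return sum(u(i, tid) for i in itemset) / len(itemset)

def au_db(itemset):
    """Definition 5: au(X) = sum of au(X, T) over transactions containing X."""
    return sum(au(itemset, t) for t in db)

print(u('a', 'T1'))   # 3 * 4 = 12, matching the text's example
```

With this toy data, au(('a', 'c')) sums the per-transaction averages (12 + 10)/2 and (4 + 20)/2, mirroring how au(a, c) = 19.5 is obtained in the paper's larger example.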

Definition 6
HAUI. An itemset X is categorized as a HAUI if its average utility satisfies 65: au(X) ≥ tuDB × δ. For example, if δ is 8%, the itemset {a, c} is a HAUI since au(a, c) = 19.5 ≥ 232 × 0.08 = 18.56.

Definition 7 Maximum utility 66. The maximum utility of a transaction Tp is defined as mu(Tp) = max{u(ij, Tp) | ij ∈ Tp}.
According to the downward closure property of the AUUB 46, if an itemset Y is a superset of an itemset X 46, denoted Y ⊇ X, then formula (9), auub(Y)DB ≤ auub(X)DB, holds. Hence, if auub(X)DB < tuDB × δ, then auub(Y)DB ≤ auub(X)DB < tuDB × δ is satisfied for any superset Y of X 46, so all supersets of X can be safely pruned.
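The AUUB upper bound and its pruning rule can be sketched as follows; the toy data and the threshold value are illustrative, not taken from the paper's tables.

```python
# Hedged sketch of mu(T), auub(X), and the AUUB downward-closure
# pruning rule. Toy database; names and threshold are illustrative.

profit = {'a': 4, 'b': 1, 'c': 5}
db = {'T1': {'a': 3, 'c': 2}, 'T2': {'a': 1, 'b': 5, 'c': 4}}

def mu(tid):
    """Maximum utility of a transaction: max over items of iu * eu."""
    return max(q * profit[i] for i, q in db[tid].items())

def auub(itemset):
    """auub(X): sum of mu(T) over transactions containing all of X."""
    return sum(mu(t) for t in db if set(itemset) <= set(db[t]))

delta = 0.5
tu_db = sum(sum(q * profit[i] for i, q in items.items())
            for items in db.values())
threshold = tu_db * delta

# Downward closure: if auub(X) < threshold, every superset Y of X
# satisfies auub(Y) <= auub(X) < threshold, so Y can be pruned.
X = ('b',)
if auub(X) < threshold:
    print('prune all supersets of', X)
```

Here mu(T1) = max(12, 10) = 12 and mu(T2) = 20, so auub({b}) = 20 falls below the threshold of 25.5 and every superset of {b} is discarded without being generated.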

Definition 12
The condition of HAUIUDB. In the updated database (DB + DBn+) of Table 4, an itemset X qualifies as a HAUIUDB if it satisfies the following condition 46: au(X)UDB ≥ (tuDB + tuDBn+) × δ, where au(X)UDB denotes the new average utility of X, tuDB and tuDBn+ are the total utilities of DB and DBn+, respectively, and δ is the upper-bound utility threshold 46.

State-of-the-art algorithms for iHAUIM
In recent years, a considerable number of iHAUIM (incremental High Average Utility Itemset Mining) techniques have been developed to handle dynamic databases involving transaction insertions. So far, a total of 19 iHAUIM algorithms have been proposed, as shown in Fig. 1; they can be classified into three main categories: Apriori-based, tree-based, and utility-list-based methodologies. In the upcoming sections, we evaluate the strengths and weaknesses of each algorithm, as summarized in Table 5, with the primary aim of mining itemsets that exhibit high average utility as transactions are updated.
Traditional HAUIM algorithms are only applicable to static datasets. However, when the dataset undergoes record updates, static techniques must reprocess all the data from scratch to extract the HAUIs, which results in high time and memory consumption.

Apriori-based iHAUIM
Based on the Fast UPdate (FUP) concept 40, the TPAU 67 algorithm discovers HAUIs from dynamic datasets that change with the insertion of new records. FUP records the previously frequent large itemsets and their counts for use in the maintenance process. When new transactions are added, the FUP-based algorithm generates candidate 1-itemsets. Subsequently, the candidate itemsets are compared with the previous itemsets in order to classify them into the following four cases. Case 1: the itemset is large in both the original database and the newly added transactions, so it remains large in both. Case 2: the itemset is large in the original database but not in the newly inserted transactions. Case 3: the itemset is not large in the original database but is large in the newly inserted transactions. Case 4: the itemset is not large in either the original database or the newly inserted transactions.
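The four-case classification above reduces to a simple decision on two booleans. The helper below is an illustrative sketch of that classification, not TPAU's actual implementation.

```python
# Hedged sketch of the FUP (Fast UPdate) four-case classification:
# an itemset is tested for "large" status in the original database
# and in the newly inserted transactions.

def fup_case(large_in_db: bool, large_in_new: bool) -> int:
    """Return the FUP case number (1-4) described in the text."""
    if large_in_db and large_in_new:
        return 1   # large in both: stays large
    if large_in_db:
        return 2   # large only in the original database
    if large_in_new:
        return 3   # large only in the new transactions
    return 4       # small in both: cannot become large

print(fup_case(True, False))   # case 2
```

Only cases 2 and 3 can change an itemset's status, which is why FUP-style maintenance needs to re-examine far fewer itemsets than a full re-mining pass.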
The suggested algorithm adopts an Apriori-like approach to systematically explore the levels of HAUIs. To optimize the search process, it employs early pruning techniques to discard low-utility itemsets. The algorithm leverages the downward closure property in a two-stage process, enabling it to generate a reduced set of candidates at each level. In the first stage, an overestimated set of itemsets is obtained using the average-utility upper bound. In the second stage, the actual average utility of each remaining candidate is computed. Through these steps, the algorithm efficiently extracts HAUIs from incremental transaction datasets, enhancing its mining capabilities.
The M-TP 59 algorithm proposes a two-stage maintenance method for record modification, aimed at mining HAUIs from updated datasets. First, the approach calculates the count difference by comparing the AUUB (average-utility upper bound) of each modified itemset before and after modification. Then, the modified itemsets are divided into four parts based on their characteristics: whether they are HAUUBIs (high average-utility upper-bound itemsets) in the original dataset, and whether their count difference in the modified records is positive or negative (including zero). Each part is then subjected to its specific processing approach. The M-TP algorithm reduces the time required to reprocess the entire updated dataset. The itemsets are large in the original dataset in the first two cases and small in the last two; conversely, the count difference in the modified records is positive in the first and third cases and negative or zero in the second and fourth. Lan et al. 59 illustrate these four cases of record modification in Fig. 2.
In contrast to conventional approaches, the algorithm 59 reduces the time required to update the entire dataset. In terms of runtime, the M-TP algorithm demonstrates superior performance to the batch TP algorithm across different minimum average-utility thresholds 68.
The PRE-HAUI-DEL algorithm 69 is proposed to handle transaction deletions in dynamic databases using the pre-large concept on HAUIM. The pre-large concept acts as a buffer on HAUIM to reduce the number of database scans, particularly during transaction deletions; its overview is illustrated in Fig. 3. Additionally, two upper bounds are established in the algorithm to prune unpromising candidates early, which reduces computation cost. Compared to Apriori-like models, PRE-HAUI-DEL excels at efficiently mining high average-utility itemsets in updated databases. In addition, the developed algorithm uses the LPUB upper-bound model, which can significantly reduce the number of candidate itemsets that need to be checked in the search space. Compared to the general model that updates discovered knowledge in batch processing mode, PRE-HAUI-DEL can effectively maintain the discovered HAUIs without multiple database scans, as illustrated in Figs. 4 and 5. This not only reduces computational cost but also maintains the knowledge about HAUIs correctly and completely.
The article 65 introduces the APHAUI algorithm, an Apriori-based HAUP (high average-utility pattern) algorithm capable of effectively mining HAUIs from dynamic datasets. The algorithm follows an Apriori-like approach 23 and employs the pre-large concept 56 to reduce the search space and proactively prune less promising candidates, revealing promising itemsets during maintenance. The final results of cases 1, 5, 6, 8, and 9 remain unaffected. Moreover, the amount of information discovered in cases 2 and 3 can only shrink, while new information might emerge in cases 4 and 7. As shown in Fig. 6, the pre-large concept can easily handle itemsets in cases 2, 3, and 4. The authors devised two upper bounds, namely the partial upper bound (pub) and the lower pub (lpub), to enhance the efficiency of the mining process. The pub serves as a stringent upper limit that reduces the size and upper utility bound of promising itemsets. A high-pub itemset (pubi), with utility greater than the pub threshold, was developed.
Furthermore, the algorithm introduces a subset named lpubi (lead-pubi) as a part of pubi, capable of further reducing the candidate itemsets for subsequent mining. Although the algorithm generates both pubi and lpubi itemsets, the applicability of lpubi is more constrained than that of pubi; the lead-pubi contributes to reducing the number of candidates. Additionally, a formula is employed to avoid unnecessary dataset scans. Lastly, the introduction of a linked list ensures that each transaction is scanned at most once, thereby minimizing the number of dataset scans during the update process.
The algorithm begins by scanning the input dataset, followed by the dynamic processing flow of the APHAUI method. By employing a designed re-scanning threshold, it can automatically determine the update pace of the incremental dataset, enhancing mining efficiency. During execution, two upper bounds, pub and lead-pub, along with two itemset collections, pubi and lead-pubi, are used to reveal the complete set of HAUIs within the transaction dataset. The algorithm not only demonstrates strong performance but also holds significant potential in real-time scenarios. Previous HAUIM algorithms processed dynamic datasets in batch mode; as a result, the APHAUIM 46 incurred costs in terms of past computations and the rediscovery of pattern information. To address this issue, the FUP (Fast UPdate) concept was introduced 40 for incremental pattern discovery and storage of pattern information; however, it still requires rescanning the dataset to acquire the latest information. In 70, a new model called Apriori-based Potential High Utility Itemset Mining (APHAUIM) is proposed, which effectively reveals potential high-utility patterns from uncertain databases in the industrial IoT by maintaining two itemset collections (phps and plhps) using two tight upper-bound values (pub and lead-pub), while ensuring the completeness and correctness of the mining results.
Based on the pre-large concept 56,58 and the Apriori method 23, a new algorithm called APHAUIM is introduced to mine HAUIs from incremental transaction datasets. PAUBI is introduced to retain promising HAUBIs; it acts as a buffer to minimize the rescans needed to check whether a small itemset evolves into a large one. An overview of the pre-large concept is depicted in Fig. 6.
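The pre-large buffering idea above rests on two thresholds: itemsets between the lower and upper thresholds are buffered rather than discarded, so they can be promoted later without a rescan. The helper below is an illustrative sketch of that classification only, not the exact APHAUIM procedure.

```python
# Hedged sketch of the pre-large classification: an itemset's measure is
# compared against an upper and a lower threshold (fractions of the
# total utility). The function name and buckets are illustrative.

def classify(measure: float, total: float,
             upper: float, lower: float) -> str:
    """Bucket an itemset by its measure relative to the two thresholds."""
    if measure >= total * upper:
        return 'large'        # kept in the result set
    if measure >= total * lower:
        return 'pre-large'    # buffered; may become large after updates
    return 'small'            # discarded

print(classify(20.0, 232.0, 0.08, 0.05))   # 20 >= 232*0.08 = 18.56 -> 'large'
```

Because only 'pre-large' itemsets can cross the upper threshold after a modest number of insertions, a full rescan is needed only when the accumulated updates exceed a safety bound derived from the gap between the two thresholds.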
Compared with the benchmark FUP-based HAUIM algorithm 67, the designed algorithm is better suited for streaming environments on dynamic datasets. However, a limitation is that, like the benchmark method, the proposed algorithm also incurs a considerable amount of rescanning time, because locating itemsets in the buffer during the insertion-update process requires additional time. Therefore, selecting appropriate thresholds is a topic of significant importance.

Tree-based iHAUIM
Yun et al. 61 introduce an effective algorithm named SHAU for analyzing time-sensitive data. The algorithm processes data streams with a sliding-window-based HAUPM approach, which considers only recent data during the pattern mining process. As the algorithm is based on the concept of sliding windows [71-74], it divides the data stream into multiple blocks or batches. The concept of sliding windows over data streams for this task was initially proposed by Yun et al. 61.
The SHAU algorithm employs a novel SHAU tree structure. Each node in this tree consists of three elements: the first stores the tid that includes the item, the second stores the recent auub information of the data stream batch by batch, and the third is a link pointing to another node with the same tid. The auub values of the different items in the data stream are stored in the header table of the SHAU tree. Additionally, the efficiency of SHAU is enhanced by a new strategy called RUG.
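The three-field node described above can be sketched as a small data class; the field names and the window-sliding step below are illustrative assumptions, not the paper's actual structure.

```python
# Hedged sketch of an SHAU-tree-style node: an identifier, per-batch
# auub information for the sliding window, and a link to the next node
# with the same identifier. Field names are illustrative.

from dataclasses import dataclass, field

@dataclass
class ShauNode:
    tid: str                                        # identifier stored in the node
    batch_auub: list = field(default_factory=list)  # recent auub, batch by batch
    next_same: 'ShauNode | None' = None             # link to next node with same tid

# Sliding the window: the oldest batch's auub leaves, the newest enters.
node = ShauNode('a', batch_auub=[12, 20])
node.batch_auub.pop(0)        # oldest batch drops out of the window
node.batch_auub.append(7)     # new batch arrives
print(node.batch_auub)        # [20, 7]
```

Keeping auub per batch is what lets the window slide cheaply: expiring a batch only removes its contribution instead of forcing the tree to be rebuilt from scratch.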
The EHAUI-tree algorithm 75 is proposed as an improved iteration of the HAUI-tree algorithm 76. Its primary objective is to enhance mining efficiency and reduce memory consumption. The algorithm mines by processing newly added transactions instead of restarting from the whole dataset. It utilizes the downward closure property and employs an index table structure, which enhances computational efficiency while reducing memory requirements. In addition, the algorithm introduces a bit-array structure to compute utility values more efficiently. However, the algorithm performs poorly on large datasets or small thresholds.
In 45, a new approach called IHAUPM is proposed for handling frequent transaction insertions in updated datasets. The algorithm leverages an adapted FUP concept to efficiently integrate prior information and update the results when new information is discovered during updates. The newly inserted transactions are categorized into four distinct cases, considering the occurrence frequency in the original dataset and in the newly inserted transactions. This categorization ensures effective handling of different scenarios and minimizes repetition during the updating process. If an itemset is a HAUUBI in both the original dataset and the new insertions, it remains a HAUUBI; if it is a HAUUBI in neither, it remains non-HAUUBI.
For the cases where it is necessary to determine whether an itemset is actually a HAUUBI based on existing information, or to rescan the original dataset, the algorithm employs a compressed HAUP-tree data structure to store and reuse the required information. This approach requires minimal scanning of the original dataset and is highly efficient, while preserving the count of prefix items processed in each node of the tree. The article 60 proposes an algorithm called IMHAUI, which is based on the IHAUI-tree and uses node sharing to preserve the information of the incremental dataset, thereby addressing the problem that adding new data may push itemsets above or below the minimum support threshold. Each time new data is added, the node-sharing structure is reconstructed. To achieve this, transactions within the dataset are sorted in descending order of their AUUB values. During reconstruction, each path is rearranged in decreasing order of the optimal AUUB value, and a path adjustment technique is utilized to maintain compactness 77. Additionally, the algorithm preserves the AUUB value of each itemset in a header table. Subsequently, the mined tree is traversed to access candidate itemsets, and their actual average utility is computed during the candidate-validation phase.
The FIMHAUI algorithm, based on the mIHAUI-tree, addresses the time-consuming candidate-itemset generation and the expanding search space caused by the upper-bound determination in IMHAUI 60. The algorithm performs a single scan of the dataset to extract the information needed for HAUIs. It stores transaction information in each node of the IHAUI-tree that completely overlaps with the path from the root to that node, and thus saves only the necessary information in the leaf nodes of the mIHAUI-tree. Initially, all transactions are inserted into an empty mIHAUI-tree sequentially in alphabetical order. Subsequently, the path adjustment method proposed in 60 adjusts the paths to enhance the node-sharing efficiency within the mIHAUI-tree. The algorithm uses dataset projection and merge techniques to find itemsets efficiently: the mIHAUI-tree directly yields the projected dataset for candidate itemsets, eliminating the need to generate conditional patterns and local trees, and a transaction merge technique identifies identical transactions in dictionary order within one scan. In contrast to the IHAUI-tree, the proposed algorithm offers not only time savings but also a reduction in repetition. However, its performance is unsatisfactory on large datasets or small thresholds.
In 55, the MAMs algorithm was designed to effectively analyze time-sensitive data; it is applicable to data streams and employs an exponential damped-window model together with pattern-growth methods. Furthermore, the algorithm considers the temporal aspect of the data to acquire pertinent and current pattern knowledge. It employs a DAT structure and a TUL to handle dynamic data streams. As new transactions are inserted, the algorithm builds the DAT data structure and incorporates average-utility information. This procedure continues until a user-initiated mining request is encountered, at which point the MPM algorithm follows the pattern-growth approach on the dataset.
The common goal of these algorithms is to enhance the efficiency of data mining, reduce memory consumption, and adapt to the dynamic nature of data. The SHAU algorithm utilizes the sliding-window-based HAUPM approach to process data streams, employing the SHAU tree structure to store itemset information from the data stream and enhancing efficiency through the RUG strategy. The EHAUI-tree algorithm, as an improved version of the HAUI-tree algorithm, and the IHAUPM algorithm introduce new methods for handling frequent transaction insertions in updated datasets. The FIMHAUI algorithm, based on the mIHAUI-tree, addresses the time-consuming generation of candidate itemsets and the expansion of the search space in IMHAUI. These algorithms share a common challenge: they attempt to optimize the mining process through various data structures and strategies to accommodate the dynamic changes and time sensitivity of data. However, they may encounter performance issues when dealing with large datasets or small thresholds, indicating that further optimization and improvement may be necessary in practical applications.

List-based iHAUIM
To address the issue of inadequate performance when maintaining high average-utility itemsets in dynamic environments, Wu et al. proposed an update algorithm 44 that maintains the discovered results under transaction insertion. The proposed algorithm builds upon the AU-list 39 and incorporates the FUP (Fast UPdate) concept 40 to enhance performance. To adapt the results to transaction insertion, the algorithm employs a two-stage approach. In the initial stage, the 1-HAUUBI set is derived from the original dataset, and an AU-list is constructed from it for subsequent processing. In the second stage, the algorithm handles transaction insertion efficiently by dividing the HAUUBI set into four partitions based on the FUP criterion. This partitioning strategy minimizes repetition and enhances efficiency during the updating process. The proposed algorithm 44 distinguishes four cases for handling transaction insertion, as illustrated in Fig. 7.
In each case, the algorithm preserves the HAUUBI set for each partition, with the exception of the unpromising itemsets in case 4, which are excluded from the HAUUBI set during dataset updates since they cannot qualify. This effectively reduces redundancy in the algorithm, as illustrated in Fig. 8. The updateADD and updateDEL methods are used for adding and deleting items in the AU-list structure, respectively. The updateADD function can easily update the auub values of the itemsets based on the AU-list structure, while the updateDEL function can directly remove unpromising itemsets from the AU-list structure after the database has been updated. The AU-list reduces the number of dataset scans and the generation of candidate itemsets. After the dataset is updated, HAUUBIs are added to the AU-list, while non-HAUUBIs are removed from it. The remaining itemsets in the AU-list are then compared against the minimum high average-utility threshold, resulting in the identification of the true HAUIs within the updated dataset. The proposed algorithm efficiently updates the HAUUBIs to discover the actual HAUIs; however, it sometimes needs to evaluate more candidate itemsets.
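The updateADD/updateDEL maintenance described above can be sketched as a tiny AU-list-style class; the structure, method names, and per-entry fields are illustrative assumptions in the spirit of the text, not the cited algorithm's exact layout.

```python
# Hedged sketch of AU-list maintenance under insertion and deletion:
# each entry records, per transaction, the item's utility and the
# transaction's maximum utility, whose sum over entries gives auub.

class AUList:
    def __init__(self, item):
        self.item = item
        self.entries = {}      # tid -> (item utility, transaction max utility)

    def update_add(self, tid, utility, mu):
        """Add an inserted transaction's contribution (updateADD spirit)."""
        self.entries[tid] = (utility, mu)

    def update_del(self, tid):
        """Drop a deleted transaction's contribution (updateDEL spirit)."""
        self.entries.pop(tid, None)

    def auub(self):
        """auub(item) = sum of transaction maximum utilities."""
        return sum(mu for _, mu in self.entries.values())

lst = AUList('a')
lst.update_add('T1', 12, 12)
lst.update_add('T2', 4, 20)
lst.update_del('T1')
print(lst.auub())   # 20
```

Because each update touches only the affected transaction's entry, the auub value can be refreshed without rescanning the database, which is the point of list-based maintenance.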
The FUP-HAUIMI 78 algorithm is a modified version based on the FUP concept 40 for discovering HAUIs from updated datasets. The algorithm consistently preserves and updates the discovered information, eliminating the need to recreate it for each update, and it improves the updating process by avoiding multiple scans of the dataset. The algorithm first constructs the AU-list 39 data structure by scanning the original dataset, effectively storing the information needed for mining candidate patterns and gradually updating the results. All items in inserted transactions are kept in the initial AU-list, and the 1-HAUUBIs are then classified into four categories based on the FUP concept, as described in 45 and illustrated in Fig. 9. Finally, the algorithm can efficiently discover the updated HAUUBIs and HAUIs without generating candidates, as illustrated in Figs. 10, 16 and 17.
After the dataset is updated, the FUP concept is applied to handle transaction insertions 42. Moreover, a depth-first search approach is employed to generate candidate itemsets.
The FUP-HAUIMD algorithm 79 is a data mining method based on the removal of transactions from the original dataset; it utilizes the MFUP (modified FUP) extension 80 of the FUP concept 40. In this algorithm, deleted transactions are categorized into four types, each with distinct implications for identifying HAUUBIs (high average-utility upper-bound itemsets), as illustrated in Fig. 11. In the first category, existing information can be used to determine whether an itemset remains a HAUUBI. In the second category, the itemset continues to be a HAUUBI. The third category can be safely discarded, as it contains only non-HAUUBIs. For the fourth category, a complete rescan of the original dataset is necessary. The auub value of each HAUUBI is stored in an AU-list 39, and the AU-list is updated every time data is removed. Mining the enumeration tree allows the true HAUIs to be evaluated without multiple scans of the dataset, as illustrated in Figs. 12 and 13. Initially, Algorithm 4 scans the database to identify items from the recently added transactions and creates their AUL structures. Subsequently, the AUL structures from the initial database and the added transactions are combined. Upon merging the AUL structures, if the average utility of an itemset surpasses the revised minimum average-utility count, it qualifies as a high average-utility itemset. Its supersets are then explored through a depth-first search based on the enumeration tree, and this process continues recursively until no additional tree nodes are generated. The average utility of the chosen itemsets is then computed, the revised patterns are derived, and the algorithm concludes.
Algorithm 5 commences by examining the removed transactions to build the AU-lists of 1-itemsets. Utilizing these removed transactions, the AU-lists of the original database are then modified, yielding the revised AU-lists. Following this, Algorithm 6 is applied recursively, merging the AU-lists of k-itemsets through a depth-first search strategy based on the enumeration-tree structure. If an itemset satisfies the criteria, it is designated an HAUI; otherwise, the auub value of the itemset is compared against the updated minimum high-utility count to decide whether its supersets should be explored. Additional details on the construction function are provided in reference 39. After the revised AU-lists are retrieved, if the average utility of an itemset equals or exceeds the minimum high-utility count, it is identified as an HAUI. Ultimately, the algorithm outputs the updated results and concludes.
By default, Algorithm 7 initializes the buffer (buf) to 0 in the first iteration. Next, it computes the safety bound (f) and the total utility of d. AUL structures for all 1-itemsets in d are then generated to guarantee the accuracy and completeness of the resulting HAUIs. This approach is reasonable because, in practice, the number of transactions in d is typically small compared to the original database D. The AUL structures from D and d are then merged through a sub-routine, and the total utility of the combined databases is calculated. The updated AUL structures are maintained, and if the auub value of an itemset X is not below the utility threshold, X is examined as a potential HAUI. The supersets of X are then evaluated for scanning through the recursive PRE-HAUIMI procedure. The list of HAUIs is updated, with the PHAUIs serving as the buffer, while the AUL structures are refreshed for subsequent maintenance.
Aiming to mine HAUIs while simultaneously reducing the search space and the number of database scans, the MHAUIPNU algorithm 81 works on databases with both positive and negative utilities. It introduces a novel, tighter upper-bound model named TUBPN, alongside a list data structure that stores the information required for mining HAUIs. Furthermore, three new pruning strategies are proposed to further enhance the algorithm's performance: the first is based on properties derived from the TUBPN model, while the other two leverage properties of items (or itemsets) with negative utilities.
The paper 65 proposes an algorithm called PRE-HAUIMI (high average-utility itemset mining with the pre-large itemset concept), which efficiently mines HAUIs from datasets updated with transaction insertions. The algorithm utilizes the pre-large itemset concept to effectively discover HAUIs and maintains an average-utility list (AUL) structure, which ensures that each transaction is scanned at most once during the maintenance process, as illustrated in Figs. 14, 15, 16 and 17. In 63, the paper introduces an efficient algorithm called LIMHAUP, which requires only a single scan of the dataset to extract HAUPs from the updated dataset, thereby avoiding the cost of multiple dataset scans. Additionally, a new structure named the HAUP-list is introduced, which stores pattern information compactly and eliminates the need for candidate patterns. The algorithm constructs the HAUP-list through a single dataset scan and eliminates numerous irrelevant patterns, reducing execution time and memory consumption during mining. Initially, all HAUP-lists are rearranged in real time in ascending item order, aiming to shrink the search space; a reorganization process then rebuilds the HAUP-list with an effective sorting order. Ultimately, the algorithm effectively handles new insertions in the incremental dataset.
Unpromising patterns are not removed from the global HAUI list, as they might become HAUPs in a dynamic dataset; this is because the upper-bound pruning strategy can overestimate the average utility. Therefore, an additional pruning strategy called MAU 82 is required to better reduce unpromising patterns; MAU rigorously restricts the mining of extended patterns. The proposed algorithm demonstrates superior performance in memory consumption, runtime, and scalability compared to the baseline algorithm.
The DMAUP algorithm 52 utilizes a damping window framework to extract time-sensitive high average utility patterns from incremental databases. This method effectively extracts the latest patterns thanks to its use of damping factors, which adjust item utility values based on their arrival time. Furthermore, to efficiently identify the latest patterns, the method introduces new data structures known as the dA-List, MU, and dUB tables. For incremental data streams, the dA-List undergoes a rebuilding process to incorporate newly added data. Moreover, the mining algorithm employs two pruning techniques, namely the damping upper bound and the damping maximum average utility, in compliance with the elastic properties of the damping window model. By following these steps, the method effectively extracts the most recent patterns. The SHAUPM algorithm 49 utilizes a newly developed list structure, the SHAUP list, to gather information on recent batches. By deleting the oldest batch and introducing a new one after completing the mining process of the current window, the algorithm effectively addresses the most recent stream data. The approach extracts valuable and trustworthy pattern results while considering the length of patterns in unbounded data streams. To optimize performance, a new pruning strategy reduces the search space by lowering the upper bound with residual utility. Prior algorithms generated numerous candidate patterns and suffered performance degradation when computing the actual average utility; in contrast, SHAUPM uses a list structure to store the actual utility information of patterns. Experimental results show that SHAUPM is superior in runtime, memory usage, and scalability on both real and synthetic datasets compared to the latest algorithms.
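To make the damping idea concrete, here is a minimal sketch in which each batch's utilities are weighted by a factor that decays with the batch's age, so older data contributes less to the totals. The exponential decay form and the factor `d = 0.5` are assumptions for illustration, not DMAUP's exact definition.

```python
# Hedged sketch of the damping-window idea: item utilities are
# multiplied by a damping factor that shrinks with the age of their
# batch. The decay form (d ** age) and d's value are illustrative.

def damped_utility(batches, d=0.5):
    """batches[0] is the oldest batch; each batch maps item -> utility.
    Returns the damped total utility per item at the current time."""
    totals = {}
    newest = len(batches) - 1
    for t, batch in enumerate(batches):
        weight = d ** (newest - t)       # newest batch gets weight 1
        for item, util in batch.items():
            totals[item] = totals.get(item, 0.0) + weight * util
    return totals

batches = [{"a": 8}, {"a": 4}, {"a": 2}]  # oldest -> newest
print(damped_utility(batches))  # 0.25*8 + 0.5*4 + 1*2 = {'a': 6.0}
```

When a new batch arrives, every existing weight is implicitly multiplied by `d` again, which is why damping-window algorithms can maintain totals incrementally instead of recomputing from scratch.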

Indexed list based iHAUIM
In the realm of mining high average utility patterns, multiple algorithms have been developed for incremental environments. Nevertheless, tree-based algorithms produce potential patterns that must be validated through additional database scans. List-based algorithms, by contrast, do not generate potential patterns but require numerous comparison operations to identify shared transaction entries with identical identifiers throughout the mining process. These limitations hinder algorithms that aim to deliver result patterns quickly. Indexed list structures 84,85 effectively mitigate these shortcomings and have demonstrated superior efficiency compared to tree and list structures in mining high utility patterns.
A novel method for enhancing the efficiency of existing average-utility-driven methods is introduced in the literature as IIMHAUP 86 (indexed-list-based incremental mining of high average utility patterns). This approach designs a structured list index to facilitate the mining of high average utility patterns in incremental databases. The IIMHAUP algorithm uses three key subroutines to efficiently discover resultant patterns from the initial database ODB.
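The efficiency argument for indexed lists can be illustrated generically: joining two utility lists by merging requires a transaction-id comparison at every step, whereas keying each list by transaction id replaces those comparisons with hash lookups. The following is a sketch of that general idea, not the exact IIMHAUP index structure.

```python
# Why an indexed list helps: finding the transactions shared by two
# itemsets normally means comparing list entries pairwise; an index
# keyed by transaction id turns that into direct lookups.

def join_plain(list_x, list_y):
    """Merge-style join over (tid, utility) pairs sorted by tid --
    every step performs a tid comparison."""
    out, i, j = [], 0, 0
    while i < len(list_x) and j < len(list_y):
        if list_x[i][0] == list_y[j][0]:
            out.append((list_x[i][0], list_x[i][1] + list_y[j][1]))
            i += 1
            j += 1
        elif list_x[i][0] < list_y[j][0]:
            i += 1
        else:
            j += 1
    return out

def join_indexed(index_x, index_y):
    """Same join when each list is a dict keyed by tid: shared
    transactions are found by hashing, not pairwise comparison."""
    return {tid: u + index_y[tid]
            for tid, u in index_x.items() if tid in index_y}

lx = [(0, 4), (2, 1), (5, 3)]
ly = [(2, 2), (3, 6), (5, 1)]
print(join_plain(lx, ly))                # [(2, 3), (5, 4)]
print(join_indexed(dict(lx), dict(ly)))  # {2: 3, 5: 4}
```

Since list joins dominate the cost of growing candidate itemsets, removing the comparison overhead at each join compounds across the whole search.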

Categories of iHAUIM
The previous section provided an overview of three primary categories of iHAUIM algorithms: those utilizing the Apriori algorithm 46,59,65,67,69, those using tree structures 45,55,60,75, and those relying on utility lists 44,49,52,63,65,78,79,81,83. These algorithms differ in six key ways: (a) the number of scans of the original database; (b) the strategy for updating and maintaining high average utility itemsets when data changes dynamically; (c) the method for searching for HAUIs; (d) the type of upper-bound strategy used to reduce candidate itemsets; (e) the type of data structure used to maintain transaction and itemset information (tree-based or utility-list-based); and (f) the pruning strategies used to reduce the search space and speed up mining.
Tables 6 and 7 summarize these characteristics for the 19 algorithms discussed, noting that not all have been comprehensively studied in the literature. Moving forward, we delve deeper into these iHAUIM algorithms, analyzing and discussing them in terms of runtime and memory consumption.

Runtime, memory consumption and scalability
The performance of various itemset mining algorithms has been evaluated, including APITPAU by Hong et al. 67 and SHAU by Yun et al. 61, which utilize tree structures, as well as IHAUPM by Lin et al. 45, FUP-HAUIMI by Zhang et al. 78, and LIMHAUP by Kim et al. 63, which use utility lists. The results indicate that utility-list-based algorithms exhibit superior performance compared to Apriori-based methods. Each iHAUIM algorithm has its own limitations, which have been analyzed. Both utility-list-based and tree-structure-based approaches can reduce the number of candidate itemsets generated and the transactions scanned during maintenance. The use of the pre-large concept strategy has been found to be more effective than the FUP concept strategy, based on experimental results obtained from the FUP-based algorithm of Wu et al. 44 and PRE-HAUIMI by Lin et al. 65. Lastly, sliding windows and pruning techniques have been shown to improve runtime, based on the experimental results of LIMHAUP by Kim et al. 63 and SHAUPM by Lee et al. 49.

Challenges and future directions
Despite the effectiveness of existing methods, many future directions remain to be explored. The following are some crucial research opportunities associated with the iHAUIM algorithm.

Enhancing the effectiveness of the algorithms
iHAUIM algorithms can be time-consuming and memory-intensive during execution, which raises concerns for real-time dynamic database updates. Even though current incremental high-utility mining algorithms are faster than their predecessors, there is still room for improvement. For example, more compact data structures, such as trees or lists, and more efficient pruning strategies could be developed for mining methods.

Handling the complex dynamic data
Real-life data is highly dynamic, comprising vast and complex datasets used across various fields. Although the principle of incremental mining is straightforward, integrating it into the design of data mining algorithms is complicated. Mining dynamic data environments is far more challenging than analyzing static data.

Analyzing the massive amounts of data
Incremental mining of big databases incurs higher computational costs and memory consumption. Nonetheless, in the era of big data, processing data step by step and reusing earlier analysis results is indispensable.
Research opportunities exist for iHAUIM to process large databases, such as designing parallelized iHAUIM algorithms.

In the experiments, the runtime of five algorithms was assessed across various TH values while maintaining a fixed IR (= 1%), as depicted in Fig. 18. It is clear that the designed PRE-HAUIMI algorithm outperforms the other two algorithms across the six datasets.
As the TH value increases, the runtime of the five algorithms decreases. This is reasonable because, as TH increases, fewer HAUIs are found, so the five algorithms require less runtime. In addition, for some datasets, such as those in Fig. 19a,c,f, the designed PRE-HAUIMI algorithm remains stable across various TH values. HAUI-Miner represents the most advanced algorithm for mining HAUIs using the auub model, while IHAUPM is the most advanced tree-based algorithm for incremental HAUIM. Consequently, the designed PRE-HAUIMI, FUP-HAUIMI, and FUP-based algorithms exhibit strong performance when handling dynamic databases with transaction insertions. The efficiency of the AUL (Average Utility List) structure facilitates streamlined calculation and retrieval of the required HAUIs. Experimental evaluations were conducted on six datasets with fixed TH (threshold) values and varying IR (insertion ratio) values. Figure 19 presents the results of these experiments, showcasing the comparative performance of the algorithms. As illustrated in Fig. 19, the PRE-HAUIMI algorithm outperforms both the FUP-HAUIMI and FUP-based algorithms, which in turn still outperform the HAUI-Miner and IHAUPM algorithms. The stability of all algorithms, particularly PRE-HAUIMI, is evident as IR increases: performance remains consistent, with PRE-HAUIMI consistently performing best.

Memory usage improvement
We conducted experiments to analyze the memory usage of various algorithms under fixed IR values and different TH values; the results are depicted in Fig. 20. Notably, the HAUI-Miner algorithm demonstrates superior memory usage across datasets (Fig. 20a,c,e). This can be attributed to the utility list structure in HAUI-Miner, which efficiently compresses and maintains the discovered information. As a result, it usually demands less memory than the IHAUPM algorithm, which uses a tree structure for incremental maintenance. Moreover, HAUI-Miner does not need to hold extra information for maintenance purposes. Instead, when the database size changes, the algorithm rescans the database to acquire updated information, incurring additional computational cost but requiring less memory.
A parallel evaluation with fixed TH values and varying IR values is shown in Fig. 21. HAUI-Miner again exhibits the lowest memory usage on datasets 21a, c, and e, for the same reasons: its utility list structure efficiently compresses the discovered information, it retains no extra maintenance information, and it instead rescans the database when the size changes, trading computational overhead for reduced memory requirements.

Number of patterns
The experiment evaluated the number of candidate patterns generated during the discovery of actual HAUIs. The results for different TH values with fixed IR are presented in Fig. 22. Observing Fig. 22, it is evident that, with the exception of Fig. 22c and d, the proposed PRE-HAUIMI, FUP-HAUIMI, and FUP-based algorithms generate significantly fewer candidate patterns than the HAUI-Miner and IHAUPM algorithms. Notably, the PRE-HAUIMI algorithm produces the fewest candidate patterns.
This discrepancy can be attributed to the dense nature of the T10I4N4KD100K dataset, where many transactions contain the same maintained items. As a result, the proposed PRE-HAUIMI, FUP-HAUIMI, and FUP-based algorithms may require additional checks in the enumeration tree to determine whether a superset needs to be generated. Overall, however, these algorithms still evaluate fewer patterns than the other algorithms. This highlights the effectiveness of the AUL structure and the adapted FUP concept in reducing the incremental mining cost of average utility itemsets. The results for different IR values with fixed TH are depicted in Fig. 23. Similarly, in very sparse and very dense datasets, such as those in Fig. 23c and d, the PRE-HAUIMI, FUP-HAUIMI, and FUP-based algorithms may need to check more candidate patterns. For other datasets, such as those in Fig. 23a,b,e, these algorithms surpass the IHAUPM algorithm and even achieve the best outcomes, as demonstrated in Fig. 23f.
In terms of runtime, the proposed PRE-HAUIMI, FUP-HAUIMI, and FUP-based algorithms outperform the alternative approaches. This can be attributed to the efficiency of the FUP concept and the AUL structure, which enable a significant reduction in runtime. Overall, while the PRE-HAUIMI, FUP-HAUIMI, and FUP-based algorithms require additional memory and may need to check more candidate patterns in certain scenarios, they consistently achieve higher efficiency and effectiveness in the majority of cases. Among them, PRE-HAUIMI performs best, except on very sparse datasets with long transactions or on extremely dense datasets. These findings make evident that numerous directions remain for further enhancing the iHAUIM algorithm to meet the ever-evolving demands of data mining.

Conclusion
This paper presents a detailed summary of different algorithms for the iHAUIM problem. We provide a comprehensive and current analysis of iHAUIM algorithms in dynamic datasets and propose a classification system for existing iHAUIM techniques. We explore various iHAUIM algorithms for modifying datasets in dynamic data settings, streaming data, and sequential datasets, and evaluate the advantages and drawbacks of the most advanced approaches. Additionally, we identify significant areas for future research in incremental high average utility itemset mining.
https://doi.org/10.1038/s41598-024-60279-0

Figure 2. Four cases when records are modified from an existing dataset.

Figure 3. Nine cases of the pre-large concept.

Figure 7. Four cases of the proposed algorithm with transaction insertion.

Figure 9. Four cases of the adapted FUP concept.

Figure 11. Four cases of the designed FUP-HAUIMD algorithm.

Figure 20. The results of memory usage w.r.t. varied thresholds.

Figure 22. Number of candidate patterns for various threshold values.

Table 5. Algorithm advantages. IM: incremental maintenance; DCP: downward closure property; TSA: two-stage algorithm; HDS: handling data streams; PNU: positive and negative utilities; PpP: pre-processing and pruning; TD: transaction deletion; IU: incremental updates.