Dual incremental fuzzy schemes for frequent itemsets discovery in streaming numeric data

Published on Apr 1, 2020in Information Sciences5.524
路 DOI :10.1016/j.ins.2019.11.023
Hui Zheng2
Estimated H-index: 2
Peng Li1
Estimated H-index: 1
(NUPT: Nanjing University of Posts and Telecommunications)
+ 5 AuthorsJing He15
Estimated H-index: 15
(Swinburne University of Technology)
Abstract Discovering frequent itemsets is essential for finding association rules, yet too computational expensive using existing algorithms. It is even more challenging to find frequent itemsets upon streaming numeric data. The streaming characteristic leads to a challenge that streaming numeric data cannot be scanned repetitively. The numeric characteristic requires that streaming numeric data should be pre-processed into itemsets, e.g., fuzzy-set methods can transform numeric data into itemsets with non-integer membership values. This leads to a challenge that the frequency of itemsets are usually not integer. To overcome such challenges, fast methods and stream processing methods have been applied. However, the existing algorithms usually either still need to re-visit some previous data multiple times, or cannot count non-integer frequencies. Those existing algorithms re-visiting some previous data have to sacrifice large memory spaces to cache those previous data to avoid repetitive scanning. When dealing with big streaming data nowadays, such large-memory requirement often goes beyond the capacity of many computers. Those existing algorithms unable to count non-integer frequencies would be very inaccurate in estimating the non-integer frequencies of frequent itemsets if used with integer approximation of frequency-counting. To solve the aforementioned issues, in this paper we propose two incremental schemes for frequent itemsets discovery that are capable to work efficiently with streaming numeric data. In particular, they are able to count non-integer frequency without re-visiting any previous data. The key of our schemes to the benefits in efficiency is to extract essential statistics that would occupy much less memory than the raw data do for the ongoing streaming data. This grants the advantages of our schemes 1) allowing non-integer counting and thus natural integration with a fuzzy-set discretization method to boost robustness and anti-noise capability for numeric data, 2) enabling the design of a decay ratio for different data distributions, which can be adapted for three general stream models: landmark, damped and sliding windows, and 3) achieving highly-accurate fuzzy-item-sets discovery with efficient stream-processing. Experimental studies demonstrate the efficiency and effectiveness of our dual schemes with both synthetic and real-world datasets.
  • References (36)
  • Citations (0)
馃摉 Papers frequently viewed together
4 Citations
16 Citations
78% of Scinapse members use related papers. After signing in, all features are FREE.
#1Xingjuan Cai (Taiyuan University of Science and Technology)H-Index: 13
#2Yun Niu (Taiyuan University of Science and Technology)H-Index: 1
Last. Jinjun Chen (Swinburne University of Technology)H-Index: 38
view all 7 authors...
14 CitationsSource
#1Penghong Wang (Taiyuan University of Science and Technology)H-Index: 3
#1Penghong Wang (Taiyuan University of Science and Technology)H-Index: 1
Last. Jinjun Chen (Swinburne University of Technology)H-Index: 38
view all 5 authors...
6 CitationsSource
#1Gai-ge Wang (Taiyuan University of Science and Technology)H-Index: 2
#2Xingjuan Cai (Ocean University of China)H-Index: 13
Last. Jinjun Chen (Northeast Normal University)H-Index: 4
view all 5 authors...
Cyber-physical social systems (CPSS) is an emerging complicated topic which is a combination of cyberspace, physical space, and social space. Many problems in CPSS can be mathematically modelled as optimization problems, and some of them are multi-objective optimization (MOO) problems (MOPs). In general, the MOPs are difficult to solve by traditional mathematical programming methods. High performance computing with much faster speed is required to address these issues. In this paper, a kind of h...
44 CitationsSource
#1Muneeb Ul Hassan (Swinburne University of Technology)H-Index: 2
#2Mubashir Husain Rehmani (CIT: Cork Institute of Technology)H-Index: 23
Last. Jinjun Chen (Swinburne University of Technology)H-Index: 38
view all 5 authors...
Abstract The increasing energy costs and increase in losses in traditional power grid system triggered the integration of Renewable Energy Resources (RERs) in smart homes. The global desire of consumers to rely on RERs such as solar energy, and wind energy has increased dramatically. Similarly, the IT technologies are also playing their part in smart grid development, such as real time data monitoring. On the other hand, with the advancement of these IT technologies in smart meters, the privacy ...
5 CitationsSource
#1Yinghua Li (Xidian University)H-Index: 1
#2He Yu (SYSU: Sun Yat-sen University)H-Index: 1
Last. Jinjun Chen (Swinburne University of Technology)H-Index: 38
view all 4 authors...
8 CitationsSource
#1Zhihua Cui (Taiyuan University of Science and Technology)H-Index: 1
#2Jiangjiang Zhang (Taiyuan University of Science and Technology)H-Index: 3
Last. Jinjun Chen (UTS: University of Technology, Sydney)H-Index: 38
view all 7 authors...
26 CitationsSource
#1Penghong WangH-Index: 1
#2Fei XueH-Index: 3
Last. Jinjun ChenH-Index: 1
view all 6 authors...
Locating node technology, as the most fundamental component of wireless sensor networks (WSNs) and internet of things (IoT), is a pivotal problem. Distance vector-hop technique (DV-Hop) is frequently used for location node estimation in WSN, but it has a poor estimation precision. In this paper, a multi-objective DV-Hop localization algorithm based on NSGA-II is designed, called NSGA-II-DV-Hop. In NSGA-II-DV-Hop, a new multi-objective model is constructed, and an enhanced constraint strategy is ...
12 CitationsSource
#1Chunyao Song (NKU: Nankai University)H-Index: 4
#2Xuanming Liu (University of Massachusetts Lowell)H-Index: 2
Last. Yao Ge (NKU: Nankai University)H-Index: 1
view all 4 authors...
Abstract Many big data applications today require querying highly dynamic and large-scale data streams to find the top-k most frequent items in the most recent window of a specified size at a specific time. This is a challenging problem. We propose a novel approach called Floating Top-k. Our algorithm does not need to explicitly maintain any item counts over time or deal with count updates upon item entry and expiration. Succinctly , we use only a small-size data structure to retrieve the top-k ...
2 CitationsSource
#1Yechuang WangH-Index: 2
#2Penghong WangH-Index: 3
Last. Jinjun ChenH-Index: 38
view all 7 authors...
A bat algorithm (BA) is a heuristic algorithm that operates by imitating the echolocation behavior of bats to perform global optimization. The BA is widely used in various optimization problems because of its excellent performance. In the bat algorithm, the global search capability is determined by the parameter loudness and frequency. However, experiments show that each operator in the algorithm can only improve the performance of the algorithm at a certain time. In this paper, a novel bat algo...
7 CitationsSource
#1Maoqing Zhang (Taiyuan University of Science and Technology)H-Index: 1
#2Hui Wang (Nanchang Institute of Technology)H-Index: 35
Last. Jinjun Chen (Taiyuan University of Science and Technology)H-Index: 38
view all 4 authors...
Cuckoo search (CS) is a recently developed meta-heuristic, which has shown good search abilities on many optimization problems. In this paper, we present a hybrid multi-objective CS (HMOCS) for solving multi-objective optimization problems (MOPs). The HMOCS employs the non-dominated sorting procedure and a dynamical local search. The former is helpful to generate Pareto fronts, and the latter focuses on enhance the local search. In order to verify the performance of our approach HMOCS, six well-...
47 CitationsSource
Cited By0