
Data Augmentation in Deep Learning-Based Fusion of Depth and Inertial Sensing for Action Recognition

Published on Jan 1, 2019
· DOI: 10.1109/lsens.2018.2878572
Neha Dawar (UTD: University of Texas at Dallas), Sarah Ostadabbas (NU: Northeastern University), Nasser Kehtarnavaz (UTD: University of Texas at Dallas)
Abstract
This article covers a deep learning-based decision fusion approach for action or gesture recognition via simultaneous utilization of a depth camera and a wearable inertial sensor. The deep learning approach involves using a convolutional neural network (CNN) for depth images captured by a depth camera and a combination of CNN and long short-term memory network for inertial signals captured by a wearable inertial sensor, followed by a decision-level fusion. Due to the limited size of the training data, a data augmentation procedure is carried out by generating depth images corresponding to different orientations of the depth camera and by generating inertial signals corresponding to different orientations of the inertial sensor placement on the body. The results obtained indicate the positive impact of the decision-level fusion as well as the data augmentation on the recognition accuracies.
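The inertial-side augmentation can be pictured as applying small rigid rotations to the 3-axis accelerometer and gyroscope channels, as if the sensor had been strapped on at a slightly different orientation. Below is a minimal numpy sketch, assuming a signal array of shape (T, 6) with the accelerometer in the first three columns and the gyroscope in the last three; the array layout and angle range are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def random_rotation_matrix(max_deg=30.0, rng=None):
    """Random 3-D rotation composed of small rotations about x, y, z."""
    rng = rng or np.random.default_rng()
    ax, ay, az = np.deg2rad(rng.uniform(-max_deg, max_deg, size=3))
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax),  np.cos(ax)]])
    Ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az),  np.cos(az), 0],
                   [0, 0, 1]])
    return Rz @ Ry @ Rx

def augment_inertial(signal, n_copies=4, max_deg=30.0, seed=0):
    """signal: (T, 6) array, columns 0-2 accelerometer, 3-5 gyroscope.
    Returns rotated copies simulating different sensor orientations."""
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n_copies):
        R = random_rotation_matrix(max_deg, rng)
        rotated = np.empty_like(signal)
        rotated[:, :3] = signal[:, :3] @ R.T  # rotate accelerometer axes
        rotated[:, 3:] = signal[:, 3:] @ R.T  # rotate gyroscope axes
        copies.append(rotated)
    return copies
```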
References (12)
Published on Jun 1, 2018
Neha Dawar (UTD: University of Texas at Dallas), Nasser Kehtarnavaz (UTD: University of Texas at Dallas)
This paper presents a convolutional neural network-based sensor fusion system to monitor six transition movements as well as falls in healthcare applications by simultaneously using a depth camera and a wearable inertial sensor. Weighted depth motion map images and inertial signal images are fed as inputs into two convolutional neural networks running in parallel, one for each sensing modality. Detection and thus monitoring of the transition movements and falls are achieved by fusing the movemen...
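The decision-level fusion described in this reference and in the main article amounts to combining the per-class scores of the two modality-specific networks. A hedged numpy sketch, assuming each network emits a softmax probability vector and that a simple weighted average is the fusion rule (the weights are illustrative, not the authors' values):

```python
import numpy as np

def fuse_decisions(depth_probs, inertial_probs, w_depth=0.5):
    """Weighted average of two softmax outputs, one per sensing modality.
    depth_probs, inertial_probs: (n_classes,) probability vectors."""
    fused = w_depth * depth_probs + (1.0 - w_depth) * inertial_probs
    return int(np.argmax(fused))

# Example: the two networks disagree; fusion resolves the conflict.
depth = np.array([0.10, 0.55, 0.35])     # depth CNN favors class 1
inertial = np.array([0.05, 0.30, 0.65])  # inertial CNN-LSTM favors class 2
print(fuse_decisions(depth, inertial))   # -> 2 with equal weights
```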
Published on Jan 1, 2018 in IEEE Access
Neha Dawar (UTD: University of Texas at Dallas), Nasser Kehtarnavaz (UTD: University of Texas at Dallas)
This paper presents a real-time detection and recognition approach to identify actions of interest involved in the smart TV application from continuous action streams via simultaneous utilization of a depth camera and a wearable inertial sensor. Continuous action streams here means that actions of interest are performed continuously and at random among arbitrary actions of non-interest. The developed approach consists of a detection part and a recognition part. In the detection part, two support vector...
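Detection over a continuous stream of this sort is typically framed as sliding-window classification: score each window and flag windows whose score crosses a threshold. A rough scikit-learn sketch, assuming fixed-length feature vectors per window; the windowing, features, and threshold are assumptions, not this paper's exact pipeline.

```python
import numpy as np
from sklearn.svm import SVC

def detect_actions(stream_feats, clf, threshold=0.7):
    """stream_feats: (n_windows, n_feats) features from a sliding window.
    Returns indices of windows flagged as actions of interest."""
    probs = clf.predict_proba(stream_feats)[:, 1]  # P(action of interest)
    return np.where(probs >= threshold)[0]

# Train a binary detector: action-of-interest vs. everything else.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = (X[:, 0] > 0).astype(int)               # toy labels for illustration
clf = SVC(probability=True).fit(X, y)
print(detect_actions(rng.normal(size=(10, 16)), clf))
```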
Published on Aug 9, 2017 in Sensors
Aras Yurtman, Billur Barshan
Most activity recognition studies that employ wearable sensors assume that the sensors are attached at pre-determined positions and orientations that do not change over time. Since this is not the case in practice, it is of interest to develop wearable systems that operate invariantly to sensor position and orientation. We focus on invariance to sensor orientation and develop two alternative transformations to remove the effect of absolute sensor orientation from the raw sensor data. We test the...
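One common way to remove absolute sensor orientation, in the spirit of this reference, is to re-express each accelerometer window in a frame derived from the data itself, e.g., aligning one axis with the mean gravity direction. A minimal sketch follows; this heuristic is illustrative and does not reproduce the paper's two specific transformations.

```python
import numpy as np

def orientation_invariant(acc):
    """acc: (T, 3) accelerometer window.
    Re-expresses the signal in a data-derived frame whose z-axis is the
    estimated gravity direction, so the output no longer depends on how
    the sensor was mounted (up to rotation about gravity)."""
    g = acc.mean(axis=0)
    z = g / np.linalg.norm(g)                  # gravity-aligned axis
    # Pick a vector not parallel to z and orthonormalize (Gram-Schmidt).
    a = np.array([1.0, 0.0, 0.0])
    if abs(z @ a) > 0.9:
        a = np.array([0.0, 1.0, 0.0])
    x = a - (a @ z) * z
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])                    # rows form the new basis
    return acc @ R.T
```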
Published on Jan 1, 2017 in IEEE Access
Chen Chen (UCF: University of Central Florida), Mengyuan Liu (PKU: Peking University), + 3 authors, Nasser Kehtarnavaz (UTD: University of Texas at Dallas)
This paper presents a local spatio-temporal descriptor for action recognition from depth video sequences, which is capable of distinguishing similar actions as well as coping with different speeds of actions. This descriptor is based on three processing stages. In the first stage, the shape and motion cues are captured from a weighted depth sequence by temporally overlapped depth segments, leading to three improved depth motion maps (DMMs) compared with the previously introduced DMMs. In the se...
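A depth motion map (DMM) of the kind these references build on is essentially the accumulated absolute frame-to-frame difference of a projected depth sequence. A minimal numpy sketch of a front-view DMM, assuming depth frames as 2-D arrays; thresholding and the side/top projection views are omitted, and the segment length and stride are assumptions.

```python
import numpy as np

def depth_motion_map(frames):
    """frames: (T, H, W) depth sequence (front view).
    Accumulates absolute inter-frame differences into a single 2-D map
    summarizing where motion energy occurred over the clip."""
    frames = np.asarray(frames, dtype=np.float32)
    diffs = np.abs(np.diff(frames, axis=0))   # (T-1, H, W)
    return diffs.sum(axis=0)

# Temporally overlapped segments, as in the improved DMMs described above:
def segmented_dmms(frames, seg_len=16, stride=8):
    T = len(frames)
    return [depth_motion_map(frames[s:s + seg_len])
            for s in range(0, max(T - seg_len + 1, 1), stride)]
```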
Published on Aug 1, 2016 in IEEE Transactions on Human-Machine Systems
Pichao Wang (UOW: University of Wollongong), Wanqing Li (UOW: University of Wollongong), + 3 authors, Philip Ogunbona (UOW: University of Wollongong)
This paper proposes a new method, i.e., weighted hierarchical depth motion maps (WHDMM) + three-channel deep convolutional neural networks (3ConvNets), for human action recognition from depth maps on small training datasets. Three strategies are developed to leverage the capability of ConvNets in mining discriminative features for recognition. First, different viewpoints are mimicked by rotating the 3-D points of the captured depth maps. This not only synthesizes more data, but also makes the tr...
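Mimicking viewpoints from a single depth map, as this reference does, works by back-projecting pixels to 3-D points with the camera intrinsics, rotating the cloud, and re-projecting to a synthetic depth image. A rough sketch under assumed pinhole intrinsics fx, fy, cx, cy (placeholder values, not the authors' calibration):

```python
import numpy as np

def rotate_viewpoint(depth, R, fx=365.0, fy=365.0, cx=None, cy=None):
    """depth: (H, W) depth map; R: 3x3 rotation matrix.
    Back-projects pixels to 3-D, rotates, and re-projects to synthesize
    the depth map seen from a rotated virtual camera (nearest-point z-buffer)."""
    H, W = depth.shape
    cx = W / 2.0 if cx is None else cx
    cy = H / 2.0 if cy is None else cy
    v, u = np.nonzero(depth)                 # pixels with valid depth
    z = depth[v, u].astype(np.float32)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    Xr, Yr, Zr = R @ np.stack([x, y, z])     # rotate the point cloud
    ur = np.round(Xr * fx / Zr + cx).astype(int)
    vr = np.round(Yr * fy / Zr + cy).astype(int)
    out = np.zeros_like(depth, dtype=np.float32)
    ok = (ur >= 0) & (ur < W) & (vr >= 0) & (vr < H) & (Zr > 0)
    for ui, vi, zi in zip(ur[ok], vr[ok], Zr[ok]):
        if out[vi, ui] == 0 or zi < out[vi, ui]:
            out[vi, ui] = zi                 # keep the nearest surface
    return out
```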
Published on Sep 1, 2015 in ICIP (International Conference on Image Processing)
Chen Chen (UTD: University of Texas at Dallas), Roozbeh Jafari (UTD: University of Texas at Dallas), Nasser Kehtarnavaz (UTD: University of Texas at Dallas)
Human action recognition has a wide range of applications including biometrics, surveillance, and human-computer interaction. The use of multimodal sensors for human action recognition is steadily increasing. However, there are limited publicly available datasets where depth camera and inertial sensor data are captured at the same time. This paper describes a freely available dataset, named UTD-MHAD, which consists of four temporally synchronized data modalities. These modalities include RGB vid...
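UTD-MHAD is distributed as per-trial files, so a typical loading step for the inertial modality looks like the sketch below. It assumes MATLAB .mat files named like a1_s1_t1_inertial.mat containing a d_iner variable with a (T, 6) signal; both the file naming and the variable name are assumptions that should be checked against the dataset's documentation.

```python
import numpy as np
from scipy.io import loadmat

def load_inertial_trial(action, subject, trial, root="UTD-MHAD/Inertial"):
    """Loads one inertial trial (accelerometer + gyroscope, shape (T, 6)).
    File and variable names follow the commonly used UTD-MHAD layout;
    verify both against the dataset documentation before relying on them."""
    path = f"{root}/a{action}_s{subject}_t{trial}_inertial.mat"
    return np.asarray(loadmat(path)["d_iner"], dtype=np.float32)
```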
Published on Jun 1, 2014 in IEEE Sensors Journal
Kui Liu (UTD: University of Texas at Dallas), Chen Chen (UTD: University of Texas at Dallas), + 1 author, Nasser Kehtarnavaz (UTD: University of Texas at Dallas)
This paper presents the first attempt at fusing data from inertial and vision depth sensors within the framework of a hidden Markov model for the application of hand gesture recognition. The data fusion approach introduced in this paper is general purpose in the sense that it can be used for recognition of various body movements. It is shown that the fusion of data from the vision depth and inertial sensors acts in a complementary manner leading to a more robust recognition outcome compared with ...
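HMM-based gesture recognition of this flavor trains one HMM per gesture class on feature sequences and classifies a new sequence by maximum log-likelihood. A hedged sketch using hmmlearn, which is a stand-in library choice rather than necessarily what the authors used:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_gesture_hmms(sequences_by_class, n_states=4):
    """sequences_by_class: {label: [(T_i, n_feats) arrays]}.
    Fits one Gaussian HMM per gesture class."""
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.concatenate(seqs)             # stack sequences for fitting
        lengths = [len(s) for s in seqs]     # per-sequence boundaries
        m = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=50, random_state=0)
        models[label] = m.fit(X, lengths)
    return models

def classify(seq, models):
    """Returns the class whose HMM gives the sequence the highest likelihood."""
    return max(models, key=lambda lbl: models[lbl].score(seq))
```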
Published on Jun 1, 2012 in CVPR (Computer Vision and Pattern Recognition)
Jiang Wang (NU: Northwestern University), Zicheng Liu (Microsoft), + 1 author, Junsong Yuan (NTU: Nanyang Technological University)
Human action recognition is an important yet challenging task. The recently developed commodity depth sensors open up new possibilities of dealing with this problem but also present some unique challenges. The depth maps captured by the depth cameras are very noisy and the 3D positions of the tracked joints may be completely wrong if serious occlusions occur, which increases the intra-class variations in the actions. In this paper, an actionlet ensemble model is learnt to represent each action a...
Published on May 1, 2012 in IEEE Sensors Journal
Ruize Xu (MIT: Massachusetts Institute of Technology), Shengli Zhou (CUHK: The Chinese University of Hong Kong), Wen J. Li (CUHK: The Chinese University of Hong Kong)
This paper presents three different gesture recognition models which are capable of recognizing seven hand gestures, i.e., up, down, left, right, tick, circle, and cross, based on the input signals from MEMS 3-axes accelerometers. The accelerations of a hand in motion in three perpendicular directions are detected by three accelerometers respectively and transmitted to a PC via Bluetooth wireless protocol. An automatic gesture segmentation algorithm is developed to identify individual gestures i...
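Automatic gesture segmentation from accelerometer streams is often done by thresholding the motion energy: a gesture starts when the signal magnitude departs from rest and ends when it settles back. A simple illustrative sketch; the thresholds, smoothing window, and minimum duration are assumptions, not this paper's algorithm.

```python
import numpy as np

def segment_gestures(acc, fs=100, thresh=0.3, min_len=0.2):
    """acc: (T, 3) accelerometer data in g; fs: sampling rate in Hz.
    Flags samples whose deviation from 1 g (gravity at rest) exceeds
    `thresh`, then groups consecutive flagged samples into gestures."""
    energy = np.abs(np.linalg.norm(acc, axis=1) - 1.0)
    # Smooth with a short moving average to bridge brief dips.
    k = max(int(0.05 * fs), 1)
    energy = np.convolve(energy, np.ones(k) / k, mode="same")
    active = energy > thresh
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            if i - start >= min_len * fs:    # discard spurious blips
                segments.append((start, i))
            start = None
    if start is not None and len(active) - start >= min_len * fs:
        segments.append((start, len(active)))
    return segments
```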
Published on May 1, 2009 in IEEE Transactions on Consumer Electronics
Weilun Lao (TU/e: Eindhoven University of Technology), Jungong Han (TU/e: Eindhoven University of Technology)
With the continuous improvements in video-analysis techniques, automatic low-cost video surveillance gradually emerges for consumer applications. Video surveillance can contribute to the safety of people in the home and ease control of home-entrance and equipment-usage functions. In this paper, we study a flexible framework for semantic analysis of human behavior from a monocular surveillance video, captured by a consumer camera. Successful trajectory estimation and human-body modeling facilitat...
Cited By (1)
Published on Dec 1, 2018 in ICIET (International Conference on Innovation in Engineering and Technology)
Masud Rana (KUET: Khulna University of Engineering & Technology), Mazed Rayhan Shuvo (KUET: Khulna University of Engineering & Technology)
In wireless sensor networks, localization is a very important characteristic of sensor nodes. Using a localization technique, the position of any sensor node in the network can be determined. Technically, unlicensed users in such a network share the spectrum of primary users through a spectrum sensing process while not causing significant interference to the primary users. Unfortunately, the spectrum sensing process is hampered by a security problem called the primary user emulation attack. In this paper, t...