Masashi Unoki
Japan Advanced Institute of Science and Technology
Digital watermarking, Pattern recognition, Acoustics, Speech recognition, Computer science
273 Publications
12 H-index
818 Citations
Publications (283)
#1 Masashi Unoki (Japan Advanced Institute of Science and Technology), H-index: 12
#1 Zhichao Peng, H-index: 1
#2 Xingfeng Li, H-index: 1
Last: Masato Akagi, H-index: 15 (6 authors in total)
Emotion information in speech can effectively help robots understand a speaker's intentions in natural human-robot interaction. The human auditory system can easily track the temporal dynamics of emotion by perceiving the intensity and fundamental frequency of speech and by focusing on salient emotion regions. Therefore, speech emotion recognition that combines auditory and attention mechanisms may be an effective approach. Some previous studies used auditory-based static features to identify...
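The pairing of auditory-inspired frame-level features with an attention mechanism can be pictured with a small sketch. The model below is an assumption for illustration, not the authors' architecture: a bidirectional GRU encodes frame features (for example, gammatone-filterbank energies), and a learned softmax weighting pools the salient frames before classification. All layer sizes and the four-class output are hypothetical.

```python
# Minimal sketch (assumed, not the paper's model): attention pooling over
# frame-level auditory features for categorical speech emotion recognition.
import torch
import torch.nn as nn

class AttentionPoolingSER(nn.Module):
    def __init__(self, feat_dim=64, hidden_dim=128, n_emotions=4):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)       # scores salient frames
        self.classifier = nn.Linear(2 * hidden_dim, n_emotions)

    def forward(self, x):                              # x: (batch, frames, feat_dim)
        h, _ = self.encoder(x)                         # (batch, frames, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)         # attention weights over frames
        utterance = (w * h).sum(dim=1)                 # attention-weighted pooling
        return self.classifier(utterance)

# Usage with stand-in "auditory" features (e.g., gammatone-filterbank energies):
feats = torch.randn(8, 300, 64)                        # 8 utterances, 300 frames
logits = AttentionPoolingSER()(feats)                  # (8, 4) emotion logits
```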
#1 Shunsuke Kidani, H-index: 2
#2 Ryota Miyauchi, H-index: 8
Last: Masashi Unoki, H-index: 12 (3 authors in total)
#1 Masashi Unoki (Japan Advanced Institute of Science and Technology), H-index: 12
#2 Zhi Zhu (Japan Advanced Institute of Science and Technology), H-index: 2
#2 Jessada Karnjana (Thailand National Science and Technology Development Agency), H-index: 4
Last: Masashi Unoki (Japan Advanced Institute of Science and Technology), H-index: 12 (4 authors in total)
#1 Zhichao Peng (TJU: Tianjin University)
#2 Zhi Zhu, H-index: 2
Last: Masato Akagi (Japan Advanced Institute of Science and Technology), H-index: 15 (5 authors in total)
Dimensional emotion recognition (DER) from speech is used to track the dynamics of emotions so that robots can interact naturally with humans. A DER system needs to obtain frame-level feature sequences by selecting appropriate acoustic features and durations. Moreover, these sequences should reflect the dynamic characteristics of the utterance. Temporal modulation cues are well suited to capturing these dynamic characteristics for speech perception and understanding. In this paper, we propose a DER syste...
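As an illustration of frame-level temporal-modulation features, the sketch below (assumed parameters, not the authors' pipeline) band-pass filters the signal, extracts each band's temporal envelope with the Hilbert transform, and keeps the low modulation-frequency energies of each analysis frame. The band edges, frame length, and number of modulation bins are illustrative choices.

```python
# Minimal sketch (assumed): frame-level temporal-modulation features from
# band envelopes, yielding a (frames, bands, modulation-bins) sequence.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def modulation_features(x, fs, n_bands=8, frame_len=0.25, hop=0.1, n_mod=8):
    edges = np.logspace(np.log10(100), np.log10(min(8000, fs / 2 - 1)), n_bands + 1)
    frame, step = int(frame_len * fs), int(hop * fs)
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))       # temporal envelope of the band
        band_feats = []
        for start in range(0, len(env) - frame + 1, step):
            spec = np.abs(np.fft.rfft(env[start:start + frame] * np.hanning(frame)))
            band_feats.append(spec[1:n_mod + 1])         # low modulation-frequency energies
        feats.append(band_feats)
    return np.transpose(np.array(feats), (1, 0, 2))      # (frames, bands, n_mod)

fs = 16000
speech = np.random.randn(fs * 2)                         # stand-in for a 2 s utterance
print(modulation_features(speech, fs).shape)
```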
#1 Teruki Toya (Japan Advanced Institute of Science and Technology), H-index: 1
#2 Peter Birkholz (TUD: Dresden University of Technology), H-index: 12
Last: Masashi Unoki (Japan Advanced Institute of Science and Technology), H-index: 12 (3 authors in total)
Because the transmission characteristics of bone-conducted (BC) speech from the larynx to the auditory system have not yet been clarified, this paper investigates the transmission characteristics related to BC speech perception, focusing on temporal-bone (TB) vibration signals and ear-canal (EC) radiated speech signals. First, the long-term average spectra (LTAS) of normally produced speech signals recorded at the lips, TB, and EC were analyzed. It was found that the frequency components above 2 kHz...
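The LTAS analysis named here can be approximated by Welch averaging of short-time power spectra. The snippet below is a minimal sketch with assumed window and sampling parameters, not the authors' exact analysis settings.

```python
# Minimal sketch (assumed parameters): long-term average spectrum (LTAS)
# via Welch averaging of short-time power spectra.
import numpy as np
from scipy.signal import welch

def ltas_db(x, fs, win_s=0.025):
    nperseg = int(win_s * fs)
    f, pxx = welch(x, fs=fs, window="hann", nperseg=nperseg, noverlap=nperseg // 2)
    return f, 10.0 * np.log10(pxx + 1e-12)              # averaged spectrum in dB

fs = 16000
lip_signal = np.random.randn(fs * 3)                    # stand-in for a recorded signal
freqs, spectrum_db = ltas_db(lip_signal, fs)
# Comparing such LTAS curves for lip, temporal-bone, and ear-canal recordings
# is one way to inspect how components above 2 kHz are transmitted.
```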
#1 Weitao Yuan (Tianjin Polytechnic University), H-index: 1
#2 Shengbei Wang (Tianjin Polytechnic University), H-index: 1
Last: Wenwu Wang (University of Surrey), H-index: 23 (5 authors in total)
This work proposes a simple but effective attention mechanism, namely Skip Attention (SA), for monaural singing voice separation (MSVS). First, the SA, embedded in the convolutional encoder-decoder network (CEDN), realizes attention-driven dependency modeling for the repetitive structures of the music source. Second, the SA, which replaces the popular skip connection in the CEDN, effectively controls the flow of low-level (vocal and musical) features to the output and improves the feature ...
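One way to picture an attention-driven replacement for a plain skip connection is a gate computed from the decoder feature that scales the low-level encoder feature before it continues toward the output. The sketch below is an assumed illustration in this spirit, not the paper's exact Skip Attention design; the channel counts and the 1x1-convolution gate are hypothetical.

```python
# Minimal sketch (assumed, not the paper's SA): an attention-gated skip
# connection in a convolutional encoder-decoder, where the decoder feature
# gates which low-level encoder features pass through.
import torch
import torch.nn as nn

class GatedSkip(nn.Module):
    def __init__(self, enc_ch, dec_ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(enc_ch + dec_ch, enc_ch, kernel_size=1),
            nn.Sigmoid(),                                # per-channel, per-pixel gate in [0, 1]
        )

    def forward(self, enc_feat, dec_feat):               # same spatial size assumed
        g = self.gate(torch.cat([enc_feat, dec_feat], dim=1))
        return g * enc_feat                              # attended low-level features

# Usage on spectrogram-like feature maps:
enc = torch.randn(2, 16, 64, 64)                         # encoder (low-level) features
dec = torch.randn(2, 32, 64, 64)                         # decoder features at the same scale
skipped = GatedSkip(16, 32)(enc, dec)                    # (2, 16, 64, 64)
```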