
Abstract

Intelligent audio systems, such as speech command recognition and speaker recognition, are ubiquitous in our lives. However, deep learning-based intelligent audio systems have been shown to be vulnerable to adversarial attacks. In this paper, we propose a physical adversarial attack that exploits reverberation, a natural indoor acoustic effect, to realize imperceptible, fast, and targeted black-box attacks. Unlike existing attacks that constrain the magnitude of adversarial perturbations within a fixed radius, we generate reverberation-like perturbations that blend naturally with the original voice sample. Additionally, by modeling distortions in the physical environment, we generate adversarial examples that remain robust under over-the-air propagation. Extensive experiments are conducted on two popular intelligent audio systems under various conditions, such as different room sizes, distances, and ambient noise. The results show that Echo can successfully attack intelligent audio systems in both digital and physical over-the-air environments.
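To give an intuition for what a reverberation-like perturbation is, the sketch below shows how a voice sample can be convolved with a room impulse response (RIR) and mixed back with the dry signal. This is only an illustrative example, not Echo's actual perturbation-generation procedure: the toy exponentially-decaying RIR, the function names, and the wet/dry parameter are assumptions made for demonstration.

```python
import numpy as np
from scipy.signal import fftconvolve

def synthetic_rir(duration_s=0.3, sample_rate=16000, decay=8.0, seed=0):
    """Toy room impulse response: exponentially decaying white noise.
    (Illustrative only; Echo optimizes its perturbation rather than using a fixed RIR.)"""
    rng = np.random.default_rng(seed)
    n = int(duration_s * sample_rate)
    t = np.arange(n) / sample_rate
    rir = rng.standard_normal(n) * np.exp(-decay * t)
    rir[0] = 1.0                      # keep a strong direct-path component
    return rir / np.max(np.abs(rir))  # normalize peak amplitude

def add_reverberation(voice, rir, wet=0.5):
    """Blend the dry voice with its reverberant copy (wet/dry mix)."""
    reverb = fftconvolve(voice, rir)[: len(voice)]
    out = (1 - wet) * voice + wet * reverb
    return out / np.max(np.abs(out))  # avoid clipping

# Example: apply the toy RIR to one second of a dummy "voice" signal at 16 kHz
sample_rate = 16000
voice = np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)
reverberant = add_reverberation(voice, synthetic_rir(sample_rate=sample_rate))
```

Because the perturbation is shaped like natural room reverberation, it blends with the original utterance instead of sounding like added noise, which is the effect the demo audios below are meant to showcase.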


Demo Audios

This section provides demo audio clips, including real-world reverberation recordings and Echo adversarial examples for the commands Down, No, Stop, and Left. We also compare the audio fidelity of our attack with that of Fakebob.

Group 1: Reverberation - Down

Down #1: Real-world
Down #1: Adversarial
Down #2: Real-world
Down #2: Adversarial
Down #3: Real-world
Down #3: Adversarial
Down #4: Real-world
Down #4: Adversarial

Group 2: Reverberation - No

No #1: Real-world      
No #1: Adversarial
No #2: Real-world
No #2: Adversarial
No #3: Real-world
No #3: Adversarial
No #4: Real-world
No #4: Adversarial

Group 3: Reverberation - Stop and Left

Stop #1: Real-world  
Stop #1: Adversarial
Left #1: Real-world
Left #1: Adversarial

Group 4: Audio fidelity comparison - Fakebob

B0: Raw                       
B1: Fakebob #1
B2: Fakebob #2