Voice activation is becoming an increasingly popular form of computer interface, thanks to the availability and uptake of systems such as Google Home and Amazon Alexa. These AI assistants are housed in devices that need no buttons or keyboards to operate; in response to spoken commands, they can organise schedules, complete online orders and answer questions. Digital assistants may be convenient, but they may also expose their users to the risks of voice hacking. This form of e-crime involves mimicking or recording the voices of users in order to hijack their accounts or otherwise exploit their identities. Fortunately, a team from the State University of New York (SUNY) at Buffalo and several other institutions has demonstrated anti-voice-theft software designed to ward off these threats.
Vocal Patterns and Technology
Voice-activated software is becoming very common in mobile and smart home technology. Users can put spoken questions to ‘assistant’ programmes such as Siri and Cortana and have the results of searches read back to them. However, voice activation can still be unreliable, and a phrase or question may need to be repeated several times before the device ‘hears’ it properly and responds. Calibrating or setting up a digital assistant can also take some repetition as the software ‘learns’ the user’s vocal patterns.
This may raise concerns about ‘voice hacking’, in which the user’s voice is recorded, synthesised and exploited for illegal gain. Having one’s voice captured while out in public carries the same risk. These worries led a team from the computer science departments of SUNY Buffalo, West Chester University, the City University of Hong Kong, Jinan University and Wuhan University to develop software that prevents a voice recording from activating a device. The researchers, led by SUNY’s Aziz Mohaisen, hope to package their work into a smartphone app so that others can enjoy the same protection.
Voice Activation and How to Verify It
Voice-activated applications can also be used for authentication and security purposes, a fact that has prompted a wave of attacks based on spoofing, imitating or recording voices. These attacks, in turn, are addressed by automatic speaker verification (ASV), which uses a range of unique vocal characteristics to distinguish genuine users from attackers mimicking or otherwise re-creating authorised voices. Previous research has indicated that even talented, highly trained human impersonators cannot reliably fool an ASV system.
However, as the researchers noted in the paper reporting their breakthrough, published in the proceedings of the 2017 IEEE International Conference on Distributed Computing Systems (ICDCS), ASV systems remain vulnerable to recordings of a user’s voice. An ASV system that can also detect when a loudspeaker is being used to generate the voice would therefore be advantageous. The team set out to build one by exploiting two facts: electronic speakers of many kinds, including the built-in loudspeaker of a phone or PC, emit a magnetic field while active, and many portable devices, particularly smartphones, also carry a magnetometer.
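The core of that idea can be sketched very simply: if the magnetometer reads a noticeably stronger field while a voice command is arriving than it did beforehand, a loudspeaker may be nearby. The following is a minimal illustrative sketch, not the paper’s implementation; the function names, sample format and threshold are all assumptions.

```python
# Hypothetical sketch of magnetometer-based loudspeaker detection.
# Assumes 3-axis magnetometer readings (in microteslas) sampled both
# before and during a voice command; the threshold is illustrative only.
import math

def field_magnitude(sample):
    """Euclidean norm of a 3-axis magnetometer reading (x, y, z)."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)

def speaker_suspected(baseline_samples, active_samples, delta_threshold=8.0):
    """Flag a possible loudspeaker if the mean field strength during the
    voice command rises noticeably above the ambient baseline."""
    baseline = sum(field_magnitude(s) for s in baseline_samples) / len(baseline_samples)
    active = sum(field_magnitude(s) for s in active_samples) / len(active_samples)
    return (active - baseline) > delta_threshold

# Ambient field vs. a stronger field while a speaker plays nearby
quiet = [(20.0, 5.0, 40.0)] * 5
loud = [(26.0, 7.0, 50.0)] * 5
print(speaker_suspected(quiet, loud))  # a clear rise flags a speaker
```

A real detector would have to account for the ambient geomagnetic field, sensor noise and device orientation, which is part of why the researchers combined this signal with others rather than relying on it alone.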
This intersection brings conventional means of attack and detection into conjunction, which may prove handy in preventing vocal data theft. However, the speaker used to replay a stolen vocal pattern may be so small that its magnetic field is too weak for a consumer-level magnetometer to detect. As a contingency, the team added channel-size detection to their software, which can also discriminate between ‘live’ voices and recordings based on the average ‘size’ of a human mouth as a sound channel. From the software’s point of view, a recording thus becomes a much less convincing candidate as the source of a voice than a person.
The team worked on the assumption that a hacker has captured vocal samples from a target and extrapolated the victim’s vocal patterns far enough to hack voice-activated systems with synthesised recordings, achieving identity theft. The researchers’ system analysed acoustic and other sensor data to assess the incoming sound’s field and the distance of its source from the microphone. This prevented the theoretical hacker from placing a recorded sound source at a ‘safe’ distance from the test devices. The system could then evaluate whether a magnetic field was present, and whether it moved the way a smartphone held in front of a genuine user’s mouth tends to.
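Putting those checks together, a decision rule along these lines could be imagined: accept a command only if the source is within hand-held range and the magnetic field varies the way a phone held by a moving human does. This is a hedged sketch of the combined logic described above; the distances, thresholds and function names are illustrative assumptions, not values from the paper.

```python
# Illustrative combination of the checks described above: a command is
# accepted only if the estimated source distance is within hand-held
# range AND the measured magnetic field varies over time, as it would
# for a phone held in front of a moving user. All units and thresholds
# here are assumptions for the sake of the example.

def likely_live_voice(source_distance_cm, field_variation,
                      max_distance_cm=10.0, min_variation=2.0):
    """A live user holds the phone near the mouth (small distance) and
    moves it slightly, so the field varies; a replay speaker at a fixed
    position tends to sit farther away and produce a steadier field."""
    near_enough = source_distance_cm <= max_distance_cm
    field_moves = field_variation >= min_variation
    return near_enough and field_moves

print(likely_live_voice(6.0, 3.5))   # hand-held phone: accepted
print(likely_live_voice(30.0, 0.2))  # distant, static speaker: rejected
```

Requiring both conditions is what defeats the ‘safe distance’ strategy: even a replay device that somehow mimicked a varying field would still fail the range check.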
The study reported error rates, i.e. instances in which recordings were mistaken for real voices or vice versa, of zero in normal indoor conditions when the sound sources were 4 cm or 6 cm from the microphone. In other conditions (inside a car) or at greater distances, these rates rose to approximately 50 percent in some cases, particularly in mistaking genuine users for recordings. On the other hand, this demonstrated the new system’s 100 percent accuracy at close range. The researchers also claim to have tested a wide variety of speakers, from high-end laptop speakers to low-end earphones.
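For readers unfamiliar with how such error rates are tallied, the two directions of failure can be counted separately: recordings accepted as live (false accepts) and live voices rejected as recordings (false rejects). The sketch below is a generic illustration of that bookkeeping, not the study’s evaluation code.

```python
# Minimal illustration of how the two error rates discussed above might
# be tallied from labelled trials. Each trial is a pair of booleans:
# (is_live, judged_live). Names and data are illustrative only.

def error_rates(results):
    """Return (false_accept_rate, false_reject_rate) over a trial list."""
    recordings = [judged for is_live, judged in results if not is_live]
    live = [judged for is_live, judged in results if is_live]
    false_accept = sum(1 for judged in recordings if judged) / len(recordings)
    false_reject = sum(1 for judged in live if not judged) / len(live)
    return false_accept, false_reject

# At close range the study reported zero errors in both directions
trials = [(True, True)] * 10 + [(False, False)] * 10
print(error_rates(trials))  # (0.0, 0.0)
```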
Voice activation may become an increasingly common means of secure authentication and computer interaction as devices go hands-free. Sophisticated ASV systems that detect all the main types of voice hacking, including recordings, impersonation and text-to-speech synthesis, may therefore become ever more necessary. The new system, developed collaboratively across U.S. and Chinese institutions, uses sensors already present in the average smartphone to tell voice recordings from genuine human speakers. Hopefully, the researchers will soon convert it into an app, extending this protection to the general, digital assistant-using public.
Top image: Siri voice assistant. (CC BY-SA 2.0)
Chen S, Ren K, Piao S, Wang C, Wang Q, Weng J, et al. You Can Hear But You Cannot Steal: Defending Against Voice Impersonation Attacks on Smartphones. In: Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS); 5-8 June 2017.
Mariéthoz J, Bengio S. Can a Professional Imitator Fool a GMM-Based Speaker Verification System? IDIAP, 2005.