CarderPlanet
Professional
The more expensive your smartphone, the greater your risk of information theft.
Researchers have developed a technique for extracting sound from images captured by smartphone cameras. Movable camera hardware, such as optical image stabilization (OIS) and autofocus (AF) lenses, vibrates in response to nearby sound, and the CMOS rolling shutter records those vibrations in the images as subtle distortions.
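To make the mechanism more concrete, below is a minimal sketch of how such a trace could be pulled out of rolling-shutter video, assuming the standard simplification that each sensor row is exposed slightly later than the one above it, so stacking a per-row statistic across frames yields a signal sampled at the row-readout rate. The file name, frame rate, row count and speech band are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch (not the authors' code): treat every sensor row of a
# rolling-shutter video as a separate sample in time and recover a 1-D trace.
import numpy as np
import cv2                                  # OpenCV, for frame grabbing
from scipy.signal import butter, filtfilt

VIDEO_PATH = "capture.mp4"                  # hypothetical clip from the camera
FPS = 30                                    # assumed frame rate
ROWS = 1080                                 # assumed sensor rows per frame

cap = cv2.VideoCapture(VIDEO_PATH)
row_means = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float64)
    # Vibration of the lens shows up as a row-to-row wobble; the per-row mean
    # intensity is a crude proxy for that wobble.
    means = gray.mean(axis=1)
    row_means.append(means - means.mean())  # drop the per-frame DC component
cap.release()
if not row_means:
    raise SystemExit("no frames decoded")

trace = np.concatenate(row_means)           # samples ordered by row readout time
fs = FPS * ROWS                             # idealized row sampling rate (ignores blanking)

# Band-pass to a rough speech band (85-4000 Hz assumed) before further analysis.
b, a = butter(4, [85 / (fs / 2), 4000 / (fs / 2)], btype="band")
speech_band = filtfilt(b, a, trace)
print(f"recovered {speech_band.size} samples at ~{fs} Hz")
```

In practice the readout gaps between frames and the choice of row statistic matter a great deal; the sketch only shows how rolling-shutter readout turns a 30 fps camera into a much faster, if noisy, vibration sampler.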
In their paper, the researchers explain that smartphone cameras form an optical-acoustic side channel that requires neither line of sight nor the sound source being in the camera's field of view, yet still lets acoustic information be recovered with high accuracy.
Working within the limitations of the side channel, which relies on "a suitable mechanical path from the audio source to the smartphone," the researchers extracted and analyzed leaked acoustic information that can distinguish among multiple speakers, determine their gender, and even recognize the digits they speak.
The researchers relied on machine learning to reconstruct information from human speech played through nearby loudspeakers. The study adopts the viewpoint of an attacker whose malicious application runs on the victim's smartphone but has no access to the microphone. The threat model does assume, however, that the attacker can capture video with the victim's camera and can obtain speech samples from the targeted people in advance to use for training.
Using a dataset of 10,000 spoken-digit utterances, the researchers trained and tuned their model to perform several recognition tasks. For the experiments, they used Google Pixel, Samsung Galaxy and Apple iPhone devices.
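The paper's model is not reproduced here; the following is only a minimal sketch of what such a classification stage could look like, using spectrogram features and an off-the-shelf SVM trained once per task. The feature design, the classifier choice, and the synthetic placeholder data are all assumptions for illustration.

```python
# Minimal sketch of a per-task classifier (digits, speakers, gender), not the
# paper's actual architecture: spectrogram features + an off-the-shelf SVM.
import numpy as np
from scipy.signal import spectrogram
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

FS = 32400  # assumed effective row-sampling rate of the recovered traces

def features(trace):
    """Flatten a log-magnitude spectrogram into a fixed-length feature vector."""
    _, _, sxx = spectrogram(trace, fs=FS, nperseg=1024, noverlap=512)
    return np.log1p(sxx).ravel()

# Placeholder data: in the attack scenario, `traces` would be recovered from the
# victim's camera and `labels` would come from speech samples gathered in advance.
rng = np.random.default_rng(0)
traces = rng.normal(size=(200, FS))          # 200 fake one-second traces
labels = rng.integers(0, 10, size=200)       # fake spoken-digit labels (0-9)

X = np.stack([features(t) for t in traces])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf")                      # stand-in model, trained per task
clf.fit(X_train, y_train)
print("held-out accuracy on the fake data:",
      accuracy_score(y_test, clf.predict(X_test)))
```

The same pipeline would simply be retrained with speaker or gender labels for the other two tasks; nothing in the sketch is specific to digits.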
"Our estimate using 10 smartphones in the spoken digit dataset shows 80.66%, 91.28% , and 99.67% accuracy in recognizing 10 spoken digits, 20 speakers, and 2 members of different genders, respectively," the researchers reported.
The researchers believe that lower-quality cameras with fewer moving parts will greatly limit the potential information leakage from this type of attack. Keeping smartphones physically away from loudspeakers and adding vibration-damping material between the phone and the surface that transmits the vibrations should also help.
The scientists also added: "We believe that the high classification accuracy obtained during our analysis, as well as related work using motion sensors, suggests that this optical-acoustic side channel can support more diverse malicious applications by including speech recovery functions in the signal processing pipeline."
Smartphone manufacturers can mitigate the attack with a higher shutter speed, which can be implemented in software as well as in hardware. The open question is whether manufacturers will take the trouble, especially if adjusting the shutter speed hurts overall camera performance.
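As a rough intuition for why a faster shutter/readout would help, the toy model below (an assumption for illustration, not an analysis from the paper) measures how much a vibration-driven displacement changes while the rows of one frame are being scanned: the imprinted wobble only shrinks once the whole readout takes a small fraction of the vibration period. Row count, vibration frequency and amplitude are arbitrary values.

```python
# Toy model: rolling-shutter wobble within one frame vs. sensor readout time.
import numpy as np

def intra_frame_wobble(readout_time_s, rows=1080, vib_freq_hz=200.0, amp_px=0.5):
    """Peak-to-peak displacement difference across the rows of a single frame."""
    t = np.linspace(0.0, readout_time_s, rows)       # instant each row is sampled
    displacement = amp_px * np.sin(2 * np.pi * vib_freq_hz * t)
    return displacement.max() - displacement.min()

for readout_ms in (33.0, 3.0, 0.3):                  # slower vs. much faster readout
    wobble = intra_frame_wobble(readout_ms / 1000.0)
    print(f"readout {readout_ms:>5.1f} ms -> intra-frame wobble ~{wobble:.3f} px")
```

On these illustrative numbers the wobble stays near its maximum until the readout is well under the 5 ms vibration period, which suggests a modest tweak may not be enough and explains why the trade-off against camera performance matters.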
