Seeing speaking - two ways vision helps us understand speech
Gordon McLeod & Ben Hopson of Cirrus Logic (www.cirrus.com)
Speaker identification technology is increasingly common on mobile devices. The virtual assistant becomes more compelling when it knows who you are and, in some use cases, voice ID is a more convenient way of unlocking a device than face or fingerprint recognition.
Incoming audio is first processed into features which are stable, distinctive and hard to imitate – ideally capturing unique aspects of the speaker's vocal tract. Several different visualisations are required to select features and confirm that feature extraction works consistently over a large number of speakers and environmental conditions.
Features are then fed into a classifier which compares the extracted features to those of enrolled users to determine if the audio is recognised. Training, tuning, testing and debugging this system brings with it many of the challenges of machine learning. There is a need to visualise data in many dimensions, and it can also be difficult to determine why the system has made a particular decision. Such systems need to be tested in large trials where results are seldom clear-cut, so visualisation is critical during debugging to understand overall trends while distinguishing individual behaviours.
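For readers who like to see the idea in code, here is a minimal sketch of the "compare extracted features to enrolled users" step. It is an illustration only, not Cirrus Logic's method: the cosine-similarity scoring, the function names, the 0.8 threshold and the three-number feature vectors are all assumptions made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def identify_speaker(features, enrolled, threshold=0.8):
    """Return (name, score) for the best-matching enrolled user.

    The name is None when no enrolled user scores at or above the
    threshold. The threshold value is an arbitrary assumption.
    """
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(features, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical enrolled users with made-up feature vectors.
enrolled = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}

print(identify_speaker([0.85, 0.15, 0.35], enrolled))  # close to alice
print(identify_speaker([0.5, 0.5, -0.7], enrolled))    # close to nobody
```

The second call shows why a single score is hard to interpret on its own: the system always has a "best" match, and only the threshold decides whether it counts as recognised.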
Becky Mead, Speech Graphics (www.speech-graphics.com)
Speech audio contains rich phonetic information which can be visualized in a variety of ways. People who communicate with spoken language are acutely sensitive to the relationship between acoustic phonetics and the way speech articulators move when creating speech sounds, which is why bad lip-syncing is so jarring. Speech Graphics uses the information contained in a speech audio signal to simulate the visual (facial) motion that generated that sound. Our technology is used to generate accurate facial movement and expressions for character dialogue in a growing number of AAA video games, among other applications. Becky will show us:
- An introduction to spectrograms (a way of visualizing audio data) and the ways that linguists have used them to analyze speech phonetically
- How Speech Graphics uses that same phonetic information to simulate facial movement corresponding to a speech audio signal
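As a taster for the spectrogram introduction, here is a rough sketch of how one can be computed: slice the audio into short overlapping frames, apply a window, and take the magnitude of each frame's Fourier transform. This is a generic textbook approach, not Speech Graphics' pipeline; the frame size, hop length, Hann window and test tone are assumptions chosen for the example, and the direct DFT is written for readability rather than speed.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames, each a list of magnitudes per frequency bin."""
    # Hann window tapers each frame's edges to reduce spectral leakage.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Only bins 0..frame_size/2 are needed for a real-valued signal.
        for k in range(frame_size // 2 + 1):
            total = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


# A 1000 Hz test tone sampled at 8000 Hz (made-up values for illustration).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]

spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print("strongest frequency in first frame:", peak_bin * sample_rate / 64, "Hz")
```

Plotting the frames side by side, with time on one axis, frequency on the other and magnitude as colour, gives the familiar spectrogram image. For a steady tone like this one it is a single horizontal stripe; speech produces the shifting bands that linguists read phonetically.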
Thanks to Cirrus Logic for sponsoring our food and refreshments.
Cirrus Logic's engineers and data scientists design intelligent audio chips that power the smartphone in your pocket, consumer and car audio systems, and smart homes. As a major presence in the buzzing local tech ecosystem, Cirrus Logic is proud to sponsor the Edinburgh Data Visualisation Meetup.
As usual, there's time and space if you would like to share anything.
We're always open to suggestions for topics and speakers, so let us know if you have someone or something in mind.
See you at the meetup, and do bring along your friends & colleagues.
Brendan (Hill), Ben (Bach), Uta (Hinrichs)
VENUE: this will be our first meeting at our new venue, Cirrus Logic's office in Quartermile, which from now on we plan to alternate with InSpace in the University of Edinburgh School of Informatics. See below for directions.