It's just like poker

I'm a big fan of Texas Hold'Em poker. It's really quite fascinating, because it's multidisciplinary: to succeed, you have to master not only the rules but also everything from mathematics (what's my percentage chance of winning this hand?) to logic and game theory (if she had a flush draw, she'd have already folded) to psychology (is he really bluffing again?). It's a unique, complex blend of knowledge that has captivated players and TV audiences alike, especially since the cameras went underneath the tables. Once you have an idea of what's going on in the game, you want to know more.

It's just like speech recognition.

All right, hear me out, hear me out. How much do you *really* know about speech recognition? Because it, too, is a fascinating multidisciplinary effort, a blend of art and science. And as with watching poker, once you've got an idea of what's going on, you may find yourself hungry to learn more. There is the mathematics of it: Hidden Markov Models, Viterbi searches, pattern recognition, all the algorithms that have been refined over years of research. There is the physics of the acoustics and resonances that must be captured and interpreted, along with some physiology: the shape of the vocal tract, and how the position of the tongue in the mouth affects the formant frequencies of different vowels. Then there's linguistics: building extensive dictionaries of spellings and semantics, then trying to string phonemes together into words, words into sentences, and sentences into *meaning*. There is the programming side: what VoiceXML or Java code do you write to handle the flow of data from caller to server to engine and back? And, perhaps most interesting, there is a human-facing psychology behind the design of these systems: putting yourself in the caller's shoes, writing clear prompts that encourage certain responses, and testing to discover that one way of presenting information is three times more effective than another. No one part survives without the others. In fact, many a poor poker player, and many an older speech recognition system, has failed because the math was right but the human factors were way off.

I'm telling you all this because this year at Conversations, we've expanded the Sunday Speech University sessions to TWO simultaneous courses. One will be a more traditional offering, providing a practical look at our speech technology and products; the other will take a closer look at what makes speech recognition work. Dr. Rob Kassel, a speech industry veteran and the current leader of our network speech product management team, has graciously agreed to spend a full three hours delving into the mechanics of speech recognition. He gave a similar "Speech 101" talk at a SpeechTEK conference years ago, and the room was completely packed, with people sitting on the floor or standing along the back wall, all there to learn a little more about spectrograms and language models. I think that's evidence that, even if you're already quite the expert in one area of speech recognition, there's always more to learn about the others. After all, you can never get too much poker practice.

The Speech University sessions are currently planned for 2 to 5 pm on the Sunday before the conference officially starts. Make sure to indicate your interest when you register for the conference.

Published Wednesday, June 20, 2007, 4:56 PM, by Jeff Foley