Genre-colored glasses
A blog on genre, language, and other norms
Rhetoric

What Voice Recognition Software Doesn’t Recognize

12/11/2016

Siri, Bernard Goldbach, flickr

Why do Siri and buddies so often seem so stupid? They just don't get human language.

For six weeks, I’ve been depending on voice recognition software and dictating all of my text messages, emails, blog posts, any and all documents. I’ve mentioned before that I had shoulder surgery and had to wear a bulky shoulder sling, so I couldn’t type except one-handed. Thank goodness for Siri on my phone and dictation on all my computers. I've become quite adept at speaking in voice hyphen text comma though not as stuck as the character in Hilary Price's funny Rhymes with Orange strip comma I hope exclamation point

A few days ago, my surgeon gave me permission to burn the sling. (see before and after photos, below.) In honor of being sling-free, I’m dictating this paragraph and this blog post About the mistakes Voice recognition software makes in Trying to transcribe human speech (don't worry--this won't go on much longer). Wednesday nine [One thing mine] does all the time you just saw– It capitalizes words apparently at random. It also substitutes bizarre phrases and words in place of the perfectly reasonable phrases and words I say. Like writing out ”Wednesday nine” instead of “one thing mine,” as adjusted [“it just did”]. My favorite so far in this post Was in the first paragraph. Instead of ”had to wear a bulky” shoulder sling, it wrote “how do I wear a bogey.” I think I would’ve much preferred to wear a bogey.

Sling, as worn for six weeks

Sling, as hung from ceiling fan after 6 weeks

Well, I won’t keep preserving the dictation mistakes and drive us all crazy. From here on I'll correct what my dictation assistant gets wrong. Fortunately, I’ve been keeping a list of some of my favorite voice recognition errors. So let’s join Ellen DeGeneres and have fun playing with the ways computers just don’t get human language. Mostly because they just don’t get context.

And now for some of my own examples. [Some of these are NSFW, because dictation assistants apparently have dirty minds. But I will safety them up for you with strategically placed asterisks.]

Voice recognition software doesn’t recognize context

A little bit of context would go a long way in helping these dictators (as I started calling my dictation assistants) to figure out which words I probably meant, even if I mumbled.

In the middle of a sentence, I said the phrase “over a” and Siri substituted “Oprah.” Now I’m not a personal friend of Oprah’s [or “operas” as it wanted to say this time]. I wasn’t writing about talk shows. Or even TV. I just needed the much more common phrase “over a.” No human would make that mistake. Neither the grammar of the sentence nor the meaning from the context would work with Oprah. But Siri decided I should be talking about Oprah. So be it. [or Soviet, as Siri just said. Was I talking about recent Russian misdeeds?]

I dictated “no human would mishear” and got back “no human witness here.” For most human listeners, knowing that these words appear in a post about voice recognition software would make related words like “hear” and “mishear” come to mind far more easily than the unrelated “witness.” But Soviet and witness--maybe Siri has a top secret background we don't know about.

Humans do a lot of their hearing and understanding by interpreting the most likely meaning in a given context. Interpreting meaning is not the strength of machines.

Context would clear up a lot of the mistakes made with homonyms, too. In fact, my dictator changed the “mishear” above to “miss here.” Understandable. And I can understand and even forgive its confusing “assistants” with “assistance.” But it never chooses the correct version of “your” or “you’re,” and it usually guesses wrong for “it’s” versus “its.” Humans get those wrong, too, but context clears up some of the confusion. And humans would never make the other common dictator homonym mistake of using “two” for “to.”

Homonyms can cause spelling problems for everyone. But only voice recognition software and other machines will know the difference between the homonyms, spell each one correctly, and still use the wrong one in the wrong context.

Voice recognition software doesn’t recognize dialect variations

My dictation assistant doesn’t do well when two different words sound the same, as in homonyms, but it also doesn’t do well when one word can be pronounced two different ways. So when I get sloppy and pronounce “thing” as “thang,” my dictator knows only to write down “thank.” On the other hand, it frequently changed “sling” to “slang.” And the dialect specialist in my house found it completely reasonable that voice recognition software would hear my “bulky” as “bogey.” Apparently, my dialect is funny to some humans as well as confusing to some machines.

My dictator has made me aware of how much Kansas seems to have influenced my original Northern Colorado dialect. I’m embarrassed to say how often it not only hears my “thang” but also takes my “then” for “than” and my “and” for “in.” It really never knows which one I’m saying, and I can’t blame it.

I do blame it for not being able to distinguish negative from positive verbs. I’ve been close to trouble many times in the last six weeks from my dictator changing “do” to “don’t” and “won’t” to “will.” I was texting a new friend about her cookie business, placing an order to show my support, and Siri changed my “Christmas tree designs for the first dozen sound good” to “Christmas tree designs for the first doesn’t sound good.” Now that would’ve shown her what I thought of her business. In fact, Siri did it again just now as I repeated the story. And by the way, every “to” in that last paragraph came out first as “two.”

Like I said, computers just don’t get how context tells us humans which word is which (or is that witch is witch? Siri misses my northern pronunciation distinction between “hwich” and “witch,” too—or is that “two”?)

Voice recognition software doesn’t recognize appropriateness

These nasty little dictators have dirty mouths, too. They don’t seem to understand appropriateness—as in, that language is not appropriate here. Or “Watch your language, young lady.”

Instead of “talk shows” in the paragraph about Oprah above, my dictator wrote “talk sh*t.”

I drafted an email to my students about bringing snacks to a class workshop. Fortunately, I caught the mistake before I sent it, so I managed to warn my students about a classmate’s allergy with the words “no peanuts” instead of Siri’s near-rhyming caution “no p*nis.”

My dictator often shortens “assistant” to its first three letters. Just for fun, I figure.

I bragged to my brother in a text message at Thanksgiving that my kind family was doing all the cooking so “I didn’t cook a thing.” But Siri told him that “I didn’t f*ck a thing.” What could Siri have been thinking? Obviously not about my family’s version of Thanksgiving dinner.

Those were probably the worst examples. Similar problems pop up with autocorrect, too, even without the voice recognition complication. Online you can find lots of examples of the worst autocorrect mistakes. Autocorrect, too, seems to like embarrassing us. But perhaps I’m attributing too much maliciousness to machines.

Sometimes, of course, I could understand why the dictator made the mistake. I’ve been known to mumble. Or so I’ve been told.

But other times it just seems stupid.

Voice recognition software doesn’t recognize me

I know this software is supposed to learn and get better at recognizing my voice over time. I’m sure that has happened here and there. But how then do you explain the way Siri keeps changing my name? After weeks of my signing off as Amy Devitt, thanks to Siri I signed off an email as Amy Done.

Even my first name has given it trouble. I can tell Siri what to call me, and it obeys my instruction. But when I’m signing off an email, it forgets everything I’ve taught it. About half the time it signs my emails as Any. It has also signed me off as Dwayne. But my personal favorite, and apparently one of Siri’s favorites too since it uses that signature so often, is when it signs me off as Baby.

This is just some of the fun I’ve had dictating all my texts over the last six weeks. It’s been almost as fun as wearing a big pokey shoulder sling. . . .

Yes, my dictator decided this time that Bohlke was pokey. . . .

And yes, this last time, it chose Bohlke for bulky. As the red squiggly line of spell checker signalled, even the dictator didn’t know who Bohlke was. But that unknown proper name still seemed more sensible to the dictator then (than) the way I pronounced bulky.

So maybe I really do pronounce things funny. Maybe the voice recognition software knows more than I think it does.

Nah. It still doesn’t understand context or meaning, so it still doesn’t understand human language.

Until next week, signing off,

Baby Done. That’s me

Mute Loudspeaker, Pete Linforth, Pixabay


Author
Amy Devitt
