PowerShell Speech Recognition: How To Set up Voice Commands and Responses

Learn how to create a PowerShell script that can listen to your voice and respond with spoken words.

Brien Posey

September 6, 2024

10 Min View
ITPro Today

This tutorial demonstrates how to use voice commands to control your computer with PowerShell. Expert Brien Posey will walk you through creating a script that listens to your spoken commands and responds with spoken replies. Whether you’re a PowerShell enthusiast or simply interested in voice-controlled automation, this step-by-step guide will get you started with PowerShell’s speech recognition capabilities.

The following transcript has been edited for clarity and length.

Transcript:

Brien Posey: Hello, greetings, and welcome. I am Brien Posey. I want to show you a rather unique PowerShell script in this video.

Recently, I created a PowerShell script that essentially acted as a text-to-speech engine. In other words, you could generate a string within PowerShell and then have PowerShell verbally speak the contents of that string.

It was a simple program, but after I finished it, I began to wonder if it might be possible to do the opposite: to set up a PowerShell script that would listen for spoken words and then be able to recognize the words and interpret that as input. Well, it is possible, and in this video, I want to show you how to do just that.

You can see my script:

## Adapted From https://outnull.wordpress.com/2016/11/14/powershell-speech-recognition/
# Allow PowerShell to Speak
Add-Type -AssemblyName System.Speech
$Talk = New-Object -TypeName System.Speech.Synthesis.SpeechSynthesizer
# Set up the Speech Recognition Engine object
$SpeechRecognitionEngine = New-Object -TypeName System.Speech.Recognition.SpeechRecognitionEngine
# Define the verbal commands to be supported by the script
$Grammar = New-Object -TypeName System.Speech.Recognition.GrammarBuilder
$Grammar.Append("Hello");
$SpeechRecognitionEngine.LoadGrammar($Grammar);
$Grammar = New-Object -TypeName System.Speech.Recognition.GrammarBuilder
$Grammar.Append("Exit");
$SpeechRecognitionEngine.LoadGrammar($Grammar);
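# InitialSilenceTimeout is a TimeSpan, so the 15-second value is stated explicitly
$SpeechRecognitionEngine.InitialSilenceTimeout = [TimeSpan]::FromSeconds(15)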
$SpeechRecognitionEngine.SetInputToDefaultAudioDevice();
$CMDBoolean = $false;
While ($CMDBoolean -eq $False) {
    $SpeechRecognize = $SpeechRecognitionEngine.Recognize();
    $Conf = $SpeechRecognize.Confidence;
    $MyWords = $SpeechRecognize.text;
    if ($MyWords -match "hello" -and [double]$conf -gt 0.85) {
        $Talk.Speak("Hello");
    }
    if ($MyWords -match "exit" -and [double]$conf -gt 0.85) {
        $Talk.Speak("Goodbye");
        $CMDBoolean = $True;
    }
}

Before I show you how the script works, I want to quickly point out that I adapted the script from another script found at the above URL. However, I didn't just copy and paste the script. I rewrote a good bit of this script. If you were to open that webpage and look at the script, you can see that my code looks quite different. I did things in a significantly different way. Even so, the basic functionality of the script and what it does is still the same from one to the other. So, I wanted to be sure and give credit where credit is due.

Setting up Speech Synthesis

Let's look at how this works.

Add-Type -AssemblyName System.Speech
$Talk = New-Object -TypeName System.Speech.Synthesis.SpeechSynthesizer

The first section of the script allows PowerShell to speak and is identical to what I used in the script I talked about in a previous video, where I created the text-to-speech engine for PowerShell.

So, I am loading the System.Speech assembly, and then I am creating a new object. That object is called $Talk, and it is of type System.Speech.Synthesis.SpeechSynthesizer. That is what's going to allow PowerShell to speak.

You may be wondering why I have a speech synthesizer in the script. After all, I said that the purpose is to accept spoken words as input. However, this script does both. It is going to be able to listen to spoken words, and it is going to be able to respond with text-to-speech.
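As a rough illustration of the synthesizer on its own, here is a minimal sketch that speaks the contents of a string variable. The Rate and Volume settings and the $Message variable are assumptions added for illustration; they are not part of the script in the video. It assumes Windows with an audio output device.

```powershell
# Load the speech assembly and create the synthesizer, as in the script
Add-Type -AssemblyName System.Speech
$Talk = New-Object -TypeName System.Speech.Synthesis.SpeechSynthesizer

# Optional tuning (illustrative values, not from the original script)
$Talk.Rate = -2      # speaking rate; valid values run from -10 to 10
$Talk.Volume = 80    # output volume; valid values run from 0 to 100

# Build a string and have PowerShell speak it
$Message = "The time is " + (Get-Date -Format "h:mm tt")
$Talk.Speak($Message)
```

Speak is synchronous, so the script waits until the sentence has finished playing before moving on.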

Let me skip forward, and I will show you what I am doing.

While ($CMDBoolean -eq $False) {
    $SpeechRecognize = $SpeechRecognitionEngine.Recognize();
    $Conf = $SpeechRecognize.Confidence;
    $MyWords = $SpeechRecognize.text;
    if ($MyWords -match "hello" -and [double]$conf -gt 0.85) {
        $Talk.Speak("Hello");
    }
    if ($MyWords -match "exit" -and [double]$conf -gt 0.85) {
        $Talk.Speak("Goodbye");
        $CMDBoolean = $True;
    }
}

This script is only going to have a vocabulary of two words. It would be easy to add additional words. To keep things simple, I am only going to use two words. Those two words are “Hello” and “Exit.” So, if the computer hears me say “Hello,” it will say “Hello” back.

If I return to the first few lines of code, you can see that I created a variable called $Talk that was set equal to my object of System.Speech.Synthesis.SpeechSynthesizer. In other words, anytime I want the computer to talk, I will reference this $Talk variable.

If I say “Exit,” the computer will say “Goodbye.”

So, the only thing that the script does is it listens for me to talk. If I say “Hello,” it will say “Hello” back. If I say “Exit,” it will say “Goodbye,” and the script will exit.

Configuring Speech Recognition

Let's talk about how the rest of this script works.

Next, we need to set up a SpeechRecognitionEngine object:

$SpeechRecognitionEngine = New-Object -TypeName System.Speech.Recognition.SpeechRecognitionEngine

I am setting a variable, $SpeechRecognitionEngine, equal to a new object of type System.Speech.Recognition.SpeechRecognitionEngine. That gives us our speech recognition object.

$Grammar = New-Object -TypeName System.Speech.Recognition.GrammarBuilder
$Grammar.Append("Hello");
$SpeechRecognitionEngine.LoadGrammar($Grammar);
$Grammar = New-Object -TypeName System.Speech.Recognition.GrammarBuilder
$Grammar.Append("Exit");
$SpeechRecognitionEngine.LoadGrammar($Grammar);

Next, we need to define the verbal commands accepted by the script. As I mentioned a moment ago, I only have two. However, it would be super simple to add additional commands.

For the first command, I am creating an object and calling the object $Grammar. The object is of type System.Speech.Recognition.GrammarBuilder. GrammarBuilder is an object type that will allow us to build a vocabulary for our speech recognition engine.

And so, the first thing we do is use $Grammar.Append and then supply the word we want to add to the vocabulary, which in this case is “Hello.”

Then, we call LoadGrammar on the SpeechRecognitionEngine and provide it with the contents of the GrammarBuilder, which is one word, “Hello.”

You can see that I have repeated one of the lines that I used earlier. Once again, I define a variable called $Grammar and set it equal to a new object of type System.Speech.Recognition.GrammarBuilder. Why am I repeating that line? I am doing this because I want this script to support multiple words, not just a single word. The easiest way to repopulate the GrammarBuilder with a new word is to recreate the object. Recreating an object clears out what was there to begin with. That is an easy way to add a word to the SpeechRecognitionEngine.

So, we have the section that adds the word “Hello,” and then we have the section just beneath it that is almost identical, that adds the word “Exit.”
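Because each word is added with the same three lines, additional commands can be loaded in a loop. The following is a minimal sketch of that idea, not the script from the video: the $Commands array and the third word, "Status", are hypothetical and added only for illustration. It assumes Windows with a speech recognizer installed.

```powershell
Add-Type -AssemblyName System.Speech
$SpeechRecognitionEngine = New-Object -TypeName System.Speech.Recognition.SpeechRecognitionEngine

# Words the script should recognize ("Status" is a hypothetical extra command)
$Commands = @("Hello", "Exit", "Status")

foreach ($Word in $Commands) {
    # Recreate the GrammarBuilder for each word, as the original script does
    $Grammar = New-Object -TypeName System.Speech.Recognition.GrammarBuilder
    $Grammar.Append($Word)
    $SpeechRecognitionEngine.LoadGrammar($Grammar)
}
```

Each new word would still need its own If block inside the loop that handles recognized speech; loading the grammar only tells the engine which words to listen for.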

Implementing the Script Logic

Let's look at the rest of the script.

$SpeechRecognitionEngine.InitialSilenceTimeout = [TimeSpan]::FromSeconds(15)
$SpeechRecognitionEngine.SetInputToDefaultAudioDevice();
$CMDBoolean = $false;

This section sets a few variables that control the script’s basic behavior. I am setting an InitialSilenceTimeout of 15 seconds. I am also setting the SpeechRecognitionEngine to use the default audio device. I have one microphone hooked up to my computer, which will be used as a result of this line: $SpeechRecognitionEngine.SetInputToDefaultAudioDevice();. Next, I created a variable called $CMDBoolean and set it to $false. This Boolean variable will control my loop. The loop begins on the next line of code.
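A note on the timeout setting: InitialSilenceTimeout is a TimeSpan property, so the way a value is written determines its unit. The sketch below uses hypothetical variable names and no speech objects. It compares a bare integer cast with two explicit ways of expressing 15 seconds.

```powershell
# A bare integer cast to TimeSpan is interpreted as ticks (100-nanosecond units)
$FromInteger = [TimeSpan]15

# Two explicit ways to express 15 seconds
$FromSeconds = [TimeSpan]::FromSeconds(15)
$FromCmdlet  = New-TimeSpan -Seconds 15

"Integer cast: {0} ticks" -f $FromInteger.Ticks
"FromSeconds:  {0} seconds" -f $FromSeconds.TotalSeconds
"New-TimeSpan: {0} seconds" -f $FromCmdlet.TotalSeconds
```

Stating the unit explicitly avoids any doubt about how long the engine waits in silence before a recognition attempt ends.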

While ($CMDBoolean -eq $False) {
    $SpeechRecognize = $SpeechRecognitionEngine.Recognize();
    $Conf = $SpeechRecognize.Confidence;
    $MyWords = $SpeechRecognize.text;
    if ($MyWords -match "hello" -and [double]$conf -gt 0.85) {
        $Talk.Speak("Hello");
    }
    if ($MyWords -match "exit" -and [double]$conf -gt 0.85) {
        $Talk.Speak("Goodbye");
        $CMDBoolean = $True;
    }
}

Let's look at how this loop works. The While statement checks the $CMDBoolean variable, which we set to $False a moment ago. As long as it is $False, the loop keeps running. On each pass, we call the Recognize method, which listens for speech, and we store the result in $SpeechRecognize. Then I have set up another variable, $Conf, equal to $SpeechRecognize.Confidence. We need to know how confident the recognizer is that it recognized a word correctly. Then, we have $MyWords = $SpeechRecognize.text. In other words, whatever I say will be written to a variable called $MyWords.

Next, we are looking at the $MyWords variable. We have an If statement: if $MyWords -match “hello” -and [double]$conf is greater than 0.85, then we are going to speak the word “Hello.” So, what are we doing with the confidence variable? We are looking at how confident this system is that it recognizes the word correctly. It is saying if the system is more than 85% sure that it recognized the word correctly, then we are going to assume that it is correct at that point, and we will go ahead and speak the word “Hello.”

We do the same thing in the next section:

if ($MyWords -match "exit" -and [double]$conf -gt 0.85) {
    $Talk.Speak("Goodbye");
    $CMDBoolean = $True;
}

If $MyWords matches “Exit” and we are more than 85% confident, it will say, “Goodbye.” But we are taking an additional action right here: We are also setting the $CMDBoolean variable to $True, which will terminate the loop. When the loop terminates, the entire script is going to terminate. So, if we want to end the script, we say “Exit,” then the script will say, “Goodbye,” and change the value of that $CMDBoolean variable. That will cause the script to exit.
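The matching and confidence checks can also be tried without a microphone by moving them into a small function. This is a sketch only: the function name Resolve-VoiceCommand and the action names it returns are hypothetical and do not appear in the original script. It applies the same rule as the loop, where the word must match and the confidence must be greater than 0.85.

```powershell
# Hypothetical helper: maps recognized text and a confidence score to an action name
function Resolve-VoiceCommand {
    param(
        [string]$Text,
        [double]$Confidence,
        [double]$Threshold = 0.85
    )

    # Same rule as the script: act only when confidence is greater than the threshold
    if ($Confidence -le $Threshold) { return "Ignore" }
    if ($Text -match "hello") { return "Greet" }
    if ($Text -match "exit")  { return "Quit" }
    return "Ignore"
}

Resolve-VoiceCommand -Text "Hello" -Confidence 0.93   # Greet
Resolve-VoiceCommand -Text "Exit"  -Confidence 0.60   # Ignore (confidence too low)
Resolve-VoiceCommand -Text $null   -Confidence 0      # Ignore (nothing recognized)
```

The last call mirrors what happens when nothing is recognized: the text is empty and the confidence is zero, so neither If block in the loop fires.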

Running the Script

Let's go ahead and run the script.

I am going to switch over to PowerShell, and I will go ahead and type my script name. I will press R to run the script.

Posey: Hello.

PowerShell voice: Hello.

Posey: Exit.

PowerShell voice: Goodbye.

Posey: As you can see, I say “Hello,” and the script responds by saying “Hello.” I say the word “Exit,” the script responds by saying, “Goodbye,” then terminates.

That is how you can perform speech recognition in PowerShell. I am Brien Posey. Thanks for watching.

About the Author

Brien Posey

Brien Posey is a bestselling technology author, a speaker, and a 20X Microsoft MVP. In addition to his ongoing work in IT, Posey has spent the last several years training as a commercial astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space.

https://brienposey.com/
