Progressive Perl for Windows: Teaching Perl How to Speak


I find having to stop what I'm doing to read a system message annoying, so I set about trying to leverage technology to help. I stumbled on Microsoft Direct Speech Synthesis, which is part of the Microsoft Speech API (SAPI). SAPI comes with Windows 2000 and Windows Me. For most other Windows OSs, you can install Microsoft Speech SDK 5.0, which includes SAPI. You can download this software development kit (SDK) and obtain more information about it from Microsoft. Downloading Speech SDK 5.0 might take a while because it's about 140MB. If you don't plan to write Visual C++ (VC++) code that integrates with SAPI, you might want to save time and disk space by downloading version 4.0's runtime components (spchapi.exe) only. However, if you want all 21 voice engines, you need to download sapi4sdksuite.exe, which is about 40MB.

Using Direct Speech Synthesis
With Direct Speech Synthesis, you can write code that speaks to you. For example, you can use this technology to announce the time of day every 30 minutes. Writing code that speaks is quite easy, as the script in Listing 1 shows. The script begins by loading the Win32::OLE extension and preparing that extension to listen for events. The script uses the Win32::OLE extension to instantiate a DirectSS COM object. Preparing the extension for events is important because events indicate when Direct Speech Synthesis has finished speaking a phrase and thus is ready to speak another phrase. Specifying qw( EVENTS ) enables the script to call the Win32::OLE extension's SpinMessageLoop() function (more about this function later).

Next, the script instantiates the DirectSS object. As the code at callout A in Listing 1 shows, you need to use the DirectSS interface's globally unique ID (GUID) to procure an instance of the DirectSS object. To obtain GUIDs, I often use oleview.exe, which comes with the Microsoft Platform SDK and VC++.

After the code at callout A in Listing 1 executes, the $DirectSS variable points to the DirectSS object, which means that Direct Speech Synthesis can start talking. You just need to call the Speak() method and specify the phrase you want Direct Speech Synthesis to speak. After you pass the specified phrase to the Speak() method, Direct Speech Synthesis begins to process the request.

At this point, the script repeatedly calls the SpinMessageLoop() function, as the code at callout B in Listing 1 shows. These repeated calls are important. Components such as Direct Speech Synthesis place messages about events in a message queue. However, non-Windows-based applications (such as Perl) typically don't bother processing this message queue. Therefore, you need to explicitly call SpinMessageLoop() to force Perl to process any messages in the queue, thereby permitting Direct Speech Synthesis to continue speaking phrases. The script continuously calls the SpinMessageLoop() function until Direct Speech Synthesis has finished speaking.
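Listing 1 isn't reproduced here, so the following is only a minimal sketch of the polling approach just described, not the original script. It assumes that SAPI 4 is installed and that you pass the DirectSS GUID on the command line (look it up with oleview.exe) rather than hard-coding a value. The speak_and_wait() helper name, the default phrase, and the short Time::HiRes pause are illustrative additions.

```perl
use strict;
use warnings;
use Win32::OLE qw( EVENTS );    # load the extension with event support
use Time::HiRes qw( sleep );    # fractional-second sleep

# Hypothetical helper: speak one phrase, then process the message
# queue until the object's Speaking property reports that it is done.
sub speak_and_wait {
    my ( $tts, $phrase ) = @_;
    $tts->Speak( $phrase );
    while ( $tts->{Speaking} ) {
        Win32::OLE->SpinMessageLoop();
        sleep( 0.05 );          # brief pause (an addition) to avoid a busy loop
    }
}

sub main {
    my ( $guid, @words ) = @_;
    die "Usage: perl $0 <DirectSS-GUID> [phrase]\n" unless defined $guid;

    # The GUID comes from the command line; no value is assumed here.
    my $DirectSS = Win32::OLE->new( $guid )
        or die "Cannot create DirectSS object: " . Win32::OLE->LastError() . "\n";

    speak_and_wait( $DirectSS, @words ? "@words" : 'Hello from Perl.' );
}

main( @ARGV ) unless caller;
```

Keeping the wait loop in its own subroutine makes it easy to swap in the event-driven wait later without touching the rest of the script.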

That's all the code you need to enable Windows to convert text to speech. Now you just need to run the script with Perl from the command line. Direct Speech Synthesis figures out how to pronounce each word and determines the inflection needed for any punctuation.

As Listing 1 illustrates, applying Direct Speech Synthesis can be quite simple. However, if you use Direct Speech Synthesis in more complex applications, the code might eat up valuable CPU cycles because it needs to continuously call $DirectSS and query its Speaking property to determine whether Direct Speech Synthesis has finished speaking. You can minimize this effect by leveraging events and occasionally using a sleep cycle to free up CPU time.

Using Events for Efficiency
After you start working with Direct Speech Synthesis, you soon realize that constantly calling $DirectSS and querying its Speaking property wastes time and slows your script's execution. Fortunately, Direct Speech Synthesis has the DirectSS event interface.

Direct Speech Synthesis fires events for various reasons, such as when it starts speaking (AudioStart event) and when it finishes speaking (AudioStop event). Thus, all you need to do is wait until the AudioStop event occurs to determine when Direct Speech Synthesis has finished speaking. After this event occurs, you can reset a global variable. In other words, you can simply monitor a global variable's state instead of continually calling $DirectSS and querying its Speaking property, which saves considerable processing time. The script in Listing 2 uses this technique. Notice that the $fSpeaking variable is set to 1 before entering the while() loop. This loop continues until the variable changes to 0, which occurs when the AudioStop event fires.

To take advantage of events, you need to call the WithEvents() function right after you create the $DirectSS variable, as callout A in Listing 2 shows. When you call this function, you must pass in $DirectSS, a reference to the ProcessEvents() subroutine that will process events, and the event interface's GUID. (Some COM classes have a default event interface. For COM objects based on such classes, you don't need to specify the GUID. However, Direct Speech Synthesis doesn't define a default event interface, so you must pass in this parameter.) The WithEvents() function doesn't return a value to specify whether it was successful. Thus, you don't need to check for a return value.

When an event occurs, the script calls the ProcessEvents() subroutine, which callout B in Listing 2 shows. The script passes in $DirectSS, an event identifier that specifies the event type, and a list of event-specific parameters. In this case, you want only the AudioStop event (aka event 4), which occurs when Direct Speech Synthesis has finished speaking the last entered phrase. The subroutine ends by setting the $fSpeaking variable to 0. The script in Listing 2 reads from the standard input device (<STDIN>) until it reaches an end-of-file indicator. In this case, you indicate the end of file by pressing Ctrl+Z, then Enter.
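Listing 2 isn't reproduced here, so the following is only a rough sketch of the event-driven technique, not the original script. It assumes you pass both the DirectSS GUID and the event interface's GUID on the command line. The handler accepts either the event ID 4 or the name AudioStop, because Win32::OLE can report an event by name or by ID. The short pause inside the wait loop is an addition.

```perl
use strict;
use warnings;
use Win32::OLE qw( EVENTS );
use Time::HiRes qw( sleep );

our $fSpeaking = 0;    # 1 while a phrase is still being spoken

# Win32::OLE calls the handler with the object, the event name or ID,
# and then any event-specific arguments.
sub ProcessEvents {
    my ( $Obj, $Event, @Args ) = @_;
    $fSpeaking = 0 if $Event eq '4' or $Event eq 'AudioStop';
}

sub main {
    my ( $class_guid, $event_guid ) = @_;
    die "Usage: perl $0 <class-GUID> <event-interface-GUID>\n"
        unless defined $event_guid;

    my $DirectSS = Win32::OLE->new( $class_guid )
        or die "Cannot create DirectSS object: " . Win32::OLE->LastError() . "\n";
    Win32::OLE->WithEvents( $DirectSS, \&ProcessEvents, $event_guid );

    # Speak each line of standard input (end with Ctrl+Z, then Enter).
    while ( my $line = <STDIN> ) {
        chomp $line;
        next unless $line =~ /\S/;     # skip blank lines
        $fSpeaking = 1;
        $DirectSS->Speak( $line );
        while ( $fSpeaking ) {         # cleared by the AudioStop event
            Win32::OLE->SpinMessageLoop();
            sleep( 0.05 );             # pause (an addition) to free CPU time
        }
    }
}

main( @ARGV ) unless caller;
```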

To run the script, invoke it with Perl from the command line. By running this script, you'll soon see that it provides a more efficient way to wait for Direct Speech Synthesis to finish speaking. With this technique, you can use Direct Speech Synthesis in more advanced applications, such as speaking text files and instant messages.

Speaking Text Files
Direct Speech Synthesis isn't limited to speaking a couple of phrases. As the next script illustrates, you can use this technology to speak an entire text file. You'll find this script in the Code Library on the Windows Scripting Solutions Web site. At the command line, you specify the text file for Direct Speech Synthesis to read aloud when you launch this script. You type the command

perl C:\temp\readme.txt

where C:\temp\readme.txt is the text file you want Direct Speech Synthesis to speak. If you don't specify a text file, the script accepts keyboard input from STDIN. Regardless of the data source, the main loop reads the input, then waits for Direct Speech Synthesis to finish speaking it.
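The choice of input source can be sketched as follows. This isn't the Code Library script. The subroutine names open_input() and speak_all() are hypothetical, and the speaking step is passed in as a code reference that simply prints each line. In a real script, that code reference would call Speak() and wait for the AudioStop event.

```perl
use strict;
use warnings;

# Return a filehandle for the named text file, or STDIN when no
# file was specified on the command line.
sub open_input {
    my ( $path ) = @_;
    return \*STDIN unless defined $path;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!\n";
    return $fh;
}

# Read each non-blank line from $fh and hand it to $speak, which is
# expected to return only after the phrase has been spoken.
sub speak_all {
    my ( $fh, $speak ) = @_;
    my $count = 0;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;
        $speak->( $line );
        $count++;
    }
    return $count;    # number of phrases handed to $speak
}

sub main {
    my ( $path ) = @_;
    my $fh = open_input( $path );
    # Stand-in for the speech call: print instead of speaking.
    speak_all( $fh, sub { print "Speaking: $_[0]\n" } );
}

main( @ARGV ) unless caller;
```

Passing the speaking step in as a code reference keeps the input handling independent of the speech engine, so the same loop works whether the data comes from a file or from the keyboard.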

Direct Speech Synthesis comes with different voices. In this script, I've added code that makes Direct Speech Synthesis speak in the voice called Sam. As the excerpt in Listing 3 shows, the script sets that voice, then prints a description of it.

Speaking Instant Messages
Thus far, you've only toyed with Direct Speech Synthesis, so you might be wondering about the usefulness of this technology. In the Code Library on the Windows Scripting Solutions Web site, you'll find a script that uses Direct Speech Synthesis and MSN Messenger events to speak instant messages. This capability is enormously useful if you typically perform several tasks at once and therefore can't stop to read chat messages.

Unlike the two previous scripts, this script doesn't use the DirectSS event interface to wait for Direct Speech Synthesis to finish speaking before proceeding. Instead, the script uses a while() loop that never exits, as Listing 4 shows. Inside this loop, the script calls the SpinMessageLoop() function and prints the current time once per second. The loop sleeps for 100 milliseconds (ms) on each iteration, which prevents the script from eating CPU time. Because this loop constantly calls SpinMessageLoop(), the script doesn't need to listen for AudioStop events. The script just listens for MSN Messenger events.
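Listing 4 isn't reproduced here, so the following is only a sketch of the loop's shape. It omits the MSN Messenger and DirectSS setup entirely. The second_changed() helper name is hypothetical, and Time::HiRes supplies the 100ms sleep.

```perl
use strict;
use warnings;
use Win32::OLE qw( EVENTS );
use Time::HiRes qw( sleep );

# Return true, and remember $now, the first time a new second is seen.
sub second_changed {
    my ( $now, $last_ref ) = @_;
    return 0 if $now == $$last_ref;
    $$last_ref = $now;
    return 1;
}

sub main {
    my $last = 0;
    while ( 1 ) {                          # never exits; stop with Ctrl+C
        Win32::OLE->SpinMessageLoop();     # dispatch any pending events
        my $now = time;
        print scalar( localtime $now ), "\n"
            if second_changed( $now, \$last );
        sleep( 0.1 );                      # 100ms nap to free CPU time
    }
}

main() unless caller;
```

Because the loop wakes about ten times per second, event handlers registered with WithEvents() get a chance to run promptly while the script stays nearly idle.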

To use this script, you need to have the MSN Messenger Service installed on your machine. (For information about how to obtain this instant-messaging program, see "Progressive Perl for Windows: Messing with Instant Messaging," May 2001.) To run the script, invoke it with Perl from the command line.

Practically Speaking
SAPI is a little-known but extremely useful technology that's available for almost every flavor of Windows. With a bit of creativity, you can use SAPI's Direct Speech Synthesis in many aspects of your job. For example, you can take advantage of this technology to announce network failures or pending drive crashes. (Typically, a drive starts to fail before it crashes. The event log usually receives events that indicate sectors are failing or the drive bus is having trouble.) You can even use this technology to announce that you'll be late for a meeting.
