
Voice Portals and VoiceXML, Part 3

In Voice Portals and VoiceXML, Part 1, I provided an introduction to voice portals and VoiceXML, and in Part 2, I took a closer look at some features associated with voice browsers and VoiceXML applications. Now I want to share some sample code that you can use to develop VoiceXML applications and describe some key development features.

VoiceXML is an XML 1.0-based environment for delivering voice-enabled Internet applications. Although a discussion about every tag in the VoiceXML specification is beyond the scope of this UPDATE, I'll review a few tags to help you understand how to construct a VoiceXML application.

In my example, I want to give the user the choice of selecting English or Spanish within a VoiceXML application. Although current voice browsers natively support only English, you can handle other languages by spelling foreign words phonetically so that the application can recognize non-English grammar. The following VoiceXML code is a complete example of a voice dialog that lets a user select a language so that the application can transfer the user to the appropriate VoiceXML file:

   <?xml version="1.0"?>
   <!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language
   1.0//EN" "http://resources.tellme.com/toolbox/vxml-tellme.dtd">

   <vxml>
      <form id="welcome">
         <block>
            <audio src="audio/welcome.wav">Welcome to Immedient.</audio>
         </block>
      </form>

      <form id="englishspanish">
         <field name="languageselection">
            <!-- Inline grammar: each phrase list on the left maps to
                 the option value on the right -->
            <grammar>
               <![CDATA[
                  [
                     [(english) (ingles)]    {<option "english">}
                     [(spanish) (espagnol)]  {<option "spanish">}
                     [(hangup)]              {<option "hangup">}
                  ]
               ]]>
            </grammar>

            <prompt>
               <audio src="audio/englishspanish.wav">
                  Would you like to hear information in
                  English, or Spanish?
               </audio>
            </prompt>

            <!-- The caller said something that didn't match the grammar -->
            <nomatch>
               <audio src="audio/didnotunderstand.wav">
                  I'm sorry, I didn't understand.
               </audio>
               <pause>500</pause>
               <reprompt/>
            </nomatch>

            <!-- The caller said nothing before the timeout -->
            <noinput>
               <audio src="audio/didnothear.wav">
                  I'm sorry, I didn't hear that.
               </audio>
               <pause>500</pause>
               <reprompt/>
            </noinput>

            <!-- The caller's input matched one of the grammar options -->
            <filled>
               <result name="english">
                  <goto next="en_home.vxml"/>
               </result>

               <result name="spanish">
                  <goto next="sp_home.vxml"/>
               </result>

               <result name="hangup">
                  <audio>Thank you for calling Immedient.
                  Goodbye.</audio>
                  <disconnect/>
               </result>
            </filled>
         </field>
      </form>
   </vxml>

Notice the XML 1.0 declaration and Document Type Definition (DTD) reference. I used the Tellme DTD because this application uses the Tellme Voice application service provider (Voice ASP) service. The DTD varies depending on the voice browser software vendor or Voice ASP. This dependency reemphasizes a point I made in a previous UPDATE: VoiceXML isn't yet completely independent of the voice browser. If you want to change voice browsers, you'll probably need to modify the VoiceXML code.
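For example, if you ported this application to another vendor's platform, the header would need to reference that vendor's DTD. The following header is purely illustrative; the vendor name and URL are placeholders rather than a real product:

   <?xml version="1.0"?>
   <!-- Placeholder DOCTYPE: substitute the DTD that your voice browser
        vendor or Voice ASP actually publishes -->
   <!DOCTYPE vxml PUBLIC "-//SomeVendor//Voice Markup Language 1.0//EN"
      "http://voice.example.com/vxml.dtd">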

The VoiceXML markup proper starts with the <vxml> tags that delimit the VoiceXML page. Within the page, you can see two forms, and within the "englishspanish" form, you can see the "languageselection" field. The <field> tag encapsulates the logic you use when you want to capture some kind of user response.
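Stripped of the grammar, prompts, and event handlers, that structure reduces to a simple skeleton:

   <vxml>
      <form id="englishspanish">
         <field name="languageselection">
            <!-- grammar, prompt, event handlers, and <filled> logic go here -->
         </field>
      </form>
   </vxml>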

Next, you can see the embedded grammar definition, which lists the words the application attempts to match (e.g., "English"). If the caller says "English," the application selects the english option. The <prompt> tags that follow the grammar definition define the audio output that the user hears. At this point, the code references a prerecorded .wav file for the prompt. If the .wav file doesn't exist, the voice browser's Text-to-Speech (TTS) engine synthesizes the audio output from the tag's inline text.
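If you haven't recorded the .wav file at all, you can lean on the TTS fallback entirely; an <audio> tag without a src attribute (like the one in the hangup branch of the example) simply reads its inline text:

   <prompt>
      <!-- No src attribute, so the TTS engine synthesizes the inline text -->
      <audio>
         Would you like to hear information in English, or Spanish?
      </audio>
   </prompt>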

After the application prompts the user, other tag groups handle certain conditions. For example, if the user doesn't respond to the prompt, the <noinput> condition occurs and the application prompts the user again. If the user says something, the voice browser attempts to match the input against the grammar definitions. If the browser can't find a match, the <nomatch> condition occurs and the application prompts the user again. If a match occurs, the <filled> condition is met and the application transfers the user to the next part of the application. As you can see, the application transfers the user to a different VoiceXML file, depending on which language the user selects. Ideally, you would just pass the language variable to one page that dynamically generates voice prompts in the appropriate language, but I'll save that discussion for a future UPDATE.

Next time, I'll look at the VoiceXML application development environment, discuss IDEs, and offer a few development tips.
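In the meantime, here's a rough sketch of what that dynamic approach might look like in the <filled> block. The page name home.asp is hypothetical; it stands in for a server page that would read the query string and generate prompts in the requested language:

   <filled>
      <result name="english">
         <!-- home.asp is hypothetical: a server page that would emit
              VoiceXML with prompts in the requested language -->
         <goto next="home.asp?language=english"/>
      </result>
      <result name="spanish">
         <goto next="home.asp?language=spanish"/>
      </result>
   </filled>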
