Adding VoiceXML to Our .NET Wireless Repertoire, Part 2

In my April 3 column, I gave you a quick rundown of my company's CRM Central Web application, one version of which incorporates VoiceXML. I presented a small code sample that I use to get the employee ID from the user. In this column, I provide the method I use to verify and match the user's input as well as the method that lets the user update an open action item.

I tried several ways to let the users speak their user names and passwords for authentication. Unfortunately, this concept proved trickier to implement than I expected. VoiceXML currently doesn't convert voice input directly to text, and the speech recognition engine doesn't understand a password such as 55*YXDA%$-IK_ROCKS-01 (how do you pronounce that anyway?). So I tried having the user spell out the password. I stored each letter in an array using JavaScript. When the user said "Done," the system spoke the complete concatenated string back.
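
The spell-it-out attempt can be sketched in plain JavaScript. This is only an illustration of the idea, not the original code: the function name collectSpelledPassword is hypothetical, and the utterances array stands in for whatever the recognizer returned on each turn.

```javascript
// Hypothetical sketch: accumulate recognized letters until the user says "done".
// `utterances` stands in for the recognizer's per-turn results (an assumption).
function collectSpelledPassword(utterances) {
  var letters = [];
  for (var i = 0; i < utterances.length; i++) {
    var word = String(utterances[i]).toLowerCase();
    if (word === "done") {
      break; // the user has finished spelling
    }
    letters.push(utterances[i]);
  }
  // Concatenate the stored letters into the string that is spoken back.
  return letters.join("");
}

console.log(collectSpelledPassword(["r", "o", "c", "k", "s", "Done"]));
```

Anything said after "done" is ignored, so the string spoken back contains only the letters collected up to that point.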

This approach worked about half the time; however, certain letters (e.g., b, e, g, t, v) presented problems because of their similar sounds. On Tellme Networks' Web site, I discovered that my approach is discouraged for that reason. The only other way to authenticate the user was with an employee ID, but relying on the ID alone would drop a layer of authentication. Finally, I found a way to use the employee ID and still keep our authentication scheme intact.

In a VoiceXML application, you define a set of rules, called "grammars." Grammars are expressions that define the legal or valid input that the user can say; they return a string value when the user's input matches a particular expression. For example,

<field name="action">
      <grammar>
      <![CDATA[
            [
            [dtmf-0 done end finished stop] {<option "done">}
            TM_SPEECH_DigitString
            ]
      ]]>
      </grammar>

At certain points in the interaction, the application prompts users for input and matches that input against the contents of the grammar tags. In the tag above, if the user says "done," "end," "finished," or "stop," the return value is whatever is in the option directive (here, "done"). The Tellme platform provides some intrinsic grammars, and the one I used above, TM_SPEECH_DigitString, takes in an arbitrary number of spoken digits and returns a string of digits. For instance, if a user says "seven six zero nine three zero zero zero seven five," TM_SPEECH_DigitString returns "7609300075."
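
To make the digit-string behavior concrete, here is a small JavaScript sketch that mimics the kind of result such a grammar returns. It is not Tellme's implementation; the digitsFromSpeech function and its word list (including "oh" for zero) are assumptions made for illustration.

```javascript
// Illustrative only: mimics the kind of result a digit-string grammar returns.
// This is not Tellme's implementation; the word list is an assumption.
var DIGIT_WORDS = {
  zero: "0", oh: "0", one: "1", two: "2", three: "3", four: "4",
  five: "5", six: "6", seven: "7", eight: "8", nine: "9"
};

function digitsFromSpeech(phrase) {
  // Split the utterance into words, ignoring extra whitespace.
  var words = String(phrase).toLowerCase().split(/\s+/).filter(Boolean);
  var out = "";
  for (var i = 0; i < words.length; i++) {
    if (!Object.prototype.hasOwnProperty.call(DIGIT_WORDS, words[i])) {
      return null; // not a pure digit string, so treat it as no match
    }
    out += DIGIT_WORDS[words[i]];
  }
  return out.length > 0 ? out : null;
}

console.log(digitsFromSpeech("seven six zero nine three zero zero zero seven five"));
```

Returning null for anything that is not purely digits loosely mirrors a grammar that fails to match, which is the case the nomatch handler exists for.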

Depending on where you define a grammar, it has application-, document-, form-, or field-level scope. When a match occurs, the application stores the value in a container. In the example above, the value is stored in a field container called action. In the code example in my last column, I stored the matched value for the employee ID in the variable empid. Below, I take the evaluated expression and assign it to the empid variable. The script then loops a counter over the range of valid employee IDs: if the user's spoken value matches the counter, the ID is accepted; otherwise, it is reset to "0". I use a submit tag to post the empid parameter:

<assign name="empid" expr="action"/>
<audio>The employee i d you said was <value expr="empid"/> </audio>
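<script type="text/javascript">
<![CDATA[
  // Accept only employee IDs from 5000 through 5999; anything else becomes "0".
  var Counter;
  var found = false;
  for (Counter = 5000; Counter < 6000; Counter++)
  {
      if (Counter == empid)
      {
          empid = Counter;
          found = true;
          break;
      }
  }
  // Use an explicit flag: after an unmatched loop Counter is 6000, so comparing
  // empid with Counter would wrongly accept a spoken "6000".
  if (!found)
  {
      empid = "0";
  }
]]>
</script>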

<submit next="getItems.asp"  namelist="empid" method="post"/>
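
The same check can also be written as a direct range test instead of a loop. The following JavaScript is a minimal sketch under that assumption; normalizeEmpId is a hypothetical helper name, and the 5000-5999 range is taken from the loop bounds in the script above.

```javascript
// Hypothetical helper: the same idea as the loop, expressed as a range check.
// The 5000-5999 range comes from the loop bounds in the script above.
function normalizeEmpId(spoken) {
  var text = String(spoken);
  if (!/^\d+$/.test(text)) {
    return "0"; // not a string of digits
  }
  var id = parseInt(text, 10);
  if (id >= 5000 && id < 6000) {
    return id; // valid employee ID, returned as a number
  }
  return "0"; // outside the accepted range
}

console.log(normalizeEmpId("5123"));
console.log(normalizeEmpId("6000"));
```

A range test avoids up to 1,000 loop iterations per call, and it makes the boundary cases (4999 and 6000) easier to reason about.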

On getItems.asp, I use the Request object to read the posted parameter and assign it to a variable. I then pass that variable to a stored procedure that queries SQL Server and returns the user's name and the number of open action items associated with the user. As the following code shows, you can use VBScript expressions directly inside a VoiceXML tag:

        <block>
            <audio>Hello <%=UserName%></audio>
            <pause>20</pause>
            <audio>You have <%=ItemsCount%> open action items still pending.</audio>
            <pause>25</pause>
        </block>
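
One thing to watch when writing database values into markup is that a name containing a character such as & or < would make the generated VoiceXML invalid. The JavaScript below is a hedged sketch of producing a similar greeting with escaping applied; escapeXml and buildGreeting are hypothetical names, not part of the original ASP pages, and the pause tags are omitted for brevity.

```javascript
// Hypothetical sketch of generating the greeting block with escaped values.
// escapeXml and buildGreeting are illustrative names, not from the original pages.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function buildGreeting(userName, itemsCount) {
  // Fall back to zero if the count is missing or not a number.
  var count = parseInt(itemsCount, 10);
  if (isNaN(count) || count < 0) {
    count = 0;
  }
  return [
    "<block>",
    "  <audio>Hello " + escapeXml(userName) + "</audio>",
    "  <audio>You have " + count + " open action items still pending.</audio>",
    "</block>"
  ].join("\n");
}

console.log(buildGreeting("Pat & Co", 3));
```

The same precaution applies in VBScript, where Server.HTMLEncode can serve a similar purpose for these characters.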

The user can then choose a particular item to update. The user records the update, and the application saves the recording to the Web server. Several approaches and components can provide this feature; I used a component called SAFileUp (you can find more information about SAFileUp on the SoftArtisans Web site). Here's a sample of how the record tag works:

      <form id="wav">
            <record name="recordfilename" maxtime="20" dtmfterm="true">
                  <prompt>
                        <audio>Begin recording </audio>
                  </prompt>
            </record>
            <nomatch>
                  <audio>Sorry, I didn't understand</audio>
                  <reprompt/>
            </nomatch>
            <noinput>
                  <audio>Sorry, I didn't hear you</audio>
                  <reprompt/>
            </noinput>
            <filled>
                  <audio> You said </audio>
                  <audio expr="recordfilename"/>
                  <submit next="status.asp" namelist="recordfilename" method="post" />
            </filled>
      </form>

On the status.asp page, I put recordfilename into a VBScript variable, add a .wav extension, and save it to a file. In our main application, when you view the statuses for a particular company, you see an entry that you can click to play the .wav file. See Figure 1 for a sample Status History screen.

Note that I had 3 weeks to learn VoiceXML and develop this solution. If you have more experience in this area and see something that I could have done more efficiently, please email me at [email protected] and let me know. Or post your suggestions as Reader Comments.
