VPL Lab 7 - Task Learning via Speech Recognition

In this lab you will write code to allow a human to joystick the robot via speech. You will also enable the human to teach the robot a simple task by sequencing multiple commands. This lab covers some of the more advanced features of VPL. In particular, it uses Lists and discusses concurrency-related issues.

This is a complex diagram. You might find it difficult to reproduce based solely on the text, so it is preferable to work from the diagram supplied with RDS and follow the text as an aid to understanding.

This lab is provided in the VPL language. You can find the project files for this lab at the following location under the Microsoft Robotics Developer Studio installation folder:

Samples\VplHandsOnLabs\Lab7

This lab teaches you how to:

  • Get started
  • Get the human's command
  • Verbally joystick the robot
  • Use Lists to keep records
  • Remove elements from a list
  • Calculate the difference between two times
  • Use recursion to perform the learned task

See Also:

  • Recursion, Activities and Exclusivity in VPL

Prerequisites

Hardware

This lab is written for the iRobot Create, but any robot with a differential drive could be substituted. It can be run using a simulated Create. You will also need a microphone connected to the computer.

Software

This lab is designed for use with VPL. You will also need the speech recognition software that comes standard with Windows Vista or XP. You should configure speech recognition using the Windows Control Panel before attempting this lab. Otherwise, if you have problems, you will not know whether the cause is a mistake in your program or speech recognition that is not working.

Problem Description

This section outlines the Task Learning problem.

The robot must be able to execute the following drive-related commands:

  • "Drive Forwards" - set the DrivePower in both wheels to 0.1.
  • "End Drive" - set the DrivePower in both wheels to 0.0.
  • "Turn Left" - turns the robot left by 45 degrees (by calling RotateDegrees).
  • "Turn Right" - turns the robot right by 45 degrees.
  • "Turn Around" - turns the robot 180 degrees.

There are three additional commands:

  • "Begin Learning" - when in the learning state the robot records the commands it executes so that the human can teach it a task.
  • "End Learning" - this command instructs the robot to stop recording commands.
  • "Perform Actions" - if the robot has learnt a task (and learning is ended) it should respond to this command by executing the actions in the task.

The objective is for you to be able to joystick the robot verbally. As such the robot should always execute any drive command it is given. The additional commands just control whether or not the robot is recording the commands it is executing.

Some of the command choices may seem unusual. For instance, you might think that "Stop" is a better command than "End Drive". You can change the phrases to anything you like and also add more phrases of your own once you understand how it works. You might find that the speech recognizer sometimes confuses certain phrases, so it is best to select phrases that are quite different.

Getting Started

For the robot to receive verbal input, you need to add the speech recognition service to your diagram. Select Speech Recognizer from the Services toolbox and drag it onto the diagram.

The Speech Recognizer service uses a grammar that defines the words and phrases that should be recognized. Without a grammar, the Speech Recognizer does not recognize anything and your program will not work. The easiest way to set up a grammar is to run the Speech Recognizer Gui service, which provides a web page as an interface. Drag this service to the diagram as well. It does not require any connections; you just need to put it on the diagram so that it is started when you run the program.

Getting Started - Speech Recognizer and Gui activities

Run the program (you might be asked to save it first) with just these two blocks. Once the program has started, open a web browser and go to the URL https://localhost:50000/speechrecognizergui to see the interface to the Speech Recognizer service.

Step 1: Make the main diagram: get the human's commands

Whenever you speak, you want to know what the speech recognizer thought you said. To display this information without popping up many dialog boxes, you can use a Flexible Dialog. Drag one to the diagram and make sure it is selected. In the Properties toolbox, select Set initial configuration from the drop-down list. Make sure that you check the box beside the Visible property, or the dialog will not appear on the screen.

Click on the plus (+) beside the Count at the top of the Controls area. Fill in the details as in the following screenshot. Note that there should be four controls on the dialog, but only two are visible in the figure. All of the details are in the table below the figure.

FlexibleDialog1 - Set initial configuration

Id                ControlType   Text                   Value
Label1            Label         Last Command Heard:    Last Command Heard:
LastCommand       Textbox
Label2            Label         Replayed Command:      Replayed Command:
ReplayedCommand   Textbox

When you run the diagram, the Flexible Dialog will appear as shown below. Verbal commands that are recognized are displayed as shown in the figure. This is a useful feedback mechanism so that you know whether the speech recognition is working or not.

FlexibleDialog2 - As displayed

Notice that there are two label/textbox combinations. A textbox is not strictly necessary because you will not be entering any information; it is used here just for illustration. Also, there are no buttons on this dialog because you do not need to close it. A Flexible Dialog is a much better choice than Simple Dialogs for this purpose, because Simple Dialogs quickly become annoying: each one has to be dismissed.

Add a Calculate block, connect it to the notifications port of Speech Recognizer, and select SpeechRecognized. Inside the Calculate, type Text to get the string the service believes was just spoken.

Make another connection from the notification pin to the Flexible Dialog. Select UpdateControl as the request and set the parameters in the Data Connections dialog as shown below:

FlexibleDialog3 - Data Connections

Pass the string that was spoken to a new activity, called TaskLearner. Add this new activity to your diagram. Open it up and rename its action to TakeHumanInput. Add one input, a string, called PersonSaid.

TaskLearner - Actions and Notifications

Back in the main diagram, connect the new activity, TaskLearner, to the Calculate block.

Main Diagram - Process Speech Commands

Step 2: Verbally joystick the robot

The first part of the task is verbally joysticking the robot. Add a new activity to the TakeHumanInput action in TaskLearner and name it ExecuteWithoutTiming. The name reflects the fact that this activity does no timing. Later, when the robot is learning and executing actions, you will need timing: for instance, you need to know how long to spend driving forwards before executing the next command.

Call the input to the new activity Command (of type string). The activity requires no outputs.

Once you have set up your activity's input, it is time to code the activity itself. This is a very simple activity to write. All you need is an If statement that evaluates the Command input and decides what command to pass to the Generic Differential Drive. The outputs from the Generic Differential Drive blocks should then be merged and connected to the activity's output. Note that you must use the WaitForDriveNotification service to wait for the RotateDegrees operations to complete. However, as its name suggests, this activity does not do any timing, so SetDrivePower commands are just sent straight to the robot.

ExecuteWithoutTiming - Completed diagram

You can choose the values for the rotations. Angles of 30 or 45 degrees for "Turn Left" and "Turn Right" are useful for making small rotations. However, you might want to set the angles to 90 degrees which is a quarter-turn. Obviously, "Turn Around" should be 180 degrees.

Select appropriate power settings for your robot.
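The branching inside ExecuteWithoutTiming can be sketched in Python as a plain if/elif chain. This is a minimal illustration, not VPL output: RecordingDrive, its set_drive_power and rotate_degrees methods, TURN_POWER, and the convention that negative angles turn right are hypothetical stand-ins for the Generic Differential Drive. The 0.1 drive power and the 45- and 180-degree turns come from the problem description.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingDrive:
    """Hypothetical stand-in for the Generic Differential Drive.

    It only records requests, so the dispatch logic can run without a robot.
    """
    calls: list = field(default_factory=list)

    def set_drive_power(self, left: float, right: float) -> None:
        self.calls.append(("SetDrivePower", left, right))

    def rotate_degrees(self, degrees: float, power: float) -> None:
        # Assumption: positive angles turn left, negative angles turn right.
        # In VPL you would also wait for the rotation to complete.
        self.calls.append(("RotateDegrees", degrees, power))


DRIVE_POWER = 0.1  # from the problem description
TURN_POWER = 0.2   # assumed value; choose one that suits your robot
TURN_ANGLE = 45    # 30, 45 or 90 degrees are all reasonable choices


def execute_without_timing(command: str, drive: RecordingDrive) -> None:
    """One branch per recognized phrase, like the If block in the activity."""
    if command == "Drive Forwards":
        drive.set_drive_power(DRIVE_POWER, DRIVE_POWER)
    elif command == "End Drive":
        drive.set_drive_power(0.0, 0.0)
    elif command == "Turn Left":
        drive.rotate_degrees(TURN_ANGLE, TURN_POWER)
    elif command == "Turn Right":
        drive.rotate_degrees(-TURN_ANGLE, TURN_POWER)
    elif command == "Turn Around":
        drive.rotate_degrees(180, TURN_POWER)
    # Any other phrase matches no branch and is ignored in this sketch.
```

Each branch corresponds to one condition of the If block; in this sketch a phrase that matches no branch simply produces no drive request.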

Once you have finished, set the manifest for the Generic Differential Drive to iRobot.Drive.Manifest.xml or IRobot.Create.Simulation.Manifest.xml, or any other robot manifest that you like. Note that the selected robot must support the RotateDegrees operation. If it does not, you will have to modify the diagram to use timing for the rotations as well, and also to stop the robot after it turns left or right.

In the TakeHumanInput action of TaskLearner, connect the ExecuteWithoutTiming activity you created earlier to the input and output of the action. Make sure you pass in the value of PersonSaid as the Command.

ExecuteWithoutTiming - Executing the action

You are now ready to run your program and test out the verbal joystick capabilities! The first time you run it, you will need to enter some commands for the speech recognizer. (This is not necessary if you run the diagram that came with RDS). When the Flexible Dialog pops up, open a web browser and navigate to https://localhost:50000/speechrecognizergui.

Add the following phrases to the dictionary leaving the Semantic Value field blank.

  • "Drive Forwards"
  • "Drive Backwards"
  • "End Drive"
  • "Stop"
  • "Turn Left"
  • "Turn Right"
  • "Turn Around"

Now is also a good time to add in the other phrases that are required:

  • "Begin Learning"
  • "End Learning"
  • "Perform Actions"

When you have finished entering the phrases, the dictionary in the web page should look like the following:

Speech Recognizer Gui - Enter these phrases into the dictionary

Try speaking some commands and see what happens. If they are recognized, they will appear in the Speech Events as illustrated in the next figure:

Speech Recognizer Gui - Recognized phrases

You should be able to control the robot using voice commands.

Step 3: Tracking state

Now that you have the ability to verbally joystick the robot, the next step is to enable the robot to learn a task and repeat it. To do this, you need to track the robot's state.

When the robot is instructed to "Begin Learning", you need to record that it is now in the Learning state. If you say "End Learning" when the robot is in the Learning state, then you should record that the robot has Learnt. Of course, the robot might have been given no commands between these two instructions, so if it is instructed to "Perform Actions" when no actions have actually been learnt, the code should be able to handle this.

On the Start page of the TaskLearner activity, initialize two variables, Learning and Learnt, to false.

Now update these two variables in TakeHumanInput according to the value of PersonSaid. Add an If block to TakeHumanInput and connect it to the input of the action. Disconnect ExecuteWithoutTiming from the input of the action and reconnect it to the output of the Else branch of the If statement (make sure the input is still being passed in correctly).

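Next, add the conditions value.PersonSaid == "Begin Learning" and value.PersonSaid == "End Learning" && state.Learning. Set the variables appropriately for each condition: "Begin Learning" sets Learning to true, and "End Learning" sets Learning to false and Learnt to true.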

The final step is to merge the control flow from the two conditions you just created with ExecuteWithoutTiming control flow that is currently connected to the action result. The easiest way to do this is to click on the output of the last part of one of your If condition control flows and drag the cursor over to the line connected to the action output. When your cursor reaches the line, just let go of the mouse button and VPL will ask you if you would like to make a Merge or a Join. Choose Merge, and do not forget to also hook up the remaining control flow.

TakeHumanInput - Temporary diagram
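The two state-changing conditions can be summarized in a short Python sketch. LearnerState and update_state are hypothetical names used only for illustration; a return value of True stands for the Else branch that passes the phrase on to ExecuteWithoutTiming.

```python
from dataclasses import dataclass


@dataclass
class LearnerState:
    learning: bool = False  # both initialized to false on the Start page
    learnt: bool = False


def update_state(said: str, state: LearnerState) -> bool:
    """Apply the two conditions in order.

    Returns True when neither condition matched, i.e. the phrase should go
    to the Else branch (ExecuteWithoutTiming).
    """
    if said == "Begin Learning":
        state.learning = True
        return False
    if said == "End Learning" and state.learning:
        state.learning = False
        state.learnt = True
        return False
    return True
```

Note that in this sketch an "End Learning" heard outside the Learning state falls through to the Else branch, like any other phrase.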

Step 4: Write the activity ExecuteAction and account for the timing

You are now going to start working on the part of the code where the robot must record the sequence of actions it takes to learn a task. Every time the robot receives a command from you, the robot will execute it. You will also record the command, and the time at which the robot started executing the command.

There are a number of ways you could choose to structure this code. One way is to encapsulate the part of the code that instructs the robot to carry out the action and queries the Timer for the time in an activity, called ExecuteAction. This activity will be very similar to the previous activity, ExecuteWithoutTiming, except you will pass as output everything that must be stored, namely the command and the time information.

Add the new activity ExecuteAction to TakeHumanInput. To start with, add one input, a string, Command. Next add ExecuteWithoutTiming and connect it to the action's input pin, but don't connect up the action result.

The next task is to get the current time. Connect the output of ExecuteWithoutTiming to a Timer, which you can add from the Services toolbox, and select GetCurrentTime. The current time is now available at the output of the Timer. VPL has no List of type Time, so you can't store a series of Time objects. Instead, you have to return the Hour, Minute, Second, and Millisecond components of the time. These components are all ints and can therefore be stored in Lists. Note that there is an implicit assumption that the robot is not running across a day boundary, i.e. midnight!

Connect four Calculate blocks to the Timer to extract each of the required Time components. To pass all this information as part of the action result, combine these values in a Join. The Command must also be part of the action result, so add it to the Join as well, as shown in the following diagram.

ExecuteAction1 - Completed diagram

Before connecting up to the action result, add the appropriate outputs in the Actions and Notifications dialog.

ExecuteAction2 - Inputs and outputs

When you connect to the action result from the Join make sure you set each of the outputs correctly. Once you have done that, your activity is complete!

Result - Assign the data to the outputs
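What ExecuteAction produces can be sketched in Python. This is an illustration under assumptions: ActionRecord, execute_action and the injectable now parameter are hypothetical, and datetime.now() stands in for the Timer service's GetCurrentTime. The point it shows is that the time is split into four int components, because VPL has no List of Time.

```python
from datetime import datetime
from typing import Callable, NamedTuple, Optional


class ActionRecord(NamedTuple):
    """The five outputs of ExecuteAction (field names are illustrative)."""
    command: str
    hour: int
    minute: int
    second: int
    millisecond: int


def execute_action(command: str,
                   execute: Callable[[str], None],
                   now: Optional[datetime] = None) -> ActionRecord:
    # Run the command first, as ExecuteWithoutTiming does in the diagram.
    execute(command)
    # Stand-in for Timer GetCurrentTime; a fixed time can be injected for tests.
    t = datetime.now() if now is None else now
    # Split the time into int components that can be stored in Lists.
    return ActionRecord(command, t.hour, t.minute, t.second,
                        t.microsecond // 1000)
```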

Step 5: Recording actions and timing

When the robot receives commands in the Learning state it needs to record these commands so it can execute them sequentially when later instructed. The robot also needs to store the time each command was given so it can correctly carry out the task. Use the List type to store this information.

Go to the Start action page for the TaskLearner activity. (Select it from the drop-down list at the top of the TaskLearner page). To create a List variable drag in a green List box from Basic Activities and connect it to a Variable block. Name the variable ActionList and set its type to List of string.

Create a List - The ActionList will hold the list of learned commands

You also need to create lists for the hours, minutes, seconds and milliseconds (see the figure below). These variables should be of type List of int. Together, they record the time at which each command started, from which the duration of each motion is later calculated.

Start page of TaskLearner - Initialize the variables and lists

Next you need to write code to add to the lists. In TakeHumanInput, add a new Activity block and name it AppendToLists. The inputs of this activity should be each of the lists you just created, together with an element to append to each list. The outputs should be the new lists.

AppendToLists1 - Inputs and outputs

To append to a list, use ListFunctions from Basic Activities. Add one of these blocks to your diagram and select Append from the drop-down menu. Note that ListFunctions work by returning a new List that is identical to the old list except for the requested change. Append requires two inputs: an item to append, and a List (of an appropriate type) to append it to. It is very important to note that a ListFunctions block acts like a Join: the control flow must reach both inputs of the ListFunctions block before it will continue.

Keeping all this in mind, it is time to fill out the code in your AppendToLists activity. As always, remember to carefully set the outputs.

AppendToLists2 - Completed diagram
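The copy-returning behavior of ListFunctions Append, and how AppendToLists applies it to all five lists, can be sketched in Python. The Lists tuple, its field names and the helper functions are hypothetical; the sketch only mirrors the data flow of the diagram.

```python
from typing import List, NamedTuple, TypeVar

T = TypeVar("T")


def list_append(items: List[T], item: T) -> List[T]:
    """Like ListFunctions Append: return a new list, leaving the input unchanged."""
    return items + [item]


class Lists(NamedTuple):
    """The five parallel lists kept by TaskLearner (names are illustrative)."""
    action_list: List[str]
    hours: List[int]
    minutes: List[int]
    seconds: List[int]
    milliseconds: List[int]


def append_to_lists(lists: Lists, command: str, hour: int, minute: int,
                    second: int, millisecond: int) -> Lists:
    # One Append per list; the caller must assign the results back,
    # just as the diagram assigns the outputs to the state variables.
    return Lists(
        list_append(lists.action_list, command),
        list_append(lists.hours, hour),
        list_append(lists.minutes, minute),
        list_append(lists.seconds, second),
        list_append(lists.milliseconds, millisecond),
    )
```

Because list_append never mutates its input, the lists passed in are unchanged afterwards, which is why the diagram must assign the returned lists back to the state variables.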

Step 6: Put the activities together to respond to commands and record them

You can now use the two activities, ExecuteAction and AppendToLists, to write the code for the case when the robot receives a command when it is in the Learning state.

Start by adding another condition to the If statement in TakeHumanInput - state.Learning. If this condition is satisfied, then call the action in ExecuteAction passing in value.PersonSaid. ExecuteAction will return the Command, as well as the Hour, Minute, Second, and Millisecond at which it was begun. You need to pass these values to AppendToLists, so connect the two activities. The Data Connections dialog for this line is shown in the following figure.

Data Connection - your values should look like the ones here.

AppendToLists returns new lists, so you need to assign these new lists to the list variables. Make these assignments and combine the data flow in a Join. You can then connect the Join to the result merge and you are finished handling the case where the robot receives a command when Learning!

Learning Case - Code to handle commands received in the Learning state

Step 7: Start writing code to execute a learned task

The final command to respond to is the "Perform Actions" command. In other words, you need to write the code to carry out the sequence of actions that the robot has learned.

In TakeHumanInput add the appropriate condition to the If statement: value.PersonSaid == "Perform Actions" && state.Learnt. Next, create a new activity, and name it ExecuteLearnedActions. This activity has no outputs, but requires each of the lists as input. Connect your new activity to the new If condition, and connect its output to the result merge.

TakeHumanInput - Completed diagram
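The completed If block in TakeHumanInput can be summarized as a small Python class. This is an illustrative sketch, not generated VPL: the callables execute, clock and perform are hypothetical stand-ins for ExecuteWithoutTiming, the Timer, and ExecuteLearnedActions, and the list attribute names only mirror the Start-page variables (ActionList is the one named in the text). The conditions are assumed to be tested in the order they were added above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Clock = Callable[[], Tuple[int, int, int, int]]  # (hour, minute, second, ms)


@dataclass
class TaskLearner:
    execute: Callable[[str], None]            # stands in for ExecuteWithoutTiming
    clock: Clock                              # stands in for Timer GetCurrentTime
    perform: Callable[["TaskLearner"], None]  # stands in for ExecuteLearnedActions
    # State initialized on the Start page.
    learning: bool = False
    learnt: bool = False
    action_list: List[str] = field(default_factory=list)
    hours: List[int] = field(default_factory=list)
    minutes: List[int] = field(default_factory=list)
    seconds: List[int] = field(default_factory=list)
    milliseconds: List[int] = field(default_factory=list)

    def take_human_input(self, said: str) -> None:
        if said == "Begin Learning":
            self.learning = True
        elif said == "End Learning" and self.learning:
            self.learning = False
            self.learnt = True
        elif self.learning:
            # ExecuteAction: run the command, then read the time.
            self.execute(said)
            hour, minute, second, ms = self.clock()
            # AppendToLists returns new lists, which are assigned back.
            self.action_list = self.action_list + [said]
            self.hours = self.hours + [hour]
            self.minutes = self.minutes + [minute]
            self.seconds = self.seconds + [second]
            self.milliseconds = self.milliseconds + [ms]
        elif said == "Perform Actions" and self.learnt:
            self.perform(self)
        else:
            self.execute(said)
```

Because the Learning branch is tested before the Perform Actions branch, saying "Perform Actions" while still learning would be recorded as a command in this sketch rather than executed.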

Step 8: Write an activity to pop from the lists

You need some helper code to write ExecuteLearnedActions. The action in ExecuteLearnedActions can be written as a recursive method, which is good practice at writing recursion in VPL! The idea behind the recursion is to pass the action the list of commands and the time information. It starts executing the first command in the list and then calls itself again after waiting an appropriate amount of time, passing in the lists with their first elements popped off. Eventually the lists will be empty and it will stop calling itself.

Add a new Activity box to your code and call it PopFromLists. The input and output to this activity are the lists you need to pop from.

To pop from a list, you can use a ListFunctions block and select RemoveItem from the drop-down menu. This function requires two inputs: a List and an index (of type int). Since you want to remove the first element of each list, the index you want is zero. When you finish writing this activity, your code should look similar to the code in the following figure. Remember to set the return values.

PopFromLists - Completed diagram
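ListFunctions RemoveItem also returns a new list rather than changing the old one. A minimal Python sketch of PopFromLists, with hypothetical function names, shows why removing index 0 from every list keeps the command and time lists aligned.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def remove_item(items: List[T], index: int) -> List[T]:
    """Like ListFunctions RemoveItem: return a new list without items[index]."""
    return items[:index] + items[index + 1:]


def pop_from_lists(action_list: List[str], hours: List[int],
                   minutes: List[int], seconds: List[int],
                   milliseconds: List[int]
                   ) -> Tuple[List[str], List[int], List[int], List[int], List[int]]:
    # Removing index 0 from every list keeps the entries lined up,
    # so element i of each list still describes the same command.
    return (remove_item(action_list, 0), remove_item(hours, 0),
            remove_item(minutes, 0), remove_item(seconds, 0),
            remove_item(milliseconds, 0))
```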

Step 9: Calculate the time difference

If the command currently at the top of ActionList is not "Drive Forwards" (or "Drive Backwards"), you can simply call ExecuteWithoutTiming and PopFromLists, and then recurse (call the same action from inside itself). However, if the command is "Drive Forwards" (or "Drive Backwards"), you need to compute the amount of time between when this command was started and when the next command in the list was started. This is the amount of time to wait before recursing.

Create a new activity called CalculateTimeDifference. The input is the start time and the end time (in terms of the hour, minute, second and millisecond components) and the output should be the int valued time difference in milliseconds.

Actions and Notifications - make these inputs and output.

To calculate the difference in milliseconds, subtract each pair of components (end minus start), convert each difference to milliseconds, and add the results together.

CalculateTimeDifference - Completed diagram
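The pairwise calculation works even when an individual component difference is negative, because the weighted sum still comes out right. Here is a Python sketch with a hypothetical function name and the same no-midnight assumption as the lab:

```python
def calculate_time_difference(start_hour: int, start_minute: int,
                              start_second: int, start_millisecond: int,
                              end_hour: int, end_minute: int,
                              end_second: int, end_millisecond: int) -> int:
    """Return end minus start in milliseconds (assumes no midnight crossing)."""
    return ((end_hour - start_hour) * 3_600_000     # ms per hour
            + (end_minute - start_minute) * 60_000  # ms per minute
            + (end_second - start_second) * 1_000   # ms per second
            + (end_millisecond - start_millisecond))
```

For example, from 10:59:58.900 to 11:00:01.100 the component differences are +1 hour, -59 minutes, -57 seconds and -800 milliseconds, which sum to 2200 ms.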

Tip

If the time difference is greater than 60 seconds, you might have a problem running your VPL program. The Timer service will not accept time delays greater than 60 seconds, because VPL sets a timeout of 60 seconds on all service requests to avoid hanging if a service becomes unresponsive; a longer Timer request would be aborted anyway.

Step 10: Put it all together using recursion

Lastly, you should have a custom activity called ExecuteLearnedActions with an action of Run. The final code for the Run action is shown in the following screenshot. If you are entering the code yourself, you need to copy the diagram below. (If you are using the diagram supplied with RDS, then you already have this code). This is a complex diagram. If you can build this successfully, you are a VPL Master!

ExecuteLearnedActions - Completed Run action

The following instructions step you through building the diagram. First, though, note a few key points about it. Across the bottom of the diagram is the test that terminates the recursion when there are no more items in the lists. When this happens, a message saying "*** Done ***" is also sent to the Flexible Dialog. However, if there are still commands to execute, the current command is displayed in the Flexible Dialog and also processed.

The top section of the diagram is executed if the command is "Drive Forwards" or "Drive Backwards". This code must execute the command, start a timer, and then execute the next command when the timer expires. Notice that the Run action in ExecuteLearnedActions is called again, i.e. this action calls itself. To make sure that the command just executed is thrown away, PopFromLists is called. The Join ensures that nothing happens until both the timer has finished and the lists have been popped.

The middle section of the diagram handles the Else case where the command is not "Drive Forwards" or "Drive Backwards". As with the previous case, the lists must be popped and the command has to be executed. However, this time there is no timer. Behind the scenes, ExecuteWithoutTiming might be waiting on a drive command using WaitForDriveCompletion. The "without timing" in the name of the action means that no timer is necessary, but the action can still introduce a delay.

To finish the diagram, start by renaming the action in ExecuteLearnedActions to Run. Now connect an If block to the input of Run. It is important to test whether the procedure is done; otherwise the recursion will never stop. If value.ActionList.Count > 0, continue to recurse; otherwise the recursion is over, so connect the Else case to the output of Run. This makes the recursion terminate when there are no more commands in the learned sequence to execute.

If the lists are non-empty there are two cases. Add another If block, and connect it to the first one. If the command to execute is "Drive Forwards" (or backwards) then it is the more complicated case where the code must handle the timing.

To access an element of a list in VPL you can use the square brackets notation used for arrays in many programming languages. The condition then is value.ActionList[0] == "Drive Forwards".

Start by working on the Else branch of this If statement. The If block forwards the input lists, so you can connect both a copy of the PopFromLists activity and the ExecuteWithoutTiming activity to the Else branch. Pass each of the lists to PopFromLists, and pass value.ActionList[0] to ExecuteWithoutTiming. Because the lists are passed to PopFromLists by value, popping them does not interfere with the command passed to ExecuteWithoutTiming. (Passing a variable by value means that a copy is made and given to the action, and the original remains intact.) Once both of these activities are complete, call Run recursively. To wait for them both to complete, add a Join. One way to get an ExecuteLearnedActions block connected to the Join is to left-click on the Join and drag the cursor to where you want the activity to appear. When you release the mouse button, you will be able to select the activity.

Remember that when values pass through a join, the join label gets attached to them, so when you pass your inputs into Run they will have the form Lists.ActionList.

Now, if the condition value.ActionList[0] == "Drive Forwards" evaluates to true, connect to PopFromLists, and ExecuteWithoutTiming as before, but also connect to CalculateTimeDifference.

When the activities ExecuteWithoutTiming and CalculateTimeDifference have both returned, the code must wait the appropriate amount of time. Add a Join to wait until both activities return, and then set the Timer to wait for value.TimeDiff.MillisecondDiff. Note that it is OK to use a Wait here because this code does not set any variables. As such, none of this code is exclusive, and the Wait will not cause bad blocking behavior.

Once the wait is over, and PopFromLists has also returned, the code can recurse.

The final step is to connect the result of the recursive calls to the result of Run, using merges in the normal way. This is just like writing return methodName(input); as your recursive call in Java or C.
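For comparison with a text language, the whole Run action can be written as a recursive Python function whose base case and tail call mirror the diagram. This is a sketch under several assumptions: the function and parameter names are hypothetical, execute and wait_ms are injected stand-ins for ExecuteWithoutTiming and the Timer, and the len(actions) > 1 guard is an addition, because the lab does not say what should happen if "Drive Forwards" is the last learned command. Slicing plays the role of PopFromLists.

```python
import time
from typing import Callable, List

TIMED_COMMANDS = ("Drive Forwards", "Drive Backwards")


def calculate_time_difference(h0: int, m0: int, s0: int, ms0: int,
                              h1: int, m1: int, s1: int, ms1: int) -> int:
    return ((h1 - h0) * 3_600_000 + (m1 - m0) * 60_000
            + (s1 - s0) * 1_000 + (ms1 - ms0))


def run(actions: List[str], hours: List[int], minutes: List[int],
        seconds: List[int], millis: List[int],
        execute: Callable[[str], None],
        wait_ms: Callable[[int], None] = lambda ms: time.sleep(ms / 1000)) -> str:
    # Base case: nothing left, so stop recursing (the diagram shows "*** Done ***").
    if len(actions) == 0:
        return "*** Done ***"
    command = actions[0]
    execute(command)  # ExecuteWithoutTiming
    if command in TIMED_COMMANDS and len(actions) > 1:
        # Wait until the recorded start time of the next command.
        wait_ms(calculate_time_difference(
            hours[0], minutes[0], seconds[0], millis[0],
            hours[1], minutes[1], seconds[1], millis[1]))
    # PopFromLists, then recurse: the equivalent of return methodName(input);
    return run(actions[1:], hours[1:], minutes[1:], seconds[1:], millis[1:],
               execute, wait_ms)
```

Remember that the VPL Timer will not accept a delay longer than 60 seconds, so a real wait_ms would have to respect that limit.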

Step 11: Try it out!

You can now try out your program! Make sure the timing is working fairly accurately. You should also be able to have the robot incrementally learn a task, by having it begin and end learning multiple times.

There are plenty of ways you can extend this exercise. However, this is already a large project for VPL, so it might make sense to work in C# instead. One obvious extension is to have the robot learn multiple named tasks. You could also extend the complexity of the command language.

Recursion, Activities and Exclusivity in VPL

When you wrote the recursive action, you might have noticed it did not ever set a variable. This was actually the key to the whole thing working! If you are using recursion where the result depends on the output of the recursive call, i.e. where the recursive call is connected to the activity result (as in this case), you cannot set a variable. If you set a variable the action will become exclusive and your code will get stuck at the recursive call.
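The deadlock has a rough analogy in Python: treat an exclusive action as a handler that holds a non-reentrant lock while it runs. This is only an analogy, not how VPL is implemented; exclusive, state_log and the two countdown functions are hypothetical. A timeout on threading.Lock.acquire lets the sketch report the hang instead of blocking forever.

```python
import threading
from typing import List

# Analogy only: an exclusive action behaves roughly like a handler that holds
# a non-reentrant lock for as long as it runs.
exclusive = threading.Lock()
state_log: List[int] = []  # plays the role of a state variable


def exclusive_countdown(n: int) -> bool:
    """Recursive handler that sets state and waits for its own result."""
    # A real deadlock would block forever; the timeout lets the sketch report it.
    if not exclusive.acquire(timeout=0.1):
        return False
    try:
        state_log.append(n)  # setting state is what makes the handler exclusive
        if n == 0:
            return True
        return exclusive_countdown(n - 1)  # needs the lock it already holds
    finally:
        exclusive.release()


def concurrent_countdown(n: int) -> bool:
    """The same recursion without setting state needs no lock and completes."""
    return True if n == 0 else concurrent_countdown(n - 1)
```

Calling exclusive_countdown(2) records one value and then reports failure at the first recursive call, while concurrent_countdown(2) finishes normally.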

This does not mean that setting variables and recursion can't go together, but for this you need to use a different paradigm. Instead of having the action return, you can use notifications. When your action detects that processing is complete on this call, it can post to the notification port of the action (the round connection instead of the square connection). To see an example of using notifications in this manner, watch the second VPL video lab available from the DSS homepage.

Similar to the recursive example, you need to be careful about setting variables when one action in an activity calls another action in the same activity. If no variables are being set this can work really well as a way of breaking up code. However, if the two actions are declared exclusive and depend on each other, your code will get stuck.

In general, if you are not sure about how something works, the best way to learn is to write a small test program and see what happens! You can even watch your program run in the debugger to see how it is working in more detail.

Summary

In this lab, you learned how to:

  • Get started
  • Get the human's command
  • Verbally joystick the robot
  • Use Lists to keep records
  • Remove elements from a list
  • Calculate the difference between two times
  • Use recursion to perform the learned task
See Also 

VPL User Guide: Getting Started

 

 

© 2012 Microsoft Corporation. All Rights Reserved.