Zachary Dawson’s 16-264: Humanoids Course Website

Assignment 1:

For this assignment, I modified the drive2.vcxproj file from the drive 2 example as well as the wood.ppm file in C:\cga\ode-1906\drawstuff\textures as explained in the hints on the course website. In order to put my face onto the robot and the blocks, I replaced the existing wood.ppm file with a ppm file of a picture of my face. In order to create the road, I commented out the existing red road and replaced it with three large green cylinders and three smaller ground-textured cylinders which created a road composed of two, perpendicular, overlapping figure 8s. Unfortunately, I could not zoom out far enough without losing too much resolution to see the road, so I couldn’t get one picture of the entire track.

Image 4 of Assignment 1

Image 5 of Assignment 1


Assignment 2:

Part 1:

For this assignment, I modified the drive4.vcxproj file from the drive 4 example from the course website. The goal of this assignment was “to figure out how to generate steer_desired commands to follow the road.” To do this, I first examined the example code to determine how the program worked and initially set the “steer_desired” level. This was done via a proportional controller that multiplied the error between what the camera viewed as the center of the road in front of the car and the “center” of the car by a constant and then added a nominal steering angle that varied by the diameter of the track. I first tried setting the track to the smallest radius and removing the nominal steering angle, and as expected, the car began to turn to follow the track, but not sharply enough to avoid hitting a barrel off the side of the road.

Upon further examination of the code, I determined that the “center” of the car in the image used by the proportional controller (called “target_d”) was not in fact the center point of the image, but rather defined as “(LEFT_PIXEL+RIGHT_PIXEL)/2,” which is the center between apparently arbitrary left and right side boundaries that are defined explicitly at the top of the program and change value with the track size and nominal steering angle. When this “target_d” value is computed for each of its corresponding track sizes, it is always far to the right of the actual center of the image, meaning that if this point were aligned with the perceived center of the track the car would slowly spiral farther and farther towards the left side of the track until it runs into a barrel, which is exactly what happened when the nominal steering angle was removed. Since I could see no apparent reason for this definition of the center of the car as “target_d,” I replaced “target_d” in the proportional controller with “WINDOW_WIDTH/2,” which represents the actual center of the car in the image, and continued to leave out the nominal steering angle so that “steer_desired = STEERING_GAIN*(actual - WINDOW_WIDTH/2);”. Upon testing this new definition of “steer_desired,” the car drove perfectly, staying on the track without any nominal steering angle even on the smallest track (radius of only 8). Therefore, the only thing that needed to be changed for the code to become functional was to replace “target_d” with “WINDOW_WIDTH/2.”

My modified version of the code can be found below:

Image 1 of Assignment 2 part 1 trail 1

Image 2 of Assignment 2 part 1 trail 1

Part 2:

          For part 2 of this assignment, the circular track was populated with barrels placed in the road at set intervals around the track and at random radii from the center of the track. The barrels were placed in groups of three, with one on either side of the track and one at a random radius on the track between them. The barrels were green and blue to distinguish them from the road.

          The algorithm I developed to navigate the track is far more complicated than the one used for part one. First, I took the image array and found the edges the road made with the ground and the barrels. Next, I looped through a row of pixels representing the y value of roughly one car length ahead of the car and marked the points whose location was over the road. Then, I looped through each of these marked values and computed the distance from them to every edge detected in the image. I sum 1000/these values and then select the smallest value as my target. This target represents the point on the road that is farthest away from every edge. Unfortunately, just following this point proved to bring the car too close to the sides of the barrels, so I continued to modify the code. I determined that the most effective approach was to compare this target value with the actual center of the road at the same y value. Based on this comparison, if the target was to the left of the actual center, the car would move to a preset distance from the left edge. If it was to the left, but to only a small extent, the car would move to a point on the left of the road closer to the center. The same was true of the right side. Also, I implemented logic so that the car would only change direction if it had achieved its previous desired direction and added a damper to the steer equation.

          My car can almost make it all the way around the track, but gets caught when it needs to quickly switch sides of the track.



Assignment 4:

          The goal of this assignment was to “1) Find a desired trajectory for the cart in order for it to move from 0 m to 10 m and come to rest within 12 m. 2) Find a desired trajectory for the pole in order to achieve this motion… 3) Find the PD control gains that will successfully track the desired motion.” To do this, I first examined the provided MatLab code to gain an understanding of how the simulation was run and to locate the gain and trajectory variables that I needed to modify to complete the task. Once I had familiarized myself with the code, I moved on to solving the challenging assignment at hand by working through it in several stages; balancing, moving to a target, and optimization.

          The first stage, balancing, was rather easy to complete by itself. In order to give the cart some initial disturbance to correct, I temporarily modified simulate_planarCartPole.m so that the pole would start off with an initial angle of about -1 radian (negative so that the pole was pointing towards the positive x-axis, that way the correcting motion would not take the cart off of the screen) and left the desired trajectory of the pole in getDesiredTrajectory_planarCartPole.m as 0. Then, I ran simulate_planarCartPole.m with all the gains set to 0 just to see how the cart pole would behave and sure enough, the cart moved to the left as the pole fell to the ground. Next, I set the gain Kp_p to a value of 1 and ran the simulation over and over again, each time increasing the order of magnitude of Kp_p, until the enough force was applied to the cart that it was able to keep the pole from initially falling over. However, modifying Kp_p was not enough to stabilize the pole in and of itself because after initially keeping the pole from falling, the cart pole entered a slightly unstable oscillation which eventually caused the pole to fall after a few repetitions. Therefore, I continued to run simulations, this time increasing the order of magnitude of the damper gain Kd_p from 1 until around 100, when it seemed to stabilize the pole quickly and with little oscillation. Both of these gains are logically positive since the error term of both the pole’s position and its angular velocity are positive (technically they are represented as the desired value of 0 minus a negative value) and a positive force needs to be applied in order to move the cart to the right and underneath the falling pole (since positive force is applied towards the positive side of the x-axis). Unfortunately, once the pole was balanced, the cart still maintained some velocity and drove itself off the screen, thereby bringing me to stage two.

          In the second stage, I sought to move the cart to a specified point on the x-axis while keeping the pole balanced above it. To do this, I first modified getDesiredTrajectory_planarCartPole.m so that the desired position of the cart xc was 10 (I left the desired velocity at 0). Next, I modified gains Kp_c and Kd_c. In the beginning, I thought that Kp_c should be positive since the cart needed to move to the right, therefore needing a positive force to be applied, and the error term of the position was initially positive (desired position of 10 minus an actual position of 0). Also, I thought that Kd_c should be positive as well since the faster the cart moved to the right, the larger the actual velocity and therefore the more negative the error term (desired velocity of 0 minus actual velocity), meaning that if the gain was positive, there would be a negative force applied that would slow the cart back to its desired velocity of 0. However, when I attempted to implement this the same way I had with the pole gains by gradually increasing the order of magnitude of the gains, it didn’t seem to work. The higher I put Kp_c, the faster the cart would initially move right then change direction and drive rapidly off the screen to the left. After modifying the value of Kp_c from 1 to 10,000, I determined that I must be doing something very wrong. I remembered that in order for the cart to safely move accelerate right, the pole must be pointing to the right as well, so I tried modifying the desired pole angle to be an inversely proportional function with respect to time, so that initially the initial desired angle was to the right and as time passed it moved back towards being 0. While this did seem to move the cart in the correct direction initially, once the desired angle essentially returned to 0, the cart again started to move away from the target position. To get a better understanding of the dynamics of the system, I reviewed the class notes and in doing so remembered the gui cart pole simulator that was put on the course webpage. I proceeded to download the code and use it to investigate exactly what was going on. After plugging in my pole gain values, I began a random trial and error of cart gain values. Eventually, in desperation, I tried using negative values for Kp_c and Kd_c and to my surprise the cart seemed to move in the correct direction. After modifying the values a bit so that Kp_c and Kd_c were both around -50, the cart managed to attain and maintain a desired position while still balancing the pole in the gui program. This result perplexed me at first, but after much though it occurred to me why it works. Basically, the reason the negative cart gains work is because the pole gains are so much larger. In essence, because of the difficulty and sensitivity of balancing an inverted pendulum, the controller must be primarily focused on maintaining the balance of the pole and influenced far less by trying to reach the target displacement (meaning the Kp_p term must be greater than the Kp_c term, or the pole will easily fall over). This means that when Kp_c is positive, the cart initially tries to move to the right to get to the target since the pole starts off vertically. However, as soon as the cart moves right, the pole rotates to the left, causing the far more influential pole error terms to take over the controller and cause the cart to drive left (away from the target) to stabilize the pole. Therefore, it can be said that the cart will in effect only accelerate in the direction the pole is pointed, meaning that the controller should try to point the poll in the direction of the cart’s target displacement. To do this the target theta value could be modified to be a function of time or the error of the cart’s displacement, but this is overly complicated. Instead, it is much simpler to leave the target theta at 0 (to be sure the system will always try to maintain perfect balance) and make Kp_c negative. By doing so, the cart will initially move away from its target, causing the pole to move towards it and thereby causing the balancing portion of the controller to kick in and move the cart towards its desired position. When the cart gets close to its desired position, the error term goes down, reducing the negative force on the cart, thereby causing the cart to accelerate right and the pole (which has essentially become balanced in the vertical position) to rotate to the left. This then causes the balancing portion of the controller to switch directions and begin to slow the cart in order to return the pole to vertical. Because of this unique relationship between the two portions of the controller, as the cart pole oscillates about the desired cart position, it will gradually slow down and eventually reach equilibrium over the target. Once I understood this, I plugged these values into the race simulation and after some slight modification was able to achieve a successful, if slow, completion of the race.

          The final stage of completing this assignment was the optimization of the controller. This mainly involved a large amount of guess and check, modifying one value until the model failed to complete the race as fast as it had before, then moving on to modifying another value. This lengthy process is recorded in the excel table provided in the zip file below. One key insight though that did allow me to reduce my race time was my decision to exploit a surprisingly crucial lack of specificity in the rules. While the cart is timed on how long it takes to travel to the 10 mark and must never cross the 12 mark, it is allowed to reach equilibrium anywhere along the course. Because of this, I modified my target cart displacement to be 9 instead of 10, giving me additional room to reverse direction before the 12 mark and thereby meaning that my cart pole can reach a higher velocity prior to reaching its target cart displacement, which translates to a faster time to reach the 10 mark. I tried a few other target values (such as 8) but they all resulted in either a slower time or the pole falling over due to the force application limit (which prevented the cart from correcting fast enough). In the end, I reached a point where modifying any of the values in either direction would result in either a higher race time or a crash, so I deemed the system optimized.


My final results were:

          Time: 1.952 seconds

          Kp_c = -27

          Kd_c = -31

          Kp_p = 997

          Kd_p = 100

          xc = 10

          theta = 0

          dxc = 0

          dtheta = 0

My files and excel table are included in the zip file that can be downloaded below:



Also, a video of my optimized simulation can be found at


Final Project – Creating a Data Structure for Behavior-Based Machine Learning/Path Planning:

          For my final project, I decided to further explore the areas of machine learning and path planning. Specifically, I sought to develop a method which could allow a robot to plan appropriate responses to new situations based upon its prior experiences and to remember these responses to use again in the future. Once this method was developed, I created a data structure and algorithm to utilize it. Finally, I implemented this data structure and algorithm in Python on a simple proof of concept robot model. To assess the effectiveness of this data structure, I examined both the amount of time it takes for the Python robot model to formulate a reaction to a situation and the total number of behaviors created by the model.

The Method:

          After much thought on how a behavior-based system could be used by a robot to achieve its target final state, I created a method for this behavior-based machine learning and path planning system is very simple and intuitive. The first step in this method is to determine the desired final state (hence force referred to as the target). This can be done in a plethora of ways, ranging from the target being determined by a sensor system to it being preprogramed in a set of repeated processes the robot carries out to it being directly told to the system by a human operator. Second, every behavior the robot already knows is checked to see if its execution will take the robot it its desired final state. If an existing behavior is found to do this, it is executed and the robot is successfully brought to its target. However, if there is no singular behavior that will allow the robot to reach the target, it will then create a new behavior made up of already known behaviors that can and then execute that behavior (an example of this structure can be seen in figure 1 below). In order to achieve this type of recursive structure where new behaviors are made up of consecutively executed old behaviors, a special data structure needed to be developed.

Figure 1 Example of how a behavior is composed of other behaviors (grey) and primitives (orange)


The Data Structure:

          In order for this method of behavior-based learning and planning to be implemented, a unique data structure needed to be created. Primarily, there needed to be some structure that represented a behavior. This structure would need to be able to be executed on a system, so that the robot would feel the effects of the behavior. Also, it would need to be able to predict its effect on the robot, so that the proper behavior could be selected without executing every possible behavior on the robot during the selection process. Given these requirements for the behavior structure, it is clear that some specific information must be known about the implementation of the robot model so that the way the behavior is executed on a model is consistent no matter what model it is being executed on. For this reason, a structure representing the model also needed to be developed that would consistently contain all the information about the robot needed for the behaviors to be successfully created and executed.

The Behavior Data Structure:

          The behavior data structure (referred to as a class in Python) I developed consists of one variable and four functions. The one variable is an array (or list in Python) which contains the instructions the behavior executes in the order they should be executed on the model. In this recursive data structure, each instruction in this array is another behavior, either ones generated by the use of the data structure to reach targets or ones explicitly created by the client, called primitives (these will be discussed in more detail in the client implementation section). Additionally, the structure contains four functions (known as methods in Python), which are as follows: __init__, makeBehavior, predict, and execute. The first of these is __init__, which is the predefined function name for the function that is called automatically by Python when a new instance of a class is being created. All the __init__ function does is set the behavior’s instruction list equal to the array of instructions that the function takes as its only parameter. This function is primarily used when the client explicitly initializes the primitive behaviors and inside of the makeBehavior function. The makeBehavior function serves a similar purpose to the __init__ function in that it is designed to initialize new behaviors. However, unlike the __init__ function, makeBehavior is the function called when the client wishes to automatically create a new behavior to achieve a target state. This function is called by the user with the parameters of the existing list of known behaviors, the robot model, the target state, and the client implemented stateCompare function and returns a new instance of the behavior data structure that, when executed on the model, will bring the model to the target state. Since this method returns a new instance of the behavior class, it is known in Python as an @classmethod. In order to achieve this, the makeBehavior function first creates a new instance of the behavior class that has an empty array as its instruction list using the __init__ function. Then, the function creates a copy of the current model (using the model @classmethod copymodel) and enters a loop wherein it predicts what outcome of each behavior in the behavior list will produce on the copy of the model, selects the behavior with the best outcome according to the provided stateCompare function, adds this behavior to the end of the instruction list of the new behavior, and executes this behavior on the copy of the model. The function will continue in this loop until the state of the copied model is the same as that of the target state according to the stateCompare function. Finally, the makeBehavior function returns the new behavior it has created to the client. The third function, predict, takes the parameters of the model and the target state and returns the predicted state of the model after executing the behavior on it. To do this, the predict function first makes a copy of the model (using the model @classmethod copymodel). Then, it destructively executes the instructions in the behavior’s instruction list on the copy of the model. Finally, the function returns the state of the copied model, which represents what the state of the actual model would be if the behavior were executed on it. The final function in the behavior data structure is execute. Execute recursively breaks down the instructions in the instruction list of the behavior using a depth first search until it reaches primitive functions, which it then adds in a tuple with the target state to the end of the todo queue, which is a part of the model data structure (an illustration of how behaviors are broken into primitives can be seen in figure 2 below). Elsewhere in the program, when the model advances one time step, the next instruction and target are dequeued from the todo queue and carried out on the robot model. In this fashion, the behavior is carried out in a realistic manner of making numerous small changes to the model over time rather than trying to instantaneously making large alterations to the robot’s state.

Figure 2: Example of how a behavior’s instruction list references lists of primitive instructions

The Model Data Structure:

          This data structure consists of four essential variables and four main functions. The first of these variables is the state variable. The state variable is an array that contains all of the information regarding the current state of the robot, such as joint angles, the length of components, and the position of an points of interest (such as the x-y-z coordinates of the end of a manipulator arm). The actual way in which this array is implemented is entirely up to the client, just so long as it is an array (for example the joint angles could come before the lengths between joints or all the joint angles could be stored in a separate array that is stored as the first element of the state array, etc.). Therefore, the state array may be implemented in whatever manner best fits the client’s needs. Next, the data structure contains a draw function variable. This variable is simply a pointer to a client implemented function which will redraw the visualization of the robot on the screen based upon the model’s state. Of course, if the visualization of the robot is not being done in Python or the data structure is controlling a real world robot, the draw function would be where the client specifies how to translate the model’s state array into instructions for the other visualization program (such as Gazebo) or the motor controllers of the robot and where the client specifies how this information will be passed to these other entities (for example, via rospy). The third required variable is the validity check function variable, which is a pointer to a client implemented function to check if the current model state is valid. Finally, the model data structure also has a todo list variable (whose purpose was explained in the behavior data structure section), that is initialized to an empty array. Additionally, since I rendered the visualization of my robot in Python using the built-in graphics package Tkinter, my model also included a Canvas variable which held the Tkinter Canvas (a specific structure in Tkinter that allows for the creation of graphics in a window). The first function in the model data structure is the same as that of the behavior data structure, __init__. This function takes the initial state of the robot, a pointer to the draw function, a pointer to the validity check function, and (in my case) a Canvas as parameters and then initializes all of the data structure’s variables, which are outlined above. The next function, draw, simply calls the function pointed to by the draw function variable, passing it the model as a parameter. The third function, check, similarly calls the function pointed to by the check function variable, passing it the model as a parameter. Finally, the last function, copymodel, is an @classmethod that takes a model as a parameter and will return a new instance of the model class that has the exact same characteristics as the model passed into the function.

The Client-Side Implementation:

          In order for this data structure to be useful, the client utilizing it must implement a few key pieces of code. Primarily, the client must determine how the robot’s characteristics will be represented in the state array and initialize this array with the initial state of the robot. Furthermore, the client must initialize an array to hold all the known behaviors and set its starting value to an empty array. Additionally, the client must write the drawRobot function, which, as noted in the model data structure section, takes the model’s state as a parameter and specifies how to convert the state into a visualization of the robot or into signals to physical robot components. Furthermore, the client must implement the stateChecker function, which takes the model’s state as a parameter and determines if this is a valid state for the robot to be in (for example, this function checks to make sure joints angles are not beyond their maximum deflection). This function returns true if the state that is passed in is a valid state and false otherwise. The third function the client must implement is the stateCompare function. This function takes two different states as parameters and determines the extent to which the two states are different. This function should return 0 if the two states are identical (or close enough to identical for the client’s purposes) and otherwise should return a floating point number representing how close the two states are from being the same, with close to the same being close to 0. An example of this function would be to sum the absolute value of the differences between the elements at each index of the arrays. Also, the client may want to implement a checkExisting function, which would take the list of existing behaviors, the model, and the target as parameters and check to see if an existing behavior would bring the robot to the target state, returning the index of the behavior that would successfully take the robot to the target if one such behavior exists and -1 otherwise. Finally, the client must implement the primitive functions and initialize them as behaviors. These primitive functions are the basic building blocks upon which all other behaviors are built upon. Each one should be a function that takes a model and a target as parameters and destructively modifies the model’s state in order to bring it closer to the target. In order to function properly, each primitive should only modify one joint on the robot and should only modify it by, at max, the maximum amount the joint can change in one time step of the system (this time step is also specified by the client, but is realistically determined by how frequently the visualization is redrawn or commands are sent to the motors of the robot). Additionally, the primitives should account for the physical limits of the robot (i.e. not letting the joints bend past what is physically possible, etc.). Once theses primitive functions are created, they must be turned into primitive behaviors by using the behavior __init__ function, passing an array containing one pointer to the primitive function as the parameter, and then appended to the list of all known behaviors.

Algorithmic Implementation:

          Through the creation of this behavior based machine learning/path planning data structure, an algorithm for exactly how this data structure could be used became apparent. This algorithm is outlined below:


Main Function Calls

1.    Receive target

Client implementation

2.    Check to see if any known behavior brings the robot to the target


3.    If one is found, execute it and terminate the algorithm. Otherwise, proceed with the algorithm assuming no known behavior gets to the target




4.    Loop through existing behaviors and get predicted result

makeBehavior and predict

5.    Select behavior with the predicted result closest to target


6.    Add that behavior to the instructions for the new behavior


7.    Repeat steps 4 through 6 until target is achieved


8.    If program hangs (indicates that the distance from the target is the same in two iterations), pick a random behavior from the behavior list that does NOT have the closest predicted value


9.    Once new behavior is made, add it to the list and execute

Client implementation and execute


There are a few key points that should be noted regarding the implementation of the data structure in the context of this algorithm that I discovered when implementing this data structure in my Python example. First, in some cases, the only way for the robot to move to the target is for it to first move further away from the target. In these cases I have found that in the makeBehavior function, the program will simply try to execute the same behavior over and over again because its predicted value is still the closest predicted value to the target but by executing this behavior the robot does not actually change its state. To combat this, I added a check in the makeBehavior loop which looks to see if the predicted state after executing the selected best behavior actually causes a change from the old state. If it does, the loop continues to run as normal. But if there is no change, the program selects a random different behavior and executes it in an attempt to get past the point where the program has become stuck. Through my testing, I have found that this method works well and usually resolves cases where the program gets stuck quickly. The second important point I discovered in implementing this data structure is that if the next instruction in the todo list is predicted to cause no change to the state, the instructions should be dequeued until an instruction is found that will cause a change or the queue is empty. This should be done immediately and rather than evaluating each instruction one at a time at each subsequent time step because if this is not done the robot will essentially freeze until an instruction causing a change is found.


          In my opinion, this project was very successful. First off, in this project I was able to successfully create and implement a data structure which enabled behavior based machine learning/path planning that functioned when run in conjunction with a simple Python based robot model. Furthermore, I was able to collect data which indicated that this data structure was in fact improving the performance of the path planning capabilities of the robot model. Specifically, using a timing function, I was able to show that the time it took for the robot model to create a new behavior was around twice as long as it took for the robot to select the that behavior from the list of existing behaviors in order to perform that same task again (in my particular example, creating a behavior took around 3 seconds and pick the behavior took around 1.5 seconds). This is mathematically supported by an asymptotic runtime analysis of the code, which reveals that the makeBehavior function has a relative big-O runtime of around O(n*s) where n is the number of existing behaviors and s is the number of steps needed to reach the target whereas the relative big-O runtime of selecting an existing behavior is only O(n). Additionally, by thoroughly testing the system, I determined that eventually, given a finite space of operation, a definite number of behaviors will be created that will allow the robot to achieve any target within the space. For my specific example, this number could be as low as 6. This is a very exciting result because it means that eventually, when implemented properly, the model will develop enough behaviors to not need to run the expensive makeBehavior function again and will achieve any future targets by selecting existing behaviors. Finally, this data structure exhibits a very interesting property unique to machine learning algorithms, if not taught in the correct manner, just like with humans, the data structure will not perform at peak efficiency. This was exhibited in testing with my Python robot model, which would generate more than the optimal 6 behaviors if the robot was taught behaviors in a different order. In conclusion, this behavior based machine learning/path planning data structure was able to not only generate behaviors to successfully achieve every possible target, but also to improve the runtime of the path planning of the robot as new behaviors are created and to eventually create a definite number behaviors that could reach all targets.

Further Developments:

          After the in class presentations, I attempted to implement my data structure on the DRC robot using rospy and was able to successfully send commands to the robot. Unfortunately, I did not have enough time to gain a proper understanding of the geometry of the robot and was therefore unable to properly create the primitive functions to implement the data structure. Specifically, while I was able to create a rotation matrix to determine the location of the end of the left arm (the appendage I was primarily trying to control), I was unable to get it to properly translate into Python in the limited time I had between the presentation and when the final report was due. While I am confident that the data structure would work for the DRC robot, I have not yet been able to work out all of the bugs. Additional areas of further development could be to improve the predict function in order to reduce the time it takes to select existing behaviors. Another area of improvement could be to organize the behaviors into an ordered tree so that the time to check if an existing behavior could achieve a target would be reduced from O(n) to O(log(n)).


Basic Python proof of concept robot running with proper training:

Basic Python proof of concept robot running without proper training:


Example code:

Basic Python proof of concept robot (click on the screen to specify a target position for the end of the arm): Robotics Webpage_files\

Python DRC robot controller (requires a Gazebo instance of the DRC robot to be running in Linux): Robotics Webpage_files\ (does not work properly)


PowerPoint Presentation: Robotics Webpage_files\Data Structure for Behavior-Based Machine Learning.pptx