String Copying

The task is to copy a given string. The agent can move the cursor left or right along the string, or insert a character at the cursor position; characters already in the copy may be overwritten. The state provided to the agent is a string made up of the character of the original and the character of the copy at the current cursor location. When the agent moves off the right end of the string it receives a reward of 100 for each correct character in the copy, and a new string is then provided for copying. The learning parameters are epsilon = 0.1 (an exploratory action is taken 10% of the time), alpha = 0.1 (learning rate) and gamma = 0.9 (discount factor).
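
As a rough illustration of how these parameters could fit together, here is a minimal Python sketch. It assumes a tabular one-step Q-learning rule with epsilon-greedy action selection; the page does not name its algorithm, so the update rule is an assumption. The names `choose_action`, `q_update` and `copy_reward` are hypothetical, as is the reading of "correct character" as a position-by-position match and the treatment of leaving the string as the end of an episode.

```python
import random

EPSILON = 0.1  # exploration rate, from the text
ALPHA = 0.1    # learning rate, from the text
GAMMA = 0.9    # discount factor, from the text

# Assumed action set: insert A, insert B, move left, move right.
ACTIONS = ["A", "B", "Left", "Right"]


def copy_reward(original, copy):
    """100 per position where the copy matches the original (assumed meaning of 'correct')."""
    return 100 * sum(1 for o, c in zip(original, copy) if o == c)


def choose_action(q, state, rng=random):
    """Epsilon-greedy: explore with probability EPSILON, otherwise pick a best-valued action."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    values = q[state]
    best = max(values.values())
    # Break ties at random so an all-zero table does not always pick the same action.
    return rng.choice([a for a in ACTIONS if values[a] == best])


def q_update(q, state, action, reward, next_state):
    """One-step Q-learning update (assumed rule). next_state=None marks the end of an episode."""
    best_next = max(q[next_state].values()) if next_state is not None else 0.0
    target = reward + GAMMA * best_next
    q[state][action] += ALPHA * (target - q[state][action])
    return q[state][action]
```

With an all-zero table, a single update for a reward of 100 at the end of an episode moves the stored value a tenth of the way to the target (from 0 to 10), because alpha = 0.1.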

State   A   B   Left   Right
A_      0   0   0      0
AA      0   0   0      0
AB      0   0   0      0
B_      0   0   0      0
BA      0   0   0      0
BB      0   0   0      0
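
The table above lists six states and four actions, all starting at zero. A minimal Python sketch of how such a table could be built follows. It assumes each state is the original character (A or B) followed by the copy character at the cursor (`_` for empty, or A or B); `make_q_table` is a hypothetical helper, not something taken from the demo.

```python
from itertools import product

ALPHABET = ["A", "B"]                  # characters that can appear in the original
EMPTY = "_"                            # assumed marker for "nothing copied here yet"
ACTIONS = ["A", "B", "Left", "Right"]  # column headings of the table


def make_q_table():
    """Return a zero-initialised table: state -> {action: value}."""
    # Each state pairs the original character with the copy character at the cursor.
    states = [o + c for o, c in product(ALPHABET, [EMPTY] + ALPHABET)]
    return {s: {a: 0.0 for a in ACTIONS} for s in states}


if __name__ == "__main__":
    for state, values in make_q_table().items():
        print(state, [values[a] for a in ACTIONS])
```

A nested dictionary keeps lookups readable (`table["A_"]["Right"]`); with only 24 entries there is little reason to use anything more compact.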