The task is to copy a given string. The agent can move the cursor left or right along the string, or insert a character at the cursor position. When the agent moves off the right end of the string, it receives a reward of 100 for each correct character in the copy, and a new string is then provided for copying. The state presented to the agent is a two-character string made up of the character of the original string and the character of the copy at the current cursor location. Characters in the copy may be overwritten. The parameters are epsilon = 0.1 (the agent takes an exploratory step 10% of the time), alpha = 0.1 (learning rate) and gamma = 0.9 (discount factor).
State | A | B | Left | Right |
---|---|---|---|---|
A_ | 0 | 0 | 0 | 0 |
AA | 0 | 0 | 0 | 0 |
AB | 0 | 0 | 0 | 0 |
B_ | 0 | 0 | 0 | 0 |
BA | 0 | 0 | 0 | 0 |
BB | 0 | 0 | 0 | 0 |
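
The setup described above can be sketched in Python as a small tabular learner. This is only an illustration under stated assumptions, not the code behind this page: the description does not name the learning algorithm, so the sketch assumes the standard Q-learning update with epsilon-greedy action selection. It also assumes a fixed string length of 4, an alphabet of `A` and `B` (taken from the table columns), `_` for a copy position that has not been written yet, that moving left at the left end does nothing, and that inserting a character leaves the cursor in place. The names `observe`, `step`, `choose_action` and `train`, and the `max_steps` safety cap, are hypothetical.

```python
import random

ACTIONS = ["A", "B", "Left", "Right"]            # columns of the table
STATES = ["A_", "AA", "AB", "B_", "BA", "BB"]    # rows of the table
EPSILON, ALPHA, GAMMA = 0.1, 0.1, 0.9            # values given in the description


def observe(original, copy, cursor):
    """State string: original character at the cursor, then the copy character there."""
    return original[cursor] + copy[cursor]


def step(original, copy, cursor, action):
    """Apply one action, mutating `copy`; return (new_cursor, reward, done)."""
    if action in ("A", "B"):
        copy[cursor] = action                    # write or overwrite at the cursor
        return cursor, 0, False
    if action == "Left":
        return max(cursor - 1, 0), 0, False      # assumption: no-op at the left end
    # action == "Right"
    if cursor + 1 < len(original):
        return cursor + 1, 0, False
    # Moving off the right end: 100 per correct character, episode ends.
    reward = 100 * sum(o == c for o, c in zip(original, copy))
    return cursor, reward, True


def choose_action(q, state, rng):
    """Epsilon-greedy selection with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, length=4, max_steps=1000, seed=0):
    """Run Q-learning (assumed algorithm) and return the learned table."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}   # all zeros, as in the table
    for _ in range(episodes):
        original = [rng.choice("AB") for _ in range(length)]
        copy = ["_"] * length
        cursor = 0
        for _ in range(max_steps):               # safety cap, not part of the description
            state = observe(original, copy, cursor)
            action = choose_action(q, state, rng)
            cursor, reward, done = step(original, copy, cursor, action)
            # Terminal transitions have no future value.
            future = 0.0 if done else max(q[observe(original, copy, cursor)].values())
            q[state][action] += ALPHA * (reward + GAMMA * future - q[state][action])
            if done:
                break
    return q
```

Calling `train()` returns a nested dictionary with the same rows and columns as the table above, so `q["A_"]["A"]` is the learned value of inserting `A` when the original shows `A` and the copy is still empty at the cursor. Because the state contains no cursor position, the learner cannot tell positions apart, and each entry is in effect an average over every position that produces the same two characters.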