rlcard.agents

rlcard.agents.deep_cfr

rlcard.agents.dqn_agent

DQN agent

The code is derived from https://github.com/dennybritz/reinforcement-learning/blob/master/DQN/dqn.py

Copyright (c) 2019 DATA Lab at Texas A&M University
Copyright (c) 2016 Denny Britz

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

class rlcard.agents.dqn_agent.DQNAgent(sess, scope, replay_memory_size=20000, replay_memory_init_size=100, update_target_estimator_every=1000, discount_factor=0.99, epsilon_start=1.0, epsilon_end=0.1, epsilon_decay_steps=20000, batch_size=32, action_num=2, state_shape=None, norm_step=100, mlp_layers=None, learning_rate=5e-05)

Bases: object
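
A minimal construction sketch, assuming a TensorFlow 1.x session (as implied by the sess argument) and an RLCard environment; the environment name 'leduc-holdem' and the env.state_shape / env.action_num attributes are assumptions used only for illustration:

    import tensorflow as tf
    import rlcard
    from rlcard.agents.dqn_agent import DQNAgent

    # The environment here is only a stand-in; any game with matching
    # state_shape and action_num would work the same way.
    env = rlcard.make('leduc-holdem')

    with tf.Session() as sess:
        agent = DQNAgent(sess,
                         scope='dqn',
                         action_num=env.action_num,
                         state_shape=env.state_shape,
                         mlp_layers=[64, 64])
        # Initialize the Q-network and target-network variables.
        sess.run(tf.global_variables_initializer())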

copy_params_op(global_vars)

Copies the given global variables to this agent's estimators.

Parameters

global_vars (list) – A list of tensors

eval_step(state)

Predict the action for evaluation purposes.

Parameters

state (numpy.array) – current state

Returns

an action id

Return type

action (int)

feed(ts)
Store data into the replay buffer and train the agent. There are two stages.

In stage 1, the transition is used to populate the Normalizer so that the running mean and std can be calculated; the transition is NOT stored in the memory. In stage 2, the transition is stored in the memory.

Parameters

ts (list) – a list of 5 elements that represent the transition
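
Continuing the construction sketch above, a hedged example of feeding transitions; the placeholder observations and the assumption that roughly the first norm_step calls fall into stage 1 follow the description above:

    import numpy as np

    # Placeholder transition; the observation shape (6,) is an assumption
    # for illustration, not a real game state.
    state = np.zeros(6)
    next_state = np.zeros(6)
    ts = [state, 0, 1.0, next_state, False]  # [state, action, reward, next_state, done]

    # Early calls (stage 1) only update the Normalizer statistics; later
    # calls (stage 2) store the transition in the replay memory.
    agent.feed(ts)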

feed_memory(state, action, reward, next_state, done)

Feed transition to memory

Parameters
  • state (numpy.array) – the current state

  • action (int) – the performed action ID

  • reward (float) – the reward received

  • next_state (numpy.array) – the next state after performing the action

  • done (boolean) – whether the episode is finished

feed_norm(state)

Feed state to normalizer to collect statistics

Parameters

state (numpy.array) – the state that will be fed into the normalizer

predict(state)

Predict the Q-values of the current state.

Parameters

state (numpy.array) – current state

Returns

a 1-d array where each entry represents a Q value

Return type

q_values (numpy.array)
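
For example, the returned array can be inspected or reduced to a greedy action (a sketch, reusing the agent and state from the examples above):

    import numpy as np

    q_values = agent.predict(state)           # one Q-value per action id
    greedy_action = int(np.argmax(q_values))  # index of the highest Q-value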

step(state)

Predict the action for generating training data.

Parameters

state (numpy.array) – current state

Returns

an action id

Return type

action (int)

train()

Train the network

Returns

The loss of the current batch.

Return type

loss (float)
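
A hand-rolled training-loop sketch; the env.reset() and env.step() calls are assumptions about the surrounding environment wrapper, and feed() already trains the agent internally per its description, so the final train() call only illustrates the returned loss:

    for episode in range(100):
        state = env.reset()                      # assumed environment API
        done = False
        while not done:
            action = agent.step(state)           # epsilon-greedy action for data generation
            next_state, reward, done = env.step(action)
            agent.feed([state, action, reward, next_state, done])
            state = next_state

    loss = agent.train()                         # one batch update; returns the batch loss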

class rlcard.agents.dqn_agent.Estimator(scope='estimator', action_num=2, learning_rate=0.001, state_shape=None, mlp_layers=None)

Bases: object

Q-Value Estimator neural network. This network is used for both the Q-Network and the Target Network.

predict(sess, s)

Predicts action values.

Parameters
  • sess (tf.Session) – Tensorflow Session object

  • s (numpy.array) – State input of shape [batch_size] + state_shape

Returns

Tensor of shape [batch_size, action_num] containing the estimated action values.

update(sess, s, a, y)

Updates the estimator towards the given targets.

Parameters
  • sess (tf.Session) – Tensorflow Session object

  • s (list) – State input of shape [batch_size] + state_shape

  • a (list) – Chosen actions of shape [batch_size]

  • y (list) – Targets of shape [batch_size]

Returns

The calculated loss on the batch.
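
A low-level sketch of driving an Estimator directly; the state shape [6] and the target values are placeholders rather than a real Q-learning target computation:

    import numpy as np
    import tensorflow as tf
    from rlcard.agents.dqn_agent import Estimator

    with tf.Session() as sess:
        q_net = Estimator(scope='q', action_num=2, learning_rate=0.001,
                          state_shape=[6], mlp_layers=[64, 64])
        sess.run(tf.global_variables_initializer())

        s = np.zeros((32, 6))               # a dummy batch of states
        q = q_net.predict(sess, s)          # shape [batch_size, action_num]
        a = np.zeros(32, dtype=int)         # chosen action per sample
        y = np.ones(32)                     # placeholder TD targets
        loss = q_net.update(sess, s, a, y)  # one gradient step towards the targets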

class rlcard.agents.dqn_agent.Memory(memory_size, batch_size)

Bases: object

Memory for saving transitions

sample()

Sample a minibatch from the replay memory

Returns

  • state_batch (list) – a batch of states

  • action_batch (list) – a batch of actions

  • reward_batch (list) – a batch of rewards

  • next_state_batch (list) – a batch of next states

  • done_batch (list) – a batch of dones

save(state, action, reward, next_state, done)

Save transition into memory

Parameters
  • state (numpy.array) – the current state

  • action (int) – the performed action ID

  • reward (float) – the reward received

  • next_state (numpy.array) – the next state after performing the action

  • done (boolean) – whether the episode is finished
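
A small self-contained sketch of the replay buffer; the toy transitions are placeholders:

    import numpy as np
    from rlcard.agents.dqn_agent import Memory

    memory = Memory(memory_size=1000, batch_size=4)

    # Fill the buffer with dummy transitions so a batch can be drawn.
    for _ in range(10):
        memory.save(np.zeros(6), action=0, reward=1.0,
                    next_state=np.zeros(6), done=False)

    state_batch, action_batch, reward_batch, next_state_batch, done_batch = memory.sample()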

class rlcard.agents.dqn_agent.Normalizer

Bases: object

Normalizer class that tracks the running statistics for normalization

append(s)

Append a new state and update the running statistics

Parameters

s (numpy.array) – the input state

normalize(s)

Normalize the state with the running mean and std.

Parameters

s (numpy.array) – the input state

Returns

normalized state

Return type

s (numpy.array)
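
A sketch of the running-statistics normalizer on its own; the random states are placeholders:

    import numpy as np
    from rlcard.agents.dqn_agent import Normalizer

    normalizer = Normalizer()
    for _ in range(100):
        normalizer.append(np.random.rand(6))      # accumulate running mean and std

    normalized = normalizer.normalize(np.random.rand(6))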

class rlcard.agents.dqn_agent.Transition(state, action, reward, next_state, done)

Bases: tuple

property action

Alias for field number 1

property done

Alias for field number 4

property next_state

Alias for field number 3

property reward

Alias for field number 2

property state

Alias for field number 0
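
Since Transition is a named-tuple record, its fields can be read positionally or by name; a short sketch:

    import numpy as np
    from rlcard.agents.dqn_agent import Transition

    t = Transition(state=np.zeros(6), action=1, reward=0.5,
                   next_state=np.zeros(6), done=True)
    assert t.action == t[1] and t.done == t[4]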

rlcard.agents.dqn_agent.copy_model_parameters(sess, estimator1, estimator2)

Copies the model parameters of one estimator to another.

Parameters
  • sess (tf.Session) – Tensorflow Session object

  • estimator1 (Estimator) – Estimator to copy the parameters from

  • estimator2 (Estimator) – Estimator to copy the parameters to
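
A sketch of syncing a target network from a Q-network with this helper; the two Estimators are assumed to be built with identical shapes:

    import tensorflow as tf
    from rlcard.agents.dqn_agent import Estimator, copy_model_parameters

    with tf.Session() as sess:
        q_net = Estimator(scope='q', action_num=2, state_shape=[6], mlp_layers=[64, 64])
        target_net = Estimator(scope='target', action_num=2, state_shape=[6], mlp_layers=[64, 64])
        sess.run(tf.global_variables_initializer())

        # Copy the Q-network weights into the target network.
        copy_model_parameters(sess, q_net, target_net)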

rlcard.agents.random_agent

class rlcard.agents.random_agent.RandomAgent(action_num)

Bases: object

A random agent. Random agents are used for running toy examples on the card games.

eval_step(state)
Predict the action given the current state for evaluation.

Since the random agents are not trained, this function is equivalent to the step function.

Parameters

state (numpy.array) – a numpy array that represents the current state

Returns

the action predicted (randomly chosen) by the random agent

Return type

action (int)

static step(state)

Predict the action given the current state when generating training data.

Parameters

state (numpy.array) – a numpy array that represents the current state

Returns

the action predicted (randomly chosen) by the random agent

Return type

action (int)
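
A sketch of plugging random agents into a game; rlcard.make, env.set_agents, env.run, and env.player_num are assumptions about the surrounding environment API, used here only for illustration:

    import rlcard
    from rlcard.agents.random_agent import RandomAgent

    env = rlcard.make('leduc-holdem')                 # environment name is an assumption
    agents = [RandomAgent(action_num=env.action_num)
              for _ in range(env.player_num)]
    env.set_agents(agents)

    # Play one toy game with all players acting randomly.
    trajectories, payoffs = env.run(is_training=False)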