rlcard.utils

rlcard.utils.logger

class rlcard.utils.logger.Logger(log_dir)

Bases: object

Logger saves the running results and helps make plots from the results

log(text)

Write the text to log file then print it.

Parameters

text (string) – text to log

log_performance(timestep, reward)

Log a point in the curve

Parameters
  • timestep (int) – the timestep of the current point

  • reward (float) – the reward of the current point

plot(algorithm)
rlcard.utils.logger.plot(csv_path, save_path, algorithm)

Read data from csv file and plot the results
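To illustrate the logging flow described above, here is a minimal stand-in written for this page. It is not rlcard's Logger: the class name `SketchLogger`, the file names `log.txt` and `performance.csv`, and the CSV column names are all assumptions, and plotting is left out.

```python
import csv
import os


class SketchLogger:
    """Hypothetical stand-in for illustration; not rlcard's Logger."""

    def __init__(self, log_dir):
        os.makedirs(log_dir, exist_ok=True)
        # Both file names are assumptions of this sketch.
        self.txt_path = os.path.join(log_dir, "log.txt")
        self.csv_path = os.path.join(log_dir, "performance.csv")
        with open(self.csv_path, "w", newline="") as f:
            csv.writer(f).writerow(["timestep", "reward"])

    def log(self, text):
        # Write the text to the log file, then print it.
        with open(self.txt_path, "a") as f:
            f.write(text + "\n")
        print(text)

    def log_performance(self, timestep, reward):
        # Append one point of the curve to the CSV file and echo it.
        with open(self.csv_path, "a", newline="") as f:
            csv.writer(f).writerow([timestep, reward])
        self.log("timestep={} reward={}".format(timestep, reward))
```

A plotting step could then read the CSV back and draw reward against timestep.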

rlcard.utils.seeding

rlcard.utils.seeding.colorize(string, color, bold=False, highlight=False)

Return string surrounded by appropriate terminal color codes to print colorized text. Valid colors: gray, red, green, yellow, blue, magenta, cyan, white, crimson
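As a rough illustration of how such a function can work, the sketch below wraps a string in ANSI escape sequences. The name `colorize_sketch` and the exact numeric codes in `COLOR_CODES` are assumptions of this example, not taken from the library.

```python
# ANSI foreground codes; the exact numbers are an assumption of this sketch.
COLOR_CODES = {
    "gray": 30, "red": 31, "green": 32, "yellow": 33, "blue": 34,
    "magenta": 35, "cyan": 36, "white": 37, "crimson": 38,
}


def colorize_sketch(string, color, bold=False, highlight=False):
    code = COLOR_CODES[color]
    if highlight:
        code += 10  # shift from the foreground range to the background range
    attrs = [str(code)]
    if bold:
        attrs.append("1")  # SGR attribute 1 requests bold text
    # "\x1b[0m" resets all attributes after the string.
    return "\x1b[{}m{}\x1b[0m".format(";".join(attrs), string)


print(colorize_sketch("warning", "yellow", bold=True))
```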

rlcard.utils.seeding.create_seed(a=None, max_bytes=8)

Create a strong random seed. Otherwise, Python 2 would seed using the system time, which might be non-robust especially in the presence of concurrency.

Parameters
  • a (Optional[int, str]) – None seeds from an operating system specific randomness source.

  • max_bytes – Maximum number of bytes to use in the seed.
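The sketch below shows one way the behaviour described above could be realised with the standard library. The function name `create_seed_sketch` and the handling of string and integer inputs are assumptions; only the use of an operating system randomness source for `None` comes from the description.

```python
import hashlib
import os


def create_seed_sketch(a=None, max_bytes=8):
    if a is None:
        # Draw from the operating system's randomness source.
        return int.from_bytes(os.urandom(max_bytes), "big")
    if isinstance(a, str):
        # Assumed handling: hash the string and keep the first max_bytes bytes.
        digest = hashlib.sha512(a.encode("utf8")).digest()
        return int.from_bytes(digest[:max_bytes], "big")
    if isinstance(a, int):
        # Assumed handling: reduce the integer to fit in max_bytes bytes.
        return a % 2 ** (8 * max_bytes)
    raise TypeError("unsupported seed type: {}".format(type(a).__name__))
```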

rlcard.utils.seeding.error(msg, *args)
rlcard.utils.seeding.hash_seed(seed=None, max_bytes=8)

Any given evaluation is likely to have many PRNGs active at once. (Most commonly, because the environment is running in multiple processes.) There’s literature indicating that having linear correlations between seeds of multiple PRNGs can correlate the outputs:

http://blogs.unity3d.com/2015/01/07/a-primer-on-repeatable-random-numbers/
http://stackoverflow.com/questions/1554958/how-different-do-random-seeds-need-to-be
http://dl.acm.org/citation.cfm?id=1276928

Thus, for sanity we hash the seeds before using them. (This scheme is likely not crypto-strength, but it should be good enough to get rid of simple correlations.)

Parameters
  • seed (Optional[int]) – None seeds from an operating system specific randomness source.

  • max_bytes – Maximum number of bytes to use in the hashed seed.
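The hashing idea can be sketched as follows. This is an illustration under assumptions, not necessarily the library's code: the name `hash_seed_sketch` and the choice of SHA-512 are ours.

```python
import hashlib
import os


def hash_seed_sketch(seed=None, max_bytes=8):
    if seed is None:
        # Seed from the operating system's randomness source.
        seed = int.from_bytes(os.urandom(max_bytes), "big")
    # Hash the decimal representation and keep the first max_bytes bytes.
    digest = hashlib.sha512(str(seed).encode("utf8")).digest()
    return int.from_bytes(digest[:max_bytes], "big")


# Consecutive input seeds are mapped to values with no simple linear relation.
hashed = [hash_seed_sketch(s) for s in (0, 1, 2)]
print(hashed)
```

Each worker process can then pass its own small integer (0, 1, 2, ...) and still receive a seed that is not linearly related to its neighbours.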

rlcard.utils.seeding.np_random(seed=None)

rlcard.utils.utils

rlcard.utils.utils.elegent_form(card)

Get an elegant form of a card string

Parameters

card (string) – A card string

Returns

A nice form of card

Return type

elegent_card (string)
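A possible shape for such a conversion is sketched below. It assumes a card string is a suit letter followed by a one-character rank (for example "ST" for the ten of spades); that encoding, the name `elegant_form_sketch`, and the symbol table are assumptions of this example.

```python
# Assumed mapping from suit letters to printable suit symbols.
SUIT_SYMBOLS = {"S": "♠", "H": "♥", "D": "♦", "C": "♣"}


def elegant_form_sketch(card):
    suit, rank = card[0], card[1]
    # Show "T" as "10" so the output reads like a printed card.
    rank = "10" if rank == "T" else rank
    return SUIT_SYMBOLS[suit] + rank


print(elegant_form_sketch("ST"))
```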

rlcard.utils.utils.get_device()
rlcard.utils.utils.init_54_deck()

Initialize a standard deck of 52 cards plus BJ (black joker) and RJ (red joker)

Returns

A list of Card objects

Return type

(list)

rlcard.utils.utils.init_standard_deck()

Initialize a standard deck of 52 cards

Returns

A list of Card objects

Return type

(list)
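The two deck initializers can be pictured with the sketch below. It uses plain `(suit, rank)` tuples in place of Card objects, and the suit letters, rank characters, and the empty rank on the jokers are all assumptions of this example.

```python
# Assumed suit and rank encodings; tuples stand in for Card objects.
SUITS = ["S", "H", "D", "C"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "T", "J", "Q", "K"]


def standard_deck_sketch():
    # One card per (suit, rank) pair: 4 * 13 = 52 cards.
    return [(suit, rank) for suit in SUITS for rank in RANKS]


def deck_54_sketch():
    # The 52 standard cards plus a black joker and a red joker.
    return standard_deck_sketch() + [("BJ", ""), ("RJ", "")]


print(len(standard_deck_sketch()), len(deck_54_sketch()))
```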

rlcard.utils.utils.print_card(cards)

Nicely print a card or list of cards

Parameters

cards (string or list) – The card(s) to be printed

rlcard.utils.utils.rank2int(rank)

Get the corresponding number of a rank.

Parameters

rank (str) – rank stored in Card object

Returns

the number corresponding to the rank

Return type

(int)

Note

  1. If the input rank is an empty string, the function will return -1.

  2. If the input rank is not valid, the function will return None.
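The two notes above can be made concrete with a small sketch. The numeric values assigned to the face ranks (T, J, Q, K, A) and the accepted digit range are assumptions; only the -1 and None cases come from the notes.

```python
# Assumed values for non-numeric ranks.
FACE_VALUES = {"T": 10, "J": 11, "Q": 12, "K": 13, "A": 14}


def rank2int_sketch(rank):
    if rank == "":
        return -1  # note 1: empty rank
    if rank.isdigit():
        value = int(rank)
        # Assumed valid digit range; anything else counts as invalid.
        return value if 2 <= value <= 10 else None
    # note 2: dict.get returns None for an unknown rank
    return FACE_VALUES.get(rank)


print(rank2int_sketch(""), rank2int_sketch("7"), rank2int_sketch("K"), rank2int_sketch("X"))
```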

rlcard.utils.utils.remove_illegal(action_probs, legal_actions)

Remove illegal actions and normalize the probability vector

Parameters
  • action_probs (numpy.array) – A 1-dimensional numpy array.

  • legal_actions (list) – A list of indices of legal actions.

Returns

A normalized vector without illegal actions.

Return type

probd (numpy.array)
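The masking and renormalization can be sketched in plain Python as follows. Lists are used instead of numpy arrays to keep the example dependency-free, and the uniform fallback when all legal actions have zero probability is an assumption of this sketch.

```python
def remove_illegal_sketch(action_probs, legal_actions):
    # Keep the probability mass of legal actions only.
    probs = [0.0] * len(action_probs)
    for action in legal_actions:
        probs[action] = action_probs[action]
    total = sum(probs)
    if total == 0:
        # Assumed fallback: spread the mass uniformly over legal actions.
        for action in legal_actions:
            probs[action] = 1.0 / len(legal_actions)
    else:
        probs = [p / total for p in probs]
    return probs


print(remove_illegal_sketch([0.5, 0.25, 0.25], [1, 2]))  # [0.0, 0.5, 0.5]
```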

rlcard.utils.utils.reorganize(trajectories, payoffs)

Reorganize the trajectory to make it RL friendly

Parameters
  • trajectories (list) – A list of trajectories

  • payoffs (list) – A list of payoffs for the players. Each entry corresponds to one player

Returns

New trajectories that can be fed into RL algorithms.

Return type

(list)
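One way to picture the reorganization is sketched below. It assumes each player's trajectory alternates states and actions (`[s0, a0, s1, a1, ..., sN]`) and that the output is a list of `[state, action, reward, next_state, done]` transitions with the payoff given as the reward of the final transition. Both the input layout and the output layout are assumptions of this example.

```python
def reorganize_sketch(trajectories, payoffs):
    new_trajectories = [[] for _ in trajectories]
    for player, traj in enumerate(trajectories):
        # Slide over (state, action, next_state) windows, two items at a time.
        for i in range(0, len(traj) - 2, 2):
            done = i == len(traj) - 3  # last window of this trajectory
            reward = payoffs[player] if done else 0
            state, action, next_state = traj[i:i + 3]
            new_trajectories[player].append(
                [state, action, reward, next_state, done]
            )
    return new_trajectories


example = reorganize_sketch(
    [["s0", "a0", "s1", "a1", "s2"], ["t0", "b0", "t1"]],
    [1, -1],
)
print(example)
```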

rlcard.utils.utils.set_seed(seed)
rlcard.utils.utils.tournament(env, num)

Evaluate the performance of the agents in the environment

Parameters
  • env (Env class) – The environment to be evaluated.

  • num (int) – The number of games to play.

Returns

A list of average payoffs for each player
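The averaging performed by such an evaluation can be sketched as follows. `StubEnv`, its `run()` method returning `(trajectories, payoffs)`, and its `num_players` attribute are hypothetical stand-ins invented for this example, not the library's environment interface.

```python
import random


class StubEnv:
    """Hypothetical two-player, zero-sum environment for this sketch only."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.num_players = 2

    def run(self):
        # Pick a random winner; return (trajectories, payoffs).
        winner = self.rng.randrange(2)
        payoffs = [1, -1] if winner == 0 else [-1, 1]
        return [], payoffs


def tournament_sketch(env, num):
    totals = [0.0] * env.num_players
    for _ in range(num):
        _, game_payoffs = env.run()
        for player, payoff in enumerate(game_payoffs):
            totals[player] += payoff
    # Average payoff per player over all games played.
    return [total / num for total in totals]


print(tournament_sketch(StubEnv(), 100))
```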