Search

Simulation tools

There are several tools available for simulating realizations of probabilistic objects — outcomes from a probability space, events, values of random variables or random vectors, stochastic processes — and analyzing the results. In general, the methods below can be chained, one after the other, e.g. P.sim(1000).apply(sum).tabulate().

  1. Simulate: .sim() simulates outcomes of a probability model, events, or values of random variables or processes
  2. Apply: .apply() applies a function to each outcome or value
  3. Tabulate: .tabulate() creates a table summary of simulated outcomes of a probability model or simulated values of random variables
  4. Count: .count() and its relatives count the values which satsify some criteria.
  5. Filter: .filter() and its relatives create subsets of the values which satsify some criteria.
  6. Get: .get() returns the results of a particular realization of the simulation.
  7. Summary of commands: Examples using tabulate, filter, and count to manipulate simulation results.

Be sure to import Symbulate using the following commands.

from symbulate import *
%matplotlib inline

Simulate

The .draw() extension simulates one realization of the simulation. Many realizations can be simulated using .sim(); the single argument is the number of repetitions of the simulation to run. (Note that draw is most useful when defining probability spaces, while sim is most useful when actually running simulations.)

Example. Simulate 100 repetitions of rolling two fair six-sided dice; each repetition involves a pair of values.

die = list(range(1, 6+1)) # this is just a list of the number 1 through 6
roll = BoxModel(die, size = 2)
roll.sim(100)
Index Result
0(1, 3)
1(3, 1)
2(3, 6)
3(4, 2)
4(3, 1)
5(5, 3)
6(5, 6)
7(6, 3)
8(5, 6)
......
99(2, 5)

Caution: Note that every time .sim() is called new realizations are simulated. Store simulated realizations as variables in order to perform multiple operations in different lines of code on the same set of simulated values.

Example. Ten percent of all e-mail is spam. Thirty percent of spam e-mails contain the word "money", while 2% of non-spam e-mails contain the word "money". Simulate the email status (spam or not) and wording (money or not) for 1000 emails.

def spam_sim():
    email_type = BoxModel(["spam", "not spam"], probs=[.1, .9]).draw()
    if email_type == "spam":
        has_money = BoxModel(["money", "no money"], probs=[.3, .7]).draw()
    else:
        has_money = BoxModel(["money", "no money"], probs=[.02, .98]).draw()
    return email_type, has_money

P = ProbabilitySpace(spam_sim)
sims = P.sim(1000)
sims
Index Result
0('not spam', 'no money')
1('not spam', 'no money')
2('not spam', 'no money')
3('not spam', 'no money')
4('not spam', 'no money')
5('not spam', 'no money')
6('not spam', 'no money')
7('not spam', 'no money')
8('not spam', 'no money')
......
999('not spam', 'no money')

Apply

Use .apply() to apply a function on an outcome-by-outcome basis to each realization of a simulation.

Example. Roll two fair six-sided dice and compute their sum.

die = list(range(1, 6+1)) # this is just a list of the number 1 through 6
roll = BoxModel(die, size = 2)
roll.sim(1000).apply(sum)
Index Result
011
16
210
35
45
54
64
73
88
......
9995

User defined functions can also be applied. Standard Python commands can be used to define functions.

Example. Ten cards labeled 1, 2, $\ldots$, 10 are shuffled and dealt one at a time. Find the the probability that the number on the card matches its position in the deal for at least one of the cards. (For example, a match occurs if card 3 is the third card dealt.)

First we define a probability space whose outcomes correspond to ordered deals of the cards. Each realization of the simulation is a permutation of the numbers 0, 1, $\ldots$ 9 (using Python indexing). Simulate 10000 realizations and store as sims.

n = 10
labels = list(range(n)) # remember, Python starts the index at 0, so the cards are labebeled 0, ..., 9
P = BoxModel(labels, size = n, replace = False)
sims = P.sim(10000)
sims
Index Result
0(5, 9, 4, 0, 3, ..., 6)
1(3, 7, 9, 1, 2, ..., 0)
2(9, 8, 5, 7, 3, ..., 1)
3(7, 0, 2, 6, 3, ..., 8)
4(8, 6, 0, 4, 9, ..., 1)
5(9, 7, 3, 4, 1, ..., 2)
6(3, 6, 1, 4, 9, ..., 2)
7(8, 6, 5, 9, 7, ..., 1)
8(1, 8, 7, 5, 2, ..., 6)
......
9999(2, 6, 7, 5, 3, ..., 8)

Next we define a function is_match which takes as an input a permutation of 0, $\ldots$, 9 and checks if there is at least once match. (Defining custom functions like this is the main way that Python code is used with Symbulate.)

def is_match(x):
    for i in range(n):
        if x[i] == labels[i]:
            return 'At least one match'
    return 'No match'
    
is_match((1, 2, 0, 3, 5, 6, 7, 8, 9, 4)) # for the outcome, card 3 matches position 3
'At least one match'

Now we apply the is_match function to the simulated realizations, stored in sims.

sims.apply(is_match)
Index Result
0No match
1At least one match
2At least one match
3At least one match
4No match
5No match
6No match
7No match
8No match
......
9999No match

Finally, there are various ways to estimate the probability of at least one match, which will be described below. For example, the "match/no match" results could be tabulated to determine the relative frequence of a match.

sims.apply(is_match).tabulate(normalize = True)
Outcome Value
At least one match0.6253
No match0.3747
Total1.0

Tabulate

The results of a simulation can be quickly tabulated using .tabulate(). Tabulate counts the number of times each particular outcome occurs among the simulated realizations. Use .tabulate(normalize=True) to find the proportion (relative frequency) of times each outcome occurs.

Example. Roll two fair four-sided. Each realization is an ordered pair of rolls. There are 16 possible ordered pairs - (1, 1), (1, 2), ..., (4, 3), (4, 4) - all equally likely.

die = list(range(1, 4+1, 1)) # this is just a list of the numbers 1 through 4
roll = BoxModel(die, size=2)
rolls = roll.sim(16000)
rolls.tabulate()
Outcome Value
(1, 1)1061
(1, 2)990
(1, 3)980
(1, 4)1044
(2, 1)989
(2, 2)984
(2, 3)971
(2, 4)1027
(3, 1)954
(3, 2)1025
(3, 3)972
(3, 4)990
(4, 1)985
(4, 2)989
(4, 3)1023
(4, 4)1016
Total16000

Now sum the dice, and approximate the probability distribution of the sum using tabulate with the normalize option.

rolls.apply(sum).tabulate(normalize=True)
Outcome Value
20.0663125
30.1236875
40.182375
50.2515625
60.18675
70.1258125
80.0635
Total1.0

Individual entries of the table can be referenced using .tabulate()[label] where label represents the value of interest.

rolls.tabulate()[(2,4)]
1027
roll_sum = rolls.apply(sum).tabulate(normalize=True)
roll_sum[6] + roll_sum[7] + roll_sum[8]
0.37606249999999997

By default, tabulate only tabulates those outcomes which are among the simulated values, rather than all possible outcomes. An argument can be passed to .tabulate() to tabulate all outcomes in a given list.

die = list(range(1, 4+1, 1)) # this is just a list of the number 1 through 4
rolls = BoxModel(die).sim(2)
rolls.tabulate(die)
Outcome Value
11
20
31
40
Total2
# Compare with
rolls.tabulate()
Outcome Value
11
31
Total2

By default, the outcomes in the table produced by .tabulate() are in alphanumeric order. A list can be passed to .tabulate() to achieve a specified order.

BoxModel(['a', 'b', 1, 2, 3]).sim(10).tabulate([3, 'a', 2, 'b', 1])
Outcome Value
31
a6
21
b0
12
Total10

Count

You can count the number of simulated realizations equal to a particular value using count_eq().

BoxModel(['H','T']).sim(10000).count_eq('H')
5017
die = list(range(1, 4+1, 1)) # this is just a list of the number 1 through 4
roll = BoxModel(die, size = 2)
rolls = roll.sim(16000)
rolls.count_eq((2,4))
1012

In addition to .count_eq(), the following count functions can be used when the values are numerical.

  • .count_neq() counts the values not equal to the argument
  • .count_lt() counts the values less than the argument
  • .count_leq() counts the values less than or equal to the argument
  • .count_gt() counts the values greater than the argument
  • .count_geq() counts the values greater than or equal to the argument
rolls.apply(sum).count_geq(6) / 16000
0.3731875

You can also count the number of outcomes which satisfy some criteria specified by a user defined function. Define a function that returns True for the outcomes you want to count, and pass the function into .count(). For example, the following code is equivalent to using .count_geq(6).

def greater_than_or_equal_to_6(x):
    return x >= 6

rolls.apply(sum).count(greater_than_or_equal_to_6) / 16000
0.3731875

Custom functions can also be used to count based on multiple criteria. For example, the following counts the pairs of rolls in which the first roll is equal to 2 and the second roll is at most 3.

def custom_count(x): # x represents a pair of values (x[0], x[1])
    if (x[0] == 2) & (x[1] <= 3):
        return True

rolls.count(custom_count)
3112

Counting can also be accomplished by creating logical (True = 1, False = 0) values according to whether an outcome satisfies some criteria and then summing.

rollsums = rolls.apply(sum)
sum(rollsums >= 6) / 16000
0.3731875

Since a mean (average) is the sum of values divided by the number of values, changing sum to mean in the above method returns the relative frequency directly (without having to divide by the number of values).

mean(rollsums >= 6)
0.3731875

Filter

You can get the subset of simulated realizations equal to a particular value using .filter_eq().

Heads = BoxModel(['H','T']).sim(10000).filter_eq('H')
Heads
Index Result
0H
1H
2H
3H
4H
5H
6H
7H
8H
......
4915H

Using len (length) with the filter functions is one way to count the simulated occurrences of outcomes which satisfy some criteria.

len(Heads)
4916

In addition to .filter_eq(), the following filter functions can be used when the values are numerical.

  • .filter_neq() subsets the values not equal to the argument
  • .filter_lt() subsets the values less than the argument
  • .filter_leq() subsets the values less than or equal to the argument
  • .filter_gt() subsets the values greater than the argument
  • .filter_geq() subsets the values greater than or equal to the argument
die = list(range(1, 1+4)) # this is just a list of the number 1 through 4
sims = BoxModel(die, size=2).sim(1000).apply(sum)
len(sims.filter_geq(6)) / 1000
0.357

You can also define your own custom filter function. Define a function that returns True for the outcomes you want to keep, and pass the function into .filter(). For example, the following code is equivalent to using .filter_geq(6).

def greater_than_or_equal_to_6(x):
    return x >= 6

len(sims.filter(greater_than_or_equal_to_6)) / 1000
0.357

Get

The outcome of a particular repetition of the simulation can be accessed with .get(). Recall that Python starts an index at 0, so .get(0) returns the first simulated value, .get(1) the second, etc.

die = list(range(1, 4+1)) # this is just a list of the number 1 through 4
roll = BoxModel(die, size = 2)
sims = roll.sim(4)
sims
Index Result
0(4, 4)
1(4, 1)
2(1, 4)
3(1, 1)
sims.get(0)
(4, 4)
sims.get(2)
(1, 4)

Summary of commands

The tabulate, count, and filter functions in Symbulate allow for several ways of manipulating the outcomes of a simulation. As an example, here is a recap of some of the ways Symbulate can be used to estimate the probability that the sum of two fair four-sided dice is at least 6.

die = list(range(1, 4+1, 1)) # this is just a list of the number 1 through 4
roll = BoxModel(die, size = 2)
rolls = roll.sim(10000)
rollsums = rolls.apply(sum)
rollsums.count_geq(6) / 10000
0.3749
(rollsums.tabulate()[6] + rollsums.tabulate()[7] + rollsums.tabulate()[8]) / 10000
0.3749
(rollsums.tabulate(normalize=True)[6] + rollsums.tabulate(normalize=True)[7] + rollsums.tabulate(normalize=True)[8])
0.3749
len(rollsums.filter_geq(6)) / 10000
0.3749
sum(rollsums >= 6) / 10000
0.3749
mean(rollsums >= 6)
0.3749

We will also see that Random variables (RV) and events play an extremely important roll in defining and simulating probabilistics objects. The following code (1) defines a probability space corresponding to two rolls of a fair four-sided die, (2) define a RV X equal to the sum of the rolls, (3) simulates realizations of X and uses a count function.

die = list(range(1, 4+1, 1)) # this is just a list of the number 1 through 4
roll = BoxModel(die, size = 2)
X = RV(roll, sum)
X.sim(10000).count_geq(6) / 10000
0.3693

We can also simulate and summarize realizations of the event (X >= 6).

(X >= 6).sim(10000).tabulate(normalize=True)
Outcome Value
False0.624
True0.376
Total1.0