# Probability Spaces

A **probability space** consists of a **sample space** of possible outcomes and a **probability measure** that specifies how probabilities are assigned to events involving those outcomes.  Many common probability spaces are available in Symbulate.  Users can also define their own probability spaces.

<a id='contents'></a>

  1. [**BoxModel:**](#boxmodel) Define a simple box model probability space.
  1. [**Draw:**](#draw) Draw an outcome according to a probability model.
  1. [**ProbabilitySpace:**](#probability_space) Define more general probability spaces.
  1. [**Independent spaces:**](#indep) Combine independent probability spaces with `*` and `**`.

Be sure to import Symbulate during a session using the following commands.
<a id='prob'></a>

In [1]:
from symbulate import *
%matplotlib inline

<a id='boxmodel'></a>

### BoxModel

The probability space in many elementary situations can be defined via a "box model": labeled tickets are placed in a box and shuffled, and some number of tickets are drawn, either with or without replacement between draws. To define a Symbulate `BoxModel`, enter a list representing the tickets in the box.  For example, rolling a fair six-sided die could be represented as a box model with six tickets labeled 1 through 6.

In [2]:
die = [1, 2, 3, 4, 5, 6]
roll = BoxModel(die)

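The list of numbers could also have been created using `range()` in Python. Remember that Python indexing starts from 0 by default, so `range` also starts from 0 unless a starting value is given. Remember also that `range` gives you all the values up to, but *not including*, the last value; this is why the code below uses `6+1` as the last value.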

In [3]:
die = list(range(1, 6+1)) # this is just a list of the numbers 1 through 6
roll = BoxModel(die)
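
The endpoint behavior of `range` described above can be checked in plain Python, without Symbulate. This is only a small illustration of the built-in function.

```python
# Plain Python, no Symbulate required.
# range(start, stop) includes start but stops just before stop.
die = list(range(1, 6 + 1))
print(die)             # [1, 2, 3, 4, 5, 6]

# With a single argument, range starts from 0.
print(list(range(6)))  # [0, 1, 2, 3, 4, 5]
```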

<a id='draw'></a>

### Draw

`BoxModel` itself just defines the model; it does not return any values.  The same is true for any probability space. Each probability space comes with a `draw` method for simulating a single outcome.

In [4]:
die = [1, 2, 3, 4, 5, 6]
roll = BoxModel(die)
roll.draw()

2

**Important note:** `draw` is most useful when *defining* probability spaces, while [`sim`](sim.html#sim) is most useful when actually running simulations of many outcomes.  Most of the examples in this document use `sim` rather than `draw`.

Calling `draw` again will simulate another roll of the die.

In [5]:
roll.draw()

2

### BoxModel options
* `box`: A list of "tickets" to sample from.
* `size`: How many tickets to draw from the box.
* `replace`: `True` if the draws are made with replacement; `False` if without replacement.
* `probs`: Probabilities that the tickets are selected.  By default, all tickets are equally likely.
* `order_matters`: `True` (default) if different orderings of the same tickets drawn are counted as different outcomes; `False` if the order in which the tickets are drawn is irrelevant.

Multiple tickets can be drawn from the box using the **`size`** argument.

In [6]:
BoxModel(die, size=3).draw()

(5, 4, 4)

Infinitely many tickets can be drawn (with replacement) using `size=inf`.

In [7]:
BoxModel(die, size=inf).draw()

(5, 1, 4, 1, 1, 2, ...)

By default `BoxModel` assumes equally likely tickets.  This can be changed using the **`probs`** argument, by specifying a probability value for each ticket.

*Example.* Suppose 32% of Americans are Democrats, 27% are Republicans, and 41% are Independents.  Five randomly selected Americans are surveyed about their political party affiliation.

This situation could be represented as drawing 5 tickets with replacement from a box with 100 tickets, of which 32 are labeled Democrat, 27 Republican, and 41 Independent.  But rather than specifying a list of 100 tickets, we can just specify the three tickets and the corresponding probabilities with `probs`.

In [8]:
BoxModel(['D', 'R', 'I'], probs=[0.32, 0.27, 0.41], size=5).draw()

(I, D, R, D, I)

<a id='dictionary'></a>
The `probs` argument requires that the probabilities are already normalized to sum to 1.  Non-normalized values can be handled by entering the tickets as a dictionary, specifying the label on each ticket and the number of tickets in the box with that label.  Note that a dictionary is enclosed in braces `{}` rather than brackets `[]`.

The following code is equivalent to the previous code which used the `probs` option.

In [9]:
BoxModel({'D': 32,'R': 27, 'I': 41}, size=5).draw()

(R, I, D, R, I)
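
To see why the dictionary form matches the `probs` form, the ticket counts can be normalized by hand. The following is a minimal plain-Python sketch using only the standard library; it illustrates the arithmetic and is not how Symbulate itself is implemented. The names `counts`, `probs`, and `sample` are chosen here for illustration.

```python
import random

# Ticket counts, as in the dictionary form of the box.
counts = {'D': 32, 'R': 27, 'I': 41}

# Normalizing the counts gives the probabilities passed to `probs` earlier.
total = sum(counts.values())
probs = {label: n / total for label, n in counts.items()}
print(probs)

# Standard-library analogue of drawing 5 tickets with replacement.
# random.choices accepts relative weights that need not sum to 1.
labels = list(counts)
sample = random.choices(labels, weights=list(counts.values()), k=5)
print(tuple(sample))
```

The drawn labels vary from run to run, so no output is shown for the last line.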

By default `BoxModel` assumes sampling with replacement (`replace=True`); each ticket is placed back in the box before the next ticket is selected.  Sampling *without replacement* can be handled with `replace=False`.

*Example.*  Two people are selected at random from Anakin, Bella, Frodo, Harry, Katniss to go on a quest.

In [10]:
BoxModel(['A','B','F','H','K'], size=2, replace=False).draw()

(F, K)

Note that by default, `BoxModel` returns ordered outcomes, e.g. ('A', 'B') is distinct from ('B', 'A'). To return unordered outcomes, set `order_matters=False`.

In [11]:
BoxModel(['A','B','F','H','K'], size=2, replace=False, order_matters=False).draw()

(B, K)
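
The effect of `order_matters` on the sample space can be seen by counting the possible outcomes when two of the five people are chosen without replacement. The sketch below uses the standard library's `itertools` only to enumerate outcomes; it is not meant to show how Symbulate represents them.

```python
from itertools import combinations, permutations

people = ['A', 'B', 'F', 'H', 'K']

# Order matters: ('A', 'B') and ('B', 'A') are different outcomes.
ordered = list(permutations(people, 2))

# Order does not matter: each pair of people is counted once.
unordered = list(combinations(people, 2))

print(len(ordered))    # 20
print(len(unordered))  # 10
```

Each unordered pair corresponds to exactly two ordered pairs, so when the ordered outcomes are equally likely, the unordered ones are too.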

<a id='probability_space'></a>

### ProbabilitySpace

While Symbulate has many [common probability models](common.html) built in, custom probability spaces can be defined using the `ProbabilitySpace` command.  The first step in creating a probability space is to define a function that explains how to draw one outcome. 

*Example.* Ten percent of all e-mail is spam. Thirty percent of spam e-mails contain the word "money", while 2% of non-spam e-mails contain the word "money". Suppose an e-mail contains the word "money". What is the probability that it is spam?

We can think of the sample space as consisting of pairs of the possible email types (spam or not) and wordings (money or not), with the probability measure following the above specifications.  First we draw from a `BoxModel` to determine the email type.  Then, depending on the result of the first draw, we draw from one of two `BoxModel`s to determine the wording.  The function `spam_sim` below encodes these specifications; note the use of `.draw()`.

In [12]:
def spam_sim():
    email_type = BoxModel(["spam", "not spam"], probs=[0.1, 0.9]).draw()
    if email_type == "spam":
        has_money = BoxModel(["money", "no money"], probs=[0.3, 0.7]).draw()
    else:
        has_money = BoxModel(["money", "no money"], probs=[0.02, 0.98]).draw()
    return email_type, has_money

A `ProbabilitySpace` can be created once the specifications of the simulation have been defined via a function, like `spam_sim` above.

In [13]:
P = ProbabilitySpace(spam_sim)
P.draw()

('not spam', 'no money')
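
The question posed in the example can be answered approximately by simulating many outcomes from `P`. For comparison, the exact value follows from Bayes' rule, and the arithmetic is sketched below in plain Python with exact fractions. The variable names are illustrative and not part of Symbulate.

```python
from fractions import Fraction

# Specifications from the example.
p_spam = Fraction(10, 100)                # P(spam)
p_money_given_spam = Fraction(30, 100)    # P(money | spam)
p_money_given_not_spam = Fraction(2, 100) # P(money | not spam)

# Law of total probability: P(money).
p_money = (p_spam * p_money_given_spam
           + (1 - p_spam) * p_money_given_not_spam)

# Bayes' rule: P(spam | money).
p_spam_given_money = p_spam * p_money_given_spam / p_money

print(p_spam_given_money)         # 5/8
print(float(p_spam_given_money))  # 0.625
```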

### Commonly used probability spaces

Symbulate has many [commonly used probability spaces](common.html) built in.  Here are just a few examples.  

In [14]:
Binomial(n=10, p=0.5).draw()

6

In [15]:
Normal(mean=0, sd=1).draw()

-0.5934710096667243

In [16]:
mean_vector = [0, 1, 2]
cov_matrix = [[1.00, 0.50, 0.25],
              [0.50, 2.00, 0.00],
              [0.25, 0.00, 4.00]]

MultivariateNormal(mean = mean_vector, cov = cov_matrix).draw()

(1.4766090641448149, 3.4092966537925666, 4.437591281203854)

Built-in probability spaces can be used in defining custom probability spaces via `ProbabilitySpace`, as in [this example](conditioning.html#conditional).

<a id='indep'></a>

### Independent probability spaces

**Independent** probability spaces can be constructed by multiplying (`*` in Python) two probability spaces. The product `*` syntax reflects that, under independence, joint probabilities are products of marginal probabilities. For example, events $A$ and $B$ are independent if and only if $P(A\cap B) = P(A)P(B)$.

*Example.*  Roll a fair six-sided die and a fair four-sided die.

In [17]:
die6 = list(range(1, 6+1))
die4 = list(range(1, 4+1))
rolls = BoxModel(die6) * BoxModel(die4)
rolls.draw()

(5, 4)
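
The product rule can be verified for the two dice by listing the sample space explicitly. The sketch below is plain Python rather than Symbulate, and it assumes that all 24 pairs are equally likely, which is what independence of two fair dice implies. The helper `prob` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

die6 = list(range(1, 6 + 1))
die4 = list(range(1, 4 + 1))

# All 24 (six-sided, four-sided) pairs, assumed equally likely.
outcomes = list(product(die6, die4))

def prob(event):
    """Probability of an event, given as a true/false function of a pair."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

p_a = prob(lambda o: o[0] == 5)     # A: six-sided die shows 5
p_b = prob(lambda o: o[1] == 4)     # B: four-sided die shows 4
p_ab = prob(lambda o: o == (5, 4))  # A and B together

print(p_a, p_b, p_ab)     # 1/6 1/4 1/24
print(p_ab == p_a * p_b)  # True
```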

*Example.* A triple of independent outcomes.

In [18]:
( BoxModel(['H', 'T']) * Poisson(lam=2) * Exponential(rate=5) ).draw()

(H, 2, 0.40575095364500896)

**Multiple independent copies** of a probability space can be created by raising a probability space to a power using `**`.

*Example.* A sequence of 10 fair coin flips using `BoxModel` and `**`.  (Note: for `BoxModel`, the `size` argument can be used instead of `**`.)

In [19]:
(BoxModel(['H', 'T']) ** 10).draw()

(T, T, H, T, H, ..., T)

*Example.* Four independent Normal(0,1) values.

In [20]:
P = Normal(mean=0, sd=1) ** 4
P.draw()

(1.1853058564861767, 0.30913127795214823, 0.6799233253484153, -0.6190611408426031)

**Infinitely many independent copies** of a probability space can be created by raising the probability space to the `inf` power, i.e. `** inf`.

*Example.* An infinite sequence of fair coin flips using `BoxModel` and `** inf`.  (Note: for `BoxModel`, the `size=inf` argument can be used instead of `** inf`.)

In [21]:
(BoxModel(['H', 'T']) ** inf).draw()

(H, H, H, T, H, T, ...)
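
An infinite sequence cannot be stored in full, so one way to think about such an outcome is as a sequence whose terms are produced only when they are needed. The generator below is a plain-Python sketch of that idea. It is an assumption made for illustration and not a description of how Symbulate represents infinite outcomes; `coin_flips` is a hypothetical name.

```python
import random
from itertools import islice

def coin_flips():
    """Yield independent fair coin flips one at a time, without end."""
    while True:
        yield random.choice(['H', 'T'])

# Only the flips actually requested are ever generated.
first_six = tuple(islice(coin_flips(), 6))
print(first_six)
```

The flips vary from run to run, so no output is shown.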

*Example.* Infinitely many independent Normal(0, 1) values.

In [22]:
P = Normal(mean=0, sd=1) ** inf
P.draw()

(0.128155271646392, -0.4654743596571168, -0.017203817707428366, 0.20665029252484357, 2.4307077949271374, 1.104763555755993, ...)
