<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Probabilistic Graphical Models |</title><link>https://example.com/tags/probabilistic-graphical-models/</link><atom:link href="https://example.com/tags/probabilistic-graphical-models/index.xml" rel="self" type="application/rss+xml"/><description>Probabilistic Graphical Models</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Thu, 11 Jan 2024 00:00:00 +0000</lastBuildDate><image><url>https://example.com/media/logo.svg</url><title>Probabilistic Graphical Models</title><link>https://example.com/tags/probabilistic-graphical-models/</link></image><item><title>Mean field variational inference</title><link>https://example.com/blog/mean-field/</link><pubDate>Thu, 11 Jan 2024 00:00:00 +0000</pubDate><guid>https://example.com/blog/mean-field/</guid><description>&lt;p&gt;In this problem, you will investigate &lt;em&gt;mean field&lt;/em&gt; approximate inference algorithms (Koller &amp;amp; Friedman&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt; 11.5).
Consider the Markov network in the above figure.
Define edge potentials $\phi_{ij}(x_i,x_j)$ for all edges $(x_i , x_j)$ in the graph.
We can write&lt;/p&gt;
$$
P\left(x_1, \ldots, x_{12}\right)=\frac{1}{Z} \prod_{(i, j) \in E} \phi_{i j}\left(x_i, x_j\right)
$$&lt;h2 id="fully-factored-mean-field"&gt;Fully factored mean field&lt;/h2&gt;
&lt;p&gt;Assume a fully factored mean field approximation $Q$ (&lt;em&gt;b&lt;/em&gt; in figure), parameterized by node potentials $Q_i$.&lt;/p&gt;
&lt;p&gt;In both of the cases below, please expand out any expectations in the formulas (your answer should be in terms of $Q_i$ and $\phi_{ij}$).&lt;/p&gt;
&lt;p&gt;Write down the update formulas for $Q_1(X_1)$ and $Q_6(X_6)$.&lt;/p&gt;
&lt;h3 id="solution"&gt;Solution&lt;/h3&gt;
&lt;h4&gt;$Q_1(X_1)$&lt;/h4&gt;
&lt;p&gt;Using the update formula for a fully factored mean field, where $D_j$ represents clique $j$,&lt;/p&gt;
$$
\begin{aligned}
Q_i(X_i) &amp;= \frac{1}{Z_i} \exp\left(\sum_{D_j: X_i \in D_j} \mathbb{E}_{Q_{-i}} \ln \phi(D_j)\right) \\
Q_1(X_1) &amp;= \frac{1}{Z_1} \exp\left(\mathbb{E}_{Q_{2}} \ln \phi(X_1,X_2) + \mathbb{E}_{Q_{5}} \ln \phi(X_1,X_5)\right) \\
&amp;= \color{Green} \frac{1}{Z_1} \exp\left(\sum_{X_2} Q_2(X_2) \ln\phi(X_1,X_2) + \sum_{X_5} Q_5(X_5) \ln\phi(X_1,X_5)\right) \\
\end{aligned}
\newcommand{\EE}{\mathbb{E}}
\newcommand{\ind}{\mathbb{1}}
\newcommand{\answertext}[1]{\textcolor{Green}{\fbox{#1}}}
\newcommand{\answer}[1]{\answertext{$#1$}}
\newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
\newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}
\newcommand{\comment}[1]{\textcolor{gray}{\textrm{#1}}}
\newcommand{\vec}[1]{\mathbf{#1}}
\newcommand{\inv}[1]{\frac{1}{#1}}
\newcommand{\abs}[1]{\lvert{#1}\rvert}
\newcommand{\norm}[1]{\lVert{#1}\rVert}
\newcommand{\lr}[1]{\left(#1\right)}
\newcommand{\lrb}[1]{\left[#1\right]}
\newcommand{\lrbr}[1]{\lbrace#1\rbrace}
\newcommand{\Bx}[0]{\mathbf{x}}
$$
&lt;h4&gt;$Q_6(X_6)$&lt;/h4&gt;
&lt;p&gt;The derivation is similar to the previous part:&lt;/p&gt;
$$
\begin{multline}
Q_6(X_6) = \color{Green} \frac{1}{Z_6} \exp\Biggl(
\sum_{X_2} Q_2(X_2)\ln \phi(X_2,X_6) + \sum_{X_5} Q_5(X_5) \ln\phi(X_5,X_6) \\
\color{Green} + \sum_{X_7} Q_7(X_7)\ln \phi(X_6,X_7) + \sum_{X_{10}} Q_{10}(X_{10})\ln \phi(X_6,X_{10})
\Biggr)
\end{multline}
$$
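&lt;p&gt;As a sanity check, the updates above can be run directly. The sketch below is illustrative only: it assumes binary variables, encodes the 3×4 grid adjacency by hand, and uses random positive tables in place of the (unspecified) potentials $\phi_{ij}$ from the figure:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2  # binary variables

# 3×4 grid from the figure, nodes numbered 1..12 row by row (hypothetical encoding)
edges = [(1, 2), (2, 3), (3, 4), (5, 6), (6, 7), (7, 8), (9, 10), (10, 11), (11, 12),
         (1, 5), (2, 6), (3, 7), (4, 8), (5, 9), (6, 10), (7, 11), (8, 12)]
phi = {e: rng.uniform(0.5, 2.0, size=(K, K)) for e in edges}  # random stand-in potentials
Q = {i: np.full(K, 1.0 / K) for i in range(1, 13)}            # uniform initialization

def update(i):
    """Q_i(x_i) ∝ exp(Σ_j Σ_{x_j} Q_j(x_j) ln φ_ij(x_i, x_j)), j ranging over neighbors of i."""
    log_q = np.zeros(K)
    for (a, b), table in phi.items():
        if a == i:
            log_q += np.log(table) @ Q[b]    # E_{Q_b}[ln φ(x_i, X_b)]
        elif b == i:
            log_q += np.log(table).T @ Q[a]  # E_{Q_a}[ln φ(X_a, x_i)]
    q = np.exp(log_q - log_q.max())          # subtract max for numerical stability
    Q[i] = q / q.sum()                       # the 1/Z_i normalization

for _ in range(20):          # coordinate ascent sweeps over all nodes
    for i in range(1, 13):
        update(i)

print(Q[1], Q[6])
```

&lt;p&gt;Each $Q_i$ remains a normalized distribution after every update, matching the $1/Z_i$ factor in the formulas above.&lt;/p&gt;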
&lt;h2 id="structured-mean-field"&gt;Structured mean field&lt;/h2&gt;
&lt;p&gt;Now we consider a structured mean field approximation $Q$ (&lt;em&gt;c&lt;/em&gt; in figure), parameterized by edge potentials $\psi_{ij}(x_i,x_j)$ for each edge $(x_i , x_j)$.&lt;/p&gt;
&lt;h3&gt;$\psi_{12}(x_1,x_2)$&lt;/h3&gt;
&lt;p&gt;Write down the update formula for $\psi_{12}(x_1,x_2)$ up to a proportionality constant. This time, you can write it in terms of expected values, but do not include unnecessary terms.&lt;/p&gt;
&lt;h4 id="solution-1"&gt;Solution&lt;/h4&gt;
&lt;p&gt;We start with equation (11.62) from Koller &amp;amp; Friedman&lt;sup id="fnref1:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;:
$$
\psi_j\left(D_j\right) \propto \exp \left\{\sum_{\phi \in A_j} \mathbb{E}_{\mathcal{X} \sim Q}\left[\ln \phi \mid D_j\right]-\sum_{\psi_k \in B_j} \mathbb{E}_{\mathcal{X} \sim Q}\left[\ln \psi_k \mid D_j\right]\right\},
$$
&lt;/p&gt;
&lt;p&gt;where&lt;/p&gt;
$$
A_j=\left\{\phi \in \Phi: Q \not \models\left(\mathbf{U}_\phi \perp \mathbf{D}_j\right)\right\}
$$
$$
B_j=\left\{\psi_k: Q \not \models\left(\mathbf{D}_k \perp \mathbf{D}_j\right)\right\}-\left\{\mathbf{D}_j\right\}.
$$
&lt;p&gt;$Q \not\models$ means that the independence statement does not hold in $Q$, and $\mathbf{U}_\phi$ refers to all variables involved in the factor $\phi$.
The sets $A_j$ and $B_j$ collect exactly those factors that are not independent of the clique $D_j$ under our approximate distribution $Q$.&lt;/p&gt;
&lt;p&gt;Thus, we have&lt;/p&gt;
$$
\newcommand{\EE}{\mathbb{E}}
\begin{multline}
\psi_{12}(x_1,x_2) \propto \\
\color{Green} \exp \left(
\begin{aligned}
&amp;\mathbb{E}_Q[\ln\phi_{12}(x_1,x_2)|x_1,x_2] + \mathbb{E}_Q[\ln\phi_{23}(x_2,x_3)|x_1,x_2] + \mathbb{E}_Q[\ln\phi_{34}(x_3,x_4)|x_1,x_2] \\
&amp;+ \mathbb{E}_Q[\ln\phi_{15}(x_1,x_5)|x_1,x_2] + \mathbb{E}_Q[\ln\phi_{26}(x_2,x_6)|x_1,x_2] + \mathbb{E}_Q[\ln\phi_{37}(x_3,x_7)|x_1,x_2] + \mathbb{E}_Q[\ln\phi_{48}(x_4,x_8)|x_1,x_2] \\
&amp;- \mathbb{E}_Q[\ln\psi_{23}(x_2,x_3)|x_1,x_2] - \mathbb{E}_Q[\ln\psi_{34}(x_3,x_4)|x_1,x_2] \\
\end{aligned}
\right)
\end{multline}
$$
&lt;h3 id="hahahugoshortcode75s5hbhb"&gt; $\mathbb{E}_Q[\ln\phi_{26}(X_2,X_6)|x_1,x_2]$
&lt;/h3&gt;
&lt;p&gt;Write out the formula for $\mathbb{E}_Q[\ln\phi_{26}(X_2,X_6)|x_1,x_2]$.
Make sure to show how you would calculate the distribution that this expectation is over.&lt;/p&gt;
&lt;h4 id="solution-2"&gt;Solution&lt;/h4&gt;
&lt;p&gt;We only need to take the expectation over the variables in the function:&lt;/p&gt;
$$ \EE_{Q(X_1,\ldots,X_n)} f(X_1) = \EE_{Q(X_1)}f(X_1) $$&lt;p&gt;Also, we need not take the expectation over a variable being conditioned on:
&lt;/p&gt;
$$ \EE_{Q(X_1, X_2|x_1)}[f(X_1,X_2)] = \EE_{Q(X_2|x_1)}[f(x_1, X_2)] $$&lt;p&gt;Hence,
$$
\begin{aligned}
&amp;\EE_{Q(X_1,\ldots,X_n)}[\ln\phi_{26}(X_2,X_6)|x_1,x_2] \\
&amp;= \EE_{Q(X_1,\ldots,X_n|x_1,x_2)}\ln\phi_{26}(X_2,X_6) \\
&amp;= \EE_{Q(X_6|x_1,x_2)}\ln\phi_{26}(X_2,X_6) \\
&amp;= \EE_{Q(X_6|x_1,x_2)}\ln\phi_{26}(x_2,X_6) \\
&amp;= \EE_{Q(X_6)}\ln\phi_{26}(x_2,X_6) &amp;\color{Gray}\textsf{since $X_6 \perp X_1,X_2$ in $Q$} \\
\end{aligned}
$$
$$
\begin{aligned}
Q(X_6) &amp;= \sum_{\mathcal{X}\backslash\lrbr{X_6}} Q(X_1,\ldots,X_n) \\
&amp;= \sum_{x_5,x_7,x_8} Q(x_5,X_6,x_7,x_8) \\
&amp;= \inv{Z_{C_2}} \sum_{x_5,x_7,x_8} \psi(x_5,X_6)\psi(X_6,x_7)\psi(x_7,x_8) \\
\end{aligned}
$$
$$
\color{Green} \EE_Q[\ln\phi_{26}(X_2,X_6)|x_1,x_2] = \inv{Z_{C_2}}\sum_{x_5,x_6,x_7,x_8} \ln\phi(x_2,x_6)\psi(x_5,x_6)\psi(x_6,x_7)\psi(x_7,x_8)\\
$$
&lt;/p&gt;
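&lt;p&gt;The marginal $Q(X_6)$ and the resulting expectation can be evaluated by brute-force summation over the chain $X_5, X_6, X_7, X_8$. A minimal sketch with binary variables and hypothetical random tables standing in for $\psi$ and $\phi_{26}$:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2  # binary variables

# hypothetical edge potentials for the chain C2: X5 - X6 - X7 - X8
psi56, psi67, psi78 = (rng.uniform(0.5, 2.0, size=(K, K)) for _ in range(3))

# Q(x6) ∝ Σ_{x5,x7,x8} ψ(x5,x6) ψ(x6,x7) ψ(x7,x8): sum out indices a, c, d, keep b
Q6 = np.einsum('ab,bc,cd->b', psi56, psi67, psi78)
Q6 /= Q6.sum()  # dividing by the total implements the 1/Z_{C2} normalization

# E_{Q(X6)}[ln φ26(x2, X6)] for a hypothetical φ26 table and an observed x2
phi26 = rng.uniform(0.5, 2.0, size=(K, K))
x2 = 1
expectation = Q6 @ np.log(phi26[x2])  # Σ_{x6} Q(x6) ln φ26(x2, x6)
print(Q6, expectation)
```

&lt;p&gt;The &lt;code&gt;einsum&lt;/code&gt; call performs exactly the triple sum in the derivation, with the free index $x_6$ kept unsummed.&lt;/p&gt;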
&lt;h3&gt;$\EE_Q[\ln\phi_{15}(X_1,X_5)|x_1,x_2]$&lt;/h3&gt;
&lt;p&gt;Write out the formula for $\EE_Q[\ln\phi_{15}(X_1,X_5)|x_1,x_2]$.
Again, show how you would evaluate distribution $Q$.&lt;/p&gt;
&lt;h4 id="solution-3"&gt;Solution&lt;/h4&gt;
&lt;p&gt;We follow the same steps and arrive at a very similar answer:&lt;/p&gt;
$$
\color{Green} \EE_Q[\ln\phi_{15}(X_1,X_5)|x_1,x_2] = \inv{Z_{C_2}} \sum_{x_5,x_6,x_7,x_8} \ln\phi(x_1,x_5)\psi(x_5,x_6)\psi(x_6,x_7)\psi(x_7,x_8)\\
$$
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;Koller, D. &amp;amp; Friedman, N. (2009). &lt;em&gt;Probabilistic Graphical Models: Principles and Techniques&lt;/em&gt;. MIT Press.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item><item><title>Approximate inference via Gibbs sampling</title><link>https://example.com/blog/gibbs-approx-inference/</link><pubDate>Tue, 09 Jan 2024 00:00:00 +0000</pubDate><guid>https://example.com/blog/gibbs-approx-inference/</guid><description>&lt;p&gt;Consider a setting in which there are $D$ diseases and a patient either has ($d_i=1$) or does not have ($d_i=0$) each disease.
The hospital can measure $S$ symptoms, where $s_j=1$ when the patient has the symptom and $s_j=0$ otherwise.
A simple Bayesian network for this setting is given by:&lt;/p&gt;
$$ p\left(s_1, \ldots, s_S, d_1, \ldots, d_D\right)=\prod_{j=1}^S p\left(s_j \mid \mathbf{d}\right) \prod_{i=1}^D p\left(d_i\right) $$&lt;p&gt;where $\mathbf{d}=\left(d_1, \ldots, d_D\right)^T$ and&lt;/p&gt;
$$ p\left(s_j=1 \mid \mathbf{d}\right)=\sigma\left(\mathbf{w}_j^T \mathbf{d}+b_j\right) $$&lt;p&gt;where $\sigma(x) = 1/(1+e^{-x})$.&lt;/p&gt;
&lt;p&gt;In the above $\mathbf{w}_j$ is a vector of parameters relating symptom $j$ to the diseases and $b_j$ is related to the prevalence of the symptom.
The hospital provides the collection of parameters $W$ and $b$, the prior disease probabilities $p$ (with $p(d_i = 1) = p_i$), and a vector $s$ of symptoms for the patient; see &lt;code&gt;SymptomDisease.mat&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Use Gibbs sampling (with a reasonable amount of burn-in and sub-sampling) to estimate the vector&lt;/p&gt;
$$ \bigl[p(d_1=1|s),\ldots,p(d_D=1|s)\bigr] $$&lt;h2 id="solution"&gt;Solution&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.io&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;loadmat&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loadmat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;SymptomDisease.mat&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;W&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;p&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;s&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;globals&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;squeeze&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;W.shape = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;D&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;S&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;D&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;W.shape = (200, 50)
(50,)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To perform Gibbs sampling, we need $p(d_i|d_{-i},s)$, where $d_{-i}$ refers to $\{d_1,\ldots,d_{i-1},d_{i+1},\ldots,d_D\}$.
We can compute this using Bayes&amp;rsquo; rule:&lt;/p&gt;
$$ p(d_i|d_{-i},s) = \frac{p(s,d)}{p(s,d_{-i})} = \frac{p(d)p(s|d)}{p(d_{-i})p(s|d_{-i})} = \frac{p(d)p(s|d)}{\sum_{d_i} p(d)p(s|d)}$$
&lt;p&gt;We can compute the denominator by summing over the numerator when $d_i=1$ and $d_i=0$.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;σ&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;likelihood&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# multiply symptom probs p(s_j|d)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;ps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;σ&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="nd"&gt;@d&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ps&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# and disease probs p(d_i)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;L&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sample_d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;p_d_s_di1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;likelihood&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;p_d_s_di0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;likelihood&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;condp_di&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p_d_s_di1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_d_s_di1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;p_d_s_di0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;binomial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;condp_di&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;sample_d&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;default_rng&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;array([1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,
1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1,
0, 0, 1, 1, 0, 1])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can use the complete-data likelihood $p(s, d)$ not only to sample, but also as a principled heuristic for deciding when burn-in is complete: we run sweeps until the likelihood stops improving for ten consecutive iterations.&lt;/p&gt;
&lt;p&gt;We then keep every $k$-th sweep, with $k$ set to the burn-in length, and average the retained samples to estimate the disease posterior probabilities.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;gibbs_sample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;iters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# burn-in period&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;iters_without_improvement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;curr_lklhd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;likelihood&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;lklhd_max&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inf&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;iters_without_improvement&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;iters&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sample_d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;curr_lklhd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;likelihood&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;curr_lklhd&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;lklhd_max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;iters_without_improvement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;lklhd_max&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;curr_lklhd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;iters_without_improvement&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;burn-in period complete after &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;iters&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; iterations&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;burn_in_iters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iters&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Using burn-in time to determine sampling frequency&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Taking every &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;burn_in_iters&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;th sample&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;iters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;dtot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;iters&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;iters&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;burn_in_iters&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sample_d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;dtot&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;Converged in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;iters&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; samples (taken out of &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;burn_in_iters&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;iters&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; total samples)&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dtot&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dtot&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;iters&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;posteriors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gibbs_sample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;default_rng&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;posteriors&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;burn-in period complete after 19 iterations
Using burn-in time to determine sampling frequency
Taking every 19th sample
Converged in 500 samples (taken out of 9500 total samples)
[ 3. 498. 9. 500. 333. 4. 8. 0. 3. 500. 1. 500. 500. 500.
492. 481. 491. 453. 500. 496. 46. 359. 500. 499. 5. 500. 12. 0.
500. 0. 0. 47. 6. 500. 0. 1. 495. 0. 0. 0. 497. 499.
500. 500. 0. 5. 499. 499. 0. 500.]
array([0.006, 0.996, 0.018, 1. , 0.666, 0.008, 0.016, 0. , 0.006,
1. , 0.002, 1. , 1. , 1. , 0.984, 0.962, 0.982, 0.906,
1. , 0.992, 0.092, 0.718, 1. , 0.998, 0.01 , 1. , 0.024,
0. , 1. , 0. , 0. , 0.094, 0.012, 1. , 0. , 0.002,
0.99 , 0. , 0. , 0. , 0.994, 0.998, 1. , 1. , 0. ,
0.01 , 0.998, 0.998, 0. , 1. ])
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;posteriors&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;p(d_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;|s)&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;= &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s1"&gt;.3f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;p(d_1|s) = 0.006
p(d_2|s) = 0.996
p(d_3|s) = 0.018
p(d_4|s) = 1.000
p(d_5|s) = 0.666
p(d_6|s) = 0.008
p(d_7|s) = 0.016
p(d_8|s) = 0.000
p(d_9|s) = 0.006
p(d_10|s) = 1.000
p(d_11|s) = 0.002
p(d_12|s) = 1.000
p(d_13|s) = 1.000
p(d_14|s) = 1.000
p(d_15|s) = 0.984
p(d_16|s) = 0.962
p(d_17|s) = 0.982
p(d_18|s) = 0.906
p(d_19|s) = 1.000
p(d_20|s) = 0.992
p(d_21|s) = 0.092
p(d_22|s) = 0.718
p(d_23|s) = 1.000
p(d_24|s) = 0.998
p(d_25|s) = 0.010
p(d_26|s) = 1.000
p(d_27|s) = 0.024
p(d_28|s) = 0.000
p(d_29|s) = 1.000
p(d_30|s) = 0.000
p(d_31|s) = 0.000
p(d_32|s) = 0.094
p(d_33|s) = 0.012
p(d_34|s) = 1.000
p(d_35|s) = 0.000
p(d_36|s) = 0.002
p(d_37|s) = 0.990
p(d_38|s) = 0.000
p(d_39|s) = 0.000
p(d_40|s) = 0.000
p(d_41|s) = 0.994
p(d_42|s) = 0.998
p(d_43|s) = 1.000
p(d_44|s) = 1.000
p(d_45|s) = 0.000
p(d_46|s) = 0.010
p(d_47|s) = 0.998
p(d_48|s) = 0.998
p(d_49|s) = 0.000
p(d_50|s) = 1.000
&lt;/code&gt;&lt;/pre&gt;</description></item><item><title>Markov chain Monte Carlo sampling</title><link>https://example.com/blog/mcmc/</link><pubDate>Tue, 09 Jan 2024 00:00:00 +0000</pubDate><guid>https://example.com/blog/mcmc/</guid><description>&lt;h2 id="inverse-cdf-sampling"&gt;Inverse CDF sampling&lt;/h2&gt;
&lt;p&gt;A simple sampling method adopted by many of the standard math libraries is the &lt;em&gt;inverse probability transform&lt;/em&gt;: draw $u \sim \text{Unif}(0, 1)$, then set $x = F^{-1}(u)$, where $F^{-1}$ is the inverse of the CDF $F$. Show that $x$ is distributed according to $F$. What is the drawback of this method?&lt;/p&gt;
&lt;h3 id="solution"&gt;Solution&lt;/h3&gt;
&lt;p&gt;We let $U$ represent the uniform random variable and start with the fact that &lt;/p&gt;
$$p(U \leq u) = u$$&lt;p&gt;Showing that $x$ is distributed according to $F$ can be done by showing that $F(x) = p(F^{-1}(U) \leq x)$.&lt;/p&gt;
&lt;p&gt;So,&lt;/p&gt;
$$
\newcommand{\EE}{\mathbb{E}}
\newcommand{\ind}{\mathbb{1}}
\newcommand{\answertext}[1]{\textcolor{Green}{\fbox{#1}}}
\newcommand{\answer}[1]{\answertext{$#1$}}
\newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
\newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}
\newcommand{\comment}[1]{\textcolor{gray}{\textrm{#1}}}
\newcommand{\vec}[1]{\mathbf{#1}}
\newcommand{\inv}[1]{\frac{1}{#1}}
\newcommand{\abs}[1]{\lvert{#1}\rvert}
\newcommand{\norm}[1]{\lVert{#1}\rVert}
\newcommand{\lr}[1]{\left(#1\right)}
\newcommand{\lrb}[1]{\left[#1\right]}
\newcommand{\lrbr}[1]{\lbrace#1\rbrace}
\newcommand{\Bx}[0]{\mathbf{x}}
\begin{aligned}
&amp;p(F^{-1}(U) \leq x) \\
=&amp; p(U \leq F(x)) \\
=&amp; \color{Green}F(x) &amp; \color{gray}\textsf{because } p(U\leq u) = u \\
&amp; &amp; \blacksquare \\
\end{aligned}
$$
&lt;p&gt;And we&amp;rsquo;re done.&lt;/p&gt;
&lt;p&gt;The drawback is that we do not always have the inverse CDF in an analytically tractable form; hence, the need for methods like Markov chain Monte Carlo (MCMC).&lt;/p&gt;
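&lt;p&gt;As a concrete illustration (not part of the original problem), here is a minimal sketch of inverse-CDF sampling for the exponential distribution, one of the cases where the inverse CDF &lt;em&gt;is&lt;/em&gt; available in closed form: $F(x) = 1 - e^{-\lambda x}$ gives $F^{-1}(u) = -\ln(1-u)/\lambda$.&lt;/p&gt;

```python
import numpy as np

def inverse_cdf_sample(inv_cdf, n, rng):
    """Push Unif(0, 1) draws through the inverse CDF to sample from F."""
    u = rng.random(n)
    return inv_cdf(u)

# Exponential(rate): F(x) = 1 - exp(-rate * x), so Finv(u) = -ln(1 - u) / rate
rate = 2.0
rng = np.random.default_rng(0)
samples = inverse_cdf_sample(lambda u: -np.log(1.0 - u) / rate, 100_000, rng)

# the sample mean should be close to the true mean 1 / rate = 0.5
print(samples.mean())
```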
&lt;h2 id="combining-multiple-stationary-distributions"&gt;Combining multiple stationary distributions&lt;/h2&gt;
&lt;p&gt;Show that if transition kernels $K_1$ and $K_2$ both have $p(\cdot)$ as their stationary density, then so do $K_1 K_2$ and $\lambda K_1 + (1-\lambda) K_2$.
In practice, the former corresponds to sampling from $K_1$ and $K_2$ cyclically, while the latter corresponds to drawing from $K_1$ with probability $\lambda$ and from $K_2$ otherwise.
Although it is not required to show, the extension to more than two kernels is straightforward.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In the continuous case, the cyclic kernel can be defined as composition of functions:&lt;/strong&gt;&lt;/p&gt;
$$
\left(K_1 \circ K_2\right)(z|x)=\int K_2(y|x) K_1(z|y) \mathrm{d} y
$$&lt;h3 id="solution-1"&gt;Solution&lt;/h3&gt;
&lt;h4 id="cyclic-sampling"&gt;Cyclic sampling&lt;/h4&gt;
&lt;h5 id="discrete"&gt;Discrete&lt;/h5&gt;
&lt;p&gt;We let $\pi$ be the vector of probabilities of the stationary distribution and $K$ a matrix where $K_{ji}$ represents the transition probability from $x_i$ to $x_j$.&lt;/p&gt;
$$
\begin{aligned}
\pi_j &amp;= \sum_i \pi_i K_{ji} \\
\pi &amp;= K \pi \\
\pi &amp;=K_1{\color{Orange}\pi} =K_2\pi \\
&amp;= K_1{\color{Orange}K_2\pi} \\
&amp;= {\color{Green}K_1K_2}\pi \\
\end{aligned}
$$
&lt;h5 id="continuous"&gt;Continuous&lt;/h5&gt;
$$
\begin{aligned}
p(\cdot) &amp;= \int_x p(x)K_1(\cdot|x)dx = \int_x p(x)K_2(\cdot|x)dx \\
p(z)&amp;= \int_y {\color{Orange}p(y)}K_1(z|y)dy \\
&amp;= \int_y {\color{Orange}\int_x p(x)K_2(y|x)dx}K_1(z|y)dy \\
&amp;= \int_x p(x){\color{Teal} \int_y K_2(y|x)K_1(z|y)dy}dx &amp; \color{Gray}\textsf {rearranging}\\
&amp;= \int_x p(x){\color{Teal}(K_1 \circ K_2)(z|x)}dx &amp; \color{Green}\blacksquare
\end{aligned}
$$
&lt;h4 id="sampling-with-a-mixture-of-kernels"&gt;Sampling with a mixture of kernels&lt;/h4&gt;
&lt;h5 id="discrete-1"&gt;Discrete&lt;/h5&gt;
$$
\begin{aligned}
\lambda\pi &amp;= \lambda K_1\pi \\
(1-\lambda)\pi &amp;= (1-\lambda)K_2\pi \\
\pi &amp;= \lambda K_1\pi + (1-\lambda)K_2\pi &amp; \color{Gray}\textsf{add first two lines together}\\
\pi &amp;= \underbrace{(\lambda K_1 + (1-\lambda)K_2)}_\textsf{combined kernel}\pi &amp; \color{Green}\blacksquare\\
\end{aligned}
$$
&lt;h5 id="continuous-1"&gt;Continuous&lt;/h5&gt;
$$
\begin{aligned}
\lambda p(y) &amp;= \lambda \int_x p(x)K_1(y|x)dx \\
(1-\lambda)p(y) &amp;= (1-\lambda)\int_x p(x)K_2(y|x)dx \\
p(y) &amp;= \lambda \int_x p(x)K_1(y|x)dx + (1-\lambda)\int_x p(x)K_2(y|x)dx &amp; \color{Gray}\textsf{add first two lines together}\\
p(y) &amp;= \int_x p(x)\underbrace{(\lambda K_1(y|x) + (1-\lambda)K_2(y|x))}_\textsf{combined kernel}dx &amp; \color{Green}\blacksquare\\
\end{aligned}
$$
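&lt;p&gt;Both results are easy to check numerically. The sketch below (an illustrative aside, not part of the original problem) builds two distinct reversible kernels sharing the same stationary vector $\pi$ from symmetric flow matrices, then verifies that $K_1 K_2$ and the mixture $\lambda K_1 + (1-\lambda) K_2$ both leave $\pi$ invariant:&lt;/p&gt;

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])  # target stationary distribution

def kernel_from_flows(F, pi):
    """Build a column-stochastic kernel K (K[j, i] = transition prob from i to j)
    from a symmetric flow matrix F with zero diagonal. By construction
    pi[i] * K[j, i] = F[j, i] = F[i, j] = pi[j] * K[i, j] (detailed balance)."""
    K = F / pi                                         # off-diagonal entries
    K[np.diag_indices(len(pi))] = 1.0 - K.sum(axis=0)  # leftover mass stays put
    return K

# two different flow matrices, hence two different kernels, same stationary pi
F1 = np.array([[0.0, 0.10, 0.05], [0.10, 0.0, 0.10], [0.05, 0.10, 0.0]])
F2 = np.array([[0.0, 0.05, 0.10], [0.05, 0.0, 0.20], [0.10, 0.20, 0.0]])
K1, K2 = kernel_from_flows(F1, pi), kernel_from_flows(F2, pi)

lam = 0.3
for K in (K1, K2, K1 @ K2, lam * K1 + (1.0 - lam) * K2):
    assert np.allclose(K.sum(axis=0), 1.0)  # columns sum to one
    assert np.allclose(K @ pi, pi)          # pi = K pi holds for all four
print("pi is stationary for K1, K2, K1 K2, and the mixture")
```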
&lt;h2 id="metropolis-hastings-sampling-produces-a-stationary-distribution-equal-to-the-target"&gt;Metropolis-Hastings sampling produces a stationary distribution equal to the target&lt;/h2&gt;
&lt;p&gt;Recall MH sampling for target distribution $p(x)$ using proposal $q(x|y)$: at state $s$, first draw $t \sim q(t|s)$, then accept $t$ with probability &lt;/p&gt;
$$ A(t|s) = \min\left(1, \frac{\tilde p(t)q(s|t)}{\tilde p(s)q(t|s)}\right) $$&lt;p&gt;where $\tilde p(x)$ is the unnormalized target distribution. Show that $p(x)$ is the stationary distribution of the Markov chain defined by this procedure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Consider both continuous and discrete cases.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="solution-2"&gt;Solution&lt;/h3&gt;
&lt;p&gt;We will show that the detailed balance equation $p(t)T(s|t)=p(s)T(t|s)$ is satisfied, which implies that $p(x)$ is the stationary distribution of the Markov chain (see Koller &amp;amp; Friedman&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;, Proposition 12.3).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;This holds for both continuous&lt;/strong&gt; ($p(t)=\int_s p(s)T(t|s)ds$) and discrete ($p(t)=\sum_s p(s)T(t|s)$) state spaces.&lt;/p&gt;
&lt;h4 id="when"&gt;When $s\neq t$&lt;/h4&gt;
$$
\begin{aligned}
p(s)T(t|s) &amp;= p(s)q(t|s)A(t|s) \\
&amp;= p(s)q(t|s) \min\left(1,\ \frac{p(t)q(s|t)}{p(s)q(t|s)}\right) &amp;\color{gray}\frac{\tilde p(t)}{\tilde p(s)}=\frac{p(t)}{p(s)} \\
&amp;= \min(p(s)q(t|s),\ p(t)q(s|t)) \\
&amp;= p(t)q(s|t)\min\left(\frac{p(s)q(t|s)}{p(t)q(s|t)},\ 1\right) \\
&amp;= p(t)q(s|t)A(s|t) \\
&amp;= \answer{p(t)T(s|t)} &amp; \blacksquare\\
\end{aligned}
$$
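&lt;p&gt;As an illustrative aside (not part of the original problem), a minimal MH sampler over a finite state space makes the result concrete. With a uniform (hence symmetric) proposal, $q$ cancels in the acceptance ratio, and the chain's empirical frequencies approach the normalized target even though we only ever evaluate $\tilde p$:&lt;/p&gt;

```python
import numpy as np

def mh_discrete(p_tilde, n_samples, rng):
    """Minimal MH sampler on a finite state space. The proposal is uniform
    over all states, hence symmetric, so q cancels in the acceptance ratio."""
    n = len(p_tilde)
    s = 0
    counts = np.zeros(n)
    for _ in range(n_samples):
        t = rng.integers(n)                    # propose t uniformly
        A = min(1.0, p_tilde[t] / p_tilde[s])  # acceptance probability
        if rng.binomial(1, A):                 # accept with probability A
            s = t
        counts[s] += 1
    return counts / n_samples

p_tilde = np.array([2.0, 5.0, 3.0])  # unnormalized target
freq = mh_discrete(p_tilde, 100_000, np.random.default_rng(0))
print(freq)  # should approach p_tilde / p_tilde.sum() = [0.2, 0.5, 0.3]
```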
&lt;h4 id="when-1"&gt;When $s=t$&lt;/h4&gt;
&lt;p&gt;Though $T(s|t)$ has a different form when $s=t$, it does not matter. The proof is trivial; we can simply exchange $s$ and $t$ on one side to get the other: &lt;/p&gt;
$$ \answer{p(s)T(t|s)=p(s)T(s|s)=p(t)T(s|t)}$$&lt;h2 id="gibbs-sampling-produces-a-stationary-distribution-equal-to-the-target"&gt;Gibbs sampling produces a stationary distribution equal to the target&lt;/h2&gt;
&lt;p&gt;Recall Gibbs sampling for target distribution $p(x) = p(x_1, \ldots, x_d)$: for each $j \in \{1,\ldots,d\}$, draw $t \sim p(x_j|\textsf{rest})$ and set $x_j=t$. Show that $p(x)$ is the stationary distribution of the Markov chain defined by this procedure.&lt;/p&gt;
&lt;h3 id="solution-3"&gt;Solution&lt;/h3&gt;
&lt;p&gt;Again, we prove that the detailed balance equation holds; for Gibbs sampling, it is&lt;/p&gt;
$$p(x_{-j},x_j)T(x_{-j},x_j'|x_{-j},x_j)=p(x_{-j},x_j')T(x_{-j},x_j|x_{-j},x_j')$$
&lt;p&gt;where $T(x_{-j},x_j'|x_{-j},x_j) = p(x_j'|x_{-j})$ and $x_{-j} = \lrbr{x_i: i\neq j}$.&lt;/p&gt;
&lt;p&gt;We start with the left side and show it is equal to the right:&lt;/p&gt;
$$
\begin{aligned}
p(x_{-j},x_j)T(x_{-j},x_j'|x_{-j},x_j) &amp;= p(x_{-j},x_j)p(x_j'|x_{-j}) \\
&amp;= p(x_{-j})p(x_j|x_{-j})p(x_j'|x_{-j}) \\
&amp;= p(x_{-j}, x_j')p(x_j|x_{-j}) \\
&amp;= \answer{p(x_{-j},x_j')T(x_{-j},x_j|x_{-j},x_j')} \\
\end{aligned}
$$
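&lt;p&gt;This identity can also be checked numerically on a small joint distribution (an illustrative check, not part of the original problem): for a random positive joint $p(x_1, x_2)$, the Gibbs move that resamples $x_1$ from $p(x_1|x_2)$ satisfies detailed balance exactly.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2))  # arbitrary positive joint distribution over (x1, x2)
p /= p.sum()

# Gibbs kernel for resampling x1 with x2 fixed: T(x1_new | x2) = p(x1_new | x2)
cond = p / p.sum(axis=0)  # cond[x1, x2] = p(x1 | x2)

# detailed balance: p(x1, x2) p(x1_new | x2) == p(x1_new, x2) p(x1 | x2)
for x2 in range(2):
    for x1 in range(2):
        for x1_new in range(2):
            lhs = p[x1, x2] * cond[x1_new, x2]
            rhs = p[x1_new, x2] * cond[x1, x2]
            assert np.isclose(lhs, rhs)
print("Gibbs moves satisfy detailed balance")
```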
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;
&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item><item><title>Parameter learning in probabilistic graphical models</title><link>https://example.com/blog/pgm-param-learning/</link><pubDate>Sun, 07 Jan 2024 00:00:00 +0000</pubDate><guid>https://example.com/blog/pgm-param-learning/</guid><description>&lt;h1 id="parameter-learning-in-bayesian-networks-and-markov-random-fields"&gt;Parameter learning in Bayesian networks and Markov random fields&lt;/h1&gt;
&lt;h2 id="cost-of-learning-crf-parameters"&gt;Cost of learning CRF parameters&lt;/h2&gt;
&lt;p&gt;Consider the process of gradient-ascent training for a conditional random field (CRF) log-linear model with $k$ features, given a data set $\mathcal{D}$ with $M$ instances.
Assume for simplicity that the cost of computing a single feature over a single instance in our data set is constant, as is the cost of computing the expected value of each feature once we compute a marginal over the variables in its scope.
Also assume that we can compute each required marginal in constant time after we have a calibrated clique tree.
(Clique tree calibration is a smart way of reusing messages in the message passing algorithm for calculating marginals on a graphical model, but all you need to know is that once we finish the clique tree calibration, each required marginal can be computed in constant time).&lt;/p&gt;
&lt;p&gt;Assume that we use clique tree calibration to compute the expected sufficient statistics in this model, and that the cost of running clique tree calibration is $c$.
Assume that we need $r$ iterations for the gradient process to converge.
(We are using a batch algorithm, so each iteration means using all the data to calculate the gradients once.) What is the cost of this procedure? (Choose one of the following answers and explain your rationale.)&lt;/p&gt;
&lt;h3 id="solution"&gt;Solution&lt;/h3&gt;
&lt;p&gt;A CRF is an MRF where variables $\{X_i\}$ are observed and variables $\{Y_i\}$ are hidden, and is structured to model the conditional distribution $P(Y|X)$.
Equation (20.7) from Koller &amp;amp; Friedman&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt; shows us the form of the gradient of the conditional log-likelihood:
$$
\newcommand{\EE}{\mathbb{E}}
\newcommand{\ind}{\mathbb{1}}
\newcommand{\answertext}[1]{\textcolor{Green}{\fbox{#1}}}
\newcommand{\answer}[1]{\answertext{$#1$}}
\newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
\newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}
\newcommand{\comment}[1]{\textcolor{gray}{\textrm{#1}}}
\newcommand{\Prec}{\Sigma^{-1}}
\newcommand{\inv}[1]{\frac{1}{#1}}
\newcommand{\abs}[1]{\lvert{#1}\rvert}
\newcommand{\norm}[1]{\lVert{#1}\rVert}
\newcommand{\lr}[1]{\left(#1\right)}
\newcommand{\lrb}[1]{\left[#1\right]}
\newcommand{\lrbr}[1]{\lbrace#1\rbrace}
\newcommand{\Bx}[0]{\mathbf{x}}
\begin{aligned}
\frac{\partial}{\partial \theta_i} \ell_{Y \mid X}(\theta: \mathcal{D}) &amp;= \sum_{m=1}^M\left(f_i(y[m], x[m])-\EE_{\theta}\left[f_i \mid x[m]\right]\right) \\
&amp;= \EE_\mathcal{D} f_i(y, x)-\sum_{m=1}^M\EE_{\theta}\left[f_i \mid x[m]\right]
\end{aligned}
$$
&lt;/p&gt;
&lt;p&gt;The left term contributes $\mathcal{O}(Mk)$ to the computation since it only needs to be done once, for each of $M$ samples and each of $k$ features (one feature per possible $x,y$ pair in the basic case where each feature $f$ is an indicator function $f_{x^i,y^j}(x,y)=\ind(x=x^i)\ind(y=y^j)$).
It does not need to be done on every iteration since it does not depend on $\theta$.&lt;/p&gt;
&lt;p&gt;The right term, however, does need to be done on each of $r$ iterations, summing over $M$ samples each time.
Each sample requires separate calibration of the clique tree $\lr{\mathcal{O}(c)}$ and computation of the expectation of each of $k$ features, yielding $\mathcal{O}(rM(c+k))$.
It is worth noting that the clique tree calibration cost $c$ and the feature count $k$ are both less than in an equivalent MRF, since conditioning on observations $X$ essentially reduces the size of the graph.
Let&amp;rsquo;s refer to those reduced costs as $c'$ and $k'$, respectively.&lt;/p&gt;
&lt;p&gt;Thus, the total computational cost of the algorithm is $\answer{\mathcal{O}\lr{Mk + rM\lr{c'+k'}}}$.&lt;/p&gt;
&lt;h2 id="cost-of-learning-mrf-parameters"&gt;Cost of learning MRF parameters&lt;/h2&gt;
&lt;p&gt;Consider the process of gradient-ascent training for a log-linear Markov random field (MRF) model with $k$ features, given a data set $\mathcal{D}$ with $M$ instances.
Assume for simplicity that the cost of computing a single feature over a single instance in our data set is constant, as is the cost of computing the expected value of each feature once we compute a marginal over the variables in its scope.
Also assume that we can compute each required marginal in constant time after we have a calibrated clique tree.&lt;/p&gt;
&lt;p&gt;Assume that we use clique tree calibration to compute the expected sufficient statistics in this model and that the cost of doing this is $c$.
Also, assume that we need $r$ iterations for the gradient process to converge.
What is the cost of this procedure?&lt;/p&gt;
&lt;h3 id="solution-1"&gt;Solution&lt;/h3&gt;
&lt;p&gt;From eq. (20.4) of Koller &amp;amp; Friedman&lt;sup id="fnref1:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;, we have the gradient calculation&lt;/p&gt;
$$
\frac{\partial}{\partial \theta_i} \frac{1}{M} \ell(\theta: \mathcal{D})=\mathbb{E}_{\mathcal{D}}\left[f_i(\mathcal{X})\right]-\mathbb{E}_{\theta}\left[f_i\right].
$$
&lt;p&gt;Again, the left term costs $\mathcal{O}(Mk)$, since it is computed just once by summing over samples and features.&lt;/p&gt;
&lt;p&gt;The right term, however, costs $\mathcal{O}(r(c+k))$ since it is run on each of $r$ iterations and involves $c$ for calibration of the clique tree and $k$ for computing the expectation of each feature.&lt;/p&gt;
&lt;p&gt;The total cost is thus $\answer{\mathcal{O}(Mk + r(c+k))}$.&lt;/p&gt;
&lt;h2 id="parameter-learning-in-mns-vs-bns"&gt;Parameter learning in MNs vs BNs&lt;/h2&gt;
&lt;p&gt;Compared to learning parameters in Bayesian networks, learning in Markov networks is generally&lt;/p&gt;
&lt;ol type="A"&gt;
&lt;li&gt;Less difficult as we do not need to account for the directed nature of factors as we do in a Bayes Net.&lt;/li&gt;
&lt;li&gt;More difficult because factors in MNs need not sum up to 1.&lt;/li&gt;
&lt;li&gt;Equally difficult, though MN inference will be better by a constant factor difference in the computation time as we do not need to worry about directionality.&lt;/li&gt;
&lt;li&gt;More difficult because we cannot use parallel optimization of subparts of our likelihood as we often can in BN learning.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="solution-2"&gt;Solution&lt;/h3&gt;
&lt;p&gt;$\answer{\textrm{D}}$: The fact that parameters are coupled together by the computation of the normalization constant means we cannot parallelize parameter learning as easily.&lt;/p&gt;
&lt;h1 id="maximum-likelihood-training-of--minimizes-kl-divergence-with-empirical-distribution"&gt;Maximum-likelihood training of $q$ minimizes KL divergence with empirical distribution&lt;/h1&gt;
&lt;p&gt;For a distribution $p(x, c)$ and an approximation $q(x, c)$, show that when $p(x, c)$ corresponds to the empirical distribution, finding the $q$ that minimizes the Kullback-Leibler divergence&lt;/p&gt;
$$ KL(p(x,c) \parallel q(x,c)) $$&lt;p&gt;corresponds to maximum likelihood training of $q$ assuming i.i.d. data.&lt;/p&gt;
&lt;h2 id="solution-3"&gt;Solution&lt;/h2&gt;
&lt;p&gt;The empirical distribution is defined as $p(x,c)=m(x,c)/M$ where $m(x,c)$ is the number of times $(x,c)$ appears in the data and $M$ is the total number of samples.
The likelihood is then
&lt;/p&gt;
$$L(\theta: D) = p(D; \theta) = \prod_m q(x[m], c[m]),$$&lt;p&gt;
which is maximized by maximizing the log-likelihood:
$$
\begin{aligned}
\argmax{\theta} L(\theta: D) &amp;= \argmax{\theta} \ell(\theta: D) \\
&amp;= \argmax{\theta} \sum_m \log q(x[m], c[m]) \\
&amp;= \argmax{\theta} \sum_{x,c} m(x,c) \log q(x,c) \\
&amp;= \argmin{\theta} - \sum_{x,c} m(x,c) \log q(x,c) \\
&amp;= \argmin{\theta} - \sum_{x,c} \frac{m(x,c)}{M} \log q(x,c) \\
&amp;= \argmin{\theta} - \sum_{x,c} p(x,c) \log q(x,c) \\
&amp;= \argmin{\theta} \sum_{x,c} p(x,c) (-\log q(x,c)) \\
&amp;= \argmin{\theta} \sum_{x,c} p(x,c) (\log p(x,c) - \log q(x,c)) &amp;\comment{since $p$ doesn't depend on $\theta$} \\
&amp;= \argmin{\theta} \sum_{x,c} p(x,c) \log \frac{p(x,c)}{q(x,c)} \\
&amp;= \answer{D_{KL}(p(x,c) \parallel q(x,c))} &amp; \blacksquare \\
\end{aligned}
$$
&lt;/p&gt;
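&lt;p&gt;The equivalence is easy to confirm numerically (an illustrative check, not part of the original problem): for any model $q$, $KL(p \parallel q)$ and the average negative log-likelihood differ only by the entropy of the empirical distribution $p$, which is constant in $q$, so the two objectives share the same minimizer.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
M = 1000
data = rng.integers(4, size=M)      # i.i.d. draws of a 4-valued variable
m = np.bincount(data, minlength=4)  # counts m(x)
p = m / M                           # empirical distribution p(x) = m(x) / M

def avg_nll(q):
    """Average negative log-likelihood: -(1/M) sum_m log q(x[m])."""
    return -(m * np.log(q)).sum() / M

def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    return (p * (np.log(p) - np.log(q))).sum()

q = rng.random(4)
q /= q.sum()                        # some model distribution q
entropy = -(p * np.log(p)).sum()

# KL(p || q) and the average NLL differ only by H(p), a constant in q,
# so minimizing one minimizes the other
print(kl(p, q), avg_nll(q) - entropy)
```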
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;
&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item><item><title>Learning maximum likelihood tree structure with the Chow-Liu algorithm</title><link>https://example.com/blog/learning-tree/</link><pubDate>Fri, 05 Jan 2024 15:31:21 -0500</pubDate><guid>https://example.com/blog/learning-tree/</guid><description>&lt;p&gt;Write a function &lt;code&gt;ChowLiu(X) -&amp;gt; A&lt;/code&gt;, where &lt;code&gt;X&lt;/code&gt; is a D by N data matrix containing one multivariate data point in each column, that returns a Chow-Liu maximum likelihood tree for &lt;code&gt;X&lt;/code&gt;.
The tree structure is to be returned in the sparse matrix A.&lt;/p&gt;
&lt;!-- You may find the routine spantree.m in “+brml” package useful. --&gt;
&lt;p&gt;The file &lt;code&gt;ChowLiuData.mat&lt;/code&gt; contains a data matrix for 10 variables.
Use your function to find the maximum-likelihood Chow-Liu tree, and draw a picture of the resulting DAG with edges oriented away from variable 1.&lt;/p&gt;
&lt;h2 id="solution"&gt;Solution&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.io&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;loadmat&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loadmat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ChowLiuData.mat&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;X&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;(1000, 10)
array([[0, 1, 1, ..., 2, 1, 0],
[1, 1, 1, ..., 0, 2, 0],
[0, 0, 1, ..., 2, 1, 1],
...,
[1, 1, 1, ..., 0, 2, 0],
[1, 1, 0, ..., 0, 0, 0],
[1, 0, 1, ..., 2, 1, 0]], dtype=uint8)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Chow-Liu algorithm simply computes the mutual information $I(X_i,X_j)$ between each pair of variables and constructs a maximum spanning tree where the edge weights are the mutual information.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;networkx&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;nx&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;plt&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chow_liu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;# num discrete values each X can take&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# compute empirical p(x_i, x_j)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;p_pair&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# and empirical p(x_i)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;p_single&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;p_single&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;p_pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;p_pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;p_pair&lt;/span&gt; &lt;span class="o"&gt;/=&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;p_single&lt;/span&gt; &lt;span class="o"&gt;/=&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;allclose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_pair&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;p_single&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;allclose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_single&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# compute all I_ij = I(X_i, X_j)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errstate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;divide&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ignore&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;invalid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ignore&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;logterm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_pair&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_single&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newaxis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newaxis&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; \
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_single&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# set these to constants, will be nullified assuming they come from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# instances when p_pair=0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isnan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logterm&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;logterm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isnan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logterm&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;# from log0 - log0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isinf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logterm&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;logterm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isinf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logterm&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;# from log0 - ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;I&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_pair&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;logterm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# compute max spanning tree&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;G&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Graph&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;G&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_nodes_from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;G&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;maximum_spanning_tree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;G&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# get directions from breadth-first search&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfs_tree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chow_liu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;draw_planar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;with_labels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;img src="index_files/figure-markdown_strict/cell-3-output-1.png" width="787" height="499" /&gt;
&lt;p&gt;The edge directions are an artifact of the breadth-first search; the learned structure is an undirected tree (a Markov random field).&lt;/p&gt;</description></item><item><title>Expectation-maximization for a Markov chain mixture model</title><link>https://example.com/blog/markov-chain-mixture/</link><pubDate>Fri, 12 May 2023 00:00:00 +0000</pubDate><guid>https://example.com/blog/markov-chain-mixture/</guid><description>&lt;p&gt;Assume that a sequence $v_1,\ldots,v_T \in \{1,\dots,V\}$ is generated by a Markov chain.
For a single chain of length $T$, we have
&lt;/p&gt;
$$
p(v_1,\dots,v_T) = p(v_1)\prod_{t=1}^{T-1} p(v_{t+1}|v_t)
\newcommand{\EE}{\mathbb{E}}
\newcommand{\ind}{\mathbb{1}}
\newcommand{\answertext}[1]{\textcolor{Green}{\fbox{#1}}}
\newcommand{\answer}[1]{\answertext{$#1$}}
\newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
\newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}
\newcommand{\comment}[1]{\textcolor{gray}{\textrm{#1}}}
\newcommand{\vec}[1]{\mathbf{#1}}
\newcommand{\inv}[1]{\frac{1}{#1}}
\newcommand{\abs}[1]{\lvert{#1}\rvert}
\newcommand{\norm}[1]{\lVert{#1}\rVert}
\newcommand{\lr}[1]{\left(#1\right)}
\newcommand{\lrb}[1]{\left[#1\right]}
\newcommand{\lrbr}[1]{\lbrace#1\rbrace}
\newcommand{\Bx}[0]{\mathbf{x}}
$$&lt;p&gt;For simplicity, we denote the sequence of visible variables as
&lt;/p&gt;
$$ \vec v = \lr{v_1,\dots,v_T} $$&lt;p&gt;For a single Markov chain labelled by $h$,
&lt;/p&gt;
$$ p(\vec v|h) = p(v_1|h)\prod_{t=1}^{T-1}p(v_{t+1}|v_t,h) $$&lt;p&gt;In total there are $H$ such Markov chains, $h=1,\dots,H$.
The distribution on the visible variables is therefore
&lt;/p&gt;
$$ p(\vec v) = \sum_{h=1}^H p(\vec v|h)p(h) $$&lt;h2 id="deriving-em-update-equations"&gt;Deriving EM update equations&lt;/h2&gt;
&lt;p&gt;There is a set of $N$ training sequences, $\vec v^n,\, n=1,\dots,N$.
Assuming that each sequence $\vec v^n$ is independently and identically drawn from a Markov chain mixture model with $H$ components, derive the Expectation Maximization algorithm for training this model.&lt;/p&gt;
&lt;h3 id="solution"&gt;Solution&lt;/h3&gt;
&lt;h4 id="e-step"&gt;E step&lt;/h4&gt;
&lt;p&gt;We need to compute the posterior distribution of the hidden variable $h$ given the current parameters, which is equivalent to the expected &amp;ldquo;count&amp;rdquo; of each chain $h$ for each sample $n$.
These counts are the sufficient statistics for the categorical distribution $p(h|v^n,\theta)$.
During the M step we hold this distribution fixed with respect to the parameters being optimized, so we write $p(h|v^n,\theta)=q^n(h)=\tau_h^n$:
&lt;/p&gt;
$$
\begin{aligned}
\tau_h^n &amp;= \EE_{p(h | v^n,\theta)} [\ind(h^n=h)] \\\\
&amp;= p(h | v^n,\theta) \\\\
&amp;= \frac{p(h, v^n | \theta)}{p(v^n|\theta)} \\\\
&amp;= \frac{p(h, v^n | \theta)}{\sum_h p(h, v^n|\theta)} \\\\
&amp;= \answer{\frac{p(h)p(v_1^n|h)\prod_{t=2}^{T}p(v_t^n|v_{t-1}^n,h)}
{\sum_h p(h)p(v_1^n|h)\prod_{t=2}^{T}p(v_t^n|v_{t-1}^n,h)}} \\\\
\end{aligned}
$$&lt;p&gt;At each iteration, these counts are recomputed and used to take expectations over $q^n(h)$.
These expectations yield the expected complete-data log likelihood, which is maximized in the M step.&lt;/p&gt;
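To make the boxed responsibility formula concrete, here is a minimal NumPy sketch of the E step. The parameters `theta`, `pi`, `A` and all sizes are random placeholders (not fitted values); the indexing mirrors the vectorized computation used in the solution code further below.

```python
import numpy as np

rng = np.random.default_rng(0)
H, K, T, N = 2, 4, 5, 3  # chains, symbols, sequence length, sequences

# hypothetical toy parameters, each normalized as a distribution
theta = rng.random(H)
theta /= theta.sum()                              # p(h)
pi = rng.random((K, H))
pi /= pi.sum(axis=0)                              # pi[k, h] = p(v_1 = k | h)
A = rng.random((K, K, H))
A /= A.sum(axis=1, keepdims=True)                 # A[i, j, h] = p(v_t = j | v_{t-1} = i, h)

seqs = rng.integers(0, K, size=(N, T))            # integer-coded sequences v^n

# joint p(h, v^n) = p(h) p(v_1^n | h) prod_t p(v_t^n | v_{t-1}^n, h)
p_hv = theta * pi[seqs[:, 0], :] * np.prod(A[seqs[:, :-1], seqs[:, 1:], :], axis=1)
# E step: responsibilities tau^n_h = p(h | v^n), normalized over h
tau = p_hv / p_hv.sum(axis=1, keepdims=True)
assert tau.shape == (N, H)
```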
&lt;h4 id="m-step"&gt;M step&lt;/h4&gt;
&lt;p&gt;In the M step, we maximize the expected complete data log likelihood $f(\theta)$ with respect to $\theta$:
&lt;/p&gt;
$$
\begin{aligned}
f(\theta)
&amp;= \EE_{h \sim q} \log p(v^1,\dots,v^N,h^1,\dots,h^N|\theta) \\\\
&amp;= \EE_{h \sim q} \sum_n \log p(v^n,h|\theta) \\\\
&amp;= \sum_n \EE_{h \sim q^n} \log p(v^n,h|\theta) \\\\
&amp;= \sum_n \sum_h q^n(h) \log p(v^n,h|\theta) \\\\
&amp;= \sum_{n,h} \tau_h^n \log p(v^n,h|\theta) \\\\
&amp;= \sum_{n,h} \tau_h^n \lrb{\log p(h) + \log p(v_1^n|h) + \sum_{t=2}^T \log p(v_t^n|v_{t-1}^n,h)} \\\\
&amp;= \sum_{n,h} \tau_h^n \lrb{\log \theta_h + \log \pi_{v_1^n|h} + \sum_{t=2}^T \log a_{v_t^n|v_{t-1}^n,h}} \\\\
\end{aligned}
$$&lt;p&gt;Here $\theta_h$, $\pi_{k|h}$, and $a_{j|i,h}$ denote the chain prior, initial-state, and transition probabilities, respectively.
We now maximize $f(\theta)$ to derive the update step for each.&lt;/p&gt;
$$
\newcommand{\pdd}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\pdfd}[1]{\pdd{f}{#1}}
\newcommand{\pdld}[1]{\pdd{\mathcal{L}}{#1}}
$$&lt;h5 id="update"&gt;$\theta_h$ update&lt;/h5&gt;
&lt;p&gt;We need to introduce a Lagrange multiplier to enforce the constraint that $\sum_h \theta_h = 1$, and set $\pdld{\theta_h}=\pdld{\lambda}=0$:
&lt;/p&gt;
$$ \hat{\theta}_h = \argmax{\theta_h} \mathcal{L}(\theta,\lambda) $$&lt;p&gt;
&lt;/p&gt;
$$ \mathcal{L}(\theta,\lambda) = f(\theta) - \lambda \lr{\sum_h \theta_h - 1} $$&lt;p&gt;Then we set $\pdld{\theta_h}=0$:
$$ \pdld{\theta_h} = \pdd{}{\theta_h} \lrb{\sum_{n} \tau_h^n \log \theta_h - \lambda \theta_h} = 0 $$
$$ \lambda = \frac{\sum_n \tau_h^n}{\hat{\theta}_h} \qquad \hat{\theta}_h = \frac{\sum_n \tau_h^n}{\lambda}$$
&lt;/p&gt;
&lt;p&gt;How do we get $\lambda$?
We know that $\sum_h \theta_h = 1$, so we sum top and bottom over $h$:
$$ \lambda = \frac{\sum_{n,h} \tau_h^n}{\sum_h \hat{\theta}_h} = \sum_{n,h} \tau_h^n = N $$
&lt;/p&gt;
&lt;p&gt;Hence,
&lt;/p&gt;
$$ \answer{\hat{\theta}_h = \frac{\sum_n \tau_h^n}{N}} $$&lt;h5 id="update-1"&gt;$\pi_{k|h}$ update&lt;/h5&gt;
&lt;p&gt;We follow the same pattern as before, using a Lagrange multiplier:&lt;/p&gt;
$$ \hat{\pi}_{k|h} = \argmax{\pi_{k|h}} \mathcal{L}(\theta,\lambda) $$
$$ \mathcal{L}(\theta,\lambda) = f(\theta) - \lambda \lr{\sum_{k} \pi_{k|h} - 1} $$
$$ \pdld{\pi_{k|h}} = \pdd{}{\pi_{k|h}} \lrb{\sum_{n} \tau_h^n \ind(v_1^n=k) \log \pi_{k|h} - \lambda \pi_{k|h}} = 0 $$
$$ \lambda = \frac{\sum_n \tau_h^n \ind(v_1^n=k) }{\hat{\pi}_{k|h}} \qquad \hat{\pi}_{k|h} = \frac{\sum_n \tau_h^n\ind(v_1^n=k) }{\lambda}$$
&lt;p&gt;As before, we use the constraint $\sum_k \hat{\pi}_{k|h} = 1$ and sum over $k$ in the numerator and denominator to solve for $\lambda$, noting that $\sum_k \ind(v_1^n=k) = 1$.
$$ \lambda = \frac{\sum_{n,k} \tau_h^n \ind(v_1^n=k) }{\sum_k \hat{\pi}_{k|h}} = \sum_{n,k} \tau_h^n \ind(v_1^n=k) = \sum_n \tau_h^n $$
$$ \answer{\hat{\pi}_{k|h} = \frac{\sum_{n} \tau_h^n \ind(v_1^n=k)}{\sum_{n} \tau_h^n}} $$
&lt;/p&gt;
&lt;h5 id="update-2"&gt;$a_{j|i,h}$ update&lt;/h5&gt;
&lt;p&gt;The process is again similar to what we did before:&lt;/p&gt;
$$
\newcommand{\transprob}{a_{j|i,h}}
\newcommand{\transprobhat}{\hat{a}_{j|i,h}}
$$
$$ \transprobhat = \argmax{\transprob} \mathcal{L}(\theta,\lambda) $$
$$ \mathcal{L}(\theta,\lambda) = f(\theta) - \lambda \lr{\sum_{j} \transprob - 1} $$
$$ \pdld{\transprob} = \pdd{}{\transprob} \lrb{\sum_{n} \tau_h^n \sum_{t=2}^T \ind(v_t^n=j,v_{t-1}^n=i) \log\transprob - \lambda \transprob} = 0 $$
$$
\begin{aligned}
\lambda &amp;= \frac{\sum_n \tau_h^n \sum_{t=2}^T \ind(v_t^n=j,v_{t-1}^n=i)}{\transprobhat} \\\\
&amp;= \sum_n \tau_h^n \sum_{t=2}^T \sum_j \ind(v_t^n=j,v_{t-1}^n=i) \\\\
&amp;= \sum_n \tau_h^n \sum_{t=2}^T \ind(v_{t-1}^n=i) \\\\
\end{aligned}
$$
$$
\answer{
\transprobhat = \frac{\sum_n \tau_h^n \sum_{t=2}^T \ind(v_t^n=j, v_{t-1}^n=i)} {\sum_n \tau_h^n \sum_{t=2}^T \ind(v_{t-1}^n=i)}
}
$$
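Putting the three boxed updates together, here is a minimal NumPy sketch of the M step. It assumes integer-coded sequences `seqs` of shape `(N, T)` and responsibilities `tau` of shape `(N, H)` (filled with random placeholder values here, since the fitted values come from the E step); a small smoothing constant guards against division by zero for unseen predecessor states.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K, H = 6, 8, 4, 2
seqs = rng.integers(0, K, size=(N, T))                 # v^n_t, integer-coded symbols
tau = rng.random((N, H))
tau /= tau.sum(axis=1, keepdims=True)                  # placeholder tau^n_h

# theta_h = (sum_n tau^n_h) / N
theta = tau.sum(axis=0) / N

# pi_{k|h} proportional to sum_n tau^n_h 1[v^n_1 = k]
first = np.eye(K)[seqs[:, 0]]                          # (N, K) one-hot of v^n_1
pi = first.T @ tau                                     # (K, H) weighted counts
pi /= pi.sum(axis=0, keepdims=True)

# a_{j|i,h} proportional to sum_n tau^n_h sum_t 1[v^n_t = j, v^n_{t-1} = i]
counts = np.full((K, K, H), 1e-12)                     # tiny smoothing for unseen i
for n in range(N):
    for t in range(1, T):
        counts[seqs[n, t - 1], seqs[n, t], :] += tau[n]
A = counts / counts.sum(axis=1, keepdims=True)         # normalize over j
```

Each update is the constrained maximizer derived above: weighted counts, normalized over the constrained index.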
&lt;h2 id="python-code-and-application-to-biological-sequences"&gt;Python code and application to biological sequences&lt;/h2&gt;
&lt;p&gt;The file &lt;code&gt;sequences.mat&lt;/code&gt; contains a set of fictitious bio-sequences in a cell array &lt;code&gt;sequences{n}(t)&lt;/code&gt;.
Thus &lt;code&gt;sequences{3}(:)&lt;/code&gt; is the third sequence, &lt;code&gt;GTCTCCTGCCCTCTCTGAAC&lt;/code&gt;, which consists of 20 timesteps.
There are 20 such sequences in total.
Your task is to cluster these sequences into two clusters, assuming that each cluster is modelled by a Markov chain.
State which of the sequences belong together by assigning a sequence $\mathbf{v}^n$ to that state for which $p(h|\mathbf{v}^n)$ is highest.&lt;/p&gt;
&lt;p&gt;Your solution must print and report the two clusters&amp;rsquo; members (show which of the 20 sequences from the file &lt;code&gt;sequences.mat&lt;/code&gt; are assigned to Cluster 1 and Cluster 2).&lt;/p&gt;
&lt;h3 id="solution-1"&gt;Solution&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.io&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;loadmat&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;seqs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loadmat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;sequences.mat&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;sequences&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;seqs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;seq&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seqs&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seqs&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;C&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seqs&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;G&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seqs&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;T&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;seqs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;seqs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;array([[1, 0, 3, 0, 2, 2, 1, 0, 3, 3, 1, 3, 0, 3, 2, 3, 2, 1, 3, 2],
[1, 1, 0, 2, 3, 3, 0, 1, 2, 2, 0, 1, 2, 1, 1, 2, 0, 0, 0, 2],
[3, 2, 2, 0, 0, 1, 1, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[2, 3, 1, 3, 1, 1, 3, 2, 1, 1, 1, 3, 1, 3, 1, 3, 2, 0, 0, 1],
[2, 3, 2, 1, 1, 3, 2, 2, 0, 1, 1, 3, 2, 0, 0, 0, 0, 2, 1, 1],
[1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 3, 1, 1, 2, 2, 2, 0, 0, 1, 2],
[0, 0, 0, 2, 3, 2, 1, 3, 1, 3, 2, 0, 0, 0, 0, 1, 3, 1, 0, 1],
[0, 1, 0, 3, 2, 0, 0, 1, 3, 0, 1, 0, 3, 0, 2, 3, 0, 3, 0, 0],
[2, 3, 3, 2, 2, 3, 1, 0, 2, 1, 0, 1, 0, 1, 2, 2, 0, 1, 3, 2],
[1, 1, 3, 1, 1, 1, 1, 3, 1, 1, 1, 1, 3, 3, 3, 1, 1, 3, 2, 1],
[1, 0, 1, 3, 0, 1, 2, 2, 1, 3, 0, 1, 1, 3, 2, 2, 2, 1, 0, 0],
[1, 2, 2, 3, 1, 1, 2, 3, 1, 1, 2, 0, 2, 2, 1, 0, 1, 3, 1, 2],
[3, 0, 0, 2, 3, 2, 3, 1, 1, 3, 1, 3, 2, 1, 3, 1, 1, 3, 0, 0],
[1, 0, 1, 1, 0, 3, 1, 0, 1, 1, 1, 3, 3, 2, 1, 3, 0, 0, 2, 2],
[0, 0, 0, 2, 0, 0, 1, 3, 1, 1, 1, 1, 3, 1, 1, 1, 3, 2, 1, 1],
[1, 0, 0, 0, 3, 2, 1, 1, 3, 1, 0, 1, 2, 1, 2, 3, 1, 3, 1, 0],
[2, 1, 1, 0, 0, 2, 1, 0, 2, 2, 2, 3, 1, 3, 1, 0, 0, 1, 3, 3],
[1, 0, 3, 2, 2, 0, 1, 3, 2, 1, 3, 1, 1, 0, 1, 0, 0, 0, 2, 2],
[0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 1, 1, 3, 0, 0, 2],
[2, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 2, 3, 1, 1, 3, 2, 2, 2, 3]])
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;H&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;em_cluster&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;theta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;theta&lt;/span&gt; &lt;span class="o"&gt;/=&lt;/span&gt; &lt;span class="n"&gt;theta&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;/=&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# i, j, h&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="o"&gt;/=&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keepdims&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;allclose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;iters_no_change&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;h_hat_prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;tau_prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;tau&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tau_prev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;iters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# while iters_no_change &amp;lt; 10:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;allclose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tau_prev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;tau_prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;iters&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# E-step&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# h n x h n x h&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;p_hv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;theta&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;:]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;p_hv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;tau&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p_hv&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;p_hv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keepdims&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;h_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# M-step&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;theta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;:]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="o"&gt;/=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keepdims&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# stopping condition&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h_hat_prev&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;h_hat&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;iters_no_change&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;iters_no_change&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;h_hat_prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;h_hat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;converged in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;iters&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt; iterations&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;likelihood&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;theta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;h_hat&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;h_hat&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;h_hat&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newaxis&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;logl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;likelihood&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;h_hat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logl&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;h_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;em_cluster&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;h_hat&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;converged in 11 iterations
(array([0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]),
-497.82024107176716)
&lt;/code&gt;&lt;/pre&gt;
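&lt;p&gt;For longer sequences, the product of per-step transition probabilities in the E-step can underflow to zero before normalization. A log-space variant of the E-step avoids this; the sketch below is not part of the assignment code, and assumes the same array shapes as above: mixture weights &lt;code&gt;theta&lt;/code&gt; of shape &lt;code&gt;(H,)&lt;/code&gt;, first-symbol probabilities &lt;code&gt;pi&lt;/code&gt; of shape &lt;code&gt;(K, H)&lt;/code&gt;, transitions &lt;code&gt;A&lt;/code&gt; of shape &lt;code&gt;(K, K, H)&lt;/code&gt;, and integer sequences &lt;code&gt;seqs&lt;/code&gt; of shape &lt;code&gt;(N, T)&lt;/code&gt;.&lt;/p&gt;

```python
import numpy as np

def e_step_log(theta, pi, A, seqs):
    """Responsibilities tau[n, h] computed in log space to avoid underflow.

    Assumed shapes (matching the EM code above): theta (H,), pi (K, H),
    A (K, K, H), seqs (N, T) of integer symbols.
    """
    log_p = (np.log(theta)
             + np.log(pi[seqs[:, 0], :])
             + np.log(A[seqs[:, :-1], seqs[:, 1:], :]).sum(axis=1))
    # subtract the per-row max before exponentiating (log-sum-exp trick)
    log_p -= log_p.max(axis=1, keepdims=True)
    tau = np.exp(log_p)
    return tau / tau.sum(axis=1, keepdims=True)
```

&lt;p&gt;Normalizing in the log domain gives the same &lt;code&gt;tau&lt;/code&gt; as the direct product for short sequences, but stays finite as $T$ grows.&lt;/p&gt;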
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# repeat and keep highest likelihood model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;best_logl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inf&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;h_hat_i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logl_i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;em_cluster&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;logl_i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;best_logl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;h_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;h_hat_i&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;best_logl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logl_i&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;best log likelihood: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;best_logl&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;clusters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Cluster &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h_hat&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;clusters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;:&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sep&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39; &amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;converged in 12 iterations
converged in 26 iterations
converged in 63 iterations
converged in 21 iterations
converged in 8 iterations
converged in 11 iterations
converged in 9 iterations
converged in 12 iterations
converged in 17 iterations
converged in 27 iterations
converged in 20 iterations
converged in 11 iterations
converged in 12 iterations
converged in 19 iterations
converged in 10 iterations
converged in 25 iterations
converged in 14 iterations
converged in 25 iterations
converged in 24 iterations
converged in 24 iterations
best log likelihood: -483.6486774197766
Cluster 1: 1 2 6 8 9 11 12 14 16 17 18
Cluster 2: 3 4 5 7 10 13 15 19 20
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# print sequences&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;clusters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;:&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ACGT&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;seqs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]]))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;Cluster 1:
CATAGGCATTCTATGTGCTG
CCAGTTACGGACGCCGAAAG
CGGCCGCGCCTCCGGGAACG
ACATGAACTACATAGTATAA
GTTGGTCAGCACACGGACTG
CACTACGGCTACCTGGGCAA
CGGTCCGTCCGAGGCACTCG
CACCATCACCCTTGCTAAGG
CAAATGCCTCACGCGTCTCA
GCCAAGCAGGGTCTCAACTT
CATGGACTGCTCCACAAAGG
Cluster 2:
TGGAACCTTAAAAAAAAAAA
GTCTCCTGCCCTCTCTGAAC
GTGCCTGGACCTGAAAAGCC
AAAGTGCTCTGAAAACTCAC
CCTCCCCTCCCCTTTCCTGC
TAAGTGTCCTCTGCTCCTAA
AAAGAACTCCCCTCCCTGCC
AAAAAAACGAAAAACCTAAG
GCGTAAAAAAAGTCCTGGGT
&lt;/code&gt;&lt;/pre&gt;</description></item><item><title>Learning edge direction in a Bayesian network model</title><link>https://example.com/blog/bn-structure/</link><pubDate>Tue, 09 May 2023 00:00:00 +0000</pubDate><guid>https://example.com/blog/bn-structure/</guid><description>
&lt;p&gt;Here we discuss a method for learning the direction of an edge in a belief network.
Consider a distribution
&lt;/p&gt;
$$ P(x,y | \theta,M_{y\to x}) = P(x|y,\theta_{x|y})P(y|\theta_y) $$&lt;p&gt;where $\theta$ are the parameters of the conditional probability tables.
For a prior $p(\theta)=p(\theta_{x|y})p(\theta_y)$ and i.i.d.
data $\mathcal{D}= \lbrace x^n,y^n,n=1,\ldots,N\rbrace$, the marginal likelihood of the data is&lt;/p&gt;
$$ p(\mathcal{D}|M_{y\to x}) = \int_\theta p(\theta_{x|y})p(\theta_y) \prod_n p(x^n|y^n,\theta_{x|y})p(y^n|\theta_y)\,d\theta $$&lt;p&gt;For binary variables $x\in \lbrace 0,1\rbrace,y\in\lbrace 0,1\rbrace$,&lt;/p&gt;
$$ p(y=1|\theta_y) = \theta_y, \qquad p(x=1|y,\theta_{x|y})=\theta_{1|y} $$&lt;p&gt;and Beta distribution priors&lt;/p&gt;
$$
p(\theta_y) = B(\theta_y|\alpha,\beta),\qquad
p(\theta_{1|y}) = B(\theta_{1|y}|\alpha_{1|y},\beta_{1|y}), \qquad
p(\theta_{1|1},\theta_{1|0}) = p(\theta_{1|1})p(\theta_{1|0}).
$$&lt;h2 id="expanding-the-formula-for"&gt;Expanding the formula for $P(\mathcal{D}|M_{y\to x})$&lt;/h2&gt;
&lt;p&gt;Find a formula to estimate&lt;/p&gt;
$$ P(\mathcal{D}|M_{y\to x}) $$&lt;p&gt;in terms of parameters of Beta priors and the actual counts in training data.&lt;/p&gt;
&lt;svg width="768" height="480" viewbox="0.00 0.00 98.00 116.00" xmlns="http://www.w3.org/2000/svg" xlink="http://www.w3.org/1999/xlink" style="; max-width: none; max-height: none"&gt;
&lt;g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 112)"&gt;
&lt;title&gt;
G
&lt;/title&gt;
&lt;polygon fill="white" stroke="transparent" points="-4,4 -4,-112 94,-112 94,4 -4,4"&gt;&lt;/polygon&gt;
&lt;!-- x --&gt;
&lt;g id="node1" class="node"&gt;
&lt;title&gt;
x
&lt;/title&gt;
&lt;ellipse fill="white" stroke="black" cx="18" cy="-90" rx="18" ry="18"&gt;&lt;/ellipse&gt;
&lt;text text-anchor="middle" x="18" y="-85.8" font-family="Times,serif" font-size="14.00" fill="#000000"&gt;x&lt;/text&gt;
&lt;/g&gt;
&lt;!-- y --&gt;
&lt;g id="node2" class="node"&gt;
&lt;title&gt;
y
&lt;/title&gt;
&lt;ellipse fill="white" stroke="black" cx="18" cy="-18" rx="18" ry="18"&gt;&lt;/ellipse&gt;
&lt;text text-anchor="middle" x="18" y="-13.8" font-family="Times,serif" font-size="14.00" fill="#000000"&gt;y&lt;/text&gt;
&lt;/g&gt;
&lt;!-- x&amp;#45;&amp;gt;y --&gt;
&lt;g id="edge1" class="edge"&gt;
&lt;title&gt;
x-\&gt;y
&lt;/title&gt;
&lt;path fill="none" stroke="black" d="M18,-71.7C18,-63.98 18,-54.71 18,-46.11"&gt;&lt;/path&gt;
&lt;polygon fill="black" stroke="black" points="21.5,-46.1 18,-36.1 14.5,-46.1 21.5,-46.1"&gt;&lt;/polygon&gt;
&lt;/g&gt;
&lt;!-- x1 --&gt;
&lt;g id="node3" class="node"&gt;
&lt;title&gt;
x1
&lt;/title&gt;
&lt;ellipse fill="white" stroke="black" cx="72" cy="-18" rx="18" ry="18"&gt;&lt;/ellipse&gt;
&lt;text text-anchor="middle" x="72" y="-13.8" font-family="Times,serif" font-size="14.00" fill="#000000"&gt;x&lt;/text&gt;
&lt;/g&gt;
&lt;!-- y1 --&gt;
&lt;g id="node4" class="node"&gt;
&lt;title&gt;
y1
&lt;/title&gt;
&lt;ellipse fill="white" stroke="black" cx="72" cy="-90" rx="18" ry="18"&gt;&lt;/ellipse&gt;
&lt;text text-anchor="middle" x="72" y="-85.8" font-family="Times,serif" font-size="14.00" fill="#000000"&gt;y&lt;/text&gt;
&lt;/g&gt;
&lt;!-- y1&amp;#45;&amp;gt;x1 --&gt;
&lt;g id="edge2" class="edge"&gt;
&lt;title&gt;
y1-\&gt;x1
&lt;/title&gt;
&lt;path fill="none" stroke="black" d="M72,-71.7C72,-63.98 72,-54.71 72,-46.11"&gt;&lt;/path&gt;
&lt;polygon fill="black" stroke="black" points="75.5,-46.1 72,-36.1 68.5,-46.1 75.5,-46.1"&gt;&lt;/polygon&gt;
&lt;/g&gt;
&lt;/g&gt;
&lt;/svg&gt;
&lt;p&gt;Figure 1: Two alternative structures for a belief network&lt;/p&gt;
&lt;h3 id="solution"&gt;Solution&lt;/h3&gt;
$$
\begin{aligned}
P(\mathcal{D}|M_{y\to x}) &amp;= \int_\theta p(\theta_y)p(\theta_{x|y})\prod_n p(y^n|\theta_y)p(x^n|y^n,\theta_{x|y})\,d\theta \\
&amp;= \int_\theta \left(
\begin{aligned}
&amp;\frac{\theta_{y}^{\alpha-1}\left(1-\theta_{y}\right)^{\beta-1}}{B(\alpha,\beta)}
\frac{\theta_{x|1}^{\alpha_{x|1}-1}\left(1-\theta_{x|1}\right)^{\beta_{x|1}-1}}{B(\alpha_{x|1},\beta_{x|1})}
\frac{\theta_{x|0}^{\alpha_{x|0}-1}\left(1-\theta_{x|0}\right)^{\beta_{x|0}-1}}{B(\alpha_{x|0},\beta_{x|0})} \\
&amp;*\theta_{y}^{N_{y=1}}\left(1-\theta_{y}\right)^{N_{y=0}}
\theta_{x|1}^{N_{1|y=1}}\left(1-\theta_{x|1}\right)^{N_{0|y=1}}
\theta_{x|0}^{N_{1|y=0}}\left(1-\theta_{x|0}\right)^{N_{0|y=0}} \\
\end{aligned}
\right) d\theta \\
&amp;= \frac{1}{B(\alpha,\beta)\,B(\alpha_{x|1},\beta_{x|1})\,B(\alpha_{x|0},\beta_{x|0})}\left(\begin{aligned}
&amp;\int_{\theta_y}\theta_{y}^{N_{y=1}+\alpha-1}\left(1-\theta_{y}\right)^{N_{y=0}+\beta-1}d\theta_y \\
&amp;*\int_{\theta_{x|1}}\theta_{x|1}^{N_{1|y=1}+\alpha_{x|1}-1}\left(1-\theta_{x|1}\right)^{N_{0|y=1}+\beta_{x|1}-1}d{\theta_{x|1}} \\
&amp;*\int_{\theta_{x|0}}\theta_{x|0}^{N_{1|y=0}+\alpha_{x|0}-1}\left(1-\theta_{x|0}\right)^{N_{0|y=0}+\beta_{x|0}-1}d{\theta_{x|0}} \\
\end{aligned}\right) \\
&amp;= \textcolor{Green}{\fbox{$\dfrac{B(\alpha+N_{y=1},\beta+N_{y=0})}{B(\alpha,\beta)}\,\dfrac{B(\alpha_{x|1}+N_{1|y=1},\beta_{x|1}+N_{0|y=1})}{B(\alpha_{x|1},\beta_{x|1})}\,\dfrac{B(\alpha_{x|0}+N_{1|y=0},\beta_{x|0}+N_{0|y=0})}{B(\alpha_{x|0},\beta_{x|0})}$}} \\
\end{aligned}
$$
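&lt;p&gt;Each factor in the last step is a standard Beta integral, $\int_0^1 \theta^{a-1}(1-\theta)^{b-1}\,d\theta = B(a,b)$. A quick numerical sanity check of this identity, with made-up counts and hyperparameters (the values below are purely illustrative):&lt;/p&gt;

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta as B

# illustrative values only: hypothetical hyperparameters and counts
alpha, beta_, N1, N0 = 2.0, 3.0, 4, 6
# integrate theta^(alpha+N1-1) (1-theta)^(beta+N0-1) over [0, 1]
numeric, _ = quad(lambda t: t**(alpha + N1 - 1) * (1 - t)**(beta_ + N0 - 1), 0, 1)
assert np.isclose(numeric, B(alpha + N1, beta_ + N0))
```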
&lt;h2 id="expanding-the-formula-for-1"&gt;Expanding the formula for $P(\mathcal{D}|M_{x\to y})$&lt;/h2&gt;
&lt;p&gt;Now derive a similar expression as in part 1 for the model with the edge direction reversed,&lt;/p&gt;
$$ P(\mathcal{D}|M_{x\to y}) $$&lt;p&gt;assuming the same values (as in part 1, except that the role of $x$ and $y$ are switched) for all hyperparameters of this reverse model.&lt;/p&gt;
&lt;h3 id="solution-1"&gt;Solution&lt;/h3&gt;
&lt;p&gt;Same as above, but switching $x$ and $y$:&lt;/p&gt;
$$
\begin{aligned}
P(\mathcal{D}|M_{x\to y}) &amp;= \textcolor{Green}{\fbox{$\dfrac{B(\alpha+N_{x=1},\beta+N_{x=0})}{B(\alpha,\beta)}\,\dfrac{B(\alpha_{y|1}+N_{1|x=1},\beta_{y|1}+N_{0|x=1})}{B(\alpha_{y|1},\beta_{y|1})}\,\dfrac{B(\alpha_{y|0}+N_{1|x=0},\beta_{y|0}+N_{0|x=0})}{B(\alpha_{y|0},\beta_{y|0})}$}} \\\\
\end{aligned}
$$&lt;h2 id="deriving-the-bayes-factor"&gt;Deriving the Bayes factor&lt;/h2&gt;
&lt;p&gt;Using the above, derive a simple expression for the Bayes factor, the ratio of the probability found in part 1 to that found in part 2:&lt;/p&gt;
$$ \frac{P(\mathcal{D}|M_{y\to x})}{P(\mathcal{D}|M_{x\to y})} $$&lt;h3 id="solution-2"&gt;Solution&lt;/h3&gt;
&lt;p&gt;Since both models use the same hyperparameter values, the Beta-function normalization constants of the priors cancel in the ratio, leaving&lt;/p&gt;
$$
\begin{aligned}
&amp;\frac{P(\mathcal{D}|M_{y\to x})}{P(\mathcal{D}|M_{x\to y})} = \\\\
&amp;\quad\textcolor{Green}{\fbox{$\frac{
B(\alpha+N_{y=1},\beta+N_{y=0})B(\alpha_{x|1}+N_{1|y=1},\beta_{x|1}+N_{0|y=1})B(\alpha_{x|0}+N_{1|y=0},\beta_{x|0}+N_{0|y=0})
}{
B(\alpha+N_{x=1},\beta+N_{x=0})B(\alpha_{y|1}+N_{1|x=1},\beta_{y|1}+N_{0|x=1})B(\alpha_{y|0}+N_{1|x=0},\beta_{y|0}+N_{0|x=0})
}$}}
\end{aligned}
$$&lt;h2 id="leveraging-priors-to-choose-the-more-deterministic-model"&gt;Leveraging priors to choose the more deterministic model&lt;/h2&gt;
&lt;p&gt;By choosing appropriate hyperparameters (i.e., beta priors), give a numerical example that illustrates how to encode the heuristic that if the table $p(x|y)p(y)$ is &amp;lsquo;more deterministic&amp;rsquo; than $p(y|x)p(x)$, then we should prefer the model $M_{y\to x}$.
(By &amp;lsquo;more deterministic&amp;rsquo; we mean that, given $y$, the outcome of $x$ is nearly certain, e.g., certain to be zero.)&lt;/p&gt;
&lt;p&gt;Compute the Bayes factor for the situation&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;y=0&lt;/th&gt;
&lt;th&gt;y=1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;x=0&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;x=1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Then compare with the Bayes factor obtained in the same situation using a uniform prior for all parameters instead.&lt;/p&gt;
&lt;h3 id="solution-3"&gt;Solution&lt;/h3&gt;
&lt;p&gt;We use the same hyperparameters for both models (e.g., the same $\alpha$ and $\beta$ for both $p(y)$ and $p(x)$), for a fair, uninformed comparison.
Using $\alpha_{a|b}=\beta_{a|b}&lt;1/2$ corresponds to favoring $\theta_{a|b}$ close to 0 or 1, that is, more deterministic models.
$\alpha=\beta=\frac{1}{2}$ is the Jeffreys prior, often regarded as the natural &amp;lsquo;uninformative&amp;rsquo; prior, favoring neither deterministic nor entropic distributions.&lt;/p&gt;
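&lt;p&gt;As an aside, when counts are large the Beta-function values themselves can underflow, so the Bayes factor is safer to evaluate in the log domain. A sketch of this (an alternative to the direct computation below, and assuming for brevity a single shared hyperparameter pair $(\alpha,\beta)$ for every parameter, so the prior normalizers cancel):&lt;/p&gt;

```python
import numpy as np
from scipy.special import betaln

def log_bayes_factor(n00, n01, n10, n11, a=0.5, b=0.5):
    """log P(D | M_{y to x}) minus log P(D | M_{x to y}).

    n01 is the count of samples with x=0, y=1, and so on; every Beta prior
    uses the same (a, b), so the prior normalizers cancel in the ratio.
    """
    Ny1, Ny0 = n01 + n11, n00 + n10
    Nx1, Nx0 = n10 + n11, n00 + n01
    log_yx = (betaln(a + Ny1, b + Ny0)
              + betaln(a + n11, b + n01)   # x given y=1
              + betaln(a + n10, b + n00))  # x given y=0
    log_xy = (betaln(a + Nx1, b + Nx0)
              + betaln(a + n11, b + n10)   # y given x=1
              + betaln(a + n01, b + n00))  # y given x=0
    return log_yx - log_xy
```

&lt;p&gt;For the table above, &lt;code&gt;np.exp(log_bayes_factor(10, 10, 0, 0))&lt;/code&gt; gives the Bayes factor directly.&lt;/p&gt;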
&lt;p&gt;This image from Wikipedia illustrates the effect of different priors on the beta distribution:&lt;/p&gt;
&lt;img src="beta-priors.png" id="fig-priors" style="background:white;" alt="Figure 2: source: Wikipedia" /&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.special&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;beta&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gamma&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;G&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bayes_factor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xy00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;xy01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;xy10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;xy11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alpha1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beta1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alpha11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beta01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alpha10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beta00&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xy00&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy01&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy10&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;Ny1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xy01&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;Ny0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xy00&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;Nx1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xy10&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;Nx0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xy00&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy01&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;Ny1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beta1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;Ny0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;Nx1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beta1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;Nx0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha11&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beta01&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy01&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha11&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beta01&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha10&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beta00&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy00&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha10&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beta00&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;xy00&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bayes_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alphabeta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alphabeta_cond&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bayes_factor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alphabeta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alphabeta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alphabeta_cond&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alphabeta_cond&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alphabeta_cond&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alphabeta_cond&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isclose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rtol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Bayes factor = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;, neither model preferred&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Bayes factor = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;, My-&amp;gt;x preferred&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Bayes factor = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;, Mx-&amp;gt;y preferred&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;bayes_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;Bayes factor = 1.4689932412843798, My-&amp;gt;x preferred
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is as expected.&lt;/p&gt;
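&lt;p&gt;For reference, here is a sketch of the quantity the code computes (standard Beta&amp;ndash;Bernoulli marginal likelihoods; the prior normalizers $B(\alpha,\beta)$ cancel between the two models because we use the same hyperparameters). Writing $N_{ab}$ for the number of samples with $x=a, y=b$, and $\alpha,\beta$ ($\alpha',\beta'$) for the marginal (conditional) hyperparameters,&lt;/p&gt;
$$
\frac{p(\mathcal{D} \mid M_{y \to x})}{p(\mathcal{D} \mid M_{x \to y})}
= \frac{B(\alpha + N_{y=1}, \beta + N_{y=0})}{B(\alpha + N_{x=1}, \beta + N_{x=0})}
\cdot \frac{B(\alpha' + N_{11}, \beta' + N_{01})}{B(\alpha' + N_{11}, \beta' + N_{10})}
\cdot \frac{B(\alpha' + N_{10}, \beta' + N_{00})}{B(\alpha' + N_{01}, \beta' + N_{00})}
$$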
&lt;p&gt;We get a stronger preference for $M_{y\to x}$ if we use a uniform rather than a Jeffreys prior for the marginal hyperparameters $\alpha,\beta$, since the empirical marginal of $y$ is more entropic (uniform) than that of $x$ (there are no samples with $x=1$).
This is arguably the prior we should use to make our decision, since it expresses no preference between stochastic and deterministic marginals of $x$ and $y$.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;bayes_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;Bayes factor = 5.932237498634842, My-&amp;gt;x preferred
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can confirm that a Jeffreys prior on the conditional parameters (keeping the uniform prior on the marginals) indeed prefers neither model:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;bayes_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;Bayes factor = 1.0, neither model preferred
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we use the Jeffreys prior for both the parent and the conditional parameters, $M_{x\to y}$ is preferred, since that model better balances stochastic and deterministic component distributions on this data.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;bayes_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;Bayes factor = 0.24762886543609094, Mx-&amp;gt;y preferred
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, using uniform priors as requested, $M_{x\to y}$ is preferred, as we would expect:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;bayes_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;Bayes factor = 0.17355371900826447, Mx-&amp;gt;y preferred
&lt;/code&gt;&lt;/pre&gt;</description></item></channel></rss>
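&lt;p&gt;As a final sanity check (our own arithmetic, not part of the original solution): with uniform $\mathrm{Beta}(1,1)$ priors the Beta functions reduce to factorials, and the Bayes factor above works out to exactly $21/121 \approx 0.1736$.&lt;/p&gt;

```python
from math import factorial

def B(a, b):
    # Beta function for positive-integer arguments: B(a, b) = (a-1)!(b-1)!/(a+b-1)!
    return factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)

# Data: 10 samples with (x=0, y=0) and 10 with (x=0, y=1); Beta(1, 1) priors throughout.
marginal = B(1 + 10, 1 + 10) / B(1 + 0, 1 + 20)   # p(y) term over p(x) term
cond_y1  = B(1 + 0, 1 + 10) / B(1 + 0, 1 + 0)     # p(x|y=1) term over p(y|x=1) term
cond_y0  = B(1 + 0, 1 + 10) / B(1 + 10, 1 + 10)   # p(x|y=0) term over p(y|x=0) term

bf = marginal * cond_y1 * cond_y0
print(bf)  # 21/121 = 0.17355371900826447
```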