<p>Compressive Sampling. D. Zack Garza (dzackgarza@gmail.com), 2017-05-16.</p>
<h2 id="compressive-sampling-overview">Compressive Sampling Overview</h2>
<p>In our previous discussion, we saw that imposing bandlimited-ness on our class
of signals permits point-wise sampling of our signal and then later perfect
reconstruction. It turns out that by imposing <em>sparsity</em> we can also obtain
perfect reconstruction irrespective of whether or not we have satisfied the
sampling rate limits imposed by Shannon’s sampling theorem. This is extremely
important in practice because many signals are naturally sparse, so
collecting samples at high rates only to dump most of them when the signal is
compressed is expensive and wasteful.</p>
<h2 id="what-are-sparse-signals">What Are Sparse Signals?</h2>
<p>Let’s carefully discuss what we mean by <em>sparse</em> in this context. A signal $\mathbf{f}$
is sparse if it can be expressed with very few nonzero components ($\mathbf{s}$)
with respect to a given basis ($ \mathbf{\Psi} $). In other words, in
matrix-vector language:</p>
<p>$ \mathbf{f} = \mathbf{\Psi} \mathbf{s} $</p>
<p>where $ || \mathbf{s} ||_0 \ll N $, $N$ is the length of the vector, and
$|| \cdot||_0$ counts the number of nonzero elements in $\mathbf{s}$.
Furthermore, we don’t actually collect $N$ samples point-wise as we did in the
Shannon sampling case. Rather, we measure $\mathbf{f}$ indirectly as
$\mathbf{y}$ with another matrix as in:</p>
<p>$\mathbf{y} = \mathbf{\Phi f} = \mathbf{\Phi} \mathbf{\Psi} \mathbf{s} =
\mathbf{\Theta s} $</p>
<p>where $\mathbf{\Theta}$ is an $M \times N$ matrix and $ M < N $ is the number
of measurements. This setup means we have two problems to solve. First, how to
design a <em>stable</em> measurement matrix $\mathbf{\Phi}$ and then, second, how to
reconstruct $ \mathbf{f} $ from $ \mathbf{y} $.</p>
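<p>To make the measurement model concrete, here is a minimal sketch (Python 3 with NumPy) that builds a $K$-sparse $\mathbf{s}$, a basis $\mathbf{\Psi}$, a random $\mathbf{\Phi}$, and the resulting $\mathbf{y}$. The sizes, the seed, and the choice of a unitary DFT matrix for $\mathbf{\Psi}$ are illustrative assumptions, not values taken from the text.</p>

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for repeatability

N, M, K = 64, 16, 3              # signal length, measurements, nonzeros (illustrative)

# K-sparse coefficient vector s
s = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
s[support] = rng.standard_normal(K)

# Orthonormal sparsifying basis Psi (unitary DFT matrix, one possible choice)
n = np.arange(N)
Psi = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

# Measurement matrix Phi: M rows of zero-mean Gaussian entries
Phi = rng.standard_normal((M, N)) / np.sqrt(N)

f = Psi @ s          # the signal itself, dense in general
Theta = Phi @ Psi    # combined M x N matrix acting on s
y = Phi @ f          # the M indirect measurements

print(np.count_nonzero(s), f.shape, y.shape)
```

<p>Only the $M$ numbers in <code class="highlighter-rouge">y</code> are observed; the reconstruction problem is to get back to <code class="highlighter-rouge">s</code> (and hence <code class="highlighter-rouge">f</code>) from them.</p>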
<p>This may look like a standard linear algebra problem, but since $ \mathbf{\Theta}
$ has fewer rows than columns, the problem is underdetermined and therefore ill-posed. This is
where we inject the sparsity concept! Suppose that $\mathbf{s}$ is $K$-sparse (
$||\mathbf{s}||_0=K$ ). If we somehow knew <em>which</em> $K$ columns of $ \mathbf{\Theta}
$ matched the $K$ non-zero entries in $\mathbf{s}$, then we could keep only those columns, so that the reduced $\mathbf{\Theta}$ would
be $ M \times K $, where we could make $M > K$ and then have a stable inverse.</p>
<p>This bit of reasoning is encapsulated in the following statement: for any vector
$\mathbf{v}$ sharing the same $K$ non-zero entries as $\mathbf{s}$, we have</p>
<script type="math/tex; mode=display">1-\epsilon \leq \frac{|| \mathbf{\Theta v} ||_2}{|| \mathbf{v} ||_2} \leq
1+\epsilon</script>
<p>which is another way of saying that $\mathbf{\Theta}$ preserves the lengths of
$K$-sparse vectors. Of course we don’t know ahead of time which $K$ components
to use, but it turns out that this condition is sufficient for a stable inverse
of $\mathbf{\Theta}$ if it holds for any $3K$-sparse vector $\mathbf{v}$. This
is the <em>Restricted Isometry Property</em> (RIP). Unfortunately, in order to use this
sufficient condition, we would have to propose a $\mathbf{\Theta}$ and then
check all possible combinations of nonzero entries in the $N$-length vector
$\mathbf{v}$. As you may guess, this is prohibitive.</p>
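<p>A rough sense of why the exhaustive check is prohibitive, and what a cheaper spot check looks like, is sketched below (Python 3.8+ with NumPy). The sizes are illustrative, and the $1/\sqrt{M}$ scaling of the random matrix is an assumption of this sketch, chosen so that the norm ratio centers near one. Sampling random sparse vectors only probes the condition; it does not prove it.</p>

```python
import math
import numpy as np

N, M, K = 128, 40, 2   # illustrative sizes

# Number of distinct supports an exhaustive check over 3K-sparse vectors must visit
num_supports = math.comb(N, 3 * K)

rng = np.random.default_rng(1)
# Assumed scaling: entry variance 1/M, so ||Theta v|| / ||v|| centers near 1
Theta = rng.standard_normal((M, N)) / np.sqrt(M)

def random_sparse(n, k, rng):
    """Return a length-n vector with k Gaussian nonzeros at random positions."""
    v = np.zeros(n)
    idx = rng.choice(n, size=k, replace=False)
    v[idx] = rng.standard_normal(k)
    return v

# Spot check: norm ratio on 2000 randomly drawn 3K-sparse vectors
ratios = np.array([
    np.linalg.norm(Theta @ v) / np.linalg.norm(v)
    for v in (random_sparse(N, 3 * K, rng) for _ in range(2000))
])

print("supports to check exhaustively:", num_supports)
print("sampled ratio range: %.3f to %.3f" % (ratios.min(), ratios.max()))
```

<p>Even at this small size there are over five billion supports, each of which would need its own extreme-singular-value computation.</p>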
<p>Alternatively, we can approach stability by defining <em>incoherence</em> between the
measurement matrix $\mathbf{\Phi}$ and the sparse basis $\mathbf{\Psi}$: the two are
incoherent when no column of one can be expressed as a combination of a small subset of the columns
of the other. For example, if we have delta-spikes for $\mathbf{\Phi}$ as the
row-truncated identity matrix</p>
<script type="math/tex; mode=display">\mathbf{\Phi} = \mathbf{I}_{M \times N}</script>
<p>and the discrete Fourier transform matrix for $\mathbf{\Psi}$ as</p>
<p>$\mathbf{\Psi} = \begin{bmatrix} e^{-j 2\pi k n/N} \end{bmatrix}_{N \times N}$</p>
<p>then we could not write any of the columns of $\mathbf{\Phi}$ using just a few
of the columns of $\mathbf{\Psi}$.</p>
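<p>One common way to put a number on incoherence is the mutual coherence $\mu = \sqrt{N}\,\max_{k,j} |\langle \phi_k, \psi_j \rangle|$, which lies between $1$ (maximally incoherent) and $\sqrt{N}$. This definition is not given in the text and is an assumption of the sketch below (Python 3 with NumPy), which evaluates it for the spike and Fourier pair, using the full identity rather than its row-truncated version.</p>

```python
import numpy as np

N = 64
n = np.arange(N)

Phi = np.eye(N)                                              # spikes: rows are delta functions
Psi = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)  # unitary DFT: columns are sinusoids

def coherence(Phi, Psi):
    """sqrt(N) times the largest |inner product| between a row of Phi and a column of Psi."""
    N = Psi.shape[0]
    return np.sqrt(N) * np.abs(Phi.conj() @ Psi).max()

mu_spike_fourier = coherence(Phi, Psi)   # approximately 1: every spike spreads over all sinusoids
mu_spike_spike = coherence(Phi, Phi)     # sqrt(N) = 8: a basis measured against itself

print(mu_spike_fourier, mu_spike_spike)
```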
<p>It turns out that if we pick the $M \times N$ measurement matrix $\mathbf{\Phi}$ randomly according
to a zero-mean Gaussian distribution with variance $1/N$ and use the identity
matrix as $\mathbf{\Psi}$, then the resulting $\mathbf{\Theta}$ matrix can be
shown to satisfy the RIP with high probability. This means that we can recover
$N$-length $K$-sparse signals with high probability from just $M \ge c K \log
(N/K)$ samples, where $c$ is a small constant. Furthermore, it also turns out
that we can use any orthonormal basis for $\mathbf{\Psi}$, not just the identity
matrix, and these relations will all still hold.</p>
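<p>To get a feel for the sample-count bound, the sketch below (Python 3) evaluates $c K \log(N/K)$ for a few sizes. The constant $c = 1$ and the base-2 logarithm are assumptions made only for illustration; the text does not fix either.</p>

```python
import math

def measurements_needed(N, K, c=1.0):
    """Smallest integer M with M >= c * K * log2(N / K) (c and the log base are assumed)."""
    return math.ceil(c * K * math.log2(N / K))

for N, K in [(128, 2), (1024, 10), (4096, 20)]:
    print(N, K, measurements_needed(N, K))
```

<p>The point is the scaling: the count grows linearly in $K$ but only logarithmically in $N$, so it stays far below $N$ for sparse signals.</p>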
<h2 id="reconstructing-sparse-signals">Reconstructing Sparse Signals</h2>
<p>Now that we have a way, by using random matrices, to satisfy the RIP, we are
ready to consider the reconstruction problem. The first impulse is to compute
the least-squares solution to this problem as</p>
<script type="math/tex; mode=display">\mathbf{s}^* = \mathbf{\Theta}^T
(\mathbf{\Theta}\mathbf{\Theta}^T)^{-1}\mathbf{y}</script>
<p>But a moment’s thought may convince you that since $\mathbf{\Theta}$ is a random
matrix, most likely with lots of non-zero entries, it is highly unlikely that
$\mathbf{s}^* $ will turn out to be sparse. There is actually a deeper geometric
intuition as to why this happens, but let’s first consider another way of
solving this so that the $\mathbf{s}^*$ is $K$-sparse. Suppose instead we
shuffle through combinations of $K$ nonzero entries in $\mathbf{s}$ until we
satisfy the measurements $\mathbf{y}$. Stated mathematically, this means</p>
<script type="math/tex; mode=display">\mathbf{s}^* = \arg\min_{\mathbf{s}} || \mathbf{s} ||_0</script>
<p>subject to</p>
<script type="math/tex; mode=display">\mathbf{\Theta} \mathbf{s} = \mathbf{y}</script>
<p>It can be shown that with $M=K+1$ iid Gaussian measurements, this optimization
will recover a $K$-sparse signal exactly with high probability. Unfortunately,
this is numerically unstable in addition to being, in general, an NP-hard problem.</p>
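<p>The contrast between the two approaches can be seen on a toy problem. The sketch below (Python 3 with NumPy) computes the minimum-norm least-squares solution and then runs an exhaustive search over $K$-column supports with $M = K+1$ measurements. The sizes, seed, test vector, and residual tolerance are all illustrative assumptions, and the search is only feasible because $N$ is tiny.</p>

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
N, K = 12, 2
M = K + 1

s_true = np.zeros(N)
s_true[[3, 8]] = [1.0, -0.5]

Theta = rng.standard_normal((M, N))
y = Theta @ s_true

# Minimum-norm least-squares solution: reproduces y exactly but is dense
s_l2 = Theta.T @ np.linalg.solve(Theta @ Theta.T, y)

def l0_search(Theta, y, K, tol=1e-9):
    """Try every K-column subset; return the first sparse vector that reproduces y."""
    N = Theta.shape[1]
    for support in itertools.combinations(range(N), K):
        cols = Theta[:, list(support)]
        coef = np.linalg.lstsq(cols, y, rcond=None)[0]
        if np.linalg.norm(cols @ coef - y) < tol:
            s = np.zeros(N)
            s[list(support)] = coef
            return s
    return None

s_l0 = l0_search(Theta, y, K)

print("nonzeros in least-squares solution:", np.count_nonzero(np.abs(s_l2) > 1e-6))
print("nonzeros in exhaustive-search solution:", np.count_nonzero(s_l0))
```

<p>The loop visits $\binom{N}{K}$ supports, which is what makes this approach hopeless at realistic sizes.</p>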
<p>Thus, we need another, tractable way to approach this problem. It turns out that
when a signal is sparse, it usually means that the nonzero terms are highly
asymmetric, meaning that if there are $K$ terms, then most likely there is one
term that is dominant (i.e. of much larger magnitude) and that dwarfs the other
nonzero terms. Geometrically, this means that in $N$-dimensional space, the
sparse signal is very close to one (or, maybe just a few) of the axes.</p>
<p>It turns out that one can bypass this combinatorial problem using $L_1$
minimization. To examine this, let’s digress and look at the main difference
between $L_2$ and $L_1$ minimization problems.</p>
<p>reference:
<code class="highlighter-rouge">http://users.ece.gatech.edu/justin/ssp2007</code></p>
<h2 id="l_2-vs-l_1-optimization">$L_2$ vs. $L_1$ Optimization</h2>
<p>The classic constrained least squares problem is the following:</p>
<p>min $ || \mathbf{x} ||_2^2 $</p>
<p>where $x_1 + 2 x_2 = 1$</p>
<p>with corresponding solution illustrated below.</p>
<p><strong>In [1]:</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">division</span>
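<span class="kn">from</span> <span class="nn">pylab</span> <span class="kn">import</span> <span class="o">*</span> <span class="c"># pylab namespace assumed throughout: linspace, figure, sqrt, pi, randn, np, ...</span>
<span class="kn">from</span> <span class="nn">matplotlib.patches</span> <span class="kn">import</span> <span class="n">Circle</span>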
<span class="n">x1</span> <span class="o">=</span> <span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">10</span><span class="p">)</span>
<span class="n">fig</span><span class="o">=</span><span class="n">figure</span><span class="p">()</span>
<span class="n">ax</span><span class="o">=</span><span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x1</span><span class="p">,(</span><span class="mi">1</span><span class="o">-</span><span class="n">x1</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">add_patch</span><span class="p">(</span><span class="n">Circle</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">),</span><span class="mi">1</span><span class="o">/</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">5</span><span class="p">),</span><span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">))</span>
<span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="mi">1</span><span class="o">/</span><span class="mi">5</span><span class="p">,</span><span class="mi">2</span><span class="o">/</span><span class="mi">5</span><span class="p">,</span><span class="s">'rs'</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'equal'</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">'$x_1$'</span><span class="p">,</span><span class="n">fontsize</span><span class="o">=</span><span class="mi">24</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">'$x_2$'</span><span class="p">,</span><span class="n">fontsize</span><span class="o">=</span><span class="mi">24</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">grid</span><span class="p">()</span></code></pre></figure>
<p><img src="/images/compressive_sampling_9_0.png" alt="png" /></p>
<p>Note that the line is the constraint so that any solution to this problem must
be on this line (i.e. satisfy the constraint). The $L_2$ solution is the one
that just touches the perimeter of the circle. This is because, in $L_2$, the
unit-ball has the shape of a circle and represents all solutions of a fixed
$L_2$ length. Thus, the one of smallest length that intersects the line is the
one that satisfies the stated minimization problem. Intuitively, this means that
we <em>inflate</em> a ball at the origin and stop when it touches the constraint. The
point of contact is our $L_2$ minimization solution.</p>
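<p>The point of contact can also be computed directly. For a single constraint $\mathbf{a}^T \mathbf{x} = b$, the minimum-norm solution is $\mathbf{x} = \mathbf{a}\, b / (\mathbf{a}^T \mathbf{a})$, which is the earlier least-squares formula specialized to one row. The short check below (Python 3 with NumPy; variable names are illustrative) confirms the point and radius used in the plot.</p>

```python
import numpy as np

a = np.array([1.0, 2.0])   # constraint a . x = b, i.e. x1 + 2*x2 = 1
b = 1.0

x_l2 = a * b / (a @ a)           # closest point on the line to the origin
radius = np.linalg.norm(x_l2)    # radius of the inflated circle at contact

# Same answer from the pseudo-inverse of the 1 x 2 constraint matrix
x_pinv = np.linalg.pinv(a.reshape(1, 2)) @ np.array([b])

print(x_l2, radius, x_pinv)
```

<p>This gives the point $(1/5, 2/5)$ and radius $1/\sqrt{5}$; note that both coordinates are nonzero.</p>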
<p>Now, let’s do the same problem in the $L_1$ norm:</p>
<p>min $ || \mathbf{x} ||_1 = |x_1| + |x_2| $</p>
<p>where $x_1 + 2 x_2 = 1$</p>
<p>In this case the constant-norm unit-ball contour in the $L_1$ norm is a diamond
shape instead of a circle. Comparing the graph below to the last shows that the
solutions found are different. Geometrically, this is because the line tilts
over in such a way that the inflating circular $L_2$ ball hits a point of
tangency that is different from the $L_1$ ball because the $L_1$ ball creeps out
mainly along the principal axes and is less influenced by the tilt of the line.
This effect is much more pronounced in higher $N$-dimensional spaces where
$L_1$-balls get more <em>spiky</em>.</p>
<p>The fact that the $L_1$ problem is less sensitive to the tilt of the line is
crucial since that tilt (i.e. orientation) is random due to the choice of random
measurement matrices. So, for this problem to be well-posed, we need to <em>not</em> be
influenced by the orientation of any particular choice of random matrix and this
is what casting this as an $L_1$ minimization provides.</p>
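<p>The insensitivity to tilt can be checked numerically in two dimensions. For a single constraint $\mathbf{a}^T \mathbf{x} = 1$, the $L_2$ minimizer is $\mathbf{a}/(\mathbf{a}^T\mathbf{a})$, while an $L_1$ minimizer puts all of its weight on the coordinate where $|a_i|$ is largest. These closed forms are standard facts rather than something stated in the text, and the helper names and angles below are illustrative (Python 3 with NumPy).</p>

```python
import numpy as np

def l2_solution(a):
    """Minimum L2-norm point on the line a . x = 1."""
    return a / (a @ a)

def l1_solution(a):
    """A minimum L1-norm point on a . x = 1: all weight on the largest |a_i|."""
    i = np.argmax(np.abs(a))
    x = np.zeros_like(a)
    x[i] = 1.0 / a[i]
    return x

# Tilt the line through a range of orientations and compare the two minimizers
for angle_deg in (50, 60, 70, 80):
    theta = np.deg2rad(angle_deg)
    a = np.array([np.cos(theta), np.sin(theta)])
    print(angle_deg, l2_solution(a).round(3), l1_solution(a).round(3))
```

<p>Across this range of tilts the $L_2$ solution moves around with both coordinates nonzero, while the $L_1$ solution stays on the same axis. When $|a_1| = |a_2|$ the $L_1$ minimizer is not unique, which is the degenerate case where the line is parallel to a face of the diamond.</p>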
<p><strong>In [2]:</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">matplotlib.patches</span> <span class="kn">import</span> <span class="n">Rectangle</span>
<span class="kn">import</span> <span class="nn">matplotlib.patches</span>
<span class="kn">import</span> <span class="nn">matplotlib.transforms</span>
<span class="n">r</span><span class="o">=</span><span class="n">matplotlib</span><span class="o">.</span><span class="n">patches</span><span class="o">.</span><span class="n">RegularPolygon</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">),</span><span class="mi">4</span><span class="p">,</span><span class="mi">1</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span><span class="n">pi</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span><span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">)</span>
<span class="n">fig</span><span class="o">=</span><span class="n">figure</span><span class="p">()</span>
<span class="n">ax</span><span class="o">=</span><span class="n">fig</span><span class="o">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x1</span><span class="p">,(</span><span class="mi">1</span><span class="o">-</span><span class="n">x1</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span><span class="s">'rs'</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">add_patch</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">grid</span><span class="p">()</span>
<span class="n">ax</span><span class="o">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">'$x_1$'</span><span class="p">,</span><span class="n">fontsize</span><span class="o">=</span><span class="mi">24</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">'$x_2$'</span><span class="p">,</span><span class="n">fontsize</span><span class="o">=</span><span class="mi">24</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'equal'</span><span class="p">)</span></code></pre></figure>
<div class="highlighter-rouge"><pre class="highlight"><code>(-1.0, 1.0, -0.60000000000000009, 1.2)
</code></pre>
</div>
<p><img src="/images/compressive_sampling_11_1.png" alt="png" /></p>
<p>To explore this a bit, let’s consider using the <code class="highlighter-rouge">cvxopt</code> package (Python ver 2.6
used here). This can be cast as a linear programming problem as follows:</p>
<p>min $ || \mathbf{t} ||_1 = |t_1| + |t_2| $</p>
<p>subject to:</p>
<p>$-t_1 \le x_1 \le t_1$</p>
<p>$-t_2 \le x_2 \le t_2$</p>
<p>$x_1 + 2 x_2 = 1$</p>
<p>$t_1 \ge 0$</p>
<p>$t_2 \ge 0$</p>
<p>where the last two constraints are already implied by the first two and are
written out just for clarity. This can be implemented and solved in <code class="highlighter-rouge">cvxopt</code> as
the following:</p>
<p><strong>In [3]:</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">cvxopt</span> <span class="kn">import</span> <span class="n">matrix</span> <span class="k">as</span> <span class="n">matrx</span> <span class="c"># don't overwrite numpy matrix class</span>
<span class="kn">from</span> <span class="nn">cvxopt</span> <span class="kn">import</span> <span class="n">solvers</span>
<span class="c">#t1,x1,t2,x2</span>
<span class="n">c</span> <span class="o">=</span> <span class="n">matrx</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">],(</span><span class="mi">4</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span><span class="s">'d'</span><span class="p">)</span>
<span class="n">G</span> <span class="o">=</span> <span class="n">matrx</span><span class="p">([</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="c">#column-0</span>
<span class="p">[</span> <span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="c">#column-1</span>
<span class="p">[</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="c">#column-2</span>
<span class="p">[</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="c">#column-3</span>
<span class="p">],(</span><span class="mi">4</span><span class="p">,</span><span class="mi">4</span><span class="p">),</span><span class="s">'d'</span><span class="p">)</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">matrx</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">],(</span><span class="mi">4</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span><span class="s">'d'</span><span class="p">)</span> <span class="c"># (4,1) is 4-rows,1-column, 'd' is float type spec</span>
<span class="n">A</span> <span class="o">=</span> <span class="n">matrx</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">2</span><span class="p">],(</span><span class="mi">1</span><span class="p">,</span><span class="mi">4</span><span class="p">),</span><span class="s">'d'</span><span class="p">)</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">matrx</span><span class="p">([</span><span class="mi">1</span><span class="p">],(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span><span class="s">'d'</span><span class="p">)</span>
<span class="n">sol</span> <span class="o">=</span> <span class="n">solvers</span><span class="o">.</span><span class="n">lp</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span><span class="n">A</span><span class="p">,</span><span class="n">b</span><span class="p">)</span>
<span class="n">x1</span><span class="o">=</span><span class="n">sol</span><span class="p">[</span><span class="s">'x'</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span>
<span class="n">x2</span><span class="o">=</span><span class="n">sol</span><span class="p">[</span><span class="s">'x'</span><span class="p">][</span><span class="mi">3</span><span class="p">]</span>
<span class="k">print</span> <span class="s">'x=</span><span class="si">%3.2</span><span class="s">f'</span><span class="o">%</span> <span class="n">x1</span>
<span class="k">print</span> <span class="s">'y=</span><span class="si">%3.2</span><span class="s">f'</span><span class="o">%</span> <span class="n">x2</span></code></pre></figure>
<div class="highlighter-rouge"><pre class="highlight"><code> pcost dcost gap pres dres k/t
0: 0.0000e+00 -0.0000e+00 3e+00 3e+00 1e-16 1e+00
1: 2.3609e-01 2.3386e-01 5e-01 5e-01 1e-16 2e-01
2: 4.9833e-01 4.9734e-01 5e-02 4e-02 5e-15 1e-02
3: 4.9998e-01 4.9997e-01 5e-04 5e-04 2e-15 2e-04
4: 5.0000e-01 5.0000e-01 5e-06 5e-06 6e-16 2e-06
5: 5.0000e-01 5.0000e-01 5e-08 5e-08 9e-16 2e-08
Optimal solution found.
x=0.00
y=0.50
</code></pre>
</div>
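<p>As a cross-check, the same linear program can be handed to a different solver. The sketch below uses <code class="highlighter-rouge">scipy.optimize.linprog</code> (Python 3 with NumPy and SciPy) instead of <code class="highlighter-rouge">cvxopt</code>; it is a substitute for, not a reproduction of, the cell above, and keeps the same variable order.</p>

```python
import numpy as np
from scipy.optimize import linprog

# Variable order matches the cvxopt cell: [t1, x1, t2, x2]
c = np.array([1.0, 0.0, 1.0, 0.0])          # minimize t1 + t2
A_ub = np.array([
    [-1.0,  1.0,  0.0,  0.0],               #  x1 - t1 <= 0
    [-1.0, -1.0,  0.0,  0.0],               # -x1 - t1 <= 0
    [ 0.0,  0.0, -1.0,  1.0],               #  x2 - t2 <= 0
    [ 0.0,  0.0, -1.0, -1.0],               # -x2 - t2 <= 0
])
b_ub = np.zeros(4)
A_eq = np.array([[0.0, 1.0, 0.0, 2.0]])     # x1 + 2*x2 = 1
b_eq = np.array([1.0])

# linprog assumes nonnegative variables by default, so free them explicitly
bounds = [(None, None)] * 4

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x1, x2 = res.x[1], res.x[3]

print("x=%3.2f" % x1)
print("y=%3.2f" % x2)
```

<p>Each row of <code class="highlighter-rouge">A_ub</code> here corresponds to one row of <code class="highlighter-rouge">G</code>; the only extra step is the <code class="highlighter-rouge">bounds</code> argument, since <code class="highlighter-rouge">linprog</code> would otherwise restrict $x_1$ and $x_2$ to be nonnegative.</p>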
<h2 id="example-gaussian-random-matrices">Example: Gaussian Random Matrices</h2>
<p>Let’s try out our earlier result about random Gaussian matrices and see if we
can reconstruct an unknown $\mathbf{s}$ vector using $L_1$ minimization.</p>
<p><strong>In [56]:</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">scipy.linalg</span>
<span class="k">def</span> <span class="nf">rearrange_G</span><span class="p">(</span> <span class="n">x</span> <span class="p">):</span>
<span class="s">'setup to put inequalities matrix with last 1/2 of elements as main variables'</span>
<span class="n">n</span><span class="o">=</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">return</span> <span class="n">hstack</span><span class="p">([</span><span class="n">x</span><span class="p">[:,</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="n">n</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">],</span> <span class="n">x</span><span class="p">[:,</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="n">n</span><span class="p">,</span><span class="mi">2</span><span class="p">)]])</span>
<span class="n">K</span><span class="o">=</span><span class="mi">2</span> <span class="c"># components</span>
<span class="n">Nf</span><span class="o">=</span><span class="mi">128</span> <span class="c"># number of samples</span>
<span class="n">M</span> <span class="o">=</span> <span class="mi">12</span> <span class="c"># >= K log2(Nf/K) = 12; num of measurements</span>
<span class="n">s</span><span class="o">=</span><span class="n">zeros</span><span class="p">((</span><span class="n">Nf</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="c"># sparse vector we want to find</span>
<span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=</span><span class="mi">1</span> <span class="c"># set the K nonzero entries</span>
<span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">=</span><span class="mf">0.5</span>
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">5489</span><span class="p">)</span> <span class="c"># set random seed for reproducibility</span>
<span class="n">Phi</span> <span class="o">=</span> <span class="n">matrix</span><span class="p">(</span><span class="n">randn</span><span class="p">(</span><span class="n">M</span><span class="p">,</span><span class="n">Nf</span><span class="p">)</span><span class="o">*</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">1</span><span class="o">/</span><span class="n">Nf</span><span class="p">))</span> <span class="c"># random Gaussian matrix</span>
<span class="n">y</span><span class="o">=</span><span class="n">Phi</span><span class="o">*</span><span class="n">s</span> <span class="c"># measurements</span>
<span class="c">#-- setup L1 minimization problem --</span>
<span class="c"># inequalities matrix with</span>
<span class="n">G</span><span class="o">=</span><span class="n">matrx</span><span class="p">(</span><span class="n">rearrange_G</span><span class="p">(</span><span class="n">scipy</span><span class="o">.</span><span class="n">linalg</span><span class="o">.</span><span class="n">block_diag</span><span class="p">(</span><span class="o">*</span><span class="p">[</span><span class="n">matrix</span><span class="p">([[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">],[</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mf">1.0</span><span class="p">]]),]</span><span class="o">*</span><span class="n">Nf</span><span class="p">)</span> <span class="p">))</span>
<span class="c"># objective function row-matrix</span>
<span class="n">c</span><span class="o">=</span><span class="n">matrx</span><span class="p">(</span><span class="n">hstack</span><span class="p">([</span><span class="n">ones</span><span class="p">(</span><span class="n">Nf</span><span class="p">),</span><span class="n">zeros</span><span class="p">(</span><span class="n">Nf</span><span class="p">)]))</span>
<span class="c"># RHS for inequalities</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">matrx</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,]</span><span class="o">*</span><span class="p">(</span><span class="n">Nf</span><span class="o">*</span><span class="mi">2</span><span class="p">),(</span><span class="n">Nf</span><span class="o">*</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span><span class="s">'d'</span><span class="p">)</span>
<span class="c"># equality constraint matrix</span>
<span class="n">A</span> <span class="o">=</span> <span class="n">matrx</span><span class="p">(</span><span class="n">hstack</span><span class="p">([</span><span class="n">Phi</span><span class="o">*</span><span class="mi">0</span><span class="p">,</span><span class="n">Phi</span><span class="p">]))</span>
<span class="c"># RHS for equality constraints</span>
<span class="n">b</span><span class="o">=</span><span class="n">matrx</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="n">sol</span> <span class="o">=</span> <span class="n">solvers</span><span class="o">.</span><span class="n">lp</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span><span class="n">A</span><span class="p">,</span><span class="n">b</span><span class="p">)</span>
<span class="c">#nonzero entries</span>
<span class="n">nze</span><span class="o">=</span> <span class="n">array</span><span class="p">(</span><span class="n">sol</span><span class="p">[</span><span class="s">'x'</span><span class="p">])</span><span class="o">.</span><span class="n">flatten</span><span class="p">()[:</span><span class="n">Nf</span><span class="p">]</span><span class="o">.</span><span class="nb">round</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">nonzero</span><span class="p">()</span>
<span class="k">print</span> <span class="n">array</span><span class="p">(</span><span class="n">sol</span><span class="p">[</span><span class="s">'x'</span><span class="p">])[</span><span class="n">nze</span><span class="p">]</span></code></pre></figure>
<div class="highlighter-rouge"><pre class="highlight"><code> pcost dcost gap pres dres k/t
0: 0.0000e+00 -0.0000e+00 1e+02 2e+01 1e-16 1e+00
1: 1.6712e-01 1.6700e-01 1e+01 1e+00 2e-16 7e-02
2: 1.2947e+00 1.2929e+00 4e+00 5e-01 3e-16 3e-02
3: 1.3785e+00 1.3745e+00 2e+00 2e-01 6e-16 8e-03
4: 1.4705e+00 1.4690e+00 5e-01 7e-02 4e-16 2e-03
5: 1.4976e+00 1.4972e+00 2e-01 2e-02 6e-16 7e-04
6: 1.4979e+00 1.4978e+00 6e-02 7e-03 3e-14 2e-04
7: 1.4998e+00 1.4998e+00 6e-03 8e-04 2e-14 2e-05
8: 1.5000e+00 1.5000e+00 6e-05 8e-06 3e-14 3e-07
9: 1.5000e+00 1.5000e+00 6e-07 8e-08 2e-14 3e-09
Optimal solution found.
[[ 0.99999789]
[ 0.49999879]]
</code></pre>
</div>
<p>That worked out! However, if you play around with this example enough with
different random matrices (unset the <code class="highlighter-rouge">seed</code> statement above), you will find
that it does not <em>always</em> find the correct answer. This is because the
guarantees about reconstruction are all stated probabilistically (i.e. with
“high probability”). This is another major difference between this and Shannon
sampling.</p>
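<p>One way to see the probabilistic nature of the guarantee is to repeat the experiment over many random matrices and count how often the sparse vector comes back. The sketch below (Python 3 with NumPy and SciPy) does this with <code class="highlighter-rouge">scipy.optimize.linprog</code> standing in for <code class="highlighter-rouge">cvxopt</code>; the function names, trial count, seed, and success tolerance are assumptions of this sketch. It reads the signed $\mathbf{x}$ block of the solution rather than the magnitude block $\mathbf{t}$.</p>

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(Theta, y):
    """Solve min ||x||_1 subject to Theta x = y as an LP in the variables [t, x]."""
    M, N = Theta.shape
    I = np.eye(N)
    c = np.concatenate([np.ones(N), np.zeros(N)])     # minimize sum(t)
    A_ub = np.block([[-I, I], [-I, -I]])              # x - t <= 0 and -x - t <= 0
    b_ub = np.zeros(2 * N)
    A_eq = np.hstack([np.zeros((M, N)), Theta])       # Theta x = y
    bounds = [(0, None)] * N + [(None, None)] * N     # t >= 0, x free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[N:] if res.success else None

def success_rate(N, M, K, trials, rng, tol=1e-4):
    """Fraction of random trials in which the K-sparse vector is recovered within tol."""
    hits = 0
    for _ in range(trials):
        s = np.zeros(N)
        s[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
        Theta = rng.standard_normal((M, N)) / np.sqrt(N)
        x = l1_recover(Theta, Theta @ s)
        if x is not None and np.max(np.abs(x - s)) < tol:
            hits += 1
    return hits / trials

rng = np.random.default_rng(3)
for M in (6, 12, 24):
    print(M, success_rate(N=128, M=M, K=2, trials=20, rng=rng))
```

<p>The printed fractions depend on the seed and the number of trials, so they should be read as rough estimates of the recovery probability at each $M$, not as fixed properties of the method.</p>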
<p>Let’s encapsulate the above $L_1$ minimization code so we can use it later.</p>
<p><strong>In [5]:</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">cStringIO</span> <span class="kn">import</span> <span class="n">StringIO</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="k">def</span> <span class="nf">L1_min</span><span class="p">(</span><span class="n">Phi</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">K</span><span class="p">):</span>
    <span class="c"># inequalities matrix with the variables stacked as [bounds, signal]</span>
    <span class="n">M</span><span class="p">,</span><span class="n">Nf</span> <span class="o">=</span> <span class="n">Phi</span><span class="o">.</span><span class="n">shape</span>
    <span class="n">G</span><span class="o">=</span><span class="n">matrx</span><span class="p">(</span><span class="n">rearrange_G</span><span class="p">(</span><span class="n">scipy</span><span class="o">.</span><span class="n">linalg</span><span class="o">.</span><span class="n">block_diag</span><span class="p">(</span><span class="o">*</span><span class="p">[</span><span class="n">matrix</span><span class="p">([[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">],[</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mf">1.0</span><span class="p">]]),]</span><span class="o">*</span><span class="n">Nf</span><span class="p">)</span> <span class="p">))</span>
    <span class="c"># objective function row-matrix: sum of the bounds</span>
    <span class="n">c</span><span class="o">=</span><span class="n">matrx</span><span class="p">(</span><span class="n">hstack</span><span class="p">([</span><span class="n">ones</span><span class="p">(</span><span class="n">Nf</span><span class="p">),</span><span class="n">zeros</span><span class="p">(</span><span class="n">Nf</span><span class="p">)]))</span>
    <span class="c"># RHS for inequalities</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">matrx</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,]</span><span class="o">*</span><span class="p">(</span><span class="n">Nf</span><span class="o">*</span><span class="mi">2</span><span class="p">),(</span><span class="n">Nf</span><span class="o">*</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span><span class="s">'d'</span><span class="p">)</span>
    <span class="c"># equality constraint matrix: Phi acts on the signal block only</span>
    <span class="n">A</span> <span class="o">=</span> <span class="n">matrx</span><span class="p">(</span><span class="n">hstack</span><span class="p">([</span><span class="n">Phi</span><span class="o">*</span><span class="mi">0</span><span class="p">,</span><span class="n">Phi</span><span class="p">]))</span>
    <span class="c"># RHS for equality constraints</span>
    <span class="n">b</span><span class="o">=</span><span class="n">matrx</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
    <span class="c"># suppress standard output</span>
    <span class="n">old_stdout</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span>
    <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span> <span class="o">=</span> <span class="n">mystdout</span> <span class="o">=</span> <span class="n">StringIO</span><span class="p">()</span>
    <span class="n">sol</span> <span class="o">=</span> <span class="n">solvers</span><span class="o">.</span><span class="n">lp</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span><span class="n">A</span><span class="p">,</span><span class="n">b</span><span class="p">)</span>
    <span class="c"># restore standard output</span>
    <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span> <span class="o">=</span> <span class="n">old_stdout</span>
    <span class="c"># the signal is the second block of variables; the first block only holds the bounds on its absolute values</span>
    <span class="n">sln</span> <span class="o">=</span> <span class="n">array</span><span class="p">(</span><span class="n">sol</span><span class="p">[</span><span class="s">'x'</span><span class="p">])</span><span class="o">.</span><span class="n">flatten</span><span class="p">()[</span><span class="n">Nf</span><span class="p">:]</span><span class="o">.</span><span class="nb">round</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">sln</span></code></pre></figure>
<h2 id="example-sparse-fourier-transform">Example: Sparse Fourier Transform</h2>
<p>As an additional example, let us consider the Fourier transform and see if we
can recover a sparse Fourier transform from a small set of measurements. For
simplicity, we will assume that the time-domain signal is real, which
means that its Fourier transform is conjugate-symmetric.</p>
<p><strong>In [141]:</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">dftmatrix</span><span class="p">(</span><span class="n">N</span><span class="o">=</span><span class="mi">8</span><span class="p">):</span>
    <span class="s">'compute inverse DFT matrices'</span>
    <span class="n">n</span> <span class="o">=</span> <span class="n">arange</span><span class="p">(</span><span class="n">N</span><span class="p">)</span>
    <span class="n">U</span><span class="o">=</span><span class="n">matrix</span><span class="p">(</span> <span class="n">exp</span><span class="p">(</span><span class="mi">1</span><span class="n">j</span><span class="o">*</span><span class="mi">2</span><span class="o">*</span><span class="n">pi</span><span class="o">/</span><span class="n">N</span><span class="o">*</span><span class="n">n</span><span class="o">*</span><span class="n">n</span><span class="p">[:,</span><span class="bp">None</span><span class="p">]</span> <span class="p">))</span><span class="o">/</span><span class="n">sqrt</span><span class="p">(</span><span class="n">N</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">matrix</span><span class="p">(</span><span class="n">U</span><span class="p">)</span>
<span class="n">Nf</span><span class="o">=</span><span class="mi">128</span>
<span class="n">K</span><span class="o">=</span><span class="mi">3</span> <span class="c"># components</span>
<span class="n">M</span> <span class="o">=</span> <span class="mi">8</span> <span class="c"># > K log2(Nf/K); num of measurements</span>
<span class="n">s</span><span class="o">=</span><span class="n">zeros</span><span class="p">((</span><span class="n">Nf</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="c"># sparse vector we want to find</span>
<span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=</span><span class="mi">1</span> <span class="c"># set the K nonzero entries</span>
<span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">=</span><span class="mf">0.5</span>
<span class="n">s</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="c"># symmetric to keep inverse Fourier transform real</span>
<span class="n">Phi</span> <span class="o">=</span> <span class="n">dftmatrix</span><span class="p">(</span><span class="n">Nf</span><span class="p">)[:</span><span class="n">M</span><span class="p">,:]</span> <span class="c"># take M-rows</span>
<span class="n">y</span><span class="o">=</span><span class="n">Phi</span><span class="o">*</span><span class="n">s</span> <span class="c"># measurements</span>
<span class="c"># have to assert the type here on my hardware</span>
<span class="n">sol</span><span class="o">=</span><span class="n">L1_min</span><span class="p">(</span><span class="n">Phi</span><span class="o">.</span><span class="n">real</span><span class="p">,</span><span class="n">y</span><span class="o">.</span><span class="n">real</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">float64</span><span class="p">),</span><span class="n">K</span><span class="p">)</span>
<span class="k">print</span> <span class="n">np</span><span class="o">.</span><span class="n">allclose</span><span class="p">(</span><span class="n">s</span><span class="o">.</span><span class="n">flatten</span><span class="p">(),</span><span class="n">sol</span><span class="p">)</span></code></pre></figure>
<div class="highlighter-rouge"><pre class="highlight"><code>True
</code></pre>
</div>
<p><strong>In [140]:</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">plot</span><span class="p">(</span><span class="n">sol</span><span class="p">)</span>
<span class="n">plot</span><span class="p">(</span><span class="n">y</span><span class="o">.</span><span class="n">real</span><span class="p">)</span></code></pre></figure>
<p><img src="/images/compressive_sampling_22_1.png" alt="png" /></p>
<h2 id="uniform-uncertainty-principle">Uniform Uncertainty Principle</h2>
<p>$\Phi$ obeys a UUP for sets of size $K$ if, for every $K$-sparse vector $f$,</p>
<center>
$$ 0.8 \frac{M}{N} ||f||_2^2 \leq || \Phi f||_2^2 \leq 1.2 \frac{M}{N}
||f||_2^2 $$
</center>
<p>Measurements that satisfy this are defined as <em>incoherent</em>. Given that $f$ is
$K$-sparse and we measure
$y=\Phi f$, then we search for the sparsest vector that explains the $y$
measurements and thus find $f$ as follows:</p>
<center>
$\min_f \#\lbrace t: f(t) \ne 0 \rbrace $ where $\Phi f = y$
</center>
<p>Note that the hash mark is the size (i.e. cardinality) of the set. This means that we are looking for the fewest individual points for $f$ that satisfy the constraints. Unfortunately, this is not practically possible, so we must use the $L_1$ norm as a proxy for sparsity.</p>
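<p>To see why the cardinality problem is impractical, here is a minimal brute-force sketch. It assumes a small Gaussian measurement matrix and toy sizes ($N=12$, $M=6$, $K=2$), none of which come from the examples in this post, and the helper name <code>l0_brute_force</code> is made up for illustration. It tries every candidate support in order of increasing size and keeps the first one that reproduces $y$.</p>

```python
import itertools
import math

import numpy as np


def l0_brute_force(Phi, y, max_k):
    """Return a sparsest x with Phi @ x == y by trying every support.

    Supports are tried in order of increasing size, so the first exact
    fit is a sparsest solution. Hypothetical helper, for illustration.
    """
    n_cols = Phi.shape[1]
    for k in range(1, max_k + 1):
        for support in itertools.combinations(range(n_cols), k):
            cols = list(support)
            # least-squares fit restricted to the candidate support
            coef = np.linalg.lstsq(Phi[:, cols], y, rcond=None)[0]
            if np.allclose(Phi[:, cols] @ coef, y):
                x = np.zeros(n_cols)
                x[cols] = coef
                return x
    return None


rng = np.random.default_rng(0)
N, M, K = 12, 6, 2                     # toy sizes (assumed)
Phi = rng.standard_normal((M, N))      # Gaussian measurement matrix (assumed)
f_true = np.zeros(N)
f_true[[3, 8]] = [1.0, -0.5]
y = Phi @ f_true

f_hat = l0_brute_force(Phi, y, max_k=K)
n_supports = math.comb(N, K)           # number of size-K supports alone
print(np.allclose(f_hat, f_true), n_supports)
```

<p>Even at this toy size the search may visit up to $\binom{12}{1}+\binom{12}{2}=78$ supports, and that count grows combinatorially with $N$ and $K$, which is what motivates the $L_1$ proxy.</p>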
<p>Suppose $f$ is $K$-sparse and that $\Phi$ obeys UUP for sets of size $4K$. Then
we measure $y=\Phi f$ and then solve</p>
<center>
$\min_f ||f||_1 $ where $\Phi f = y$
</center>
<p>to recover $f$ exactly. This needs only $M > K \log N$ measurements, so the
number of measurements scales with the number of active components (up to a
logarithmic factor) rather than with the signal length. Let’s consider a
concrete example of how this works.</p>
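<p>Before the Fourier example, here is a minimal sketch of the $L_1$ problem posed as a linear program. It is not the cvxopt routine used elsewhere in this post: it assumes <code>scipy.optimize.linprog</code> with the HiGHS backend, a random Gaussian $\Phi$, and toy sizes ($N=64$, $M=32$, $K=3$) chosen only for illustration. The unknowns are stacked as $[f, t]$, the inequalities force each $t_i$ to bound $|f_i|$ from above, and the objective sums the $t_i$.</p>

```python
import numpy as np
from scipy.optimize import linprog


def l1_min_linprog(Phi, y):
    """Minimize ||x||_1 subject to Phi @ x = y, written as a linear program.

    Variables are stacked as [x, t]; the two inequality blocks encode
    x - t at most 0 and -x - t at most 0, so t bounds |x| from above.
    Hypothetical helper, for illustration.
    """
    m, n = Phi.shape
    eye = np.eye(n)
    c = np.concatenate([np.zeros(n), np.ones(n)])       # objective: sum of t
    A_ub = np.block([[eye, -eye], [-eye, -eye]])
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([Phi, np.zeros((m, n))])           # Phi acts on x only
    bounds = [(None, None)] * (2 * n)                   # default would force x nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=bounds, method="highs")
    return res.x[:n]


rng = np.random.default_rng(1)
N, M, K = 64, 32, 3                                     # toy sizes (assumed)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)          # Gaussian measurements (assumed)
f_true = np.zeros(N)
f_true[[5, 20, 41]] = [1.0, -0.5, 0.25]
y = Phi @ f_true

f_hat = l1_min_linprog(Phi, y)
print(np.allclose(f_hat, f_true, atol=1e-6))
```

<p>The explicit <code>bounds</code> argument matters here because <code>linprog</code> otherwise restricts every variable to be nonnegative, which would silently rule out negative signal entries.</p>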
<h3 id="example-sampling-sinusoids">Example: Sampling Sinusoids</h3>
<p>Here, we sample in the time-domain, given that we know the signal is sparse in
the frequency domain.</p>
<center>
$$ \hat{f}(\omega) = \sum_{i=1}^K \alpha_i \delta(\omega_i-\omega) $$
</center>
<p>which means that it has only $K$ nonzero components. Therefore, the
time-domain signal is</p>
<center>
$$ f(t) = \sum_{i=1}^K \alpha_i e^{i \omega_i t} $$
</center>
<p>where the $\alpha_i$ and $\omega_i$ are unknown. We want to solve for these
unknowns by taking $M \gt K \log N$ samples of $f$.</p>
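<p>As a small illustration of this setup, the sketch below builds such a signal and draws the time samples. The grid length, the active bins, the amplitudes, and the random choice of sample times are all assumptions made for the example, and the frequencies are restricted to the DFT grid $\omega_i = 2\pi k_i/N$ so that the result can be cross-checked against an inverse FFT.</p>

```python
import numpy as np

N = 128                                   # grid length (assumed)
k_idx = np.array([3, 17, 40])             # active frequency bins (assumed)
alpha = np.array([1.0, 0.5, -0.25])       # amplitudes alpha_i (assumed)
omega = 2 * np.pi * k_idx / N             # omega_i on the DFT grid

t = np.arange(N)
# f(t) = sum_i alpha_i * exp(1j * omega_i * t)
f = (alpha * np.exp(1j * np.outer(t, omega))).sum(axis=1)

# The same signal obtained from the sparse spectrum through an inverse FFT.
# numpy's ifft carries a 1/N factor, hence the multiplication by N.
F_hat = np.zeros(N, dtype=complex)
F_hat[k_idx] = alpha
f_from_fft = N * np.fft.ifft(F_hat)

# Keep M randomly chosen time samples as the measurements.
K = len(k_idx)
M = int(np.ceil(K * np.log(N)))           # smallest integer above K log N here
rng = np.random.default_rng(0)
t_m = np.sort(rng.choice(N, size=M, replace=False))
y = f[t_m]
print(np.allclose(f, f_from_fft), M, y.shape)
```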
<p>The problem we want to solve is</p>
<p>$ \min_g || \hat{g} ||_{L_1} $</p>
<p>subject to</p>
<p>$ g(t_m)=f(t_m) $</p>
<p>The trick here is that we are minimizing in the frequency-domain while the
constraints are in the time-domain. To make things easier, we will restrict our
attention to real time-domain signals $f$ and we will only reconstruct the
even-indexed time-samples from the signal. This means we need a way of expressing
the inverse Fourier Transform as a matrix of equality constraints. The
assumption of real-valued time-domain signals implies the following symmetry in
the frequency-domain:</p>
<p>$ F(k) = F(N-k)^* $</p>
<p>where $F$ is the Fourier transform of $f$, the asterisk denotes complex
conjugation, $k\in \lbrace 0,1,\dots,N-1\rbrace$, and $N$ is the Fourier Transform
length. To make things even more tractable, we will assume the time-domain signal
is even, which means the Fourier transform values are real.</p>
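<p>Both symmetry statements are easy to check numerically. The sketch below uses <code>numpy.fft.fft</code> on an arbitrary random real signal (the length $N=16$ and the seed are assumptions for illustration): first the conjugate symmetry $F(k) = F(N-k)^*$, then the fact that an even real signal has a real-valued transform.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16                                    # transform length (assumed)
f = rng.standard_normal(N)                # arbitrary real signal
F = np.fft.fft(f)

# Real signal: F(k) equals the complex conjugate of F(N - k).
k = np.arange(1, N)
conjugate_symmetric = np.allclose(F[k], np.conj(F[N - k]))

# Make the signal even, f(n) = f(N - n), by averaging it with its reversal.
f_sym = f.copy()
f_sym[1:] = 0.5 * (f[1:] + f[:0:-1])
F_sym = np.fft.fft(f_sym)
spectrum_is_real = np.allclose(F_sym.imag, 0.0)

print(conjugate_symmetric, spectrum_is_real)
```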
<p>Suppose that $\mathbf{U}_N$ is the $N$-point DFT-matrix. Note that we always
assume $N$ is even. Since we are dealing with only real-valued signals, the
transform is symmetric, so we only need half of the spectrum computed. It turns
out that the even-indexed time-domain samples can be constructed as follows:</p>
<script type="math/tex; mode=display">\mathbf{f_{even}} = \mathbf{U}_{N/2}
\begin{bmatrix}
F(0)+F(N/2)^* \\
F(1)+F(N/2-1)^* \\
F(2)+F(N/2-2)^* \\
\dots \\
F(N/2-1)+F(1)^*
\end{bmatrix}</script>
<p>We can further simplify this by breaking this into real (superscript $R$) and
imaginary (superscript $I$) parts and keeping only the real part</p>
<script type="math/tex; mode=display">\mathbf{f_{even}} = \mathbf{U}_{N/2}^R
\begin{bmatrix}
F(0)^R+F(N/2)^R \\
F(1)^R+F(N/2-1)^R \\
F(2)^R+F(N/2-2)^R \\
\dots \\
F(N/2-1)^R+F(1)^R
\end{bmatrix}
+
\mathbf{U}_{N/2}^I
\begin{bmatrix}
-F(0)^I+F(N/2)^I \\
-F(1)^I+F(N/2-1)^I \\
-F(2)^I+F(N/2-2)^I \\
\dots \\
-F(N/2-1)^I+F(1)^I
\end{bmatrix}</script>
<p>But we are going to force all the $F(k)^I$ to be zero in our example.</p>
<p>Now, let’s walk through this step by step to make sure our
optimization can actually work. Note that we don’t need the second term on the
right with the $F^I$ terms because, by our construction, $F$ is real.</p>
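<p>As a cross-check that does not depend on the helper functions in this post, here is a minimal NumPy sketch of the folding identity. The function names <code>inverse_dft_matrix</code> and <code>fold_matrix</code> are made up for illustration. One detail is worth spelling out: if the DFT matrices are normalized by $1/\sqrt{N}$, the half-size matrix carries a factor $1/\sqrt{N/2}$, so the even samples pick up an extra $1/\sqrt{2}$ relative to the unnormalized formula above.</p>

```python
import numpy as np


def inverse_dft_matrix(N):
    """Unitary inverse DFT matrix with entries exp(2j*pi*n*k/N)/sqrt(N)."""
    n = np.arange(N)
    return np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)


def fold_matrix(N):
    """Row k adds F(k) and F(N/2 - k), which equals F(k) + F(k + N/2)
    whenever F is real and symmetric, F(k) = F(N - k)."""
    h = N // 2
    rows = np.arange(h)
    Q = np.zeros((h, N))
    Q[rows, rows] += 1.0
    Q[rows, h - rows] += 1.0
    return Q


N = 32                                    # must be even
F = np.zeros(N)
for k, value in [(7, 1.0), (12, 0.5), (9, -0.25)]:
    F[k] = value
    F[N - k] = value                      # symmetry keeps the time signal real

f = inverse_dft_matrix(N) @ F
f_even = inverse_dft_matrix(N // 2).real @ fold_matrix(N) @ F / np.sqrt(2)
print(np.allclose(f_even, f[::2]))
```

<p>Dropping the division by <code>np.sqrt(2)</code> leaves the right shape but the wrong amplitude, so the comparison with the even-indexed samples fails.</p>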
<p><strong>In [358]:</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">dftmatrix</span><span class="p">(</span><span class="n">N</span><span class="o">=</span><span class="mi">8</span><span class="p">):</span>
    <span class="s">'compute inverse DFT matrices'</span>
    <span class="n">n</span> <span class="o">=</span> <span class="n">arange</span><span class="p">(</span><span class="n">N</span><span class="p">)</span>
    <span class="n">U</span><span class="o">=</span><span class="n">matrix</span><span class="p">(</span> <span class="n">exp</span><span class="p">(</span><span class="mi">1</span><span class="n">j</span><span class="o">*</span><span class="mi">2</span><span class="o">*</span><span class="n">pi</span><span class="o">/</span><span class="n">N</span><span class="o">*</span><span class="n">n</span><span class="o">*</span><span class="n">n</span><span class="p">[:,</span><span class="bp">None</span><span class="p">]</span> <span class="p">))</span><span class="o">/</span><span class="n">sqrt</span><span class="p">(</span><span class="n">N</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">matrix</span><span class="p">(</span><span class="n">U</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">Q_rmatrix</span><span class="p">(</span><span class="n">Nf</span><span class="o">=</span><span class="mi">8</span><span class="p">):</span>
    <span class="s">'implements the reordering, adding, and stacking of the matrices above'</span>
    <span class="n">Q_r</span><span class="o">=</span><span class="n">matrix</span><span class="p">(</span><span class="n">hstack</span><span class="p">([</span><span class="n">eye</span><span class="p">(</span><span class="n">Nf</span><span class="o">/</span><span class="mi">2</span><span class="p">),</span><span class="n">eye</span><span class="p">(</span><span class="n">Nf</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span><span class="o">*</span><span class="mi">0</span><span class="p">])</span>
                <span class="o">+</span><span class="n">hstack</span><span class="p">([</span><span class="n">zeros</span><span class="p">((</span><span class="n">Nf</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">)),</span><span class="n">fliplr</span><span class="p">(</span><span class="n">eye</span><span class="p">(</span><span class="n">Nf</span><span class="o">/</span><span class="mi">2</span><span class="p">)),</span><span class="n">zeros</span><span class="p">((</span><span class="n">Nf</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span><span class="n">Nf</span><span class="o">/</span><span class="mi">2</span><span class="o">-</span><span class="mi">1</span><span class="p">))]))</span>
    <span class="k">return</span> <span class="n">Q_r</span>
<span class="n">Nf</span><span class="o">=</span><span class="mi">8</span>
<span class="n">F</span><span class="o">=</span><span class="n">zeros</span><span class="p">((</span><span class="n">Nf</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="c"># 8-point DFT</span>
<span class="n">F</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=</span> <span class="mi">1</span> <span class="c"># DC-term, constant signal</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">arange</span><span class="p">(</span><span class="n">Nf</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span>
<span class="n">ft</span> <span class="o">=</span> <span class="n">dftmatrix</span><span class="p">(</span><span class="n">Nf</span><span class="p">)</span><span class="o">.</span><span class="n">H</span><span class="o">*</span><span class="n">F</span> <span class="c"># this gives the constant signal</span>
<span class="n">Q_r</span><span class="o">=</span><span class="n">Q_rmatrix</span><span class="p">(</span><span class="n">Nf</span><span class="p">)</span>
<span class="n">U</span><span class="o">=</span><span class="n">dftmatrix</span><span class="p">(</span><span class="n">Nf</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span> <span class="c">#half inverse DFT matrix</span>
<span class="c"># dftmatrix scales by 1/sqrt(N), so the half-size transform needs an extra 1/sqrt(2)</span>
<span class="n">feven</span><span class="o">=</span> <span class="n">U</span><span class="o">.</span><span class="n">real</span><span class="o">*</span><span class="n">Q_r</span><span class="o">*</span><span class="n">F</span><span class="o">/</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="c"># half the size</span>
<span class="k">print</span> <span class="n">np</span><span class="o">.</span><span class="n">allclose</span><span class="p">(</span><span class="n">feven</span><span class="p">,</span><span class="n">ft</span><span class="p">[::</span><span class="mi">2</span><span class="p">])</span> <span class="c"># retrieved even-numbered samples</span></code></pre></figure>
<div class="highlighter-rouge"><pre class="highlight"><code>True
</code></pre>
</div>
<p><strong>In [359]:</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># let's try this with another sparse frequency-domain signal</span>
<span class="n">F</span><span class="o">=</span><span class="n">zeros</span><span class="p">((</span><span class="n">Nf</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
<span class="n">F</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">=</span><span class="mi">1</span>
<span class="n">F</span><span class="p">[</span><span class="n">Nf</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">=</span><span class="mi">1</span> <span class="c"># symmetric part</span>
<span class="n">ft</span> <span class="o">=</span> <span class="n">dftmatrix</span><span class="p">(</span><span class="n">Nf</span><span class="p">)</span><span class="o">.</span><span class="n">H</span><span class="o">*</span><span class="n">F</span> <span class="c"># this gives a cosine in the time domain</span>
<span class="n">feven</span><span class="o">=</span> <span class="n">U</span><span class="o">.</span><span class="n">real</span><span class="o">*</span><span class="n">Q_r</span><span class="o">*</span><span class="n">F</span><span class="o">/</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="c"># half the size; 1/sqrt(2) matches the normalization in dftmatrix</span>
<span class="k">print</span> <span class="n">np</span><span class="o">.</span><span class="n">allclose</span><span class="p">(</span><span class="n">feven</span><span class="p">,</span><span class="n">ft</span><span class="p">[::</span><span class="mi">2</span><span class="p">])</span> <span class="c"># retrieved even-numbered samples</span>
<span class="n">plot</span><span class="p">(</span><span class="n">arange</span><span class="p">(</span><span class="n">Nf</span><span class="p">),</span><span class="n">ft</span><span class="o">.</span><span class="n">real</span><span class="p">,</span><span class="n">arange</span><span class="p">(</span><span class="n">Nf</span><span class="p">)[::</span><span class="mi">2</span><span class="p">],</span><span class="n">feven</span><span class="p">,</span><span class="s">'o'</span><span class="p">)</span>
<span class="n">xlabel</span><span class="p">(</span><span class="s">'$t$'</span><span class="p">,</span><span class="n">fontsize</span><span class="o">=</span><span class="mi">22</span><span class="p">)</span>
<span class="n">ylabel</span><span class="p">(</span><span class="s">'$f(t)$'</span><span class="p">,</span><span class="n">fontsize</span><span class="o">=</span><span class="mi">22</span><span class="p">)</span>
<span class="n">title</span><span class="p">(</span><span class="s">'even-numbered samples'</span><span class="p">)</span></code></pre></figure>
<div class="highlighter-rouge"><pre class="highlight"><code>True
</code></pre>
</div>
<p><img src="/images/compressive_sampling_29_2.png" alt="png" /></p>
<p>We can use the above cell to create more complicated real signals. You can
experiment with the cell below. Just remember to impose the symmetry condition!</p>
<p><strong>In [360]:</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">Nf</span><span class="o">=</span><span class="mi">32</span> <span class="c"># must be even</span>
<span class="n">F</span><span class="o">=</span><span class="n">zeros</span><span class="p">((</span><span class="n">Nf</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
<span class="c"># set values and corresponding symmetry conditions</span>
<span class="n">F</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span><span class="o">=</span><span class="mi">1</span>
<span class="n">F</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span><span class="o">=</span><span class="mf">0.5</span>
<span class="n">F</span><span class="p">[</span><span class="mi">9</span><span class="p">]</span><span class="o">=-</span><span class="mf">0.25</span>
<span class="n">F</span><span class="p">[</span><span class="n">Nf</span><span class="o">-</span><span class="mi">9</span><span class="p">]</span><span class="o">=-</span><span class="mf">0.25</span>
<span class="n">F</span><span class="p">[</span><span class="n">Nf</span><span class="o">-</span><span class="mi">12</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.5</span>
<span class="n">F</span><span class="p">[</span><span class="n">Nf</span><span class="o">-</span><span class="mi">7</span><span class="p">]</span><span class="o">=</span><span class="mi">1</span> <span class="c"># symmetric part</span>
<span class="n">Q_r</span><span class="o">=</span><span class="n">Q_rmatrix</span><span class="p">(</span><span class="n">Nf</span><span class="p">)</span>
<span class="n">U</span><span class="o">=</span><span class="n">dftmatrix</span><span class="p">(</span><span class="n">Nf</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span> <span class="c">#half inverse DFT matrix</span>
<span class="n">ft</span> <span class="o">=</span> <span class="n">dftmatrix</span><span class="p">(</span><span class="n">Nf</span><span class="p">)</span><span class="o">.</span><span class="n">H</span><span class="o">*</span><span class="n">F</span> <span class="c"># this gives the time-domain signal</span>
<span class="n">feven</span><span class="o">=</span> <span class="n">U</span><span class="o">.</span><span class="n">real</span><span class="o">*</span><span class="n">Q_r</span><span class="o">*</span><span class="n">F</span><span class="o">/</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="c"># half the size; 1/sqrt(2) matches the normalization in dftmatrix</span>
<span class="k">print</span> <span class="n">np</span><span class="o">.</span><span class="n">allclose</span><span class="p">(</span><span class="n">feven</span><span class="p">,</span><span class="n">ft</span><span class="p">[::</span><span class="mi">2</span><span class="p">])</span> <span class="c"># retrieved even-numbered samples</span>
<span class="n">plot</span><span class="p">(</span><span class="n">arange</span><span class="p">(</span><span class="n">Nf</span><span class="p">),</span><span class="n">ft</span><span class="o">.</span><span class="n">real</span><span class="p">,</span><span class="n">arange</span><span class="p">(</span><span class="n">Nf</span><span class="p">)[::</span><span class="mi">2</span><span class="p">],</span><span class="n">feven</span><span class="p">,</span><span class="s">'o'</span><span class="p">)</span>
<span class="n">xlabel</span><span class="p">(</span><span class="s">'$t$'</span><span class="p">,</span><span class="n">fontsize</span><span class="o">=</span><span class="mi">22</span><span class="p">)</span>
<span class="n">ylabel</span><span class="p">(</span><span class="s">'$f(t)$'</span><span class="p">,</span><span class="n">fontsize</span><span class="o">=</span><span class="mi">22</span><span class="p">)</span>
<span class="n">title</span><span class="p">(</span><span class="s">'even-numbered samples'</span><span class="p">)</span></code></pre></figure>
<div class="highlighter-rouge"><pre class="highlight"><code>True
</code></pre>
</div>
<p><img src="/images/compressive_sampling_31_2.png" alt="png" /></p>
<p>Now that we have gone through all that trouble to create the even-samples
matrix, we can finally put it into the framework of the $L_1$ minimization
problem:</p>
<p>$ \min_F || \mathbf{F} ||_{L_1} $</p>
<p>subject to</p>
<p>$ \mathbf{U}_{N/2}^R \mathbf{Q}_r \mathbf{F}= \mathbf{f_{even}} $</p>
<p><strong>In [361]:</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">rearrange_G</span><span class="p">(</span> <span class="n">x</span> <span class="p">):</span>
    <span class="s">'setup to put inequalities matrix with first 1/2 of elements as main variables'</span>
    <span class="n">n</span><span class="o">=</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">hstack</span><span class="p">([</span><span class="n">x</span><span class="p">[:,</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="n">n</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">],</span> <span class="n">x</span><span class="p">[:,</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="n">n</span><span class="p">,</span><span class="mi">2</span><span class="p">)]])</span>
<span class="n">K</span><span class="o">=</span><span class="mi">2</span> <span class="c"># components</span>
<span class="n">Nf</span><span class="o">=</span><span class="mi">128</span> <span class="c"># number of samples</span>
<span class="n">M</span> <span class="o">=</span> <span class="mi">18</span> <span class="c"># > K log(N); num of measurements</span>
<span class="c"># setup signal DFT as F</span>
<span class="n">F</span><span class="o">=</span><span class="n">zeros</span><span class="p">((</span><span class="n">Nf</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
<span class="n">F</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">=</span><span class="mi">1</span>
<span class="n">F</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="o">=</span><span class="mf">0.5</span>
<span class="n">F</span><span class="p">[</span><span class="n">Nf</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">=</span><span class="mi">1</span> <span class="c"># symmetric parts</span>
<span class="n">F</span><span class="p">[</span><span class="n">Nf</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span><span class="o">=</span><span class="mf">0.5</span>
<span class="n">ftime</span> <span class="o">=</span> <span class="n">dftmatrix</span><span class="p">(</span><span class="n">Nf</span><span class="p">)</span><span class="o">.</span><span class="n">H</span><span class="o">*</span><span class="n">F</span> <span class="c"># this gives the time-domain signal</span>
<span class="n">ftime</span> <span class="o">=</span> <span class="n">ftime</span><span class="o">.</span><span class="n">real</span> <span class="c"># it's real anyway</span>
<span class="n">time_samples</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">14</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">18</span><span class="p">,</span> <span class="mi">24</span><span class="p">,</span> <span class="mi">34</span><span class="p">,</span> <span class="mi">36</span><span class="p">,</span> <span class="mi">38</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">44</span><span class="p">,</span> <span class="mi">46</span><span class="p">,</span> <span class="mi">52</span><span class="p">,</span> <span class="mi">56</span><span class="p">,</span> <span class="mi">54</span><span class="p">,</span><span class="mi">62</span><span class="p">]</span>
<span class="n">half_indexed_time_samples</span> <span class="o">=</span> <span class="p">(</span><span class="n">array</span><span class="p">(</span><span class="n">time_samples</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
<span class="n">Phi</span> <span class="o">=</span> <span class="n">dftmatrix</span><span class="p">(</span><span class="n">Nf</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">real</span><span class="o">*</span><span class="n">Q_rmatrix</span><span class="p">(</span><span class="n">Nf</span><span class="p">)</span>
<span class="n">Phi_i</span> <span class="o">=</span> <span class="n">Phi</span><span class="p">[</span><span class="n">half_indexed_time_samples</span><span class="p">,:]</span>
<span class="c"># inequalities matrix with</span>
<span class="n">G</span><span class="o">=</span><span class="n">matrx</span><span class="p">(</span><span class="n">rearrange_G</span><span class="p">(</span><span class="n">scipy</span><span class="o">.</span><span class="n">linalg</span><span class="o">.</span><span class="n">block_diag</span><span class="p">(</span><span class="o">*</span><span class="p">[</span><span class="n">matrix</span><span class="p">([[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">],[</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mf">1.0</span><span class="p">]]),]</span><span class="o">*</span><span class="n">Nf</span><span class="p">)</span> <span class="p">))</span>
<span class="c"># objective function row-matrix</span>
<span class="n">c</span><span class="o">=</span><span class="n">matrx</span><span class="p">(</span><span class="n">hstack</span><span class="p">([</span><span class="n">zeros</span><span class="p">(</span><span class="n">Nf</span><span class="p">),</span><span class="n">ones</span><span class="p">(</span><span class="n">Nf</span><span class="p">)]))</span>
<span class="c"># RHS for inequalities</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">matrx</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,]</span><span class="o">*</span><span class="p">(</span><span class="n">Nf</span><span class="o">*</span><span class="mi">2</span><span class="p">),(</span><span class="n">Nf</span><span class="o">*</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span><span class="s">'d'</span><span class="p">)</span>
<span class="c"># equality constraint matrix</span>
<span class="n">A</span> <span class="o">=</span> <span class="n">matrx</span><span class="p">(</span><span class="n">hstack</span><span class="p">([</span><span class="n">Phi_i</span><span class="p">,</span><span class="n">Phi_i</span><span class="o">*</span><span class="mi">0</span><span class="p">]))</span>
<span class="c"># RHS for equality constraints</span>
<span class="n">b</span><span class="o">=</span><span class="n">matrx</span><span class="p">(</span><span class="n">ftime</span><span class="p">[</span><span class="n">time_samples</span><span class="p">])</span>
<span class="n">sol</span> <span class="o">=</span> <span class="n">solvers</span><span class="o">.</span><span class="n">lp</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span><span class="n">A</span><span class="p">,</span><span class="n">b</span><span class="p">)</span></code></pre></figure>
<div class="highlighter-rouge"><pre class="highlight"><code> pcost dcost gap pres dres k/t
0: 0.0000e+00 -0.0000e+00 4e+02 2e+01 3e+00 1e+00
1: -1.5648e+01 -1.2218e+01 2e+03 2e+01 3e+00 4e+00
2: -2.3184e+03 -1.7022e+03 1e+06 8e+01 1e+01 6e+02
3: -2.2814e+05 -1.6566e+05 1e+08 8e+01 1e+01 6e+04
4: -2.2818e+07 -1.6568e+07 1e+10 8e+01 1e+01 6e+06
5: -2.2818e+09 -1.6568e+09 1e+12 8e+01 1e+01 6e+08
Certificate of dual infeasibility found.
</code></pre>
</div>
<p><strong>In [12]:</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">itertools</span> <span class="kn">as</span> <span class="nn">it</span>
<span class="k">def</span> <span class="nf">dftmatrix</span><span class="p">(</span><span class="n">N</span><span class="o">=</span><span class="mi">8</span><span class="p">):</span>
<span class="s">'compute inverse DFT matrices'</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">arange</span><span class="p">(</span><span class="n">N</span><span class="p">)</span>
<span class="n">U</span><span class="o">=</span><span class="n">matrix</span><span class="p">(</span> <span class="n">exp</span><span class="p">(</span><span class="mi">1</span><span class="n">j</span><span class="o">*</span><span class="mi">2</span><span class="o">*</span><span class="n">pi</span><span class="o">/</span><span class="n">N</span><span class="o">*</span><span class="n">n</span><span class="o">*</span><span class="n">n</span><span class="p">[:,</span><span class="bp">None</span><span class="p">]</span> <span class="p">))</span><span class="o">/</span><span class="n">sqrt</span><span class="p">(</span><span class="n">N</span><span class="p">)</span>
<span class="k">return</span> <span class="n">matrix</span><span class="p">(</span><span class="n">U</span><span class="p">)</span>
<span class="n">M</span> <span class="o">=</span> <span class="mi">3</span>
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">5489</span><span class="p">)</span> <span class="c"># set random seed for reproducibility</span>
<span class="n">Psi</span><span class="o">=</span> <span class="n">dftmatrix</span><span class="p">(</span><span class="mi">128</span><span class="p">)</span>
<span class="n">Phi</span><span class="o">=</span> <span class="n">randn</span><span class="p">(</span><span class="n">M</span><span class="p">,</span><span class="mi">128</span><span class="p">)</span>
<span class="n">s</span><span class="o">=</span><span class="n">zeros</span><span class="p">((</span><span class="mi">128</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
<span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=</span><span class="mi">1</span>
<span class="n">s</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span><span class="o">=</span><span class="mi">1</span>
<span class="n">Theta</span> <span class="o">=</span> <span class="n">Phi</span><span class="o">*</span><span class="n">Psi</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">Theta</span><span class="o">*</span><span class="n">s</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">it</span><span class="o">.</span><span class="n">combinations</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">128</span><span class="p">),</span><span class="mi">2</span><span class="p">):</span>
<span class="n">sstar</span><span class="o">=</span><span class="n">zeros</span><span class="p">((</span><span class="mi">128</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
<span class="n">sstar</span><span class="p">[</span><span class="n">array</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span><span class="o">=</span><span class="mi">1</span>
<span class="k">if</span> <span class="n">np</span><span class="o">.</span><span class="n">allclose</span><span class="p">(</span><span class="n">Theta</span><span class="o">*</span><span class="n">sstar</span><span class="p">,</span><span class="n">y</span><span class="p">):</span>
<span class="k">break</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">print</span> <span class="s">'no solution'</span></code></pre></figure>
D. Zack Garzadzackgarza@gmail.comCompressive sampling OverviewA Brief Introduction to Category Theory2017-01-08T00:00:00-08:002017-01-08T00:00:00-08:00/brief-intro-to-category-theory-1<h1 id="disclaimer">Disclaimer</h1>
<p>This is meant to be a relatively short and <strong>non-rigorous</strong> introduction to Category Theory. Although I will be defining and using a lot of the technical terminology that is commonly used, this talk is primarily aimed at introducing these concepts, why they exist, and where they’re useful and commonly used.</p>
<p>In fact, most of the results used here will be stated with very minimal proof – this is partly due to time constraints, and partly because diving into the proofs here would obscure the more high-level points I’d like to make. However, if you are interested in seeing and working through some of these types of proofs yourself, I’ve included some references that I’d recommend near the end.</p>
<h1 id="introduction">Introduction</h1>
<p>Of course, if we’re going to talk about Category Theory, I should probably start by telling you what it is! However, instead of diving into the definitions immediately, I think it helps to have some motivation for <em>why</em> such a thing should even exist in the first place.</p>
<p>Category Theory was conceived (or invented, or discovered; whichever you prefer) in 1945 by Samuel Eilenberg and Saunders Mac Lane while working on something called Čech cohomology, which is central in the field of algebraic topology.</p>
<p>One of Eilenberg and Mac Lane’s motivations was that it was (and still is!) common among mathematicians to refer to certain constructions as “natural” and “canonical” - broadly speaking, these terms are used to denote constructions that are somehow “choice-free”. For example, one might want to study vector spaces without explicitly choosing a basis. In this way, one can discover properties that don’t actually <em>depend</em> on a particular frame of reference, and in some sense are more “universal” and intrinsic to the object being studied.</p>
<p>In particular, Eilenberg and Mac Lane wanted to formalize the notion of a <strong>natural transformation</strong> and things that were “naturally isomorphic.”</p>
<h3 id="interlude---what-does-natural-mean">Interlude - What does “natural” mean?</h3>
<hr />
<p>A canonical example from mathematics is that, given a finite-dimensional vector space $V$ over a field $k$ (you can just take $k=\mathbb{R}$ here if you’d like), one can look at its <em>dual space</em>, denoted $V^{\ast}$, which is the space of all linear functions $f : V \rightarrow k$ that take vectors in $V$ as input and output scalars in the base field $k$. It turns out that $V^{\ast}$ is also a vector space, with the same dimension as $V$, and one result you might remember from linear algebra is that $\text{dim} V = n \Rightarrow V \cong k^n$ - that is, all vector spaces of finite dimension $n$ over $k$ are indistinguishable (as vector spaces) from $k^n$.</p>
<p>In particular, we have $\text{dim} V^{\ast} = n$, so $V^{\ast} \cong k^n \cong V$. So $V$ is isomorphic to its dual.</p>
<p>But $V^{\ast}$ is a vector space in its own right, so we can look at <em>its</em> dual too! This is denoted $V^{\ast\ast}$, and sometimes referred to as the “double dual” of $V$. In exactly the same way, we find that $\text{dim}V^{\ast\ast} =n$ as well, and so $V^{\ast\ast} \cong V^{\ast}$, and so we can conclude that $V \cong V^{\ast\ast}$ - that is, $V$ is isomorphic to its double dual.</p>
<p>So $V$ is isomorphic to $V^{\ast}$, and it is also isomorphic to $V^{\ast\ast}$. However, when one goes through the process of actually finding and constructing these isomorphisms, one finds that the map from $V \rightarrow V^{\ast}$ truly depends on choosing a basis for $V$; on the other hand, the map from $V$ to $V^{\ast\ast}$ requires <em>no such choice</em>. In this way, we say that $V$ is isomorphic to its dual, but $V$ is <em>naturally</em> isomorphic to its double dual.</p>
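<p>Here is a minimal Python sketch of that distinction, with vectors modeled as tuples and functionals as plain functions. The names <code>evaluate_at</code>, <code>dual_via_basis</code>, <code>standard</code> and <code>skewed</code> are made up for this illustration. The map into the double dual is written without ever mentioning a basis, while the map into the dual has to take one as an argument, and its output changes when the basis does.</p>

```python
def evaluate_at(v):
    """Canonical map from V to V**: send v to 'evaluate a functional at v'.

    No basis appears anywhere in this definition.
    """
    return lambda functional: functional(v)


def dual_via_basis(v, basis_coords):
    """A map from V to V* that depends on a chosen basis.

    basis_coords(u) returns the coordinates of u in the chosen basis
    (an assumption of this sketch); the resulting functional pairs the
    coordinates of v with the coordinates of its input.
    """
    cv = basis_coords(v)
    return lambda w: sum(a * b for a, b in zip(cv, basis_coords(w)))


# Two different bases of R^2, each given by its coordinate function.
standard = lambda u: (u[0], u[1])          # basis (1,0), (0,1)
skewed = lambda u: (u[0] - u[1], u[1])     # basis (1,0), (1,1)

v, w = (1.0, 2.0), (3.0, 4.0)
f = lambda u: 2 * u[0] + u[1]              # some functional in V*

# The double-dual element attached to v simply evaluates f at v.
print(evaluate_at(v)(f) == f(v))
# The functional attached to v changes when the basis changes.
print(dual_via_basis(v, standard)(w), dual_via_basis(v, skewed)(w))
```

<p>The second print shows two different numbers for the same pair of vectors, which is exactly the basis dependence that the word “natural” rules out.</p>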
<hr />
<p>This idea of “naturality” is part of what category theory sets out to make precise.</p>
<p>A secondary motivation was to abstract away properties that are really only a result of some particular structure or construction, and don’t actually have much to do with the specific kind of object you’re working with. (If you’ve programmed much, the analog here would be “refactoring” commonly used pieces of code into a more general interface.)</p>
<p>A few such constructions would be things like products or quotients of objects, which are ubiquitous in mathematics. With products, for example, it is possible to construct a product of sets (which have very little structure), but we can also construct a product of vector spaces (which have a very rich structure). It’s then natural to ask, what commonalities do these constructions share? Which properties of a product of vector spaces are due to them being vector spaces, and which are just a result of its construction as a product? This is another area where category theory shines; notions such as products and quotients can be described in terms of <em>universal properties</em>, which pay no heed to what the underlying objects really are at all.</p>
<p>As a result, Category Theory provides a way of describing things in ways that are general enough to be applied very broadly. It is useful as both an organizational tool, and also as a general language and logical framework which has found use not only in various branches of mathematics, but also in logic, computer science, physics, philosophy, linguistics, and a host of other fields.</p>
<p>On one hand, it serves as “simplification through abstraction” – we move from studying individual trees to studying the forest as a whole. On the other hand, it also allows us to reason about entire collections of forests, and how to transport our findings from one forest to another.</p>
<p>In a nutshell, categories were invented as a framework to support <strong>functors</strong>, which were in turn invented to describe <strong>natural transformations</strong> between them, which are in turn used to define <strong>adjoints</strong>. Of course, many other useful categorical tools have been developed, but adjunction is really one of <strong>the</strong> key notions that category theory is meant to describe and support.</p>
<h3 id="interlude---what-is-an-adjoint">Interlude - What is an adjoint?</h3>
<hr />
<p>Adjunction is a slightly complicated concept, but informally speaking, <strong>functors</strong> map categories into other categories, and adjoints allow you to “approximate” one category by another. And in some cases, there is also an “inverse” to this approximation which takes you back to the original category.</p>
<p>For example, consider groups and sets; there are categories $\mathbf{Grp}$ and $\mathbf{Set}$ in which these objects live. A group is really just a set that is decorated with some additional structure - in this case, a binary operation that behaves somewhat like addition (it is associative, has an identity element, and every element has an inverse). Usually groups are given to you with an <em>a priori</em> notion of what this operation is, but what if this weren’t the case? If you were just given a set, is there any way to “upgrade” it to a group?</p>
<p>The answer is yes; if $X$ is any set, there is a construction called the <em>free group on</em> $X$, denoted $F(X)$, which goes something like this: given a set like $A = \{a, b\}$, one thinks of $A$ as a formal alphabet of symbols, and makes another set of “formal inverses” of $A$, say $B = \{a^{-1}, b^{-1}\}$. Then, take the set $G = A \coprod B = \{a,b,a^{-1},b^{-1}\}$, add an element $\varepsilon$ to denote an empty symbol, and define a group operation $\bigstar$ that is simply the concatenation of symbols together (subject to no rules or relations other than $x \bigstar \varepsilon = x$). We then stipulate that whenever something like $aa^{-1}$ occurs in a string (again, strictly as formal symbols over the alphabet $G$), there is a reduction operation that replaces this with $\varepsilon$. After quotienting out by an equivalence under these reductions, we produce something that is a well-defined group, and is somehow the minimal group that could have been made from the original set and no other information.</p>
<p>Then, there is something called a “forgetful functor” $\mathcal{F}$ from $\mathbf{Grp}$ into $\mathbf{Set}$ that takes a group and gives you only the underlying set, “forgetting” everything about its structure as a group. For example, if one took the group $\mathbb{Z}_2 = \{0,1\}$ with the group operation $0+1 =1+0 = 1, 0+0=1+1=0$ (i.e., the $XOR$ operation), then applying $\mathcal{F}$ to $(\mathbb{Z}_2, XOR)$ just gives you a two element set $\{a_0, a_1\}$.</p>
<p>Then $\mathcal{F}$ has a (left) adjoint $\mathcal{G}$, which creates the free group on that set, $F(\{a_0, a_1\})$. So if you apply $\mathcal{G} \circ \mathcal{F}$ to $\mathbb{Z}_2$, you end up back in $\mathbf{Grp}$, but you don’t get back the same group you started with - indeed, the free group consists of infinitely many strings over the alphabet $a_0, a_1, a_0^{-1}, a_1^{-1}$, while $\mathbb{Z}_2$ had only two elements. So this adjunction, the free group, provided a way to build a minimal group out of the set that remained after applying $\mathcal{F}$. For this reason, you’ll often hear of adjunction as “the most efficient” solution to a given problem, or as a form of “optimization”.</p>
<p>(In this case, however, there was only one group, up to isomorphism, with an underlying set of two elements, so if we knew the adjunction was applied, we could deduce what the original group was!)</p>
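<p>The free group construction can be sketched in a few lines of Python. This is only an illustration under an assumed encoding: a letter is a pair like <code>('a', 1)</code> or <code>('a', -1)</code>, a word is a tuple of letters, and the empty tuple plays the role of $\varepsilon$. The function names are hypothetical.</p>

```python
def inverse(letter):
    """Formal inverse of a letter, e.g. ('a', 1) becomes ('a', -1)."""
    symbol, exponent = letter
    return (symbol, -exponent)


def reduce_word(word):
    """Cancel adjacent pairs x x^-1 until none remain (stack-based)."""
    reduced = []
    for letter in word:
        if reduced and reduced[-1] == inverse(letter):
            reduced.pop()          # x followed by x^-1 collapses to nothing
        else:
            reduced.append(letter)
    return tuple(reduced)


def multiply(u, v):
    """Group operation: concatenate two words, then reduce."""
    return reduce_word(u + v)


def invert(word):
    """Inverse of a word: reverse it and invert every letter."""
    return tuple(inverse(letter) for letter in reversed(word))


a, b = ('a', 1), ('b', 1)
empty = ()                          # plays the role of epsilon

w = (a, b, inverse(b), a)           # the word a b b^-1 a
print(reduce_word(w))               # the reduced word a a
print(multiply(w, invert(w)) == empty)
print(multiply((a,), (b,)) == multiply((b,), (a,)))   # ab and ba differ
```

<p>Every word has an inverse and nothing is identified beyond the forced cancellations, which is the sense in which this group is “minimal”.</p>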
<hr />
<h1 id="definition-of-a-category">Definition of a Category</h1>
<ul>
<li>Informally, a category is a collection of <strong>objects</strong> and <strong>arrows</strong> between them.
<ul>
<li>Each arrow has a unique source and a target, both of which are objects, and arrows can be <strong>composed</strong> (definition later)</li>
<li>For example, there is a category <strong>Set</strong> where the objects are just normal sets, and the arrows are functions between sets. Composition of arrows is just the usual composition of functions.</li>
</ul>
</li>
<li>Simply put, categories are <strong>Directed Graphs</strong> (diagrams) with certain constraints on the edges
<ul>
<li>Nodes are <strong>objects</strong> in the category, edges are <strong>morphisms</strong> between objects, subject to:
<ul>
<li>Every node has an edge to itself</li>
<li>For any path of consecutive edges between two nodes, there is a direct edge between them (the composite of that path).</li>
</ul>
</li>
<li>Show free category construction</li>
</ul>
</li>
<li>The objects are black boxes - not allowed to “look into” them!</li>
<li>Formally, a category $C$ is two pieces of data:
<ul>
<li>$Ob(C)$, the class of <em>objects</em> of $C$,</li>
<li>$Hom(C)$, the <strong>set</strong> of <em>morphisms</em> between objects in $Ob(C)$
<ul>
<li>The morphisms from $X$ to $Y$ form a subcollection of $Hom(C)$ denoted $Hom_C(X,Y)$, where $X,Y \in Ob(C)$.</li>
</ul>
</li>
<li>Along with a binary operation $\circ$ which composes morphisms:
<ul>
<li>$\forall X,Y,Z \in Ob(C)$ where $f: X \rightarrow Y$ and $g: Y \rightarrow Z$, there exists the <strong>composition</strong> $h$ of $f$ and $g$, denoted $h = g \circ f$, where $h: X \rightarrow Z$.</li>
<li>Using types: <script type="math/tex">\circ: Hom_C(X,Y) \times Hom_C(Y,Z) \rightarrow Hom_C(X,Z)</script> <script type="math/tex">(f, g) \mapsto g \circ f</script></li>
</ul>
</li>
<li>And two rules:
<ul>
<li>Associativity of $\circ$, given by $f\circ(g\circ h) = (f\circ g)\circ h$</li>
<li>Existence of unique two-sided identities: $\forall X \in Ob(C), \exists id_X \in Hom(C)$ where $id_X: X \rightarrow X$. These satisfy
<ul>
<li>$\forall f: A\rightarrow X \in Hom(A,X), id_X \circ f = f$ and</li>
<li>$\forall g: X \rightarrow B \in Hom(X, B), g \circ id_X = g$.</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
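<p>For a finite category, the two rules above can be checked mechanically. The following is a rough Python sketch, not a standard library API: the encoding (arrows stored in a dict from a name to a <code>(source, target)</code> pair, identities named <code>('id', X)</code>) and the function <code>is_category</code> are assumptions made for this example, and the sample category with one arrow <code>f</code> is hypothetical.</p>

```python
from itertools import product


def is_category(objects, arrows, compose):
    """Check the identity and associativity laws on a finite category.

    arrows maps an arrow name to its (source, target) pair;
    compose(g, f) returns the name of 'g after f'.
    The identity on an object X is assumed to be named ('id', X).
    """
    # Every object needs an identity arrow from itself to itself.
    for x in objects:
        if arrows.get(('id', x)) != (x, x):
            return False
    # Identities must act as two-sided units.
    for f, (src, tgt) in arrows.items():
        if compose(f, ('id', src)) != f or compose(('id', tgt), f) != f:
            return False
    # Associativity for every composable triple f, g, h.
    for f, g, h in product(arrows, repeat=3):
        if arrows[f][1] == arrows[g][0] and arrows[g][1] == arrows[h][0]:
            if compose(h, compose(g, f)) != compose(compose(h, g), f):
                return False
    return True


# A hypothetical two-object category with one non-identity arrow f.
objects = {'X', 'Y'}
arrows = {('id', 'X'): ('X', 'X'), ('id', 'Y'): ('Y', 'Y'), 'f': ('X', 'Y')}


def compose(g, f):
    """Composite 'g after f'; here one of the two is always an identity."""
    assert arrows[f][1] == arrows[g][0], "arrows are not composable"
    if f[0] == 'id':
        return g
    if g[0] == 'id':
        return f
    raise ValueError("no such composite")


print(is_category(objects, arrows, compose))
```

<p>Dropping one of the identity arrows from <code>arrows</code> makes the check fail, which mirrors the requirement that every object carries an identity.</p>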
<h2 id="foundational-issues">Foundational Issues</h2>
<ul>
<li>Certain collections of objects are “too big” to be sets
<ul>
<li>Example: $(\exists R = \{ x : x \not \in x \}) \Rightarrow (R\in R \iff R\not\in R)$ (Contradiction)</li>
<li>So one can’t have a “set of all sets”, but we’d like to study such things
<ul>
<li>e.g. we’d like a category <strong>Set</strong> that contains all sets</li>
</ul>
</li>
<li>So we use <strong>classes</strong> (collections with restricted operations)
<ul>
<li>Classes that are sets: <strong>small classes</strong></li>
<li>Classes that are <em>not</em> sets: <strong>proper classes</strong></li>
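<li>The collection of <strong>objects</strong> in a category is in general a class, and is often a proper class (as in <strong>Set</strong>).</li>
<li>The collections of morphisms between two given objects (sometimes called <em>homsets</em>) are usually small classes, but can be either.</li>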
</ul>
</li>
</ul>
</li>
<li>In a sense, Category theory <em>subsumes</em> and generalizes set theory
<ul>
<li>There are mathematical camps that see it as a possible alternative to set theory for the foundations of math (see <strong>homotopy type theory</strong>)</li>
<li>In certain types of categories, <em>all</em> of mathematics can be formulated (see <strong>topos</strong>)</li>
</ul>
</li>
</ul>
<h1 id="examples">Examples</h1>
<p>Since a category can be quite an abstract object in and of itself, it’s useful to have a few concrete categories in mind to check new definitions and theorems against. Here are a number of toy examples you can use.</p>
<hr />
<p>Here, I’ll just cover what I think are the three most important parts of recognizing that some structure you’ve used is a category - the objects, the morphisms, and what kind of morphisms are called isomorphisms in that category. Checking the categorical axioms is pretty routine and perhaps not as enlightening, so we’ll skip that for now.</p>
<p>That being said, here’s how the examples are formatted:</p>
<ul>
<li><script type="math/tex">\mathbf{Name}</script>: A somewhat informal name I’ve given to the category as a whole. Some names are more “official”, but these vary a lot across the literature. Some categories aren’t named at all, so I’ve supplied arbitrary names in some cases. Note that some categories are named after their object classes (<script type="math/tex">\mathbf{Set}</script>), while others are actually named after their morphism classes (<script type="math/tex">\mathbf{Mat}</script>). Category names are usually typeset in <em>mathbf</em>.
<ul>
<li><em>Objects</em>: Describes the entire class <script type="math/tex">Ob(C)</script>, and gives an example of what the full data of two distinct members <script type="math/tex">X, Y</script> in <script type="math/tex">Ob(C)</script> might look like. I’ve tried to match the notation to the domain-specific notation one might use when working in each individual category.</li>
<li><em>Morphisms</em>: Denotes what the entire class <script type="math/tex">Hom(C)</script> looks like, as well as what a morphism <script type="math/tex">f: X \rightarrow Y \in Hom_C(X,Y) \subseteq Hom(C)</script> looks like.</li>
<li><em>Isomorphisms</em>: Denotes what conditions one puts on a morphism <script type="math/tex">f: X\rightarrow Y</script> , and perhaps a corresponding morphism <script type="math/tex">g : Y \rightarrow X</script>, in order to recognize <script type="math/tex">X, Y</script> as isomorphic objects in this category. (Often denoted <script type="math/tex">X \cong Y</script>)</li>
</ul>
</li>
</ul>
<p><em>Notes</em>: Some of these categories are constructed, and their construction is easier to demonstrate on a blackboard. I’ve included notes to explain how this is done for a few examples.</p>
<h3 id="constructions">Constructions</h3>
<p>Here, I’ll explicitly describe the full set of objects, and the full set of morphisms.</p>
<ul>
<li><script type="math/tex">\mathbf{2}</script> (The minimal category on two objects)
<ul>
<li>Objects: <script type="math/tex">\{a,b\}</script> (A category made out of two arbitrary objects)</li>
<li>Morphisms: <script type="math/tex">\{Id_a: a \mapsto a, Id_b: b \mapsto b\}</script></li>
<li>Isomorphisms: None apart from the identities (There is no morphism from <script type="math/tex">b</script> to <script type="math/tex">a</script>.)</li>
</ul>
</li>
<li>
<script type="math/tex; mode=display">\mathbf{2'}</script>
<ul>
<li>Objects: <script type="math/tex">\{a,b\}</script></li>
<li>Morphisms: <script type="math/tex">\{Id_a: a \mapsto a, Id_b: b \mapsto b\} \cup \{ \bigstar: a \mapsto b \}</script></li>
<li>Isomorphisms: None apart from the identities (There is no morphism from <script type="math/tex">b</script> to <script type="math/tex">a</script>.)</li>
</ul>
</li>
</ul>
<p><em>Notes:</em> Here I just took <script type="math/tex">\mathbf{2}</script> and added in a single extra morphism. The star symbol is used here just to denote the fact that this mapping is completely made up, and that arrows in categories don’t have to be “functions” in the traditional sense at all. Each arrow is just <em>some</em> way to associate a source object with a target object.</p>
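<p>To make the point that arrows need not be functions, here is a small Python sketch of a category whose arrows are nothing more than pairs of numbers: there is an arrow <code>(a, b)</code> exactly when <code>a</code> divides <code>b</code>. The choice of divisibility and the helper name <code>divisibility_category</code> are assumptions of this example, not something defined above.</p>

```python
def divisibility_category(numbers):
    """Build a category whose arrows are bare pairs (a, b) with a dividing b."""
    arrows = {(a, b) for a in numbers for b in numbers if b % a == 0}

    def compose(g, f):
        """Composite 'g after f': (a, b) followed by (b, c) gives (a, c)."""
        (a, b), (b2, c) = f, g
        if b != b2:
            raise ValueError("arrows are not composable")
        return (a, c)

    return arrows, compose


numbers = [1, 2, 3, 6, 12]
arrows, compose = divisibility_category(numbers)

print((2, 6) in arrows, (2, 3) in arrows)     # 2 divides 6, 2 does not divide 3
print(compose((6, 12), (2, 6)))               # the arrow from 2 to 12
print(all((n, n) in arrows for n in numbers)) # every object has an identity
```

<p>Nothing here is “applied” to anything: an arrow only records that its source is related to its target, and composition works because divisibility is transitive.</p>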
<ul>
<li><script type="math/tex">\mathbf{n}</script> (The minimal category on <script type="math/tex">n</script> objects)
<ul>
<li>Objects: <script type="math/tex">\{a_1, a_2, \cdots, a_n\}</script> (A category made out of <script type="math/tex">n</script> arbitrary objects)</li>
<li>Morphisms: <script type="math/tex">\{Id_{a_1}: a_1 \mapsto a_1, Id_{a_2}: a_2 \mapsto a_2, \cdots , Id_{a_n}: a_n \mapsto a_n\}</script></li>
<li>Isomorphisms: None apart from the identities (There are no morphisms from <script type="math/tex">a_i</script> to <script type="math/tex">a_j</script> for any <script type="math/tex">i \neq j</script>)</li>
</ul>
</li>
</ul>
<p><em>Notes</em>: This just shows that you can make a category out of any set of objects by only supplying identity morphisms - such a category is called <em>discrete</em>. Also note that since every object <em>must</em> have an identity morphism anyway, the objects themselves don’t really matter at all. If we wanted, we could just identify every object with its identity morphism and define categories entirely in terms of morphisms. Practically speaking, though, keeping the notion of objects around makes categories a little easier to work with.</p>
<p>Also, note that it didn’t matter that <script type="math/tex">n</script> was finite here - this construction works for any set <script type="math/tex">X</script>, yielding <script type="math/tex">\mathbf{Dis(X)}</script> (the discrete category on <script type="math/tex">X</script>)</p>
<ul>
<li>
<script type="math/tex; mode=display">\mathbf{3'}</script>
<ul>
<li>Objects: <script type="math/tex">\{a, b\} \cup \{c\}</script> (A “minimally interesting” extension of <script type="math/tex">\mathbf{2'}</script>)</li>
<li>Morphisms: <script type="math/tex">\{\bigstar: a \mapsto b, Id_a: a \mapsto a, Id_b: b \mapsto b\}</script>
<script type="math/tex">\cup~\{ Id_c: c\mapsto c\}</script>
<script type="math/tex">\cup~\{\clubsuit: b \mapsto c\}</script>
<script type="math/tex">\cup~\{ \sharp: a\mapsto c \text{ where } \sharp(a) = (\clubsuit \circ \bigstar)(a) \}</script></li>
<li>Isomorphisms: None apart from the identities (There is no map from <script type="math/tex">b</script> to <script type="math/tex">a</script>, <script type="math/tex">c</script> to <script type="math/tex">a</script>, or <script type="math/tex">c</script> to <script type="math/tex">b</script>.)</li>
</ul>
</li>
</ul>
<p><em>Notes</em>: The wacky symbols are again used to denote that these mappings are absolutely arbitrary.</p>
<p>A quick explanation of what I mean by “minimally interesting”, though: Given <script type="math/tex">\mathbf{2'}</script>, note that there are really only a few things we can do with it at this point. We could add another morphism, <script type="math/tex">b \mapsto a</script>, inverse to <script type="math/tex">\bigstar</script>, and we would get a category where <script type="math/tex">b\cong a</script>.</p>
<p>The other thing we can do is add in a single object <script type="math/tex">c</script>. We are forced to add an identity morphism for this to be a category, which is what the first union in the morphisms section supplies.</p>
<p>At this point, we just have <script type="math/tex">\mathbf{2'}</script> with an isolated third object, so we look to modify the morphisms a bit to get something slightly different. There are a few choices here, but we’ll go with one of the more interesting ones: a morphism <script type="math/tex">\clubsuit</script> from an existing object <script type="math/tex">b</script> to the new object <script type="math/tex">c</script>.</p>
<p>However, this won’t be a category unless it satisfies the axiom of composition, so we’re forced to add in a morphism that looks like <script type="math/tex">\sharp</script>.</p>
<p>Denote
<script type="math/tex">\bigstar</script> by <script type="math/tex">f</script>,
<script type="math/tex">\clubsuit</script> by <script type="math/tex">g</script>, and
<script type="math/tex">\sharp</script> by <script type="math/tex">g \circ f</script>,</p>
<p>and you get something that perhaps looks a little more familiar:</p>
<p><img src="https://i.imgur.com/016ixGX.png" alt="The Category 3" /></p>
<p>If you haven’t seen this before, don’t worry - you will! This particular kind of diagram shows up in many algebraic constructions (quotients and products, to name a few), and understanding it is the first step in getting a handle on things like universal properties.</p>
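<p>The way the composite arrow is forced on us can be sketched in Python by generating every path through a small graph. This is a hedged illustration of the free category on a finite acyclic graph: the function name <code>free_category_arrows</code> and the edge names <code>star</code> and <code>club</code> (standing in for the two arbitrary arrows above) are made up for the example, and the loop would not terminate on a graph with cycles.</p>

```python
def free_category_arrows(objects, edges):
    """List all arrows of the free category on a finite acyclic graph.

    edges maps an edge name to (source, target). Each arrow is a triple
    (path, source, target), where path is a tuple of edge names in the
    order they are traversed; the empty path at an object is its identity.
    """
    # Identities: one empty path per object.
    paths = {((), x, x) for x in objects}
    frontier = set(paths)
    while frontier:
        new_paths = set()
        for names, src, tgt in frontier:
            # Extend each known path by every edge leaving its target.
            for edge, (e_src, e_tgt) in edges.items():
                if e_src == tgt:
                    new_paths.add((names + (edge,), src, e_tgt))
        frontier = new_paths - paths
        paths |= frontier
    return paths


objects = ['a', 'b', 'c']
edges = {'star': ('a', 'b'), 'club': ('b', 'c')}

all_arrows = free_category_arrows(objects, edges)
for names, src, tgt in sorted(all_arrows):
    print(src, 'to', tgt, 'via', names or 'identity')
```

<p>Two edges and three objects produce six arrows: the three identities, the two edges, and the one composite path from <code>a</code> to <code>c</code> that nobody asked for but composition demands.</p>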
<h3 id="more-standard-examples">More Standard Examples</h3>
<p>Here are some common examples of categories that arise in various contexts, roughly in increasing order of complexity.</p>
<ul>
<li>
<script type="math/tex; mode=display">\mathbf{Set}</script>
<ul>
<li>Objects: Sets <script type="math/tex">A, B</script></li>
<li>Morphisms: Set functions <script type="math/tex">f: A \rightarrow B</script></li>
<li>Isomorphisms: Bijective set functions <script type="math/tex">f: A\rightarrow B</script>
<ul>
<li><script type="math/tex">f</script> is bijective iff <script type="math/tex">f</script> is both
<ul>
<li>injective: <script type="math/tex">\forall a_1, a_2 \in A, f(a_1) = f(a_2) \Rightarrow a_1 = a_2</script>
<ul>
<li>This lets you construct a <em>left inverse</em></li>
</ul>
</li>
<li>surjective: <script type="math/tex">\forall b \in B, \exists a\in A : b = f(a)</script>
<ul>
<li>This lets you construct a <em>right inverse</em></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p><em>Notes</em>: These conditions allow you to construct a <script type="math/tex">g: B\rightarrow A</script> such that</p>
<ul>
<li><script type="math/tex">\forall b\in B, (f\circ g)(b) = b</script>
<ul>
<li>(i.e. <script type="math/tex">f\circ g = id_B</script> as a function)</li>
</ul>
</li>
<li><script type="math/tex">\forall a \in A, (g\circ f)(a) = a</script>
<ul>
<li>(i.e. <script type="math/tex">g\circ f = id_A</script> as a function)</li>
</ul>
</li>
</ul>
<p>So we refer to <script type="math/tex">g</script> as <strong>the</strong> two-sided inverse and call it <script type="math/tex">f^{-1}</script>, which is unique when it exists. In many common cases, the objects in a category are “built” out of sets. These categories are called concrete, and the isomorphisms in these categories end up just being isomorphisms of the underlying sets, along with some other structure-preserving conditions. Thus understanding how morphisms and isomorphisms in <script type="math/tex">\mathbf{Set}</script> are constructed is a key first step.</p>
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Poset}</script>
<ul>
<li>Objects: Partially-ordered sets <script type="math/tex">(P, \leq)</script>, <script type="math/tex">(Q, \prec)</script>
<ul>
<li>Recall that partial orders are reflexive, transitive, antisymmetric binary relations.</li>
</ul>
</li>
<li>Morphisms: Order-preserving set functions <script type="math/tex">f: P \rightarrow Q</script>, i.e. if <script type="math/tex">x\leq y</script> then <script type="math/tex">f(x) \prec f(y)</script></li>
<li>Isomorphisms: Bijective functions <script type="math/tex">f: P \rightarrow Q</script> such that
for all <script type="math/tex">x,y\in P</script>, <script type="math/tex">x\leq y</script> if and only if <script type="math/tex">f(x) \prec f(y)</script></li>
</ul>
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Rel}</script>
<ul>
<li>Objects: Binary Relations <script type="math/tex">(X, \sim), (Y, \propto)</script>
<ul>
<li>Here <script type="math/tex">X</script> is just a set and <script type="math/tex">\sim \subseteq X\times X</script> is a binary relation.</li>
</ul>
</li>
<li>Morphisms: Relation-preserving set functions <script type="math/tex">f: X \rightarrow Y</script> such that <script type="math/tex">\forall a,b\in X, a\sim b \Rightarrow f(a)\propto f(b)</script></li>
<li>Isomorphisms: Bijective set functions (as in <script type="math/tex">\mathbf{Set}</script>) with an inverse <script type="math/tex">g: Y \rightarrow X</script> such that <script type="math/tex">\forall c,d \in Y, c \propto d \Rightarrow g(c) \sim g(d)</script>.</li>
</ul>
<p><em>Notes</em>: This works for any binary relation - for example, take <script type="math/tex">\mathbb{Z}</script> with <script type="math/tex">a \sim b</script> iff <script type="math/tex">a</script> divides <script type="math/tex">b</script>.</p>
<p>Also notice that to get an isomorphism, all we really did was take an isomorphism on the underlying set, and require that the inverse also satisfy the conditions of the morphisms in this category. So really, it required <script type="math/tex">f</script> to be bijective in <script type="math/tex">\mathbf{Set}</script>, then <script type="math/tex">g=f^{-1}</script> just needed to <em>also be a morphism in</em> <script type="math/tex">\mathbf{Rel}</script>. We’ll see this pattern in almost every concrete category!</p>
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Grp}</script>
<ul>
<li>Objects: Groups <script type="math/tex">(G, \star), (H, \diamond)</script></li>
<li>Morphisms: Group homomorphisms <script type="math/tex">\varphi: (G, \star) \rightarrow (H, \diamond)</script> where <script type="math/tex">\forall x,y \in G</script>, <script type="math/tex">\varphi(x\star y) = \varphi(x) \diamond \varphi(y)</script></li>
<li>Isomorphisms: Bijective group homomorphisms.
<ul>
<li>These are found by finding a <script type="math/tex">\varphi</script> that is bijective as a set function (as described in <script type="math/tex">\mathbf{Set}</script>) and is also a homomorphism (as described above).</li>
<li>Then <script type="math/tex">\varphi^{-1}</script> can be constructed as a set function, and a result from group theory shows that <script type="math/tex">\varphi^{-1}</script> is also a homomorphism.</li>
</ul>
</li>
</ul>
<p><em>Notes</em>: This is the first case in a very common pattern - the isomorphisms in this category were just set bijections, <em>but they preserve the structure of the objects</em>. In this case, homomorphisms end up being the kind of morphisms you need to preserve the fundamental pieces of a group’s structure. They preserve the binary operation (by definition) and associativity (from function composition), but they also end up preserving inverses, identities, and information about the elements themselves like order.</p>
<p>This can be summed up with a wave of the hand by saying that the isomorphisms in a category are just <em>invertible structure-preserving morphisms</em>.</p>
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Ring}</script>
<ul>
<li>Objects: Rings <script type="math/tex">(R, +, \times)</script></li>
<li>Morphisms: Ring homomorphisms <script type="math/tex">\varphi: (R, +, \times) \rightarrow (S, \star, \diamond)</script> where <script type="math/tex">\varphi(a\times(b+c)) = \varphi(a)\diamond(\varphi(b) \star \varphi(c))</script></li>
<li>Isomorphisms: Bijective ring homomorphisms</li>
</ul>
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Ab} = \mathbf{Mod_\mathbb{Z}}</script>
<ul>
<li>Objects: Abelian groups (equivalently, left <script type="math/tex">R</script>-modules with <script type="math/tex">R = \mathbb{Z}</script>)</li>
<li>Morphisms: Homomorphisms of abelian groups <script type="math/tex">\varphi: G \rightarrow H</script></li>
<li>Isomorphisms: Bijective group homomorphisms</li>
</ul>
</li>
</ul>
<p><em>Notes</em></p>
<ul>
<li>
<script type="math/tex; mode=display">\mathbf{Vect_k}</script>
<ul>
<li>Objects: Vector spaces over a field <script type="math/tex">k</script>, say <script type="math/tex">V, W</script></li>
<li>Morphisms: <script type="math/tex">k</script>-linear maps <script type="math/tex">T: V \rightarrow W</script>
<ul>
<li>These are maps <script type="math/tex">T</script> such that <script type="math/tex">\forall v_1, v_2 \in V, \forall c\in k</script>, we have <script type="math/tex">T(v_1 + cv_2) = T(v_1) + cT(v_2)</script>.</li>
</ul>
</li>
<li>Isomorphisms: Invertible linear maps
<ul>
<li>These are maps between sets of <em>vectors</em> of <script type="math/tex">V</script> and <script type="math/tex">W</script> which are bijective functions on these sets (again, just as in <script type="math/tex">\mathbf{Set}</script>) with the restriction that they obey the linearity condition from above.</li>
</ul>
</li>
</ul>
<p><em>Notes</em>: The presence of <script type="math/tex">k</script> is just a generalization - if you haven’t seen a lot of algebra, you can just take <script type="math/tex">k=\mathbb{R}</script> and think of the category of all vector spaces over <script type="math/tex">\mathbb{R}</script>. Then every finite-dimensional object in this category is (up to isomorphism) just <script type="math/tex">\mathbb{R}^n</script> for some <script type="math/tex">n</script>, and the maps are just the usual linear maps you’d see in an undergraduate course on linear algebra.</p>
<p>Notice how the pattern seen in <script type="math/tex">\mathbf{Grp}</script> continues here - to get an isomorphism, you just look at all of the functions between the underlying sets (this is a large set!), take only the bijections, then filter it even further by taking the bijections which preserve the structure you care about.</p>
<p>Here, the structure-preserving maps in vector spaces end up being <em>linear maps</em>. You might notice that the condition of linearity looks very similar to the condition for homomorphisms - only now, the operations in both the source and target are the same.</p>
<p>Informally, this is because you essentially get vector spaces by taking a group, tacking on a field <script type="math/tex">k</script>, then adding a few more axioms - so the linearity condition is really just a souped-up homomorphism on the underlying group (here, vectors under addition) that takes into account the remaining axioms (namely, scalar multiplication).</p>
<p>The point of this example is to show that (generally speaking) as more structure is put on the objects, more restrictions will need to be put on the morphisms to retain that structure.</p>
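<p>The “bijection is not enough” point can be spot-checked numerically. The sketch below is only a sanity check on a handful of sample vectors and scalars (it proves nothing in general), and the maps <code class="highlighter-rouge">rotate_and_scale</code> and <code class="highlighter-rouge">shift</code> are hypothetical examples on <script type="math/tex">\mathbb{R}^2</script>.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from itertools import product

def apply(matrix, v):
    """Apply a matrix (given as a list of rows) to a vector (a tuple)."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in matrix)

def respects_linearity(T, vectors, scalars):
    """Spot-check T(v1 + c*v2) == T(v1) + c*T(v2) on the given samples."""
    for v1, v2, c in product(vectors, vectors, scalars):
        lhs = T(tuple(a + c * b for a, b in zip(v1, v2)))
        rhs = tuple(a + c * b for a, b in zip(T(v1), T(v2)))
        if lhs != rhs:
            return False
    return True

def rotate_and_scale(v):
    return apply([[0, -2], [2, 0]], v)  # given by a matrix, so linear

def shift(v):
    return (v[0] + 1, v[1])             # a bijection of the plane, but not linear

samples = [(0, 0), (1, 0), (0, 1), (2, -3)]
scalars = [-2, 0, 1, 5]

print(respects_linearity(rotate_and_scale, samples, scalars))
print(respects_linearity(shift, samples, scalars))</code></pre></figure>
<p>Both maps are bijections on the underlying set, but only the first one survives the extra filter of preserving addition and scalar multiplication.</p>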
</li>
<li><script type="math/tex">\mathbf{Logic_0}</script> (Propositional / “0-order” Logic)
<ul>
<li>Objects: Propositions <script type="math/tex">P, Q</script></li>
<li>Morphisms: Implications <script type="math/tex">P \Rightarrow Q</script>, or “P implies Q”
<ul>
<li>Also known as deductions</li>
</ul>
</li>
<li>Isomorphisms: Logical equivalences – pairs of propositions <script type="math/tex">P, Q</script> such that <script type="math/tex">P \iff Q</script></li>
</ul>
<p><em>Notes</em>: This can be thought of as “the category of proofs”, and such a category can be derived from any deductive system. The isomorphisms here are “if and only if” statements, and they are often exploited in Mathematics to create <em>definitions</em>.
(In other words, every Mathematical definition is an iff statement, and any proposition isomorphic to a definition in this category can be taken as an equivalent definition.)</p>
</li>
<li>
<p><script type="math/tex">\mathbf{Aut}</script> (Finite state automata)</p>
<ul>
<li>
<p>Objects: Finite state automata <script type="math/tex">(Q, \Sigma, \delta, q_0, F), (Q', \Sigma, \delta', q'_0, F')</script></p>
<ul>
<li><script type="math/tex">Q</script> is the set of states</li>
<li><script type="math/tex">\Sigma</script> is the input alphabet</li>
<li><script type="math/tex">\delta : \Sigma \times Q \rightarrow Q</script> is a transition map</li>
<li><script type="math/tex">q_0\in Q</script> is the initial state</li>
<li><script type="math/tex">F \subseteq Q</script> is the set of final/accepting states</li>
</ul>
</li>
<li>
<p>Morphisms: Simulations <script type="math/tex">f: Q \rightarrow Q'</script> such that</p>
<ul>
<li>
<script type="math/tex; mode=display">\forall \sigma\in\Sigma, \forall q\in Q, ~ f(\delta(\sigma, q)) = \delta'(\sigma, f(q))</script>
<ul>
<li>(Transitions are preserved)</li>
</ul>
</li>
<li>
<script type="math/tex; mode=display">f(q_0) = {q'}_0</script>
<ul>
<li>(Initial states are mapped to each other)</li>
</ul>
</li>
<li>
<script type="math/tex; mode=display">f(F) \subseteq F'</script>
<ul>
<li>(Accepting states are preserved)</li>
</ul>
</li>
</ul>
</li>
<li>
<p>Isomorphisms: Simulations that are bijective on the underlying sets of states, and whose inverse is also a simulation. Note that this forces <script type="math/tex">g=f^{-1}: Q' \rightarrow Q</script> to exist, and</p>
<ul>
<li><script type="math/tex">\forall \sigma\in\Sigma,~\forall q\in Q, (g\circ f)(\delta(\sigma, q)) = \delta(\sigma, q)</script>, so <script type="math/tex">g\circ f = id_Q</script>. Similarly, <script type="math/tex">f\circ g = id_{Q'}</script></li>
<li>
<script type="math/tex; mode=display">g(q'_0) = q_0</script>
</li>
<li>
<script type="math/tex; mode=display">f(F) = F'</script>
</li>
</ul>
</li>
</ul>
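<p>The three simulation conditions are easy to check mechanically for small automata. Here is a minimal Python sketch, assuming automata are stored as plain dictionaries and the state map <script type="math/tex">f</script> as a dict; the <code class="highlighter-rouge">counter</code> automata and the names <code class="highlighter-rouge">is_simulation</code>, <code class="highlighter-rouge">parity</code> and <code class="highlighter-rouge">collapse</code> are hypothetical.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def is_simulation(f, source, target):
    """Check the three simulation conditions for a state map f (a dict from Q to Q')."""
    # Transitions are preserved: f(delta(sigma, q)) == delta'(sigma, f(q)).
    transitions_ok = all(
        f[source["delta"][(sigma, q)]] == target["delta"][(sigma, f[q])]
        for sigma in source["alphabet"]
        for q in source["states"]
    )
    # The initial state is sent to the initial state.
    start_ok = f[source["start"]] == target["start"]
    # Accepting states land inside the accepting states.
    accepting_ok = {f[q] for q in source["accepting"]}.issubset(target["accepting"])
    return transitions_ok and start_ok and accepting_ok

def counter(modulus):
    """Hypothetical automaton: counts the letter 'a' modulo `modulus`, accepts at 0."""
    states = range(modulus)
    return {
        "states": states,
        "alphabet": ["a"],
        "delta": {("a", q): (q + 1) % modulus for q in states},
        "start": 0,
        "accepting": {0},
    }

mod4, mod2 = counter(4), counter(2)
parity = {q: q % 2 for q in mod4["states"]}    # remembers only even/odd
collapse = {q: 0 for q in mod4["states"]}      # sends every state to 0

print(is_simulation(parity, mod4, mod2))
print(is_simulation(collapse, mod4, mod2))</code></pre></figure>
<p>The parity map passes all three checks, while the constant map fails the transition condition on the very first state.</p>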
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Graph}</script>
<ul>
<li>Objects: Graphs <script type="math/tex">G = (V_1, E_1), H =(V_2, E_2)</script> where <script type="math/tex">E_i \subseteq V_i\times V_i</script></li>
<li>Morphisms: maps <script type="math/tex">f: V_1 \rightarrow V_2</script> where <script type="math/tex">(v,w) \in E_1 \Rightarrow (f(v), f(w)) \in E_2</script>
<ul>
<li>i.e. maps between vertex sets that preserve incidence relations.</li>
</ul>
</li>
<li>Isomorphisms: Bijective graph morphisms whose inverse is also a graph morphism</li>
</ul>
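<p>With edges stored as a set of ordered pairs, the morphism condition is a one-liner. The sketch below assumes directed graphs given by their edge sets and vertex maps given as dicts; the two cycle graphs are illustrative choices.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def is_graph_morphism(f, edges_source, edges_target):
    """True if every edge (v, w) of the source is sent to an edge (f(v), f(w))."""
    return all((f[v], f[w]) in edges_target for v, w in edges_source)

# Illustrative graphs: a directed 4-cycle and a directed 2-cycle.
four_cycle = {(0, 1), (1, 2), (2, 3), (3, 0)}
two_cycle = {(0, 1), (1, 0)}

wrap = {0: 0, 1: 1, 2: 0, 3: 1}      # wraps the 4-cycle twice around the 2-cycle
constant = {0: 0, 1: 0, 2: 0, 3: 0}  # would need a loop at 0, which does not exist

print(is_graph_morphism(wrap, four_cycle, two_cycle))
print(is_graph_morphism(constant, four_cycle, two_cycle))</code></pre></figure>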
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Mat(\mathbb{F})}</script>
<ul>
<li>Objects: Natural numbers <script type="math/tex">m, n</script></li>
<li>Morphisms: <script type="math/tex">A:m \rightarrow n</script> is an <script type="math/tex">m\times n</script> matrix with entries in the underlying field <script type="math/tex">\mathbb{F}</script></li>
<li>Isomorphisms: Invertible matrices, i.e. an <script type="math/tex">A: m \rightarrow n</script> for which there exists a <script type="math/tex">B: n\rightarrow m</script>, i.e. an <script type="math/tex">n\times m</script> matrix, such that <script type="math/tex">AB = I_m</script> and <script type="math/tex">BA = I_n</script>
<ul>
<li>Note that this can only possibly happen when <script type="math/tex">n=m</script>, so <script type="math/tex">A,B</script> are square. Conversely, when <script type="math/tex">n=m</script> we can <em>always</em> just take the identity matrix <script type="math/tex">I_n = I_n^{-1}</script>. So two objects are isomorphic exactly when they are equal as natural numbers.</li>
</ul>
</li>
</ul>
<p><em>Notes</em>: This category is a little different - the objects don’t matter much, since they’re really just keeping track of matrix dimensions. Instead, the morphisms themselves are the data this category encodes.</p>
<p>While this seems like an odd category to consider, the kicker is it’s possible to prove that there is a “full, faithful, essentially surjective functor” from <script type="math/tex">\mathbf{Mat}(\mathbb{F})</script> to <script type="math/tex">\mathbf{Vec}(\mathbb{F})</script> (restricted to finite dimensional vector spaces) - in other words, one can move between these categories without losing any vital information. In this case, this tells us that when working with (finite dimensional) vector spaces, it doesn’t matter whether you study abstract linear maps or the matrices that represent them!</p>
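<p>To see how the objects only track dimensions, here is a small Python sketch of composition in this category. It assumes the convention used above (a morphism <script type="math/tex">m \rightarrow n</script> is an <script type="math/tex">m \times n</script> matrix), so composing <script type="math/tex">A: m \rightarrow n</script> with <script type="math/tex">B: n \rightarrow p</script> is the product <script type="math/tex">AB</script>; the function names are hypothetical and matrices are plain lists of rows.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def compose(A, B):
    """Compose A (from m to n) with B (from n to p) as the matrix product A*B."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError("target of the first morphism must match source of the second")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

def identity(n):
    """The identity morphism on the object n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2, 3],
     [4, 5, 6]]          # a morphism from 2 to 3
B = [[1], [0], [-1]]     # a morphism from 3 to 1

print(compose(A, B))                    # a morphism from 2 to 1
print(compose(identity(2), A) == A)     # identities act trivially on the left
print(compose(A, identity(3)) == A)     # ... and on the right</code></pre></figure>
<p>Trying <code class="highlighter-rouge">compose(B, A)</code> raises an error, which is exactly the statement that those two arrows are not composable in that order.</p>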
</li>
<li><script type="math/tex">\mathbf{Hask}</script> (pseudo-category)
<ul>
<li>Objects: Haskell types <script type="math/tex">A, B</script></li>
<li>Morphisms: Functions <script type="math/tex">f: A \rightarrow B</script></li>
<li>Isomorphisms: Types <script type="math/tex">A,B</script> for which there exist functions <script type="math/tex">f: A\rightarrow B, g: B \rightarrow A</script> such that <script type="math/tex">(f . g)~b = id ~b</script> and <script type="math/tex">(g . f)~a = id ~a</script>
<ul>
<li>Note: From the compiler’s point of view, <em>function</em> equivalence is perhaps the more interesting/important thing to look at!</li>
</ul>
</li>
</ul>
</li>
<li>
<script type="math/tex; mode=display">\mathbf{\lambda-Calc}</script>
<ul>
<li>Objects: Typed lambda calculi</li>
<li>Morphisms: Translations that map types to types, terms to terms, and preserve equations (<script type="math/tex">\alpha</script> conversions, <script type="math/tex">\beta</script> reductions, etc)</li>
</ul>
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Diff}</script>
<ul>
<li>Objects: Smooth manifolds <script type="math/tex">(\mathcal{M}, \mathcal{A})</script>, where <script type="math/tex">\mathcal{M}</script> is a topological manifold (locally homeomorphic to <script type="math/tex">\mathbb{R}^n</script>), and <script type="math/tex">\mathcal{A}</script> is a maximal smooth atlas on <script type="math/tex">\mathcal{M}</script>.</li>
<li>Morphisms: Smooth maps <script type="math/tex">F: (\mathcal{M_1}, \mathcal{A_1}) \rightarrow (\mathcal{M_2}, \mathcal{A_2})</script>, i.e. maps such that for every chart <script type="math/tex">\phi \in \mathcal{A_1}</script> on <script type="math/tex">\mathcal{M_1}</script> and every chart <script type="math/tex">\psi \in \mathcal{A_2}</script> on <script type="math/tex">\mathcal{M_2}</script>, the coordinate representation <script type="math/tex">\psi \circ F \circ \phi^{-1} = (f_1, f_2, \cdots)</script> has continuous partial derivatives of all orders (starting with the <script type="math/tex">\frac{\partial f_i}{\partial x_j}</script>) wherever it is defined.</li>
<li>Isomorphisms: Diffeomorphisms - morphisms <script type="math/tex">F</script> with a smooth inverse <script type="math/tex">G</script>.</li>
</ul>
<p><em>Notes</em>: This is where differential geometry and a fair amount of topology take place, as well as certain branches of analysis, partial differential equations, and physics.</p>
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Meas}</script>
<ul>
<li>Objects: Measurable spaces <script type="math/tex">(X, \mathcal{\Sigma}_X), (Y, \mathcal{\Sigma}_Y)</script>
<ul>
<li>(Where the <script type="math/tex">\Sigma \subset 2^X</script> are <script type="math/tex">\sigma</script>-algebras over their respective sets, and the members of <script type="math/tex">\Sigma</script> are called the measurable sets)</li>
<li>Note that these are measur<strong>able</strong> spaces, not measure spaces - this is a space to which a measure <script type="math/tex">\mu</script> can be assigned. The triple <script type="math/tex">(X, \Sigma, \mu_X)</script> would be a <strong>measure</strong> space.</li>
</ul>
</li>
<li>
<p>Morphisms: Measurable functions <script type="math/tex">f: (X, \Sigma_X) \rightarrow (Y, \Sigma_Y)</script> such that
<script type="math/tex">E \in \Sigma_Y \Rightarrow f^{-1}(E) \in \Sigma_X</script></p>
<p>(Where <script type="math/tex">f^{-1}</script> denotes the preimage under <script type="math/tex">f</script>)</p>
</li>
<li>Isomorphisms: Measurable functions <script type="math/tex">f</script> with measurable inverses <script type="math/tex">g: (Y, \Sigma_Y) \rightarrow (X, \Sigma_X)</script> where <script type="math/tex">F \in \Sigma_X \Rightarrow g^{-1}(F) \in \Sigma_Y</script></li>
</ul>
<p><em>Notes</em>: This is where probability theory happens.</p>
<p>Also, it turns out to actually be very tricky to formulate measure theory in a categorical way! If we try to look at the category of <strong>measure</strong> spaces, it turns out that adding the actual measure <script type="math/tex">\mu</script> to a measurable space is in some sense “too strong” of a condition, and the resulting category lacks many useful properties.</p>
<p>(In particular, it precludes the possibility of having a structure called the “categorical product”. Attempts to formalize measure/probability theory in categorical terms are a topic of relatively current research.)</p>
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Top}</script>
<ul>
<li>Objects: Topological Spaces <script type="math/tex">(X, \mathcal{T}_X)</script></li>
<li>Morphisms: Continuous functions <script type="math/tex">f: (X, \mathcal{T}_X) \rightarrow (Y, \mathcal{T}_Y)</script> such that if <script type="math/tex">U</script> is open in <script type="math/tex">Y</script>, then <script type="math/tex">f^{-1}(U)</script> is open in <script type="math/tex">X</script>.
<ul>
<li>Note that this is equivalent to <script type="math/tex">U \in \mathcal{T}_Y \Rightarrow f^{-1}(U) \in \mathcal{T}_X</script></li>
</ul>
</li>
<li>Isomorphisms: Homeomorphisms, i.e. continuous functions <script type="math/tex">f</script> that have an inverse <script type="math/tex">g</script> (as in <script type="math/tex">\mathbf{Set}</script>) where <script type="math/tex">g</script> is also a continuous function.</li>
</ul>
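<p>On a finite set, the preimage condition can be checked by brute force, which also shows why the inverse has to be checked separately. A minimal sketch, assuming topologies are given as sets of <code class="highlighter-rouge">frozenset</code>s and maps as dicts; the two topologies on a two-point set are illustrative choices.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def preimage(f, subset, domain):
    """All points of the domain that f sends into the given subset."""
    return frozenset(x for x in domain if f[x] in subset)

def is_continuous(f, domain, opens_domain, opens_target):
    """True if the preimage of every open set of the target is open in the domain."""
    return all(preimage(f, U, domain) in opens_domain for U in opens_target)

X = frozenset({"a", "b"})
# Two topologies on the same two-point set (illustrative assumptions):
sierpinski = {frozenset(), frozenset({"a"}), X}
discrete = {frozenset(), frozenset({"a"}), frozenset({"b"}), X}

identity_map = {"a": "a", "b": "b"}

# From the discrete topology to the coarser one.
print(is_continuous(identity_map, X, discrete, sierpinski))
# The inverse direction: the open set {"b"} has a preimage that is not open.
print(is_continuous(identity_map, X, sierpinski, discrete))</code></pre></figure>
<p>The identity map here is a continuous bijection whose inverse is not continuous, so it is not a homeomorphism.</p>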
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Unif}</script>
<ul>
<li>Objects: Uniform Spaces <script type="math/tex">(X, \varepsilon)</script></li>
<li>Morphisms: Uniformly continuous maps</li>
<li>Isomorphisms: Uniform isomorphisms, i.e. uniformly continuous maps admitting a uniformly continuous inverse.
<ul>
<li>These can be thought of as homeomorphisms, along with an added condition of uniformity on the maps and their inverses.</li>
</ul>
</li>
</ul>
<p><em>Notes</em>: A uniform space is a topological space, equipped with some notion of “<script type="math/tex">\varepsilon</script>-closeness”. Things like metric spaces and topological groups fit this description, so most analysis technically happens in this category.</p>
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Met}</script>
<ul>
<li>Objects: Metric spaces <script type="math/tex">(M_1, d_1), (M_2, d_2)</script>
<ul>
<li><script type="math/tex">d_1 : M_1 \times M_1 \rightarrow \mathbb{R}</script> is denoted the <em>metric</em> on <script type="math/tex">M_1</script>.</li>
</ul>
</li>
<li>Morphisms: Contractions <script type="math/tex">f: (M_1, d_1) \rightarrow (M_2, d_2)</script> such that <script type="math/tex">\forall x,y \in M_1</script>, we have <script type="math/tex">d_2(f(x), f(y)) \leq d_1(x,y)</script>.</li>
<li>Isomorphisms: Isometries
<ul>
<li>These are just the bijective contractions <script type="math/tex">f</script> whose inverse is also a contraction, which forces <script type="math/tex">d_2(f(x), f(y)) = d_1(x,y)</script></li>
</ul>
</li>
</ul>
<p><em>Notes</em>: The distance function <script type="math/tex">d</script> has to satisfy a few more axioms than <script type="math/tex">\varepsilon</script> in <script type="math/tex">\mathbf{Unif}</script>.</p>
<p>Starting here, there are actually many choices we could make for the morphisms - for example, we could have chosen uniformly continuous functions, Lipschitz functions, or a few others. Here I’ve just chosen one of the more restrictive options - Lipschitz functions with constant 1.</p>
<p>Since every metric space is a topological space, the morphisms here should also be morphisms in <script type="math/tex">\mathbf{Top}</script>. This is in fact the case in <script type="math/tex">\mathbf{Met}</script>, since contractions on metric spaces end up being continuous.</p>
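<p>The contraction condition can be spot-checked on sample points. This is only a sketch on finitely many points of the real line with the usual metric, and the maps <code class="highlighter-rouge">halve</code>, <code class="highlighter-rouge">double</code> and <code class="highlighter-rouge">shift</code> are hypothetical examples; <code class="highlighter-rouge">operator.le</code> is the standard library’s “less than or equal” function.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from itertools import combinations
from operator import le

def is_contraction(f, points, d_source, d_target):
    """True if d_target(f(x), f(y)) is at most d_source(x, y) on all sampled pairs."""
    return all(
        le(d_target(f(x), f(y)), d_source(x, y))
        for x, y in combinations(points, 2)
    )

def d(x, y):
    return abs(x - y)  # the usual metric on the real line

def halve(x):
    return x / 2       # shrinks distances

def double(x):
    return 2 * x       # the inverse of halve; stretches distances

def shift(x):
    return x + 5       # preserves distances exactly

points = [0, 1, 3, 7]

print(is_contraction(halve, points, d, d))
print(is_contraction(double, points, d, d))
print(is_contraction(shift, points, d, d))</code></pre></figure>
<p>Halving is a bijective contraction of the real line, but its inverse (doubling) is not a contraction, so it is not an isomorphism in this category; the shift and its inverse both pass, matching the fact that it is an isometry.</p>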
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Norm}</script>
<ul>
<li>Objects: Normed spaces</li>
<li>Morphisms: Continuous and linear maps
<ul>
<li>i.e., <script type="math/tex">Hom(\mathbf{Top}) \cap Hom(\mathbf{Vec})</script></li>
</ul>
</li>
<li>Isomorphisms: Continuous linear bijective maps with continuous linear inverses</li>
</ul>
</li>
<li><script type="math/tex">\mathbf{Ban}</script> (Complete normed spaces)
<ul>
<li>Objects: Banach spaces <script type="math/tex">B, C</script></li>
<li>
Morphisms: Bounded linear maps <script type="math/tex">f: B \rightarrow C</script> such that <script type="math/tex">\|f\|_{\text{sup}}</script> is finite.
<ul>
<li>If <script type="math/tex">B=C</script>, these are usually referred to as <em>bounded linear operators</em></li>
</ul>
</li>
<li>
<p>Isomorphisms: Bounded linear bijective maps with bounded linear inverses.</p>
<p><em>Notes</em>: A Banach space is a vector space that is also a (complete) metric space, so the morphisms simply reflect that these two structures “play nicely” together. This is the case, since the following three are equivalent in Banach spaces:</p>
<ul>
<li>Bounded linear maps</li>
<li>Continuous linear maps</li>
<li>Uniformly continuous linear maps</li>
</ul>
<p>This is where much of functional analysis happens.</p>
</li>
</ul>
</li>
<li><script type="math/tex">\mathbf{Hilb}</script> (Complete inner product spaces)
<ul>
<li>Objects: Hilbert Spaces <script type="math/tex">\mathcal{H}, \mathcal{K}</script></li>
<li>
Morphisms: Bounded linear maps <script type="math/tex">T: \mathcal{H} \rightarrow \mathcal{K}</script> such that <script type="math/tex">\|T\|_{\text{sup}}</script> is finite.
</li>
<li>Isomorphisms: Bounded linear maps with bounded linear inverses.</li>
</ul>
<p><em>Notes</em>: It might seem a bit simplistic at first to characterize something like a Hilbert space as essentially an “enriched vector space”, but this turns out to be reflected in its categorical structure - the forgetful functor from <script type="math/tex">\mathbf{Hilb}</script> to <script type="math/tex">\mathbf{Vec}</script> given by forgetting the inner product is <strong>faithful</strong> (the categorical analog of “injective” for normal functions, applied to the morphisms between any two fixed objects).</p>
<p>Similarly, one can think of Hilbert spaces like Banach spaces where the norm is induced by the inner product.</p>
</li>
<li>
<script type="math/tex; mode=display">\mathbf{Cat}</script>
<ul>
<li>Objects: Small categories <script type="math/tex">C = (Ob(C), Hom(C)), ~D = (Ob(D), Hom(D))</script></li>
<li>Morphisms: <strong>Functors</strong> <script type="math/tex">F: (Ob(C), Hom(C)) \rightarrow (Ob(D), Hom(D))</script>
<ul>
<li>Functors map:
<ul>
<li>objects <script type="math/tex">c, c' \in Ob(C)</script> to objects <script type="math/tex">F(c) = d, F(c') = d' \in Ob(D)</script>,</li>
<li>morphisms <script type="math/tex">f:c \rightarrow c' \in Hom_C(c, c')</script> to morphisms <script type="math/tex">F(f) : F(c) \rightarrow F(c') \in Hom_D(F(c), F(c'))</script>,
<ul>
<li>Or to clean up notation a bit, morphisms that look like <script type="math/tex">g: d \rightarrow d' \in Hom(d, d')</script>.</li>
</ul>
</li>
</ul>
</li>
<li>In words, this just sends the objects and arrows of one category to another, preserving the way arrows connect objects</li>
</ul>
</li>
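<li>Isomorphisms: <strong>Isomorphisms of categories</strong>, i.e. functors <script type="math/tex">F: C \rightarrow D</script> with an inverse functor <script type="math/tex">G: D \rightarrow C</script> such that <script type="math/tex">F \circ G = Id_D \text{ and } G \circ F = Id_C</script>. If these equalities are relaxed to <strong>natural isomorphisms</strong> <script type="math/tex">F \circ G \cong Id_D \text{ and } G \circ F \cong Id_C</script>, one gets the weaker (and more commonly used) notion of an equivalence of categories.</li>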
</ul>
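<p>“Preserving the way arrows connect objects” includes sending identities to identities and composites to composites. As a loose illustration (treating Python types and functions as a stand-in category, much like <script type="math/tex">\mathbf{Hask}</script> above, which is an assumption rather than a rigorous construction), the sketch below spot-checks these two laws for the “list” construction; all function names are hypothetical.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def fmap(f):
    """Action on arrows: lift a function on elements to a function on lists."""
    return lambda xs: [f(x) for x in xs]

def compose(g, f):
    """Ordinary function composition: first f, then g."""
    return lambda x: g(f(x))

def identity(x):
    return x

def length(s):
    return len(s)

def square(n):
    return n * n

samples = [[], ["a"], ["cat", "theory"]]

# Identity law: lifting the identity gives the identity on lists.
preserves_identity = all(fmap(identity)(xs) == xs for xs in samples)

# Composition law: lifting a composite equals composing the lifted functions.
preserves_composition = all(
    fmap(compose(square, length))(xs) == compose(fmap(square), fmap(length))(xs)
    for xs in samples
)

print(preserves_identity, preserves_composition)</code></pre></figure>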
</li>
</ul>
D. Zack Garza (dzackgarza@gmail.com)
A relatively short introduction to Category Theory, with concrete examples of categories.
Setting Up A Haskell Dev Environment
2015-05-30T00:00:00-07:00
/tutorials/setting-up-a-haskell-dev-environment
<p>Since the second week of this year’s GSOC is drawing to a close, I figured I’d
take a minute to write a bit about my experience diving into Haskell
development.</p>
<p>As someone relatively new to the Haskell world, I’ve had quite a bit to
learn - fortunately, there’s plenty of documentation out there, and the IRC
community is incredibly helpful as well (particularly #haskell, #hackage,
and #haskell-beginners on Freenode).</p>
<p>However, the packaging and build tools themselves are in constant development,
and working with cabal can be a bit tricky at first. In particular, there’s a
lot of noise in forums and wikis concerning the best way to source your
Haskell packages - at this point, there are several separate, viable
ways to manage them:</p>
<ul>
<li>
<p>Use the Haskell packages provided by your distro’s package manager,</p>
</li>
<li>
<p>Install cabal from your distro’s package manager, use it to bootstrap
cabal-install, then pull and build packages from hackage using
<code class="highlighter-rouge">cabal install <package_name></code>.</p>
</li>
<li>
<p>Clone repos from git or darcs directly, and create binaries using
<code class="highlighter-rouge">cabal install</code> in the project’s root directory (preferably using sandboxes), or</p>
</li>
<li>
<p>Use another tool to help streamline the process, such as the
Nix package manager or Halcyon.</p>
</li>
</ul>
<p>Personally, I’ve run into issues with some of these methods. I work across
three different operating systems (Debian, Arch, and Windows), and coordinating
equivalent packages can be tedious at best.
Bootstrapping cabal-install can very quickly lead to the infamous “cabal hell”,
in which it becomes difficult to keep track of exactly which packages are
sourced as dependencies when you build something new. Cloning repos manually
works fairly well, but in this case it’s difficult to install system-wide tools
that rely on GHC such as HLint, ghc-mod, hdevtools, or really any other package
that’s compiled against the GHC API.</p>
<p>For these reasons, I decided to combine approaches 3 and 4 to set up a
reliable and easily reproducible dev environment – I rely on the package
manager to provide an up-to-date version of GHC for building packages from
source, and have a global config using halcyon that I can switch into with a
few commands for doing dev work.</p>
<p>There are some pretty nice benefits to doing things this way - for example, I
use the haskell-based xmonad for my window manager. With this setup, I can
compile my configuration against the newest version of GHC, and not have to
worry about one of its dependencies clobbering another project’s dependencies
or having to roll back my global version of GHC to install older packages.</p>
<p>Overall, it’s proved so far to be a great way to keep dependencies cleanly
separated, while still allowing multiple versions of ghc and cabal to be
installed alongside each other. Setting things up is pretty straightforward, so
here’s a quick rundown of what you can do to quickly get a dev environment
rolling.</p>
<h1 id="install-halycon">Install Halcyon</h1>
<p>In my case, it was easiest to start with a clean OS installation. Assuming
you’re using a *nix variant, the steps should be roughly similar.</p>
<p>Start by installing halcyon – in my case, I did so as root to simplify things.
You can find a tutorial over at <a href="https://halcyon.sh/tutorial/">https://halcyon.sh/tutorial/</a>, but the key bit
is to run:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">eval</span> <span class="s2">"</span><span class="k">$(</span> curl -sL https://github.com/mietek/halcyon/raw/master/setup.sh <span class="k">)</span><span class="s2">"</span></code></pre></figure>
<p>and check that <code class="highlighter-rouge">which ghc</code> and <code class="highlighter-rouge">which cabal</code> both return paths in the /app/
directory.</p>
<p>From this point on, any user with access to the /app/ directory can call
<code class="highlighter-rouge">eval "$( /app/halcyon/halcyon paths )"</code> to jump into this environment - this
command takes care of putting the correct versions of ghc and cabal at the
front of your path, regardless of which versions you may have otherwise
installed elsewhere.</p>
<p>You can then use <code class="highlighter-rouge">halcyon install</code> in place of <code class="highlighter-rouge">cabal install</code>
to ensure that your packages are built against the particular versions of ghc
and cabal you just installed.</p>
<h1 id="grab-some-dev-tools">Grab Some Dev Tools</h1>
<p>In particular, if you’re doing development with these versions, you’ll want to
build any tools that require executables using halcyon – here are a few
examples to get you started:</p>
<h3 id="ghc-mod-httpshackagehaskellorgpackageghc-mod"><strong>ghc-mod</strong> (<a href="https://hackage.haskell.org/package/ghc-mod">https://hackage.haskell.org/package/ghc-mod</a>)</h3>
<p>Provides vim/emacs integration for type checking, linting, and showing compiler
errors. Pairs really well with <strong>ghcmod-vim</strong>
(<a href="https://github.com/eagletmt/ghcmod-vim">https://github.com/eagletmt/ghcmod-vim</a>) and <strong>syntastic</strong>
(<a href="https://github.com/scrooloose/syntastic">https://github.com/scrooloose/syntastic</a>) for vim users.</p>
<h3 id="hasktags-httpshackagehaskellorgpackagehasktags"><strong>hasktags</strong> (<a href="https://hackage.haskell.org/package/hasktags">https://hackage.haskell.org/package/hasktags</a>)</h3>
<p>A ctags alternative for Haskell projects. Use this to generate a .tags file for
your project, and you can easily jump to function/type definitions using
Ctrl-].</p>
<h3 id="codex-httpshackagehaskellorgpackagecodex"><strong>codex</strong> (<a href="https://hackage.haskell.org/package/codex">https://hackage.haskell.org/package/codex</a>)</h3>
<p>Uses hasktags to build your entire tags database with a single command -
<code class="highlighter-rouge">codex update</code>. What’s more, it also includes the tags of all of your
project’s dependencies – so if, for example, you run <code class="highlighter-rouge">cabal install</code>
inside of a sandbox, you can easily jump into the source code of
other libraries and see function definitions, types, etc.</p>
<h3 id="hscope-httpshackagehaskellorgpackagehscope"><strong>hscope</strong> (<a href="https://hackage.haskell.org/package/hscope">https://hackage.haskell.org/package/hscope</a>)</h3>
<p>A cscope alternative for Haskell, which is kind of a reverse-direction ctags.
After generating a database, you can press Ctrl-\ on a function definition to
instantly find all of the places in your project that call that function.
Paired with hasktags, jumping through a new codebase is a breeze.</p>
<h3 id="hlint-httpshackagehaskellorgpackagehlint"><strong>hlint</strong> (<a href="https://hackage.haskell.org/package/hlint">https://hackage.haskell.org/package/hlint</a>)</h3>
<p>Quickly parses a file and provides suggestions for style improvement. Can also
be called using <strong>syntastic</strong> in vim (using :SyntasticCheck hlint).</p>
<h3 id="hoogle-httpshackagehaskellorgpackagehoogle"><strong>hoogle</strong> (<a href="https://hackage.haskell.org/package/hoogle">https://hackage.haskell.org/package/hoogle</a>)</h3>
<p>A Haskell-specific search engine that lets you quickly look up function type
signatures and definitions. Also has a neat feature that lets you search for
functions by type signature – for example, searching for
<code class="highlighter-rouge">(a->b)->[a]->[b]</code> brings up the map function and a few examples of how to use
it. Super handy!</p>
<p>If you install any or all of these using <code class="highlighter-rouge">halcyon install</code>, their binaries will
be placed in /app/bin/, and will be on your path any time you’re using
halcyon’s environment.</p>
<h1 id="get-text-editor-integration">Get Text Editor Integration</h1>
<p>After trying several IDEs and plugins, I found that using vim with a few choice
plugins netted me all of the features I really needed, and required the least
troubleshooting.</p>
<h2 id="the-necessities">The Necessities</h2>
<p>If you’re just looking to get started as soon as possible, look no further
than…</p>
<h3 id="haskell-vim-now-httpsgithubcombegriffshaskell-vim-now"><strong>haskell-vim-now</strong> (<a href="https://github.com/begriffs/haskell-vim-now">https://github.com/begriffs/haskell-vim-now</a>)</h3>
<p>It has an automated setup, and provides a ton of great Haskell-specific tools
right out of the box. (See the readme for the default key bindings.)</p>
<p>However, if you’re like me and already have a ridiculously long vimrc built up,
take a look at the following repos for some useful plugins:</p>
<h3 id="syntastic-httpsgithubcomscrooloosesyntastic"><strong>syntastic</strong> (<a href="https://github.com/scrooloose/syntastic">https://github.com/scrooloose/syntastic</a>)</h3>
<p>Easily the plugin I use the most, and derive the most value from. If you have
the ghc-mod binary on your path (which would be the case if you ran the eval
command from earlier), then you can easily check your current file. With a few
mappings like</p>
<figure class="highlight"><pre><code class="language-vim" data-lang="vim">map <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>leader<span class="p">></span>hc <span class="p">:</span>SyntasticCheck<span class="p"><</span>CR<span class="p">></span>
map <span class="p"><</span><span class="k">silent</span><span class="p">></span> <span class="p"><</span>leader<span class="p">></span>hl <span class="p">:</span>SyntasticCheck hlint<span class="p"><</span>CR<span class="p">></span></code></pre></figure>
<p>you can quickly get line markers and status messages for any errors. This
also works within projects quite well, and will only recompile files that
have changed since the last build (note: this can still be a bit slow for large
projects).</p>
<h3 id="ghcmod-vim-httpsgithubcomeagletmtghcmod-vim"><strong>ghcmod-vim</strong> (<a href="https://github.com/eagletmt/ghcmod-vim">https://github.com/eagletmt/ghcmod-vim</a>)</h3>
<p>Can also perform type checking and linting, but one of its most useful features
is the ability to display the type of an expression under the cursor by calling
<code class="highlighter-rouge">:GhcModType</code>, or to pull up info about its definition with <code class="highlighter-rouge">:GhcModInfo</code>.</p>
<h3 id="haskellmode-vim-httpsgithubcomlukerandallhaskellmode-vim"><strong>haskellmode-vim</strong> (<a href="https://github.com/lukerandall/haskellmode-vim">https://github.com/lukerandall/haskellmode-vim</a>)</h3>
<p>Mainly fixes up some extra syntax highlighting and indentation. Also offers a
few neat features like looking up the word under the cursor on hoogle, and
automatically adding qualified imports.</p>
<h3 id="vim-haskellconcealplus"><strong>vim-haskellConcealPlus</strong> (<a href="https://github.com/enomsg/vim-haskellConcealPlus">https://github.com/enomsg/vim-haskellConcealPlus</a>)</h3>
<p>This one’s purely aesthetic, and a great way to test exactly how well your
terminal emulator supports UTF-8! Uses vim conceals to show some Haskell
operators and keywords as symbols (that is, it displays them as special symbols
unless they are on your current line). This is particularly useful when you’re
browsing large swaths of code, and want to make things a bit more readable.
Note: this does need a small hack not to be a complete eyesore. By default, vim
adds a background highlight to every concealed character. You can clear this by
throwing</p>
<figure class="highlight"><pre><code class="language-vim" data-lang="vim"><span class="k">au</span> <span class="nb">FileType</span> haskell autocmd <span class="nb">VimEnter</span> * <span class="k">hi</span> clear Conceal</code></pre></figure>
<p>into your vimrc.</p>
<h2 id="general-vim-goodness">General Vim Goodness</h2>
<p>Some other useful plugins (not Haskell-specific) for jumping around large projects include:</p>
<ul>
<li>
<p><strong>CtrlP</strong> (<a href="https://github.com/kien/ctrlp.vim">https://github.com/kien/ctrlp.vim</a>)
Open files with a fuzzy-finder.</p>
</li>
<li>
<p><strong>Ag</strong> (<a href="https://github.com/ervandew/ag">https://github.com/ervandew/ag</a>)
A fast replacement for vimgrep that quickly finds specific words in files in
your current working tree.</p>
</li>
<li>
<p><strong>Supertab</strong> (<a href="https://github.com/ervandew/supertab">https://github.com/ervandew/supertab</a>)
Provides tab-completion - suggestion list can be populated from local buffer or
other plugins like neco-ghc.</p>
</li>
<li>
<p><strong>Fugitive</strong> (<a href="https://github.com/tpope/vim-fugitive">https://github.com/tpope/vim-fugitive</a>)
Provides most common git commands within vim.</p>
</li>
</ul>
<h1 id="workflow">Workflow</h1>
<p>Finally, the good part! Everything’s easy from here on out. With everything
installed, you can create a user and chown the /app/ directory. Then, you can
go about your day-to-day business with your package manager’s version of ghc
and cabal (just remember to run <code class="highlighter-rouge">cabal sandbox init</code> before building/installing
to keep things as clean as possible).</p>
<p>Then, whenever you’re working on a dev project, just call the eval statement
mentioned up in the halcyon section, and all of the binaries you installed with
halcyon will be on your path, all linked properly, and you’ll be using the same
versions of ghc and cabal each and every time.</p>
<p>Generally, before I start working on a project, I run a function that looks
something like this:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">git clone http://url/to/your/project ~/project_directory
<span class="nb">cd</span> ~/project_directory
<span class="nb">eval</span> <span class="s2">"</span><span class="k">$(</span> /app/halcyon/halcyon paths <span class="k">)</span><span class="s2">"</span>
cabal sandbox init
cabal install
cabal build</code></pre></figure>
<p>And there you have it! Hopefully this serves as a bit of help to anyone else
getting started with Haskell. Please feel free to leave any questions or
comments below!</p>D. Zack Garzadzackgarza@gmail.comSince the second week of this year’s GSOC is drawing to a close, I figured I’d take a minute to write a bit about my experience diving into Haskell development.Introductory Statistical Analysis in R2014-07-18T00:00:00-07:002014-07-18T00:00:00-07:00/introductory-statistical-analysis-in-r<p>This Summer, while I’ve been interning at Shutterfly, I’ve also been taking a course in introductory statistics. Of course, since I’m majoring in Computer Science, I’m always looking for new ways to tie computer science into whatever I happen to be learning. Fortunately, there is quite a bit of overlap between these fields, and there are many computational tools out there that make it easy to analyze data.</p>
<p>Since my calculator (a TI-92, for those of you that are into calculators!) doesn’t have very many statistical capabilities or built-in programs, my tools of choice for this semester have been a combination of the statistics package <script type="math/tex">R</script>, as well as several programs I’ve written for my calculator in BASIC. <script type="math/tex">R</script> is free and open source, and can very likely be found in your package manager of choice. In this article, I hope to cover a few ways to use it to perform some basic analysis on data sets.</p>
<p>Once you have <script type="math/tex">R</script> installed, grab a data set and dive in!</p>
<h1 id="1-dimensional-data-sets">1-Dimensional Data Sets</h1>
<p>Let’s say you simply have a list of values, and you want to know some of its statistical properties - the mean, median, mode, and maybe even the different quartiles the values fall into. The first order of business is to get your data into a file.</p>
<p>Doing everything from the command line makes things fairly straightforward. If you can copy your data onto the clipboard, you can easily pipe it into a file with xclip. If you don’t already have xclip installed, and you’re on a Debian variant, you can install it using</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">sudo apt-get install xclip</code></pre></figure>
<p>Since xclip’s command-line options aren’t especially intuitive, it helps to make quick aliases for the copy and paste commands. You can use whatever you’d like - I went with ‘cpaste’ and ‘ccopy’ - and simply echo them into your .bashrc, .zshrc, or external alias file. Make sure to refresh your .rc file after adding them, though!</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">echo</span> <span class="s2">"alias cpaste='xclip -selection clipboard -o'"</span> >> ~/.bashrc
<span class="nb">echo</span> <span class="s2">"alias ccopy='xclip -selection c'"</span> >> ~/.bashrc
<span class="nb">source</span> ~/.bashrc</code></pre></figure>
<p>Once you have this, you can then pipe your clipboard contents into a file.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">cpaste > ~/data.csv</code></pre></figure>
<p>Great! So now we have a quick way of getting some data into a file. Next, we need to load it as a data frame in <script type="math/tex">R</script>. Lucky for us, there are already plenty of functions in place to accomplish this, so I’ll just demonstrate one. You can launch <script type="math/tex">R</script> by simply typing “R”, as long as it’s on your path.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Read in the data</span><span class="w">
</span><span class="n">t</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read.csv</span><span class="p">(</span><span class="s2">"~/data.csv"</span><span class="p">,</span><span class="w"> </span><span class="n">header</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">t</span><span class="w"> </span><span class="c1"># Display the contents of t - namely, the data we just read in</span><span class="w">
</span><span class="n">tail</span><span class="p">(</span><span class="n">t</span><span class="p">)</span><span class="w"> </span><span class="c1"># Display the last few entries of t</span><span class="w">
</span><span class="n">nrow</span><span class="p">(</span><span class="n">t</span><span class="p">)</span><span class="w"> </span><span class="c1"># Display how many entries (rows) t contains</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">t</span><span class="p">)</span><span class="w"> </span><span class="c1"># Get a quick overview</span></code></pre></figure>
<p>~~~~ Let’s skip to some more advanced stuff! ~~~</p>
<h1 id="scatterplots-and-linear-regression">Scatterplots and Linear Regression</h1>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">t</span><span class="o">$</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="o">$</span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="c1"># Produce a scatterplot of x vs. y</span><span class="w">
</span><span class="n">cor</span><span class="p">(</span><span class="n">t</span><span class="o">$</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="o">$</span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="c1"># Display the correlation coefficient r for a linear relationship</span><span class="w">
</span><span class="n">fit</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">lm</span><span class="p">(</span><span class="n">formula</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">t</span><span class="o">$</span><span class="n">y</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">t</span><span class="o">$</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="c1"># Perform a linear regression of y on x, and store the result</span><span class="w">
</span><span class="n">fit</span><span class="w"> </span><span class="c1"># Display the slope and y-intercept of the linear regression</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">fit</span><span class="p">)</span><span class="w"> </span><span class="c1"># Display a more detailed analysis</span></code></pre></figure>
<h1 id="hypothesis-testing-with-multiple-populations">Hypothesis Testing with Multiple Populations</h1>
<p>To be continued!</p>D. Zack Garzadzackgarza@gmail.comThis Summer, while I’ve been interning at Shutterfly, I’ve also been taking a course in introductory statistics. Of course, since I’m majoring in Computer Science, I’m always looking for new ways to tie computer science into whatever I happen to be learning. Fortunately, there is quite a bit of overlap between these fields, and there are many computational tools out there that make it easy to analyze data.Big O Notation2014-05-31T00:00:00-07:002014-05-31T00:00:00-07:00/big-o-notation<p>Some quick notes on Big O notation.<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></p>
<h1 id="how-to-determine-running-time-complexity">How to Determine Running-Time Complexity</h1>
<h2 id="nested-loops">Nested Loops</h2>
<p>Consider a general sequence of nested loops in which the outer loop is iterated <script type="math/tex">N</script> times and the inner loop is iterated <script type="math/tex">M</script> times. Generally, these will look something like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">N</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">M</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">//statements
</span> <span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Every time the outer loop executes, the inner loop runs <script type="math/tex">M</script> times. Thus, we expect this function to run in <script type="math/tex">O(N \cdot M)</script>, or <script type="math/tex">O(N^2)</script> in the case that <script type="math/tex">N=M</script>.</p>
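<p>One way to convince yourself of this is to count the iterations directly. The sketch below is only an illustration: <code class="highlighter-rouge">count_nested</code> is a made-up helper name, and the counter simply stands in for whatever statements the loop body would contain.</p>

```c
#include <stddef.h>

/* Counts how many times the body of an n-by-m nested loop executes. */
size_t count_nested(size_t n, size_t m) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            count++; /* stands in for the "statements" in the loop body */
        }
    }
    return count;
}
```

<p>Calling <code class="highlighter-rouge">count_nested(3, 4)</code> returns 12, which is exactly <script type="math/tex">N \cdot M</script> for those inputs.</p>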
<p>A similar case arises when the inner loop depends on the outer loop. Consider the function</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">N</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">N</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">//statements
</span> <span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Since the outer loop ranges over <script type="math/tex">0 \leq i \leq N-1</script>, while the inner loop ranges over <script type="math/tex">i+1 \leq j \leq N-1</script>, the inner body runs <script type="math/tex">N-1-i</script> times for each value of <script type="math/tex">i</script>, and the total number of operations is given by</p>
<p>\[ \sum_{i=0}^{N-1} (N-1-i) = \sum_{k=0}^{N-1} k = \frac{N(N-1)}{2}\]</p>
<p>which is the sum of the first <script type="math/tex">N-1</script> natural numbers, and thus this function is also in <script type="math/tex">O(N^2)</script>.</p>
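<p>The same counting trick works for the dependent loop. In this rough sketch (again, <code class="highlighter-rouge">count_triangular</code> is just a name I picked for illustration), the inner loop starts at <code class="highlighter-rouge">i + 1</code>, so the body runs <script type="math/tex">N-1-i</script> times for each <script type="math/tex">i</script>, which adds up to <script type="math/tex">N(N-1)/2</script> in total. That is roughly half the work of the full square, but still quadratic.</p>

```c
#include <stddef.h>

/* Counts how many times the body runs when the inner loop starts at i + 1. */
size_t count_triangular(size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            count++; /* one unit of work per (i, j) pair with i < j */
        }
    }
    return count;
}
```

<p>For example, <code class="highlighter-rouge">count_triangular(5)</code> returns 10, matching <script type="math/tex">5 \cdot 4 / 2</script>.</p>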
<h2 id="function-composition">Function Composition</h2>
<p>Consider a function <script type="math/tex">g(n) \in O(n)</script> whose running time depends on the size of its input <script type="math/tex">n</script>.
If <script type="math/tex">g</script> is called on every step of another function <script type="math/tex">f(m) \in O(m)</script>, whose own step count depends on a second variable <script type="math/tex">m</script>, then the combined running time <script type="math/tex">(f \circ g) \in O(n \cdot m)</script>.</p>
<p>Again, in the special case that <script type="math/tex">n=m</script>, we have <script type="math/tex">(f \circ g) \in O(n^2)</script>.</p>
<p>For example, take <script type="math/tex">f</script> to be a loop, and <script type="math/tex">g</script> to be a function called within the loop. Then you wind up with something like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">g</span><span class="p">(</span><span class="n">n</span><span class="p">);</span> <span class="c1">// g is assumed to run in O(n)</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></figure>
<p>The loop iterates <script type="math/tex">n</script> times, and <script type="math/tex">g</script> is called each time. So we have an overall complexity of <script type="math/tex">O(n \cdot n) = O(n^2)</script>.</p>
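<p>Here is a slightly more concrete version that you can actually compile. It is only a sketch under a few assumptions of mine: <code class="highlighter-rouge">g</code> is a stand-in that does <script type="math/tex">n</script> units of busy work, and both functions return a count of the work done so the total can be checked.</p>

```c
#include <stddef.h>

/* Hypothetical O(n) helper: performs n units of work and reports how many it did. */
static size_t g(size_t n) {
    size_t work = 0;
    for (size_t k = 0; k < n; k++) {
        work++;
    }
    return work;
}

/* Calls g(n) once per iteration, so the total work is n * n units. */
size_t f(size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += g(n);
    }
    return total;
}
```

<p>With this setup, <code class="highlighter-rouge">f(10)</code> returns 100, in line with the <script type="math/tex">O(n^2)</script> estimate.</p>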
<p>Note: This method may not always produce the tightest upper bound possible. For more specific cases, tighter bounds on running time can be found by considering the total time spent on the inner loop and outer loop separately, and adding them.</p>
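<p>As an example of where the multiply-the-loops rule overestimates, consider a scan over a sorted array in which the inner index is never reset. This is my own illustration rather than something from the notes referenced above, and <code class="highlighter-rouge">inner_steps</code> and <code class="highlighter-rouge">width</code> are hypothetical names. The inner <code class="highlighter-rouge">while</code> loop looks like it could cost <script type="math/tex">O(n)</script> per outer iteration, but since <code class="highlighter-rouge">j</code> only ever moves forward, the inner body runs at most <script type="math/tex">n</script> times in total. Adding the outer loop's <script type="math/tex">n</script> iterations gives <script type="math/tex">O(n + n) = O(n)</script>, not <script type="math/tex">O(n^2)</script>.</p>

```c
#include <stddef.h>

/*
 * For each i, advances j past every element with a[j] - a[i] <= width.
 * Assumes a is sorted in ascending order. Because j is never reset,
 * the inner loop body runs at most n times across the whole scan.
 * Returns the total number of inner-loop steps taken.
 */
size_t inner_steps(const int *a, size_t n, int width) {
    size_t steps = 0;
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        while (j < n && a[j] - a[i] <= width) {
            j++;
            steps++;
        }
    }
    return steps;
}
```

<p>The returned step count can never exceed <code class="highlighter-rouge">n</code>, no matter what the array contains, which is exactly the tighter bound that counting the inner loop's total work gives you.</p>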
<hr />
<p>That’s all for today’s post, although I may end up adding more information later! If you have any questions or found this helpful, feel free to comment below.</p>
<hr />
<h1 id="links-and-references">Links and References</h1>
<p><a href="http://ozark.hendrix.edu/~burch/cs/280/work/r1/print.html">Some great practice problems and explanations: Carl Burch at Hendrix</a></p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p><a href="http://pages.cs.wisc.edu/~vernon/cs367/notes/3.COMPLEXITY.html#application">Notes on complexity: Mary K. Vernon at U. Wisconsin</a> <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>D. Zack Garzadzackgarza@gmail.comDetermining the running-time complexity of some common types of functions.