# Compressive_Sampling

## Compressive sampling Overview

In our previous discussion, we saw that imposing bandlimited-ness on our class of signals permits point-wise sampling of our signal and then later perfect reconstruction. It turns out that by imposing sparsity we can also obtain perfect reconstruction irrespective of whether or not we have satsified the sampling rate limits imposed by Shannon’s sampling theorem. This has extremely important in practice because many signals are naturally sparse so that collecting samples at high rates only to dump most of them as the signal is compressed is expensive and wasteful.

## What Are Sparse Signals?

Let’s carefully discuss what we mean by sparse in this context. A signal $f$ is sparse if it can be expressed in very few nonzero components ($\mathbf{s}$) with respect to a given basis ($\mathbf{\Psi}$ ). In other words, in matrix- vector language:

$\mathbf{f} = \mathbf{\Psi} \mathbf{s}$

where $|| \mathbf{s} ||_0 \leq N$ where $N$ is the length of the vector and $|| \cdot||_0$ counts the number of nonzero elements in $\mathbf{s}$. Furthermore, we don’t actually collect $N$ samples point-wise as we did in the Shannon sampling case. Rather, we measure $\mathbf{f}$ indirectly as $\mathbf{y}$ with another matrix as in:

$\mathbf{y} = \mathbf{\Phi f} = \mathbf{\Phi} \mathbf{\Psi} \mathbf{s} = \mathbf{\Theta s}$

where $\mathbf{\Theta}$ is an $M \times N$ matrix and $M < N$ is the number of measurements. This setup means we have two problems to solve. First, how to design a stable measurement matrix $\mathbf{\Phi}$ and then, second, how to reconstruct $\mathbf{f}$ from $\mathbf{y}$.

This may look like a standard linear algebra problem but since $\mathbf{\Theta}$ has fewer rows than columns, the solution is necessarily ill-posed. This is where we inject the sparsity concept! Suppose that $f$ is $K$-sparse ( $||f||_0=K$ ), then if we somehow knew which $K$ columns of $\mathbf{\Theta}$ matched the $K$ non-zero entries in $\mathbf{s}$, then $\mathbf{\Theta}$ would be $M \times K$ where we could make $M > K$ and then have a stable inverse.

This bit of reasoning is encapsulated in the following statement for any vector $\mathbf{v}$ sharing the same $K$ non-zero entries as $\mathbf{s}$, we have

which is another way of saying that $\mathbf{\Theta}$ preserves the lengths of $K$-sparse vectors. Of course we don’t know ahead of time which $K$ components to use, but it turns out that this condition is sufficient for a stable inverse of $\mathbf{\Theta}$ if it holds for any $3K$-sparse vector $\mathbf{v}$. This is the Restricted Isometry Property (RIP). Unfortunately, in order to use this sufficient condition, we would have to propose a $\mathbf{\Theta}$ and then check all possible combinations of nonzero entries in the $N$-length vector $\mathbf{v}$. As you may guess, this is prohibitive.

Alternatively, we can approach stability by defining incoherence between the measurement matrix $\mathbf{\Phi}$ and the sparse basis $\mathbf{\Psi}$ as when any of the columns of one cannot be expressed as a small subset of the columns of the other. For example, if we have delta-spikes for $\mathbf{\Phi}$ as the row-truncated identity matrix

and the discrete Fourier transform matrix for $\mathbf{\Psi}$ as

$\mathbf{\Psi} = \begin{bmatrix}\ e^{-j 2\pi k n/N}\ \end{bmatrix}_{N \times N}$

Then we could not write any of the columns of $\mathbf{\Phi}$ using just a few of the columns of $\mathbf{\Psi}$.

It turns out that picking the measuring $M \times N$ matrix randomly according to a Gaussian zero-mean, $1/N$ variance distribution and using the identity matrix as $\mathbf{\Phi}$, that the resulting $\mathbf{\Theta}$ matrix can be shown to satisfy RIP with a high probability. This means that we can recover $N$-length $K$-sparse signals with a high probability from just $M \ge c K \log (N/K)$ samples where $c$ is a small constant. Furthermore, it also turns out that we can use any orthonormal basis for $\mathbf{\Phi}$, not just the identity matrix, and these relations will all still hold.

## Reconstructing Sparse Signals

Now that we have a way, by using random matrices, to satisfy the RIP, we are ready to consider the reconstruction problem. The first impulse is to compute the least-squares solution to this problem as

But a moment’s thought may convince you that since $\mathbf{\Theta}$ is a random matrix, most likely with lots of non-zero entries, it is highly unlikely that $\mathbf{s}^*$ will turn out to be sparse. There is actually a deeper geometric intuition as to why this happens, but let’s first consider another way of solving this so that the $\mathbf{s}^*$ is $K$-sparse. Suppose instead we shuffle through combinations of $K$ nonzero entries in $\mathbf{s}$ until we satisfy the measurements $\mathbf{y}$. Stated mathematically, this means

where

It can be shown that with $M=K+1$ iid Gaussian measurements, this optimization will recover a $K$-sparse signal exactly with high probability. Unfortunately, this is numerically unstable in addition to being an NP-complete problem.

Thus, we need another tractable way to approach this problem. It turns out that when a signal is sparse, it usually means that the nonzero terms are highly asymmetric meaning that if there are $K$ terms, then most likely there is one term that is dominant (i.e. of much larger magnitude) and that dwarfs the other nonzero terms. Geometrically, this means that in $N$-dimensional space, the sparse signal is very close to one (or, maybe just a few) of the axes.

It turns out that one can bypass this combinatorial problem using $L_1$ minimization. To examine this, let’s digress and look at the main difference between $L_2$ and $L_1$ minimization problems.

reference: http://users.ece.gatech.edu/justin/ssp2007

## $L_2$ vs. $L_1$ Optimization

The classic constrained least squares problem is the following:

 min $\mathbf{x} _2^2$

where $x_1 + 2 x_2 = 1$

with corresponding solution illustrated below.

In [1]:

Note that the line is the constraint so that any solution to this problem must be on this line (i.e. satisfy the constraint). The $L_2$ solution is the one that just touches the perimeter of the circle. This is because, in $L_2$, the unit-ball has the shape of a circle and represents all solutions of a fixed $L_2$ length. Thus, the one of smallest length that intersects the line is the one that satisfies the stated minimization problem. Intuitively, this means that we inflate a ball at the origin and stop when it touches the contraint. The point of contact is our $L_2$ minimization solution.

Now, let’s do same problem in $L_1$ norm

 min $\mathbf{x} _1= x_1 + x_2$

where $x_1 + 2 x_2 = 1$

In this case the constant-norm unit-ball contour in the $L_1$ norm is a diamond- shape instead of a circle. Comparing the graph below to the last shows that the solutions found are different. Geometrically, this is because the line tilts over in such a way that the inflating circular $L_2$ ball hits a point of tangency that is different from the $L_1$ ball because the $L_1$ ball creeps out mainly along the principal axes and is less influenced by the tilt of the line. This effect is much more pronounced in higher $N$-dimensional spaces where $L_1$-balls get more spikey.

The fact that the $L_1$ problem is less sensitive to the tilt of the line is crucial since that tilt (i.e. orientation) is random due the choice of random measurement matrices. So, for this problem to be well-posed, we need to not be influenced by the orientation of any particular choice of random matrix and this is what casting this as a $L_1$ minimization provides.

In [2]:

(-1.0, 1.0, -0.60000000000000009, 1.2)


To explore this a bit, let’s consider using the cvxopt package (Python ver 2.6 used here). This can be cast as a linear programming problem as follows:

 min $\mathbf{t} _1 = t_1 + t_2$

subject to:

$-t_1 < x_1 < t_1$

$-t_2 < x_2 < t_2$

$x_1 + 2 x_2 = 1$

$t_1 > 0$

$t_2 > 0$

where the last two constraints are already implied by the first two and are written out just for clarity. This can be implemented and solved in cvxopt as the following:

In [3]:

     pcost       dcost       gap    pres   dres   k/t
0:  0.0000e+00 -0.0000e+00  3e+00  3e+00  1e-16  1e+00
1:  2.3609e-01  2.3386e-01  5e-01  5e-01  1e-16  2e-01
2:  4.9833e-01  4.9734e-01  5e-02  4e-02  5e-15  1e-02
3:  4.9998e-01  4.9997e-01  5e-04  5e-04  2e-15  2e-04
4:  5.0000e-01  5.0000e-01  5e-06  5e-06  6e-16  2e-06
5:  5.0000e-01  5.0000e-01  5e-08  5e-08  9e-16  2e-08
Optimal solution found.
x=0.00
y=0.50


## Example Gaussian Random matrices

Let’s try out our earlier result about random Gaussian matrices and see if we can reconstruct an unknown $\mathbf{s}$ vector using $L_1$ minimization.

In [56]:

     pcost       dcost       gap    pres   dres   k/t
0:  0.0000e+00 -0.0000e+00  1e+02  2e+01  1e-16  1e+00
1:  1.6712e-01  1.6700e-01  1e+01  1e+00  2e-16  7e-02
2:  1.2947e+00  1.2929e+00  4e+00  5e-01  3e-16  3e-02
3:  1.3785e+00  1.3745e+00  2e+00  2e-01  6e-16  8e-03
4:  1.4705e+00  1.4690e+00  5e-01  7e-02  4e-16  2e-03
5:  1.4976e+00  1.4972e+00  2e-01  2e-02  6e-16  7e-04
6:  1.4979e+00  1.4978e+00  6e-02  7e-03  3e-14  2e-04
7:  1.4998e+00  1.4998e+00  6e-03  8e-04  2e-14  2e-05
8:  1.5000e+00  1.5000e+00  6e-05  8e-06  3e-14  3e-07
9:  1.5000e+00  1.5000e+00  6e-07  8e-08  2e-14  3e-09
Optimal solution found.
[[ 0.99999789]
[ 0.49999879]]


That worked out! However, if you play around with this example enough with different random matrices (unset the seed statement above), you will find that it does not always find the correct answer. This is because the guarantees about reconstruction are all stated probabalistically (i.e. “high- probability”). This is another major difference between this and Shannon sampling.

Let’s encapulate the above $L_1$ minimization code so we can use it later.

In [5]:

## Example: Sparse Fourier Transform

As an additional example, let us consider the Fourier transform and see if we can recover the sparse Fourier transform from a small set of measurements. For simplicity, we will assume that the time domain signal is real which automatically means that the Fourier transform is symmetric.

In [141]:

True


In [140]:

[<matplotlib.lines.Line2D at 0x7884910>]


## Uniform Uncertainty Principle

$\Phi$ obeys a UUP for sets of size $K$ if

$$0.8 \frac{M}{N} ||f||_2^2 \leq || \Phi f||_2^2 \leq 1.2 \frac{M}{N} ||f||_2^2$$

Measurements that satisfy this are defined as incoherent. Given that $f$ is $K$-sparse and we measure $y=\Phi f$, then we search for the sparsest vector that explains the $y$ measurements and thus find $f$ as follows:

$min_f \\#\lbrace t: f(t) \ne 0 \rbrace$ where $\Phi f = y$
Note that the hash mark is the size (i.e. cardinality) of the set. This means that we are looking for the fewest individual points for $f$ that satisfy the constraints.    Unfortunately, this is not practically possible, so we must use the $\mathbb{L}_1$ norm as a proxy for sparsity.


Suppose $f$ is $K$-sparse and that $\Phi$ obeys UUP for sets of size $4K$. Then we measure $y=\Phi f$ and then solve

$min_f ||f||_1$ where $\Phi f = y$

to recover $f$ exactly and we can use $M > K \log N$ measurements, where the number of measurements is approximately equal to the number of active components. Let’s consider a concrete example of how this works.

### Example: Sampling Sinusoids

Here, we sample in the time-domain, given that we know the signal is sparse in the frequency domain.

$$\hat{f}(\omega) = \sum_{i=1}^K \alpha_i \delta(\omega_i-\omega)$$

which means that it consists of $K$-sparse nonzero elements. Therefore, the time domain signal is

$$f(t) = \sum_{i=1}^K \alpha_i e^{i \omega_i t}$$

where the $\alpha_i$ and $\omega_i$ are unknown. We want solve for these unknowns by taking $M \gt K \log N$ samples of $f$.

The problem we want to solve is

 $min_g \hat{g} _{L_1}$

subject to

$g(t_m)=f(t_m)$

The trick here is that are minimizing in the frequency-domain while the constraints are in the time-domain. To make things easier, we will restrict our attention to real time-domain signals $f$ and we will only reconstruct the even- indexed time-samples from the signal. This means we need a way of expressing the inverse Fourier Transform as a matrix of equality constraints. The assumption of real-valued time-domain signals implies the following symmetry in the frequency-domain:

$F(k) = F(N-k)^*$

where $F$ is the Fourier transform of $f$ and the asterisk denotes complex conjugation and $k\in \lbrace 0,1,..N-1\rbrace$ and $N$ is the Fourier Transform length. To make things even more tractable we will assume the time-domain signal is even, which means real-valued Fourier transform values.

Suppose that $\mathbf{U}_N$ is the $N$-point DFT-matrix. Note that we always assume $N$ is even. Since we are dealing with only real-valued signals, the transform is symmetric, so we only need half of the spectrum computed. It turns out that the even-indexed time-domain samples can be constructed as follows:

$\mathbf{f_{even}} = \mathbf{U}_{N/2} \begin{bmatrix}\ F(0)+F(N/2)^* \ F(1)+F(N/2-1)^* \ F(2)+F(N/2-2)^* \ \dots \ F(N/2-1)+F(1)^* \end{bmatrix}$

We can further simplify this by breaking this into real (superscript $R$) and imaginary (superscript $I$) parts and keeping only the real part

But we are going to force all the $F(k)^I$ to be zero in our example. Note that the second term should have a $\mathbf{U}_{N/2}$ in it instead $\mathbf{U}_N$ but there is something wrong with the javascript parser for that bit of TeX.

Now, let’s see if we can walk through to step-by-step to make sure our optimization can actually work. Note that we don’t need the second term on the right with the $F^I$ terms because by our construction, $F$ is real.

In [358]:

False


In [359]:

False

<matplotlib.text.Text at 0x7205970>


We can use the above cell to create more complicated real signals. You can experiment with the cell below. Just remember to impose the symmetry condition!

In [360]:

False

<matplotlib.text.Text at 0x73d8f10>


Now that we have gone through all that trouble to create the even-samples matrix, we can finally put it into the framework of the $L_1$ minimization problem:

 $min_F \mathbf{F} _{L_1}$

subject to

$\mathbf{U}_{N/2}^R \mathbf{Q}_r \mathbf{F}= \mathbf{f}$

In [361]:

     pcost       dcost       gap    pres   dres   k/t
0:  0.0000e+00 -0.0000e+00  4e+02  2e+01  3e+00  1e+00
1: -1.5648e+01 -1.2218e+01  2e+03  2e+01  3e+00  4e+00
2: -2.3184e+03 -1.7022e+03  1e+06  8e+01  1e+01  6e+02
3: -2.2814e+05 -1.6566e+05  1e+08  8e+01  1e+01  6e+04
4: -2.2818e+07 -1.6568e+07  1e+10  8e+01  1e+01  6e+06
5: -2.2818e+09 -1.6568e+09  1e+12  8e+01  1e+01  6e+08
Certificate of dual infeasibility found.


In [12]:

In [9]:

In [None]:

Tags:

Updated: