# Convex Optimization

# Definitions

# Def. Convex Set

A set S\sub\R^n is convex if for all \bold{x,y}\in S and \lambda\in[0,1], the line segment connecting \bold{x} and \bold{y} lies in S:

\lambda \bold x + (1-\lambda)\bold y \in S

# Def. Convex Function

A function f:\R^n\to \R is convex iff the epigraph of f (region above f) is convex. Equivalently, for all \bold{x,y}\in\text{Domain}(f), \lambda\in[0,1],

f(\lambda\bold x + (1-\lambda)\bold y)\leqslant \lambda f(\bold{x}) + (1-\lambda)f(\bold{y})

If -f is convex, then f is concave.

# Remark.

A nice connection shows up if we consider the more general definition of a linear function. f:\R \to\R is linear if it satisfies:

f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)

for constants \alpha, \beta and all x,y in f's domain.

Another way to state the definition of a convex function is:

f(\alpha x + \beta y) \leqslant \alpha f(x) + \beta f(y)\qquad \alpha,\beta\geqslant 0,\ \alpha+\beta=1

So we can see that linear optimization is just a special case of convex optimization.

# Example. Convex Functions

Let the domain be (-\infty, \infty),

  • f(x) = e^x, f(x) = x^2 are convex

  • f(x) = x^3 is not convex on \R (it is concave for x\leqslant 0), and f(x) = \sqrt{x} (defined on [0,\infty)) is concave, hence not convex
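
These claims can be sanity-checked numerically by sampling points and testing the defining inequality directly. The sketch below is an illustrative check in Python (the function choices, sampling ranges, and tolerance are ours); passing the test only suggests convexity, while a single violation disproves it.

```python
import numpy as np

def looks_convex(f, xs, ys, lambdas, tol=1e-9):
    """Test f(l*x + (1-l)*y) <= l*f(x) + (1-l)*f(y) on sampled points."""
    for x, y in zip(xs, ys):
        for lam in lambdas:
            if f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y) + tol:
                return False   # found a violating triple (x, y, lambda)
    return True

rng = np.random.default_rng(0)
xs, ys = rng.uniform(-3, 3, 200), rng.uniform(-3, 3, 200)
lams = np.linspace(0, 1, 11)

print(looks_convex(np.exp, xs, ys, lams))            # True:  e^x is convex
print(looks_convex(np.square, xs, ys, lams))         # True:  x^2 is convex
print(looks_convex(lambda x: x**3, xs, ys, lams))    # False: x^3 violates the inequality
```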

# Prop. A local minimum of a convex f is a global minimum

In particular, a convex function has no other local minima.

# Convex Optimization

A convex optimization program looks like the following:

\begin{aligned}
    \min\quad& f_0(\bold x)\\
    \text{subject to }\quad & \left\{\begin{aligned}&
         \left.\begin{aligned}
        f_1(\bold x)&\leqslant 0\\
        f_2(\bold x)&\leqslant 0\\
        \qquad \vdots\\
        f_m(\bold x)&\leqslant 0
    \end{aligned}\right\}\text{Inequality Constraints}\\\\
    &\left.\begin{aligned}
        h_1(\bold x) &= 0\\
        \qquad \vdots\\
        h_p(\bold x) &= 0
    \end{aligned}\right\}\text{Equality Constraints}
    \end{aligned}
    \right.
\end{aligned}

where f_0,\dots, f_m: \R^n\to \R are convex and h_1,\dots, h_p: \R^n\to \R are affine (linear).

Let X denote the feasible region, i.e. the set of points satisfying all the constraints:

X = \{\bold{x}\in\R^n\mid \bold{x} \text{ satisfies all the constraints} \}
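
As a concrete (made-up) instance of this template, the sketch below uses the cvxpy modeling library with one convex inequality constraint and one linear equality constraint; the particular f_0, f_1, h_1 are ours, chosen only for illustration.

```python
import cvxpy as cp
import numpy as np

# f_0(x) = ||x - (1, 2)||^2      (convex objective)
# f_1(x) = x_1 + x_2 - 1 <= 0    (convex inequality)
# h_1(x) = x_1 - x_2 = 0         (linear equality)
x = cp.Variable(2)
prob = cp.Problem(
    cp.Minimize(cp.sum_squares(x - np.array([1.0, 2.0]))),
    [x[0] + x[1] <= 1,
     x[0] - x[1] == 0],
)
prob.solve()
print(prob.status, x.value)   # expected: 'optimal', x roughly [0.5, 0.5]
```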

# Prop. Feasible region is convex

Each sublevel set \{\bold x: f_i(\bold x)\leqslant 0\} is convex because f_i is convex, each \{\bold x: h_j(\bold x) = 0\} is convex because h_j is affine, and X is the intersection of these convex sets.

# Remark.

Affine functions of the form \bold a^\top \bold x + b are both convex and concave, so LP is a special case of convex optimization.

# Example. Convex Region

In \R^2, consider the constraints f_1, f_2 \leqslant 0 where:

\begin{aligned}
    f_1(\bold x) &= x_1^2 + x_2^2 - 1\\
    f_2(\bold x) &= e^{x_1} - x_2 - 1
\end{aligned}

The region where both constraints hold (the intersection of the two convex sublevel sets) is convex.
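
One way to probe this region numerically is to pose a feasibility problem over the two constraints. The cvxpy sketch below (ours; a constant objective stands in for f_0, and a solver with exponential-cone support is assumed) just asks for any point in the intersection.

```python
import cvxpy as cp

x = cp.Variable(2)
constraints = [cp.sum_squares(x) <= 1,        # f_1(x) = x_1^2 + x_2^2 - 1 <= 0
               cp.exp(x[0]) - x[1] - 1 <= 0]  # f_2(x) = e^{x_1} - x_2 - 1 <= 0
prob = cp.Problem(cp.Minimize(0), constraints)  # pure feasibility problem
prob.solve()
print(prob.status)   # 'optimal' means the intersection is nonempty
print(x.value)       # one feasible point; e.g. the origin satisfies both constraints
```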

# Prop. The set of convex functions forms a convex cone

Let f be convex.

  • Non-negative multiple: af is convex for any constant a\geqslant 0
  • Addition: f + g is convex if g is also convex

(It is a convex cone rather than a vector space: -f is in general not convex.)

# Optimality Status

  • Infeasible: X=\varnothing, and the optimal value is +\infty by convention.
  • Unbounded: the objective can be made arbitrarily negative over X, so the optimal value is -\infty.
  • Optimal: the optimal value is finite, but it may or may not be attained. For example, for \min e^{-x} the optimal value is 0, but it is only approached as x\to \infty.
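
These statuses are exactly what a solver reports. The toy one-variable problems below (ours, for illustration) trigger each case in cvxpy.

```python
import cvxpy as cp

x = cp.Variable()

infeasible = cp.Problem(cp.Minimize(x), [x >= 1, x <= 0])   # X is empty
unbounded  = cp.Problem(cp.Minimize(x), [x <= 0])           # objective goes to -infinity
# Optimal value 0 but not attained: min e^{-x} needs x -> infinity,
# so the reported status/value in this case is solver-dependent.
not_attained = cp.Problem(cp.Minimize(cp.exp(-x)))

for name, prob in [("infeasible", infeasible), ("unbounded", unbounded),
                   ("not attained", not_attained)]:
    prob.solve()
    print(name, prob.status, prob.value)
```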

# Second Order Cone Programming (SOCP)

A general second-order cone program has the form:

\begin{aligned}
    \min\quad &\bold c^\top \bold x\\
    \text{subject to}\quad & \|A_i\bold x + \bold b_i\|_2\leqslant \bold c_i^\top \bold x +  d_i\qquad 1\leqslant i\leqslant m
\end{aligned}

where \|\cdot\|_2 is the l_2 norm.
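
A minimal instance of this form takes a single constraint with A_1 = I, \bold b_1 = \bold 0, \bold c_1 = \bold 0, d_1 = 1, i.e. \|\bold x\|_2\leqslant 1. The cvxpy sketch below (the cost vector \bold c is ours) writes it directly as a norm constraint.

```python
import cvxpy as cp
import numpy as np

# Minimize c^T x over the unit ball ||x||_2 <= 1 (one second-order cone constraint).
c = np.array([3.0, -4.0])
x = cp.Variable(2)
prob = cp.Problem(cp.Minimize(c @ x), [cp.norm(x, 2) <= 1])
prob.solve()
print(prob.value)   # expected -||c||_2 = -5
print(x.value)      # expected roughly -c/||c||_2 = [-0.6, 0.8]
```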

A second order cone has the following form:

\{(\bold x,t)\in\R^n\times \R: \|\bold x\|_2 \leqslant t\}

Linear programming is a special case of SOCP.

\begin{matrix}
    \begin{aligned}
        \min \quad & \bold c^\top  \bold x\\
        \text{subject to}\quad &\begin{aligned}
            A\bold x &= \bold b\\
            \bold x &\geqslant 0
        \end{aligned}
    \end{aligned}
    &\iff&
    \begin{aligned}
        \min \quad & \bold c^\top  \bold x\\
        \text{subject to}\quad &\begin{aligned}
            \|A\bold x - \bold b\|_2 &\leqslant \bold 0^\top \bold x + 0 = 0\\
            \|0\bold x + \bold 0\|_2 &\leqslant x_i\qquad 1\leqslant i\leqslant n
        \end{aligned}
    \end{aligned}
\end{matrix}

# Semi-Definite Programming

# Def. Semi-Definite Matrix

Matrix A is positive semi-definite if

  • A^\top = A
  • \bold x^\top A\bold x\geqslant 0 for all \bold x\in\R^n

This is equivalent to saying that all eigenvalues of A are non-negative.
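
This definition translates directly into a numerical test (symmetry plus non-negative eigenvalues); the example matrices below are ours.

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """A is positive semi-definite iff A is symmetric and all eigenvalues are >= 0."""
    return np.allclose(A, A.T) and np.all(np.linalg.eigvalsh(A) >= -tol)

print(is_psd(np.array([[2.0, -1.0], [-1.0, 2.0]])))  # True:  eigenvalues 1 and 3
print(is_psd(np.array([[1.0, 2.0], [2.0, 1.0]])))    # False: eigenvalues -1 and 3,
                                                     # even though all entries are >= 0
```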

# Prop. The set of semi-definite matrices is convex

A semi-definite program has the following form:

\begin{aligned}
    \min \quad&\bold c^\top \bold x\\
    \text{subject to}\quad & \sum^n_{i=1}x_iA_i\succeq B
\end{aligned}

where each A_i and B are symmetric matrices, \bold x is a vector, and x_i is the i-th entry of \bold x. Here \succeq denotes the semi-definite (Loewner) order: \sum^n_{i=1}x_iA_i - B is positive semi-definite.
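
In cvxpy the operator `>>` expresses the semi-definite ordering, so the program above can be written as below; the 2\times 2 data A_1, A_2, B and cost \bold c are ours, and an SDP-capable solver (such as SCS) is assumed.

```python
import cvxpy as cp
import numpy as np

# min c^T x  subject to  x_1 A_1 + x_2 A_2 - B  being positive semi-definite
A1 = np.array([[1.0, 0.0], [0.0, 0.0]])
A2 = np.array([[0.0, 0.0], [0.0, 1.0]])
B  = np.array([[1.0, 0.5], [0.5, 1.0]])
c  = np.array([1.0, 1.0])

x = cp.Variable(2)
lmi = x[0] * A1 + x[1] * A2 - B >> 0     # linear matrix inequality constraint
prob = cp.Problem(cp.Minimize(c @ x), [lmi])
prob.solve()
print(prob.value, x.value)   # expected roughly 3.0 with x near [1.5, 1.5]
```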

# Def. Cone

A set S\sub\R^n is a cone if \forall\bold x\in S, \lambda \geqslant 0\implies\lambda\bold x\in S.

# Lagrangian Dual Method

Now we wish to know the dual form of convex programs. For the general primal problem (\text{P}):

(\text P)\qquad \begin{aligned}
    \min\quad & f_0(\bold x)\\
    \text{subject to}\quad & \left\{\begin{aligned}
        f_i(\bold x)&\leqslant 0 &&\quad i = 1\dots m\\
        h_j(\bold x) &= 0  &&\quad j = 1\dots p
    \end{aligned}\right.
\end{aligned}

We introduce a dual variable (multiplier) for each f_i and each h_j.

# Def. Lagrangian

{\cal L}(\bold x, \boldsymbol\lambda, \boldsymbol\mu) = f_0(\bold x) + \sum^m_{i=1}\lambda_if_i(\bold x) + \sum^p_{j=1}\mu_j h_j(\bold x)

To satisfy Thm. Weak Duality, we need {\cal L}(\bold x, \boldsymbol\lambda, \boldsymbol\mu)\leqslant f_0(\bold x) for all feasible \bold x.

If \bold x is feasible, then the original constraints give:

\sum^p_{j=1}\mu_jh_j(\bold x) = 0\qquad f_i(\bold x)\leqslant 0\quad\text{for each } i

So \mu_j is unrestricted, and we need \lambda_i\geqslant 0 so that \sum^m_{i=1}\lambda_if_i(\bold x)\leqslant 0 holds.

The Lagrangian, minimized over \bold x, is also bounded by the optimal value: taking \bold x = \bold x^* gives {\cal L}(\bold{x}^*,\boldsymbol\lambda, \boldsymbol\mu)\leqslant f_0(\bold{x}^*), hence \inf_{\bold x}{\cal L}(\bold x,\boldsymbol\lambda,\boldsymbol\mu)\leqslant f_0(\bold x^*).

# Def. Lagrangian Dual

Let g(\boldsymbol\lambda, \boldsymbol\mu) = \displaystyle\inf_{\bold x\in\R^n}{\cal L}(\bold x, \boldsymbol\lambda, \boldsymbol\mu) for all \boldsymbol\lambda\geqslant 0,\boldsymbol \mu unrestricted. The Lagrangian dual is:

\begin{aligned}
    \max && g(\boldsymbol\lambda, \boldsymbol\mu)\\
    \text{subject to}&&\boldsymbol\lambda \geqslant 0
\end{aligned}

Weak duality holds here: g(\boldsymbol\lambda^*, \boldsymbol\mu^*)\leqslant f_0(\bold x^*)
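
For a concrete check, take the one-dimensional problem \min x^2 subject to 1 - x\leqslant 0 (so x^* = 1 and f_0(x^*) = 1):

\begin{aligned}
    {\cal L}(x,\lambda) &= x^2 + \lambda(1-x)\\
    g(\lambda) &= \inf_{x\in\R} {\cal L}(x,\lambda) = \lambda - \frac{\lambda^2}{4}\qquad (\text{the infimum is attained at } x = \lambda/2)\\
    \max_{\lambda\geqslant 0} g(\lambda) &= g(2) = 1
\end{aligned}

Here the best dual bound equals the primal optimum, so the weak-duality inequality holds with equality for this example.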

# Remark.

  1. Lagrangian dual also applies to non-convex problems
  2. The dual problem is always convex

# Strong Duality

Strong duality does not hold automatically, even for convex problems; an additional constraint qualification is needed.

# Prop. Constraints Qualification (Slater's Condition)

There exists a feasible solution \bold x_0 such that all non-linear constraints are strictly satisfied.

f_i(x_0)<0 \qquad \text{for all non-linear } f_i

Under this condition we get strong duality.

# Thm. Complementary Slackness

Suppose \bold x^*, \boldsymbol\lambda^*, \boldsymbol\mu^* are optimal primal and dual solutions. By weak duality:

g(\boldsymbol\lambda^*, \boldsymbol\mu^*)\leqslant \cal L (\bold x^*, \boldsymbol\lambda^*, \boldsymbol\mu^*)\leqslant f_0(\bold x^*)

The Lagrangian \cal L is:

\cal L (\bold x^*, \boldsymbol\lambda^*, \boldsymbol\mu^*) = f_0(\bold x^*) + \sum^m_{i=1}\lambda_i^*f_i(\bold x^*) + \sum^p_{j=1}\mu_j^*h_j(\bold x^*)

If we have Slater's Condition which gives us strong duality, then:

\sum^m_{i=1}\lambda_i^*f_i(\bold x^*) + \sum^p_{j=1}\mu_j^*h_j(\bold x^*) = 0

which implies \sum^m_{i=1}\lambda_i^*f_i(\bold x^*) = 0, since the equality-constraint part is automatically 0 when the solution is feasible.

Since all \lambda_i^*\geqslant 0, we must have \lambda_i^*f_i(\bold x^*) = 0 for all i = 1\dots m.

Therefore we have complementary slackness.

# KKT Condition

Assume we have strong duality for a convex problem, and f_0, f_i, h_j are all differentiable.

Then (\bold x^*, \boldsymbol\lambda ^*, \boldsymbol\mu^*) is optimal to both primal and dual \iff we have all of the following conditions:

  1. Primal Feasibility

    f_i(\bold x^*) \leqslant 0\quad h_j(\bold x^*) = 0
  2. Dual Feasibility

    \boldsymbol{\lambda}^*\geqslant 0
  3. Complementary slackness:

    \lambda_i^*f_i(\bold x^*) = 0\qquad \forall i
  4. The stationarity condition for the Lagrangian:

    \nabla f_0(\bold x^*) + \sum^m_{i=1}\lambda_i^*\nabla f_i(\bold x^*) + \sum^p_{j=1}\mu_j^* \nabla h_j(\bold x^*) = 0
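
As a worked instance, take \min x_1^2 + x_2^2 subject to 1 - x_1 - x_2\leqslant 0. The KKT conditions become:

\begin{aligned}
    &\text{stationarity:} && 2x_1^* - \lambda^* = 0,\qquad 2x_2^* - \lambda^* = 0\\
    &\text{primal / dual feasibility:} && 1 - x_1^* - x_2^*\leqslant 0,\qquad \lambda^*\geqslant 0\\
    &\text{complementary slackness:} && \lambda^*(1 - x_1^* - x_2^*) = 0
\end{aligned}

If \lambda^* = 0 then x^* = \bold 0, which is infeasible; so the constraint is active, giving x^* = (\tfrac12, \tfrac12) and \lambda^* = 1.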

# Summary

# SDP

The cone is K = \bold S^n_+ = \{X\in \R^{n\times n}: X\succeq 0\}, the set of positive semi-definite matrices.

The program is:

\begin{aligned}
    \min &&& \bold c^\top \bold x\\
    \text{subject to} &&& \sum^n_{i=1}A_ix_i - B\succeq 0
\end{aligned}

# SOCP

The cone is K = \{(x,t)\in\R^m\times \R:\|x\|_2 \leqslant t\}

The program is:

\begin{aligned}
    \min &&& \bold c^\top \bold x\\
    \text{subject to} &&& \|A_i\bold x + \bold b_i\|_2\leqslant \bold c_i^\top \bold{x} + d_i
\end{aligned}

# Types of Conic Programs

# Generalized Conic Constraints

For a proper cone K (i.e. convex, closed, pointed, with nonempty interior), we denote a constraint f_i in K as:

f_i(\bold x)\preceq_{K} \bold 0\iff -f_i(\bold x) \in K

Each kind of optimization problem uses the following cone:

  • LP: K = \R_+^m
  • SOCP: K = \{(\bold x,t)\in\R^m\times \R: \|\bold x\|_2\leqslant t\}
  • SDP: K = \{X\in\R^{m\times m}: X \text{ is positive semi-definite}\}

# Def. Dual Cone

Let K be a cone. The dual cone K^* is:

K^* = \{\bold y: \bold x^\top \bold y\geqslant 0,\forall \bold x\in K\}

The cones used in LP, SOCP, and SDP are all self-dual: K^* = K.
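
For example, for the LP cone K = \R^m_+: if \bold y\geqslant 0 then \bold x^\top\bold y\geqslant 0 for every \bold x\geqslant 0, so \bold y\in K^*; conversely, if some y_i < 0, choosing \bold x = \bold e_i\in K gives \bold x^\top\bold y = y_i < 0, so \bold y\notin K^*. Hence K^* = \R^m_+ = K.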

# Applications of SOCP & SDP - Robust Linear Programming

Consider the linear program

\begin{aligned}
    \min &&& \bold c^\top  \bold x\\
    \text{subject to}&&& \bold a_i^\top \bold x\leqslant b_i
\end{aligned}

where we have an uncertainty set {\cal E}_i for each \bold a_i, meaning that \bold a_i could take on any value in {\cal E}_i. We split this problem into a deterministic and a stochastic case.

Deterministic (the constraint must hold for every possible \bold a_i\in {\cal E}_i):

\begin{aligned}
    \min &&& \bold c^\top  \bold x\\
    \text{subject to}&&& \bold a_i^\top \bold x\leqslant b_i\qquad \text{for all }\bold a_i\in {\cal E}_i
\end{aligned}

Stochastic (each \bold a_i is a random vector and \eta is a probability threshold):

\begin{aligned}
    \min &&& \bold c^\top  \bold x\\
    \text{subject to}&&& \Bbb P( \bold a_i^\top \bold x\leqslant b_i) \geqslant \eta
\end{aligned}

# Deterministic

For simplicity suppose each {\cal E}_i is an ellipsoid in \R^n.

{\cal E}_i = \Big\{\overline{\bold a_i} + P_i\bold u : \|\bold u\|_2\leqslant 1\Big\}

where \overline{\bold a_i} is the "center" of {\cal E}_i and P_i is a fixed linear transformation (matrix) that determines its shape and size.

Then the worst case is \sup_{\bold a_i\in{\cal E}_i}\bold a_i^\top \bold x. Therefore the constraint that covers all \bold a_i is:

\sup_{\bold a_i\in{\cal E}_i}\bold a_i^\top \bold x \leqslant \bold b_i

Since \overline{\bold a_i} is fixed,

\begin{aligned}
    \sup_{\bold a_i\in {\cal E}_i} \bold a_i^\top\bold x &= \overline{\bold a_i}^\top \bold x + \sup_{\|\bold u\|_2\leqslant 1} (P_i\bold u)^\top  \bold x\\
    &= \overline{\bold a_i}^\top \bold x + \sup_{\|\bold u\|_2\leqslant 1} (P_i^\top\bold x)^\top  \bold u
\end{aligned}

By the Cauchy-Schwarz inequality we have \bold x^\top \bold y\leqslant \|\bold x\|_2\cdot \|\bold y\|_2 for all \bold x, \bold y\in \R^n, and the bound is attained at \bold u = P_i^\top\bold x/\|P_i^\top\bold x\|_2, therefore:

\sup_{\bold a_i\in {\cal E}_i} \bold a_i^\top\bold x   = \overline{\bold a_i}^\top \bold x + \left\|P_i^\top\bold x\right\|_2

The final program is:

\begin{aligned}
    \min &&& \bold c^\top  \bold x\\
    \text{subject to}&&&  \overline{\bold a_i}^\top \bold x + \left\|P_i^\top\bold x\right\|_2 \leqslant b_i
\end{aligned}

which is an SOCP problem.
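
Written out in cvxpy, each worst-case constraint is just an affine term plus a norm. In the sketch below the data \overline{\bold a}_i, P_i, b_i, \bold c are placeholders we made up, and an extra \|\bold x\|_2\leqslant 1 constraint is added only to keep the toy problem bounded.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4                                   # 3 variables, 4 uncertain rows
a_bar = rng.standard_normal((m, n))           # ellipsoid centers  a_bar_i
P = [0.1 * rng.standard_normal((n, n)) for _ in range(m)]   # ellipsoid shapes P_i
b = 2.0 * np.ones(m)
c = rng.standard_normal(n)

x = cp.Variable(n)
# Worst case over the ellipsoid:  a_bar_i^T x + ||P_i^T x||_2 <= b_i
robust = [a_bar[i] @ x + cp.norm(P[i].T @ x, 2) <= b[i] for i in range(m)]
prob = cp.Problem(cp.Minimize(c @ x), robust + [cp.norm(x, 2) <= 1])
prob.solve()
print(prob.status, prob.value)
```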

# Stochastic

For simplicity, assume each \bold a_i is a normally distributed random vector, \bold a_i\sim\cal N(\overline{\bold a}_i, \Sigma_i). Note that \overline{\bold a}_i is the mean vector and \Sigma_i is the covariance matrix.

Since \bold{a}_i^\top \bold x is a linear combination of jointly normal variables, it is still a (one-dimensional) normal r.v.

\bold a_i^\top \bold x\sim {\cal N}(\mu=\overline{\bold a_i}^\top \bold x, \sigma^2=\bold x^\top \Sigma_i\bold x)

Then we can convert it to the standard normal distribution (1 dimensional):

\frac{\bold a_i^\top \bold x - \overline{\bold a}_i^\top \bold x}{\sqrt{\bold x^\top \Sigma_i\bold x}}\sim{\cal N}(\mu=0,\sigma^2=1)\\
\Bbb P(\bold a_i^\top \bold x\leqslant \bold b_i) = \Phi\left(\frac{\bold b_i - \overline{\bold a}_i^\top \bold x}{\sqrt{\bold x^\top \Sigma_i\bold x}}\right)

where \Phi is the c.d.f of standard normal distribution.

Plugging in the probability expression gives us:

\begin{aligned}
\iff\qquad&\begin{aligned}
    \min &&& \bold c^\top  \bold x\\
    \text{subject to}&&& \frac{b_i - \overline{\bold a}_i^\top \bold x}{\sqrt{\bold x^\top \Sigma_i\bold x}}\geqslant \Phi^{-1}(\eta)
\end{aligned}\\\\
\iff\qquad&\begin{aligned}
    \min &&& \bold c^\top  \bold x\\
    \text{subject to}&&& b_i - \overline{\bold a}_i^\top \bold x\geqslant\Phi^{-1}(\eta)\left\|\Sigma_i^{\frac12}\bold x\right\|_2
\end{aligned}
\end{aligned}

which is also an SOCP problem, using \sqrt{\bold x^\top \Sigma_i\bold x} = \left\|\Sigma_i^{\frac12}\bold x\right\|_2 and assuming \eta\geqslant \frac12 so that \Phi^{-1}(\eta)\geqslant 0.
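
The chance-constrained version is likewise one norm constraint per row. In the sketch below \Sigma_i^{1/2} is taken as the (transposed) Cholesky factor of \Sigma_i, \Phi^{-1} comes from scipy, the data are placeholders we made up, and an extra \|\bold x\|_2\leqslant 1 constraint keeps the toy problem bounded.

```python
import cvxpy as cp
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, m, eta = 3, 4, 0.95
a_bar = rng.standard_normal((m, n))
Sigma = []
for _ in range(m):
    M = rng.standard_normal((n, n))
    Sigma.append(M @ M.T + 0.1 * np.eye(n))   # symmetric positive-definite covariance
b = 2.0 * np.ones(m)
c = rng.standard_normal(n)
z = norm.ppf(eta)                             # Phi^{-1}(eta), nonnegative since eta >= 1/2

x = cp.Variable(n)
# P(a_i^T x <= b_i) >= eta  <=>  b_i - a_bar_i^T x >= Phi^{-1}(eta) * ||Sigma_i^{1/2} x||_2
# Using Sigma_i = L_i L_i^T (Cholesky), x^T Sigma_i x = ||L_i^T x||_2^2.
chance = [b[i] - a_bar[i] @ x >= z * cp.norm(np.linalg.cholesky(Sigma[i]).T @ x, 2)
          for i in range(m)]
prob = cp.Problem(cp.Minimize(c @ x), chance + [cp.norm(x, 2) <= 1])
prob.solve()
print(prob.status, prob.value)
```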

# Combinatorial Optimization (Max Cut)

Given an undirected graph G = (V,E) with weights w_{ij} = w_{ji} \geqslant 0, we want to find a subset S\sube V that maximizes the total weight of the edges crossing the cut.

  • An edge ij\in E crosses the cut when one of the vertices of ij is in S and the other is in V\backslash S

For each node i, define

x_i = \begin{cases}
    1 & i\in S\\
    -1 & i\notin S
\end{cases}

The objective and constraints are:

\begin{aligned}
    \max &&&\frac 14\sum_i\sum_j w_{ij}\cdot \underbrace{(1-x_ix_j)}_{\substack{\text{either 0 or 2; each edge is}\\\text{counted twice, hence the }1/4}}\\
    \text{subject to} &&& x_j\in\{-1,1\}
\end{aligned}

This program is NP-hard, so we use an SDP relaxation to approximate a solution.

Define the matrix Y = \bold x\bold x^\top , Y_{ij} = x_ix_j. The program becomes:

\begin{aligned}
    \max &&&\frac 14\sum_i\sum_j w_{ij} - \frac 14 \text{Trace}(WY)\\
    \text{subject to} &&& Y_{ii} = 1,\quad Y\succeq 0,\quad \text{rank}(Y) = 1
\end{aligned}

These constraints are equivalent to Y = \bold x\bold x^\top with x_i\in\{-1,1\}; the SDP relaxation keeps Y_{ii} = 1 and Y\succeq 0 and drops the non-convex rank constraint.
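
Dropping the rank constraint leaves a semi-definite program that off-the-shelf solvers can handle. The cvxpy sketch below uses a small 4-cycle whose weights we made up; for this particular graph the relaxation value happens to coincide with the true max cut.

```python
import cvxpy as cp
import numpy as np

# Unweighted 4-cycle 0-1-2-3-0: the best cut {0,2} vs {1,3} cuts all 4 edges.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
n = W.shape[0]

Y = cp.Variable((n, n), symmetric=True)
constraints = [Y >> 0, cp.diag(Y) == 1]                 # relaxation: rank(Y) = 1 dropped
objective = cp.Maximize(0.25 * cp.sum(cp.multiply(W, 1 - Y)))
prob = cp.Problem(objective, constraints)
prob.solve()
print(prob.value)   # roughly 4.0, the weight of the maximum cut of this 4-cycle
```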