EncycloReader

In optimization, a self-concordant function is a function [math]\displaystyle{ f:\mathbb{R} \rightarrow \mathbb{R} }[/math] for which

[math]\displaystyle{ |f'''(x)| \leq 2 f''(x)^{3/2} }[/math]

or, equivalently, a function [math]\displaystyle{ f:\mathbb{R} \rightarrow \mathbb{R} }[/math] that, wherever [math]\displaystyle{ f''(x) \gt 0 }[/math], satisfies

[math]\displaystyle{ \left| \frac{d}{dx} \frac{1}{\sqrt{f''(x)}} \right| \leq 1 }[/math]

and which satisfies [math]\displaystyle{ f'''(x) = 0 }[/math] elsewhere.

More generally, a multivariate function [math]\displaystyle{ f(x) : \mathbb{R}^n \rightarrow \mathbb{R} }[/math] is self-concordant if

[math]\displaystyle{ \left. \frac{d}{d\alpha} \nabla^2 f(x + \alpha y) \right|_{\alpha = 0} \preceq 2 \sqrt{y^T \nabla^2 f(x)\,y} \, \nabla^2 f(x) }[/math]

or, equivalently, if its restriction to any arbitrary line is self-concordant.^[1]

History

As mentioned in the "Bibliography Comments"^[2] of their 1994 book,^[3] self-concordant functions were introduced in 1988 by Yurii Nesterov^[4]^[5] and further developed with Arkadi Nemirovski.^[6] As explained in^[7] their basic observation was that the Newton method is affine invariant, in the sense that if for a function [math]\displaystyle{ f(x) }[/math] we have Newton steps [math]\displaystyle{ x_{k+1} = x_k - [f''(x_k)]^{-1}f'(x_k) }[/math] then for a function [math]\displaystyle{ \phi(y) = f(Ay) }[/math] where [math]\displaystyle{ A }[/math] is a non-degenerate linear transformation, starting from [math]\displaystyle{ y_0 = A^{-1} x_0 }[/math] we have the Newton steps [math]\displaystyle{ y_k = A^{-1} x_k }[/math] which can be shown recursively

[math]\displaystyle{ y_{k+1} = y_k - [\phi''(y_k)]^{-1} \phi'(y_k) = y_k - [A^T f''(A y_k) A]^{-1} A^T f'(A y_k) = A^{-1} x_k - A^{-1}[f''(x_k)]^{-1} f'(x_k) = A^{-1} x_{k+1} }[/math].

However the standard analysis of the Newton method supposes that the Hessian of [math]\displaystyle{ f }[/math] is Lipschitz continuous, that is [math]\displaystyle{ \|f''(x) - f''(y)\| \leq M\| x-y \| }[/math] for some constant [math]\displaystyle{ M }[/math]. If we suppose that [math]\displaystyle{ f }[/math] is 3 times continuously differentiable, then this is equivalent to

[math]\displaystyle{ | \langle f'''(x)[u]v, v \rangle | \leq M \|u\| \|v\|^2 }[/math]for all [math]\displaystyle{ u,v \in \mathbb{R}^n }[/math]

where [math]\displaystyle{ f'''(x)[u] = \lim_{\alpha \to 0} \alpha^{-1} [f''(x + \alpha u) - f''(x)] }[/math] . Then the left hand side of the above inequality is invariant under the affine transformation [math]\displaystyle{ f(x) \to \phi(y) = f(A y), u \to A^{-1} u, v \to A^{-1} v }[/math], however the right hand side is not.

The authors note that the right hand side can be made also invariant if we replace the Euclidean metric by the scalar product defined by the Hessian of [math]\displaystyle{ f }[/math] defined as [math]\displaystyle{ \| w \|_{f''(x)} = \langle f''(x)w, w \rangle^{1/2} }[/math] for [math]\displaystyle{ w \in \mathbb R^n }[/math]. They then arrive at the definition of a self concordant function as

[math]\displaystyle{ | \langle f'''(x)[u]u, u \rangle | \leq M \langle f''(x) u, u \rangle^{3/2} }[/math].

Properties

Linear combination

If [math]\displaystyle{ f_1 }[/math] and [math]\displaystyle{ f_2 }[/math] are self-concordant with constants [math]\displaystyle{ M_1 }[/math] and [math]\displaystyle{ M_2 }[/math] and [math]\displaystyle{ \alpha,\beta\gt 0 }[/math], then [math]\displaystyle{ \alpha f_1 + \beta f_2 }[/math] is self-concordant with constant [math]\displaystyle{ \max(\alpha^{-1/2} M_1, \beta^{-1/2} M_2) }[/math].

Affine transformation

If [math]\displaystyle{ f }[/math] is self-concordant with constant [math]\displaystyle{ M }[/math] and [math]\displaystyle{ Ax + b }[/math] is an affine transformation of [math]\displaystyle{ \mathbb R^n }[/math], then [math]\displaystyle{ \phi(x) = f(Ax+b) }[/math] is also self-concordant with parameter [math]\displaystyle{ M }[/math].

Convex conjugate

If [math]\displaystyle{ f }[/math] is self-concordant, then its convex conjugate [math]\displaystyle{ f^* }[/math] is also self-concordant.^[8]^[9]

Non-singular Hessian

If [math]\displaystyle{ f }[/math] is self-concordant and the domain of [math]\displaystyle{ f }[/math] contains no straight line (infinite in both directions), then [math]\displaystyle{ f'' }[/math] is non-singular.

Conversely, if for some [math]\displaystyle{ x }[/math] in the domain of [math]\displaystyle{ f }[/math] and [math]\displaystyle{ u \in \mathbb R^n, u \neq 0 }[/math] we have [math]\displaystyle{ \langle f''(x) u, u \rangle = 0 }[/math], then [math]\displaystyle{ \langle f''(x + \alpha u) u, u \rangle = 0 }[/math] for all [math]\displaystyle{ \alpha }[/math] for which [math]\displaystyle{ x + \alpha u }[/math] is in the domain of [math]\displaystyle{ f }[/math] and then [math]\displaystyle{ f(x + \alpha u) }[/math] is linear and cannot have a maximum so all of [math]\displaystyle{ x + \alpha u, \alpha \in \mathbb R }[/math] is in the domain of [math]\displaystyle{ f }[/math]. We note also that [math]\displaystyle{ f }[/math] cannot have a minimum inside its domain.

Applications

Among other things, self-concordant functions are useful in the analysis of Newton's method. Self-concordant barrier functions are used to develop the barrier functions used in interior point methods for convex and nonlinear optimization. The usual analysis of the Newton method would not work for barrier functions as their second derivative cannot be Lipschitz continuous, otherwise they would be bounded on any compact subset of [math]\displaystyle{ \mathbb R^n }[/math].

Self-concordant barrier functions

are a class of functions that can be used as barriers in constrained optimization methods
can be minimized using the Newton algorithm with provable convergence properties analogous to the usual case (but these results are somewhat more difficult to derive)
to have both of the above, the usual constant bound on the third derivative of the function (required to get the usual convergence results for the Newton method) is replaced by a bound relative to the Hessian

Minimizing a self-concordant function

A self-concordant function may be minimized with a modified Newton method where we have a bound on the number of steps required for convergence. We suppose here that [math]\displaystyle{ f }[/math] is a standard self-concordant function, that is it is self-concordant with parameter [math]\displaystyle{ M = 2 }[/math].

We define the Newton decrement [math]\displaystyle{ \lambda_f(x) }[/math] of [math]\displaystyle{ f }[/math] at [math]\displaystyle{ x }[/math] as the size of the Newton step [math]\displaystyle{ [f''(x)]^{-1} f'(x) }[/math] in the local norm defined by the Hessian of [math]\displaystyle{ f }[/math] at [math]\displaystyle{ x }[/math]

[math]\displaystyle{ \lambda_f(x) = \langle f''(x) [f''(x)]^{-1} f'(x), [f''(x)]^{-1} f'(x) \rangle^{1/2} = \langle [f''(x)]^{-1} f'(x), f'(x) \rangle^{1/2} }[/math]

Then for [math]\displaystyle{ x }[/math] in the domain of [math]\displaystyle{ f }[/math], if [math]\displaystyle{ \lambda_f(x) \lt 1 }[/math] then it is possible to prove that the Newton iterate

[math]\displaystyle{ x_+ = x - [f''(x)]^{-1}f'(x) }[/math]

will be also in the domain of [math]\displaystyle{ f }[/math]. This is because, based on the self-concordance of [math]\displaystyle{ f }[/math], it is possible to give some finite bounds on the value of [math]\displaystyle{ f(x_+) }[/math]. We further have

[math]\displaystyle{ \lambda_f(x_+) \leq \Bigg( \frac{\lambda_f(x)}{1-\lambda_f(x)} \Bigg)^2 }[/math]

Then if we have

[math]\displaystyle{ \lambda_f(x) \lt \bar\lambda = \frac{3-\sqrt 5}{2} }[/math]

then it is also guaranteed that [math]\displaystyle{ \lambda_f(x_+) \lt \lambda_f(x) }[/math], so that we can continue to use the Newton method until convergence. Note that for [math]\displaystyle{ \lambda_f(x_+) \lt \beta }[/math] for some [math]\displaystyle{ \beta \in (0, \bar\lambda) }[/math] we have quadratic convergence of [math]\displaystyle{ \lambda_f }[/math] to 0 as [math]\displaystyle{ \lambda_f(x_+) \leq (1-\beta)^{-2} \lambda_f(x)^2 }[/math]. This then gives quadratic convergence of [math]\displaystyle{ f(x_k) }[/math] to [math]\displaystyle{ f(x^*) }[/math] and of [math]\displaystyle{ x }[/math] to [math]\displaystyle{ x^* }[/math], where [math]\displaystyle{ x^* = \arg\min f(x) }[/math], by the following theorem. If [math]\displaystyle{ \lambda_f(x) \lt 1 }[/math] then

[math]\displaystyle{ \omega(\lambda_f(x)) \leq f(x)-f(x^*) \leq \omega_*(\lambda_f(x)) }[/math]

[math]\displaystyle{ \omega'(\lambda_f(x)) \leq \| x-x^* \|_x \leq \omega_*'(\lambda_f(x)) }[/math]

with the following definitions

[math]\displaystyle{ \omega(t) = t - \log(1+t) }[/math]

[math]\displaystyle{ \omega_*(t) = -t-\log(1-t) }[/math]

[math]\displaystyle{ \| u \|_x = \langle f''(x) u, u \rangle^{1/2} }[/math]

If we start the Newton method from some [math]\displaystyle{ x_0 }[/math] with [math]\displaystyle{ \lambda_f(x_0) \geq \bar\lambda }[/math] then we have to start by using a damped Newton method defined by

[math]\displaystyle{ x_{k+1} = x_k - \frac{1}{1+\lambda_f(x_k)}[f''(x_k)]^{-1}f'(x_k) }[/math]

For this it can be shown that [math]\displaystyle{ f(x_{k+1}) \leq f(x_k) - \omega(\lambda_f(x_k)) }[/math] with [math]\displaystyle{ \omega }[/math] as defined previously. Note that [math]\displaystyle{ \omega(t) }[/math] is an increasing function for [math]\displaystyle{ t \gt 0 }[/math] so that [math]\displaystyle{ \omega(t) \geq \omega(\bar\lambda) }[/math] for any [math]\displaystyle{ t \geq \bar\lambda }[/math], so the value of [math]\displaystyle{ f }[/math] is guaranteed to decrease by a certain amount in each iteration, which also proves that [math]\displaystyle{ x_{k+1} }[/math] is in the domain of [math]\displaystyle{ f }[/math].

Examples

Self-concordant functions

Linear and convex quadratic functions are self-concordant with [math]\displaystyle{ M = 0 }[/math] since their third derivative is zero.
Any function [math]\displaystyle{ f(x) = -\log(-g(x))-\log x }[/math] where [math]\displaystyle{ g(x) }[/math] is defined and convex for all [math]\displaystyle{ x \gt 0 }[/math] and verifies [math]\displaystyle{ | g'''(x) | \leq 3g''(x)/x }[/math], is self concordant on its domain which is [math]\displaystyle{ \{ x \mid x \gt 0, g(x) \lt 0 \} }[/math]. Some examples are
- [math]\displaystyle{ g(x) = -x^p }[/math] for [math]\displaystyle{ 0 \lt p \leq 1 }[/math]
- [math]\displaystyle{ g(x) = -\log x }[/math]
- [math]\displaystyle{ g(x) = x^p }[/math] for [math]\displaystyle{ -1 \leq p \leq 0 }[/math]
- [math]\displaystyle{ g(x) = (ax+b)^2 / x }[/math]
- for any function [math]\displaystyle{ g }[/math] satisfying the conditions, the function [math]\displaystyle{ g(x) + a x^2 + bx + c }[/math] with [math]\displaystyle{ a \geq 0 }[/math] also satisfies the conditions

Functions that are not self-concordant

[math]\displaystyle{ f(x) = e^x }[/math]
[math]\displaystyle{ f(x) = \frac{1}{x^p}, x \gt 0, p \gt 0 }[/math]
[math]\displaystyle{ f(x) = |x^p|, p \gt 2 }[/math]

Self-concordant barriers

positive half-line [math]\displaystyle{ \mathbb R_+ }[/math]: [math]\displaystyle{ f(x) = -\log x }[/math] with domain [math]\displaystyle{ x \gt 0 }[/math] is a self-concordant barrier with [math]\displaystyle{ M = 1 }[/math].
positive orthant [math]\displaystyle{ \mathbb R_+^n }[/math]: [math]\displaystyle{ f(x) = -\sum_{i=1}^n \log x_i }[/math] with [math]\displaystyle{ M = n }[/math]
the logarithmic barrier [math]\displaystyle{ f(x) = -\log \phi(x) }[/math] for the quadratic region defined by [math]\displaystyle{ \phi(x) \gt 0 }[/math] where [math]\displaystyle{ \phi(x) = \alpha +\langle a, x \rangle - \frac{1}{2} \langle Ax, x \rangle }[/math] where [math]\displaystyle{ A = A^T \geq 0 }[/math] is a positive semi-definite symmetric matrix self-concordant for [math]\displaystyle{ M = 2 }[/math]
second order cone [math]\displaystyle{ \{ (x,y) \in \mathbb R^{n-1} \times \mathbb R \mid \| x \| \leq y \} }[/math]: [math]\displaystyle{ f(x,y) = -\log(y^2 - x^T x) }[/math]
semi-definite cone [math]\displaystyle{ A = A^T \geq 0 }[/math]: [math]\displaystyle{ f(A) = - \log \det A }[/math]
exponential cone [math]\displaystyle{ \{ (x,y,z) \in \mathbb R^3 \mid ye^{x/y} \leq z, y \gt 0 \} }[/math]: [math]\displaystyle{ f(x,y,z) = -\log (y \log(z/y) - x) - \log z - \log y }[/math]
power cone [math]\displaystyle{ \{ (x_1,x_2,y) \in \mathbb R_+^2 \times \mathbb R \mid |y| \leq x_1^{\alpha} x_2^{1-\alpha} \} }[/math]: [math]\displaystyle{ f(x_1,x_2,y) = -\log(x_1^{2\alpha} x_2^{2(1-\alpha)} - y^2) - \log x_1 - \log x_2 }[/math]

References

↑ Boyd, Stephen P.; Vandenberghe, Lieven (2004) (pdf). Convex Optimization. Cambridge University Press. ISBN 978-0-521-83378-3. https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf. Retrieved October 15, 2011.
↑ Nesterov, Yurii; Nemirovskii, Arkadii (January 1994) (in en). Interior-Point Polynomial Algorithms in Convex Programming (Bibliography Comments). Society for Industrial and Applied Mathematics. doi:10.1137/1.9781611970791.bm. ISBN 978-0-89871-319-0. https://epubs.siam.org/doi/pdf/10.1137/1.9781611970791.bm.
↑ Nesterov, I︠U︡. E. (1994). Interior-point polynomial algorithms in convex programming. Nemirovskiĭ, Arkadiĭ Semenovich.. Philadelphia: Society for Industrial and Applied Mathematics. ISBN 0-89871-319-6. OCLC 29310677. https://www.worldcat.org/oclc/29310677.
↑ Yu. E. NESTEROV, Polynomial time methods in linear and quadratic programming, Izvestija AN SSSR, Tekhnitcheskaya kibernetika,No. 3, 1988, pp. 324-326. (In Russian.)
↑ Yu. E. NESTEROV, Polynomial time iterative methods in linear and quadratic programming, Voprosy kibernetiki, Moscow,1988, pp. 102-125. (In Russian.)
↑ Y.E. Nesterov and A.S. Nemirovski, Self–concordant functions and polynomial–time methods in convex programming, Technical report, Central Economic and Mathematical Institute, USSR Academy of Science, Moscow, USSR, 1989.
↑ Nesterov, I︠U︡. E.. Introductory lectures on convex optimization : a basic course. Boston. ISBN 978-1-4419-8853-9. OCLC 883391994. https://www.worldcat.org/oclc/883391994.
↑ Nesterov, Yurii; Nemirovskii, Arkadii (1994). "nterior-Point Polynomial Algorithms in Convex Programming". Studies in Applied and Numerical Mathematics. doi:10.1137/1.9781611970791. https://doi.org/10.1137/1.9781611970791.
↑ Sun, Tianxiao; Tran-Dinh, Quoc (2018). "Generalized Self-Concordant Functions: A Recipe for Newton-Type Methods". Mathematical Programming: Proposition 6. https://arxiv.org/abs/1703.04599.

[1] Boyd, Stephen P.; Vandenberghe, Lieven (2004) (pdf). Convex Optimization. Cambridge University Press. ISBN 978-0-521-83378-3. https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf. Retrieved October 15, 2011.

[2] Nesterov, Yurii; Nemirovskii, Arkadii (January 1994) (in en). Interior-Point Polynomial Algorithms in Convex Programming (Bibliography Comments). Society for Industrial and Applied Mathematics. doi:10.1137/1.9781611970791.bm. ISBN 978-0-89871-319-0. https://epubs.siam.org/doi/pdf/10.1137/1.9781611970791.bm.

[3] Nesterov, I︠U︡. E. (1994). Interior-point polynomial algorithms in convex programming. Nemirovskiĭ, Arkadiĭ Semenovich.. Philadelphia: Society for Industrial and Applied Mathematics. ISBN 0-89871-319-6. OCLC 29310677. https://www.worldcat.org/oclc/29310677.

[4] Yu. E. NESTEROV, Polynomial time methods in linear and quadratic programming, Izvestija AN SSSR, Tekhnitcheskaya kibernetika,No. 3, 1988, pp. 324-326. (In Russian.)

[5] Yu. E. NESTEROV, Polynomial time iterative methods in linear and quadratic programming, Voprosy kibernetiki, Moscow,1988, pp. 102-125. (In Russian.)

[6] Y.E. Nesterov and A.S. Nemirovski, Self–concordant functions and polynomial–time methods in convex programming, Technical report, Central Economic and Mathematical Institute, USSR Academy of Science, Moscow, USSR, 1989.

[7] Nesterov, I︠U︡. E.. Introductory lectures on convex optimization : a basic course. Boston. ISBN 978-1-4419-8853-9. OCLC 883391994. https://www.worldcat.org/oclc/883391994.

[8] Nesterov, Yurii; Nemirovskii, Arkadii (1994). "nterior-Point Polynomial Algorithms in Convex Programming". Studies in Applied and Numerical Mathematics. doi:10.1137/1.9781611970791. https://doi.org/10.1137/1.9781611970791.

[9] Sun, Tianxiao; Tran-Dinh, Quoc (2018). "Generalized Self-Concordant Functions: A Recipe for Newton-Type Methods". Mathematical Programming: Proposition 6. https://arxiv.org/abs/1703.04599.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]