Nothing Special   »   [go: up one dir, main page]

Chapter 2, Lecture 3: Building Convex Functions

Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

Math 484: Nonlinear Programming1 Mikhail Lavrov

Chapter 2, Lecture 3: Building convex functions


February 8, 2019 University of Illinois at Urbana-Champaign

1 The plan
Today’s goal is to figure out how we can tell if a function is convex.
So far, we know three ways, each with their own drawbacks:
1. The definition: a function f : C → R is convex if and only if, for all x, y ∈ C and 0 ≤ t ≤ 1,

f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y).

This is a useful property, and often useful in proofs, but for many actual functions, it’s not
clear how to prove this.
2. The tangent line test: for differential functions, f is convex if and only if, for all x, y ∈ C,

f (y) ≥ f (x) + ∇f (x) · (y − x).

This is more useful as a consequence of convexity, rather than as a way to prove that a
function is convex.
3. The second-derivative test: assuming that Hf (x) exists for all x, f is convex if, for all x ∈ C,
Hf (x)  0.
This is often the way to go, but if f is a complicated function, lots of computation is involved.
Today, we take a different approach. The definition and the second-derivative test are easy to use
for functions with simple definitions. To deal with more complicated functions, we look at how
they are built out of simpler “building blocks”. We will prove several results about how we can
manipulate convex functions to get more complicated convex functions.

1.1 Strictly convex functions


But first, an aside for another definition.
Given a set C ⊆ Rn (convex, as always), a function f : C → R is called strictly convex when, for
all x, y ∈ C with x 6= y and 0 < t < 1,

f (tx + (1 − t)y) < tf (x) + (1 − t)f (y).

These have a slightly sharper version of most properties of convex functions. For example, if a
function is strictly convex, then any local minimizer (and any critical point) is not just a global
minimizer, but a strict global minimizer. So having strict convexity is often nice.
1
This document comes from the Math 484 course webpage: https://faculty.math.illinois.edu/~mlavrov/
courses/484-spring-2019.html

1
We’ll need to watch out for it today so that we know what operations preserve strict convexity, not
just convexity.
Warning: the second-derivative tests can show that a function is strictly convex. If Hf (x)  0
for all x ∈ C, then f is strictly convex. But the implication doesn’t go both ways. For example,
f (x) = x4 has f 00 (0) = 0, but is still strictly convex.

2 Building convex functions


2.1 The less-scary ways to combine functions
The simplest and most important operations that preserve convexity are addition and multiplication
by a positive scalar.
Theorem 2.1 (Theorem 2.3.10 in the textbook). Suppose that f1 , f2 , . . . , fk are convex functions
C → R and α1 , α2 , . . . , αk are positive2 scalars. Then
k
X
f (x) = αi fi (x) = α1 f1 (x) + α2 f2 (x) + · · · + αk fk (x)
i=1

is convex. Moreover, if at least one fi is strictly convex, then f is strictly convex.

Proof. It’s enough to prove two simple cases of this theorem rather than deal with the arbitrary
sum.
First, if f is (strictly) convex and α > 0, then αf is (strictly) convex. This holds because we can
just multiply both sides of the definition by α:

f (tx + (1 − t)y) ≥ tf (x) + (1 − t)f (y) ⇐⇒ αf (tx + (1 − t)y) ≥ tαf (x) + (1 − t)αf (y).

Second, if f and g are convex, then their sum h defined by h(x) = f (x) + g(x) is convex. This is
also just a matter of adding together the two inequalities:

h(tx + (1 − t)y) = f (tx + (1 − t)y) + g(tx + (1 − t)y) (definition of h)


≤ tf (x) + (1 − t)f (y) + g(tx + (1 − t)y) (f is convex)
≤ tf (x) + (1 − t)f (y) + tg(x) + (1 − t)g(y) (g is convex)
= th(x) + (1 − t)h(y). (definition of h)

There are two inequalities here. If either f or g is strictly convex, then one inequality is strict; so
the whole inequality becomes strict, and the sum h = f + g is strictly convex.
To get the full theorem, we just build up the combination of k functions by induction.

Here is another relatively simple result. It rarely comes up, but when it does, it’s often the only
tool we have.
2
The textbook says “nonnegative”, but if αi = 0 it’s as though we didn’t include fi at all.

2
Theorem 2.2. Suppose that f1 , f2 , . . . , fk are (strictly) convex functions C → R. Then

f (x) = max{f1 (x), f2 (x), . . . , fk (x)}

is (strictly) convex.

Proof. As usual, take x, y ∈ C and t ∈ [0, 1]. Then f (tx+(1−t)y) must be equal to fi (tx+(1−t)y)
for some i, and we have

f (tx + (1 − t)y) = fi (tx + (1 − t)y) (we are at a point where f = fi )


≤ tfi (x) + (1 − t)fi (y) (fi is convex)
≤ tf (x) + (1 − t)f (y)

where the last inequality holds because for any i = 1, 2, . . . , k and any point x ∈ C, f (x) ≥ fi (x)
because f (x) is a maximum of several values including fi (x).
If all the fi are strictly convex and 0 < t < 1, we get to use a strict inequality in this proof and so
f is strictly convex.

It’s also important to mention that multiplying two convex functions does not guarantee convexity:
for example, f (x) = x2 − 1 is convex, but f (x) · f (x) = (x2 − 1)2 is not. Also, the minimum of two
convex functions isn’t convex, even though min looks a lot like max.

2.2 Compositions of functions


The final way of combining functions we’ll cover is composition. We ask: when is it true (it’s
certainly not always true) that the composition g(f (x)) of two convex functions f and g is con-
vex?
Theorem 2.3 (Also Theorem 2.3.10 in the textbook). Suppose f : C → R is convex and g : R → R
is not only convex but increasing: when x1 ≤ x2 , g(x1 ) ≤ g(x2 ). Then h(x) = g(f (x)) is convex.
(If f is strictly convex and g is strictly increasing—when x1 < x2 , g(x1 ) < g(x2 )—then h is strictly
convex as well.)

Proof. The proof is short; the hard part is watching out for this rule in examples. We have (in the
usual setup for a convexity proof):

f (tx) + (1 − t)y) ≤ tf (x) + (1 − t)f (y), (f is convex)


g(f (tx) + (1 − t)y)) ≤ g(tf (x) + (1 − t)f (y)) (g is increasing)
≤ tg(f (x)) + (1 − t)g(f (y)). (g is convex)

If f is strictly convex, then the first inequality is strict (it’s <). If g is strictly increasing, then that
strict inequality is preserved, so h is strictly convex as well.

Another useful case, which is in the textbook as a post-chapter exercise:


Theorem 2.4. If f : Rm → Rn has the form f (x) = Ax + b for a matrix A and a vector b, and
g : C → R is convex, so is h(x) = g(f (x)) as a function f −1 (C) → R.

3
Proof. Such a function f has the useful property that it’s convex, and the definition of convex is
always an equality: for all x, y ∈ Rm and t ∈ [0, 1] (actually, any t), we have

f (tx + (1 − t)y) = A(tx + (1 − t)y) + b


= t(Ax + b) + (1 − t)(Ay + b)
= tf (x) + (1 − t)f (y).

So we have

h(tx + (1 − t)y) = g(f (tx + (1 − t)y)) (definition of h)


= g(tf (x) + (1 − t)f (y)) (what we proved above)
≤ tg(f (x)) + (1 − t)g(f (y)) (g is convex)
≤ th(x) + (1 − t)h(y). (definition of h)

Noteworthy special case: if g : R → R is convex, so is h(x) = g(ax + b). Also, by choosing the
matrix A appropriately, we know that h(x) = g(xi ) is convex as a function Rn → R.

3 Examples
3.1 Example 2.3.11.c in the textbook
To check that f (x1 , x2 ) = x21 − 4x1 x2 + 5x22 − ln x1 x2 is convex for x1 , x2 > 0, write f as a sum
  
  1 −2 x1
f (x1 , x2 ) = x1 x2 + (− ln x1 ) + (− ln x2 ).
−2 5 x2
 
1 −2
The first term is convex because  0.
−2 5
The second term and theird term are convex because g(x) = − ln x is convex on (0, ∞) (and by our
last result, plugging in just x1 or just x2 doesn’t change this).
Finally, the sum of three convex functions is convex.

3.2 A function that will be useful for us soon


x x y y z z
 
The function f (x, y, z) = 2 3 4 is convex on {(x, y, z) : x, y, z > 0}.
x y z
To show this, write it as ex ln 2 +y ln 3 +zln 4 . If ewe can show that the inside function is convex, we
are done, because et is convex and increasing.
For any constant C, g(x) = x ln Cx has first derivative ln Cx + x · 1
x/C = ln Cx + C, and second
derivative Cx . When x > 0, this is guaranteed to be positive, so g is convex. Applying this to the
three parts of the exponent with C = 2, C = 3, and C = 4 concludes the example.

You might also like