Directional Derivatives and the Gradient

Section 11.7 Directional Derivatives and the Gradient

Motivating Questions

The partial derivatives of a function \(f\) tell us the rate of change of \(f\) in the direction of the coordinate axes. How can we measure the rate of change of \(f\) in other directions?

What is the gradient of a multivariable function and what does it tell us about the rate of change in different directions?

Subsection 11.7.1 Introduction

The partial derivatives of a multivariable function tell us the instantaneous rate at which the function’s output changes as we hold all but one independent variable constant (allowing the remaining independent variable to change). It is natural to wonder how we can measure the rate at which a function changes in a direction that is not parallel to a coordinate axis. In this section, we investigate this question and connect the rates of change in other directions to the rates of change given by the partial derivatives. In our preview activity, we will investigate these concepts using a contour map that represents elevation in terms of location.

The preview activity uses cardinal directions to specify the direction of displacement vectors. These directions can be described by a compass rose. The compass rose given in Figure 11.7.1 is an example of a sixteen-point rose. Note that a direction like ESE is read as “east-southeast” and lies halfway between east and southeast.

Figure 11.7.1. An example of a sixteen-point compass rose (by Brosen~commonswiki, own work, CC BY 2.5)

Preview Activity 11.7.1.

Figure 11.7.2 is a contour plot showing the elevations for a region of a nearby park. We will use \(h(x,y)\) to denote the function of two variables that gives the elevation in terms of the \(x\)-coordinate (east-west location, horizontal direction) and the \(y\)-coordinate (north-south location, vertical direction).

Figure 11.7.2. A contour plot representing the elevation of locations in a nearby park with three locations labeled

(a)

Using Figure 11.7.2 and treating the elevation as a multivariable function \(h(x,y)\text{,}\) state whether each of the following is positive, negative, or zero. Write a sentence to justify your reasoning.

\(\displaystyle h_x(A)\)

\(\displaystyle h_y(A)\)

\(\displaystyle h_x(B)\)

\(\displaystyle h_y(B)\)

\(\displaystyle h_x(C)\)

\(\displaystyle h_y(C)\)

(b)

Suppose we are at point B. Would you expect the elevation to be increasing, decreasing, or constant if you took a step in the Northeast direction?

(c)

Suppose we are at point B. Would you expect the elevation to be increasing, decreasing, or constant if you took a step in the Southeast direction?

(d)

Suppose we are at point A. Would you expect the elevation to be increasing, decreasing, or constant if you took a step in the Southwest direction?

(e)

Suppose we are at point C. In what direction would you take a step to move in the steepest downhill direction?

(f)

Suppose we are at point A. In what direction would you take a step to move in the steepest uphill direction?

(g)

Suppose we are at point B. Rank the following directions in order of steepness (from steepest uphill, through level, to steepest downhill): Northeast, North, East, West, South

As you saw in the preview activity, visual information like a contour plot or a surface plot will be helpful in determining information about how quickly the output of a function is changing as your inputs change in different directions. In the next subsections, we will focus on how to calculate this kind of directional derivative using either algebraic or numerical representations of our function, then we will relate this calculation to other ways we have described change for a function.

Subsection 11.7.2 Directional Derivatives

The first goal of this section is to measure the rate of change for a function of two variables in a direction that is not parallel to one of the coordinate directions. We will use the classic calculus approach to set up this measurement: approximate our measurement, quantify how the approximation changes on a smaller scale, then use a limit to find the actual value. In order to measure the rate of change in the output of our function, we will use a typical difference quotient

\begin{equation*} \text{rate of change} = \frac{\text{change in output}}{\text{change in input}} \end{equation*}

Let \(f\) be a function of two variables, \(x\) and \(y\text{,}\) and suppose we wish to measure the rate of change of \(f\) in the \(\langle a,b\rangle\) direction at the base point \((x_0,y_0)\text{.}\) We can set up the difference quotient using the input points \((x_0+a,y_0+b)\) and \((x_0,y_0)\text{:}\)

\begin{equation*} \frac{f(x_0+a,y_0+b)-f(x_0,y_0)}{\sqrt{a^2+b^2}} \end{equation*}

Note here that we use the distance formula to measure the change in the input as distance on the \(xy\)-plane. Figure 11.7.3 shows the plot of a surface given by \(z=f(x,y)\) with the points used in the difference quotient labeled.

Figure 11.7.3. A plot of \(f(x,y)\) with the measurements of change in input and output labeled for a change of inputs given by \(\langle a,b\rangle\)

The difference quotient given above is a good start to approximating the rate of change for the output of \(f\) in the direction \(\langle a,b\rangle\) (the first step in our classic calculus approach), but it is not clear how this approximation will change when we look at smaller scales. Specifically, we will need to look at what happens to this approximation as we shrink the step size, \(\sqrt{a^2+b^2}\to 0\) while maintaining the same direction of change. In order to maintain our direction and look at smaller step sizes, we will need to separate the length of the vector \(\langle a,b \rangle \) from its direction using the unit vector in the direction of \(\langle a,b \rangle\text{.}\)

Let \(\vu=\langle u_1,u_2\rangle \) be the unit vector in the same direction as \(\langle a,b \rangle\text{.}\) We can express the vectors in the same direction as \(\langle a,b \rangle\) using the form \(t \langle u_1,u_2\rangle\text{,}\) where \(t\) is the step size and \(\vu=\langle u_1,u_2\rangle \) is our unit vector. Taking a step of length \(t\) in the direction \(\langle u_1,u_2 \rangle\text{,}\) our difference quotient becomes

\begin{equation*} \frac{f(x_0+t (u_1),y_0+t(u_2))-f(x_0,y_0)}{t} \end{equation*}

We will need to look at how this difference quotient changes at smaller scales (as our step size \(t\) gets smaller). This accomplishes the second step of our classic calculus approach: quantifying how our approximation behaves on smaller scales. We can now take the limit as \(t \to 0\) to get our definition for the directional derivative.

Definition 11.7.4.

Let \(f(x,y)\) be a function of two variables. The derivative of \(f\) at the point \((x,y)\) in the direction of the unit vector \(\vu = \langle u_1, u_2 \rangle\) is denoted \(D_{\vu}f(x,y)\) and is given by

\begin{equation} D_{\vu}f(x,y) = \lim_{t \to 0} \frac{f(x+u_1 t, y+u_2 t) - f(x,y)}{t}\tag{11.7.1} \end{equation}

for those values of \(x\) and \(y\) for which the limit exists.

The notation \(D_{\vu}f(x,y)\) can also be read as “the directional derivative of \(f\) in the direction \(\vu\) at the point \((x,y)\text{.}\)” While this may seem like a handful, each piece of this statement is necessary for a directional derivative: we need to specify what function we are looking at, the location at which we are looking for the rate of change, and the direction in which we will change the inputs. When we evaluate the directional derivative \(D_{\vu} f(x,y)\) at a point \((x_0, y_0)\text{,}\) the result \(D_{\vu} f(x_0,y_0)\) tells us the instantaneous rate at which \(f\) changes at \((x_0, y_0)\) per unit of distance moved in the direction of the vector \(\vu\text{.}\)
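
To see the limit in Equation (11.7.1) numerically, the following minimal Python sketch evaluates the difference quotient for shrinking step sizes. The sample function \(f(x,y)=x^2y\text{,}\) the base point \((1,2)\text{,}\) the unit vector \(\langle 0.6,0.8\rangle\text{,}\) and the helper name `difference_quotient` are all assumptions chosen for illustration; none of them come from this section.

```python
def f(x, y):
    # Hypothetical sample function, chosen only for illustration.
    return x**2 * y


def difference_quotient(f, point, u, t):
    """Average rate of change of f over a step of length t in the unit direction u."""
    x0, y0 = point
    u1, u2 = u
    return (f(x0 + u1 * t, y0 + u2 * t) - f(x0, y0)) / t


point = (1.0, 2.0)
u = (0.6, 0.8)  # a unit vector, since 0.6**2 + 0.8**2 == 1

# Shrink the step size while keeping the direction fixed.
for t in (1.0, 0.1, 0.01, 0.001):
    print(t, difference_quotient(f, point, u, t))
```

For this sample function the quotient works out algebraically to \(3.2+1.68t+0.288t^2\text{,}\) so the printed values settle toward 3.2 as \(t\) shrinks.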

We will take a moment to connect Equation (11.7.1) to our work on traces earlier in this chapter. When initially exploring surface plots, we simplified our approach by looking at slices that were parallel to one of the coordinate axes (by holding the other input variable constant). This allowed us to use our single variable calculus knowledge to understand a bit more about the surface as well as motivating the definitions for partial derivatives. Similarly, if we cut our surface in the \(\langle u_1,u_2\rangle \) direction through our base point, then we can think about the directional derivative as a single variable calculus problem.

Figure 11.7.5. A plot of \(f(x,y)\) with a trace shown from the point \((x_0,y_0)\) in the direction \(\langle u_1,u_2 \rangle\)

In particular, we can parameterize the trace along the surface by noticing our inputs correspond to the line in the xy-plane given by \(\langle x_0+ u_1 t, y_0 + u_2 t\rangle\) (shown in red on Figure 11.7.5). Note that when \(t=0\) the location will be \((x_0,y_0)\text{.}\) Our trace along the surface will be given by \(\vr(t)=\langle x_0+ u_1 t, y_0 + u_2 t , f(x_0+ u_1 t, y_0 + u_2 t) \rangle\) (shown in blue on Figure 11.7.5). We can plot the trace of our surface in the direction of \(\langle u_1,u_2 \rangle\) (as a function of \(t\)) as a 2D plot like Figure 11.7.6. We can visualize the directional derivative as the slope of a tangent line at our base point as drawn in black on Figure 11.7.6. The scalar \(D_{\vu} f(x_0,y_0)\) tells us the slope of the line tangent to the surface in the direction of \(\vu\) at the point \((x_0,y_0,f(x_0,y_0))\text{.}\)

Figure 11.7.6. A plot of the trace of \(f\) along the direction \(\langle u_1,u_2\rangle\) with tangent line drawn at point corresponding to \((x_0,y_0)\)

Subsection 11.7.3 Efficiently Computing the Directional Derivative

In a similar way to how we developed algebraic rules for calculating standard derivatives in single variable calculus and for partial derivatives in multivariable calculus, we want to find a way to evaluate directional derivatives without resorting to the limit definition (Equation (11.7.1)). We will look at these algebraic shortcuts from two perspectives. First, we will look at Equation (11.7.1) as a composition of functions and use the chain rule from Section 11.6 to define a simple algebraic rule which will allow us to efficiently calculate directional derivatives. Second, we will look at how directional derivatives can be computed using the linearization (when we have a locally linear/differentiable function).

We are interested in the instantaneous rate of change of \(f\) at a point \((x_0,y_0)\) in the direction \(\vu = \langle u_1, u_2 \rangle\text{,}\) where \(\vu\) is a unit vector. As we move away from \((x_0,y_0)\) in this direction, the input variables \(x\) and \(y\) change according to

\begin{equation*} x = x_0 + u_1t \quad \text{ and } \quad y = y_0 + u_2t \end{equation*}

Therefore, our trace along the surface in the direction \(\langle u_1,u_2\rangle\) will be given by \(\vr(t)=\langle x_0+ u_1 t, y_0 + u_2 t , f(x_0+ u_1 t, y_0 + u_2 t) \rangle\text{.}\)

Observe that \(\frac{dx}{dt} = u_1\) and \(\frac{dy}{dt} = u_2\) for all values of \(t\text{.}\) Since \(\vu\) is a unit vector in the xy-plane, a single unit change in the parameter \(t\) corresponds to moving a single unit of distance in the \(\vu\) direction. This observation allows us to use the multivariable chain rule to calculate the directional derivative as a measure of the instantaneous rate of change of \(f\) with respect to change in the direction \(\vu\text{.}\)

We can express the output of our function along the given direction as a composition of functions, \(g(t)=f(x(t), y(t))\text{,}\) which means we can apply the chain rule at \(t=0\) (where \((x(0),y(0))=(x_0,y_0)\)) to get

\begin{align*} D_{\vu}f(x_0,y_0) \amp = \frac{d}{dt}\left[ f(x(t),y(t)) \right]_{t=0} \\ \amp = f_x(x_0,y_0)\frac{dx}{dt} + f_y(x_0,y_0)\frac{dy}{dt} \\ \amp = f_x(x_0,y_0) u_1 + f_y(x_0,y_0) u_2 \end{align*}

This allows us to compute the directional derivative at an arbitrary point according to the following formula.

Theorem 11.7.7. Calculating Directional Derivatives.

Given a differentiable function \(f = f(x,y)\) and a unit vector \(\vu = \langle u_1, u_2 \rangle\text{,}\) \(D_{\vu}f(x,y)\) can be computed by

\begin{equation} D_{\vu}f(x,y) = f_x(x,y) u_1 + f_y(x,y) u_2\tag{11.7.2} \end{equation}

To use Equation (11.7.2), we must have a unit vector \(\vu = \langle u_1, u_2 \rangle\) in the direction of motion. In the event that we have a direction prescribed by a non-unit vector, we must first scale the vector to have length 1.
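
As a rough illustration of Equation (11.7.2) together with the scaling step, the Python sketch below normalizes a direction vector before forming the weighted sum of partial derivatives. The sample function \(f(x,y)=x^2y\) (with its partial derivatives entered by hand) and the helper names `unit_vector` and `directional_derivative` are assumptions made only for this example.

```python
import math


def unit_vector(v):
    """Scale a nonzero 2D vector to have length 1."""
    length = math.hypot(v[0], v[1])
    if length == 0:
        raise ValueError("the zero vector does not define a direction")
    return (v[0] / length, v[1] / length)


def directional_derivative(f_x, f_y, point, v):
    """Apply f_x * u1 + f_y * u2, where u is the unit vector in the direction of v."""
    u1, u2 = unit_vector(v)
    x, y = point
    return f_x(x, y) * u1 + f_y(x, y) * u2


# Partial derivatives of the hypothetical sample f(x, y) = x**2 * y.
def f_x(x, y):
    return 2 * x * y


def f_y(x, y):
    return x**2


# The direction <3, 4> has length 5, so it is rescaled to <0.6, 0.8> first.
print(directional_derivative(f_x, f_y, (1.0, 2.0), (3.0, 4.0)))
```

Here the result is approximately 3.2, since \(f_x(1,2)=4\text{,}\) \(f_y(1,2)=1\text{,}\) and \(4(0.6)+1(0.8)=3.2\text{.}\)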

Example 11.7.8.

We will look at an example that is algebraically and geometrically simple but allows us to use Theorem 11.7.7 and the concepts stated in this section up to this point. We will look at the rate of change for the function \(f(x,y)=2.5-\frac{(x-1)^2}{2}-\frac{(y+1)^2}{9}\) in several directions around the input point \((2,2)\text{.}\) Note that \(f(2,2)=2.5-\frac{(2-1)^2}{2}-\frac{(2+1)^2}{9}=2.5-0.5-1=1\text{,}\) so our point will lie on the level curve with output 1 as shown in Figure 11.7.9.

Figure 11.7.9. A contour plot of \(f(x,y)=2.5-\frac{(x-1)^2}{2}-\frac{(y+1)^2}{9}\) with the point \((2,2)\) highlighted

We would like to find the directional derivative of \(f\) at the point \((2,2)\) in the direction of \(\langle-2,1\rangle\text{.}\) Since \(\langle-2,1\rangle\) is not a unit vector, we will use \(\vu=\frac{1}{\sqrt{5}}\langle -2,1\rangle\) as the unit vector in the direction of \(\langle-2,1\rangle\text{.}\) We would like to use (11.7.2), so we first calculate the partial derivatives of \(f\text{.}\)

\begin{equation*} f_x= -\frac{2(x-1)}{2} = -(x-1) \quad \text{ and } \quad f_y=-\frac{2(y+1)}{9} \end{equation*}

At our base point, we have \(f_x(2,2)=-1\) and \(f_y(2,2)=-\frac{2}{3}\text{.}\) By Equation (11.7.2), our directional derivative of \(f\) at \((2,2)\) in the direction \(\vu\) will be

\begin{equation*} D_{\vu}f(2,2)=f_x(2,2) u_1 + f_y(2,2) u_2 = (-1)\left(-\frac{2}{\sqrt{5}}\right)+\left(-\frac{2}{3}\right)\left(\frac{1}{\sqrt{5}}\right)=\frac{4}{3\sqrt{5}}\approx 0.596 \end{equation*}

So for a small step in the direction of \(\vu\) at the input \((2,2)\) we would expect the output of \(f\) to increase by approximately \(\frac{4}{3\sqrt{5}}\approx 0.596\) times the step size.

This may be a bit difficult to see based on Figure 11.7.9, but if we look at a surface plot and parameterize the path of inputs along the direction \(\vu\text{,}\) we can give a better visualization. In Figure 11.7.10, we see a surface plot of \(z=f(x,y)\) with the trace in the \(\vu\)-direction shown in blue and the direction vector \(\vu\) shown in red (on the \(xy\)-plane). The black line segment shows how \(D_{\vu}f(2,2)\) is positive because the \(z\)-coordinate of this tangent line will increase (as we step away from \((2,2)\) in the \(\vu\) direction).

Figure 11.7.10. A plot of \(z=f(x,y)\) with trace in the direction \(\vu\) through (2,2,1) shown including the tangent line

Remember that the directional derivative is a local measurement and only applies in a small neighborhood of our point and only in the direction \(\vu\text{.}\) In other words, \(D_{\vu}f(2,2)\) does not describe the rate of change at other points along the blue curve, only the rate of change along the blue curve for a small step away from our base point in the red direction. With plots like Figure 11.7.9, you will be tempted to use information far away from the point of interest to try to figure out the rate of change, but as with most calculus measurements, we should take from a plot only the information about what is happening right around our point of interest.

If we look at \(D_{\vj}f(2,2)\text{,}\) the rate of change for the output of \(f\) in the direction \(\vj=\langle 0,1\rangle\) at the input \((2,2)\text{,}\) Equation (11.7.2) with \(u_1=0\) and \(u_2=1\) gives the following result

\begin{equation*} D_{\vj}f(2,2)=f_x(2,2) u_1+f_y(2,2) u_2=(-1)(0)+\left(-\frac{2}{3}\right)(1)=-\frac{2}{3} \end{equation*}

which is exactly the partial derivative of \(f\) with respect to \(y\) at \((2,2)\text{.}\) The default direction displayed in Figure 11.7.11 shows the direction vector \(\vj\text{,}\) the corresponding trace, and the tangent line to the surface in the \(\vj\) direction.

Figure 11.7.11. An interactive plot of \(z=f(x,y)\) with trace in the direction \(\vu\) through (2,2,1) shown including the tangent line

You can use the slider at the top of Figure 11.7.11 to change the direction in which you want to evaluate the directional derivative. You should look to see how the steepness of the black segment will change in different directions.
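
The same exploration can be approximated without the interactive figure. The Python sketch below evaluates Equation (11.7.2) for the function of Example 11.7.8 at \((2,2)\) over a range of directions. It assumes each direction is described by an angle \(\theta\) measured counterclockwise from the positive \(x\)-axis, so that \(\vu=\langle\cos(\theta),\sin(\theta)\rangle\text{;}\) the figure's slider may be parameterized differently, and the function name `directional_derivative_at_angle` is hypothetical.

```python
import math


def f_x(x, y):
    # Partial derivative in x of f(x, y) = 2.5 - (x - 1)**2 / 2 - (y + 1)**2 / 9.
    return -(x - 1)


def f_y(x, y):
    # Partial derivative in y of the same function.
    return -2 * (y + 1) / 9


def directional_derivative_at_angle(theta, point=(2.0, 2.0)):
    """Directional derivative in the unit direction <cos(theta), sin(theta)>."""
    u1, u2 = math.cos(theta), math.sin(theta)
    x, y = point
    return f_x(x, y) * u1 + f_y(x, y) * u2


# Tabulate the rate of change every 45 degrees around the base point.
for degrees in range(0, 360, 45):
    theta = math.radians(degrees)
    print(degrees, round(directional_derivative_at_angle(theta), 4))
```

The row for 90 degrees corresponds to the direction \(\vj\) and reproduces the value \(-\frac{2}{3}\) found above, while the row for 0 degrees gives \(f_x(2,2)=-1\text{.}\)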

In the following activity, we will explore how to calculate and make sense of directional derivatives for a new function without using surface graphs.

Activity 11.7.2.

In this activity, we will use Equation (11.7.2) to calculate some directional derivatives and make sense of these results for a few cases. For all parts of this activity, let \(f(x,y) = 3xy-x^2y^3\text{.}\)

(a)

Calculate \(f_x(x,y)\) and \(f_y(x,y)\text{.}\)

(b)

Use Equation (11.7.2) to determine \(D_{\vi} f(x,y)\) and \(D_{\vj} f(x,y)\text{.}\) Write a couple of sentences to describe what familiar functions \(D_{\vi} f\) and \(D_{\vj} f\) are. Remember that \(\vi\) is the unit vector in the positive \(x\)-direction and \(\vj\) is the unit vector in the positive \(y\)-direction.

(c)

Use Equation (11.7.2) to find the derivative of \(f\) in the direction of the vector \(\vv = \langle 2, 3 \rangle\) at the point \((1,-1)\text{.}\) Remember that a unit direction vector is needed.

(d)

Use Equation (11.7.2) to find the derivative of \(f\) in the direction of the vector \(\vv = \langle 4, 6 \rangle\) at the point \((1,-1)\text{.}\)

(e)

Use Equation (11.7.2) to find the derivative of \(f\) in the direction of the vector \(\vv = \langle -2, -3 \rangle\) at the point \((1,-1)\text{.}\) Write a couple of sentences to explain why this result is different from your answers to the previous two tasks, even though the direction vectors are parallel.

We found Equation (11.7.2) by viewing the directional derivative in Definition 11.7.4 as a composition of functions and applying the chain rule. A very useful idea when we have multivariable functions that are locally linear is that evaluating a limit for how the function is changing near a point will give the same result as asking what the corresponding change would be on the linearization or tangent plane. Remember that on very small scales there is virtually no difference between the (locally linear) function and the linearization as demonstrated by Figure 11.5.2.

The equation of the plane tangent to the graph of \(f\) at the point \((x_0,y_0,f(x_0,y_0))\) is

\begin{equation} z = f(x_0,y_0) + f_x(x_0,y_0)(x-x_0) + f_y(x_0,y_0)(y-y_0)\tag{11.7.3} \end{equation}

If \(f\) is a locally linear function, then the directional derivative \(D_{\vu}f(x_0,y_0)\) will be the same as the rate of change along the tangent plane in the direction of \(\vu\text{.}\) Remember that the rate of change will be the same regardless of step size on the tangent plane (this is NOT true on the surface \(z=f(x,y)\)). So we will look at the quotient

\begin{gather*} \frac{z(x_0+u_1,y_0+u_2)-z(x_0,y_0)}{\sqrt{u_1^2+u_2^2}}\\ \quad \quad =\frac{\left(f(x_0,y_0) + f_x(x_0,y_0)(x_0+u_1-x_0) + f_y(x_0,y_0)(y_0+u_2-y_0)\right)-f(x_0,y_0)}{1}\\ \quad \quad = f_x(x_0,y_0) (u_1) + f_y(x_0,y_0)(u_2) \end{gather*}

which is exactly the result (11.7.2).
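
The claim that the rate of change on the tangent plane does not depend on the step size, while the rate of change on the surface does, can be checked numerically. The Python sketch below is a minimal illustration under assumed inputs: the sample function \(f(x,y)=x^2y\text{,}\) the base point \((1,2)\text{,}\) the unit vector \(\langle 0.6,0.8\rangle\text{,}\) and the helper names `tangent_plane` and `quotient` are all hypothetical choices, not taken from this section.

```python
def f(x, y):
    # Hypothetical sample surface z = f(x, y).
    return x**2 * y


def tangent_plane(x, y):
    # Equation (11.7.3) at (x0, y0) = (1, 2), using f_x(1, 2) = 4 and f_y(1, 2) = 1.
    x0, y0 = 1.0, 2.0
    return f(x0, y0) + 4.0 * (x - x0) + 1.0 * (y - y0)


def quotient(g, point, u, t):
    """Change in g over a step of length t in the unit direction u, divided by t."""
    x0, y0 = point
    u1, u2 = u
    return (g(x0 + u1 * t, y0 + u2 * t) - g(x0, y0)) / t


point = (1.0, 2.0)
u = (0.6, 0.8)

# Columns: step size, quotient on the surface, quotient on the tangent plane.
for t in (1.0, 0.5, 0.01):
    print(t, quotient(f, point, u, t), quotient(tangent_plane, point, u, t))
```

Up to floating-point rounding, the tangent plane column stays at \(4(0.6)+1(0.8)=3.2\) for every step size, while the surface column only approaches that value as \(t\) shrinks.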

Subsection 11.7.4 The Gradient

Via the chain rule and linearization, we have seen that for a given function, \(f = f(x,y)\text{,}\) its instantaneous rate of change in the direction of a unit vector \(\vu = \langle u_1, u_2 \rangle\) is given by

\begin{equation} D_{\vu}f(x_0,y_0) = f_x(x_0,y_0)u_1 + f_y(x_0,y_0)u_2\tag{11.7.4} \end{equation}

In particular, we saw how the rate of change along the linearization in a given direction is given by a linear combination of the rates of change in the coordinate directions (partial derivatives) where the weight of each partial derivative comes from the components of our unit vector. You may recognize the form of (11.7.4) as being similar to a dot product.

Recall the dot product of two vectors \(\vv = \langle v_1, v_2 \rangle\) and \(\vu = \langle u_1, u_2 \rangle\) is computed by

\begin{equation*} \vv \cdot \vu = v_1 u_1 + v_2 u_2 \end{equation*}

Thus we can think about Equation (11.7.4) in a way that will have geometric meaning related to the dot product. In particular, we see that \(D_{\vu}f(x_0,y_0)\) is the dot product of the vector \(\left\langle f_x(x_0,y_0), f_y(x_0,y_0) \right\rangle\) and the vector \(\vu=\langle u_1,u_2\rangle\text{.}\) This is particularly useful because the vector \(\left\langle f_x(x_0,y_0), f_y(x_0,y_0) \right\rangle\) comes from simple calculations that tell us how the function \(f\) is changing near the input \((x_0,y_0)\text{.}\) We give this vector a special name and will spend most of the rest of this section talking about the importance of this vector.

Definition 11.7.12.

The vector formed by partial derivatives of \(f\) is called the gradient of \(f\) and is denoted

\begin{equation*} \nabla f(x_0,y_0) = \left\langle f_x(x_0,y_0), f_y(x_0,y_0) \right\rangle \end{equation*}

We read \(\nabla f\) as “the gradient of \(f\text{,}\)” “grad \(f\)” or “del \(f\)”.

The symbol \(\nabla\) is called nabla, which comes from a Greek word for a certain type of harp that has a similar shape.

Notice that \(\nabla f\) varies from point to point, and also provides an alternate formulation of the directional derivative.

Key Idea 11.7.13. The Directional Derivative and the Gradient.

Given a differentiable function \(f = f(x,y)\) and a unit vector \(\vu = \langle u_1, u_2 \rangle\text{,}\) we may compute \(D_{\vu}f(x,y)\) by

\begin{equation} D_{\vu}f(x,y) = \nabla f(x,y) \cdot \vu.\tag{11.7.5} \end{equation}
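
When formulas for the partial derivatives are not available, Equation (11.7.5) can still be applied with an estimated gradient. The Python sketch below approximates \(\nabla f\) by central differences and then takes the dot product with a unit vector. The central-difference scheme, the step `h`, the sample function \(f(x,y)=x^2y\text{,}\) and the helper names `numerical_gradient` and `dot` are assumptions for illustration rather than a method used in this section.

```python
def numerical_gradient(f, point, h=1e-5):
    """Estimate <f_x, f_y> at a point using central differences with step h."""
    x, y = point
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (df_dx, df_dy)


def dot(v, w):
    """Dot product of two 2D vectors."""
    return v[0] * w[0] + v[1] * w[1]


def f(x, y):
    # Hypothetical sample function; its exact gradient is <2xy, x**2>.
    return x**2 * y


grad = numerical_gradient(f, (1.0, 2.0))
u = (0.6, 0.8)  # unit vector

print(grad)          # estimate of the gradient at (1, 2), close to (4, 1)
print(dot(grad, u))  # estimate of the directional derivative, close to 3.2
```

The estimate is only as good as the step `h` allows, so for functions with known partial derivatives the exact gradient is preferable.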

In the following activity, we investigate some of what the gradient tells us about the behavior of a function \(f\text{.}\)

Activity 11.7.3.

Let’s consider the function \(f\) defined by \(f(x,y) = \frac{1}{3}(x^2-y^2)\text{.}\) Some contours for this function are shown in Figure 11.7.14.

Figure 11.7.14. A contour plot of \(f(x,y)=\frac{1}{3}(x^2-y^2)\)

(a)

Find the gradient \(\nabla f (x,y)\text{.}\)

(b)

For each of the following points \((x_0,y_0)\text{,}\) evaluate the gradient \(\nabla f(x_0,y_0)\) and sketch the gradient vector with its tail at \((x_0,y_0)\text{.}\) Some of the vectors are too long to fit onto the plot, but we’d like to draw them to scale; to do so, scale each vector by a factor of 1/2.

\(\displaystyle (x_0,y_0) = (2,0)\)

\(\displaystyle (x_0,y_0) = (0,2)\)

\(\displaystyle (x_0,y_0) = (2,2)\)

\(\displaystyle (x_0,y_0) = (2,1)\)

\(\displaystyle (x_0,y_0) = (-3,2)\)

\(\displaystyle (x_0,y_0) = (-2,-4)\)

\(\displaystyle (x_0,y_0) = (0,0)\)

(c)

Write a few sentences about how the direction of the gradient at each of these points is related to the contour passing through that point.

(d)

Does the output of \(f\) increase or decrease in the direction of \(\nabla f(x_0,y_0)\text{?}\) Write a few sentences and use examples from the points above to justify your answer.

As a vector, \(\nabla f(x_0,y_0)\) defines a direction and a length. As we will see in the next subsection, both of these convey important information about the behavior of \(f\) near \((x_0,y_0)\text{.}\)

Subsection 11.7.5 Vector Properties of the Gradient

Key Idea 11.7.13 shows how we can separate the directional derivative of a function of two variables into two parts: 1) the gradient vector evaluated at the point of interest and 2) the unit vector in the direction in which we want to change the inputs of the function. This separation will be vital to explaining the vector properties of the gradient (direction and magnitude).

Recall from Equation (9.3.1) that the dot product of two vectors depends on the lengths of the vectors and the angle between the vectors. So if \(\theta\) is the angle between \(\nabla f(x_0,y_0)\) and \(\vu\) (where \(\vu\) is a unit vector), then by (11.7.5) and (9.3.1), respectively,

\begin{align*} D_{\vu}f(x_0,y_0) \amp= \nabla f(x_0,y_0)\cdot\vu \\ \amp= \vecmag{\nabla f(x_0,y_0)} \vecmag{\vu} \cos(\theta) \end{align*}

Remember that our vector \(\vu\) is a unit vector (\(\vecmag{\vu}=1\)), so the value of the directional derivative will be the length of the gradient vector times the cosine of the angle between \(\nabla f(x_0,y_0)\) and \(\vu\text{.}\)

\begin{equation} D_{\vu}f(x_0,y_0) =\vecmag{\nabla f(x_0,y_0)} \cos(\theta)\tag{11.7.6} \end{equation}

Equation (11.7.6) will be extremely useful in understanding everything that the gradient describes.

Figure 11.7.15. The sign of \(D_{\vu} f(x_0,y_0)\) is determined by \(\theta\)

Because the magnitude of a vector is always non-negative, whether a directional derivative will be positive/negative/zero will depend on \(\cos(\theta)\text{.}\) Figure 11.7.15 graphically shows examples of the following statements (from left to right):

If the angle between the gradient and our direction vector is a right angle, then the directional derivative will be zero.

If the angle between the gradient and our direction vector is acute, then the directional derivative will be positive.

If the angle between the gradient and our direction vector is obtuse, then the directional derivative will be negative.

In particular, the first statement is tied to why the gradient will be perpendicular to the level curve through the point of interest. The level curve is the set of input points that share a particular output value of our function, so the output does not change along the level curve and the directional derivative in a direction tangent to the level curve must be zero. By Equation (11.7.6), when the gradient is not the zero vector this forces \(\cos(\theta)=0\text{,}\) so the gradient makes a right angle with the direction tangent to the level curve.

We can expand our explanation to the other statements as well. The output of \(f\) will be increasing in any direction that makes an acute angle with the gradient vector and the output of \(f\) will be decreasing in any direction that makes an obtuse angle with the gradient.

So by looking at different directions, the directional derivative at our input point \((x_0,y_0)\) will vary with positive, negative, and zero values. It is natural to ask the following questions about how the directional derivative changes at a point:

What direction corresponds to the largest value for the directional derivative?

What is the largest value possible for the directional derivative at a particular input \((x_0,y_0)\text{?}\)

What direction corresponds to the smallest (or most negative) value for the directional derivative?

What is the smallest value possible for the directional derivative at a particular input \((x_0,y_0)\text{?}\)

All of these questions can be answered with Equation (11.7.6). Because the length of the gradient vector does not change when we look in different directions, the \(\cos(\theta)\) term will tell us when we have the maximum and minimum values. We only need to consider values of \(\theta\) between 0 and \(\pi\) inclusive because \(\theta\) is the (smallest) angle between two vectors. Therefore, we will have the largest value of the directional derivative when \(\cos(\theta)\) has its maximum output of 1 at \(\theta=0\text{.}\) So when the direction vector \(\vu\) is in the exact same direction as the gradient, we have the maximum value of the directional derivative, and that maximum value is the length of the gradient. That is, if \(\vu\) is in the same direction as \(\nabla f(x_0,y_0)\text{,}\) then \(D_{\vu}f(x_0,y_0) = \vecmag{\nabla f(x_0,y_0)} \cos(0) =\vecmag{\nabla f(x_0,y_0)}\text{.}\)

By a parallel argument, the minimum (most negative) value of the directional derivative occurs when \(\theta=\pi\text{,}\) that is, when the direction vector points in the opposite direction of the gradient vector. Thus if \(\theta=\pi\text{,}\) then \(D_{\vu}f(x_0,y_0) = \vecmag{\nabla f(x_0,y_0)} \cos(\pi) = -\vecmag{\nabla f(x_0,y_0)}\) because \(\cos(\pi)=-1\text{.}\)

We summarize our most recent work by stating important facts about the gradient.

The Meaning of the Gradient as a Vector.

Let \(f\) be a differentiable function and \((x_0,y_0)\) a point for which \(\nabla f(x_0,y_0) \ne \vzero\text{.}\)

The gradient \(\nabla f(x_0,y_0)\) points in a direction perpendicular to the level curve \(f(x,y)=k\) where \(k=f(x_0,y_0)\text{.}\)

The gradient \(\nabla f(x_0,y_0)\) points in the direction of greatest rate of increase for \(f\) at \((x_0,y_0)\text{,}\) and the instantaneous rate of change of \(f\) in that direction is the length of the gradient vector. That is, if \(\vu = \frac{1}{\vecmag{\nabla f(x_0,y_0)}} \nabla f(x_0,y_0)\text{,}\) then \(\vu\) is a unit vector in the direction of greatest increase of \(f\) at \((x_0,y_0)\text{,}\) and \(D_{\vu} f(x_0,y_0) = \vecmag{\nabla f(x_0,y_0)}\text{.}\)

The direction opposite to the gradient, \(-\nabla f(x_0,y_0)\text{,}\) is the direction of greatest rate of decrease for \(f\) at \((x_0,y_0)\text{,}\) and the instantaneous rate of change of \(f\) in that direction is the negative of the length of the gradient vector. That is, if \(\vu = -\frac{1}{\vecmag{\nabla f(x_0,y_0)}} \nabla f(x_0,y_0)\text{,}\) then \(\vu\) is a unit vector in the direction of greatest decrease of \(f\) at \((x_0,y_0)\text{,}\) and \(D_{\vu} f(x_0,y_0) = -\vecmag{\nabla f(x_0,y_0)}\text{.}\)
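
These facts can be spot-checked by brute force. The Python sketch below scans unit directions \(\langle\cos(\theta),\sin(\theta)\rangle\) on a fine grid of angles and compares the largest and smallest directional derivatives with \(\pm\vecmag{\nabla f}\text{.}\) The gradient value \(\langle 4,1\rangle\) (which is the gradient of the hypothetical sample \(f(x,y)=x^2y\) at \((1,2)\)), the grid of 3600 angles, and the function name `directional_derivative` are assumptions made for illustration.

```python
import math

# Assumed sample gradient: grad f(1, 2) = <4, 1> for f(x, y) = x**2 * y.
grad = (4.0, 1.0)
grad_length = math.hypot(grad[0], grad[1])


def directional_derivative(theta):
    """Dot product of the gradient with the unit vector <cos(theta), sin(theta)>."""
    return grad[0] * math.cos(theta) + grad[1] * math.sin(theta)


# Scan directions in steps of a tenth of a degree.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best = max(angles, key=directional_derivative)
worst = min(angles, key=directional_derivative)

print(best, math.atan2(grad[1], grad[0]))            # steepest-ascent angle vs. gradient angle
print(directional_derivative(best), grad_length)     # largest rate vs. gradient length
print(directional_derivative(worst), -grad_length)   # smallest rate vs. negative length

# A unit vector perpendicular to the gradient gives a zero rate of change.
perp = (-grad[1] / grad_length, grad[0] / grad_length)
print(grad[0] * perp[0] + grad[1] * perp[1])
```

Because the scan only samples finitely many angles, the numbers in each printed pair agree only up to the grid resolution.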

In Activity 11.7.4, we will look at how the directional derivative will change as a function of the direction used or of the point where the directional derivative is being evaluated. The goals of these tasks are to understand the directional derivative as a geometric measurement using visually-based sense making prompts in terms of the location and directions being used. These activities are adapted from materials developed by Rafael Martinez-Planell.

Activity 11.7.4.

(a)

Figure 11.7.16 shows a graph of \(z=f(x,y)\) that you should use for the following:

On the \(xy\)-plane of the figure below, draw the direction vector \(\langle 1,1\rangle \) starting at the point \((2,-1)\text{.}\)

Draw a piece of the tangent line to the graph of \(f\) at the point \((2,-1,f(2,-1))\) which is in the direction \(\langle 1,1\rangle \text{.}\) The slope of the tangent line in the \(\langle 1,1\rangle \) direction is called the directional derivative of \(f\) at \((2,-1)\) in the \(\langle 1,1\rangle \) direction and is denoted \(D_{\langle 1,1\rangle} f (2,-1)\text{.}\)

Is \(D_{\langle 1,1\rangle} f (2,-1)\) positive, negative, or zero? Justify your answer.

On the \(xy\)-plane of (a new plot of) the figure below, draw the direction vector \(\langle -\frac{1}{2},3\rangle \) starting at the point \((2,-1)\text{.}\)

Draw a piece of the tangent line to the graph of \(f\) at the point \((2,-1,f(2,-1))\) which is in the direction \(\langle -\frac{1}{2},3 \rangle \text{.}\) The slope of the tangent line in the \(\langle -\frac{1}{2},3 \rangle \) direction is called the directional derivative of \(f\) at \((2,-1)\) in the \(\langle -\frac{1}{2},3 \rangle \) direction and is denoted \(D_{\langle -\frac{1}{2},3 \rangle} f (2,-1)\text{.}\)

Is \(D_{\langle -\frac{1}{2},3 \rangle} f (2,-1)\) positive, negative, or zero? Justify your answer.

Figure 11.7.16. A plot of \(f(x,y)\) with the measurements of change in input and output labeled for a change of inputs given by \(\langle a,b\rangle\)

(b)

Let \(f\) be the function whose graph appears in Figure 11.7.16. For each value of \(t\) the vector \(\langle \cos(t),\sin(t)\rangle \) is a direction vector. The value of \(D(t)=D_{\langle \cos(t),\sin(t)\rangle} f (1,-1)\) is a scalar.

What direction is the vector \(\langle \cos(t),\sin(t)\rangle \) in for
🔗
1. \(\displaystyle t=0\)
  
  🔗
2. \(\displaystyle t=\frac{\pi}{2}\)
  
  🔗
3. \(\displaystyle t=\frac{3\pi}{2}\)
  
  🔗
4. \(\displaystyle t=\pi\)
  
  🔗
🔗
Is the value of \(D(t)=D_{\langle \cos(t),\sin(t)\rangle} f (1,-1)\) positive, negative, or zero for:
1. \(\displaystyle t=0\)
  
  🔗
2. \(\displaystyle t=\frac{\pi}{2}\)
  
  🔗
3. \(\displaystyle t=\frac{3\pi}{2}\)
  
  🔗
4. \(\displaystyle t=\pi\)
  
  🔗
🔗
🔗
Draw the graph of \(y=D(t)\) for values of \(t\) in the interval \([0,2\pi]\text{.}\) You should think about how the value of \(D(t)\) changes between the values you considered above to help you draw the graph of \(D(t)\text{.}\) Your graph doesn’t have to have exact values but should correctly identify where \(D(t)\) is positive, negative, and zero.
🔗

🔗
Draw the graph of \(y=G(t)=D_{\langle 1,1 \rangle} f (t,-t)\) for values of \(t\) in the interval \((0,2]\text{.}\) As above, first consider what \(G(t)\) will be for a few values of \(t\) between 0 and 2, then think about how \(G(t)\) will vary for values in between. Note that \(G(t)\) is a different function from \(D(t)\text{:}\) here the direction stays fixed and the point changes.
🔗

🔗

🔗

(c)

The graph of \(z=f(x,y)\) is as given below. In this problem, use geometric arguments to justify your answers.

Draw the line segment that goes from \((1,-1,f(1,-1))\) to \((2,0,f(2,0))\text{.}\) How does the slope of this line segment in that direction compare (smaller, equal, larger) with the value of \(D_{\langle 1,1 \rangle} f (1,-1)\text{?}\)

🔗
How does \(D_{\langle 1,1 \rangle} f (1,-1)\) compare with \(D_{\langle 0.01,0.01 \rangle} f (1,-1)\text{?}\) Justify your answer.

🔗
Which is closer to \(D_{\langle 1,1 \rangle} f (1,-1)\text{,}\) the slope of the line segment that goes from \((1,-1,f(1,-1))\) to \((2,0,f(2,0))\) or the slope of the line segment that goes from \((1,-1,f(1,-1))\) to \((1.01,-0.99,f(1.01,-0.99))\text{?}\) Explain your answer.

🔗

🔗
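To make the comparison in the last part of Activity 11.7.4 concrete, the following Python sketch computes secant slopes over a long and a short displacement and compares them with the directional derivative obtained from the gradient. The function `f` is a hypothetical stand-in (the surface in Figure 11.7.16 is given only as a graph), and the helper name `secant_slope` is our own. Slopes are measured as rise over horizontal distance, so that they are comparable to a directional derivative taken with a unit vector.

```python
import math

def f(x, y):
    # Hypothetical surface chosen only for illustration;
    # it is NOT the function graphed in Figure 11.7.16.
    return x**2 + x * y

def secant_slope(f, point, displacement):
    """Rise over horizontal run between `point` and `point + displacement`."""
    (x0, y0), (dx, dy) = point, displacement
    run = math.hypot(dx, dy)  # horizontal distance traveled
    return (f(x0 + dx, y0 + dy) - f(x0, y0)) / run

point = (1.0, -1.0)
long_slope = secant_slope(f, point, (1.0, 1.0))
short_slope = secant_slope(f, point, (0.01, 0.01))

# Directional derivative from the gradient of the stand-in function:
# grad f = <2x + y, x>, which is <1, 1> at (1, -1).
u = (1 / math.sqrt(2), 1 / math.sqrt(2))  # unit vector in the <1, 1> direction
exact = 1.0 * u[0] + 1.0 * u[1]

print(long_slope, short_slope, exact)
```

For this stand-in function, the slope over the short segment lands much closer to the directional derivative than the slope over the long segment, which is the geometric idea behind the limit definition.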

Subsection 11.7.6 Applications

The gradient has many natural applications. For example, situations often arise — for instance, constructing a road through the mountains or planning the flow of water across a landscape — where we are interested in knowing the direction in which a function is increasing or decreasing most rapidly. In the next activity, we will look at how the gradient can help you navigate to the top of a mountain in foggy conditions.

🔗

Activity 11.7.5.

In this activity, we will make sense of directional derivatives and gradients in terms of a function that measures elevation. We are hiking in a foggy park and cannot see anything more than a few feet in front of us. There is nothing blocking us from walking in any particular direction, but because of the fog we cannot see where the highest point on the mountain is. We want to try to find the top of the mountain, but we don’t have a map or trail or any line of sight to other landmarks. Our compass still works in the fog, so we can tell which directions are North, South, East, and West.

🔗

In order to use some of our calculation tools from multivariable calculus, we will think of the elevation at different locations in the park as given by a function \(h(x,y)\) where \(x\) is your location in the East (positive \(x\))/West (negative \(x\)) direction and \(y\) is your location in the North (positive \(y\))/South (negative \(y\)) direction.

🔗

(a)

Let \(P_1\) be your current location in the foggy park. You use your compass to find the East and North directions. At \(P_1\text{,}\) you find that the ground rises 1 meter per 50 meters traveled in the East direction and the ground rises 2.5 meters per 50 meters traveled in the North direction.

🔗

Use this information to give \(\nabla h (P_1)\text{.}\)

🔗

(b)

Use your answer to the previous task to say what direction is “uphill” at \(P_1\) and state how steep the mountain is in this direction.

🔗

(c)

You decide to walk uphill from your location \(P_1\) in order to try to find the top of the mountain. After walking in the same direction for a while, you notice that you are no longer walking in the steepest direction. So at your new location, which we will call \(P_2\text{,}\) you find the East and North directions and measure the steepness of the mountain in these directions. You find that the ground rises 1.5 meters per 75 meters traveled in the East direction and the ground goes down 0.5 meters per 100 meters traveled in the North direction.

🔗

Use this new information to calculate \(\nabla h (P_2)\text{,}\) find the uphill direction, and state how steep the mountain is in the uphill direction at \(P_2\text{.}\)

🔗

(d)

If we use this method of walking in the uphill direction for a ways and then finding the new uphill direction, do you think we are guaranteed to find the top of the mountain? You should write a few sentences to justify your ideas and be sure to state how you will know you are at the top of the mountain. Remember that you can’t see very far in front of you.

🔗
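The bookkeeping in this activity (turning rise-over-run measurements into a gradient, an uphill direction, and a steepness) can be sketched in a few lines of Python. The measurements below are hypothetical and deliberately different from those at \(P_1\) and \(P_2\text{,}\) and the function name `gradient_from_measurements` is our own. The sketch also assumes that the measured slopes are good estimates of the partial derivatives \(h_x\) and \(h_y\text{.}\)

```python
import math

def gradient_from_measurements(rise_east, run_east, rise_north, run_north):
    """Estimate <h_x, h_y> from rise-over-run measurements (all in meters)."""
    return (rise_east / run_east, rise_north / run_north)

# Hypothetical measurements (not the ones given in the activity):
# the ground rises 3 m per 100 m going East and drops 4 m per 100 m going North.
grad = gradient_from_measurements(3.0, 100.0, -4.0, 100.0)

# The length of the gradient is the slope in the uphill direction.
steepness = math.hypot(grad[0], grad[1])

# Unit vector pointing uphill.
uphill = (grad[0] / steepness, grad[1] / steepness)

# Angle of the uphill direction, measured counterclockwise from East, in degrees.
angle_from_east = math.degrees(math.atan2(grad[1], grad[0]))

print(grad, steepness, uphill, angle_from_east)
```

A negative angle here means the uphill direction points south of East, which can then be matched against the compass.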

The technique described in the previous activity has many applications related to maximizing or minimizing functions. For example, consider a two-dimensional version of how a heat-seeking missile might work. (This application is borrowed from United States Air Force Academy Department of Mathematical Sciences.) Suppose that the temperature surrounding a fighter jet can be modeled by the function \(T\) defined by

\begin{equation*} T(x,y) = \frac{100}{1+(x-5)^2 + 4(y-2.5)^2}, \end{equation*}

where \((x,y)\) is a point in the plane of the fighter jet and \(T(x,y)\) is measured in degrees Celsius. Some contours and gradients \(\nabla T\) are shown on the left in Figure 11.7.17.

🔗

Figure 11.7.17. Contours and gradient for \(T(x,y)\) and the missile’s path.
🔗

A heat-seeking missile will always travel in the direction in which the temperature increases most rapidly; that is, it will always travel in the direction of the gradient \(\nabla T\text{.}\) If a missile is fired from the point \((2,4)\text{,}\) then its path will be that shown on the right in Figure 11.7.17.

🔗

This type of strategy is sometimes called gradient ascent and has uses in economics and machine learning.
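A discrete version of the missile's strategy can be sketched in Python. This is a minimal illustration rather than the method used to produce Figure 11.7.17: we assume a fixed step length of 0.01 in the direction of the unit gradient, and we stop when a further step would no longer increase \(T\text{.}\) The names `grad_T` and `gradient_ascent`, the step length, and the stopping rule are all our own choices.

```python
import math

def T(x, y):
    # Temperature model from the text.
    return 100.0 / (1.0 + (x - 5.0)**2 + 4.0 * (y - 2.5)**2)

def grad_T(x, y):
    # Partial derivatives of T, computed by hand with the chain rule.
    d = 1.0 + (x - 5.0)**2 + 4.0 * (y - 2.5)**2
    return (-200.0 * (x - 5.0) / d**2, -800.0 * (y - 2.5) / d**2)

def gradient_ascent(start, step=0.01, max_steps=10_000):
    """Follow the unit gradient in short steps until T stops increasing."""
    x, y = start
    path = [(x, y)]
    for _ in range(max_steps):
        gx, gy = grad_T(x, y)
        norm = math.hypot(gx, gy)
        if norm == 0.0:            # gradient vanishes: we are at a critical point
            break
        nx, ny = x + step * gx / norm, y + step * gy / norm
        if T(nx, ny) <= T(x, y):   # another step would not raise the temperature
            break
        x, y = nx, ny
        path.append((x, y))
    return path

path = gradient_ascent((2.0, 4.0))
print(len(path), path[-1], T(*path[-1]))
```

Starting from \((2,4)\text{,}\) the computed path should end close to \((5,2.5)\text{,}\) where the model temperature is largest. A smaller step length follows the true path more closely at the cost of more steps.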

🔗

Subsection 11.7.7 Summary

The directional derivative of \(f\) at the point \((x,y)\) in the direction of the unit vector \(\vu = \langle u_1, u_2 \rangle\) is

\begin{equation*} D_{\vu}f(x,y) = \lim_{h \to 0} \frac{f(x+u_1h, y+u_2h) - f(x,y)}{h} \end{equation*}

for those values of \(x\) and \(y\) for which the limit exists. In addition, \(D_{\vu}f(x,y)\) measures the slope of the graph of \(f\) when we move in the direction \(\vu\text{.}\) Alternatively, \(D_{\vu} f(x_0,y_0)\) measures the instantaneous rate of change of \(f\) in the direction \(\vu\) at \((x_0,y_0)\text{.}\)

🔗

🔗
The gradient of a function \(f=f(x,y)\) at a point \((x_0,y_0)\) is the vector

\begin{equation*} \nabla f(x_0,y_0) = \left\langle f_x(x_0,y_0), f_y(x_0,y_0)\right\rangle \end{equation*}

🔗

🔗
The directional derivative in the direction \(\vu\) may be computed by

\begin{equation*} D_{\vu}f(x_0,y_0) = \nabla f(x_0,y_0)\cdot \vu \end{equation*}

🔗

🔗
At any point where the gradient is nonzero, the gradient is orthogonal to the contour through that point and points in the direction in which \(f\) increases most rapidly; moreover, the slope of \(f\) in this direction equals the length of the gradient \(|\nabla f(x_0,y_0)|\text{.}\) Similarly, the direction opposite the gradient is the direction of greatest decrease, and the rate of change of \(f\) in that direction is the negative of the length of the gradient, \(-|\nabla f(x_0,y_0)|\text{.}\)
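The facts in this summary can be checked numerically. The Python sketch below uses a hypothetical function \(f(x,y) = x^2y\) at the point \((1,2)\) (our choice, not an example from the text) to compare the dot-product formula with a difference quotient from the limit definition, and to compare the largest slope over many sampled directions with the length of the gradient. The step size `h` and the number of sampled angles are arbitrary.

```python
import math

def f(x, y):
    # Hypothetical function chosen only for illustration.
    return x**2 * y

def grad_f(x, y):
    # <f_x, f_y>, computed by hand.
    return (2 * x * y, x**2)

def directional_derivative(point, u):
    """D_u f at `point` via the gradient; `u` is assumed to be a unit vector."""
    gx, gy = grad_f(*point)
    return gx * u[0] + gy * u[1]

def difference_quotient(point, u, h=1e-6):
    """Approximates the limit definition with a small but nonzero h."""
    x, y = point
    return (f(x + u[0] * h, y + u[1] * h) - f(x, y)) / h

point = (1.0, 2.0)
u = (0.6, 0.8)  # a unit vector

via_gradient = directional_derivative(point, u)
via_limit = difference_quotient(point, u)

# Largest slope over sampled unit directions vs. the length of the gradient.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
max_slope = max(
    directional_derivative(point, (math.cos(t), math.sin(t))) for t in angles
)
grad_length = math.hypot(*grad_f(*point))

print(via_gradient, via_limit, max_slope, grad_length)
```

The two values of the directional derivative should agree to several decimal places, and the largest sampled slope should be very close to \(|\nabla f(1,2)|\text{.}\)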
🔗

🔗

🔗

Exercises 11.7.8 Exercises

1.

Consider the function \(f(x,y,z) = xy + yz^2 + xz^3\text{.}\)

🔗

Find the gradient of \(f\text{:}\)

🔗

\(\langle\), , \(\rangle\)

🔗

Find the gradient of \(f\) at the point (-2, 4, 4).

🔗

\(\langle\), , \(\rangle\)

🔗

Find the rate of change of the function \(f\) at the point (-2, 4, 4) in the direction \(\mathbf u = \langle 5/\sqrt{54}, 5/\sqrt{54}, 2/\sqrt{54} \rangle\text{.}\)

🔗

2.

If \(f \left( x, y \right) = x^{2} + 4 y^{2}\text{,}\) find the value of the directional derivative at the point \(\left( -2, 1 \right)\) in the direction given by the angle \(\theta = \frac{2 \pi}{6}\text{.}\)

🔗

3.

Find the directional derivative of \(\displaystyle f(x,y,z) = 4xy+z^{2}\) at the point \((1,2,-2)\) in the direction of the maximum rate of change of \(f\text{.}\)

🔗

\(f_{\mathbf{u}} \, (1,2,-2) = D_{\mathbf{u}} \, f(1,2,-2) =\)

🔗

4.

The temperature at any point in the plane is given by \(\displaystyle T(x,y) = \frac{190}{x^{2}+y^{2}+2}\text{.}\)

🔗

(a) What shape are the level curves of \(T\text{?}\)

circles
🔗

🔗
ellipses
🔗

🔗
lines
🔗

🔗
parabolas
🔗

🔗
hyperbolas
🔗

🔗
none of the above
🔗

🔗

🔗

(b) At what point on the plane is it hottest?

🔗

What is the maximum temperature?

🔗

(c) Find the direction of the greatest increase in temperature at the point \((-3,3)\text{.}\)

What is the value of this maximum rate of change, that is, the maximum value of the directional derivative at \((-3,3)\text{?}\)

🔗

(d) Find the direction of the greatest decrease in temperature at the point \((-3,3)\text{.}\)

🔗

What is the value of this most negative rate of change, that is, the minimum value of the directional derivative at \((-3,3)\text{?}\)

🔗

5.

The temperature at a point \((x,y,z)\) is given by \(\displaystyle T(x,y,z) = 200e^{-x^2 -y^2/4 - z^2/9}\text{,}\) where \(T\) is measured in degrees Celsius and \(x\text{,}\) \(y\text{,}\) and \(z\) in meters. There are lots of places to make silly errors in this problem; just try to keep track of what needs to be a unit vector.

🔗

Find the rate of change of the temperature at the point (1, 1, 1) in the direction toward the point (4, 5, 5).

🔗

In which direction (unit vector) does the temperature increase the fastest at (1, 1, 1)?

🔗

\(\langle\), ,\(\rangle\)

🔗

What is the maximum rate of increase of \(T\) at (1, 1, 1)?

🔗

6.

If \(\displaystyle f(x,y,z) = 2zy^{2}\text{,}\) then the gradient at the point \((4,6,4)\) is

🔗

\(\nabla f (4,6,4) =\)

🔗

7.

The concentration of salt in a fluid at \((x,y,z)\) is given by \(F(x,y,z) = x^{2}+y^{4}+2x^{2}z^{2}\) mg/cm\({}^3\text{.}\) You are at the point \((-1,1,1)\text{.}\)

🔗

(a) In which direction should you move if you want the concentration to increase the fastest?

🔗

direction:

🔗

(Give your answer as a vector.)

🔗

(b) You start to move in the direction you found in part (a) at a speed of \(3\) cm/sec. How fast is the concentration changing?

🔗

rate of change =

🔗

8.

At a certain point on a heated metal plate, the greatest rate of temperature increase, 3 degrees Celsius per meter, is toward the northeast. If an object at this point moves directly north, at what rate is the temperature increasing?

🔗

degrees Celsius per meter

🔗

9.

Suppose that you are climbing a hill whose shape is given by \(z = 720 - 0.04 x^2 -0.06 y^2\text{,}\) and that you are at the point (30, 80, 300).

🔗

In which direction (unit vector) should you proceed initially in order to reach the top of the hill fastest?

🔗

\(\langle\),\(\rangle\)

🔗

If you climb in that direction, at what angle above the horizontal will you be climbing initially (radian measure)?

🔗

10.

Are the following statements true or false?

🔗

If \(\vec{u}\) is a unit vector, then \(f_{\vec{u}} (a,b)\) is a vector.
🔗

🔗
If \(f(x,y)\) has \(f_x(a,b) = 0\) and \(f_y(a,b) = 0\) at the point \((a,b)\text{,}\) then \(f\) is constant everywhere.
🔗

🔗
The gradient vector \(\nabla f(a,b)\) is tangent to the contour of \(f\) at \((a,b)\text{.}\)
🔗

🔗
Suppose \(f_x(a,b)\) and \(f_y(a,b)\) both exist. Then there is always a direction in which the rate of change of \(f\) at \((a,b)\) is zero.
🔗

🔗
\(\nabla f(a,b)\) is a vector in 3-dimensional space.
🔗

🔗
If \(\vec{u}\) is perpendicular to \(\nabla f(a,b)\text{,}\) then \(f_{\vec{u}} \, (a,b) = \langle 0, 0 \rangle\text{.}\)
🔗

🔗
\(f_{\vec{u}} \, (a,b) = || \nabla f(a,b) ||\text{.}\)
🔗

🔗
\(f_{\vec{u}} \, (a,b)\) is parallel to \(\vec{u}\text{.}\)
🔗

🔗

🔗

11.

Let \(E(x,y) = \frac{100}{1+(x-5)^2 + 4(y-2.5)^2}\) represent the elevation on a land mass at location \((x,y)\text{.}\) Suppose that \(E\text{,}\) \(x\text{,}\) and \(y\) are all measured in meters.

Find \(E_x(x,y)\) and \(E_y(x,y)\text{.}\)
🔗

🔗
Let \(\vu\) be a unit vector in the direction of \(\langle -4,3 \rangle\text{.}\) Determine \(D_{\vu} E(3,4)\text{.}\) What is the practical meaning of \(D_{\vu} E(3,4)\) and what are its units?
🔗

🔗
Find the direction of greatest increase in \(E\) at the point \((3,4)\text{.}\)
🔗

🔗
Find the instantaneous rate of change of \(E\) in the direction of greatest decrease at the point \((3,4)\text{.}\) Include units on your answer.
🔗

🔗
At the point \((3,4)\text{,}\) find a direction \(\vw\) in which the instantaneous rate of change of \(E\) is 0.
🔗

🔗

🔗

12.

Find all directions in which the directional derivative of \(f(x,y) = ye^{-xy}\) is 1 at the point \((0,2)\text{.}\)

🔗

13.

Find, if possible, a function \(f\) such that

\begin{equation*} \nabla f = \left\langle \sin(yz), xz\cos(yz)+2y, xy\cos(yz)+\frac{5}{z} \right\rangle\text{.} \end{equation*}

If not possible, explain why.

🔗

14.

Let \(f(x,y) = x^2+3y^2\text{.}\)

Find \(\nabla f(x,y)\) and \(\nabla f(1,2)\text{.}\)
🔗

🔗
Find the direction of greatest increase in \(f\) at the point \((1,2)\text{.}\) Explain. A graph of the surface defined by \(f\) is shown at left in Figure 11.7.18. Illustrate this direction on the surface.
🔗

🔗
A contour diagram of \(f\) is shown at right in Figure 11.7.18. Illustrate your calculation from (b) on this contour diagram.

Figure 11.7.18. Left: Graph of \(f(x,y) = x^2+3y^2\text{.}\) Right: Contours.
🔗
🔗

🔗
Find a direction \(\vw\) for which the derivative of \(f\) in the direction of \(\vw\) is zero.
🔗

🔗

🔗

15.

The properties of the gradient that we have observed for functions of two variables also hold for functions of more variables. In this problem, we consider a situation where there are three independent variables. Suppose that the temperature in a region of space is described by

\begin{equation*} T(x,y,z) = 100e^{-x^2-y^2-z^2} \end{equation*}

and that you are standing at the point \((1,2,-1)\text{.}\)

Find the instantaneous rate of change of the temperature in the direction of \(\vv=\langle 0, 1, 2\rangle\) at the point \((1,2,-1)\text{.}\) Remember that you should first find a unit vector in the direction of \(\vv\text{.}\)
🔗

🔗
In what direction from the point \((1,2,-1)\) would you move to cause the temperature to decrease as quickly as possible?
🔗

🔗
How fast does the temperature decrease in this direction?
🔗

🔗
Find a direction in which the temperature does not change at \((1,2,-1)\text{.}\)
🔗

🔗

🔗

16.

Figure 11.7.19 shows a plot of the gradient \(\nabla f\) at several points for some function \(f=f(x,y)\text{.}\)

🔗

Figure 11.7.19. The gradient \(\nabla f\text{.}\)
🔗

Consider each of the three indicated points, and draw, as best as you can, the contour through that point.
🔗

🔗
Beginning at each point, draw a curve on which \(f\) is continually decreasing.
🔗

🔗

🔗
