The Chain Rule.
Let \(z = f(x,y)\text{,}\) where \(f\) is a differentiable function of the independent variables \(x\) and \(y\text{,}\) and let \(x\) and \(y\) each be differentiable functions of an independent variable \(t\text{.}\) Then
\begin{equation}
\frac{dz}{dt} = \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt}.\tag{11.6.2}
\end{equation}
It is important to note the differences among the derivatives in Equation (11.6.2). Since \(z\) is a function of the two variables \(x\) and \(y\text{,}\) the derivatives in the Chain Rule for \(z\) with respect to \(x\) and \(y\) are partial derivatives. However, since \(x = x(t)\) and \(y = y(t)\) are functions of the single variable \(t\text{,}\) their derivatives are the standard derivatives of functions of one variable. When we compose \(z\) with \(x(t)\) and \(y(t)\text{,}\) we then have \(z\) as a function of the single variable \(t\text{,}\) making the derivative of \(z\) with respect to \(t\) a standard derivative from single variable calculus as well.
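As a quick numerical sanity check of Equation (11.6.2), the following sketch compares the Chain Rule formula with a finite-difference estimate of the derivative of the composite function. The choices \(f(x,y) = x^2 y\text{,}\) \(x(t) = \cos(t)\text{,}\) and \(y(t) = \sin(t)\) are illustrative assumptions and do not come from the text.

```python
import math

# Illustrative choices (assumptions, not from the text):
# z = f(x, y) = x**2 * y, with x(t) = cos(t) and y(t) = sin(t).
def f(x, y):
    return x**2 * y

def x_of(t):
    return math.cos(t)

def y_of(t):
    return math.sin(t)

def chain_rule_dz_dt(t):
    """dz/dt assembled from Equation (11.6.2) using hand-computed derivatives."""
    x, y = x_of(t), y_of(t)
    dz_dx = 2 * x * y       # partial derivative of z with respect to x
    dz_dy = x**2            # partial derivative of z with respect to y
    dx_dt = -math.sin(t)    # ordinary derivative of x(t)
    dy_dt = math.cos(t)     # ordinary derivative of y(t)
    return dz_dx * dx_dt + dz_dy * dy_dt

def numeric_dz_dt(t, h=1e-6):
    """Central-difference estimate of the derivative of the composite z(t)."""
    def z(s):
        return f(x_of(s), y_of(s))
    return (z(t + h) - z(t - h)) / (2 * h)

t0 = 0.7
print("Chain Rule:        ", chain_rule_dz_dt(t0))
print("Finite difference: ", numeric_dz_dt(t0))
```

The two printed values should agree to several decimal places. The partial derivatives are evaluated at the point \((x(t), y(t))\text{,}\) while the ordinary derivatives are evaluated at \(t\text{.}\)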
To understand why this Chain Rule works in general, suppose that some quantity \(z\) depends on \(x\) and \(y\) so that we can express the change in \(z\) in terms of the differential as
\begin{equation}
dz = \frac{\partial z}{\partial x} dx + \frac{\partial z}{\partial y} dy.\tag{11.6.3}
\end{equation}
Next, suppose that \(x\) and \(y\) each depend on another quantity \(t\text{,}\) so that
\begin{equation}
dx = \frac{dx}{dt}~dt \quad \mbox{and} \quad dy = \frac{dy}{dt}~dt.\tag{11.6.4}
\end{equation}
Substituting the expressions in Equation (11.6.4) into Equation (11.6.3), we find that
\begin{equation*}
dz = \frac{\partial z}{\partial x}\frac{dx}{dt}~dt
+ \frac{\partial z}{\partial y}\frac{dy}{dt}~dt = \frac{dz}{dt}~dt,
\end{equation*}
which is the Chain Rule in this particular context, as expressed in Equation (11.6.2).
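The differential argument can be illustrated numerically. In the sketch below, the functions \(f(x,y) = xy + y^2\text{,}\) \(x(t) = t^2\text{,}\) and \(y(t) = 3t\) are hypothetical choices made only for illustration. The code compares the change predicted by the differential with the actual change in \(z\) for shrinking step sizes \(dt\text{.}\)

```python
# Hypothetical functions chosen for illustration (not from the text).
def f(x, y):
    return x * y + y**2

def x_of(t):
    return t**2

def y_of(t):
    return 3 * t

def dz_from_differentials(t, dt):
    """Change in z predicted by dz = (dz/dx) dx + (dz/dy) dy."""
    x, y = x_of(t), y_of(t)
    dx = 2 * t * dt          # dx = (dx/dt) dt
    dy = 3 * dt              # dy = (dy/dt) dt
    dz_dx = y                # partial derivative of f with respect to x
    dz_dy = x + 2 * y        # partial derivative of f with respect to y
    return dz_dx * dx + dz_dy * dy

def actual_change(t, dt):
    """Exact change in z when t increases by dt."""
    return f(x_of(t + dt), y_of(t + dt)) - f(x_of(t), y_of(t))

t0 = 1.0
for dt in (0.1, 0.01, 0.001):
    predicted = dz_from_differentials(t0, dt)
    actual = actual_change(t0, dt)
    print(dt, predicted, actual, actual - predicted)
```

For these particular functions the gap between the actual and predicted change shrinks roughly like \(dt^2\text{,}\) which is what we expect when the differential captures the linear part of the change.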
This approach to understanding the change in \(z\) using the differential or linearization has the added benefit of separating how much of the change in \(z\) comes through each of the intermediate variables \(x\) and \(y\text{.}\) Let’s go back to the context of Preview Activity 11.6.1, where \(P\) is the amount of large particulate matter in the air as a function of location \((x,y)\) and our location changes with time along the path \(\vr(t)=\langle x(t),y(t)\rangle \text{.}\) The Chain Rule expresses the rate of change of \(P\) with respect to time as
\begin{equation*}
\frac{dP}{dt} = \frac{\partial P}{\partial x} \frac{dx}{dt}+ \frac{\partial P}{\partial y} \frac{dy}{dt}.
\end{equation*}
Because the amount of large particulate matter in terms of the \(x\) coordinate is governed by tree pollen, the term \(\frac{\partial P}{\partial x} \frac{dx}{dt}\) describes the rate of change in the large particulate matter per unit time along our drive due to tree pollen. Similarly, the amount of large particulate matter in terms of the \(y\) coordinate is governed by industrial pollution, so the second term, \(\frac{\partial P}{\partial y} \frac{dy}{dt}\text{,}\) describes the rate of change in the large particulate matter per unit time along our drive due to industrial pollution. Because our function is locally linear, the total rate of change in large particulate matter is the sum of these individual rates of change.
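To make the separation of the two terms concrete, here is a small sketch that computes each contribution separately. The formula for \(P\) and the path \(x(t) = 2t\text{,}\) \(y(t) = t^2\) are hypothetical stand-ins, since the functions from the Preview Activity are not reproduced here.

```python
# Hypothetical model (assumption, not the Preview Activity's actual data):
# P(x, y) = 40 + 3*x + 0.5*y**2, where the x-dependence plays the role of
# tree pollen and the y-dependence plays the role of industrial pollution.
def P(x, y):
    return 40 + 3 * x + 0.5 * y**2

# Hypothetical path r(t) = <x(t), y(t)>.
def x_of(t):
    return 2 * t

def y_of(t):
    return t**2

def rate_contributions(t):
    """Return the two Chain Rule terms of dP/dt as (pollen, industrial)."""
    y = y_of(t)
    dP_dx = 3.0          # partial derivative of P with respect to x
    dP_dy = y            # partial derivative of P with respect to y
    dx_dt = 2.0          # derivative of x(t)
    dy_dt = 2 * t        # derivative of y(t)
    pollen_term = dP_dx * dx_dt
    industrial_term = dP_dy * dy_dt
    return pollen_term, industrial_term

pollen, industrial = rate_contributions(2.0)
print("rate due to pollen:    ", pollen)
print("rate due to industrial:", industrial)
print("total dP/dt:           ", pollen + industrial)
```

Keeping the two terms separate, rather than only reporting their sum, shows how much of the total rate of change comes through each coordinate at a given time.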
The previous activity and discussion have dealt with a multivariable function whose inputs each depend on a single other variable, \(t\text{.}\) The next example extends the Chain Rule to multivariable functions with more than two inputs, and to the situation in which the input variables depend on more than one other variable.
Example 11.6.3.
Let \(C\) be the cost to manufacture flibertygibbits. The cost to produce flibertygibbits depends on the costs of two parts, which we will call \(p\) and \(q\text{,}\) and the cost of labor to construct the flibertygibbits, which we will call \(L\text{.}\) So \(C\) is a multivariable function of \(p\text{,}\) \(q\text{,}\) and \(L\text{,}\) which we will denote \(C(p,q,L)\text{.}\) The part costs (\(p\) and \(q\)) and the labor costs each depend on \(s\text{,}\) the distance to nearby suppliers and cities, and \(t\text{,}\) time. This means that \(p\text{,}\) \(q\text{,}\) and \(L\) are themselves multivariable functions of \(s\) and \(t\text{;}\) in other words, we need to consider \(p(s,t)\text{,}\) \(q(s,t)\text{,}\) and \(L(s,t)\text{.}\)
All of these dependencies mean that \(C\) depends on \(s\) and \(t\) through composition with \(p(s,t)\text{,}\) \(q(s,t)\text{,}\) and \(L(s,t)\text{.}\) We can therefore ask how the cost to produce flibertygibbits responds to changes in \(s\) or \(t\text{,}\) a question the Chain Rule answers. To see how these dependencies combine, we start with the differential of \(C\text{.}\) The differential
\begin{equation*}
dC = \frac{\partial C}{\partial p} dp + \frac{\partial C}{\partial q} dq + \frac{\partial C}{\partial L} dL
\end{equation*}
describes how a change in \(C\) will depend on changes in each variable \(p\text{,}\) \(q\text{,}\) and \(L\text{.}\)
We can see how \(C\) changes with either \(s\) or \(t\) by taking the appropriate partial derivatives of \(p\text{,}\) \(q\text{,}\) and \(L\) with respect to \(s\) (holding \(t\) fixed) or with respect to \(t\) (holding \(s\) fixed). In particular, we get
\begin{equation*}
\frac{\partial C}{\partial s} = \frac{\partial C}{\partial p} \frac{\partial p}{\partial s} + \frac{\partial C}{\partial q} \frac{\partial q}{\partial s} + \frac{\partial C}{\partial L} \frac{\partial L}{\partial s}
\end{equation*}
and
\begin{equation*}
\frac{\partial C}{\partial t} = \frac{\partial C}{\partial p} \frac{\partial p}{\partial t} + \frac{\partial C}{\partial q} \frac{\partial q}{\partial t} + \frac{\partial C}{\partial L} \frac{\partial L}{\partial t}.
\end{equation*}
These Chain Rule expressions have the same separation of intermediate variables as in our earlier discussion. For instance, the term \(\frac{\partial C}{\partial q} \frac{\partial q}{\partial s}\) describes the rate of change in the cost of making flibertygibbits per unit of distance to nearby suppliers and cities that comes through the cost of part \(q\text{.}\) The term \(\frac{\partial C}{\partial L} \frac{\partial L}{\partial t}\) describes the rate of change in the cost of making flibertygibbits per unit time that comes through the cost of labor \(L\text{.}\) In both cases, the partial derivative of \(C\) with respect to \(s\) or \(t\) is the sum of the rates of change through each of the intermediate variables \(p\text{,}\) \(q\text{,}\) and \(L\text{.}\)
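The two Chain Rule expressions in this example can be checked numerically. The sketch below uses a hypothetical cost model; the formulas for \(C\text{,}\) \(p\text{,}\) \(q\text{,}\) and \(L\) are assumptions made only so that there is something concrete to compute. It reports the term contributed by each intermediate variable and compares the sums with finite-difference estimates of \(\frac{\partial C}{\partial s}\) and \(\frac{\partial C}{\partial t}\text{.}\)

```python
# Hypothetical cost model, chosen only for illustration (not from the text).
def C(p, q, L):
    return 2 * p + 3 * q + p * L

def p_of(s, t):
    return 5 + 0.1 * s + 0.2 * t

def q_of(s, t):
    return 4 + 0.05 * s * t

def L_of(s, t):
    return 20 + t - 0.01 * s

def chain_rule_terms(s, t):
    """Return the per-variable terms of dC/ds and dC/dt as two dicts."""
    p, L = p_of(s, t), L_of(s, t)
    # Partial derivatives of C with respect to its own inputs p, q, L.
    C_p, C_q, C_L = 2 + L, 3.0, p
    # Partial derivatives of p, q, L with respect to s and t (by hand).
    p_s, q_s, L_s = 0.1, 0.05 * t, -0.01
    p_t, q_t, L_t = 0.2, 0.05 * s, 1.0
    terms_s = {"p": C_p * p_s, "q": C_q * q_s, "L": C_L * L_s}
    terms_t = {"p": C_p * p_t, "q": C_q * q_t, "L": C_L * L_t}
    return terms_s, terms_t

def composite(s, t):
    """C viewed as a function of s and t through p, q, and L."""
    return C(p_of(s, t), q_of(s, t), L_of(s, t))

def numeric_partials(s, t, h=1e-4):
    """Central-difference estimates of dC/ds and dC/dt."""
    dC_ds = (composite(s + h, t) - composite(s - h, t)) / (2 * h)
    dC_dt = (composite(s, t + h) - composite(s, t - h)) / (2 * h)
    return dC_ds, dC_dt

s0, t0 = 10.0, 2.0
terms_s, terms_t = chain_rule_terms(s0, t0)
print("terms of dC/ds:", terms_s, "sum:", sum(terms_s.values()))
print("terms of dC/dt:", terms_t, "sum:", sum(terms_t.values()))
print("finite differences:", numeric_partials(s0, t0))
```

Each dictionary entry is one term of the corresponding sum, so the output shows both the total rate of change and how much of it passes through \(p\text{,}\) \(q\text{,}\) and \(L\) individually.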
We close this section with a broader statement of what linearity means conceptually. Every version of the Chain Rule we have looked at (including the chain rule from single variable calculus) states that the change in our output variable is a linear combination of the step sizes in the input variables, where the coefficient on each step size is the instantaneous rate of change with respect to that variable. The Chain Rule is simply an algebraic statement of this linear combination. As we said at the end of the previous section, if a function is locally linear, then its instantaneous rate of change will be the same as the corresponding rate of change along the linearization (tangent plane).