Welcome to a short introduction to solving ordinary differential equations, or ODEs, numerically!
Here, we will learn how ODEs can help us to predict the future and what an ODE actually is. Then we will look into solving ODEs numerically and derive a simple method to do so. We discuss the properties of this method, and point out a way to come up with improved methods.
Although we focus mostly on intuition in this introduction, a basic understanding of derivatives and integrals can be helpful to grasp all details.
The quote "The only constant in life is change" tells us that everything changes with time. If we could get an insight into how something changes over time, then we are able to predict the future.
Here are some common things we might want to predict:
From these examples, we see that the desire to predict the future spans many different disciplines, from biology and medicine to engineering, climate science, and finance.
One idea for predicting the future is to determine a model of how a quantity changes over time and use that model to predict future values of the quantity. ODEs are one way of modeling the change of a quantity over time and, therefore, ODEs are a crucial tool to help us predict the future. In addition to that, ODEs can also help us design new engineering systems, because we can use them to predict, for example, how a mechanical system would move, and to choose the parameters of the system such that it has a desired behavior.
So, ODEs can help us predict how something behaves over time.
Now that we know that ODEs can help us with predicting the future, let us define what an ODE is.
Remember that if a function $x(t)$ depends on a parameter, say $t$, then we can take the derivative of $x(t)$ with respect to $t$, which is written as $\frac{dx}{dt}$. The parameter $t$ could be any parameter of the function, but here we will assume that $t$ represents time. This means that $x(t)$ describes a quantity that changes over time, for example, the spread of a disease. To simplify the notation, we will use $\dot{x}(t)$ to denote the derivative of $x(t)$ with respect to $t$.
With this notation, an ODE is simply an equation that expresses the derivative $\dot{x}(t)$ as a function of $x(t)$ and the time $t$: $$ \dot{x}(t)=f(t,x) $$
So, an ODE is an equation that tells us the value of the derivative of $x(t)$ with respect to time at time $t$ and for a certain value of $x$. This in turn gives us an insight into how $x(t)$ will change over time. The magnitude of $f(t,x)$ tells us how much $x(t)$ will change, that is, its rate of change. For example, a large value of $f(t,x)$ means that $x(t)$ will change a lot at time $t$ if it has the value $x$. Furthermore, we also get a direction of change. For example, if $f(t,x)$ is positive, the value of $x(t)$ will increase, and if it is negative, it will decrease.
Now let us look at a simple example, where $x(t)=\sin(t)$. The derivative is given by $\dot{x}(t)=\cos(t)$, which defines our ODE.
As mentioned above, the derivative $f(t,x)$ provides us with the rate of change of $x(t)$ as well as its direction of change. The larger the magnitude of $f(t,x)$, the more $x(t)$ will change at that point in time. To illustrate this, we plot the function $x(t)$ below. Then, at a certain point $(t,x)$ (marked as a dot), we plot the derivative as an arrow. The length of the arrow represents the rate of change, and the direction the arrow is pointing in represents the direction in which the change will happen.
You can use the slider to move the point $(t,x)$ around to see how the rate and direction of change varies over time.
Visually, the steeper $x(t)$ is, the larger the rate of change. That is why the arrow is very long at $t=3.20$ and very short at $t=1.60$. For values of $t$ between $1.8$ and $4.6$, the arrow points down, because the rate of change is negative and the values of $x(t)$ decrease with time. However, for $t$ between $4.8$ and $7.8$ the arrow points up, showing that $x(t)$ is increasing.
So, an ODE describes the rate and the direction of the change of a function $x(t)$.
One might think having the ODE itself is enough to predict the behavior of $x(t)$, but that is not true!
To show that, let's take a step back and return to our initial example. Remember that $$ \dot{x}(t)=\cos(t) $$ is the ODE that describes the sine function $x(t)=\sin(t)$.
What about the function $x(t)=\sin(t)+x_0$, where $x_0$ is a constant? Which ODE describes its rate of change?
Taking the derivative with respect to time leads to $$ \dot{x}(t)=\frac{d}{dt}x(t)=\frac{d}{dt}(\sin(t)+x_0)=\cos(t) $$ Since $x_0$ is a constant, its derivative with respect to time is zero, because it does not change with time.
Hence, all functions $x(t)=\sin(t)+x_0$, where $x_0$ is a constant value, have the same ODE!
Therefore, to obtain $x(t)$ we not only need an ODE that describes its change over time, but also an initial condition $x(0)=x_0$ to make the solution of the ODE unique.
Intuitively, this makes sense. Assume you want to predict how a rain cloud moves when the wind is blowing from the west. Where the cloud will end up depends on where it starts. For example, if it starts west of you, it will at some point pass over you, while if it starts north of you, it will not.
To recap, the function $x(t)$ can be defined by its ODE $\dot{x}=f(t,x)$ and its initial value $x(0)=x_0$.
Now that we have our ODE and its initial condition, we can find a solution to the ODE by integrating both sides from $0$ to $t_1$, which gives us the value $x(t_1)$.
$ \begin{align} \int_{t=0}^{t=t_1}\dot{x}(t)dt&=\int_{t=0}^{t=t_1}f(t,x)dt\\ \Leftrightarrow x(t_1)-x(0)&=\int_{t=0}^{t=t_1}f(t,x)dt\\ \Leftrightarrow x(t_1)&=x(0)+\int_{t=0}^{t=t_1}f(t,x)dt \end{align} $
For our example ODE $\dot{x}(t)=\cos(t)$, we can solve this integral by hand
$ \begin{align} x(t_1)&=x(0)+\int_{t=0}^{t=t_1}\cos(t)dt\\ &=x(0)+\sin(t_1)-\sin(0)\\ &=x(0)+\sin(t_1) \end{align} $
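By the way, if we want to double-check such an antiderivative, we can let a computer algebra system do the integration for us. Here is a minimal sketch using SymPy (assuming SymPy is installed):

```python
import sympy as sp

t, t1 = sp.symbols("t t1")
# integrate the right-hand side cos(t) from 0 to t1
print(sp.integrate(sp.cos(t), (t, 0, t1)))  # prints sin(t1)
```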
and there are several other ODEs for which we are able to integrate $f(t,x)$. But often it is difficult to integrate the ODE analytically. If we want to get an idea of what $x(t)$ looks like without solving the ODE analytically, we can use a so-called slope field to understand how $x(t)$ behaves over time.
The idea behind a slope field is the following. Similar to the interactive figure above, where we plotted the value of $f(t,x)$ at one specific point as an arrow, in a slope field we look at many different points in the $t$-$x$ plane. At each point $(t,x)$ we draw an arrow whose length and direction, given by $f(t,x)$, represent the rate and the direction of the change of $x(t)$ at this point, respectively. The name slope field comes from the fact that the arrow at point $(t,x)$ follows a line with a slope of $f(t,x)$ going through this point.
Below you can see an illustration of a slope field for our initial example $f(t,x)=\cos(t)$, where the points $(t,x)$ are chosen on an equally spaced grid of $t$ and $x$ values, and $M$ determines how many grid points we have in each direction.
Play around with the slider to change the size, $M$, of the grid and see how the slope field changes.
As you might have observed, the more grid points we use, the better we can imagine how the function behaves from any point in space at any time. Note how the slope field does not depend on the initial condition! Hence, even without an initial condition we can see approximately how the function behaves.
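If you would like to reproduce such a slope field yourself, here is a minimal sketch using NumPy and Matplotlib; the grid ranges and the helper name `plot_slope_field` are my own choices, not necessarily those used in the interactive figure above:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_slope_field(f, t_max, x_max, M):
    """Draw a slope field for the ODE x'(t) = f(t, x) on an M-by-M grid."""
    t, x = np.meshgrid(np.linspace(0, t_max, M), np.linspace(0, x_max, M))
    dt = np.ones_like(t)  # one unit in the t-direction ...
    dx = f(t, x)          # ... and f(t, x) units in the x-direction
    plt.quiver(t, x, dt, dx, angles="xy")  # arrow with slope f(t, x) at (t, x)
    plt.xlabel("t")
    plt.ylabel("x")
    plt.show()

plot_slope_field(lambda t, x: np.cos(t), t_max=10, x_max=8, M=15)
```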
Below we plot three different solutions to $\dot{x}(t)=\cos(t)$, where $x(0)$ is 2, 4, and 6, respectively.
We see that the three solutions follow the slope field and we could have already guessed from the slope field that $x(t)$ has an oscillatory behavior.
So, a slope field can be used to get an idea of how the solution of an ODE behaves qualitatively.
Some ODEs can be solved analytically, but if the function $f(t,x)$ is complicated we might not be able to solve the ODE analytically, even if a slope field tells us how the solution behaves qualitatively. For example, if $f(t,x)$ describes how the weather changes, it will be a very complicated function, which we most likely cannot integrate by hand.
Therefore, we want to derive one of the simplest methods for solving an ODE numerically, namely the Euler method. We will derive the method in three different ways, which will gradually increase our understanding of the method and give us an idea of how to derive more sophisticated methods.
When solving the ODE numerically, we will only obtain an approximation of the function $x(t)$ and we call this approximation $\hat{x}(t)$.
To get an intuition for the method, imagine we have arrived at the train station in a new city and want to walk from there to our hostel/hotel. Our initial position is the train station, and by looking at a map we know in which direction we should walk to reach our destination. Typically, we will have to look at the map several times to make sure we are not going in the wrong direction. Let's say that we look at the map every five minutes and then correct the direction in which we are walking. This should hopefully lead us to our hostel/hotel.
When solving an ODE numerically, we can do something very similar. Recall that the ODE tells us the rate of change and also the direction in which we should go. Therefore, we can treat the ODE as our map and we will check the ODE from time to time to update the direction we are going in.
Let's state this more mathematically. Assume that at $t=0$ we are at $x(0)=x_0$, which would be the train station in our map example. Looking at our ODE, we determine that we should head in the direction $f(0,x_0)$. So we head in that direction for an amount of time we will call $h$. Then we end up at $$ \hat{x}(h)=x_0+h\cdot f(0,x_0) $$ At $t=h$, we ended up at $\hat{x}(h)$ and we check again where we should go from there. This time we should go in the direction $f(h,\hat{x}(h))$. So we do that again for an amount of time $h$ and end up at
$ \begin{align} \hat{x}(2h)&=\hat{x}(h)+h\cdot f(h,\hat{x}(h))\\ &=x_0+h\cdot f(0,x_0)+h\cdot f\big(h,x_0+h\cdot f(0,x_0)\big). \end{align} $
By repeatedly checking where we ended up after the time $h$, checking which direction we should go next, and then following that direction for the time $h$, we can approximate the solution of the ODE.
So, to solve ODEs numerically with the Euler method we initialize $\hat{x}(0)=x_0$ and $t=0$. Then we calculate ${\hat{x}(t+h)=\hat{x}(t)+h\cdot f(t,\hat{x}(t))}$, update $t$ to $t+h$, and repeat it.
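In code, the Euler method fits in a few lines. Below is a minimal sketch in Python; the function name `euler` and the use of NumPy are my own choices:

```python
import numpy as np

def euler(f, x0, T, N):
    """Approximate the solution of x'(t) = f(t, x), x(0) = x0,
    on the interval [0, T] using N steps of the Euler method."""
    h = T / N                       # step size
    t = np.linspace(0.0, T, N + 1)  # time grid t_0 = 0, ..., t_N = T
    x = np.empty(N + 1)
    x[0] = x0
    for k in range(N):
        # follow the direction f(t_k, x_k) for a time h
        x[k + 1] = x[k] + h * f(t[k], x[k])
    return t, x

# our example ODE: x'(t) = cos(t) with x(0) = 2
t_grid, x_hat = euler(lambda t, x: np.cos(t), x0=2.0, T=10.0, N=10)
```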
Equipped with the Euler method, let us now numerically solve our example ODE $\dot{x}(t)=\cos(t)$, where we start at $x(0)=2$ and choose $h=1\,\mathrm{s}$. This means that we keep going in the same direction for $1\,\mathrm{s}$ before we check the direction again.
This does not look too good... What happened?
Going back to our navigation example, choosing $h$ too large is the same as looking at the map once and then following the first direction for half an hour although we should have taken a turn after five minutes of walking. In the figure above, the step size $h$ was too large for $\hat{x}(t)$ to approximate $x(t)$ well.
It is common that we want to approximate the function $x(t)$ over a specific time interval from ${t=0}$ to $t=T$. Instead of choosing a step size $h$ directly, we can choose how many steps $N$ we want to take in this interval, which leads to ${h=\frac{T}{N}}$.
Below we can observe the influence of the number of steps $N$ on the approximation of $x(t)$ over an interval from $t=0$ to ${T=10}$ by changing the value of $N$.
We can see that the smaller the step size is, the better $\hat{x}(t)$ approximates $x(t)$. In our map analogy, this means we check the map more often to find the right direction.
Let us now try to find a more mathematical explanation for why the Euler method stops working if $h$ is chosen too large. Recall that the derivative is defined as follows $$ \dot{x}(t)=\lim_{h\rightarrow 0}\frac{x(t+h)-x(t)}{h}, $$ where $\lim_{h\rightarrow 0}$ means that $h$ approaches zero. Instead of letting $h$ go to zero, we could approximate the derivative with a constant $h$ $$ \dot{x}(t)\approx\frac{x(t+h)-x(t)}{h} $$ Replacing the derivative in the ODE with its approximation above leads to $$ \frac{\hat{x}(t+h)-\hat{x}(t)}{h}=f(t,\hat{x}), $$ which can be re-written as $$ \hat{x}(t+h)=\hat{x}(t)+h\cdot f(t,\hat{x}). $$ This shows us that the Euler method is equivalent to approximating the derivative with a constant $h$. In the definition of the derivative, $h$ goes to zero, which explains why the approximation becomes better if we choose a smaller $h$.
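We can observe this numerically. The following small sketch compares the forward-difference approximation of the derivative of $\sin(t)$ at $t=1$ with the true value $\cos(1)$ for shrinking values of $h$:

```python
import numpy as np

t = 1.0
for h in [1.0, 0.1, 0.01, 0.001]:
    fd = (np.sin(t + h) - np.sin(t)) / h  # forward-difference approximation
    print(f"h = {h:6.3f}: approx = {fd:.6f}, error = {abs(fd - np.cos(t)):.6f}")
```

The printed error shrinks roughly proportionally to $h$, matching the observation above.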
Based on our above analysis of the Euler method, we could simply choose the step size $h$ to be very small. This will definitely lead to a better fit between $\hat{x}(t)$ and $x(t)$. However, for a given prediction length $T$, we will need to take more steps of the Euler method, namely ${N=\frac{T}{h}}$ steps. This means the smaller $h$ is, the more often we have to evaluate $f(t,x)$, which is not a problem for our example ODE. However, if $f(t,x)$ is a complex function, these evaluations will take a lot of time and computational power. In particular, if we want to predict how a system evolves within the next second, we would not like the computation to take ten seconds. In our map example, a complex function $f(t,x)$ can be seen as a large map of a city with many twisting and turning streets, such that it takes some time to find the next direction to follow.
Let us now look at the error between $x(t)$ and our approximation $\hat{x}(t)$ for our simple ODE with $x_0=2$. We look at two types of errors here. The first one is the absolute cumulative error $$ e_c(N) = \sum_{k=1}^N |x(k\cdot h)-\hat{x}(k\cdot h)|. $$ The cumulative error adds up the absolute differences between the true value of $x(t)$ and the predicted value $\hat{x}(t)$ at the times $k\cdot h$ where we have calculated $\hat{x}(t)$. The second one is the average error $$ e_a(N) = \frac{\sum_{k=1}^N |x(k\cdot h)-\hat{x}(k\cdot h)|}{N}=\frac{e_c(N)}{N}, $$ which is the absolute cumulative error from above divided by the number, $N$, of steps taken in the Euler method.
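Using the `euler` sketch from above, these two errors can be computed as follows; this is again only a sketch, exploiting that for our example the true solution $x(t)=2+\sin(t)$ is known analytically:

```python
import numpy as np  # euler() is taken from the sketch above

def errors(x_true, x_hat):
    """Cumulative and average absolute error at the grid points."""
    abs_err = np.abs(x_true - x_hat)[1:]  # skip k = 0, where both agree
    e_c = np.sum(abs_err)                 # cumulative error e_c(N)
    e_a = e_c / len(abs_err)              # average error e_a(N)
    return e_c, e_a

t_grid, x_hat = euler(lambda t, x: np.cos(t), x0=2.0, T=10.0, N=10)
e_c, e_a = errors(2.0 + np.sin(t_grid), x_hat)
```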
Next we plot how these errors change if we vary $N$.
The plot shows us that both errors decrease as we increase $N$. Since a larger $N$ means that $h$ is smaller, this is an expected result.
When observing the average error, we see that we need around $N=20$ steps of the forward Euler method to bring the average error close to zero. But recall the animation above, where we could choose the step size: not even for $N=50$ does the approximation $\hat{x}(t)$ fit $x(t)$ closely. The above plot also shows that the cumulative error approaches a constant value: although each individual difference between $x(t)$ and $\hat{x}(t)$ shrinks as $N$ grows, the number of terms in the sum grows at the same rate.
Since the average error does not decay very quickly with the number of function evaluations and the cumulative error approaches a constant value, we might wonder if there are better alternatives to solving the ODE with the Euler method.
And yes, there are better methods!
To show a way of deriving a better method, we will now look at a third derivation of the Euler method. Recall that to solve an ODE $\dot{x}(t)=f(t,x)$ with $x(0)=x_0$ we can integrate both sides of the ODE to obtain $$ x(h)=x_0+\int_0^h f(t,x)dt. $$ To integrate $f(t,x)$ we need to know the value of $x(t)$ for all $t$ between $0$ and $h$, which is something we do not know, because we do not know $x(t)$.
However, we know that $x(0)=x_0$ and we can evaluate $f(0,x_0)$. A simple idea would be then to assume that $f(t,x)$ is constant for values of $t$ between $0$ and $h$. This assumption leads to the approximation $$ \hat{x}(h)=x_0+hf(0,x_0), $$ which is again the Euler method.
So, the third interpretation of the Euler method is that we keep the value of the function under the integral constant while integrating.
How can we get a better approximation of $x(t)$ then?
Instead of assuming that the function $f(t,x)$ is constant in the interval $[0,h]$, we could assume that it takes more than one value. For example, we could assume that the function is constant in the interval $[0,\alpha\cdot h)$ and then changes value in the interval $[\alpha\cdot h, h]$, where $\alpha$ is a number between $0$ and $1$. Denoting by $x_1$ the (yet unknown) value of $x$ at time $\alpha\cdot h$, this would lead to
$ \begin{align} \hat{x}(h)&=x_0+\int_0^{\alpha\cdot h} f(0,x_0)dt+\int_{\alpha\cdot h}^h f(\alpha\cdot h,x_1)dt\\ &= x_0+\alpha\cdot h\cdot f(0,x_0)+(1-\alpha)\cdot h\cdot f(\alpha\cdot h,x_1). \end{align} $
But here the problem is again that we do not know the value of $x_1$...
One way to approximate $x_1$ is to just use the Euler method with a step size of $\alpha h$. This leads to
$ \begin{align} x_1 &= x_0 + \alpha\cdot h\cdot f(0,x_0)\\ \hat{x}(h) &= x_0+\alpha\cdot h\cdot f(0,x_0)+(1-\alpha)\cdot h\cdot f(\alpha\cdot h,x_1) \end{align} $
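As a small sketch, one step of this two-stage scheme could look as follows in Python; the helper name `two_stage_step` is my own:

```python
def two_stage_step(f, t, x, h, alpha):
    """One step of the two-stage scheme derived above: follow f(t, x) for
    a time alpha*h, re-evaluate there, and combine the two directions."""
    slope0 = f(t, x)               # direction at the start of the step
    x1 = x + alpha * h * slope0    # Euler step of size alpha*h (the scout)
    slope1 = f(t + alpha * h, x1)  # direction found by the scout at x1
    # weighted average of the two directions, followed for the full step h
    return x + h * (alpha * slope0 + (1 - alpha) * slope1)
```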
The intuition behind this new method is that we check where we should go at time $t=0$, that is $f(0,x_0)$ and then we check where we should go at time $t_1=\alpha\cdot h$, that is $f(t_1,x_1)$. Based on these two values we determine where we go by taking their weighted average.
Going back to our map example, imagine that we have a scout with us. We tell the scout to follow the current direction for a time of $\alpha\cdot h$, so that the scout ends up at $x_1$. There, the scout checks the map to determine the new direction, which is $f(\alpha\cdot h, x_1)$, comes back to us, and tells us about it. We then follow neither the current nor the new direction alone, but a weighted average of both, which we follow for a time $h$.
By taking an average of two directions, we can approximate the true direction we should follow better than by just looking at the direction in the beginning.
But wait a minute...
Now to go one step of $h$ ahead, we need to evaluate $f(t,x)$ twice instead of once! This doubles the number of function evaluations compared to the Euler method!
That is true: the price for more accuracy is that we need to make more function evaluations. Performing more function evaluations can take more time and, thus, slow down our prediction of the future. In our map example, more function evaluations mean that we need to send the scout out more often and wait for their return before we know which direction to follow. Ideally, this drawback of needing more function evaluations per time step $h$ is compensated by being able to take larger steps and, thus, requiring fewer function evaluations in total.
Below we want to look again at the error of the Euler method as above, but now we want to compare it to an improved two-stage method of this kind.
For this comparison, we let the scout walk a full Euler step to the end of the interval and then weight the direction at the start and the direction found by the scout equally. This method is known as Heun's method and is described by
$ \begin{align} x_1 &= \hat{x}(t) + h\cdot f(t,\hat{x}(t))\\ \hat{x}(t+h) &= \hat{x}(t)+\frac{1}{2}\cdot h\left(f(t,\hat{x}(t))+ f\left(t+h,x_1\right)\right) \end{align} $
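A complete solver based on Heun's method, written in the same style as the `euler` sketch above (with NumPy imported as before), might look like this:

```python
def heun(f, x0, T, N):
    """Approximate the solution of x'(t) = f(t, x), x(0) = x0,
    on the interval [0, T] using N steps of Heun's method."""
    h = T / N
    t = np.linspace(0.0, T, N + 1)
    x = np.empty(N + 1)
    x[0] = x0
    for k in range(N):
        slope0 = f(t[k], x[k])    # direction at the start of the step
        x1 = x[k] + h * slope0    # predictor: a full Euler step
        slope1 = f(t[k] + h, x1)  # direction at the end of the step
        x[k + 1] = x[k] + h * (slope0 + slope1) / 2  # average both directions
    return t, x
```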
The first thing we notice is that with Heun's method, even the cumulative error approaches zero. Furthermore, the average error converges to zero much faster with Heun's method than with the Euler method.
So, for smaller $N$ we get a much more accurate solution. But what about the function evaluations?
For that, let us look at specific values of $N$ and compare the average error of the Euler method and Heun's method. The results are shown in the table below.
$N$ | 5 | 10 | 20 | 30 | 40 |
---|---|---|---|---|---|
Average Error (Euler) | 1.20 | 0.56 | 0.27 | 0.18 | 0.13 |
Average Error (Heun) | 0.25 | 0.06 | 0.01 | 0.006 | 0.003 |
As in the plot above, the average error of Heun's method decreases much faster with $N$. For $N=10$, the average error of Heun's method is already smaller than the average error of the Euler method with $N=40$. For $N=20$, Heun's method needs the same number of function evaluations as the Euler method with $N=40$, but the average error is about ten times smaller in our example.
If we plot the approximate solutions for both the Euler and Heun's method, we see that for $N=20$ Heun's method fits the sine graph quite well, while the Euler method still has an offset for $N=40$.
With this improved method, we only evaluated the function under the integral at two different positions between $0$ and $h$. In general, we could evaluate the function at even more positions. This would increase the accuracy of our approximation $\hat{x}(t)$ but also increase the number of function evaluations. A very popular method for solving ODEs numerically is the RK4 method, which approximates the integral with four function evaluations per step.
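For reference, one step of RK4 might look as follows; this is a sketch of the classical scheme with its four stages $k_1,\dots,k_4$:

```python
def rk4_step(f, t, x, h):
    """One step of the classical fourth-order Runge-Kutta (RK4) method."""
    k1 = f(t, x)                       # slope at the start of the step
    k2 = f(t + h / 2, x + h / 2 * k1)  # slope at the midpoint, using k1
    k3 = f(t + h / 2, x + h / 2 * k2)  # slope at the midpoint, using k2
    k4 = f(t + h, x + h * k3)          # slope at the end of the step
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```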
So, to derive more accurate methods we can evaluate the function $f(t,x)$ at more points within an interval of length $h$. The gain in accuracy comes at the cost of more function evaluations per step, though. Hence, there is a trade-off between accuracy and the speed of finding an approximation.
We have introduced ODEs and discussed how one can predict the future by solving them. Since solving ODEs analytically can be very difficult, we introduced the simplest method for solving ODEs numerically, the Euler method. We discussed three different ways to derive the Euler method. The first way gave us an intuitive understanding of how the Euler method works. The second way made a connection between derivatives and the Euler method and also gave a mathematical understanding of why the step size needs to be chosen small to obtain a good approximation with the Euler method. The third way used integrals and showed us a technique for deriving more sophisticated methods than the Euler method by finding better approximations of the integral.
If this has sparked your interest in learning more and diving deeper into the approximation of ODEs, then you are in luck. There are many more topics to explore when it comes to solving ODEs. For example, the Euler method we have introduced here is also called the Forward Euler method. As you can imagine, if there is a Forward Euler method, there is also a Backward Euler method. The latter is an implicit ODE solver, which opens up another world of methods for solving ODEs numerically. Another interesting aspect is how quickly the error between the true function $x(t)$ and the approximation $\hat{x}(t)$ decreases with the number of steps $N$ for different ODE solvers. Furthermore, for simplicity, we only looked at one-dimensional examples of ODEs, but ODEs can also describe higher-dimensional quantities. For example, going back to Newton's law, $x(t)$ could represent not only the velocity of an object but also its position.
So there are many exciting topics in the field of solving ODEs numerically. Here are some recommendations for you to dive deeper into the topic:
If you want to increase your understanding of differential equations in general, 3Blue1Brown has a short series on differential equations, starting with the video Differential equations, a tourist's guide | DE1
Steven Brunton has a series on Differential Equations and Dynamical Systems on YouTube, where he also discusses numerical methods to solve ODEs. For example, the video Numerical Simulation of Ordinary Differential Equations: Integrating ODEs is a good starting point.
If you are more interested in getting hands-on experience, the freely available book Solving Ordinary Differential Equations in Python teaches you how to program ODE solvers, such as the Euler method and Heun's method, in Python.
I would like to thank everyone who has read through this tutorial and provided me with valuable feedback. Furthermore, I would like to thank the team behind Voila Dashboards for developing a tool to turn Jupyter notebooks into web applications.