Determining Differential Equations (DEs) and Stochastic DEs from Data: Part I.

The purpose:

The stochastic differential equation (SDE) we want to model has the form:

    dX(t) = mu(X(t), t) dt + sigma(X(t), t) dW(t)

where W(t) is a Wiener process (standard Brownian motion).

We want to determine mu and sigma from measurements of X(t). Because X(t) is random, we will need to record X(t) over a number of independent simulations (sample paths).
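
For testing the method, one can generate such data synthetically with the Euler-Maruyama scheme. Here is a minimal numpy sketch (the function simulate_paths and the toy mu and sigma are my own, not from the original code):

    import numpy as np

    def simulate_paths(mu, sigma, x0, dt, n_steps, n_sims, seed=0):
        """Euler-Maruyama: X_{i+1} = X_i + mu(X_i) dt + sigma(X_i) sqrt(dt) Z."""
        rng = np.random.default_rng(seed)
        paths = np.empty((n_sims, n_steps + 1))
        paths[:, 0] = x0
        for i in range(n_steps):
            x = paths[:, i]
            z = rng.standard_normal(n_sims)  # Wiener increments, scaled by sqrt(dt) below
            paths[:, i + 1] = x + mu(x) * dt + sigma(x) * np.sqrt(dt) * z
        return paths

    # e.g. a mean-reverting toy model with mu(x) = -x and constant sigma = 0.5:
    paths = simulate_paths(lambda x: -x, lambda x: 0.5 + 0.0 * x,
                           x0=1.0, dt=0.01, n_steps=1000, n_sims=200)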


Now how does one determine mu and sigma? The most straightforward strategy is to define a set of candidate functions and set both mu and sigma equal to a linear combination of these candidate functions, for example with polynomial candidates:

    mu(X)    = c_0 + c_1 X + c_2 X^2 + . . . + c_N X^N

    sigma(X) = d_0 + d_1 X + d_2 X^2 + . . . + d_N X^N
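
In code, the candidate library is just a matrix whose columns are the candidate functions evaluated on the data. A small numpy sketch (poly_library is my name, not the original code's):

    import numpy as np

    def poly_library(X, degree):
        """Columns are the candidate functions [1, X, X^2, ..., X^degree]."""
        return np.column_stack([X ** k for k in range(degree + 1)])

    # mu and sigma are then linear combinations of the columns:
    #   mu(X)    ~ poly_library(X, N) @ c
    #   sigma(X) ~ poly_library(X, N) @ d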

The true mu and sigma will hopefully be representable by the candidate functions. The correct coefficients (the c's and d's) will be the ones which accurately model the time evolution:

                                                          X(t_1) -> X(t_2) -> X(t_3) -> . . . -> X(t_n)

However, X(t) is stochastic! So at first it appears impossible to determine the correct ansatz this way. The best we can do is deduce mu and sigma through some "observables".

In general these observables can be moments of X(t). For this application I will use the mean and variance of the recorded data to determine the correct stochastic model, i.e. the correct model will be the one whose mean and variance equal the observed mean and variance.
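
With the paths array from the earlier sketch, these observables are just per-time statistics taken across the simulation axis:

    # paths has shape (n_sims, n_steps + 1); paths[k, i] = X(t_i) in simulation k
    mean_obs = paths.mean(axis=0)  # empirical mean of X(t_i) over simulations
    var_obs = paths.var(axis=0)    # empirical variance of X(t_i) over simulations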


[Image: sde2.png]

Generating an Ansatz for mu and sigma:

Strategy 1:

Obviously I chose polynomials in the above equations as my candidate functions. You may choose any other form that pleases you. However, for efficiency and capacity, the candidate functions I will choose are of the form of a multivariate polynomial (as above) but generated recursively:

[Image: sde3.png — the recursive ansatz for Strategy 1]

The above Ansatz involves recursive polynomial multiplications, so computing the coefficients of the resulting terms involves Cauchy products; the implementation of this algorithm is thus a little more difficult to carry out (a quick reminder of the Cauchy product is sketched below). There is another way to generate terms recursively, which I call Strategy 2, described after the sketch.
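
As a reminder, the Cauchy product says the coefficients of a product of two polynomials are the discrete convolution of their coefficient vectors, which is exactly what numpy's convolve computes:

    import numpy as np

    a = np.array([1.0, 2.0])       # 1 + 2x
    b = np.array([3.0, 0.0, 4.0])  # 3 + 4x^2

    # Cauchy product: c_k = sum_i a_i * b_{k-i}
    c = np.convolve(a, b)          # [3, 6, 4, 8], i.e. 3 + 6x + 4x^2 + 8x^3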


Strategy 2:

One can instead define a linear relationship among the candidate functions, generating each new set of features recursively from the previous one. This can be a very expensive way of doing the calculation, since the memory needed to store the features grows exponentially with the number of layers.
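
Purely to illustrate that growth (this is a toy construction of my own, not necessarily the exact recursion used here): if each layer multiplies every current feature by every base candidate, the feature count is multiplied by the number of base candidates at every layer:

    import numpy as np

    def grow_features(base, n_layers):
        """base: (n_samples, n_base). After L layers there are n_base**(L+1) features."""
        feats = base
        for _ in range(n_layers):
            # all pairwise products of the current features with the base features
            feats = (feats[:, :, None] * base[:, None, :]).reshape(base.shape[0], -1)
        return feats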


Determining the Mu term: For the time being let's ignore the stochastic term Sigma. In order to determine the mu term, we attempt to reproduce the data:

    dX(t)/dt = mu(X(t), t)

To do this, recall from ODE theory that the solution (with no stochastic term) is given by:

    X(t_{i+1}) = X(t_i) + integral over [t_i, t_{i+1}] of mu(X(s), s) ds

               ≈ X(t_i) + mu(X(t_i), t_i) Delta t

where we used the Forward Euler approximation in the last line. Higher-order Runge-Kutta approximations are of course possible. This motivates us to form the loss function:

    Loss = sum_i | X(t_{i+1}) - ( X(t_i) + F_N(X(t_i)) Delta t ) |^2

The term X(t_{i+1}) is the exact solution at time t_{i+1}, and the term in parentheses is the Forward Euler approximation to that solution. Notice that if we can drive this loss to zero, then F_N will be approximately (to order Delta t) equal to mu:

    F_N(X) = mu(X) + O(Delta t)
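
In TensorFlow 1.x (the API used below), the loss can be assembled along these lines. This is a hedged sketch: the placeholder names, the cubic candidate library, and the step size dt are my choices, and the code is assumed to sit inside the same class as the training ops that follow:

        import tensorflow as tf

        # data snapshots: X_prev[i] = X(t_i), X_next[i] = X(t_{i+1})
        self.X_prev = tf.placeholder(tf.float32, shape=[None, 1])
        self.X_next = tf.placeholder(tf.float32, shape=[None, 1])

        # candidate library Theta(X) = [1, X, X^2, X^3] with coefficients c
        Theta = tf.concat([self.X_prev ** k for k in range(4)], axis=1)
        self.c = tf.Variable(tf.zeros([4, 1]))
        F_N = tf.matmul(Theta, self.c)

        # sum of squared Forward-Euler residuals
        self.loss = tf.reduce_sum(tf.square(self.X_next - (self.X_prev + dt * F_N)))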


We use gradient descent to minimize the loss function; specifically, we use the Adam optimizer. In TensorFlow this is done with the commands:

        # Adam optimizer with default hyperparameters (TF 1.x API)
        self.optimizer_Adam = tf.train.AdamOptimizer()
        # training op that updates all trainable variables to decrease self.loss
        self.train_op_Adam = self.optimizer_Adam.minimize(self.loss)

This is a great method to use when there is a relatively large number of weights and biases.

To prevent overfitting (and because it usually also makes physical sense) we will add L1 regularization to this loss function, so that most of the weights in our symbolic neural network come out to zero (in the numerical round-off sense). Thus we use:

    Loss = sum_i | X(t_{i+1}) - ( X(t_i) + F_N(X(t_i)) Delta t ) |^2  +  gamma sum_j |w_j|

The hyperparameter gamma is chosen to be a small number greater than zero and less than 1. I find that anything in the range [0.000001, 0.001] works (at least for my applications).
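
Continuing the TensorFlow sketch above, the penalty is an absolute-value sum over the candidate coefficients (gamma being the hyperparameter just discussed):

        # the L1 penalty drives most candidate coefficients to (numerically) zero
        self.loss = self.loss + gamma * tf.reduce_sum(tf.abs(self.c))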


Remark. The choice of norm is usually the L1, L2, or Huber norm. Of course, when doing regression under Gaussian noise, maximizing the log-likelihood is equivalent to minimizing the error with respect to the L2 norm.
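
To spell that out: if the residuals r_i = X(t_{i+1}) - (X(t_i) + F_N(X(t_i)) Delta t) are modeled as independent Gaussian noise with variance s^2, then the log-likelihood is

    log L = - sum_i r_i^2 / (2 s^2) + const

so maximizing log L over the weights is exactly minimizing sum_i r_i^2, the squared L2 error.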


That does it for the theory on determining non-stochastic ODEs from data. The research on stochastic DEs is much richer, and I might hold off on the stochastic theory until later, when I talk about PDEs and the forward/backward equations. Let's look at some examples:

Example 1: The ansatz for Strategy 1 using a 1-layer network can be computed by hand and is given by:

[Image: ode6.png — the Strategy 1 ansatz for a 1-layer network]

Remark (initializing): One can certainly use random weight initialization; however, I recommend also decreasing the absolute value of the initial weights as the number of layers increases. The physical meaning behind this initialization is that we believe the more complicated nonlinear terms in the higher layers are less likely to be part of the dynamics of the ODE or PDE data.
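
One simple way to implement this (a sketch; the geometric decay factor is my choice, not prescribed above):

    import numpy as np

    def init_weights(shape, layer, scale=0.1, decay=0.5, seed=0):
        """Random weights whose magnitude shrinks geometrically with layer depth."""
        rng = np.random.default_rng(seed + layer)
        return scale * (decay ** layer) * rng.standard_normal(shape)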


In many physics applications we already know some of the simple operators involved, such as advection and diffusion. These simple operators appear in the early layers. More complicated operators show up in the later layers, and thus we expect them to carry less weight in the dynamics of the data. Now let's look at some numerical examples.

Numerical Examples:

[Image: ode8.png]
[Image: ode_figure.png]