Inverse problem in Geophysics

Magnetic field data captured by SWARM satellite. Downward continuation Robust estimation, iterated IRLS Overdetermined system. Conditioning numbers. The data consists of around 7500 observations, made in 2014, by a high-altitude satellite. The field data will be used to fit a model for the core surface boundary. This procedure includes downward continuation, which introduces instability and amplifies short wavelengths and observation noise. Thus regularization is required. We can start off with a simple inversion method, assuming gaussian noise and thus using L2-norm on the misfit, with L2-norm on the model. This is the classical Tikhonov (Ridge regression special case) method. Then we can try combinations of L1 norm and L2 norms on model and misfit. Around 97% of the magnetic field measured on Earth originates from internal sources. The core is too hot to sustain a permanent magnet, and instead much more difficult dynamics are at play, involving fluid dynamics along with the rotation of Earth. Note that this is a model for the radial field alone, where the magnetic field consists of also two angular directions. The radial magnetic field data can be seen below

Inv

As can be seen from the data, putting aside errors, the radial magnetic field of Earth is largely a dipole, as from a north-south bar magnet. The north pole is the magnetic south pole, and vice verca. That situation changes with time however, as the field reverses on large timescales. The left figure is a solution with almost no regularization, and the right figure is a solution with too much. The predicted field is squished towards zero, leaving the amplitudes smaller, and also create a strong bias beyond any physically reasonable solution.

From Maxwell's equations, the curl of the magnetic field is zero in a region free from current sources, or constant electric field, The model can be derived from some simple physical considerations and assumptions.

Inv

Magnetic measurements on the satellite are done with shielding from electronic, to shield the measurement device as much as possible from the electromagnetic noise from the satellite itself. For zero curl, the field can be written as gradient of a potential field,

inv

The magnetic field is divergence free,

BVE

And so we can arrive at Laplace's equations

BVE

The solution on the sphere with internal sources is given by a linear combination of spherical harmonics. Specifically, the radial field can be expressed as

BVE

BVE

The parameters of the model are the g's and h's. For a SH model of degree n = 20, without intercept, there will be 440 model parameters. About 20 degree SH model is reasonable for a model at the surface of Earth. Only internal sources are modelled, i.e magnetic disturbances from the ionosphere, the influence of the sun etc. is not modelled, but will be present as variance in the data. With up to 7561 data points we thus have an overdetermined system. For computational efficiency, only around 750 points will be used, at the cost of more uncertainty in the results. The sum of spherical harmonics, with radius coefficients, will make up the design matrix G, that is used in the inverse steps to find the model m, which consits of the g's and h's. The radius factor is seen with reference from the surface of earth, with a = 6371km, and while the data is collected at around 6800km by the sattelite, and also consideration of the field at the core-mantle boundary will be given, which is at a radius of 3480km. Thus the inverse problem uses both downward continuation and upward continuation. Downward continuation to the CMB is an unstable process, which gives weight to higher frequencies, as will be seen in some of the methods.

Best solution - IRLS L1L1 norm with L matrix

The best solution comes from using the L1 norm on misfit and L1 norm on the total magnetic flux at the core-mantle boundary The solution will be given by the IRLS algerithm, updating the W and Wm matrix at each step, using e.g Huber weights. The W matrix is a data weight matrix, which makes the solution robust to outliers.

BVE

The L-matrix is given by the discretisation of the integrated unsigned magnetic flux. The radial field at the CMB is actually just the predicted values Bcmb = Lm, so the L matrix is similar to the design matrix G, but at a different radius, longitude and latitude.

BVE

It's also possible to do regularisation on the integrated squared radial field. The solution for reasonable regularisation parameter is seen below

Snow
Forest

We can see that it captures the Earth surface field as we know it pretty well. The radial field is also largely dipolar at the CMB, with some large persistent reverse magnetic flux areas on the southern hemisphere, which is consistent with theory and the field at the surface. The geomagnetic power spectrum at the core-boundary surface for the chosen solution is

Snow
Forest

It's best to downward continue the Earth Surface field at to the core-mantle boundary, if there is no field generated in the mantle and the curst. (Lowes,1974) I.e no sources of Br in between. There are small but significant sources to the surface field, from magnetized crustal rocks. We can expect the crustal field to exist mostly of higher harmonics, due to its shallow origin. Early estimates showed the crustal field to mostly show itself from n = 25 and upwards, to e.g n = 500. We mostly expect the core field to be flat or slowly decreasing up to n = 10, at the core field, corresponding to nearly a 'white' spectrum from the sources at radius 0.47a, with maybe a slight increase for n = 2, due to a deeper dipole source at 0.35a The earth surface spectrum decreases up to between n = 14 to 20, with main contributions from the core field. The ES spectrum then stays nearly flat from n = 20. And the core field is greater than the crustal field to at least n = 11, while the crustal field likely has very low contribution for small n. According to (Langel & Estes 1982) we see to distinct slopes at Earth Surface magnetic field spectrum, with a distinct change at n = 14. The spectrum before that comes mostly from core field sources, and the spectrum after n = 15 from crustal sources. It is difficult to observe the core field after n = 14, because the crustal field obscures it. External sources include ionosphere and magnetosphere. The Earth Surface spectrum, the dipole term stands way higher than the rest, and a break near n = 14. The higher degrees are not due to noise in data, but carry actual meaningful physical information. The crustal field thus puts a limitation on our ability to estimate the core field, mostly for higher frequencies. The core spectrum should decrease slightly up to n = 14, and after that, increase fast. For n = 13 and 14, the crustal field still contributes a lot to the core spectrum.

The IRLS algorithm updates two weight matrices, W for misfit and Wm for generalized model norm. The convergence criteria is a relative change in model norms, within E-7. After 200 steps the algorithm seems to have converged.

To great relief the Lcurve and GCV actually give pretty similar results, they both suggest a parameter close to E-4.

Snow
Forest

Regularization is basically a tradeoff between variance and bias. A least squares solution is often an overfit to the data. The obtained model has high variance and thus is very sensitive to small changes in data. Some authors plot log scales, some use no log scales, and some authors pick the value visually. The L-curve basically measures the misfit against the modelnorm. It's one way of choosing which regularization parameter to choose. The optimal regularization parameter, in this context, would be the one that balances the two norms, and would found on the corner of the curve. Getting a good sharp curve can be difficult, so some authors they pick a regularization parameter visually that most seems to represent the point where a balanced tradeoff happens. Another way of choosing parameter is the GCV, which is a generalization and more efficient way of doing Leave-One-Out cross validation. Basically measuring how well the model can predict a data point, if you had deliberately taken it out of the training set. The optimal parameter minimizes the GCV, which means on average, with splitting up the training set, you would have the lower residual sum of squares for that particular regularisation parameter. The GCV uses a denominator which essentially counts the effective degree of freedom, which the regularisation parameter also affects.

L2L2 norm Ridge Regression

One of the earliest methods to in regularized regression, in reducing variance, is the Ridge regression. To reduce variance of predictions and model parameters, we can put a penalty on the sum of squared parameters, next to the penantly on misfit. It can be expressed in the lagrangian form as

BVE

The advantage to the Ridge regression is the simple analytical solution, which can be cleanly expressed, without having to use iterative methods

BVE

The predicted space data and CMB data for a particular regularisation parameter is chosen below.

Snow
Forest

The solution at satellite height matches the data well visually, at least. At CMB level, the solution makes no sense, with higher frequencies dominating. It looks physically unreasonable, knowing what the surface and satellite height field looks like. The CMB field should still largely be dominated by a dipolar mode.

BVE

The power spectrum for the first 10 degrees is not too bad, despite the very unphysical fit at CMB level. The rise for larger spherical harmonics is due to crustal field sources.

The Ridge regression is fast to compute, since we do not need to iterate and converge. We can try for many parameters, and over a very long range. For a large range, spanning 10^-5 to 10^5, a corner is seen somewhere around 10^3.

Snow
Forest

The L-curve is one method for parameter selection in regularized optimization problems. In its simplest form it shows the tradeoff between misfit and model norm, as the regularization parameter changes. As the regularization parameter increases, the model parameters decrease in size, and eventually m->0, and then Gm = Y -> 0, the predicted values will also approach zero. The misfit norm will approach ||d||_2, as the predicted values will be approach 0. Note that the different regularization methods use different terms on the axis, with appropiate weight matrices W,Wm included in the norms.

L2L1 norm Lasso regression

Putting L1 norm on model parameters gives the Lasso method. The Lasso method will do shrinkage as the Ridge regression, but also more directly feature selection, as it puts parameters to zero. The Lagrangian form can be expressed as

BVE

The solution for some parameter that is somewhat close to the optimal one given by Lcurve and GCV is seen below.

Snow
Forest

For high enough regularization parameter, the Lasso method does manage to find the final spherical harmonic mode as the dipole, but the field has also been diminished too much. A reversed magnetic flux patch does also get captured though on the southern hemisphere, which is very good. The same has been found by many researched models using a variety of techniques, so it's a good sign that there are certain features in the field that are reversed currently inside earth, especially in the southern hemisphere. It's also interesting that different regularisation norms captures those features.

BVE

The power spectrum gives too low values for higher frequencies, and too sharp a decline i.e almost discontinuity. This suggests that the regularization parameter is a bit too high, although the result is visually pleasing in terms of what is physically expected.

Snow
Forest

The L-curve shows parameters close to E4, while the GCV shows parameter close to E3. So pretty much we might look for solutions around that range. The L-curve has highest curvature for the first points, but this is rather unreliable, especially since the method uses iteration and the convergence criteria is picked arbitrarily small. In general it seems the optimal regularization parameter is greater for L1 compared to L2 norms. This makes sense, since L1 norm will be smaller in general, so a higher cost coefficient has to be used. In other words, the Wm matrix will make the sum of squared residuals smaller than the unweighted sum.

Home