New Computer Fund

Monday, October 22, 2012

Weird Autocorrelation?

Update: Okay, I found out that there was a copy-paste error in the spreadsheet. It is still an interesting situation; see below:

Since I made the mistake, I may as well take this chance to explain better what I am trying to do. The impact of solar variation is estimated at roughly 0.2 C, depending on the source. That impact is complicated: solar energy is absorbed at different depths, it is affected by cloud cover and by heat capacity variation due to internal variability, and it is just basically a bitch to accurately isolate all the impacts of solar. The screw-up in the original post below suggests that there may be more information on longer-term solar impacts in the data. The formula I am trying to find uses a quadratic (sum-of-squares) distance formula to determine the distance traveled from point A to point B per data point, and compares that distance with the distance traveled from point A to point C, D or whatever. Since solar has impacts with differing time delays, by using point A to B for 6 months and point A to C for 11 years, I should be able to isolate the solar signal a little better. I know that an FFT would do the same job, or I could download R and use canned routines, but there should be a simple approach that is pretty effective with just a basic spreadsheet program.
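
A minimal sketch of that comparison in Python. It assumes evenly spaced monthly data and reads "distance traveled" as the straight-line distance between (t, x(t)) and (t + lag, x(t + lag)), divided by the number of steps. The function names, the `dt` weighting of the time axis, and the 6- and 132-month (11-year) lags are illustrative assumptions, not the spreadsheet formulas used here.

```python
import math

def distance_per_point(x, lag, dt=1.0):
    """Mean straight-line distance from (t, x[t]) to (t + lag, x[t + lag]),
    divided by the number of steps (lag), averaged over every start point t.

    Assumes evenly spaced samples; dt sets how much one time step "counts"
    relative to one unit of x, which is an arbitrary (hypothetical) choice.
    """
    n = len(x) - lag
    if lag < 1 or n < 1:
        raise ValueError("lag must be >= 1 and shorter than the series")
    total = sum(math.hypot(lag * dt, x[t + lag] - x[t]) for t in range(n))
    return total / (n * lag)

def compare_lags(x, short_lag=6, long_lag=132, dt=1.0):
    """Ratio of per-point distance at a long lag to that at a short lag.

    A ratio near 1 means the series covers about the same distance per step
    over both windows; departures from 1 hint at structure at one time scale.
    """
    return distance_per_point(x, long_lag, dt) / distance_per_point(x, short_lag, dt)
```

For a straight line the per-point distance is the same at every lag, so the ratio is 1. Anything that wiggles on a 6-month scale but not an 11-year scale (or vice versa) pulls the ratio away from 1.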

Autocorrelation is a tool for finding a repeating pattern in a time series or signal. The Durbin-Watson statistic is a test for autocorrelation that is used in many statistical analyses. The DW compares a value at time t with the value at an earlier time, typically t-1: it sums [e(t)-e(t-1)]^2 over the series and divides by the sum of e(t)^2. Here e(t) is normally a residual, the departure of the data from the regression or from the mean of the signal that the statistic is being performed on. If the "noise" or residual is truly random, the successive differences show no systematic pattern and the statistic comes out close to 2; values well below 2 indicate positive autocorrelation, and values well above 2 indicate negative autocorrelation.
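
The Durbin-Watson statistic written out as a short Python function, as a sketch for checking spreadsheet numbers. For real work, statsmodels ships an equivalent `statsmodels.stats.stattools.durbin_watson`; the version below is a plain reimplementation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum((e[t] - e[t-1])**2) / sum(e[t]**2).

    d is about 2 for uncorrelated residuals, heads toward 0 for positive
    autocorrelation, and toward 4 for negative autocorrelation.
    """
    e = list(residuals)
    if len(e) < 2:
        raise ValueError("need at least two residuals")
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den
```

An alternating series such as [1, -1, 1, -1] gives 12 / 4 = 3, which is strong negative autocorrelation. A constant series gives 0, which is perfect positive autocorrelation.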

Looking at the new HADCRUT4 and the Svalgaard solar data, I got a wild hair and decided to do a simple "cheat" autocorrelation analysis. The "cheat" is:


Just like the DW test, if x(t) and x(t-l) are truly random, where l is the lag, there will be no trend. If the values of x(t) are larger than 1 in magnitude, the test value can fluctuate a good deal, but it is just a "cheat" test I am playing with.
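
The post uses two forms of the cheat: the lagged difference [x(t)-x(t-l)] and the lagged product [x(t)*x(t-l)]. Below is a minimal Python sketch of both, plus an ordinary least-squares slope to check whether the result has a trend. The function names are hypothetical, and the exact spreadsheet formula may differ.

```python
def lagged_difference(x, lag):
    """x[t] - x[t - lag] for every t where the lagged value exists."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

def lagged_product(x, lag):
    """x[t] * x[t - lag] for every t where the lagged value exists."""
    return [x[t] * x[t - lag] for t in range(lag, len(x))]

def ols_slope(y):
    """Least-squares slope of y against its index 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den
```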

This is a test with just random numbers; if the random number generator is truly random, there should be no trend.

This is a test with a large trend added to the same random series. This "cheat" test is not all that sensitive, but it does the job and is quick in OpenOffice.
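
A sketch of the two test runs. It assumes Gaussian noise from Python's `random` module, a lag of 4, and the product form x(t)*x(t-4), which the later update says the chart actually used. The series length, trend size, and seed are arbitrary. A least-squares slope stands in for eyeballing the trend.

```python
import random

def lagged_product(x, lag):
    """x[t] * x[t - lag] for every t where the lagged value exists."""
    return [x[t] * x[t - lag] for t in range(lag, len(x))]

def ols_slope(y):
    """Least-squares slope of y against its index 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def run_tests(n=1000, lag=4, trend=0.01, seed=1):
    """Return the cheat's slope for pure noise and for noise plus a linear trend."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    trended = [v + trend * t for t, v in enumerate(noise)]
    return ols_slope(lagged_product(noise, lag)), ols_slope(lagged_product(trended, lag))

noise_slope, trend_slope = run_tests()
print(f"noise only: {noise_slope:.5f}  with trend: {trend_slope:.5f}")
```

With pure noise, the product of two independent values has no drift, so its slope stays near zero. With a linear trend added, the product grows roughly as t squared, so its slope is clearly positive.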

Update: In the chart above, [x(t)*x(t-4)] was used, not [x(t)-x(t-4)]; my bad. So when I use the "cheat" with lag 3 on the HADCRUT4 data set and compare it to the Svalgaard TSI, I get an interesting chart. Notice that the 1910 to 1940 temperature rise is not evident in the chart, and the cheat statistic appears to be inversely correlated with the longer-term solar time series.

I have no idea if the cheat is exceptional in any way or totally buggy, but that is pretty interesting.

I will have to brush up on my stats (arrgh!), but there is probably a more standard autocorrelation method that can fine-tune that relationship.
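
One standard method is the sample autocorrelation function (ACF), which normalizes the lagged products by the series variance so the values fall between -1 and 1. Below is a minimal Python sketch. statsmodels provides the same calculation as `statsmodels.tsa.stattools.acf`. The `noise_band` helper is a hypothetical name for the usual rough 95% band for white noise, plus or minus 1.96 / sqrt(n).

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelation r(k) for k = 0..max_lag.

    r(k) = sum_{t=k}^{n-1} (x[t]-m)(x[t-k]-m) / sum_{t=0}^{n-1} (x[t]-m)**2
    """
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    c0 = sum(v * v for v in d)
    if c0 == 0:
        raise ValueError("series has zero variance")
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0 for k in range(max_lag + 1)]

def noise_band(n):
    """Approximate 95% band for r(k) when the series is white noise."""
    return 1.96 / math.sqrt(n)
```

Lags whose r(k) falls outside the noise band are candidates for real structure. Unlike the raw cheat, the result does not blow up when the values of x(t) are large.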


This is the cheat with HADSST2. It shows a pattern more like what I was expecting with the lags, along with the ugly pre-1900 SH data.

This is the [x(t)-x(t-4)] version of the CRUT4 data.  More on this in a moment.

This does a reasonable job, but it is not much better than just plotting variance. I already know there is a reduction in variance. What I don't know is whether that is an artifact of the crappy data or something related closely enough to solar to estimate its impact.
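
For comparison, here is a sketch of the "just plotting variance" approach: a rolling sample variance built on Python's `statistics.variance`. The function name is hypothetical. The window length is up to the user, and something like 120 months (a decade) is only an assumption.

```python
from statistics import variance

def rolling_variance(x, window):
    """Sample variance of each full window x[i:i + window]."""
    if window < 2 or window > len(x):
        raise ValueError("window must be between 2 and len(x)")
    return [variance(x[i:i + window]) for i in range(len(x) - window + 1)]
```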

So to cut down on the noise, what I have done here is scale the data before the operation. I just found the minimum value, in the NH data series of course, and added that minimum to both the NH and SH data sets. Solar appears to have a pretty good correlation, but there is not much more information here than from just eyeballing the curves.
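
A sketch of that offset step in Python. It assumes the goal was to make the NH anomalies non-negative. Under that reading, both series are shifted by the magnitude of the NH minimum, which means subtracting the minimum when it is negative. That interpretation and the function name are assumptions.

```python
def offset_by_nh_min(nh, sh):
    """Shift both series by the same constant so the NH minimum becomes zero.

    Assumption: the shift is -min(nh), i.e. the magnitude of a negative NH
    minimum is added. The same constant is applied to SH so the two series
    stay comparable; SH values can still end up below zero.
    """
    shift = -min(nh)
    return [v + shift for v in nh], [v + shift for v in sh]
```

A constant shift cancels out of the difference x(t)-x(t-l). This step therefore only changes the product version of the cheat.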

So I am still stuck with this estimate of lags for solar forcing. There is no indication of a true lag in the tropics because of ENSO noise, but there is a hint at other latitudes. The highest latitudes in both hemispheres are so noisy that they are useless.
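
One common way to estimate a lag is lagged correlation. The solar series is shifted against a temperature series, and the method keeps the lag with the strongest correlation. Below is a minimal Python sketch that assumes both series share the same sampling. The helper names are hypothetical. It compares absolute correlations so that an inverse relationship also counts. This sketch does nothing to remove ENSO noise.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def best_lag(forcing, response, max_lag):
    """Find the lag (in samples) at which response[t] correlates most
    strongly, positively or negatively, with forcing[t - lag].

    Returns (lag, r).
    """
    n = min(len(forcing), len(response))
    if not 0 <= max_lag < n - 2:
        raise ValueError("max_lag must leave at least three overlapping points")
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        r = pearson(forcing[:n - lag], response[lag:n])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best
```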
