Predictors

Predictors can be created separately from this package if desired. For convenience, however, we provide a sample set of predictors as well as some helper methods for creating new ones.

The sample set of predictors can be loaded in with

[2]:
import matplotlib.pyplot as plt
import LOTUS_regression.predictors as predictors

pred = predictors.load_data('pred_baseline_pwlt.csv')

print(pred[:5])
                enso     solar      qboA      qboB       aod  linear_pre  \
time
1979-01-01  0.545353  1.581821 -1.071557 -0.948523 -0.410788   -1.800000
1979-02-01  0.342144  1.636272 -0.835098 -0.887104 -0.411355   -1.791667
1979-03-01  0.027170  1.321658 -1.171239 -0.621723 -0.412482   -1.783333
1979-04-01  0.331984  1.133807 -1.232609 -0.554725 -0.415366   -1.775000
1979-05-01  0.423428  1.006939 -1.229580 -0.043544 -0.419356   -1.766667

            linear_post  constant
time
1979-01-01          0.0       1.0
1979-02-01          0.0       1.0
1979-03-01          0.0       1.0
1979-04-01          0.0       1.0
1979-05-01          0.0       1.0

The sample set contains

* ENSO from http://www.esrl.noaa.gov/psd/enso/mei/table.html
* Two orthogonal components of the QBO (qboA and qboB) from https://acd-ext.gsfc.nasa.gov/Data_services/met/qbo/QBO_Singapore_Uvals_GSFC.txt calculated with PCA
* Solar f10.7 from https://spdf.gsfc.nasa.gov/pub/data/omni/low_res_omni/omni2_all_years.dat
* Piecewise linear terms with an inflection at 1997, expressed per decade
* GloSSAC AOD from https://asdc.larc.nasa.gov/project/GloSSAC/GloSSAC_2.21

These predictors (with the exception of the linear terms) have been scaled to have a mean of 0 and a standard deviation of 1. Each predictor begins in 1979 and ends at 2023-09. A constant is also included by default.
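As a minimal sketch of what this scaling means, the snippet below standardizes a synthetic monthly series with pandas. The series, its name, and the choice of `ddof=0` are assumptions made for illustration; this is not the code used to build the sample files.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for a raw (unscaled) predictor.
time = pd.date_range("1979-01-01", "2023-09-01", freq="MS")
rng = np.random.default_rng(0)
raw = pd.Series(150.0 + 40.0 * rng.standard_normal(len(time)), index=time, name="solar")

# Subtract the mean and divide by the standard deviation.
# ddof=0 is an assumption; the sample estimate (ddof=1) is equally plausible.
scaled = (raw - raw.mean()) / raw.std(ddof=0)

# Both numbers should be very close to 0 and 1 respectively.
print(scaled.mean(), scaled.std(ddof=0))
```

Because the regression coefficients scale inversely with the predictor, a standardized predictor yields a coefficient that can be read as the response per one standard deviation of the proxy.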

[3]:
pred.plot(figsize=(16,8))
plt.show()

Three other baseline predictor files are included: pred_baseline_eesc.csv, pred_baseline_ilt.csv, and pred_baseline_ilt_continuous.csv. The predictors are similar between the four files, with the exception of the linear terms.

pred_baseline_eesc.csv contains two orthogonal components of the EESC calculated using principal component analysis for different mean ages of air instead of the piecewise linear term.
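To illustrate how two orthogonal components can be extracted from a family of related curves, here is a rough sketch of PCA via a singular value decomposition. The bump-shaped curves are a made-up stand-in for EESC at different mean ages of air, and the procedure is a generic PCA, not necessarily the exact steps used to produce the file.

```python
import numpy as np

# Hypothetical stand-in for EESC curves at several mean ages of air:
# each column is the same bump-shaped curve delayed by a different lag.
years = np.linspace(1979.0, 2023.0, 529)
lags = np.array([0.0, 2.0, 4.0, 6.0])
curves = np.stack(
    [np.exp(-((years - 1995.0 - lag) / 15.0) ** 2) for lag in lags], axis=1
)

# PCA via SVD of the mean-centred matrix (rows: time, columns: mean age).
centred = curves - curves.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)

# Principal-component time series: leading columns of U scaled by singular values.
components = u[:, :2] * s[:2]

# Fraction of the total variance carried by each component.
explained = s**2 / np.sum(s**2)

print(components.shape)
print(explained[:2].sum())
```

The two columns of `components` are orthogonal by construction, which is what lets them be used together in a regression without being collinear.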

pred_baseline_ilt.csv also contains two linear terms: the first is the “pre” term ending at 1997-01-01, and the second is the “post” term beginning at 2000-01-01. The difference is that multiple constant terms are included, which allows the two time periods to be essentially independent of each other. This is in contrast with the PWLT predictors, which force a common value at 1997.
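The contrast between the two formulations can be sketched with plain pandas. The PWLT columns below reproduce the first `linear_pre` values printed earlier (-1.800000, -1.791667). The ILT layout, including the column names `pre_const` and `post_const` and the choice of zero point for each linear term, is an assumption made for illustration and may differ from the contents of the file.

```python
import numpy as np
import pandas as pd

time = pd.date_range("1979-01-01", "2023-09-01", freq="MS")

def decades_since(index, start):
    """Elapsed time from `start` in decades, counted in whole months."""
    start = pd.Timestamp(start)
    months = (index.year - start.year) * 12 + (index.month - start.month)
    return np.asarray(months, dtype=float) / 120.0

t_1997 = decades_since(time, "1997-01-01")
t_2000 = decades_since(time, "2000-01-01")

# PWLT: both terms are zero at the 1997 inflection and share one constant,
# so the fitted trend is continuous there.
pwlt = pd.DataFrame({
    "linear_pre": np.where(t_1997 < 0, t_1997, 0.0),
    "linear_post": np.where(t_1997 > 0, t_1997, 0.0),
    "constant": 1.0,
}, index=time)

# ILT (assumed layout): a slope and a constant for each period, and no
# contribution from these terms inside the 1997-2000 gap.
pre = t_1997 <= 0
post = t_2000 >= 0
ilt = pd.DataFrame({
    "linear_pre": np.where(pre, t_1997, 0.0),
    "linear_post": np.where(post, t_2000, 0.0),
    "pre_const": pre.astype(float),    # hypothetical column name
    "post_const": post.astype(float),  # hypothetical column name
}, index=time)

print(pwlt.iloc[:2])
print(ilt.loc[pd.Timestamp("1998-06-01")])
```

With separate constants, the level fitted after 2000 is not tied to the level fitted before 1997, whereas the single shared constant in the PWLT case forces the two line segments to meet.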

pred_baseline_ilt_continuous.csv is a modification of the previous ILT terms where continuity is enforced at the end of the pre period (1997-01-01) and the start of the post period (2000-01-01). This is done by modifying the pre and post constants to include a linear variation within the gap period.

Functions are also provided to download each individual predictor from its respective source

[4]:
import LOTUS_regression.predictors.download as download

enso = download.load_enso()

enso.plot(figsize=(18, 5))
plt.show()

One final tool included is the ability to create predictors modulated by seasonal components. This is useful if you are regressing to data that has not been deseasonalized.

[5]:
from LOTUS_regression.predictors.seasonal import add_seasonal_components

const_with_seasons = add_seasonal_components(pred, {'constant': 4})

const_with_seasons.plot(figsize=(18, 5))
plt.show()
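To give a feel for what seasonal modulation involves, the sketch below multiplies a predictor by annual sine and cosine harmonics. It assumes the number passed for each predictor is a count of harmonics; `modulate_with_harmonics` and the generated column names are hypothetical and are not the package's implementation of `add_seasonal_components`.

```python
import numpy as np
import pandas as pd

def modulate_with_harmonics(frame, column, n_harmonics):
    """Hypothetical helper: multiply one column by annual sine/cosine harmonics."""
    # Fraction of the year elapsed at each time stamp (0 on 1 January).
    phase = np.asarray((frame.index.dayofyear - 1) / 365.25, dtype=float)
    values = frame[column].to_numpy()
    out = frame.copy()
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * phase
        out[f"{column}_sin{k}"] = values * np.sin(angle)
        out[f"{column}_cos{k}"] = values * np.cos(angle)
    return out

time = pd.date_range("1979-01-01", "1980-12-01", freq="MS")
base = pd.DataFrame({"constant": 1.0}, index=time)

# Four harmonics add eight columns next to the original one.
seasonal = modulate_with_harmonics(base, "constant", 4)
print(list(seasonal.columns))
```

Modulating the constant in this way lets the regression fit a mean seasonal cycle; modulating another predictor lets its fitted response vary with the time of year.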

Extra Predictors

Some additional predictors are included for testing purposes; these are currently tropopause pressure and the GISS AOD. The additional predictors can be loaded in with

[6]:
extra_predictors = predictors.load_data('pred_extra.csv')

and used to modify the standard predictors. E.g., we can change the GloSSAC AOD to the GISS AOD with

[7]:
pred['aod'] = extra_predictors['giss_aod']
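One thing to keep in mind with this kind of replacement is that pandas aligns the assigned column on the index. The sketch below uses small made-up frames (`pred_demo` and `replacement` are illustrative names, not package objects) to show that months missing from the replacement series become NaN, so it may be worth checking the time coverage after swapping a predictor.

```python
import numpy as np
import pandas as pd

time = pd.date_range("1979-01-01", periods=6, freq="MS")
pred_demo = pd.DataFrame({"aod": np.arange(6.0), "constant": 1.0}, index=time)

# Replacement series that covers only the first four months.
replacement = pd.Series([10.0, 11.0, 12.0, 13.0], index=time[:4], name="giss_aod")

# Column assignment aligns on the index, so uncovered months become NaN.
pred_demo["aod"] = replacement
missing = pred_demo["aod"].isna()
print(pred_demo.index[missing])

# One option: keep only the rows where every predictor is defined.
pred_complete = pred_demo.dropna()
print(len(pred_complete))
```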