Regression

LOTUS_regression.regression.mzm_regression(X, Y, sigma=None, tolerance=0.01, max_iter=50, do_autocorrelation=True, do_heteroscedasticity=False, extra_heteroscedasticity=None, heteroscedasticity_merged_flag=None, seasonal_harmonics=(3, 4, 6, 12), constrain_ilt_gap=False, ilt_predictor_index_dict=None)

Performs the regression for a single bin.

Parameters:
  • X (np.ndarray) – (nsamples, npredictors) Array of predictor basis functions

  • Y (np.array) – (nsamples) Observations

  • sigma (np.array) – (nsamples) Square root of the diagonal elements of the covariance matrix for Y

  • tolerance (float, optional) – Iterations stop when the relative difference in the AR1 coefficient is less than this threshold. Default 1e-2

  • max_iter (int, optional) – Maximum number of iterations to perform. Default 50

  • do_autocorrelation (bool, optional) – If True, apply the AR1 autocorrelation correction to the covariance matrix. Default True.

  • do_heteroscedasticity (bool, optional) – If True, apply the heteroscedasticity correction to the covariance matrix. Default False.

  • extra_heteroscedasticity (np.ndarray, optional) – (nsamples, nextrapredictors) Extra predictor functions to use in the heteroscedasticity correction.

  • heteroscedasticity_merged_flag (np.ndarray, optional) – (nsamples) A flag indicating time periods that should be treated independently in the heteroscedasticity correction. For example, [0, 0, 0, 0, 2, 2, 2, 1, 1, 1] would create 3 independent time periods in which the heteroscedasticity correction is applied.

  • seasonal_harmonics (iterable of float, optional. Default (3, 4, 6, 12)) – The monthly harmonics to use in the heteroscedasticity correction.

  • constrain_ilt_gap (bool, optional. Default False.) – If True, a constraint is added so that the ILT terms in the gap period enforce continuity. This must be set in conjunction with ilt_predictor_index_dict.

  • ilt_predictor_index_dict (dict, optional. Default None.) – If using constrain_ilt_gap, this must be a dictionary {predictor_name: index_in_X} which contains the indices of the predictors ‘gap_cons’, ‘pre_const’, ‘post_const’, ‘gap_linear’, ‘linear_pre’, ‘linear_post’.
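The two structured arguments above, heteroscedasticity_merged_flag and ilt_predictor_index_dict, can be illustrated with a small pure-Python sketch. The helper names build_ilt_index_dict and split_periods, and the extra predictor names ‘enso’ and ‘solar’, are hypothetical and not part of LOTUS_regression. The sketch also assumes that each distinct flag value marks one independent period, which matches the three-period example given above but is not confirmed against the library's implementation.

```python
# Required ILT predictor names, as listed in the parameter description.
ILT_KEYS = ("gap_cons", "pre_const", "post_const",
            "gap_linear", "linear_pre", "linear_post")


def build_ilt_index_dict(column_names):
    """Map each required ILT predictor name to its column index in X.

    Hypothetical helper: raises KeyError if any required name is missing.
    """
    missing = [key for key in ILT_KEYS if key not in column_names]
    if missing:
        raise KeyError(f"missing ILT predictors: {missing}")
    return {key: column_names.index(key) for key in ILT_KEYS}


def split_periods(flag):
    """Group sample indices by flag value, in order of first appearance.

    Assumption: one distinct flag value corresponds to one independent period.
    """
    periods = {}
    for index, value in enumerate(flag):
        periods.setdefault(value, []).append(index)
    return periods


# Hypothetical column order of X: two made-up predictors, then the ILT terms.
predictor_names = ["enso", "solar",
                   "gap_cons", "pre_const", "post_const",
                   "gap_linear", "linear_pre", "linear_post"]

ilt_predictor_index_dict = build_ilt_index_dict(predictor_names)
periods = split_periods([0, 0, 0, 0, 2, 2, 2, 1, 1, 1])
```

Deriving the dictionary from the column order, rather than typing indices by hand, keeps it in step with X if predictors are added or reordered.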

Returns:

results – A dictionary of outputs with keys:

  • gls_results – The raw regression output. This is an instance of RegressionResults, which is documented at http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.RegressionResults.html

  • residual – Residuals of the fit in the original coordinate system.

  • transformed_residuals – Residuals of the fit in the GLS transformed coordinates.

  • autocorrelation – The AR1 correlation constant.

  • numiter – Number of iterative steps performed.

  • covariance – Calculated covariance of Y that is input to the GLS model.

Return type:

dict
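To make the roles of tolerance, max_iter, and the autocorrelation, numiter, residual, and covariance outputs more concrete, here is a minimal NumPy sketch of an iterative AR1 generalized least squares fit. It is not the LOTUS_regression implementation: the function ar1_gls_sketch, the lag-1 estimator for the AR1 coefficient, and the Cholesky whitening step are all assumptions chosen for illustration, and the sketch omits sigma, the heteroscedasticity correction, and the ILT constraint.

```python
import numpy as np


def ar1_gls_sketch(X, Y, tolerance=1e-2, max_iter=50):
    """Illustrative iterative AR1 GLS fit (hypothetical, not the library code)."""
    n = len(Y)
    # |i - j| for every pair of samples, used to build the AR1 correlation.
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    rho = 0.0
    for numiter in range(1, max_iter + 1):
        # AR1 correlation structure: corr(i, j) = rho ** |i - j|.
        covariance = rho ** lags
        # Whiten with the Cholesky factor so ordinary least squares applies.
        chol = np.linalg.cholesky(covariance)
        X_w = np.linalg.solve(chol, X)
        Y_w = np.linalg.solve(chol, Y)
        beta, *_ = np.linalg.lstsq(X_w, Y_w, rcond=None)
        residual = Y - X @ beta
        # Lag-1 autocorrelation of the residuals in the original coordinates.
        new_rho = (residual[1:] @ residual[:-1]) / (residual @ residual)
        # Stop when the relative change in the AR1 coefficient is small.
        converged = abs(new_rho - rho) <= tolerance * abs(new_rho)
        rho = new_rho
        if converged:
            break
    return {
        "beta": beta,
        "residual": residual,
        "transformed_residuals": Y_w - X_w @ beta,
        "autocorrelation": rho,   # latest estimate; covariance used the previous one
        "numiter": numiter,
        "covariance": covariance,
    }


# Synthetic data: intercept plus linear trend with AR1 noise (true rho = 0.6).
rng = np.random.default_rng(0)
n = 500
t = np.arange(n, dtype=float)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.6 * noise[i - 1] + rng.normal()
X = np.column_stack([np.ones(n), t])
Y = 1.0 + 0.01 * t + noise

out = ar1_gls_sketch(X, Y)
```

With well-behaved data the loop typically stops after a few iterations, because the AR1 estimate changes little once the first correction has been applied.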

LOTUS_regression.regression.regress_all_bins(predictors, mzm_data, time_field='time', debug=False, sigma=None, post_fit_trend_start=None, include_monthly_fits=False, return_raw_results=False, constrain_ilt_gap=False, **kwargs)

Performs the regression for a dataset in all bins.

Parameters:
  • predictors (pd.DataFrame) – DataFrame of predictors to use in the regression. Index should be a time field.

  • mzm_data (xr.DataArray) – DataArray containing the monthly zonal mean data in a variety of bins. The data should be three-dimensional, with one dimension representing time. The other two dimensions are typically latitude and a vertical coordinate.

  • time_field (string) – Name of the time field in the mzm_data structure

  • sigma (xr.DataArray, optional. Default None) – If not None then the regression is weighted as if sigma is the standard deviation of mzm_data. Should be in the same format as mzm_data.

  • post_fit_trend_start (datetimelike, optional. Default None) – If set to a datetime-like object (example: ‘2000-01-01’), a linear trend is fit to the residuals after the regression, beginning at the specified start date. If this is set, do not include a linear term in the predictors, or the results will not be valid.

  • constrain_ilt_gap (bool, optional. Default False) – If True, a constraint is added to the regression so that the fit terms in the gap period maintain continuity when doing ILT trends. The predictors must have keys ‘gap_cons’, ‘post_const’, ‘pre_const’, ‘linear_pre’, ‘linear_post’, and ‘gap_linear’ for this option to work. This is the standard case when using the ‘predictors_baseline_ilt_linear_gap’ file.

  • kwargs – Other arguments passed to mzm_regression
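The post_fit_trend_start option can be pictured with a short NumPy sketch: a straight line is fit to the residuals of a regression that had no linear term, using only samples on or after the start date. This is one plausible reading of the description above, not the library's code. The time axis, the synthetic residuals, and the use of np.polyfit are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical monthly time axis: January 1995 through December 2009.
time = np.arange("1995-01", "2010-01", dtype="datetime64[M]")
start = np.datetime64("2000-01")

# Elapsed time in years since the first sample.
years = (time - time[0]).astype("int64") / 12.0

# Only samples on or after the start date take part in the trend fit.
mask = time >= start

# Synthetic residuals: flat before the start date, rising 0.5 per year after it.
residual = np.where(mask, 0.5 * (years - years[mask][0]), 0.0)

# Degree-1 polynomial fit; np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(years[mask], residual[mask], 1)
trend_per_decade = 10.0 * slope
```

This also shows why a linear term should not be among the predictors in this mode: it would absorb the trend during the main fit and leave nothing meaningful in the residuals.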

Return type:

xr.Dataset