Overview

This report examines the UKgas dataset, which contains quarterly gas consumption in the United Kingdom. The goal is to understand the main time series patterns in the data, build a reasonable seasonal ARIMA model, compare a few plausible alternatives, and generate short-term forecasts.

The analysis follows a standard Box–Jenkins workflow: exploratory plotting, variance stabilization, differencing to achieve stationarity, ACF/PACF-based model identification, candidate model comparison, residual diagnostics, and forecasting.

Dataset

UKgas is included in base R’s datasets package. It is a quarterly series, so any seasonal model should account for a repeating pattern every 4 periods.

Exploratory Analysis

Initial Time Series Plot

plot(UKgas,
     main = "UK Gas Consumption Over Time",
     ylab = "Gas Consumption",
     xlab = "Year")

The raw series shows three features immediately. First, there is a strong long-run upward trend. Second, the within-year pattern is pronounced, which points to strong quarterly seasonality. Third, the seasonal swings become larger over time, suggesting that the variance increases with the level of the series.

Together, these patterns indicate that the original series is not stationary and that a transformation is likely needed before modeling.

Log Transformation

log_gas <- log(UKgas)
plot(log_gas,
     main = "Log-Transformed UK Gas Consumption",
     ylab = "log(UKgas)",
     xlab = "Year")

A logarithmic transformation is a natural next step because it stabilizes the changing amplitude in the seasonal pattern. After transformation, the seasonal variation becomes more even across time, which makes the series more suitable for ARIMA-type modeling. The trend and seasonality are still present, so differencing is still required.

Transformation and Stationarity

Regular and Seasonal Differencing

diff_log_gas <- diff(log_gas)
plot(diff_log_gas,
     main = "First Difference of log(UKgas)",
     ylab = "Differenced log(UKgas)",
     xlab = "Year")

diff_seasonal_log <- diff(log_gas, lag = 4)
plot(diff_seasonal_log,
     main = "Seasonal Difference of log(UKgas) at Lag 4",
     ylab = "Seasonally Differenced log(UKgas)",
     xlab = "Year")

diff_both <- diff(diff(log_gas, lag = 4))
plot(diff_both,
     main = "Combined Regular and Seasonal Differencing",
     ylab = "Differenced log(UKgas)",
     xlab = "Year")

The first difference reduces the long-run trend, but visible seasonality remains. Seasonal differencing at lag 4 removes most of the quarterly structure. Applying both transformations produces a series that fluctuates around a roughly constant mean with much more stable variability.

The transformation used for modeling is:

\[ (1-B)(1-B^4)\log(UKgas) \]

where \(B\) is the backshift operator. This transformed series appears plausibly stationary and is the series used for model identification.

ACF and PACF

par(mfrow = c(1, 2))
acf(diff_both, main = "ACF of Differenced log(UKgas)")
pacf(diff_both, main = "PACF of Differenced log(UKgas)")

par(mfrow = c(1, 1))

The ACF shows a strong negative spike at lag 1 and a noticeable seasonal feature at lag 4. The PACF does not show a clean cutoff but instead tails off across the early lags. That pattern is more consistent with moving-average behavior than with a strong autoregressive cutoff.

Based on these plots, it is reasonable to start with models that include a non-seasonal MA(1) term and either a seasonal MA(1) or seasonal AR(1) term.

Model Identification

Automated Reference Model

auto_model <- auto.arima(
  log_gas,
  seasonal = TRUE,
  stepwise = FALSE,
  approximation = FALSE
)
auto_model
## Series: log_gas 
## ARIMA(3,0,1)(0,1,0)[4] with drift 
## 
## Coefficients:
##           ar1      ar2     ar3     ma1   drift
##       -0.9753  -0.1429  0.1421  0.8809  0.0164
## s.e.   0.1260   0.1380  0.1056  0.0898  0.0023
## 
## sigma^2 = 0.01029:  log likelihood = 92.66
## AIC=-173.32   AICc=-172.45   BIC=-157.45

auto.arima() serves as a useful benchmark, but it is not the final word. It chooses a model by optimizing an information criterion over many possibilities, while the manual candidate models below are guided by the differenced series, the ACF/PACF structure, and model interpretability.

Candidate Model 1: SARIMA(0,1,1)(0,1,1)[4]

model1 <- capture.output(
  fit1 <- sarima(log_gas, p = 0, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 4)
)

print(model1[grep("Coefficients", model1):length(model1)])
## [1] "Coefficients: "                                            
## [2] "     Estimate     SE  t.value p.value"                     
## [3] "ma1   -0.9192 0.0455 -20.1994  0.0000"                     
## [4] "sma1  -0.2353 0.1028  -2.2891  0.0242"                     
## [5] ""                                                          
## [6] "sigma^2 estimated as 0.01097285 on 101 degrees of freedom "
## [7] " "                                                         
## [8] "AIC = -1.592327  AICc = -1.591161  BIC = -1.515587 "       
## [9] " "

The fitted model is:

\[ (1-B)(1-B^4)\log(UKgas_t) = (1 - 0.9192B)(1 - 0.2353B^4)\varepsilon_t \]

This is the most classical specification for this series. It aligns closely with the ACF/PACF evidence and keeps the model fairly compact. Both fitted parameters are statistically significant, and the diagnostic plots suggest that most of the serial dependence has been removed.

Candidate Model 2: SARIMA(1,1,1)(0,1,1)[4]

model2 <- capture.output(
  fit2 <- sarima(log_gas, p = 1, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 4)
)

print(model2[grep("Coefficients", model2):length(model2)])
##  [1] "Coefficients: "                                            
##  [2] "     Estimate     SE  t.value p.value"                     
##  [3] "ar1   -0.2032 0.1052  -1.9314  0.0563"                     
##  [4] "ma1   -0.8868 0.0524 -16.9267  0.0000"                     
##  [5] "sma1  -0.2027 0.1042  -1.9443  0.0547"                     
##  [6] ""                                                          
##  [7] "sigma^2 estimated as 0.01061266 on 100 degrees of freedom "
##  [8] " "                                                         
##  [9] "AIC = -1.607326  AICc = -1.604972  BIC = -1.505006 "       
## [10] " "

The fitted model is:

\[ (1 + 0.2032B)(1-B)(1-B^4)\log(UKgas_t) = (1 - 0.8868B)(1 - 0.2027B^4)\varepsilon_t \]

This model adds a non-seasonal AR(1) term. It achieves the lowest AIC among the three candidates, so it deserves serious consideration. At the same time, the AR(1) and seasonal MA(1) terms are only marginally significant, which weakens the case for the extra complexity.

Candidate Model 3: SARIMA(0,1,1)(1,1,0)[4]

model3 <- capture.output(
  fit3 <- sarima(log_gas, p = 0, d = 1, q = 1, P = 1, D = 1, Q = 0, S = 4)
)

print(model3[grep("Coefficients", model3):length(model3)])
## [1] "Coefficients: "                                            
## [2] "     Estimate     SE  t.value p.value"                     
## [3] "ma1   -0.9260 0.0446 -20.7497  0.0000"                     
## [4] "sar1  -0.2278 0.0991  -2.2980  0.0236"                     
## [5] ""                                                          
## [6] "sigma^2 estimated as 0.01097152 on 101 degrees of freedom "
## [7] " "                                                         
## [8] "AIC = -1.592337  AICc = -1.591172  BIC = -1.515597 "       
## [9] " "

The fitted model is:

\[ (1-B)(1-B^4)\log(UKgas_t) = (1 - 0.9260B)(1 - 0.2278B^4)\varepsilon_t \]

This model replaces the seasonal MA term with a seasonal AR term. Its overall fit is very close to Model 1, and both parameters are significant. It is a reasonable alternative, though it does not clearly outperform the simpler seasonal MA specification.

Model Diagnostics and Comparison

Residual Diagnostics

Across the candidate models, the standardized residuals fluctuate around zero without an obvious trend or structure. The residual ACF plots show that most autocorrelations fall within the confidence bounds, and the Ljung–Box p-values are generally acceptable at moderate lags. The Q–Q plots also suggest approximate normality, with only minor deviations in the tails.

That combination of results indicates that all three models are broadly workable, but the choice should still depend on fit, parsimony, and coefficient stability.

Model Comparison

Model 2 has the lowest AIC, which means it provides the best fit by that metric alone. However, the gain over Model 1 is modest, and two of its parameters are only marginally significant. Model 3 performs almost identically to Model 1 in terms of AIC but is not clearly simpler or more interpretable.

For a portfolio report, the main point is not to claim a single universally correct model, but to show a defensible selection process. Here, the simplest strong choice is Model 1.

Final Selected Model

The final model selected for forecasting is:

\[ \text{SARIMA}(0,1,1)(0,1,1)_{4} \]

This model is preferred because it is parsimonious, statistically well-supported, and diagnostically adequate. It captures both short-term dependence and quarterly seasonal structure without adding borderline terms.

Forecasting

Forecast Generation

forecast_result <- sarima.for(
  log_gas,
  n.ahead = 12,
  p = 0, d = 1, q = 1,
  P = 0, D = 1, Q = 1,
  S = 4,
  ylab = "Log UK Gas Consumption",
  main = "12-Quarter Forecast from SARIMA(0,1,1)(0,1,1)[4]"
)

The selected model is used to forecast the next 12 quarters, or roughly three years ahead. The forecast plot includes the point forecasts together with 95% prediction intervals.

Forecast Interpretation

The forecast preserves the key structure seen in the historical data: an overall upward level together with a repeating quarterly seasonal pattern. The prediction intervals widen as the forecast horizon increases, which is expected because uncertainty accumulates the further ahead the model projects.

Because the model was fitted on the log scale, the forecast should be interpreted as describing multiplicative growth and seasonality in the original series. For short-horizon forecasting, the model appears to provide a reasonable and interpretable summary of the series dynamics.

Key Takeaways

  • UK gas consumption shows strong long-run growth and strong quarterly seasonality.
  • A log transformation is helpful because it stabilizes the variance.
  • Both regular and seasonal differencing are needed to obtain a plausibly stationary working series.
  • A small set of SARIMA models captures the main dependence structure well.
  • The final selected model, SARIMA(0,1,1)(0,1,1)[4], balances fit, simplicity, and interpretability.

Conclusion

This analysis shows how a seasonal time series can be modeled using a practical Box–Jenkins workflow. Starting from the raw UK gas consumption data, the report moved through transformation, differencing, model identification, diagnostic checking, and forecasting.

The final result is a compact SARIMA model that captures both the short-run and seasonal structure of the series and produces credible short-term forecasts. From a portfolio perspective, the project demonstrates a full time series workflow rather than just a final answer: diagnosing non-stationarity, comparing plausible alternatives, and selecting a model for defensible reasons.