The D-model for GDP nowcasting
Swiss Journal of Economics and Statistics volume 159, Article number: 7 (2023)
Abstract
The paper provides a disaggregated mixed-frequency framework for the estimation of GDP. The GDP is disaggregated into components that can be forecasted based on information available at higher sampling frequency, i.e., monthly, weekly, or daily. The model framework is applied for Greek GDP nowcasting. The results provide evidence that more accurate nowcasting estimations require (i) the disaggregation of GDP, (ii) the use of a multilayer mixed-frequency framework, and (iii) the inclusion of financial information on a daily frequency. The simulation study provides evidence in favor of the disaggregation into components despite the inclusion of multiple sources of forecast errors.
1 Introduction
We investigate whether the use of a disaggregated multilayer mixed-frequency framework improves the Greek GDP nowcasting performance. In the disaggregated approach, we nowcast each GDP component separately and aggregate them to obtain a GDP nowcast. This is the first paper to introduce the idea of combining a model for mixed-frequency data with a multilayer strategy.
In the first layer, we estimate a MIDAS regression for each GDP component, in which the dependent variable (e.g., growth in private consumption) is observed on a quarterly basis while the explanatory variables are observed on a monthly frequency. However, some of the explanatory monthly variables are published with a lag of several months, resulting in the unavailability of their most recent values. So, in the second layer of the model, we estimate the unavailable values of the monthly variables based on the information that is available at a higher frequency (i.e., daily frequency). For example, asset prices (e.g., stock prices), which are observed on a daily basis, could provide information that is not incorporated in a monthly economic index (e.g., consumer confidence). Overall, we apply the proposed novel framework on a variety of economic and financial data (hard and soft data, mostly domestic) to nowcast Greek GDP and evaluate its nowcasting performance over the period 2005Q1–2020Q3.
Moreover, we provide empirical and simulated evidence that more accurate nowcasting estimations require the use of a disaggregated multilayer mixed-frequency framework. First, we show that the AR(1) used as a naive model is outperformed only when a sufficiently sophisticated model framework is defined. Second, the disaggregation into components reduces the nowcasting error despite the inclusion of multiple sources of nowcasting errors.
The proposed model framework, named D-model,^{Footnote 1} is very relevant for practitioners and policymakers who need accurate, real-time information on the current state of the economy under investigation.
The rest of the paper is structured as follows: Sect. 2 provides a literature review on GDP nowcasting, Sect. 3 presents the D-model’s construction, and Sect. 4 describes the dataset. Section 5 presents in detail the model specifications for the Greek GDP nowcasting. In Sect. 6, we proceed to a number of additional model extensions for robustness purposes, and in Sect. 7 we estimate nowcasts from naïve models in order to have a reference point in the evaluation of the nowcasting performance, which is illustrated in Sect. 8. Section 9 presents Monte Carlo simulations which provide evidence in favor of the disaggregation into components and, finally, Sect. 10 presents the conclusions.
2 Literature review
Dynamic factor models (DFMs) and bridge models (BMs) are the most popular tools in short-term forecasting of real activity variables, such as GDP growth. Bridge equations for forecasting GDP have been studied by Baffigi et al. (2004) and Diron (2008), among others. Barhoumi et al. (2008) study factor models for ten European countries and the euro area as a whole, concluding in favor of their superior performance compared to averages of traditional bridge equations. Factor models for forecasting GDP have also been applied by Marcellino et al. (2003) for euro-area data, Artis et al. (2005) for the UK, Den Reijer (2005) for the Netherlands, Duarte and Rua (2007) for Portugal, Schumacher (2007) for Germany, and Van Nieuwenhuyze (2005) for Belgium.
Both types of models come with their advantages and flaws. BMs are characterized by two empirical limitations. Firstly, the monthly series must be sufficiently long to guarantee the precision of the estimates. Secondly, it is not possible to include a large number of variables, because of the risk of multicollinearity and losses of degrees of freedom. On the other hand, DFMs are presented as a less restrictive alternative tool compared to BMs, especially for short-term forecasting of GDP growth (see, inter alia, Angelini et al., 2008; Bańbura and Rünstler, 2011). A wider set of collinear monthly indicators is parsimoniously summarized with only a few common factors, making the projection possible and the number of parameters limited.
In Appendix A, we provide an overview of selected papers dealing with short-term GDP forecasting techniques. A set of interesting conclusions can be derived:

i. There is no consensus on which of the two competing frameworks (DFMs vs. BMs) has the best forecasting accuracy: the evaluation of several forecasting error measures does not provide a clear view in favor of DFMs.

ii. According to empirical findings, DFMs clearly outperform a set of benchmark models (random walk and autoregressive models) across several forecasting error measures. Stakénas (2012), for instance, provides evidence of the superiority of factor models over naïve benchmark models.
Several studies worth mentioning, focused mainly on the euro-area countries, favor DFMs. Angelini et al. (2008) estimate a DFM for the euro-area economy and find that, for GDP and a number of components, factor model forecasts beat the forecasts from alternative models such as quarterly models and bridge equations. In a follow-up paper using euro-area data, Angelini et al. (2011) provide evidence that the factor model improves upon the pool of bridge equations. Barhoumi et al. (2008) maintain that, for the euro-area countries, factor models that exploit a large number of releases generally do better than averages of bridge equations. Likewise, Bańbura and Rünstler (2011), again for the euro-area economy, highlight the importance of survey data for both forecast weights and forecast precision measures, whereas real activity data obtain rather low weights, apart perhaps from the backcasts. Financial data provide complementary information to both real activity and survey data for nowcasts and one-quarter-ahead forecasts of GDP.
However, apart from those two popular tools, another framework has recently entered the GDP nowcasting literature: the mixed sampling frequency modeling framework. Ghysels et al. (2006) and Andreou et al. (2010, 2013) propose the Midas (mixed-data sampling) model for cases where one wishes to relate a dependent variable (i.e., the quarterly GDP) to explanatory variables sampled at a higher frequency (i.e., monthly or weekly data).
Chernis and Sekkel (2017) estimate DFM, BM as well as Midas models for nowcasting the Canadian gross domestic product. They compare the average of the Midas predictions against the forecast of the DFM (by underweighting the poorly performing variables) and conclude that the DFM outperforms its competitors. Clements and Galvao (2009), on the other hand, forecast US growth with Midas models and provide important findings regarding the outperformance of the Midas framework in exploiting information from the leading indicators. Marcellino and Schumacher (2010) introduce a Factor-Midas approach to nowcast and forecast quarterly German GDP growth. They find that the most parsimonious Midas projection is the best performing overall. Kuzin et al. (2011) compare the forecasting ability of Midas and MF-VAR in forecasting euro-area quarterly GDP and find that Midas tends to perform better for shorter horizons, and MF-VAR for longer horizons. Jansen et al. (2016) evaluate the predictive ability of almost all the available statistical models (i.e., VAR, Bayesian VAR, mixed-frequency VAR (MF-VAR), DFM, BM, Midas) in predicting GDP for the euro area, Germany, France, Italy, Spain, and the Netherlands. They conclude that the dynamic factor model is the best model overall due to its ability to incorporate more information.
Furthermore, Kim and Swanson (2018) apply Factor-Midas models for nowcasting and forecasting the Korean GDP. In their forecasting exercise, models with one or two factors are the best for all forecasting horizons, whereas in backcasting and nowcasting horizons, models with more factors are preferred. They also notice that as the forecast horizon gets shorter (i.e., moving from forecast → nowcast → backcast), the AR and RW models perform better. Also, Andreou et al. (2013) use Factor-Midas to examine the usefulness of daily financial data to forecast macroeconomic series. Foroni and Marcellino (2014) nowcast the quarterly growth rate of the euro-area GDP and conclude that the Midas model outperforms MF-VAR at most forecasting horizons. Additionally, they investigate the potential usefulness of disaggregating the information contained in the components of GDP for nowcasting total GDP growth. In their concluding section they state “…findings for the aggregated nowcasts are promising, meaning that there is scope for forecasting the single components to shed light on the total GDP measure.”
Finally, a comparison of Midas and bridge equation models for the euro-area GDP growth is provided by Schumacher (2016). Schumacher estimates Midas models with different specifications for the lag polynomials: exponential Almon, multiplicative, unrestricted, etc. Results favor the most parsimonious specifications, with only a few AR and indicator lags. Midas tends to outperform bridge equations, although the results depend on the particular dataset and the sample chosen.
3 Model description
3.1 GDP disaggregation into components
The proposed model aims to estimate GDP in constant prices according to the fixed-base approach, defined as \(Y_{q}^{(0)}\), by nowcasting the components of GDP from the expenditure side. Hence, we define one model for each one of the components: private consumption of goods and services, \(Y_{q}^{(1)}\), government spending on public goods and services, \(Y_{q}^{(2)}\), investment in business capital goods, \(Y_{q}^{(3)}\), exports of goods, \(Y_{q}^{(4)}\), exports of services, \(Y_{q}^{(5)}\), imports of goods, \(Y_{q}^{(6)}\), imports of services, \(Y_{q}^{(7)}\), and changes in inventories, \(Y_{q}^{(8)}\). Naturally:
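With the eight components defined above, and with imports entering with a negative sign as in the standard expenditure approach, the aggregation identity would presumably take the following form. This is a reconstruction from the definitions in this subsection, not a verbatim reproduction of the paper's equation:

$$Y_{q}^{(0)} = Y_{q}^{(1)} + Y_{q}^{(2)} + Y_{q}^{(3)} + Y_{q}^{(4)} + Y_{q}^{(5)} - Y_{q}^{(6)} - Y_{q}^{(7)} + Y_{q}^{(8)}.$$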
3.2 Mixed sampling frequency framework
Let us denote as \(y_{q}^{(k)}=\log\left(Y_{q}^{(k)}/Y_{q-1}^{(k)}\right)\) the quarter-on-quarter (qoq) growth rate. We construct the Midas regression in order to extract the information that is available at higher sampling frequencies. The dependent variable is observed on a quarterly basis, but explanatory variables are available at a higher frequency, i.e., on a monthly basis. However, there is information that is available at an even higher sampling frequency, such as the asset prices from financial markets. So, what we suggest is the construction of a multilayer model framework, in which the different sampling frequencies are defined as layers. The rationale behind the multilayer model framework is to estimate the missing information at a lower frequency based on the information that is available at a higher frequency. For example, the price of the stock index, which is observed on a daily basis, could provide information that is not incorporated in the index of economic climate, which is observed monthly. In turn, the index of economic climate may provide information for the GDP component investment in business capital goods, which is observed quarterly. Hence, the proposed model framework is estimated in the form:
where the error term is defined as \(\varepsilon_{q}\sim N\left(0,\sigma_{\varepsilon_{q}}^{2}\right)\).^{Footnote 2} The \({\varvec{X}}_{(m)}\) denotes the vector of variables observed at a monthly frequency. The \(\beta_{0}\) is a coefficient, \({\varvec{\theta}}_{j}\) is a vector of coefficients to be estimated, \(p\) is the Almon polynomial order, \(\kappa\) is the number of lagged months to employ, and \(s=3\) denotes the number of months of each quarter. The \(i\) term determines the time that the information set is available as well as the capacity of the model to estimate predictions without imposing a look-ahead bias. For example, if we set \(i=0\), we are able to nowcast the GDP component, i.e., \(y_{q\backslash q}^{(k)}\). If we set \(i\ge 1\) and \(is\ge 3\), then we are able to estimate the one-quarter-ahead \(y_{q+1\backslash q}^{(k)}\), etc.
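To make the role of \(p\), \(\kappa\) and \(s\) concrete, the following is a minimal sketch of a first-layer regression with Almon polynomial lag weights for a single monthly indicator. It assumes the textbook Almon form, in which the weight on lag \(j\) is \(\sum_{r=0}^{p}\theta_{r}j^{r}\), so that the model is linear in \(\theta\) and can be fitted by OLS; the paper's exact parameterization may differ. The function name, the `shift` argument (standing in for \(i\cdot s\) months of withheld information) and all numerical values are hypothetical.

```python
import numpy as np

def almon_regressors(x, kappa, p, s=3, shift=0):
    """Almon-transformed regressors for one monthly indicator x.

    For each quarter, column r holds sum_j (j**r) * x[last - j], j = 0..kappa,
    where `last` is the final month of the quarter minus `shift` months.
    """
    powers = np.vander(np.arange(kappa + 1), p + 1, increasing=True).astype(float)
    quarters, rows = [], []
    for q in range(len(x) // s):
        last = (q + 1) * s - 1 - shift
        if last - kappa < 0:              # not enough monthly history yet
            continue
        window = x[last - kappa:last + 1][::-1]   # window[j] = x[last - j]
        rows.append(window @ powers)
        quarters.append(q)
    return np.array(quarters), np.array(rows)

# Simulated data (assumption: purely illustrative, not the Greek dataset).
rng = np.random.default_rng(0)
x_monthly = rng.standard_normal(180)              # 60 quarters of monthly data
beta0, theta = 0.5, np.array([0.6, -0.2, 0.02])   # hypothetical true values
quarters, Z = almon_regressors(x_monthly, kappa=5, p=2)
y = beta0 + Z @ theta + 0.01 * rng.standard_normal(len(quarters))

# OLS on the transformed regressors recovers beta0 and theta.
A = np.column_stack([np.ones(len(quarters)), Z])
estimate, *_ = np.linalg.lstsq(A, y, rcond=None)

# Implied weight on each monthly lag j = 0..kappa.
lag_weights = np.vander(np.arange(6), 3, increasing=True) @ estimate[1:]
```

The polynomial restriction reduces the \(\kappa+1\) lag coefficients to \(p+1\) parameters, which matters when the quarterly sample is short.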
3.3 Multilayer framework
Using the same rationale, we create the next layer that represents the estimation of the non-available values of the variables at a monthly frequency. Let us assume that in the 1st layer, where the dependent variable is at a quarterly frequency, the explanatory monthly variable is an index of retail sales which is published with a lag of 2 months. The most recent value of the monthly variable is not available, so it must be estimated based on information that is available. Hence, for the values of the monthly variables that are not available, we have to define the 2nd layer of our model:
where \(\varepsilon_{m}\sim N\left(0,\sigma_{\varepsilon_{m}}^{2}\right)\). The \(\gamma_{0}\) is a coefficient, \({\varvec{\gamma}}\) and \({\boldsymbol{\varphi}}_{j}\) are vectors of coefficients to be estimated, \(q\) is the polynomial order, \(l\) denotes the number of lagged days and \(s=22\) denotes the number of trading days of each month. The \({\widetilde{{\varvec{X}}}}_{m}\) denotes the vector of variables that are observed at monthly frequencies and provide explanatory power for the \(x_{m}\). The \({\varvec{Z}}_{(d)}\) denotes the vector of variables observed on a daily frequency. Regarding the \(i\) term, for \(i=0\), we estimate \(x_{m\backslash m}\), while for \(i\ge 1\) and \(is\ge 22\), we estimate the one-month-ahead \(x_{m+1\backslash m}\), and for \(i\ge 2\) and \(is\ge 44\), we estimate the two-months-ahead \(x_{m+2\backslash m}\), and so on.
3.4 Nowcasting error correction
We assume that the GDP nowcasts contain a forecast error with an autocorrelated structure. Possible sources of the autocorrelated structure of the forecast error could be (i) the multiple revisions of the figures, (ii) the construction of GDP as a summation of its components, which are also revised frequently, and (iii) the inclusion of multiple sources of forecast errors, as the computation of the nowcast requires the summation of multiple nowcast values. Thus, we propose a short-term forecast error structure, around the long-term structure, \(Y_{q-1}^{(0)}=\mu+\beta_{1}Y_{q-1\backslash q}^{(0)}\):
for \(u_{q}\sim N\left(0,\sigma_{u}^{2}\right)\), where \(Y_{q}^{(0)}\) denotes the published GDP for quarter \(q\) and \(Y_{q-j\backslash q}^{(0)}\) is the nowcasted value of GDP of quarter \(q-j\) based on the information that is available up to the most recent quarter, i.e., \(q\).
4 Data description
The handling of exogenous variables in nowcasting models should be done very carefully. The usual practice of creating a sandbox, which encloses all the variables that we have managed to collect, is not the most appropriate. Schumacher (2010) and Boivin and Ng (2006) note that only a careful preselection of predictors helps in exploiting the additional information from large and heterogeneous data. Thus, more data are not always better for nowcasting or for forecasting. Moreover, Boivin and Ng (2006) show that the sample size of the dataset has only a minor effect on the estimation. In our case, the dataset has been constructed based on economic intuition, the current state of the literature, and the availability of data in a continuous format for an adequate time frame. As first noted by Stock and Watson (2002a), the appropriate transformations of the data must be applied, so natural logarithms were taken for the majority of the variables (except, e.g., for interest rates) and stationarity was obtained by appropriately differencing the time series. When scale effects were evident, the variables were standardized to have zero mean and unit sample variance. The vast majority of our variables that were available in levels were standardized and deseasonalized, i.e., for \(x_{i,t}\) denoting the \(i^{th}\) variable for month \(t\), the deseasonalized and standardized variables are \(\tilde{x}_{i,t}=\left(x_{i,t}^{*}-\overline{x}_{i}^{*}\right)/\sqrt{V\left(x_{i}^{*}\right)}\), where \(\overline{x}_{i}^{*}\) and \(V\left(x_{i}^{*}\right)\) are the mean and variance estimates of the deseasonalized \(i^{th}\) variable, \(x_{i,t}^{*}\), respectively.
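The standardization step can be illustrated with a short sketch. It takes an already deseasonalized series as input (the seasonal adjustment itself is not reproduced here) and assumes the unbiased sample variance (`ddof=1`), which the text does not specify; the function name and the data are hypothetical.

```python
import numpy as np

def standardize(x_star):
    """Return (x* - mean(x*)) / sqrt(V(x*)) for a deseasonalized series x*."""
    x_star = np.asarray(x_star, dtype=float)
    # Assumption: V(.) is the unbiased sample variance (ddof=1).
    return (x_star - x_star.mean()) / np.sqrt(x_star.var(ddof=1))

x_star = np.array([98.0, 101.5, 100.2, 103.1, 99.7])  # illustrative values
z = standardize(x_star)
```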
The sample runs from January 2002 up to December 2020. Regarding the quarterly frequency, the data are available up to the 3rd quarter of 2020. The sample size is dictated by the availability of data. The out-of-sample evaluation period runs from 2005Q1 up to 2020Q3 and, despite its short length, includes both normal times and a crisis period, i.e., the Greek sovereign crisis. We use the recursive estimation scheme due to the small sample size, as the alternative approach, the rolling scheme, requires the use of a fixed window in order to re-estimate the parameters.
Also, we highlight the ragged-edge data problem. Let us consider that the consumer price index of the previous month is released early in the current month, whereas the producer price index is released in the middle of the month. In between these releases, new vintages of GDP may be released. This is called the ragged-edge data problem. Kim and Swanson (2018), among others, have suggested the vertical alignment and the autoregressive interpolation for the missing values. In our proposed model framework, any variable is considered as observed after being published. But in the case that a variable is not available at the time when we want to proceed to nowcasting, then the method of estimating any missing values is defined explicitly.^{Footnote 3}
Regarding the Greek economy, the quarterly datasets are the GDP and its components: private consumption on goods and services, government spending on public goods and services, investment on business capital goods, exports of goods, exports of services, imports of goods, and imports of services. Table 1 presents the data that we have used as explanatory variables. In the D-model we evaluated almost the entire available dataset, but the variables that were finally incorporated, at a monthly frequency, are the HICP, loans to private sector, loans to firms, the financial conditions index, the economic sentiment indicator, the purchasing managers' index, the interest rate on new loans, capital goods other than transport equipment (CAPG1), capital goods parts and accessories (CAPG2), the retail trade volume index, the retail trade turnover index, the services confidence indicator, the consumer confidence indicator, the retail confidence indicator, the employment expectation index, the total volume of retail sales, the volume of retail sales excluding fuel, new private passenger car registrations, price expectations over the next 3 months, value-added tax, deposits of households (in flows), and credit to households (in flows). Also, from the balance of payments, we collect the importation of goods, importation of fuels, importation of vessels, importation of other services, travel receipts, transportation receipts, and other receipts. On a daily frequency, the incorporated variables are the Athens stock exchange main general index and the 10-year Greek government bond yield.
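The difference between the two estimation schemes can be sketched in terms of the observation indices used at each re-estimation. The function names and the sizes in the usage lines are hypothetical and serve only to illustrate the mechanics.

```python
def recursive_windows(n_obs, first_train):
    """Recursive (expanding) scheme: always start at observation 0."""
    for t in range(first_train, n_obs):
        yield range(0, t), t          # (estimation sample, target index)

def rolling_windows(n_obs, width):
    """Rolling scheme: a fixed-width window that drops old observations."""
    for t in range(width, n_obs):
        yield range(t - width, t), t

recursive = list(recursive_windows(n_obs=5, first_train=3))
rolling = list(rolling_windows(n_obs=5, width=3))
```

Under the recursive scheme the estimation sample grows with every new target period, so no observation is discarded, which is the property that matters with a short sample.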
5 Model specifications for the Greek GDP
In Sect. 3, we described the proposed disaggregated mixed-frequency framework for the estimation of GDP. In this section, we present in detail the model framework for the components of GDP.
5.1 Private consumption on goods and services
Private consumption of goods and services, \(Y_{q}^{(1)}\), is highly related to the retail trade volume index, \(x_{m}^{(1)}\), and the retail turnover volume index, \(x_{m}^{(2)}\). However, these indices are published by the Hellenic Statistical Authority with a publication lag of three months. Specifically, the indices for any month m (e.g., June 2020) are published on the last day of month m + 2 (e.g., 31st of August, 2020); therefore, we are forced to consider a publication lag of three months. Additionally, a wide information set has been constructed that includes variables such as the services confidence indicator, \(x_{m}^{(3)}\), the consumer confidence indicator, \(x_{m}^{(4)}\), the retail confidence indicator, \(x_{m}^{(5)}\), the economic sentiment indicator, \(x_{m}^{(6)}\), the employment expectation index, \(x_{m}^{(7)}\), the total volume of retail sales, \(x_{m}^{(8)}\), the volume of retail sales excluding fuel, \(x_{m}^{(9)}\), new private passenger car registrations, \(x_{m}^{(10)}\), price expectations over the next 3 months, \(x_{m}^{(11)}\), HICP, \(x_{m}^{(12)}\), value-added taxation, \(x_{m}^{(13)}\), deposits of households (flows), \(x_{m}^{(14)}\), credit to households (flows), \(x_{m}^{(15)}\), etc.^{Footnote 4} The publication of the information over time is visualized in Table 2.
A preliminary analysis, presented in Fig. 1 as a scatterplot of the aforementioned variables, strengthens our belief that \(x_{m}^{(1)}\) and \(x_{m}^{(2)}\) can provide accurate information for the nowcast values of \(Y_{q}^{(1)}\). Finally, due to the publication lag of three months, we define a 2nd layer according to Eq. 3, where the non-published values of the \(x_{m}^{(1)}\) and \(x_{m}^{(2)}\) variables are estimated based on the vector \({\widetilde{{\varvec{X}}}}_{m}^{\prime}=\left[x_{m}^{(3)} \ldots x_{m}^{(7)}\right]^{\prime}\), published with a lag of one month.
Hence, the nowcasting of private consumption on goods and services, \(y_{q}^{(1)}\), is based on the explanatory power of the retail trade volume index and the retail turnover volume index that are published monthly, i.e., \({\varvec{X}}_{(m)}=\left[x_{m}^{(1)} \; x_{m}^{(2)}\right]^{\prime}\), and at the same time, the nowcasting of the non-published values of \({\varvec{X}}_{(m)}\) is based on the information available from the confidence and sentiment indicators, i.e., \({\widetilde{{\varvec{X}}}}_{m}^{\prime}=\left[x_{m}^{(3)} \ldots x_{m}^{(7)}\right]\).
Table 3 helps us visualize the complexity of nowcasting. Let us assume that we are interested in estimating private consumption for the current quarter. We also assume that the present time (the time when we proceed to the estimations) is the first month of the next quarter. The \(x_{m}^{(1)}\) variable is published up to the 1st month of the current quarter, whereas the \(x_{m}^{(3)}\) variable is published up to the 3rd month of the current quarter. We need to relate private consumption with \(x_{m}^{(1)}\). The \(x_{m}^{(1)}\) is observed for the 1st month of the current quarter, so we do not need any estimation. Regarding the 2nd month, we can estimate the model \(x_{m}^{(1)}=f\left(x_{m}^{(3)}\right)\) based on the data up to the 1st month of the current quarter, and then predict \(x_{m}^{(1)}\) for the 2nd month, since we know the value of \(x_{m}^{(3)}\) for the 2nd month. Regarding the 3rd month, we estimate the model \(x_{m}^{(1)}=f\left(x_{m}^{(3)}\right)\) based on the data up to the 1st month of the current quarter and then predict \(x_{m}^{(1)}\) for the 3rd month as a 2-step-ahead forecast (we know the values of \(x_{m}^{(3)}\) for the 2nd and the 3rd months).
Let us now assume that the present time is the 2nd month of the current quarter. The \(x_{m}^{(1)}\) variable is published up to the 2nd month of the previous quarter, whereas the \(x_{m}^{(3)}\) variable is published up to the 1st month of the current quarter. Regarding the 1st month, we have to estimate the model \(x_{m}^{(1)}=f\left(x_{m}^{(3)}\right)\) based on the data up to the 2nd month of the previous quarter and then predict \(x_{m}^{(1)}\) (as a 2-step-ahead forecast) for the 1st month, since we know the value of \(x_{m}^{(3)}\) for the 1st month. Regarding the 2nd month, the estimation of a \(x_{m}^{(1)}=f\left(x_{m}^{(3)}\right)\) model is not helpful, as the values of \(x_{m}^{(3)}\) for the 2nd month are not published. Hence, we have to rely on another type of model, such as \(x_{m}^{(1)}=f\left(x_{m-1}^{(3)}\right)\). Likewise, for the 3rd month, a \(x_{m}^{(1)}=f\left(x_{m-2}^{(3)}\right)\) model may be employed. As Schumacher and Breitung (2008) accurately highlight in footnote 6 of page 392: “Note that due to the publication lags of GDP, however, the effective forecast horizon needed for computing the forecasts has to be longer. For example, the data of vintage October 2004 (2004M10) contains GDP data up to 2004Q2 and monthly information up to 2004M9. For a forecast of the value in 2005Q1, we effectively need a three-quarter-ahead forecast from the end of the GDP sample.”
Summing up, the framework for the private consumption is:
where \({\beta }_{0}\) is a scalar, \({{\varvec{\theta}}}_{j}\) is a vector of coefficients, \({{\varvec{X}}}_{\left(m\right)}^{^{\prime}}=\left[{x}_{m}^{\left(1\right)}, {x}_{m}^{\left(2\right)}\right]\), \({\varepsilon }_{q}\sim N\left(0,{\sigma }_{{\varepsilon }_{q}}^{2}\right)\), \({{\varvec{\gamma}}}_{{\varvec{i}}}=\left[{\gamma }_{i,3} \dots {\gamma }_{i,7}\right]\), \({\widetilde{{\varvec{X}}}}_{m}^{^{\prime}}={\left[{x}_{m}^{\left(3\right)} \dots {x}_{m}^{\left(7\right)}\right]}^{^{\prime}}\), \({\varepsilon }_{i,m}\sim N\left(0,{\sigma }_{{\varepsilon }_{i,m}}^{2}\right)\), \(Cov\left({\varepsilon }_{i,m},{\varepsilon }_{{i}^{^{\prime}},m}\right)=0\). The evaluation of the nowcasting accuracy showed that only the nested model with the retail trade volume index provides a better performance. The simultaneous inclusion of \({x}_{m}^{\left(1\right)}\) and \({x}_{m}^{\left(2\right)}\) creates multicollinearity issues; therefore, the outcome has a worse forecasting performance.^{Footnote 5}
The variables are seasonally adjusted with the X-12 method and any variable with non-positive values is transformed as \(x_{m}^{*}=x_{m}-\min\left(x_{m}\right)+1\). When we are in the 3rd month of the nowcasted quarter, the required values for the estimation of the proposed model framework are available at the time when we proceed with the estimations, i.e., they belong to the information set.
When we are in the 2nd month of the current quarter and we want to nowcast the \({\varvec{X}}_{(m)}^{\prime}\) for the 1st month of the quarter, the \({\widetilde{{\varvec{X}}}}_{m}^{\prime}\) is available up to the 1st month of the current quarter. Hence, we estimate Eq. (5) with data up to the 2nd month of the previous quarter (because the values of \({\varvec{X}}_{(m)}^{\prime}\) are available up to the 2nd month of the previous quarter) and predict the values of \({\varvec{X}}_{(m)}^{\prime}\) for the 1st month of the current quarter as a 2-step-ahead forecast, i.e.,
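A minimal sketch of the shift applied to series with non-positive values follows. The text does not state the purpose of the shift; the sketch assumes it is there so that logarithms can be taken afterwards. The function name and the data are hypothetical.

```python
import numpy as np

def shift_positive(x):
    """Shift a series so that its minimum equals 1: x* = x - min(x) + 1."""
    x = np.asarray(x, dtype=float)
    return x - x.min() + 1.0

balance = np.array([-2.0, 0.0, 3.0])   # illustrative series with non-positive values
shifted = shift_positive(balance)       # minimum is now exactly 1
log_shifted = np.log(shifted)           # assumption: logs are taken after the shift
```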
We are still in the 2nd month of the current quarter and we want to nowcast the \({{\varvec{X}}}_{\left(m\right)}^{^{\prime}}\) for the 2nd month of the quarter. Keeping in mind the publication lags, Eq. (5) is not usable for nowcasting 2nd month’s values. So, we propose the estimation of a structure that provides nowcast values based on the available information set:
In this case, the nowcasting of \({\varvec{X}}_{(m)}^{\prime}\) is a 3-step-ahead forecast. We estimate Eq. (8) with the data up to the 2nd month of the previous quarter (i.e., the values of \({\varvec{X}}_{(m)}^{\prime}\) are available up to the 2nd month of the previous quarter) and then predict the values of \({\varvec{X}}_{(m)}^{\prime}\) for the 2nd month of the current quarter as a 3-step-ahead forecast:
Finally, we are in the 2nd month of the current quarter and we want to nowcast the \({\varvec{X}}_{(m)}^{\prime}\) for the 3rd month of the quarter. Hence, the real-time nowcast for the third month of the quarter is computed as a 4-step-ahead forecast:
which is based on the model, \(\left[\begin{array}{cc}x_{m}^{(1)} & x_{m}^{(2)}\end{array}\right]^{\prime}=\left[\begin{array}{c}\gamma_{1,0}\\ \gamma_{2,0}\end{array}\right]+\left[\begin{array}{c}{\varvec{\gamma}}_{1}\\ {\varvec{\gamma}}_{2}\end{array}\right]{\widetilde{{\varvec{X}}}}_{m-2}^{\prime}+\left[\begin{array}{cc}\varepsilon_{1,m} & \varepsilon_{2,m}\end{array}\right]\).
Maintaining the same rationale, when we are in the 1st month of the current quarter, and we want to nowcast the \({{\varvec{X}}}_{\left(m\right)}^{^{\prime}}\) for:

a. the 1st month of the quarter, based on the model \(\left[\begin{array}{cc}x_{m}^{(1)} & x_{m}^{(2)}\end{array}\right]^{\prime}=\left[\begin{array}{c}\gamma_{1,0}\\ \gamma_{2,0}\end{array}\right]+\left[\begin{array}{c}{\varvec{\gamma}}_{1}\\ {\varvec{\gamma}}_{2}\end{array}\right]{\widetilde{{\varvec{X}}}}_{m-1}^{\prime}+\left[\begin{array}{cc}\varepsilon_{1,m} & \varepsilon_{2,m}\end{array}\right]\), then the real-time nowcast is computed as a 3-step-ahead forecast:

$$\left[\begin{array}{c} x_{m\backslash m-3}^{(1)} \\ x_{m\backslash m-3}^{(2)} \end{array}\right] = \left[\begin{array}{c} \gamma_{1,0}^{(m-3)} \\ \gamma_{2,0}^{(m-3)} \end{array}\right] + \left[\begin{array}{c} {\varvec{\gamma}}_{1}^{(m-3)} \\ {\varvec{\gamma}}_{2}^{(m-3)} \end{array}\right]{\widetilde{{\varvec{X}}}}_{m-1}^{\prime}, \tag{12}$$

b. the 2nd month of the quarter, based on the model \(\left[\begin{array}{cc}x_{m}^{(1)} & x_{m}^{(2)}\end{array}\right]^{\prime}=\left[\begin{array}{c}\gamma_{1,0}\\ \gamma_{2,0}\end{array}\right]+\left[\begin{array}{c}{\varvec{\gamma}}_{1}\\ {\varvec{\gamma}}_{2}\end{array}\right]{\widetilde{{\varvec{X}}}}_{m-2}^{\prime}+\left[\begin{array}{cc}\varepsilon_{1,m} & \varepsilon_{2,m}\end{array}\right]\), then the real-time nowcast is computed as a 4-step-ahead forecast:

$$\left[\begin{array}{c} x_{m\backslash m-4}^{(1)} \\ x_{m\backslash m-4}^{(2)} \end{array}\right] = \left[\begin{array}{c} \gamma_{1,0}^{(m-4)} \\ \gamma_{2,0}^{(m-4)} \end{array}\right] + \left[\begin{array}{c} {\varvec{\gamma}}_{1}^{(m-4)} \\ {\varvec{\gamma}}_{2}^{(m-4)} \end{array}\right]{\widetilde{{\varvec{X}}}}_{m-2}^{\prime}, \tag{13}$$

c. the 3rd month of the quarter, based on the model \(\left[\begin{array}{cc}x_{m}^{(1)} & x_{m}^{(2)}\end{array}\right]^{\prime}=\left[\begin{array}{c}\gamma_{1,0}\\ \gamma_{2,0}\end{array}\right]+\left[\begin{array}{c}{\varvec{\gamma}}_{1}\\ {\varvec{\gamma}}_{2}\end{array}\right]{\widetilde{{\varvec{X}}}}_{m-3}^{\prime}+\left[\begin{array}{cc}\varepsilon_{1,m} & \varepsilon_{2,m}\end{array}\right]\), then the real-time nowcast is computed as a 5-step-ahead forecast:

$$\left[\begin{array}{c} x_{m\backslash m-5}^{(1)} \\ x_{m\backslash m-5}^{(2)} \end{array}\right] = \left[\begin{array}{c} \gamma_{1,0}^{(m-5)} \\ \gamma_{2,0}^{(m-5)} \end{array}\right] + \left[\begin{array}{c} {\varvec{\gamma}}_{1}^{(m-5)} \\ {\varvec{\gamma}}_{2}^{(m-5)} \end{array}\right]{\widetilde{{\varvec{X}}}}_{m-3}^{\prime}. \tag{14}$$
5.2 Government spending on public goods and services
An annual estimate of government spending is published in the state budget. Usually, the state budget report is submitted between the last days of October and the first days of November. The figures are presented on an annual basis. Thus, we can infer the estimate of government spending for the last quarter of the current year. If \({\widehat{y}}_{a}^{\left(2\right)}\) is the estimated annual growth of government spending for year \(a\), we can nowcast the public consumption for the last quarter of the year as \({\widehat{Y}}_{q}^{\left(2\right)}={Y}_{a-1}^{\left(2\right)}\left(1+{\widehat{y}}_{a}^{\left(2\right)}\right)-{\sum }_{i=1}^{3}{Y}_{q-i}^{\left(2\right)}\).
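The last-quarter computation above is simple arithmetic, and a minimal sketch may make it concrete. The function name and the figures below are hypothetical and chosen only for illustration; they are not taken from the Greek state budget.

```python
def last_quarter_nowcast(prev_year_total, annual_growth, first_three_quarters):
    """Implied Q4 level: budgeted annual total minus the three published quarters.

    prev_year_total      -- Y_{a-1}, last year's annual government spending
    annual_growth        -- y_hat_a, the budgeted annual growth rate (0.025 = 2.5%)
    first_three_quarters -- the three quarters of the current year already published
    """
    if len(first_three_quarters) != 3:
        raise ValueError("expected exactly three published quarters")
    annual_estimate = prev_year_total * (1.0 + annual_growth)
    return annual_estimate - sum(first_three_quarters)


# Hypothetical figures in billions: last year's total 40.0, budgeted growth 2.5%.
q4_nowcast = last_quarter_nowcast(40.0, 0.025, [10.1, 10.2, 10.3])
print(round(q4_nowcast, 2))
```

Because the fourth quarter is obtained as a residual, any error in the annual budget estimate falls entirely on that single quarter, which is one way to read the relatively large error reported for the official nowcast.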
The \({\widehat{y}}_{a}^{\left(2\right)}\) is the official nowcast that incorporates all the available information for government spending and is published between the 1st and 2nd month of the last quarter. Hence, the \({\widehat{Y}}_{q}^{\left(2\right)}\) has the characteristics necessary to be considered a landmark estimator. But, if we measure the nowcasting accuracy of \({\widehat{y}}_{a}^{\left(2\right)}\) based on the mean absolute percentage error of the last quarter of each year, we reach a value of \({\left(Q/4\right)}^{-1}{\sum }_{q=1\left(4\right)}^{Q}\left|{\widehat{Y}}_{q}^{\left(2\right)}-{Y}_{q}^{\left(2\right)}\right|/{Y}_{q}^{\left(2\right)}=10.31\%\).
In order to evaluate the official nowcast of government spending, we estimate the forecast values of \({Y}_{q}^{\left(2\right)}\) from the random walk model. The mean absolute percentage error for the whole period equals 10.76%, which is very close to that of the official nowcast. On the other hand, a naïve first-order autoregressive model of the form:
$$Y_{q}^{\left(2\right)}=\beta_{0}\left(1-\beta_{1}\right)+\beta_{1}Y_{q-1}^{\left(2\right)}+\varepsilon_{q},$$
for \({\varepsilon }_{q}\sim N\left(0,{\sigma }_{\varepsilon }^{2}\right)\), leads to \({Q}^{-1}{\sum }_{q=1}^{Q}\left|{Y}_{q+1\backslash q}^{\left(2\right)}-{Y}_{q+1}^{\left(2\right)}\right|/{Y}_{q+1}^{\left(2\right)}=2.33\%\), where \({Y}_{q+1\backslash q}^{\left(2\right)}={\beta }_{0}^{\left(q\right)}\left(1-{\beta }_{1}^{\left(q\right)}\right)+{\beta }_{1}^{\left(q\right)}{Y}_{q}^{\left(2\right)}\) is the one-quarter-ahead forecast based on the information set of the previous quarter. Hence, we select the AR(1) model, as it leads to much lower forecast errors.^{Footnote 6}
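The one-quarter-ahead AR(1) forecast can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the estimator is plain OLS on the lagged series, the function names are hypothetical, and the series is a made-up, noise-free path used only so that the fitted values are easy to check.

```python
def fit_ar1(series):
    """OLS fit of Y_q = beta0 * (1 - beta1) + beta1 * Y_{q-1} + e_q.

    Returns (beta0, beta1), where beta0 is the implied long-run mean.
    """
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta1 = sxy / sxx
    intercept = mean_y - beta1 * mean_x
    return intercept / (1.0 - beta1), beta1


def one_quarter_ahead(series):
    """Forecast Y_{q+1|q} using coefficients estimated on data up to quarter q."""
    beta0, beta1 = fit_ar1(series)
    return beta0 * (1.0 - beta1) + beta1 * series[-1]


# Hypothetical noise-free series generated by Y_q = 2 + 0.5 * Y_{q-1}.
spending = [10.0]
for _ in range(7):
    spending.append(2.0 + 0.5 * spending[-1])

beta0_hat, beta1_hat = fit_ar1(spending)
forecast = one_quarter_ahead(spending)
```

Re-running `one_quarter_ahead` on an expanding sample, one quarter at a time, would mimic the recursive estimation indicated by the \((q)\) superscripts on the coefficients.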
5.3 Investment on business capital goods
For the nowcasting of investments, the preliminary analysis provides strong evidence for the usability of the daily financial data. More specifically, we observe that the Athens stock exchange main general index, \({Z}_{d}^{\left(1\right)}\), and the Greek 10-year government bond yield, \({Z}_{d}^{\left(2\right)}\), are adequate explanatory variables for \({Y}_{q}^{\left(3\right)}\). Hence, the proposed model framework is estimated in the form:
where \({\beta }_{0}\) is a scalar coefficient, \({\boldsymbol{\varphi }}_{j}\) is a vector of coefficients, \({{\varvec{Z}}}_{\left(d\right)}^{^{\prime}}=\left[{z}_{d}^{\left(1\right)}, {z}_{d}^{\left(2\right)}\right]\) denotes the vector of variables observed at a daily frequency, i.e., \({z}_{d}^{\left(1\right)}=\log\left({Z}_{d}^{\left(1\right)}/{Z}_{d-1}^{\left(1\right)}\right)\) and \({z}_{d}^{\left(2\right)}=\log\left({Z}_{d}^{\left(2\right)}/{Z}_{d-1}^{\left(2\right)}\right)\), \(s=66\) denotes the number of trading days in each quarter, and \({\varepsilon }_{q}\sim N\left(0,{\sigma }_{{\varepsilon }_{q}}^{2}\right)\). Regarding the \(i\) term, for \(i=0\), we estimate \({y}_{q\backslash q}^{\left(3\right)}\); for \(i\ge 1\) and \(is\ge 66\), we estimate the one-month ahead \({y}_{q+1\backslash q}^{\left(3\right)}\); and when \(i\ge 2\) and \(is\ge 132\), we estimate the two-month ahead \({y}_{q+2\backslash q}^{\left(3\right)}\), and so on.
5.4 Exports of goods
The quarterly export of goods and services is related to the export of fuels, vessels, other services, travel receipts, transportation receipts, and other receipts, which are available on a monthly basis from the balance of payments:
Thus, for \({y}_{q}^{\left(4\right)}\) the \({{\varvec{X}}}_{\left(m\right)}\) includes information available from the balance of payments on a monthly frequency. More specifically, we define \({{\varvec{X}}}_{\left(m\right)}\equiv {x}_{\left(m\right)}=\sum_{k=1}^{3}{x}_{m}^{\left(k\right)}+0.2{x}_{m}^{\left(4\right)}\), for export of fuels \({x}_{m}^{\left(1\right)}\), export of vessels \({x}_{m}^{\left(2\right)}\), other exports \({x}_{m}^{\left(3\right)}\) and travel receipts \({x}_{m}^{\left(4\right)}\). These variables are in nominal values and not seasonally adjusted; thus, the \({x}_{\left(m\right)}\) is seasonally adjusted with the X-12 method.
With \({x}_{\left(m\right)}\), we arrive at a model that nowcasts the values that have not yet been published, based on the information available for the seasonally adjusted Purchasing Managers' Index (PMI) sub-index New Export Orders:
The balance of payments is published with a lag of 2 months. Thus, we estimate the model based on the most recently available information set, \({{\varvec{I}}}_{m}=\left\{{x}_{m-2},{pmi}_{m-1}^{\left(exp\right)}\right\}\). When we are in the 3rd month of the quarter, we can estimate the coefficients of the model, \({\gamma }_{0}^{\left(m-2\right)}\) and \({\gamma }_{1}^{\left(m-2\right)}\). The real-time nowcast of \({x}_{m}\) for the 3rd month of the quarter is computed as:
Table 4 visualizes the publication of information across time. We are interested in nowcasting the current quarter, and at the present time we are in the 3rd month of the quarter, or \(\left(m\right)\). The PMI is published with a lag of 1 month, or \(\left(m-1\right)\), whereas the balance of payments is published with a lag of 2 months, or \(\left(m-2\right)\). So, the model can be estimated with the most recent information available at \(\left(m-2\right)\). Afterwards, we are able to compute the 1-step-ahead forecast \({x}_{m-1\backslash m-2}\), as the \({pmi}_{m-2}^{\left(exp\right)}\) is available. Additionally, we can compute the 2-step-ahead forecast \({x}_{m\backslash m-2}\), as the \({pmi}_{m-1}^{\left(exp\right)}\) is also available.
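The handling of the two publication lags can be sketched in a few lines. The sketch assumes, purely for illustration, a regression that is linear in levels, \(x_{t}=\gamma_{0}+\gamma_{1}\,pmi_{t-1}^{(exp)}+\varepsilon_{t}\); the functional form of the paper's model may differ (for example, it may be specified in logs). The function names and data are hypothetical.

```python
def ols_line(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def nowcast_with_publication_lags(x_published, pmi_published):
    """Return (x_{m-1|m-2}, x_{m|m-2}).

    x_published   -- x_1 .. x_{m-2}   (balance of payments, 2-month lag)
    pmi_published -- pmi_1 .. pmi_{m-1} (PMI sub-index, 1-month lag)
    """
    n = len(x_published)
    if len(pmi_published) != n + 1:
        raise ValueError("PMI must run exactly one month beyond x")
    # Estimation sample ends at m-2: pair x_t with pmi_{t-1} for t = 2 .. m-2.
    gamma0, gamma1 = ols_line(pmi_published[:n - 1], x_published[1:])
    one_step = gamma0 + gamma1 * pmi_published[-2]   # uses pmi_{m-2}
    two_step = gamma0 + gamma1 * pmi_published[-1]   # uses pmi_{m-1}
    return one_step, two_step


# Hypothetical data constructed so that x_t = 1 + 2 * pmi_{t-1} holds exactly.
pmi = [50.0, 52.0, 48.0, 51.0, 53.0, 49.0]
x = [100.0, 101.0, 105.0, 97.0, 103.0]
x_m_minus_1, x_m = nowcast_with_publication_lags(x, pmi)
```

The coefficients are dated \(m-2\) because that is the last month with a published left-hand-side value, while the regressors used for the forecasts are dated \(m-2\) and \(m-1\).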
Also, the real-time nowcast of \({x}_{m}\) for the 2nd month of the quarter is computed as:
Finally, for the 1st month of the quarter, the \({x}_{m}\) has been published.
When we are in the 2nd month of the quarter, the real-time nowcast of \({x}_{m}\) for the 3rd month of the quarter cannot be obtained from the PMI-based model, as the \({pmi}^{\left(exp\right)}\) of the 2nd month has not yet been published. So, we estimate a time series model that captures the autoregressive pattern:
Keeping in mind the publication lag of 2 months, the adequate prediction scheme is \({x}_{m\backslash m-3}=\exp\left\{\left(1+{\gamma }_{1}^{\left(m-3\right)}\right)\log\left({x}_{m-1\backslash m-3}\right)-{\gamma }_{1}^{\left(m-3\right)}\log\left({x}_{m-2\backslash m-3}\right)+{\gamma }_{0}^{\left(m-3\right)}\left(1-{\gamma }_{1}^{\left(m-3\right)}\right)\right\}\), where \({x}_{m-1\backslash m-3}\) and \({x}_{m-2\backslash m-3}\) should also be computed iteratively. Also, the real-time nowcast of \({x}_{m}\) for the 2nd month of the quarter is computed as:
Finally, for the 1st month of the quarter, the \({x}_{m}\) is estimated as:
When we are in the 1st month of the quarter, the real-time nowcast of \({x}_{m}\) for the 3rd month of the quarter is estimated by the autoregressive pattern previously defined. Hence, the adequate prediction scheme is:
Similarly, for the 2nd month of the quarter:
Regarding the 1st month of the quarter, the available information set is \({{\varvec{I}}}_{m}=\left\{{x}_{m-2},{pmi}_{m-1}^{\left(exp\right)}\right\}\), so the real-time nowcast is computed as:
5.5 Exports of services
For the export of services, \({y}_{q}^{\left(5\right)}\), we define \({{\varvec{X}}}_{\left(m\right)}\equiv {x}_{\left(m\right)}=0.8{x}_{m}^{\left(4\right)}+{x}_{m}^{\left(5\right)}+{x}_{m}^{\left(6\right)}\), for travel receipts \({x}_{m}^{\left(4\right)}\), transportation receipts \({x}_{m}^{\left(5\right)}\) and other receipts \({x}_{m}^{\left(6\right)}\):
For the seasonally adjusted \({x}_{\left(m\right)}\), we arrive at a model that nowcasts the values that have not yet been published, based on the information that is available for the seasonally adjusted Purchasing Managers' Index (PMI) sub-index New Export Orders:
The rationale behind the computations is in line with the approach followed for the export of goods and is available in Appendix B.
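The iterative prediction scheme used for the export of goods in Sect. 5.4, whose rationale carries over to the export of services, can be sketched as below. The recursion is the one stated in the text, \(\log x_{t}=(1+\gamma_{1})\log x_{t-1}-\gamma_{1}\log x_{t-2}+\gamma_{0}(1-\gamma_{1})\), which is an AR(1) in monthly log growth. The function name, coefficient values and starting levels are hypothetical.

```python
import math


def iterate_log_ar1(x_last, x_prev, gamma0, gamma1, steps):
    """Iterate the log-level recursion `steps` months beyond the last published value.

    x_last, x_prev -- the two most recent published levels (x_last is the latest)
    Returns the list of forecasts, nearest month first.
    """
    log_prev, log_last = math.log(x_prev), math.log(x_last)
    path = []
    for _ in range(steps):
        log_next = ((1.0 + gamma1) * log_last
                    - gamma1 * log_prev
                    + gamma0 * (1.0 - gamma1))
        path.append(math.exp(log_next))
        # Each forecast feeds the next step, as the text requires.
        log_prev, log_last = log_last, log_next
    return path


# With the last published month at m-3, three iterations reach month m.
path = iterate_log_ar1(x_last=110.0, x_prev=100.0, gamma0=0.01, gamma1=0.5, steps=3)
x_m_given_m_minus_3 = path[-1]
```

Each intermediate forecast enters the next step in place of an unpublished value, so the number of iterations equals the number of months between the last published observation and the target month.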
5.6 Imports of goods
For the quarterly import of goods \({y}_{q}^{\left(6\right)}\) the \({{\varvec{X}}}_{\left(m\right)}\) is expressed with the information available from the balance of payments at a monthly frequency; \({{\varvec{X}}}_{\left(m\right)}\equiv {x}_{\left(m\right)}=\sum_{k=7}^{9}{x}_{m}^{\left(k\right)}\), where \(k=7,8,9\) denotes importation of fuels, importation of vessels and importation of other goods, respectively. These variables are in nominal values and not seasonally adjusted; thus, the \({x}_{\left(m\right)}\) is seasonally adjusted with the X-12 method. For \({x}_{\left(m\right)}\), we arrive at a model that nowcasts the monthly non-published values based on the first-order autoregressive pattern of q-o-q log returns:
The \({{\varvec{l}}{\varvec{X}}}_{\left(m\right)}\) denotes the vector of the log-transformation of the variables \({x}_{\left(m\right)}\).
5.7 Imports of services
For the quarterly import of services, \({y}_{q}^{\left(7\right)}\), the balance of payments provides all the necessary information for the estimation of non-published values. We define \({{\varvec{X}}}_{\left(m\right)}\equiv {x}_{\left(m\right)}=\sum_{k=10}^{12}{x}_{m}^{\left(k\right)}\), where \(k=10,11,12\) denotes travel receipts, transportation receipts and other receipts, respectively. As in the previous sections, the \({x}_{\left(m\right)}\) is seasonally adjusted with the X-12 method. For \({x}_{\left(m\right)}\), we arrive at a model that nowcasts the monthly non-published values based on the second-order autoregressive pattern of q-o-q log returns:
5.8 Changes in inventories
The changes in inventories are essentially unpredictable, apart from an autocorrelated structure across quarters. Hence, we assume, a priori, that for the future value of \({Y}_{q}^{\left(8\right)}\), the only available information is its first-order autocorrelated structure:
$$Y_{q}^{\left(8\right)}=\beta_{0}\left(1-\beta_{1}\right)+\beta_{1}Y_{q-1}^{\left(8\right)}+\varepsilon_{q},$$
for \({\varepsilon }_{q}\sim N\left(0,{\sigma }_{\varepsilon }^{2}\right)\). So, the nowcast value for the changes in inventories is the one-quarter-ahead forecast based on the information set of the previous quarter: \({Y}_{q+1\backslash q}^{\left(8\right)}={\beta }_{0}^{\left(q\right)}\left(1-{\beta }_{1}^{\left(q\right)}\right)+{\beta }_{1}^{\left(q\right)}{Y}_{q}^{\left(8\right)}\).^{Footnote 7}
6 Robustness tests
A number of additional model extensions for robustness purposes are discussed in the paragraphs which follow:

1.
The Midas models have been replaced by regression models that aggregate the data from a higher sampling frequency to a quarterly frequency. According to the findings presented in Sect. 8, the use of mixed-data sampling estimators is necessary for obtaining accurate nowcasts.

2.
In Sect. 5.1, the variable selection is plausible, but how do we know that the omitted variables do not help? If they did, we might prefer a data-driven approach that examines all the available explanatory variables. Of course, the use of explanatory variables that are highly linearly related induces the problem of multicollinearity. A common strategy to reduce the risk of multicollinearity is the estimation of factors that express the majority of the variability of the original variables. Principal component analysis (PCA) has been applied for the estimation of the factors. Illustratively, in the case of private consumption on goods and services, we present the replacement of monthly confidence and sentiment indicators with factors estimated from the PCA. Let us define as \({\widetilde{{\varvec{X}}}}_{m}\) the matrix with the \(M\) selected variables for the \({m}^{th}\) month. The factors are estimated as:
$$\tilde{\boldsymbol{X}}_{m} = \boldsymbol{\Lambda }{\varvec{X}}_{m}^{\left( f \right)} + {\varvec{e}}_{m} ,$$(32)
where \(\boldsymbol{\Lambda }\) is the matrix of factor loadings, \({{\varvec{X}}}_{m}^{\left(f\right)}={\left[{f}_{m}^{\left(1\right)},\dots ,{f}_{m}^{\left(M\right)}\right]}^{^{\prime}}\) is the vector with the common factors, and \({{\varvec{e}}}_{m}\) is the vector of the idiosyncratic component. Summing up, the private consumption model is estimated as:
$$y_{q}^{\left( 1 \right)} = \beta_{0} + \sum \limits_{\tau = 0}^{\kappa - 1} {\varvec{X}}_{\left( m - \tau - is \right)}^{^{\prime}} \left( \sum \limits_{j = 0}^{p} \tau^{j} {\varvec{\theta}}_{j} \right) + \varepsilon_{q} ,$$(33)
$$\left[ {\begin{array}{c} {x_{m}^{\left( 1 \right)} } \\ {x_{m}^{\left( 2 \right)} } \\ \end{array} } \right] = \left[ {\begin{array}{c} {\gamma_{1,0} } \\ {\gamma_{2,0} } \\ \end{array} } \right] + \left[ {\begin{array}{c} {{\varvec{\gamma}}_{1} } \\ {{\varvec{\gamma}}_{2} } \\ \end{array} } \right]{\varvec{X}}_{m}^{\left( f \right)} + \left[ {\begin{array}{c} {\varepsilon_{1,m} } \\ {\varepsilon_{2,m} } \\ \end{array} } \right],$$(34)
where \({\beta }_{0}\) is a scalar coefficient, \({{\varvec{\theta}}}_{j}\) is a vector of coefficients, \({{\varvec{X}}}_{\left(m\right)}^{^{\prime}}=\left[{x}_{m}^{\left(1\right)} \; {x}_{m}^{\left(2\right)}\right]\), \({\varepsilon }_{q}\sim N\left(0,{\sigma }_{{\varepsilon }_{q}}^{2}\right)\), \({{\varvec{\gamma}}}_{i}=\left[{\gamma }_{i,3} \dots {\gamma }_{i,7}\right]\), \({{\varvec{X}}}_{m}^{\left(f\right)}=\left[{f}_{m}^{\left(1\right)},\dots ,{f}_{m}^{\left(4\right)}\right]\), \({\varepsilon }_{i,m}\sim N\left(0,{\sigma }_{{\varepsilon }_{i,m}}^{2}\right)\), and \(Cov\left({\varepsilon }_{i,m},{\varepsilon }_{{i}^{^{\prime}},m}\right)=0\). The model has been estimated with 4 as well as with 2 factors, and the forecasting accuracy was statistically indistinguishable.^{Footnote 8}

3.
For the investment nowcasting, we have estimated models by adding explanatory variables available at a monthly frequency. The most informative model specifications are still those based on the Athens stock exchange main general index, \({Z}_{d}^{\left(1\right)}\), and the Greek 10-year government bond yield, \({Z}_{d}^{\left(2\right)}\). The only monthly variable that has satisfactory nowcasting accuracy is the financial conditions index, \({x}_{m}^{\left(6\right)}\). However, none of the additional models is able to perform better than those based on the daily dataset. In Sect. 8, the additional models are also presented.
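The factor extraction described in item 2 above can be illustrated with a small sketch. It computes only the first principal component, by power iteration on the sample covariance matrix of the centered indicators; this is a simplification chosen to keep the example dependency-free, not the routine used in the paper, and the function name and data are hypothetical. The indicators are centered but not standardized, which is also an assumption.

```python
def first_principal_component(rows, iterations=200):
    """Loading vector and factor scores of the first principal component.

    rows -- list of observations (months), each a list of indicator values.
    Power iteration assumes a non-degenerate covariance matrix and a start
    vector that is not orthogonal to the leading eigenvector.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sum(val * val for val in w) ** 0.5
        v = [val / norm for val in w]
    # Factor scores: projection of each centered observation on the loadings.
    scores = [sum(c[j] * v[j] for j in range(k)) for c in centered]
    return v, scores


# Hypothetical indicators: the second is exactly twice the first.
indicators = [[1.0, 2.0], [2.0, 4.0], [4.0, 8.0], [3.0, 6.0], [5.0, 10.0]]
loadings, factor = first_principal_component(indicators)
```

The resulting `factor` series would then take the place of the individual confidence and sentiment indicators on the right-hand side of the second-layer regression, which removes the collinearity among them by construction.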
7 Nowcasting with naive models
For benchmark purposes, a random walk (the projected growth rate is the most recently available plus the average log-growth), a first-order autoregressive model, and a regression model on a quarterly frequency using the same information as in the disaggregated framework are estimated for the quarterly data. The models are considered in the forms:
7.1 Random walk
where \(y_{q}^{\left( 0 \right)} = \log\left( {Y_{q}^{\left( 0 \right)} /Y_{q - 1}^{\left( 0 \right)} } \right)\) is the q-o-q GDP growth rate.
7.2 First-order autoregressive
7.3 Regression model
where \({\varvec{X}}_{q}^{\left( f \right)}\) is the vector with the common factors from the PCA dimension reduction method: \({\varvec{X}}_{q} = \boldsymbol{\Lambda }{\varvec{X}}_{q}^{\left( f \right)} + {\varvec{e}}_{q}\). The \({\varvec{X}}_{q}\) includes all the explanatory variables on a quarterly frequency.
8 Nowcasting evaluation
The nowcast evaluation focuses on answering the research question: if we proceed to a more complicated prediction task for GDP nowcasting (such as the proposed framework), what is the gain in forecast accuracy compared to simpler nowcasting techniques? We answer this question by comparing the forecasting accuracy of the disaggregated Midas model against simpler nowcasting approaches, namely (i) the disaggregated regression model (i.e., no mixed-frequency modeling, based solely on quarterly data), (ii) the aggregated regression model (i.e., neither mixed-frequency modeling nor a disaggregated dataset), and (iii) the naïve model techniques (no pain at all!).
As Barhoumi et al. (2008) noted, the nowcasting evaluation exercise must replicate the data availability situation that is faced in the real-time application of the models. As Diebold (2020) noted, there are four approaches to forecasting evaluation: (1) an approach based on full-sample estimation and final revised data; (2) an approach based on expanding-sample estimation and final revised data; (3) an approach based on expanding-sample estimation and vintage data; and (4) an approach based on expanding-sample estimation and vintage information. Our nowcasting exercise is based on a sequence of pseudo out-of-sample nowcasts over the evaluation sample based on the final revised data, as vintage data are not available for the Greek economy. A real-time evaluation is truly credible if it is based on vintage information and is obtained by using nowcasts produced and permanently recorded in real time. Unfortunately, we are not able to provide an evaluation based on vintage information. But, given the data availability, we have produced nowcasts based on final revised data that were available at the time the model was to be estimated. For example, let us assume that we estimate the model that produces the nowcast of private consumption, \({Y}_{q}^{\left(1\right)}\), and the explanatory variable is the retail trade volume index, \({x}_{m}^{\left(1\right)}\). The \({x}_{m}^{\left(1\right)}\) for June is published on the 31st of August. If we nowcast \({Y}_{q}^{\left(1\right)}\) for Q3 in June, then the \({x}_{m}^{\left(1\right)}\) of June will not be included in the information set. But, if we nowcast \({Y}_{q}^{\left(1\right)}\) for Q3 in September, then the \({x}_{m}^{\left(1\right)}\) of June will be included in the information set.
The coefficients of the proposed model framework are estimated recursively each month. It is very difficult to present the coefficient estimates for all the components of GDP, all the layers and all the possible combinations of nowcasted month and publication month. What is relevant for practitioners is to know the estimated coefficients of the model using the latest data. Thus, in Table 5, we present the estimated coefficients with their p values based on the latest available information set. Moreover, we present in Appendix C, for private consumption only, line plots of the estimated coefficients across time and the relative p values. On a quarterly frequency, the estimated parameters refer to the recursive estimation of private consumption for the current quarter, based on information available in the 3rd month of the current quarter (see Eq. 5). On a monthly frequency, we present the estimated parameters for the 2nd month of the current quarter, based on information available in the 3rd month of the current quarter (see Eq. 6). We infer that the values of the coefficients change over time, mainly gradually, reflecting the updates of the information set. The parameters are not statistically significant across the total period under evaluation, reflecting the changes in the relationships among the various variables.
The \({Y}_{q}^{\left(0\right)}\) nowcasts are estimated for the quarters 2005Q1 to 2020Q3. For each quarter, we provide 5 different nowcasts of GDP, depending on the time at which the nowcast is produced. So, we estimate the GDP assuming that we are in the 1st month of the current quarter, the 2nd month of the current quarter and so on, up to the 2nd month of the next quarter. The loss functions on which the forecasting evaluation is based are (i) the mean absolute percentage error, MAPE, between actual and estimated GDP and (ii) the root mean squared error, RMSE. So, we evaluate the nowcasting accuracy based on the GDP in billions, for quarter q:
and
where \({\widehat{Y}}_{q}^{\left(0\right)}\) is the GDP nowcast. The Hansen et al. (2011) Model Confidence Set is utilized in order to define the set of models that consists of the best nowcasting models, according to our predefined MAPE and RMSE loss functions. The null hypothesis \({H}_{0,M}: E\left({d}_{\left(j\right),\left({j}^{*}\right),q}\right)=0\), for all \(j,{j}^{*}\in M\), \(M\subset {M}^{0}\), is tested against the alternative \({H}_{1,M}: E\left({d}_{\left(j\right),\left({j}^{*}\right),q}\right)\ne 0.\) The test at each iteration, for all \(M \subset {M}^{0}\), identifies the model that should be rejected under the \({H}_{0,M}\). If \({\Psi }_{q,\left(j\right)}\) denotes the value of the predicted squared error of model \(j\) at quarter \(q\), or \({\Psi }_{q,\left(j\right)}={\left({Y}_{q}^{\left(0\right)}-{\widehat{Y}}_{q,\left(j\right)}^{\left(0\right)}\right)}^{2}\), then \({d}_{\left(j\right),\left({j}^{*}\right),q}={\Psi }_{q,\left(j\right)}-{\Psi }_{q,\left({j}^{*}\right)}\) is the evaluation differential for all \(j,{j}^{*}\in {M}^{0}\). A high p value provides evidence supporting the hypothesis that the model does belong to the model confidence set.
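The two loss functions and the evaluation differential can be written down in a few lines. This is a minimal sketch with hypothetical function names and made-up numbers; it computes only the inputs to the Model Confidence Set procedure and does not implement the MCS test itself.

```python
import math


def mape(actual, nowcast):
    """Mean absolute percentage error, expressed in percent."""
    return 100.0 * sum(abs(n - a) / a for a, n in zip(actual, nowcast)) / len(actual)


def rmse(actual, nowcast):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - n) ** 2 for a, n in zip(actual, nowcast)) / len(actual))


def loss_differential(actual, nowcast_j, nowcast_j_star):
    """d_{(j),(j*),q}: squared error of model j minus that of model j*, per quarter."""
    return [(a - f) ** 2 - (a - g) ** 2
            for a, f, g in zip(actual, nowcast_j, nowcast_j_star)]


# Hypothetical GDP levels (billions) and two competing nowcasts.
gdp = [100.0, 200.0]
model_a = [110.0, 190.0]
model_b = [100.0, 210.0]

print(mape(gdp, model_a), rmse(gdp, model_a))
d = loss_differential(gdp, model_a, model_b)
```

A positive entry in `d` means that model \(j\) had the larger squared error in that quarter; the MCS test asks whether the mean of these differentials is distinguishable from zero.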
The most widely used tests for evaluating the statistical difference among competing forecasting models are the Diebold and Mariano (1995) test, the Equal Predictive Accuracy test of Clark and West (2007), the Reality Check for Data Snooping of White (2000), the Superior Predictive Ability of Hansen (2005) and the Model Confidence Set of Hansen et al. (2011). Each method has its pros and cons, and the Diebold and Mariano test is best suited for pairwise comparisons, while Model Confidence Set is more appropriate for simultaneously evaluating the forecasting performance of competing models, without predefining a benchmark model.
Tables 6, 7, 8, 9, 10, 11, 12, and 13 present the mean absolute percentage error and the root mean squared error for private consumption on goods and services (Table 6), government spending on public goods and services (Table 7), investment in business capital goods (Table 8), exports of goods (Table 9), exports of services (Table 10), imports of goods (Table 11), imports of services (Table 12), and changes in inventories (Table 13), respectively.
Indicatively, in Table 6, the MAPE loss function of nowcasting the consumption on goods and services when we have information published up to the 1st month of the current quarter is 3.51% based on the Midas model and 9.70% based on the regression model. The regression model aggregates the data from the higher sampling frequency to the quarterly frequency, as described in the robustness section. Overall, as we move from the 1st month of the current quarter to the 2nd month of the next quarter, the error decreases for both model specifications (i.e., Midas and regression) and both loss functions (i.e., MAPE and RMSE). The AR(1) and RW are the first-order autoregressive and the random walk models, respectively, which are used as naïve benchmarks. The naïve models are estimated for the 3rd month of the current quarter because of the 3-month publication lag of quarterly data. The naïve models have inferior performance in all cases except for the consumption on goods and services. Regarding consumption, the naïve models are beaten, in terms of nowcasting accuracy, by the Midas model only when the information for the 2nd month of the next quarter is available.
The analysis in Tables 8, 9, 10, 11, and 12 reaches similar findings. In the vast majority of the cases, the Midas model outperforms the regression and the naïve models. Also, the Midas model always performs better than the naïve models, even with the information available two months earlier. The worst performance of the Midas model is in the case of consumption, where the information of the 2nd month of the next quarter is required in order to beat the performance of the naïve models.
As the nowcasting of government spending (Table 7) and the changes in inventories (Table 13) do not use a mixedfrequency framework, the nowcasting is conducted once the quarterly data are published.
As discussed in the robustness section, we run a series of models in order to investigate the usability of data sampled at higher frequencies. Table 14 presents the MAPE and RMSE loss functions for the best performing Midas and regression models which include additional variables. Indeed, only one monthly variable has satisfactory nowcasting accuracy; the financial conditions index, \({x}_{m}^{\left(6\right)}\). None of the additional models is able to perform better compared to those based on the daily dataset.
Table 15 presents the nowcasting error when we estimate the GDP as a summation of the nowcasts of its components. For example, the MAPE loss function is 1.77% based on the Midas specifications when we take into consideration the data that are available up to the 2nd month of the next quarter. When we use the regression model, the MAPE loss function becomes 2.14%. This leads to an important finding: the Midas nowcasting based on the disaggregated data is by far better than the regression nowcasting. But a naïve model is able to provide a better nowcasting accuracy for the 3rd month of the next quarter. The lower values of MAPE and RMSE for the naïve models compared to the Midas model are somewhat in contradiction with the results presented for the nowcasting of GDP components. This is because, when we nowcast the GDP components separately, the Midas model always has a better nowcasting performance than the naïve models, except for the private consumption (where the AR(1) model performs slightly better). But if we aggregate the nowcasts of the components, then the GDP nowcasting has a higher MAPE compared to the MAPE of the naïve AR(1) model. The observed performance of the AR(1) model on the GDP nowcasting leads us to model the forecast error in GDP nowcasting with an additional autocorrelated structure on the nowcasts of GDP components. The possible sources of the autocorrelated structure of the forecast error have been discussed in Sect. 3 (see the nowcasting error correction).
Figure 2 plots the y-o-y growth rate of GDP against the y-o-y nowcasting error, which is defined as: \(\frac{{\widehat{Y}}_{q}^{\left(0\right)}-{Y}_{q-4}^{\left(0\right)}}{{Y}_{q-4}^{\left(0\right)}}-\frac{{Y}_{q}^{\left(0\right)}-{Y}_{q-4}^{\left(0\right)}}{{Y}_{q-4}^{\left(0\right)}}\equiv \frac{{\widehat{Y}}_{q}^{\left(0\right)}-{Y}_{q}^{\left(0\right)}}{{Y}_{q-4}^{\left(0\right)}}.\) Naturally, there is a positive relationship between the magnitude of the growth rate and the nowcasting error. Moreover, we observe that the majority of the nowcasting errors are positive (mainly in the estimation based on the data available in the 1st month of the current quarter). This positive bias of the nowcasting errors indicates an autocorrelated error structure, which justifies the use of the nowcasting error correction. The unconditional correlation between y-o-y GDP growth and the y-o-y nowcasting error ranges from 48% (for M1 of the next quarter) up to 62% (for M2 of the next quarter). Indicatively, Fig. 3 presents the scatterplots between y-o-y GDP growth and the y-o-y nowcasting errors, which confirm the autocorrelated error structure.
Table 16 presents the nowcasting error from the model that accounts for the nowcasting error correction. For example, the MAPE loss function is 1.75% based on the Midas specifications when we take into consideration the data that are available up to the 1st month of the current quarter. When we use the regression model, the MAPE loss function is 1.84%, whereas the MAPE value of the naïve AR(1) model is 2.44%. So, we conclude that the modeling of the nowcasting error structure, as proposed in Sect. 3, reduces the nowcasting error. Figure 4 plots the y-o-y growth rate of GDP and the relative y-o-y nowcasting errors of the disaggregated Midas model with the nowcasting error correction. We observe that the aforementioned positive bias of the nowcasting errors has decreased significantly, which is why the nowcasting accuracy has increased.
This error reduction is statistically significant according to the p values of the MCS test, which are presented in Table 17. A high p value denotes that the model is included in the confidence set of the models with the lowest value of the loss function, according to the MCS test. For example, if we define a 0.7 significance level, the disaggregated Midas with the nowcasting error correction becomes the only model to be included in the confidence set in all cases except for the nowcasting in the 2nd month of the next quarter. So, we conclude that (i) this is a superior model for nowcasting the GDP at any nowcasting month and (ii) only when the information of the current quarter is fully available does an AR(1) model provide equal nowcasting ability. Note that we have presented the AR(1) model in the 2nd month of the next quarter, but the AR(1) model can actually be estimated only with a 3-month publication lag, in other words during the 3rd month of the next quarter.
9 Simulations
The disaggregation of GDP nowcasting has provided us with more accurate nowcasts of the components of GDP, in terms of the MAPE and RMSE loss measures, but a naïve AR(1) model was able to provide nowcasts of equal forecasting accuracy for at least the 3rd month of the next quarter.
In the paragraphs that follow, we examine whether the inclusion of multiple sources of forecast errors is the key determinant of accuracy loss in the GDP nowcasting. As already mentioned, the computation of the GDP nowcasting requires the summation of multiple nowcast values. As GDP is the summation of its \(k\) components, \({Y}_{q}^{\left(0\right)}={\sum }_{k=1}^{5}{Y}_{q}^{\left(k\right)}-{\sum }_{k=6}^{7}{Y}_{q}^{\left(k\right)}+{Y}_{q}^{\left(8\right)}\), naturally, the nowcasting is computed as \({Y}_{q\backslash q}^{\left(0\right)}={\sum }_{k=1}^{5}{Y}_{q\backslash q}^{\left(k\right)}-{\sum }_{k=6}^{7}{Y}_{q\backslash q}^{\left(k\right)}+{Y}_{q\backslash q}^{\left(8\right)}\). As \({Y}_{q}^{\left(k\right)}={Y}_{q\backslash q}^{\left(k\right)}+{\varepsilon }_{q\backslash q}^{\left(k\right)}\), the estimation of GDP nowcasting, \({Y}_{q\backslash q}^{\left(0\right)}\), conceals \(k\) nowcasting errors, \({\varepsilon }_{q\backslash q}^{\left(k\right)}.\) Thus, we run a series of simulations in order to unmask any possible impact of the multiple sources of forecasting errors.
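The aggregation identity, and the way component errors carry through to the GDP nowcast, can be sketched as follows. The component numbering follows Sects. 5.1 to 5.8 (imports, \(k=6,7\), enter with a negative sign); the function name and all figures are hypothetical.

```python
# Sign with which each component enters GDP: imports (k = 6, 7) are subtracted.
SIGNS = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: -1, 7: -1, 8: 1}


def aggregate_gdp(components):
    """GDP as the signed sum of its eight components, keyed by k = 1..8."""
    return sum(SIGNS[k] * components[k] for k in SIGNS)


# Hypothetical actual values and component nowcasts (billions).
actual = {1: 30.0, 2: 9.0, 3: 6.0, 4: 8.0, 5: 7.0, 6: 12.0, 7: 4.0, 8: 0.5}
nowcast = {1: 29.4, 2: 9.2, 3: 6.3, 4: 7.8, 5: 7.1, 6: 12.5, 7: 3.9, 8: 0.2}

gdp_error = aggregate_gdp(actual) - aggregate_gdp(nowcast)
signed_component_errors = sum(SIGNS[k] * (actual[k] - nowcast[k]) for k in SIGNS)
```

The GDP nowcasting error equals the signed sum of the eight component errors, so component errors may offset or reinforce one another depending on how they are correlated, which is what the simulations set out to examine.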
9.1 Autoregressive framework
We assume an aggregated series \({Y}_{q}^{\left(0\right)}=\left({\sum }_{k=1}^{4}{Y}_{q}^{\left(k\right)}\right)\), where the qoq growth rate of each \({Y}_{q}^{\left(k\right)}\) follows an AR(1) process:
Then, we compute the one-step-ahead forecasts of \({Y}_{q}^{\left(k\right)}\) as \({Y}_{q+1\backslash q}^{\left(k\right)}\), for \(k=1,\dots ,4\), as well as the one-step-ahead forecasts of \({Y}_{q}^{\left(0\right)}\) as \({Y}_{q+1\backslash q}^{\left(0\right)}={\sum }_{k=1}^{4}{Y}_{q+1\backslash q}^{\left(k\right)}\). Moreover, we assume that the simulated process \({Y}_{q}^{\left(0\right)}\) can itself be estimated as an AR(1) process; thus, we compute one-step-ahead forecasts of \({Y}_{q}^{\left(0\right)}\) from an estimated AR(1) model: \({Y}_{q+1\backslash q}^{*\left(0\right)}\).
By design, the true data generating process of \({Y}_{q}^{\left(0\right)}\) is the aggregation of the components whose qoq growth rate has a first-order autoregressive structure. So, in terms of the statistical evaluation of forecasting accuracy, the \({Y}_{q+1\backslash q}^{\left(0\right)}\) forecasts must be more accurate compared to \({Y}_{q+1\backslash q}^{*\left(0\right)}\) forecasts according to the classical loss functions, despite the fact that we have imposed \(k\) forecasting errors, \({\varepsilon }_{q+1\backslash q}^{\left(k\right)}\).
The simulations have been conducted for various values of parameters \({\beta }_{0}^{\left(k\right)}, {\beta }_{1}^{\left(k\right)}\) and the magnitude of the error term, \({\sigma }_{q}^{2\left(k\right)}\). Indicatively, for \({\beta }_{0}^{\left(k\right)}=0.1\) and \(0.8\le {\beta }_{1}^{\left(k\right)}\le 0.9\), various combinations of the four AR(1) models of Eq. (39) have been simulated. For illustration purposes, we have constructed a measure that represents the dispersion among the values of parameters.^{Footnote 9} The dispersion measure is computed as:
where \(\overline{{\beta }_{1}}={\sum }_{k=1}^{4}{\beta }_{1}^{\left(k\right)}/4\). Figure 5 presents the dispersion measure, \(DM\), along with the RMSE loss function for \({Y}_{q+1\backslash q}^{\left(0\right)}\) and \({Y}_{q+1\backslash q}^{*\left(0\right)}\). The value of the RMSE loss function for the aggregated forecast \({\sum }_{k=1}^{4}{Y}_{q+1\backslash q}^{\left(k\right)}\) is stable across the various values of the dispersion measure. On the other hand, the values of \(RMSE=\sqrt{{10{,}000}^{-1}\sum_{q=1}^{10{,}000}{\left({Y}_{q+1}^{\left(0\right)}-{Y}_{q+1\backslash q}^{*\left(0\right)}\right)}^{2}}\) are highly related to the values of the dispersion measure. Therefore, we reach the finding that the aggregation of the predictions provides more accurate one-step-ahead predictions despite the inclusion of multiple sources of forecast errors. Moreover, when we ignore the disaggregation (and we compute the \({Y}_{q+1\backslash q}^{*\left(0\right)}\)), the loss of forecasting accuracy increases proportionally to the dispersion among the values of the parameters.
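The logic of this experiment can be illustrated with a minimal Python sketch. It is not the author's simulation code: it assumes that each component is itself an AR(1) series and that the aggregate is their plain sum (abstracting from the level versus qoq-growth transformation), that errors are Gaussian with unit variance, and that forecasts are in-sample one-step-ahead fits from OLS. The function names `simulate_ar1`, `ar1_one_step_forecasts` and `compare_rmse` are hypothetical.

```python
import numpy as np


def simulate_ar1(beta0, beta1, sigma, n, rng, burn=200):
    """Simulate n observations of y_t = beta0 + beta1 * y_{t-1} + eps_t."""
    eps = rng.normal(0.0, sigma, size=n + burn)
    y = np.empty(n + burn)
    y[0] = beta0 / (1.0 - beta1)  # start at the unconditional mean
    for t in range(1, n + burn):
        y[t] = beta0 + beta1 * y[t - 1] + eps[t]
    return y[burn:]  # drop the burn-in observations


def ar1_one_step_forecasts(y):
    """Fit an AR(1) by OLS and return one-step-ahead forecasts of y[1:]."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return X @ coef


def compare_rmse(betas, beta0=0.1, sigma=1.0, n=10_000, seed=0):
    """Return (RMSE of summed component forecasts, RMSE of aggregate AR(1))."""
    rng = np.random.default_rng(seed)
    comps = [simulate_ar1(beta0, b, sigma, n, rng) for b in betas]
    total = np.sum(comps, axis=0)  # aggregate series (assumed: plain sum)
    f_dis = np.sum([ar1_one_step_forecasts(y) for y in comps], axis=0)
    f_agg = ar1_one_step_forecasts(total)  # AR(1) fitted on the aggregate

    def rmse(f):
        return float(np.sqrt(np.mean((total[1:] - f) ** 2)))

    return rmse(f_dis), rmse(f_agg)


if __name__ == "__main__":
    # Equal slopes versus slopes spread over the range used in the text.
    for betas in [(0.85, 0.85, 0.85, 0.85), (0.80, 0.83, 0.87, 0.90)]:
        r_dis, r_agg = compare_rmse(betas)
        print(betas, "disaggregated:", r_dis, "aggregate AR(1):", r_agg)
```

In this simplified setting, the aggregate AR(1) is misspecified whenever the slopes differ, because a sum of AR(1) processes with different slopes is no longer an AR(1); the gap between the two RMSE values should therefore widen as the slopes become more dispersed.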
9.2 Regression framework
For robustness, we create another simulated framework assuming an aggregated series \({Y}_{q}^{\left(0\right)}=\left({\sum }_{k=1}^{4}{Y}_{q}^{\left(k\right)}\right)\), where the qoq growth rate of each \({Y}_{q}^{\left(k\right)}\) follows a regression model:
The initial values of the coefficients in the simulated regressions have been computed from similar regressions based on the actual dataset. Thus, we have assumed as \({y}_{q}^{\left(1\right)}\) the private consumption of goods and services, \({x}_{q}^{\left(1\right)}\) the retail trade volume index, \({y}_{q}^{\left(2\right)}\) the investment in business capital goods, \({x}_{q}^{\left(2\right)}\) the Athens stock exchange main general index, \({y}_{q}^{\left(3\right)}\) the exports of goods, \({x}_{q}^{(3)}=\sum_{k=1}^{3}{\widetilde{x}}_{q}^{\left(k\right)}+0.2{\widetilde{x}}_{q}^{\left(4\right)}\) (for exports of fuels \({\widetilde{x}}_{q}^{\left(1\right)}\), exports of vessels \({\widetilde{x}}_{q}^{\left(2\right)}\), other exports \({\widetilde{x}}_{q}^{\left(3\right)}\) and travel receipts \({\widetilde{x}}_{q}^{\left(4\right)}\)) and \({y}_{q}^{\left(4\right)}\) the imports of goods, \({x}_{q}^{(4)}=\sum_{k=5}^{7}{\widetilde{x}}_{q}^{\left(k\right)}\) (for \(k=5,6,7\) we denote the imports of fuels, imports of vessels and imports of other goods, respectively).
Then, we compute the one-step-ahead forecasts of \({Y}_{q}^{\left(k\right)}\) as \({Y}_{q+1\backslash q}^{\left(k\right)}\), for \(k=1,\dots ,4\), as well as the one-step-ahead forecasts of \({Y}_{q}^{\left(0\right)}\) as \({Y}_{q+1\backslash q}^{\left(0\right)}={\sum }_{k=1}^{4}{Y}_{q+1\backslash q}^{\left(k\right)}\). Finally, we assume that the simulated process \({Y}_{q}^{\left(0\right)}\) can be estimated via a regression model that incorporates all the explanatory variables, i.e., \({y}_{q}^{\left(k\right)}={\beta }_{0}^{\left(k\right)}+{\sum }_{i=1}^{4}\left({\beta }_{i}^{\left(k\right)}\left(1-L\right){x}_{i,q}^{\left(k\right)}\right)+{\varepsilon }_{q}^{\left(k\right)}\). So, we define the one-step-ahead forecasts of \({Y}_{q}^{\left(0\right)}\) from this regression as \({Y}_{q+1\backslash q}^{*\left(0\right)}\).
By design, the true data generating process of \({Y}_{q}^{\left(0\right)}\) is the aggregation of the components, or \({Y}_{q}^{\left(0\right)}=\left({\sum }_{k=1}^{4}{Y}_{q}^{\left(k\right)}\right)\). So, in terms of statistical evaluation of forecasting accuracy, the \({Y}_{q+1\backslash q}^{\left(0\right)}\) forecasts must be more accurate compared to \({Y}_{q+1\backslash q}^{*\left(0\right)}\) forecasts according to the classical loss functions, despite the fact that we have imposed \(k\) forecasting errors, \({\varepsilon }_{q+1\backslash q}^{\left(k\right)}.\)
The simulations have been conducted for various values of parameters \({\beta }_{0}^{\left(k\right)}, {\beta }_{1}^{\left(k\right)}\) and of the magnitude of the error term, \({\sigma }_{q}^{2\left(k\right)}\), around the initially estimated values: \({\beta }_{0}^{\left(1\right)}=0.001\), \({\beta }_{0}^{\left(2\right)}=0.01\), \({\beta }_{0}^{\left(3\right)}=0.002\), \({\beta }_{0}^{\left(4\right)}=0.0007\), \(0.1\le {\beta }_{1}^{\left(1\right)}\le 1.8\), \(0.095\le {\beta }_{1}^{\left(2\right)}\le 1.595\), \(0.02\le {\beta }_{1}^{\left(3\right)}\le 1.42\) and \(0.01\le {\beta }_{1}^{\left(4\right)}\le 1.51\). Figure 6 presents the RMSE loss function for \({Y}_{q+1\backslash q}^{\left(0\right)}\) and \({Y}_{q+1\backslash q}^{*\left(0\right)}\) and the dispersion measure as well. The values of the RMSE loss function for the aggregated forecast \({\sum }_{k=1}^{4}{Y}_{q+1\backslash q}^{\left(k\right)}\) are stable across the various combinations of the parameters' values. On the other hand, the values of \(RMSE=\sqrt{{10{,}000}^{-1}\sum_{q=1}^{10{,}000}{\left({Y}_{q+1}^{\left(0\right)}-{Y}_{q+1\backslash q}^{*\left(0\right)}\right)}^{2}}\) are much higher (almost 6 times higher). Naturally, the dispersion measure is not related to the values of the RMSE loss function, because of the heterogeneity of the simulated framework in Eq. (41). However, as in the previous simulated framework, we reach a similar conclusion that the aggregation of the predictions provides more accurate one-step-ahead predictions, despite the inclusion of multiple sources of forecast errors.
10 Conclusions and further research
The literature has often highlighted that sophisticated models can rarely outperform the forecasting ability of a naive model; see D’Agostino et al. (2006) and Campbell (2007). Schumacher and Breitung (2008) note that a sophisticated factor model provides only moderate forecast performance in predicting German GDP, although, as Schumacher (2010) notes, preselected international indicators may contain additional information for forecasting GDP. In contrast, our paper contributes to the literature by providing both empirical and simulated evidence that more accurate nowcasting estimations of GDP require the use of a disaggregated multilayer mixed-frequency framework.
Indeed, the naive AR(1) model is outperformed only when a sophisticated model framework is defined. The proposed model framework requires the preselection of the explanatory variables, which must be related to the components of GDP within a multilayer mixed-frequency framework; we observe that even the daily available financial data are able to reduce the nowcasting error. We therefore find that the disaggregation into components reduces the forecasting error despite the inclusion of multiple sources of forecast errors.
Of course, much remains to be done that could improve the nowcasting accuracy. The introduction of a supervised selection algorithm, such as the Lasso, could help identify the explanatory variables that are strongly associated with the nowcasted variable. A further avenue is to find a way of exploiting the cross-sectional information to obtain more accurate estimates or models.
In conclusion, estimating the D-model^{Footnote 10} with the same data structure and the same equations across layers, but with data coming from another country, would be like putting data into a black box. The construction of such a framework requires knowledge of the availability of the data, their quality, and their interconnectedness. Thus, before the proposed model framework is replicated, the careful collection of the data and the construction of the appropriate connections among economic variables and across time are necessary.
Availability of data and materials
The data are available upon request.
Notes
The acronym D-model stands for the Disaggregated model.
For the estimation of the models, we define a specific conditional mean, but we do not need to specify the distribution of the error term, as long as the independence of residuals over time holds. The assumption of normally distributed errors is required for maximum likelihood estimation.
We do not preselect a specific method, e.g., AR(1), for estimating the missing values. The method is chosen case by case.
The variables are seasonally adjusted with the X12 method. For variables with nonpositive values, the transformation \({x}_{t}^{*}={x}_{t}-\min\left({\left\{{x}_{t}\right\}}_{t=0}^{T}\right)+1\) is used.
The existence of multicollinearity deteriorates the forecasting accuracy.
For comparability, the mean absolute percentage forecast error taking only the last quarter of each year is 2.28%.
The random walk, first-difference transformation, and distributed lag models have also been tested, but the AR(1) performs better.
The simulations and their outputs are available upon request.
This is also the case for similar model frameworks that have been proposed in the literature.
References
Andreou, E., Ghysels, E., & Kourtellos, A. (2010). Regression models with mixed sampling frequencies. Journal of Econometrics, 158, 246–261.
Andreou, E., Ghysels, E., & Kourtellos, A. (2013). Should macroeconomic forecasters use daily financial data and how? Journal of Business and Economic Statistics, 31, 240–251.
Angelini, E., Camba-Mendez, G., Giannone, D., Reichlin, L., & Rünstler, G. (2011). Short-term forecasts of Euro Area GDP growth. Econometrics Journal, 14(1), 25–44.
Angelini, E., Bańbura, M., & Rünstler, G. (2008). Estimating and forecasting the Euro Area monthly national accounts from a dynamic factor model. In European Central Bank Working Paper, Series 953.
Antipa, P., Barhoumi, K., Brunhes-Lesage, V., & Darne, O. (2012). Nowcasting German GDP: A comparison of bridge and factor models. Journal of Policy Modeling, 34(6), 864–878.
Artis, M. J., Banerjee, A., & Marcellino, M. (2005). Factor forecasts for the UK. Journal of Forecasting, 24, 279–298.
Baffigi, A., Golinelli, R., & Parigi, G. (2004). Bridge models to forecast the Euro Area GDP. International Journal of Forecasting, 20(3), 447–460.
Bańbura, M., & Rünstler, G. (2011). A look into the factor model black box: Publication lags and the role of hard and soft data in forecasting GDP. International Journal of Forecasting, 27(2), 333–346.
Banerjee, A., & Marcellino, M. (2006). Are there any reliable leading indicators for US inflation and GDP growth? International Journal of Forecasting, 22(1), 137–151.
Barhoumi, K., Benk, S., Cristadoro, R., Reijer, A. D., Jakaitiene, A., Jelonek, P., Rua, A., Rünstler, G., Ruth, K., & Nieuwenhuyze, C. V. (2008). Short-term forecasting of GDP using large monthly datasets: A pseudo real-time forecast evaluation exercise. European Central Bank, Occasional Paper Series, 84, 1–25.
Bessec, M. (2012). Short-term forecasts of French GDP: A dynamic factor model with targeted predictors. In Banque de France, Working Paper 409.
Boivin, J., & Ng, S. (2006). Are more data always better for factor analysis? Journal of Econometrics, 132(1), 169–194.
Campbell, S. (2007). Macroeconomic volatility, predictability, and uncertainty in the great moderation: Evidence from the survey of professional forecasters. Journal of Business & Economic Statistics, 25, 191–200.
Chernis, T., & Sekkel, R. (2017). A dynamic factor model for nowcasting Canadian GDP growth. Empirical Economics, 53(1), 217–234.
Clark, T. E., & West, K. D. (2007). Approximately normal tests for equal predictive accuracy in nested models. Journal of Econometrics, 138, 291–311.
Clements, M. P., & Galvão, A. B. (2009). Forecasting US output growth using leading indicators: An appraisal using MIDAS models. Journal of Applied Econometrics, 24(7), 1187–1206.
D’Agostino, A., McQuinn, K., & O’Brien, D. (2012). Nowcasting Irish GDP. OECD Journal of Business Cycle Measurement and Analysis, 2, 1–11.
D'Agostino, A., Giannone, D., & Surico, P. (2006). (Un)predictability and macroeconomic stability. In European Central Bank, Working Paper, 605.
Dahl, C. M., Hansen, H., & Smidt, J. (2009). The cyclical component factor model. International Journal of Forecasting, 25(1), 119–127.
De Mol, C., Giannone, D., & Reichlin, L. (2008). Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components? Journal of Econometrics, 146(2), 318–328.
Den Reijer, A. H. J. (2005). Forecasting Dutch GDP using large scale factor models. In De Nederlandsche Bank, Working Paper, 28.
Diebold, F. X. (2020). Real-time real economic activity: Exiting the great recession and entering the pandemic recession. In Working Paper 27482, National Bureau of Economic Research, July 2020.
Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 20(1), 134–144.
Diron, M. (2008). Short-term forecasts of euro area real GDP growth: An assessment of real-time performance based on vintage data. Journal of Forecasting, 27(5), 371–390.
Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Journal of Econometrics, 164(1), 188–205.
Duarte, C., & Rua, A. (2007). Forecasting inflation through a bottom-up approach: How bottom is bottom? Economic Modelling, 24, 941–953.
Foroni, C., & Marcellino, M. (2014). A comparison of mixed frequency approaches for nowcasting Euro area macroeconomic aggregates. International Journal of Forecasting, 30(3), 554–568.
Ghysels, E., Santa-Clara, P., & Valkanov, R. (2006). Predicting volatility: Getting the most out of return data sampled at different frequencies. Journal of Econometrics, 131, 59–95.
Giannone, D., Reichlin, L., & Small, D. (2008). Nowcasting: The real-time informational content of macroeconomic data. Journal of Monetary Economics, 55(4), 665–676.
Hansen, P. R. (2005). A test for superior predictive ability. Journal of Business and Economic Statistics, 23, 365–380.
Hansen, P. R., Lunde, A., & Nason, J. M. (2011). The model confidence set. Econometrica, 79(2), 453–497.
Heij, C., van Dijk, D., & Groenen, P. J. (2008). Macroeconomic forecasting with matched principal components. International Journal of Forecasting, 24(1), 87–100.
Heij, C., van Dijk, D., & Groenen, P. J. (2011). Real-time macroeconomic forecasting with leading indicators: An empirical comparison. International Journal of Forecasting, 27(2), 466–481.
Iacoviello, M. (2001). Short-term forecasting: Projecting Italian GDP, one quarter to two years ahead. In International Monetary Fund, IMF Working Papers 01/109.
Jansen, W. J., Jin, X., & de Winter, J. M. (2016). Forecasting and nowcasting real GDP: Comparing statistical models and subjective forecasts. International Journal of Forecasting, 32(2), 411–436.
Kim, H. H., & Swanson, N. R. (2018). Methods for backcasting, nowcasting and forecasting using factor-MIDAS: With an application to Korean GDP. Journal of Forecasting, 37(3), 281–302.
Kuzin, V., Marcellino, M., & Schumacher, C. (2011). MIDAS vs. mixed-frequency VAR: Nowcasting GDP in the Euro Area. International Journal of Forecasting, 27(2), 529–542.
Marcellino, M., & Schumacher, C. (2010). Factor MIDAS for nowcasting and forecasting with ragged-edge data: A model comparison for German GDP. Oxford Bulletin of Economics and Statistics, 72(4), 518–550.
Marcellino, M., Favero, C. A., & Neglia, F. (2005). Principal components at work: The empirical analysis of monetary policy with large data sets. Journal of Applied Econometrics, 20(5), 603–620.
Marcellino, M., Stock, J. H., & Watson, M. (2003). Macroeconomic forecasting in the euro area: Country specific versus euro wide information. European Economic Review, 47, 1–18.
Peña, D., & Poncela, P. (2004). Forecasting with nonstationary dynamic factor models. Journal of Econometrics, 119(2), 291–321.
Politis, D. N., & Romano, J. P. (1994). The stationary bootstrap. Journal of the American Statistical Association, 89, 1303–1313.
Schumacher, C. (2007). Forecasting German GDP using alternative factor models based on large data sets. Journal of Forecasting, 26(4), 271–302.
Schumacher, C. (2010). Factor forecasting using international targeted predictors: The case of German GDP. Economics Letters, 107, 95–98.
Schumacher, C. (2016). A comparison of MIDAS and bridge equations. International Journal of Forecasting, 32(2), 257–270.
Schumacher, C., & Breitung, J. (2008). Real-time forecasting of German GDP based on a large factor model with monthly and quarterly data. International Journal of Forecasting, 24, 386–398.
Stakénas, J. (2012). Generating short-term forecasts of the Lithuanian GDP using factor models. In Bank of Lithuania, Working Paper Series 13.
Stock, J. H., & Watson, M. W. (2002a). Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics, 20, 147–162.
Stock, J. H., & Watson, M. W. (2002b). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460), 1167–1179.
Stock, J. H., & Watson, M. W. (2005a). Implications of dynamic factor models for VAR analysis. National Bureau of Economic Research, Working Paper No. 11467.
Stock, J. H., & Watson, M. W. (2005b). An empirical comparison of methods for forecasting using many predictors. Princeton University, Working paper.
Van Nieuwenhuyze, C. (2005). A generalised dynamic factor model for the Belgian economy: Identification of the business cycle and GDP growth forecasts. Journal of Business Cycle Measurement and Analysis, 2(2), 213–247.
White, H. (2000). A reality check for data snooping. Econometrica, 68, 1097–1126.
Acknowledgements
I would like to thank the editor Professor Rafael Lalive and the two anonymous reviewers for their helpful comments and suggestions. Their insights have improved the quality of the paper substantially. I would also like to thank Eleftheria Kafousaki, Thanos Petralias, Stelios Panagiotou, Dimitris Malliaropoulos, and Zacharias Bragoudakis for their useful comments. The views expressed in this paper are those of the author and not necessarily those of either the Bank of Greece or the Eurosystem.
Funding
The research has not been funded.
Author information
Contributions
S.D. is a Full Professor of Statistics and Econometrics at the Department of Economics and Regional Development at the Panteion University of Social and Political Sciences. He holds a Ph.D. in Statistics from Athens University of Economics and Business. His research interests revolve mainly around financial and economic forecasting and energy economics. His research has received funding from several sources, including FP7, Horizon 2020, and the European Commission. He has also served as a consultant for the US Energy Information Administration, the Economic Chamber of Greece, and the Bank of Greece, among others. The author read and approved the final manuscript.
Ethics declarations
Competing interests
The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
Article  Country  Period  Technique  Explanatory variables  Results  

Angelini et al (2011), Econometrics Journal  Euro Area  1999Q1 2007Q2  Pools of bridge equations and the ‘bridging with factors’ approach proposed by Giannone et al. (2008) for the backcast, nowcast and shortterm forecast of euro area quarterly GDP growth  85 macroeconomic time series  The factor model improves upon the pool of bridge equations  
Angelini et al. (2008), ECB  Euro Area  1993Q1 2006Q2  A dynamic factor model based on Doz et al. (2011), which differs from other approaches (e.g., Stock & Watson, 2002a; Forni et al., 2000)  85 macroeconomic time series  For GDP and a number of components, the factor model forecasts beat the forecasts from alternative model such as quarterly models and bridge equations  
Antipa et al., (2012), Journal of Policy Modeling  Germany  1993Q1 2007Q4  Comparing the BMs and DFMs with a rolling forecast exercise in order to assess the forecasting performance  Forecast errors of the BMs are smaller than those of the DFMs  
Artis et al., (2005), Journal of Forecasting  UK  1970Q1 1998Q3  Dynamic factor model  81 macroeconomic time series  6 factors explain about 50% of the variability of 81 variables, the factors are related to groups of key variables, such as interest rates, price series, monetary aggregates, labor market variables and exchange rates  
Baffigi et al., (2004), International Journal of Forecasting  Euro Area, Germany, France, Italy  1980Q1 2002Q2  Bridge model against three types of benchmark models: univariate ARIMA, multivariate VAR and structural models  Macroeconomic indicators for each country  BM performance is always better than benchmark models, provided that at least some indicators are available over the forecasting horizon  
Bańbura and Rünstler (2011), International Journal of Forecasting  Euro Area  1993Q1 1996Q2  Dynamic factor model  32 real activity series, 22 survey series, 22 financial series  Both forecast weights and forecast precision measures attribute an important role to survey data, whereas real activity data obtain rather low weights, apart perhaps from the backcasts. Financial data provide complementary information to both real activity and survey data for nowcasts and onequarter ahead forecasts of GDP  
Banerjee and Marcellino (2006), International Journal of Forecasting  USA  1975Q1 2001Q4  Dynamic factor model  64 inflation indicators, 74 GDP growth indicators  All methods are systematically beaten by single indicator models both for inflation and GDP growth  
Barhoumi et al. (2008), ECB  Selected European countries and the Euro Area  1991m1 2006m6  Bridge model and dynamic factor model  More than one hundred series for each country  For the euro-area countries, models that exploit timely monthly releases fare better than quarterly models. Factor models, which exploit a large number of releases, generally do better than averages of bridge equations  
Bessec (2012), Banque de France  France  1990Q1 2010Q4  Dynamic factor model  French GDP growth and 96 predictors (surveys, indicators of real activity, monetary and financial variables)  Financial variables and survey variables are predominant at longer horizons, while the weight of real indicators increases at shorter ones. A pseudo real-time evaluation over the last decade shows a gain relative to factor models without preselection or with preselection made on the full dataset, at least for large horizons  
Boivin and Ng (2006), Journal of Econometrics  USA  1971Q1 1997Q4  A factor model, which focuses on the finite sample properties of the PC estimator in the presence of cross-section correlation in the idiosyncratic errors, which is a pervasive feature of the data  147 series as in Stock and Watson (2002a)  In a real-time forecasting exercise, factors extracted from as few as 40 prescreened series often yield satisfactory or even better results than using all 147 series  
D'Agostino et al. (2006), ECB  USA  1959m1 2003m12  Random walk model, univariate forecasts, factor augmented forecast, in which the univariate models are augmented with common factors extracted from the whole panel of series. Pooling of bivariate forecasts: for each variable the forecast is defined as the average of 130 forecasts obtained by augmenting the model with each of the remaining 130 variables in the data set  131 monthly time series  The ability to predict several measures of inflation and real activity has declined remarkably, relative to naive forecasts, since the mid-1980s. The informational advantage of the Fed and professional forecasters is limited to the 1970s and the beginning of the 1980s  
D'Agostino et al. (2012), OECD Journal of Business Cycle Measurement and Analysis  Ireland  1980Q1 1996Q4  Dynamic factor model that produces nowcasts and backcasts of Irish quarterly GDP  Panel dataset of 35 indicators  The mean squared forecast errors for both the nowcasts and the backcasts based on the DF model are considerably smaller than those of the benchmark model (average growth rate model)  
Dahl et al. (2009), International Journal of Forecasting  Denmark  Cyclical components factor model  172 monthly and 74 quarterly series  Cyclical components factor model improves the forecast accuracy substantially relative to the regular diffusion index model for four Danish macroeconomic variables  
Stock and Watson (2005a), NBER  USA  1959m1 2003m12  Static and dynamic factor models for VAR analysis  Monthly observations on 132 US macro time series  A large number of dynamic factors accounts for the movements in these data. Evidence against the VAR restrictions implied by the exact DFM. The data are well described by an approximate factor model but not an exact factor model. The structural FAVAR permits examination of overidentifying restrictions and diagnosis of modeling problems  
Stock and Watson (2005b), Working Paper  USA  1960m1 2003m12  This paper compares the empirical accuracy of forecast combination, model selection, dynamic factor model forecasts, Bayesian model averaging, empirical Bayes methods  131 monthly macro time series  The FAAR models and the principal component BMA models with small values of g put weight on a few of the principal components, resulting in more accurate forecasts  
Favero et al. (2008), Journal of Applied Econometrics  USA, Euro Area; DE/IT/FR/ES  1959m1 1998m12 (USA) 1982m1 1997m8 (Euro Area)  Static and dynamic factor models  146 (USA) and 105 (DE/IT/FR/ES) time series  Factor models produce useful instruments for the estimation of forwardlooking economic models. The DFM is more parsimonious than the static model, but the overall performance is similar  
De Mol et al. (2008), Journal of Econometrics  USA  1959m1 2003m12  Bayesian shrinkage as an alternative to PCA  131 macroeconomic variables (real and nominal variables, asset prices, surveys)  The forecasts provide a valid alternative to the PCA and are correlated with those obtained from the PCA. In addition, from an economic point of view, the results are not more interpretable than those of the PCA  
Doz et al. (2011), Journal of Econometrics  Euro Area  1993Q1 2006Q2  The parameters of a DFM are estimated using OLS on PC and given the estimates the factors are estimated using a Kalman smoother  Simulation study for the DGP and 85 macroeconomic time series  This approach improves the estimation of the factors for small values of n  
Giannone et al. (2008), Journal of Monetary Economics  USA  1982Q1 2005Q1  DFM using a twostep estimator for the factors: PCA followed by Kalman smoother  200 macroeconomic time series  Precision of the nowcast increases monotonically as new data become available  
Heij et al. (2008), International Journal of Forecasting  USA  1959m3 1998m12  Matched principal components regression (MPCR)  146 macroeconomic predictor variables, dataset of Stock and Watson (2002a)  A modified PCM is proposed in order to improve the forecasting ability compared to the PCR. The MPCR maximizes the variance of the predictors during the estimation interval  
Heij et al. (2011), International Journal of Forecasting  USA  1959m1 2009m5  An improved method for the construction of principal components in macroeconomic forecasting  10 leading indicators and 4 coincident indicators  The proposed modification leads, on average, to more accurate forecasts than previously used principal component regression methods  
Kuzin et al. (2011), International Journal of Forecasting  Euro Area  1992Q1 2008Q1, 1992m1 2008m6  Comparison between mixed-data sampling (Midas) and mixed-frequency VAR (MF-VAR) approaches  20 monthly indicators from four main categories: industrial production, surveys, interest rates, exchange rates and money stocks, raw material prices and car registrations  Forecasting performance does not result in a clear winner. For short-term horizons AR-Midas performs better than Midas and MF-VAR, whereas for longer-term horizons MF-VAR outperforms the other two  
Schumacher (2010), Economics Letters  Germany  1980Q3 2004Q4  Large factor model; factors are estimated by PC; targeted predictors  531 variables: 123 quarterly indicators and data covering EA and G7 countries  International data improve forecasts only in the case that variables are preselected by LARS-EN (least-angle regression with elastic net)  
Schumacher and Breitung (2008), International Journal of Forecasting  Germany  1991Q2 2005Q1, 1991m4 2004m12  Factors are estimated applying an EM algorithm combined with a PC estimator  52 time series: 39 monthly series and 13 quarterly series  The mixed-frequency factor model performs slightly better in comparison with the balanced data factor models. The difference is more pronounced once the real-time factor model is compared to simple benchmark models  
den Reijer (2005), De Nederlandsche Bank  The Netherlands  1980Q1 2002Q4, GDP growth forecasts up to 8 quarters ahead  Large-scale factor model based on the static approach of Stock and Watson (2002a) and the dynamic approach of Forni et al. (2000)  270 series underlying the Central Bank's macroeconomic structural model supplemented with leading indicator variables. Subset of 170 series  Full data sample: the factor models do not outperform the AR benchmark model. Data subsample: The forecasting performance of the factor models improves. The dynamic factor model systematically outperforms the AR benchmark model  
Stakénas (2012), Lietuvos Bankas  Lithuania  1996Q1 2011Q3, 2000Q2 2011Q1 for forecast evaluation  Principal components, generalized principal components and the state space model  52 monthly indicators: survey, industry production, trade, price, financial variables, etc.  Factor models perform better than naïve benchmark models. The small-scale factor model (5 variables) outperforms the large-scale model comprising the whole dataset  
Peña and Poncela (2004), Journal of Econometrics  European OECD countries; Belgium, France, Italy, the Netherlands, Spain  Annual real GNP 1949–1997. After 1981 forecasts were generated  A dynamic factor model with a common trend and a common AR(1) stationary factor  The factor model provides substantial improvement in forecasts with respect to both univariate and shrinkage univariate forecasts  
Iacoviello (2001), IMF  Italy  1985Q2–2000Q2, forecasts from 1996Q2  Indicator approach: bridge model (short-term forecasting). Econometric approach: Bayesian VAR (longer-term forecasting)  Bridge model: ind. prod. index, coincident survey indicator, leading survey indicator. BVAR model: real household cons., t-bill rate, coincident survey ind., exchange rate, CPI, German GDP  Based on forecasting performance, both models are useful tools  
Stock and Watson (2002b), Journal of the American Statistical Association  USA  1959m1 1998m12, 12-month-ahead forecasts 1970m1 1997m12  Principal components, factor model, univariate AR, VAR, leading indicator model, AR-augmented PCM  149 monthly macroeconomic variables  The factor models offer substantial improvement stemming mainly from the first two or three factors. The leading indicator and the VAR models perform slightly better than the univariate AR 
Appendix B
Suppose we are in the 3rd month of the current quarter. Given that the balance of payments is published with a lag of 2 months, i.e., \({\varvec{I}}_{m} = \left\{ {x_{m-2}, pmi_{m-1}^{\left( {exp} \right)}} \right\}\), the real-time nowcast of \(x_{m}\) for the 3rd month of the quarter equals:
The real-time nowcast of \({x}_{m}\) for the 2nd month of the current quarter is:
For the 1st month of the quarter, \({x}_{m}\) has already been published.
When we are in the 2nd month of the quarter, the real-time nowcast of \({x}_{m}\) for the 3rd month of the quarter equals:
The real-time nowcast of \(x_{m}\) for the 2nd month of the quarter is computed as:
For the 1st month of the quarter, \(x_{m}\) is estimated as:
When we are in the 1st month of the quarter, the real-time nowcast of \({x}_{m}\) for the 3rd month of the quarter is estimated by the first-order autoregressive model for \(\left(1-L\right)\log\left({x}_{m}\right)\), as the \({pmi}^{\left(exp\right)}\) for the 3rd month has not been published. Thus:
For the 2nd month of the quarter, \({x}_{m}\) is estimated, based on \({{\varvec{I}}}_{m}=\left\{{x}_{m-2},{pmi}_{m-2}^{\left(exp\right)}\right\}\), as:
For the 1st month of the quarter, \({x}_{m}\) is estimated as:
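The autoregressive fallback described in this appendix, used when the \({pmi}^{\left(exp\right)}\) for the target month has not yet been published, can be illustrated with a minimal Python sketch. It is not the paper's estimated equation: it assumes a plain AR(1) with intercept fitted by OLS to \(\left(1-L\right)\log\left({x}_{m}\right)\) and iterated forward from the last published value. The function names `fit_ar1` and `nowcast_levels` and the argument `steps` (the number of months between the last published observation and the target month) are hypothetical names introduced for the illustration.

```python
import numpy as np


def fit_ar1(dlog):
    """OLS fit of dlog_t = c + phi * dlog_{t-1} + e_t (assumed specification)."""
    dlog = np.asarray(dlog, dtype=float)
    y = dlog[1:]
    X = np.column_stack([np.ones(len(y)), dlog[:-1]])
    c, phi = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(c), float(phi)


def nowcast_levels(x, steps):
    """Extend the monthly level series x by `steps` months.

    The AR(1) is fitted on the log-differences (1 - L) log(x) and iterated
    forward; each predicted growth rate is cumulated onto the last published
    log level and mapped back to levels.
    """
    logx = np.log(np.asarray(x, dtype=float))
    dlog = np.diff(logx)
    c, phi = fit_ar1(dlog)
    last_log, last_d = logx[-1], dlog[-1]
    path = []
    for _ in range(steps):
        last_d = c + phi * last_d      # one-step-ahead growth forecast
        last_log = last_log + last_d   # cumulate growth into the log level
        path.append(float(np.exp(last_log)))
    return path
```

With a 2-month publication lag, for example, `nowcast_levels(x, steps=2)` would return values for the two unpublished months that follow the last observation in `x`. Where the PMI is available for the target month, the appendix uses it as an additional regressor, which this sketch deliberately leaves out.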
Appendix C
The estimated coefficients (on the LHS) and their p-values (on the RHS) for private consumption, based on Eqs. 5 and 6. At the quarterly (monthly) frequency, the estimated parameters refer to the current quarter (2nd month of the current quarter), based on the information available in the 3rd month of the current quarter.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Degiannakis, S. The D-model for GDP nowcasting. Swiss J Economics Statistics 159, 7 (2023). https://doi.org/10.1186/s41937-023-00109-8