The iteration with the lowest root mean square error (RMSE) is chosen and denoted as $\hat{H}_s^{r^*}$. Typically,
$r^*$ is around 4. $H_s(t=0, m) = 0$ is assumed when applying Eq. (19) to simulate $H_s$. One important assumption in regression analysis is that the residuals ($\varepsilon(t) = H_s(t) - \hat{H}_s(t)$ in this case) are Gaussian distributed. This assumption is violated here because, in theory, the $H_s(t)$ data are non-negative and therefore non-Gaussian. Such a violation could degrade the model performance and even produce nonsensical values such as $\hat{H}_s < 0$. To evaluate the effects of violating the Gaussian assumption, and to improve the model performance, we explore two options for transforming the positive data (both $G$ and $H_s$ are positive):
(i) the log transformation (denoted as $tr_{ln}$ in Table 4), which has been used by others (e.g. Casas-Prat and Sierra, 2010 and Ortego et al., 2012); and (ii) the Box–Cox power transformation (denoted as $tr_{bc}$ in Table 4 and Eq. (21)) (Sakia, 1992), which includes the log transformation as a special case ($\lambda = 0$) and has recently been applied by Wang et al. (2012):

\[
tr_{bc}(X) =
\begin{cases}
\ln(X) & \text{if } \lambda = 0, \\
(X^{\lambda} - 1)/\lambda & \text{otherwise,}
\end{cases}
\tag{21}
\]

where $X$ denotes a variable of positive values. The parameter $\lambda$ is chosen so that the departure of the transformed variable $tr_{bc}(X)$ from a Gaussian distribution is minimized. As detailed in Table 4 (Settings 6–8), we apply these transformations to the predictand ($H_s$) alone, and to both $H_s$ and the non-Gaussian predictor $G$ (before calculating the anomalies and deriving the principal components, but after calculating the direction of the SLP gradient). The resulting model performance is compared in Section 5. The statistical model is calibrated
and validated with HIPOCAS data (1958–2001) (see Section 3.1), which are split into two non-overlapping subsets: 1971–2000 for model calibration and 1958–1970 for evaluation of model performance. We use the HIPOCAS data for the calibration period (1971–2000) to estimate the unknown parameters in Eq. (2), including $\hat{a}$, $\hat{a}_P$, $\hat{a}_G$, $\hat{a}_{EOF+,i}$, $\hat{a}_{EOF-,i}$ and $\hat{\alpha}^{r^*}$ (see Eqs. (2), (15) and (19) and Fig. 5). This 30-year period is also chosen as the baseline period from which the climate-model-simulated baseline climate is derived and used to infer projected future changes in $H_s$ (see Section 3.2). We then use the HIPOCAS data for the validation period (1958–1970) to evaluate the performance of the calibrated statistical model. The validation considers three aspects: (i) overall model performance, (ii) model skill for a range of different quantiles of wave heights, and (iii) model errors in modeling waves along the Catalan coast. Note that all anomalies in this study are relative to the climatological mean field of the baseline period (1971–2000).
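
To make the Box–Cox transformation of Eq. (21) and the calibration/validation split concrete, the following is a minimal Python sketch, not the implementation used in this study. The function name `trbc` follows the notation of Eq. (21); `inv_trbc`, `profile_loglik`, `choose_lambda`, `split_by_year` and `rmse` are hypothetical helper names introduced for illustration. The text does not specify how the departure from a Gaussian distribution is measured, so the sketch assumes the Box–Cox profile log-likelihood, a common criterion, evaluated on a user-supplied grid of $\lambda$ values. The demonstration data are synthetic, not HIPOCAS.

```python
import math
import random


def trbc(x, lam):
    """Box-Cox transform of Eq. (21) for a single positive value x."""
    if x <= 0:
        raise ValueError("the Box-Cox transform requires positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inv_trbc(y, lam):
    """Back-transform a value from the transformed scale to the original scale."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


def profile_loglik(xs, lam):
    """Box-Cox profile log-likelihood (up to a constant) for a given lambda.

    Assumption: this is the criterion used to measure closeness to a
    Gaussian distribution; the source text does not name one.
    """
    n = len(xs)
    ys = [trbc(x, lam) for x in xs]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in xs)


def choose_lambda(xs, grid):
    """Return the grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(xs, lam))


def split_by_year(records, cal=(1971, 2000), val=(1958, 1970)):
    """Split (year, value) records into calibration and validation subsets."""
    cal_set = [r for r in records if cal[0] <= r[0] <= cal[1]]
    val_set = [r for r in records if val[0] <= r[0] <= val[1]]
    return cal_set, val_set


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic positive "wave heights" (log-normal), one value per year.
    records = [(year, math.exp(random.gauss(0.0, 0.5))) for year in range(1958, 2002)]
    cal_set, val_set = split_by_year(records)
    grid = [i / 10 for i in range(-10, 11)]
    # Lambda is estimated from the calibration subset only.
    lam = choose_lambda([v for _, v in cal_set], grid)
    print("calibration years:", len(cal_set), "validation years:", len(val_set))
    print("selected lambda:", lam)
```

In this sketch $\lambda$ is estimated from the calibration subset alone, so the validation subset stays independent of every fitted quantity. Predictions made on the transformed scale would be mapped back with `inv_trbc` before computing `rmse` against the observed values.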