iForest - Biogeosciences and Forestry
vol. 10, pp. 739-745
Copyright © 2017 by the Italian Society of Silviculture and Forest Ecology
doi: 10.3832/ifor2427-010

Research Articles

Sampling strategies for high quality time-series of climatic variables in forest resource assessment

Carlotta Ferrara (1), Maurizio Marchi (1, corresponding author), Silvano Fares (1), Luca Salvati (2)


Introduction

Periodic evaluation of meteorological parameters at forest sites is central to quantifying forest response to climate conditions and to calibrating future forest management guidelines ([35], [8]). In recent decades, climatology has gained increasing attention in forestry because of its direct effects on forest resources ([23], [31], [12], [22]), which in turn are fundamental to preserving ecosystem services, biodiversity and multi-functionality through sustainable forest management ([9], [18], [28]). Ecological studies generally require, as a minimum, long-term time series of meteorological variables collected in the environment under investigation, or at least at forest sites as close as possible to it. For this purpose, many meteorological networks have been set up in connection with research activities and are currently maintained by public programs. This institutional commitment implies that the denser a meteorological network is, the greater the financial effort, in terms of personnel and operating costs, required to ensure adequately consistent data. The ICP-Forests monitoring network was established in 1985 under the Convention on Long-range Transboundary Air Pollution (CLRTAP) of the United Nations Economic Commission for Europe (UNECE) with the intent of monitoring the effects of climate change and human-related pressure on forest ecosystems.

A robust quality check to ensure the absence of missing periods and storage errors is always mandatory, and several statistical techniques have been adopted to fill gaps in climatic data series ([11], [14], [36]). The performance of such methods depends more on data availability than on their complexity. When climatic surfaces are interpolated, a key role is played by the representativeness of the meteorological network, which is often masked by a regular spatial coverage that does not account for all the physiographic features of the area, although these play a major role in climate variability ([5], [20]).

Thanks to the spread of web-based knowledge and data storage, many climatic datasets are freely available, generally as global datasets with, in many cases, a spatial grid of approximately 1 km cell size. Even assuming that statistical downscaling and interpolation are correct for a given site of interest, the representativeness of the resulting dataset requires careful evaluation. To this end, linear regression methods have been adopted, comparing trends of interpolated data with the real data collected in situ ([2], [19]). Such simple methods use the amount of explained variance and the p-value of the slope parameter as diagnostic indexes of representativeness.

In this paper, meteorological time series from 13 monitoring sites in Italy were analysed with the aim of defining the minimum number of valid records needed for a series to be considered representative and adequate for further analysis. Three climatic parameters (air temperature, air humidity and precipitation), collected between 1998 and 2013, were extracted from the Italian ICP-Forests meteorological network. An increasing number of records was then progressively removed using a bootstrap procedure, and the representativeness of the remaining records was evaluated in terms of error of estimation and explained variance over the whole climatic period.

Material and methods 

The climatic data studied in this research were collected over a 16-year period (1998-2013) from the Italian monitoring network, represented by 13 test sites distributed across the whole country (Fig. 1). This network was designed in the framework of the ICP-Forests monitoring program, which is probably one of the most important sources of information for forest researchers at the European level ([1]). As in many other European countries, the ICP-Forests monitoring network in Italy is structured in two levels of detail. The extensive network (LEVEL I) is much denser and was established in 1985 with 243 plots across the whole forested area of Italy. The intensive network (LEVEL II) is more recent: it was designed in 1995 under the “National Integrated Programme for Forest Ecosystems Monitoring” (CONECOFOR) with a non-probabilistic scheme and implemented between 1999 and 2003. LEVEL II sites are designed to collect intensive, long-term forestry-related data such as local meteorology, deposition, crown condition, foliar chemistry, wood increment and carbon storage. The most representative sites selected are Holm oak forests, flood-plain forests, Norway spruce stands, and beech and European larch forests ([4], [21]). All the structural forest types (i.e., high forests, stored coppices, transitory crops) are well-preserved mature stands, in line with the purpose of ICP-Forests, which is to investigate climate change and human impacts on forests.

Fig. 1 - Geographic distribution of the ICP-Forests plots. The plots included in this study are marked with an asterisk.

One of the main features of the ICP-Forests network is the availability of local meteorological data, with a meteorological station located in each ICP-Forests plot. The ICP-Forests meteorological network in Italy, one of the most climatically heterogeneous countries in Europe, recorded mean annual temperatures ranging between -0.9 °C and 15.1 °C and total annual precipitation between 595 mm and 1528 mm. The variables, monitored continuously at a 10-second resolution, are air temperature at 0.1 m and 2 m (AT01 and AT2), relative humidity at 0.1 m and 2 m (RH01 and RH2), precipitation (PR) and snow height (SH). Each meteorological station is equipped with a power unit and a data logger. Climatic variables are stored in a database aggregated to a temporal resolution of 1 day. All the measurements were taken below the canopy, and the main features of each location are reported in Tab. 1.

Tab. 1 - Selected climatic variables in the study area. (Tmean): mean temperature; (Tdiff): temperature difference; (Py): annual precipitation; (Ps): summer precipitation; (DD5): degree days > 5 °C; (HCMD): Hargreaves climatic moisture deficit.

In this work, the period between 1 January 1998 and 31 December 2013 was analysed with a 1-day sampling interval, for a total of 5844 records. As a first step, we determined the proportion of missing data over the entire time series for each meteorological station, considering AT2 and RH2 representative of AT01 and RH01, respectively (Tab. 2).
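The record count and the missing-data proportion described above can be checked with a few lines of code. The following Python sketch is illustrative only (the authors worked in R); the function names and the toy series are hypothetical, and gaps are assumed to be encoded as `None`.

```python
from datetime import date

def n_daily_records(start: date, end: date) -> int:
    """Number of daily records between two dates, both included."""
    return (end - start).days + 1

def missing_fraction(series):
    """Share of missing entries (encoded as None) in a daily series."""
    if not series:
        raise ValueError("empty series")
    return sum(v is None for v in series) / len(series)

# Full study period: 16 years with four leap days (2000, 2004, 2008, 2012)
total = n_daily_records(date(1998, 1, 1), date(2013, 12, 31))
print(total)  # 5844

# Hypothetical toy series: 2 gaps out of 8 daily values
toy = [3.1, None, 2.8, 4.0, None, 5.2, 4.9, 3.3]
print(missing_fraction(toy))  # 0.25
```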

Tab. 2 - Characteristics of the climatic time series between January 1998 and December 2013. (AT01, AT2): air temperature at 0.1 m and 2 m, respectively; (RH01, RH2): relative humidity at 0.1 m and 2 m, respectively; (PR): precipitation.

Two statistical analyses were performed: (1) evaluating the error of estimation at seasonal level when an increasing proportion of the available data is used, by means of the Mean Absolute Relative Error (MARE); and (2) evaluating the amount of explained variance (r²) over the whole analysed period (1998-2013), estimating the monthly mean value with the same procedure. The MARE was calculated as (eqn. 1):

\begin{equation} MARE={\frac{ \left | \hat{ \gamma } - \gamma \right | } { \gamma }} \end{equation}

where γ̂ and γ represent the estimated and the observed seasonal value of AT2, RH2 or PR, respectively. For each sampling proportion from 1% to 99%, in steps of 1% (i.e., 98 repetitions in total), a single value of MARE was calculated for each climatic variable. To avoid biases, a bootstrap procedure was implemented by repeating a random extraction of seasonal values. For each sampling proportion between 1% and 99%, 10,000 random extractions were computed. The final MARE was then obtained by averaging the 10,000 repetitions. To facilitate the evaluation, daily records were grouped into three-month seasons (season I: January-March; season II: April-June; season III: July-September; season IV: October-December). For the second analysis, we adopted a linear regression approach. A linear model was fitted using the estimated monthly average values (γ’) as independent variable and the observed values (γ) as dependent variable. The amount of explained variance (r² of the fitted model) was then calculated as follows (eqn. 2):
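The bootstrap estimate of the MARE can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: the function names, the synthetic season of 91 days, the default of 1,000 repetitions (the paper used 10,000) and the use of the absolute value in the denominator (to keep the error non-negative for sub-zero means) are all choices made here for the example.

```python
import random
from statistics import mean

def mare(estimated: float, observed: float) -> float:
    """Absolute relative error of one estimate, following eqn. 1.
    abs() in the denominator is an assumption of this sketch."""
    return abs(estimated - observed) / abs(observed)

def bootstrap_mare(daily, proportion, n_rep=1000, seed=0):
    """Average MARE of the seasonal mean estimated from random subsamples.

    daily      -- list of daily values for one season (no gaps)
    proportion -- share of days retained, between 0 and 1
    n_rep      -- number of random extractions (10,000 in the paper)
    """
    rng = random.Random(seed)
    observed = mean(daily)
    k = max(1, round(proportion * len(daily)))
    errors = []
    for _ in range(n_rep):
        sample = rng.sample(daily, k)  # extraction without replacement
        errors.append(mare(mean(sample), observed))
    return mean(errors)

# Synthetic season: 91 days around 15 °C (hypothetical values)
rng = random.Random(42)
season = [15 + rng.gauss(0, 3) for _ in range(91)]
for p in (0.1, 0.5, 0.9):
    print(p, round(bootstrap_mare(season, p), 4))
```

Sampling without replacement mirrors the idea of removing records from an existing series; the error shrinks towards zero as the retained proportion approaches 100%.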

\begin{equation} r^2={\frac{ \sum \left (\hat{ \gamma } - \bar{ \gamma } \right )^2} { \sum \left ( \gamma - \bar{ \gamma } \right )^2 }} \end{equation}

where γ̂ is the value predicted by the linear model. Finally, all the results (MARE and r² for each sampling intensity) were modelled using a Random Forest (RF) regression model ([6]) to assess the influence of each variable (i.e., month, season, year, climatic variable, sampling proportion) and to identify the most relevant drivers. The importance of each predictor was estimated by permuting Out-Of-Bag (OOB) data: for each tree, the prediction error (Mean Square Error) on the OOB portion of the data was recorded, and the same procedure was then repeated after permuting each predictor variable. The difference between the two values was then averaged over all trees and normalized by the standard deviation of the differences ([14]). All the statistical analyses and the RF were implemented in R ([24]).
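As an illustration of eqn. 2, the explained variance of a one-predictor linear model can be computed directly from the regression and total sums of squares. The Python sketch below is hypothetical (function names and the six monthly values are invented for the example) and is not the authors' R implementation.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def explained_variance(x, y):
    """r2 as in eqn. 2: regression sum of squares over total sum of squares."""
    a, b = fit_line(x, y)
    my = mean(y)
    predicted = [a + b * xi for xi in x]
    ss_reg = sum((p - my) ** 2 for p in predicted)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return ss_reg / ss_tot

# Hypothetical monthly means (°C): estimated from a subsample vs. observed
estimated = [2.0, 4.5, 9.0, 13.5, 18.0, 21.0]
observed = [1.5, 5.0, 8.5, 14.0, 17.5, 21.5]
print(round(explained_variance(estimated, observed), 3))
```

The estimated values play the role of the independent variable and the observed values that of the dependent variable, as in the text.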


Results

Fig. 2 shows the MARE of AT2, RH2 and PR plotted against the sampling proportion used to estimate the seasonal mean. While an almost uniform and small MARE was calculated across all the sites (plots) for seasons II and III, a very different result was obtained for seasons I and IV. For these seasons, the bootstrap procedure for AT2 resulted in a MARE of around 500% with a sampling proportion of 60% of the total seasonal days. For season IV in particular, an almost flat line was observed for RH2 and PR, in contrast with a very unstable AT2. We conclude that, among the three variables tested in this study, AT2 showed the lowest seasonal accuracy, although this could be due to the presence of outliers.

Fig. 2 - Relationship between the MARE and the increasing proportion of days sampled with the seasonal analysis for the three studied variables

Regarding the representativeness across the whole time period, different results were obtained for each variable. Results for the three variables are reported in Fig. 3, where observations are fitted using a negative exponential model, with an r² of 0.98, 0.94 and 0.93 for AT2, RH2 and PR, respectively. The relationship with the sampling proportion showed that AT2 was the variable with the highest amount of explained variance (higher than 85%) at a sampling intensity of 10%. Conversely, PR showed a much more regular trend with an almost linear shape. RH2 showed an exponential increase up to a sampling proportion of 50%.

Fig. 3 - Amount of climatic variability explained when an increasing proportion of monthly values (days) is sampled in the considered climatic period (1998-2013). The red line and the blue line correspond to the average value and the fitted exponential model.

The Random Forest algorithm (RF) was run on the results obtained. In a first test, the MARE was modelled as a combination of all the covariates. In the second test, the amount of explained variance was modelled without the “season” term. The importance of each predictor is reported in Tab. 3. In the seasonal analysis, the sampling proportion (i.e., the number of available observations) was found to be less important than the other variables, which were well balanced and not significantly different from each other, with the RF explaining 62% of the total variability. On the other hand, the RF run on the results of the trend analysis showed that sampling proportion and the climatic variable studied were the most relevant predictors, much more important than the meteorological station and the year, with an amount of explained variance of 92.3%.

Tab. 3 - Variables importance (% increment of MSE) when used as predictors in a Random Forest regression model. (n/i): not included.
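The permutation step behind the importance values in Tab. 3 can be illustrated without fitting an actual forest. The Python sketch below is not a Random Forest: a toy predictor that depends on one feature only stands in for a single tree, and repeated shuffles of the same data stand in for the per-tree OOB samples. All names and data are hypothetical.

```python
import random
from statistics import mean, stdev

def mse(predict, rows, targets):
    """Mean squared error of a predictor on a set of rows."""
    return mean((predict(r) - t) ** 2 for r, t in zip(rows, targets))

def mse_increase(predict, rows, targets, col, rng):
    """MSE after shuffling column `col` minus MSE on the intact data."""
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
    return mse(predict, permuted, targets) - mse(predict, rows, targets)

def importance(diffs):
    """Mean of the differences, scaled by their standard deviation."""
    sd = stdev(diffs)
    return mean(diffs) / sd if sd > 0 else 0.0

def predict(row):
    """Toy stand-in for one tree: uses feature 0, ignores feature 1."""
    return 2.0 * row[0]

rng = random.Random(1)
rows = [(rng.random(), rng.random()) for _ in range(200)]
targets = [2.0 * r[0] for r in rows]

# Repeated shuffles mimic the per-tree differences of a real forest
diffs_used = [mse_increase(predict, rows, targets, 0, rng) for _ in range(25)]
diffs_unused = [mse_increase(predict, rows, targets, 1, rng) for _ in range(25)]
print(importance(diffs_used) > importance(diffs_unused))
```

Shuffling a feature the predictor relies on raises the error, whereas shuffling an ignored feature leaves it unchanged, which is the contrast that an importance ranking summarises.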


Discussion

In a changing environment, stable and reliable climatic data are a fundamental resource for investigating the adaptability of forest systems. Many environmental and modelling studies focus on long-term climatic averages (e.g., climatic normals over a 30-year period) to determine the effects of climate change on forest populations ([15], [22]). However, a higher degree of representativeness is needed in studies that require a higher temporal resolution, such as dendrochronology ([2], [19]), or studies based on seasonal effects on plant growth ([26], [29], [16]). The more detailed the analysis is, the more accurate and continuous the database should be. In such cases, missing data and outliers have to be identified and treated before further analysis in order to avoid possible biases. The presence of missing values in a dataset can heavily affect any kind of analysis ([30]), leading to false expectations or to an underestimation of natural processes. To improve such datasets at regional level, statistical downscaling with lapse-rate regressions ([25], [13], [27]) or statistical methods such as Singular Value Decomposition ([7], [33]) may represent the only alternative to the interpolation of local climatic data from monitoring networks. When long time series are needed, data collected from web portals such as the CRU database (https://crudata.uea.ac.uk/cru/data/hrg/), custom queries using stand-alone software ([34]) or object-oriented web portals (e.g., http://climexp.knmi.nl) may represent the only solution. Our findings highlight that, for monthly or seasonal average values, very few records can still allow an accurate analysis of climate trends. Especially in the case of PR and RH2, a very small MARE was detected with a limited amount of measurements (<10%).

AT2 showed a much wider difference between seasons, although we ascribe this to the presence of outliers, especially at ABR1 and FRI2, where the mean monthly air temperature is close to zero. In particular, while a similar standard deviation was found for almost all the plots in the same months, the coefficient of variation of AT2 at FRI2 was much higher than at any other meteorological station. As a consequence, the high MARE of AT2 was mainly due to the small mean values in seasons I and IV, in which most months are characterized by a low mean temperature. The results suggest that data collection deserves more attention during the cold seasons, given that almost all meteorological stations, across almost the whole of Italy, recorded high air temperatures at least during seasons II and III. Under future climate change, a fundamental role will be played by extreme events, recognizable as a sharp peak or drop preceded and followed by an almost stable climatic situation. Indeed, extreme events have been shown to play a fundamental role for forest systems and plant communities even in Mediterranean areas ([10], [17], [3], [32]), and such rare and unpredictable events will require even more accurate and gap-free time series to understand their effects on forest ecosystems.
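The sensitivity of a relative error to a mean close to zero can be seen with simple arithmetic. In the Python sketch below, the same hypothetical absolute error of 0.5 °C is applied to a cold-season mean of 0.1 °C and to a warm-season mean of 15 °C; the values are invented for illustration and are not taken from the dataset.

```python
def relative_error(estimated: float, observed: float) -> float:
    """Absolute relative error, as in eqn. 1 (non-zero observed value)."""
    return abs(estimated - observed) / abs(observed)

abs_error = 0.5  # °C, identical absolute error in both cases (hypothetical)
for observed in (0.1, 15.0):
    estimated = observed + abs_error
    err = relative_error(estimated, observed)
    print(f"mean {observed} °C -> relative error {err:.0%}")
# mean 0.1 °C -> relative error 500%
# mean 15.0 °C -> relative error 3%
```

An identical measurement error therefore yields a relative error two orders of magnitude larger when the seasonal mean approaches 0 °C.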

Although trend analysis showed that AT2 was the most season-dependent variable, only a small amount of data is required for its monthly representativeness. By contrast, the trend analysis showed that PR requires a consistent database (> 80% of valid data) to avoid relevant biases. RH2 showed an intermediate behaviour between AT2 and PR, with an exponential increase in the amount of explained variability as the proportion of sampled days increased. This may be explained by the intrinsic temporal autocorrelation of such climatic parameters. Indeed, air temperature values are much more autocorrelated than relative humidity and precipitation.
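Temporal autocorrelation can be quantified, for instance, with the lag-1 autocorrelation coefficient. The Python sketch below compares a persistent synthetic series with white noise; both series are generated for illustration (an AR(1) process with a coefficient of 0.9 is an assumption of this example) and do not reproduce the behaviour of the measured variables.

```python
import random
from statistics import mean

def lag1_autocorr(x):
    """Lag-1 autocorrelation coefficient of a series."""
    m = mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den

rng = random.Random(7)

# Persistent series: each value carries over 90% of the previous one
persistent = [0.0]
for _ in range(999):
    persistent.append(0.9 * persistent[-1] + rng.gauss(0, 1))

# Uncorrelated series: independent draws
noise = [rng.gauss(0, 1) for _ in range(1000)]

print(round(lag1_autocorr(persistent), 2), round(lag1_autocorr(noise), 2))
```

In a strongly autocorrelated series, neighbouring days carry largely redundant information, which is consistent with a monthly mean being recoverable from a small share of the days.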

In conclusion, our results may be relevant both for local analyses (forest type - local climate) and for site comparisons. Indeed, much effort has been devoted to homogenizing climate time series across sites by means of spatial interpolation with external climate data ([36]), which exposes the analysis to the risk of adopting unrealistic values. We showed instead that, when minimum requirements are met (i.e., number of records per unit of time), the use of external databases may be avoided. This, of course, still implies that a rigorous check for outliers should be performed. When monthly or seasonal accuracy is required for trend analysis, a high proportion of missing values can be accepted for AT2 and RH2, but not for PR, owing to the intra-seasonal or intra-monthly variability of the latter. Although beyond the scope of this study, when a higher resolution is required (daily or weekly), gap filling with methods such as Singular Value Decomposition or with external interpolated data may represent the only feasible solution.


Conclusions

Time series represent the evolution over time of the dynamic meteorological process and are fundamental for evaluating the patterns and responses of forest species to climate change. Accounting for forest response to climate is functional to sustainable forest management. When monthly or seasonal average values are needed, our results indicate that statistics can be reliably estimated for both air temperature and relative humidity even when the proportion of missing values exceeds 50%. Conversely, the intra-seasonal or intra-monthly variability of precipitation requires a higher density of observations. In this case, gap filling may represent the only way to avoid relevant biases. Emerging technologies have the potential to increase the robustness of the dataset through remote control of measured parameters via wireless systems, especially in remote areas where access is often costly and difficult, and particularly in cold seasons when, as shown in this study, a larger number of measurements is needed to ensure good data quality. Recent achievements in the framework of EU-funded projects such as SMART4Action showed that it is possible to save on station maintenance costs while keeping a high level of accuracy in the measured values.


Acknowledgements

The study was funded by the SMART4Action LIFE+ project “Sustainable Monitoring And Reporting To Inform Forest and Environmental Awareness and Protection” (LIFE13 ENV/IT/000813).


References

[1] Allegrini MC, Canullo R, Campetella G (2009). ICP-Forests (International Co-operative Programme on Assessment and Monitoring of Air Pollution Effects on Forests): quality assurance procedure in plant diversity monitoring. Journal of Environmental Monitoring 11: 782.
[2] Amodei T, Guibal F, Fady B (2012). Relationships between climate and radial growth in black pine (Pinus nigra Arnold ssp. salzmannii [Dunal] Franco) from the south of France. Annals of Forest Science 70: 41-47.
[3] Barros C, Guéguen M, Douzet R, Carboni M, Boulangeat I, Zimmermann NE, Munkemuller T, Thuiller W (2017). Extreme climate events counteract the effects of climate and land-use changes in Alpine tree lines. Journal of Applied Ecology 54: 39-50.
[4] Bertini G, Amoriello T, Fabbio G, Piovosi M (2011). Forest growth and climate change: evidences from the ICP-Forests intensive monitoring in Italy. iForest - Biogeosciences and Forestry 4: 262-267.
[5] Bhowmik AK, Costa AC (2014). Representativeness impacts on accuracy and precision of climate spatial interpolation in data-scarce regions. Meteorological Applications 22: 368-377.
[6] Breiman L (2001). Random forests. Machine Learning 45: 5-32.
[7] Bretherton CS, Smith C, Wallace JM (1992). An intercomparison of methods for finding coupled patterns in climate data. Journal of Climate 5: 541-560.
[8] Bussotti F, Pollastrini M (2017). Traditional and novel indicators of climate change impacts on European forest trees. Forests 8 (4): 137.
[9] Di Salvatore U, Ferretti F, Cantiani P, Paletto A, De Meo I, Chiavetta U (2013). Multifunctionality assessment in forest planning at landscape level. The study case of Matese Mountain Community (Italy). Annals of Silvicultural Research 37: 45-54.
[10] Eilmann B, Rigling A (2012). Tree-growth analyses to estimate tree species’ drought tolerance. Tree Physiology 32: 178-187.
[11] Eischeid JK, Pasteris PA, Diaz HF, Plantico MS, Lott NJ (2000). Creating a serially complete, national daily time series of temperature and precipitation for the Western United States. Journal of Applied Meteorology 39: 1580-1591.
[12] Fady B, Aravanopoulos FA, Alizoti P, Mátyás C, Wühlisch G, Westergren M, Belletti P, Cvjetkovic B, Ducci F, Huber G, Kelleher CT, Khaldi A, Kharrat MBD, Kraigher H, Kramer K, Mühlethaler U, Peric S, Perry A, Rousi M, Sbay H, Stojnic S, Tijardovic M, Tsvetkov I, Varela MC, Vendramin GG, Zlatanov T (2016). Evolution-based approach needed for the conservation and silviculture of peripheral forest tree populations. Forest Ecology and Management 375: 66-75.
[13] Flint LE, Flint AL (2012). Downscaling future climate scenarios to fine scales for hydrologic and ecological modeling and analysis. Ecological Processes 1: 2.
[14] Hastie T, Tibshirani R, Friedman J (2008). The elements of statistical learning (2nd edn). Springer-Verlag, Stanford, CA, USA, pp. 763.
[15] Isaac-Renton MG, Roberts DR, Hamann A, Spiecker H (2014). Douglas-fir plantations in Europe: a retrospective test of assisted migration to address climate change. Global Change Biology 20: 2607-2617.
[16] Kramer K, Ducousso A, Gömöry D, Hansen JK, Ionita L, Liesebach M, Lorent A, Schüler S, Sulkowska M, De Vries S, Von Wühlisch G (2017). Chilling and forcing requirements for foliage bud burst of European beech (Fagus sylvatica L.) differ between provenances and are phenotypically plastic. Agricultural and Forest Meteorology 234: 172-181.
[17] Lelieveld J, Hadjinicolaou P, Kostopoulou E, Chenoweth J, El Maayar M, Giannakopoulos C, Hannides C, Lange MA, Tanarhte M, Tyrlis E, Xoplaki E (2012). Climate change and impacts in the Eastern Mediterranean and the Middle East. Climatic Change 114: 667-687.
[18] Marchetti M, Vizzarri M, Lasserre B, Sallustio L, Tavone A (2014). Natural capital and bioeconomy: challenges and opportunities for forestry. Annals of Silvicultural Research 38: 62-73.
[19] Marchi M, Castaldi C, Merlini P, Nocentini S, Ducci F (2015). Stand structure and influence of climate on growth trends of a Marginal forest population of Pinus nigra spp. nigra. Annals of Silvicultural Research 39: 100-110.
[20] Marchi M, Chiavetta U, Castaldi C, Ducci F (2017a). Does complex always mean powerful? A comparison of eight methods for interpolation of climatic data in Mediterranean area. Italian Journal of Agrometeorology 1: 69-72.
[21] Marchi M, Ferrara C, Bertini G, Fares S, Salvati L (2017b). A sampling design strategy to reduce survey costs in forest monitoring. Ecological Indicators 81: 182-191.
[22] Marchi M, Nocentini S, Ducci F (2016). Future scenarios and conservation strategies for a rear-edge marginal population of Pinus nigra Arnold in Italian central Apennines. Forest Systems 25: e072.
[23] Perdinan P, Winkler JA (2014). Changing human landscapes under a changing climate: considerations for climate assessments. Environmental Management 53: 42-54.
[24] R Core Team (2017). R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
[25] Ramirez-Villegas J, Jarvis A (2010). Downscaling global circulation model outputs: the delta method. Decision and Policy Analysis Working Paper No. 1, CIAT Decision and Policy Analysis Working Paper, Policy Analysis 1: 1-18.
[26] Rathgeber CBK, Longuetaud F, Mothe F, Cuny H, Le Moguédec G (2011). Phenology of wood formation: data processing, analysis and visualisation using R (package CAVIAR). Dendrochronologia 29: 139-149.
[27] Ray D, Bathgate S, Moseley D, Taylor P, Nicoll B, Pizzirani S, Gardiner B (2015). Comparing the provision of ecosystem services in plantation forests under alternative climate change adaptation management options in Wales. Regional Environmental Change 15: 1501-1513.
[28] Salvati L, Becagli C, Bertini G, Cantiani P, Ferrara C, Fabbio G (2016). Toward sustainable forest management indicators? A data mining approach to evaluate the impact of silvicultural practices on stand structure. International Journal of Sustainable Development and World Ecology 24 (4): 372-382.
[29] Savi F, Fares S (2014). Ozone dynamics in a Mediterranean Holm oak forest: comparison among transition periods characterized by different amounts of precipitation. Annals of Silvicultural Research 38: 1-6.
[30] Schneider T (2001). Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values. Journal of Climate 14: 853-871.
[31] Schueler S, Falk W, Koskela J, Lefèvre F, Bozzano M, Hubert J, Kraigher H, Longauer R, Olrik DC (2014). Vulnerability of dynamic genetic conservation units of forest trees in Europe to climate change. Global Change Biology 20: 1498-1511.
[32] Ummenhofer CC, Meehl GA (2017). Extreme weather and climate events with ecological relevance - a review. Philosophical Transactions of the Royal Society B: Biological Sciences 372: 20160135.
[33] Wallace JM, Smith C, Bretherton CS (1992). Singular value decomposition of wintertime sea surface temperature and 500-mb height anomalies. Journal of Climate 5: 561-576.
[34] Wang T, Hamann A, Spittlehouse DL, Murdock TQ (2012). ClimateWNA - High-resolution spatial climate data for western North America. Journal of Applied Meteorology and Climatology 51: 16-29.
[35] Williams MI, Dumroese RK (2013). Preparing for climate change: forestry and assisted migration. Journal of Forestry 111: 287-297.
[36] Ziche D, Seidling W (2010). Homogenisation of climate time series from ICP Forests Level II monitoring sites in Germany based on interpolated climate data. Annals of Forest Science 67: 804.


Ferrara C, Marchi M, Fares S, Salvati L (2017).
Sampling strategies for high quality time-series of climatic variables in forest resource assessment
iForest - Biogeosciences and Forestry 10: 739-745. - doi: 10.3832/ifor2427-010