# Introduction

Continuous cover forestry (CCF), a range of silvicultural approaches involving uninterrupted maintenance of forest cover and avoidance of clearcutting ([34]), is becoming increasingly important worldwide ([39]). Under this approach, there is a focus on the use of natural regeneration to develop uneven-aged and mixed-species stands ([34]). Various models have been thus developed to predict the occurrence of such regeneration. The process was often split into two stages: (i) determining if regeneration is successfully occurring during the time interval studied; and if so (ii) defining the species composition and density of the established seedlings ([28]). For the first stage, logistic equations with binomial distribution are often calibrated on various stand and site characteristics to estimate the probability of regeneration occurrence in a forest plot, considered to have a binary status of absence or presence ([11], [19], [32], [40]). Stochastic approaches are common since forest regeneration, particularly in boreal and temperate regions, tends to be sporadic ([28]). Then, the species composition and density are defined using different statistical approaches, often based on the Weibull or Poisson distribution.

Sitka spruce (*Picea sitchensis* Bong. Carr.) is a prolific seed producer with abundant natural regeneration after clear-cutting both in its natural range ([33]) and in the UK, where it is the commercial conifer with the highest potential for natural regeneration: up to 400,000 seedlings per ha on favourable sites after clearcutting or wind-throw, although with high variation among and within sites ([30]). Various reviews of the factors influencing the natural regeneration of Sitka spruce in the British Isles have been carried out, focusing on obtaining natural regeneration as a substitute for artificial planting in clear-felled areas ([6], [44], [30]). Sitka spruce also proved to have the potential for regeneration under canopy cover in the UK, and more recent studies researched how to obtain and use natural regeneration to transform even-aged, mono-specific conifer forests into irregular stands ([24], [25]). Mason ([26]) recently carried out an exhaustive review especially focused on Sitka spruce natural regeneration under canopy cover in the UK, and summarized the main factors involved (Tab. 1).

**Tab. 1 -**Some of the crucial factors influencing Sitka spruce natural regeneration, the general conclusions drawn in the literature about them, and the evidence quality of such conclusions. Adapted from Mason ([26]).

Factor | Conclusions | Evidence Quality |
---|---|---|

Seed availability | Mast years very important, in British Sitka spruce stands happening every 4-5 years |
Good-Moderate |

Germination conditions | Favourable seedbed conditions: moist soils with needle litter or light moss cover |
Moderate-Poor |

Vegetation competition | Avoid fertile sites or competition from ericaceous vegetation | Moderate |

Understorey microclimate | Retain some canopy cover to limit frost damage but provide adequate light |
Moderate |

Light requirements for growth | At least 20% of full light, plus an overstorey with basal area of 30 m^{2 }ha^{-1} and reduced tree density |
Good |

Browsing pressure | Keep deer population below 5 animals per 100 ha |
Moderate |

Seed availability is undoubtedly the first crucial factor: Sitka spruce seeds, like those of most temperate forest tree species, have a low survival rate in the forest soil and do not produce a viable seed bank. Sitka spruce in the UK starts to have a good seed crop at 25-35 years, after which the seed production increases with age and can reach high levels already at 35-40 years, depending also on the stand density ([30]). Years of heavy cone production tend to be synchronised amongst trees and to happen at periodic intervals called mast years, that in the UK can happen every 3-6 years ([6], [26]).

Seed germination is highly dependent on the seedbed characteristics. Nixon & Worrell ([30]) indicated as the most favourable seedbed soils with low fertility (because of less competing vegetation), with the presence of adequate moisture (neither too dry nor too wet), and without too much brash or needle litter (considered unfavourable due to low water retention). On the contrary, Von Ow et al. ([44]) found litter favourable to germination in Ireland. Low-growing mosses are generally considered favourable for regeneration ([26]) due to good water retention, while taller mosses seemed to have a negative effect likely because they prevent the seedlings’ root from reaching the mineral soil ([44]). In the coastal forests of North America, decayed logs are considered the most favourable seedbed for Sitka spruce seedlings ([18], [42]).

Stand structure can affect regeneration through different mechanisms. A certain level of overstorey cover was found to be beneficial for Sitka spruce regeneration both in its natural range ([5], [15]) and in the UK ([27]), likely thanks to the control of the growth of competing ground vegetation ([30]), and the influence on the microsite temperature and moisture ([9]). On the other hand, the presence of overstorey trees reduces the light availability for seedlings. Light-growth functions for the growth of Sitka spruce seedlings in the UK have been developed by Bianchi et al. ([4]).

Thinning interventions have been shown to have a positive effect by creating a favourable light environment. Studies carried out both in the UK and in North America generally found more Sitka spruce seedlings in the stands with lower densities, which were either more recently or more heavily thinned ([7], [31], [20], [14]), but differences were not always observed between silvicultural treatments ([3]). Most of the studies in the UK considered stands originated from artificial planting. Regeneration density in such pure Sitka spruce stands after thinning varied from 4,500 to 70,000 seedlings per ha, but when small germinants under 20 cm height were considered the density could go up 270,000 seedlings per ha ([31], [20], [3], [14]). In contrast, studies in North America focused on natural mixtures of Sitka spruce and western hemlock, and Sitka spruce regeneration occurred with lower densities (1,900-22,000 seedlings per ha - [7]). When comparing local overstorey variables to seedling density, contrasting results were found: either no relationship was observed ([14]), or only a weak positive correlation with basal area ([31]), or a weak negative correlation with stems per ha ([7]).

The UK has been defined “data-poor” regarding natural regeneration ([21]), and even if the qualitative information is extensive, there are no existing models to quantitatively predict the regeneration occurrence of Sitka spruce under canopy cover. The aim of this research was to prepare such models by investigating as main predictors the factors considered more affecting such processes. We also put emphasis on analysing the methodological approaches available given the constraints of the UK situation.

In the absence of studies following the development of regeneration over time, the dataset generated by Kerr et al. ([21]) is the most comprehensive regeneration survey of coniferous forests available in the UK to date, covering a wide range of forest structures and geographical areas. We thus decided to use this dataset for calibration. However, there were some limitations. The dataset was produced by a one-off sampling, including neither detailed information on the timing of the regeneration establishment nor on its size. The age of the regenerating trees could have been highly variable, and so could the biological processes they had been through, and/or the stand characteristics at the regeneration event could have been very different from the survey data. The only possible approach using such a dataset was to model the regeneration “presence”, and not the regeneration “occurrence”, the latter defined as the seedling establishment within a time interval. We thus calibrated models that could generate a regeneration tally like one produced from a field survey, for stands which do not have this information. First, we modelled the likelihood of Sitka spruce seedling presence, then its density. For each stage we identified the significant variables within the wide range of those included in the original survey. We considered plots as modelling units to allow the predictions to be sensitive to within-stand variations, as recommended by Miina et al. ([28]). The models prepared were then validated with an independent dataset.

# Methodology

## Calibration dataset

Kerr et al. ([21]) carried out multi-level sampling during 2008-2009 in 129 stands of coniferous species located in 38 forests across most of Great Britain. From this, we extracted information on 34 artificially-planted, Sitka spruce-dominated stands, located in 13 forests evenly distributed across most areas of Great Britain where Sitka spruce is present (see original research for more details). In the original survey, ten 0.01 ha circular plots (radius 5.6 m) were established in each stand, recording diameter at breast height (DBH, measured at 1.30 m above ground) and species for all trees with DBH > 7 cm. In a 2 × 2 m square located at the centre of the circular plot, the number and species of all trees less than 7 cm DBH were recorded, differentiating between seedlings (height < 1.30 m) and saplings (height ≥ 1.30 m). From the 340 plots retrieved, 138 showed at least one Sitka spruce seedling or sapling (40% of the total). We considered those plots to have presence of regeneration. Since saplings occurred in only four plots, in which seedlings were also present, we decided not to differentiate between them. From now on, we will refer to all regenerating trees as seedlings. The main characteristics of the calibration dataset are indicated in Tab. 2.

**Tab. 2 -**Details of the calibration dataset. Values at stand (Age, Quadratic Mean Diameter, Soil nutrient regime, Time after last thinning and Deer Impact Index) and plot level (the remaining parameters).

Variable | Min. | 1^{st} Qu. |
Mean | 3^{rd} Qu. |
Max. |
---|---|---|---|---|---|

Age (years) | 32 | 39 | 54.5 | 64 | 85 |

Basal area (m^{2 }ha^{-1}) |
1.6 | 43.6 | 58 | 70 | 196 |

Stems per hectare (n ha^{-1}) |
0 | 400 | 700 | 900 | 2.200 |

Quadratic mean diameter (cm) | 0 | 27 | 36.3 | 43.1 | 83 |

Maximum diameter breast height (cm) |
0 | 36 | 45.3 | 52 | 90 |

Global Site Factor | 0.02 | 0.16 | 0.21 | 0.26 | 0.55 |

Bare ground (%) | 0 | 0 | 1.2 | 0 | 85 |

Mosses (%) | 0 | 5 | 41.6 | 80 | 95 |

Seedling density (ha^{-1}) |
0 | 0 | 20.780 | 10.000 | 450.000 |

Soil Nutrient Regime (no. plots) |
Very Rich (10) |
Rich (10) |
Medium (130) |
Poor (180) |
Very poor (10) |

Time after last thinning (no. plots) |
Class 1 1-5 yrs (170) |
- | Class 2 6-10 yrs (90) |
- | Class 3 10+ yrs (80) |

Deer Impact Index (no. plots) |
Low (10) |
- | Moderate (290) |
- | High (40) |

Age of the plantation in years (from now on simply Age), Soil Nutrient Regime (SNR), time after last thinning, and Deer Impact Index (DII) were recorded at stand level. We calculated from the original inventory the plot level values for basal area (BA), stems per ha (SPH), and the maximum DBH (maxDBH). From those values we calculated at plot level the quadratic mean diameter (QMD, the diameter of a tree considered as having the average basal area); and the Global Site Factor (GSF), an indication of the canopy light transmittance, using the relationship established from Hale et al. ([17]).

As an indication of seed availability, we investigated the use of Age and two possible alternatives. Hasenauer & Kindermann ([19]) for MOSES used maxDBH (at plot level) to represent a mother-tree effect, while Schweiger & Sterba ([40]) used QMD as a substitute for age; both were positively correlated with regeneration occurrence in mixed-species, uneven-aged forests. However, in this dataset both maxDBH and QMD were negatively correlated with regeneration presence (preliminary results not shown). For this reason, maxDBH was considered as a possible indicator of local overstorey competition (see later), while QMD was discarded.

The SNR was estimated by the original field surveyor from analysis of the ground vegetation following the Ecological Site Classification criteria ([35]). Most of the stands were located on sites with either medium or poor SNR (respectively 38% and 53% of the total plots). Those two classes did not show a significant difference from each other in terms of regeneration presence frequency (Fisher’s exact test, two-sided: p=0.556, n=310), and only 9% of the plots were in other SNR classes, so we excluded this factor from further analysis. The SNR class indirectly influences regeneration due to its effect on ground vegetation, as described previously. Since the dataset included for the 2 × 2 m plots the percentage of ground covered by different classes of vegetation, we decided to use as candidate variables the favourable ground cover classes of Mosses and Bare Ground, instead of SNR, consistent with the model prepared by Kerr et al. ([22]).

We considered the plot-level stand density measures of BA, SPH and maxDBH as a negative proxy for the light regime under the forest cover (higher stand density, lower light level) and so expected to be negatively correlated with regeneration presence. On the other hand, GSF is a direct indication the light regime under the forest cover, expected to be positively correlated with regeneration presence. The time since the last thinning was estimated for each stand using both historical records and evidence on the ground; the expected effect was a negative correlation between the time since the intervention and the likelihood of regeneration. We divided the stands in the present study into three different Thinning Classes (TC) as in Kerr et al. ([21]): TC1, thinned in the last 1-5 years; TC2, thinned 6-10 years before; TC3, thinned more than 10 years before or never. We used discrete classes since there was often an uncertainty in the precise timing of the thinning. In some cases, it was observed that a thinning was carried out only in a fraction of the stand. Since we could not identify which specific plots were affected, we assigned an approximate thinning class to the whole stand with a subjective decision (for example, when only half of the stand was reported to be affected by a recent thinning as in TC1, and the rest by none, a TC2 was assigned to all the plots). We considered this variable as numeric.

The Deer Impact Index (DII) was visually estimated as low (no browsing observed), moderate (browsing damage on up to 25% of the regeneration) and high (browsing damage on more than 25% of the regeneration). Because of the unbalanced distribution (Tab. 2) and the lack of significant differences in regeneration presence frequency between the moderate and high impact classes (Fisher’s exact test, two-sided: p=0.611, n=330), this factor was discarded from further analysis.

Additionally, we retrieved stand-level geographical variables from topographic maps, namely northing, easting, elevation and aspect, and stand-level climatic variables from the Forestry Commission’s decision support system ESC-DSS ([35]), namely accumulated temperature above 5 °C, moisture deficit, Conrad continentality index and total summer and winter rainfall. Preliminary analysis (not shown) revealed that none of those variables was significant when included in a model and they were all discarded.

The density of Sitka spruce seedlings per plot was very different between the Thinning Classes (Fig. 1). Sitka spruce contributed to 97% of the seedlings in the study areas and different species were sporadic (present in only 2% of the plots); for simplicity the latter was ignored during the analysis. At stand level, considering all the plots with or without regeneration, there were on average of 20,740 seedlings per ha, with a minimum of 0 and a maximum of 250,000.

## Independent validation dataset

For independent validation, we assessed in 2016 four Sitka spruce-dominated stands in Clocaenog forest, Denbighshire, Wales (53° 04′ N, 3° 25′ W; altitude: 390-430 m a.s.l.), and four in Kielder forest, Northumberland, England (55° 10′ N, 2° 29′ W; altitude: 200-250 m a.s.l.). Both forests were originally artificial plantations that have been managed in recent years according to different CCF principles, using silvicultural systems ranging from irregular shelterwood to group selection. All stands belonged to Thinning Class 2, but most of them were thinned more frequently or with higher intensity in the past than stands in the calibration dataset. The situation in all stands was generally a lower tree density than under the traditional management (as defined by [8]), leading to a larger amount of natural regeneration. For each stand, we drew random non-parallel transects on a desktop map and placed on them 10 evenly spaced plots, later located in the field using a GPS receiver. The distance between plots varied with the size of the stand. We followed the same data collection protocol used for the calibration dataset and collected in this way 78 plots. The main characteristics of this dataset are shown in Tab. 3 for a comparison with the calibration dataset. SNR, DII and QMD were not considered, as in the calibration dataset. Again, we considered all seedlings and saplings as “seedlings”, and a total of 62 plots (about 80% of the total) had at least one of these. The density of Sitka spruce seedlings per plot is shown in Fig. 1. At stand level, considering all the plots with or without regeneration, there were on average 46,940 seedlings per ha, with a minimum of 4,500 and a maximum of 171,800.

**Tab. 3 -**Details of the validation dataset. Values at stand (Age and Time after last thinning) and plot level (the remaining parameters).

Variable | Min. | 1^{st} Qu. |
Mean | 3^{rd} Qu. |
Max. |
---|---|---|---|---|---|

Age (years) | 60 | 65 | 69 | 77 | 80 |

Basal area (m^{2 }ha^{-1}) |
7.6 | 28.8 | 41.4 | 53.8 | 107.2 |

Stems per hectare (n ha^{-1}) |
50 | 200 | 284 | 400 | 1.100 |

Maximum diameter at breast height (cm) | 35 | 44 | 50.6 | 55.8 | 85 |

Global site factor | 0.08 | 0.22 | 0.28 | 0.34 | 0.49 |

Bare ground (%) | 0 | 0 | 0.1 | 0 | 4 |

Mosses (%) | 0 | 70.6 | 82.3 | 99.7 | 100 |

Seedling density (ha^{-1}) |
0 | 2.500 | 48.460 | 52.500 | 417.500 |

Time after last thinning (no. plots) |
Class 1 1-5 yrs (0) |
- | Class 2 6-10 yrs (78) |
- | Class 3 10+ yrs (0) |

## Statistical analysis

### Regeneration presence

We carried out all the analyses using R statistical software ([36]). To estimate the probability of regeneration presence, we used a Generalized Linear Mixed Model (GLMM) fit by maximum likelihood (Laplace approximation) with binomial function and logit link, from the package “lme4” ([1]). Possible autocorrelation effects were considered using the stand and forest levels as random nested effects. The candidate fixed effects for the model were Age, BA, SPH, GSF, maxDBH, Thinning Class, Mosses and Bare Ground. We included a quadratic term for BA, SPH, and maxDBH to check if the relationship between stand density and regeneration was non-linear, as a certain level of canopy cover can be beneficial to natural regeneration. Then we removed non-significant parameters using a step-wise approach aimed at reducing the Akaike Information Criterion (AIC) to select the best model ([45]). We re-calibrated the best model structure on standardized variables (rescaled so that their new mean is equal to zero and the standard deviation to 1). This process transforms all the variables with different orders of magnitude to a similar scale, still maintaining their variability, making the magnitude of the model coefficients directly comparable.

We assessed the accuracy of the best model with a cross-validation technique ([2]). Using the same model structure, we re-calibrated the coefficients by removing all the plots belonging to one stand from the calibration dataset. Then we validated it on the plots belonging to the left-out stand and calculated their likelihood of regeneration presence. We repeated the process 34 times, once for each stand. After we estimated in such a way the likelihood of regeneration for each plot, to determine which ones the model would predict to have regeneration, we used two methods.

In the first method, we defined a cut-off likelihood value using the Receiver Operator Characteristics (ROC) curve method with the package pROC ([37]). We assigned the presence of regeneration to all plots with a likelihood above the cut-off, and otherwise the absence of regeneration. We estimated this cut-off as the likelihood value that would maximise the sum of sensitivity (the proportion of correctly identified positive plots, *i.e.*, in this case with presence of regeneration) and specificity (the proportion of correctly identified negatives, *i.e.*, with absence of regeneration). Once each plot was assigned its simulated status, we built a contingency table to compare the predictions with the observations. The ROC analysis has been widely used in medical science and other fields, since it is considered useful to evaluate the discriminatory ability of a continuous indicator to correctly assign into a two-group classification and to find an optimal cut-off point to least misclassify the two-group subjects ([13]). However, it has never been used for forest regeneration. In the second method, we used a more common stochastic approach ([19]). We generated for each plot a pseudo-random number between 0 and 1. If that number was lower than the regeneration likelihood, the plot was considered to have regeneration, and otherwise without regeneration. We ran the simulation 10,000 times, averaged the results, and built another contingency table. For both methods, we analysed the results also at stand level in the following way. For each stand, we calculated the difference between the total of all simulated regeneration plots minus the total observed ones. We checked the field notes to subjectively investigate why predictions were in error for the stands with the worst results (as in [12]). For this analysis, we did not consider it important if individual plots were wrongly simulated if the overall predictions at stand level were accurate.

### Regeneration density

We used two approaches. First, we investigated GLMMs using the same random and fixed effects as described above, using the sub-dataset for plots with presence of regeneration (n = 138), and a Gamma distribution with log-link to approximate the seedling distribution. No preliminary model based on all plots with presence of regeneration (n = 138) could converge (results not shown). The importance of the Thinning Class was evident from the sharp difference in seedling distribution amongst the classes, so we decided to calibrate separate models for TC1 and TC2 & TC3 (pooled together due to the lower number of observations). For those two subsets of data, we prepared GLMMs using the same random and fixed effects as described above (excluding Thinning Class). Then we removed non-significant parameters using a step-wise approach aimed at reducing the AIC to select the best model. We evaluated its accuracy through comparing predicted and observed values at plot and stand level. For the second approach, we simulated the seedling density by generating random numbers that approximated the observed density distribution for each Thinning Class ([10], [40]). We fitted Weibull distribution functions to simulate the distribution pattern of seedlings in each Thinning Class group using the package MASS ([43]). We used the values of seedlings per ha observed at plot level transformed to units of 1,000 for simplifying the calculations. For validation, in each plot observed with regeneration, we generated a random number 10,000 times from the resulting functions and averaged the results. We then compared the observations and simulations averaged at stand level. We did not compare results at plot level analysis since the random generation of numbers makes this analysis impossible.

### Independent validation

We calculated the likelihood of regeneration presence in the independent validation plots using the best model above selected (calibrated on the full dataset). Then, we used the same two methods as before to assign the presence of regeneration. First, we considered the same cut-off likelihood value previously determined with the ROC method, assigning the status of presence of regeneration to all plots above that threshold. Second, we used the stochastic method to randomly determine the presence or absence of regeneration. For both methods, we built contingency tables at plot level and examined the performance at stand level by comparing the total numbers of simulated and observed plots with regeneration, with the same procedures described above for the cross-validation. Then, we used both seedling density modelling methods prepared with the calibration dataset to simulate the density in the plots of the independent datasets with observed presence of regeneration. The simulated seedling density was compared with the observed values. A summary of the workflow for the statistical analysis is shown in Fig. 2.

# Results

## Regeneration presence

**Tab. 4 -**Details of coefficients of Model 1, estimating the likelihood of Sitka spruce regeneration presence at plot-level.

Effects | Variables/Stats | Value | St. Err | p-value |
---|---|---|---|---|

Fixed | (Intercept) | -2.693 | 1.703 | 0.114 |

Thinning class | -1.864 | 0.556 | 0.001 | |

Age | 0.087 | 0.03 | 0.003 | |

Mosses | 0.02 | 0.007 | 0.005 | |

(Basal area/100)^{2} |
-1.569 | 0.886 | 0.076 | |

Random Effects (intercept) | Variance | 3.726 | - | - |

Standard Deviation | 1.93 | - | - |

The model structure after the step-wise AIC reduction process is shown in Model 1 (eqn. 1), with more details of the coefficients shown in Tab. 4:

where *A* is the age of the plantation (in years) and *M* is the percentage of mosses.

The model did not converge when the forest-level random effect was included, so we maintained only the stand-level effect. The effect of bare ground was not significant, and it had a weak negative relationship with regeneration, contrary to the hypothesis. Only the quadratic term for BA remained in the best model structure amongst the stand density indicators. Note that values of BA were divided by 100 since they were on a different scale from the other variables.

Fig. 3 displays how the probability of regeneration changes according to variation in the model variables. Using Model 1, we calculated the likelihood of regeneration presence for new virtual datasets. In Fig. 3a, we used a dataset where we allowed only TC to vary (from 1 to 3) while the other fixed effects were kept at the mean values of the calibration dataset (shown in Tab. 2). In Fig. 3b, Fig. 3c and Fig. 3d, we allowed respectively Age, BA and Mosses to vary across the full range observed in the calibration dataset, while we kept the other fixed effects at their means except for TC. We repeated the analysis changing the Thinning Class, represented by the different lines (decreasing from 1 to 3 from top to bottom). Generally, from TC1 to TC2 there was a stronger decrease in regeneration likelihood than from TC2 to TC3. For TC1, regeneration probability decreases more sharply for Age less than 60 years and BA more than 60 m^{2} ha^{-1}. For TC2, only in old stands (more than 70 years old) was the probability of regeneration above 0.5, while for TC3 the likelihood was always low. The effect of mosses on regeneration likelihood was more linear.

**Fig. 3 -**Regeneration presence likelihood (

*p*

_{regen}) as a function of the model variables. In each graph, the likelihood was estimated with only one variable varying across all its range (plotted on the

*x*-axis), while the others were kept at the calibration population mean. Multiple lines indicate the results of the analysis using different values of Thinning Class (TC1: blue line; TC2: green line; TC3: red line).

Fig. 4 shows the coefficient values for the model shown in eqn. 1 when it was calibrated on the standardized variables. TC had the highest coefficient (*i.e.*, most influential) in absolute terms (1.522), followed by Age (1.255), Mosses (0.701) and BA (0.533).

After the cross-validation analysis, with the ROC method, the cut-off likelihood value for the regeneration presence probability was 0.3. Fig. 5 shows the ROC curve, that is all the combinations of specificity and sensitivity values obtained by using all the possible cut-off values. The chosen cut-off was the one that maximised their sum and corresponded to the point on the curve closest to the upper left corner, which would be to the ideal case of both specificity and sensitivity equal to 1. For the ROC method, the plots that had an estimated likelihood above 0.3 were considered by the model to have presence of regeneration. For the stochastic method, the pseudo-random generated numbers were checked with the likelihood values for each plot. Tab. 5 shows the contingency table of using both methods. For the ROC method, the plots correctly predicted (true positives plus true negatives) amounted to 73% of the total. The model estimated with similar accuracy plots with or without presence of regeneration (respectively 76% and 71%). For the stochastic method, there was a markedly lower accuracy in sensitivity (55%) and only a slightly better specificity (74%), bringing the overall accuracy lower than in the ROC method (66%).

**Tab. 5 -**Contingency tables for both the cross-validation and the independent validation results, using both the Response Operator Curve (ROC) method and stochastic method. “Yes” indicates the presence of regeneration, “No” the absence.

Validation | Method | Observed | Predicted | Partial Accuracy |
Overall accuracy |
||
---|---|---|---|---|---|---|---|

Yes | No | Total | |||||

Cross- validation |
ROC | Yes | 105 | 33 | 138 | 0.76 | 0.73 |

No | 58 | 144 | 202 | 0.71 | |||

Total | 163 | 177 | 340 | - | |||

Stochastic | Yes | 76 | 62 | 138 | 0.55 | 0.66 | |

No | 52 | 150 | 202 | 0.74 | |||

Total | 128 | 212 | 340 | - | |||

Independent Validation |
ROC | Yes | 62 | 0 | 62 | 1 | 0.82 |

No | 14 | 2 | 16 | 0.12 | |||

Total | 76 | 2 | 78 | - | |||

Stochastic | Yes | 44 | 18 | 62 | 0.71 | 0.64 | |

No | 10 | 6 | 16 | 0.39 | |||

Total | 54 | 24 | 78 | - |

When the results were aggregated at stand level for the ROC method, 21 stands out of 34 had a difference between total observed and predicted regeneration plots equal to or lower than 20% (11 with no difference), while five had a difference equal to or larger than 50% (worse than chance). For the stochastic method, very similar results were obtained: 22 stands out of 34 had a difference between total observed and predicted regeneration plots equal to or lower than 20% (10 with no difference), while five had a difference equal to or larger than 50%.

The worst simulated stands were almost the same stands in both methods. The field notes provided additional insights about them, showing that they were generally the ones subjected to heterogeneous thinning interventions within the same stand, suggesting that the TC class was inaccurate. In stand with fewer simulated regenerating plots than observed, it was also observed that windblow events had opened gaps comparable to a thinning, or that there was precocious cone production in young stands. In stands with more simulated regenerating plots than observed, it was noted that in stands favourable for regeneration according to all the model variables, the limiting factors were likely to be: (i) competing ground vegetation; (ii) presence of deer browsing; and (iii) lack of cone production. In the two worst over-simulated stands with both methods, everything seemed suitable for regeneration based on the field notes, and its total absence was inexplicable for the surveyor too.

## Regeneration density

In the GLMMs calibrated for TC1 and TC2 & TC3, only the effect of BA was significant, but with a positive relationship with seedling density in the former class (TC1) and a negative relationship for the latter group (TC2 & TC3). However, both models showed a very poor fit between the simulated and observed density values and they were discarded (results not shown).

The Weibull distributions fitted to seedling density distribution in each TC are described by the parameters in Tab. 6. When comparing the simulated values with the observed ones, the fit at stand level it did not provide adequate results. Generally, there was a poor correspondence between those values: only two stands had a simulated density ± 20% of the observed density. On average, the difference between simulated and observed values was 770 seedlings ha^{-1}, but with extremes of -177,500 and 59,000 seedlings ha^{-1}.

**Tab. 6 -**Parameters for the Weibull distributions fitted to seedling density per ha (× 1000).

Thinning Class |
Shape | Rate |
---|---|---|

TC1 | 0.696 | 52.555 |

TC2 | 0.871 | 14.134 |

TC3 | 1.834 | 4.651 |

## Independent validation

We used eqn. 1 to calculate the likelihood of regeneration presence in the independent dataset. With the ROC method, we considered regeneration to be present only in the plots with a likelihood greater than the same cut-off likelihood value of the cross-validation process (p = 0.3). The resulting contingency matrix is shown in Tab. 5, together with the results of the stochastic method. For the ROC method, while the total accuracy was 82%, this was because almost all plots (76 out of 78) were predicted to have regeneration, giving a sensitivity of 100% and a specificity of only 12%. For the stochastic method, the overall accuracy was again lower than for the ROC method (64%), although sensitivity and specificity were more even. After aggregating the results at stand level, however, worse results were found for the ROC method than for the stochastic method: out of eight stands, respectively four for the ROC method and six for the stochastic method had a difference between total observed and predicted plot with regeneration equal to or lower than 20%. In both methods, two stands had no difference between total observed and predicted plots with regeneration, and none had a difference equal to or larger than 50%.

Regeneration density was then estimated in the plots with observed regeneration presence (n=62). Only the Weibull distribution approach was used, with the function previously calibrated for TC2. The GLMM approach was already deemed too inaccurate. After averaging the results, there was no good correspondence between the simulated and observed values, and no stands had a simulated density ± 20% of the observed value. On average, the difference between simulated and observed values was -34,570 seedlings ha^{-1}, with extremes of -155,800 and 4,500 seedlings ha^{-1}.

# Discussion

The model predicting regeneration presence was based on the established knowledge of the biological and ecological characteristics of Sitka spruce. The effect of time since the last thinning showed the strongest significance in the model, and the largest coefficient after standardization. Consistently with Kerr et al. ([22]), the model showed that the probability of regeneration presence is high after an intervention, but it decreases rapidly and there is no positive effect after 10 years. If the operations are not repeated, the canopy can revert quickly to a closed status and small seedlings die off ([16]). The field notes showed that inaccuracies in the thinning regime information, or the presence of windblown gaps not considered in the model, were likely causes of the errors in the worst-simulated stands. To improve the accuracy, it is necessary for the model to know which plots are affected by a tree removal, irrespective of whether it is due to natural mortality or timber extraction.

The age of the plantation emerged as the second most important factor. Such a positive effect in the artificial plantations of the present study can be explained by the larger seed production of older trees, and possibly also by the higher number of gaps that can naturally occur in a mature canopy past the self-thinning stage. We tested the use of maximum DBH (at plot level) and quadratic mean diameter (at stand level) as possible alternatives to age, but in this research they were both negatively correlated with regeneration presence. For maximum DBH, it is likely that large trees present in the small study plots (5.6 m radius) were shading the ground and dispersing their seed outside the plots. Schweiger & Sterba ([40]) considered quadratic mean diameter to be a compound measure of age, density and site quality, and here it seems the density effect was predominant. Sitka spruce is a prolific seeding species (up to 20 million seed per ha released under canopy) with an estimated dispersal distance of 60-80 m ([44], [30]). In pure, even-aged stands seed availability is likely to be a factor not associated with the trees present at local level but with the general production at stand level, with little spatial variation ([24]). This may change in mixed-species, uneven-aged stands. In those situations, especially since age will not anymore be a suitable measure to describe the stand correctly, better studies on the role of mother trees and seed availability will be necessary. After checking the field notes, cone production that was exceptionally higher or lower than expected for that age of stand was a possible cause of error in the worst-simulated stands, suggesting that seed availability is not only controlled by age, even in single-species plantations.

Mosses showed a positive effect on regeneration consistent with previous findings. A thin layer of mosses cover is favourable for germination due to their water retention capacity, but heavy mosses can prevent roots from reaching the mineral soil ([44]). LePage et al. ([23]) found that the same ground cover can have different effects on regeneration according to the overstorey characteristics: for example, the positive effects of moss cover decreased with an increase in canopy cover. These various aspects could be the cause of the relatively low effect of mosses in the model. Further, in some stands the combined presence of competing ground vegetation (such as bramble, shrubs and tall grasses) and mosses seems to have affected the accuracy of the simulation. Additional studies may be necessary, considering the use of more specific classes (such as light and heavy mosses, deadwood in various stage of decomposition).

Increasing competition from the overstorey, expressed here as the quadratic term of basal area, influenced the regeneration negatively. However, for Thinning Class 1, at low overstorey levels the effect was relatively low and almost flat, likely confirming the benefit of a certain amount of shading. The same levels of basal area can be obtained with different numbers of trees, resulting in different canopy structures and thus light availability on the ground. When the number of trees is lower for a given basal area, there are likely to be more gaps between crowns and significantly more light at ground level ([17]). However, it is possible that the number of stems per ha was not significant in the present study because both age and Thinning Class were already partially describing the reduced number of trees resulting from natural mortality and anthropic removals.

None of the topographic and climatic variables tested showed significance. The climatic data were interpolations for 10 km grid squares of average climatic data collected during 1960-1990. They had already been found in another study to lack the precision needed for stand analysis ([29]). The under-canopy climate is also generally different from the climate of open sites, with a degree of variation according to the stand characteristics ([41]). Significant differences between forest districts were not identified in this study. Foresters have not observed regional differences in the occurrence of Sitka spruce natural regeneration across the UK (Bill Mason, pers. comm.).

The cross-validation process with the use of the Response Operator Characteristics curve showed satisfactory statistical results at plot level: 73% of plots were correctly simulated, with similar values for specificity and sensitivity. The stochastic method, such as is employed by various models, showed worse results: 66% of total plots were correctly simulated, with a larger difference between sensitivity and specificity. However, when aggregating the results at stand level and considering the difference between the total simulated and total observed plots with regeneration, the results were similar between methods: around two-thirds of the stands showed an acceptable error (simulated values within ± 20% of observed values). In a non-spatial forest growth simulator (*sensu* [38]) such as MOSES_GB, the accuracy at stand level may be more important than at plot level since the actual positions of the trees are not known.

The results of the independent validation with the Response Operator Characteristics curve method were not satisfactory since the model predicted regeneration in almost all plots, even if the total accuracy was 82%. Using the stochastic method, the total accuracy was worst (64%), although there was a slightly better balance between sensitivity and specificity. It is evident that the independent dataset is describing a situation largely different from the calibration dataset, noting the differences both in the stand variables (Tab. 1, Tab. 2) and the high frequency of plots with regeneration presence (about 80% in the independent dataset *versus* 40% of the calibration). The independent validation stands surveyed have been managed specifically to obtain natural regeneration. All the stands belonged to the Thinning Class 2, but most of them had been thinned more regularly and with higher intensity than those in the calibration dataset. When we aggregated the results at stand level and considered the difference between the total simulated and total observed plots with regeneration, the results were better for the stochastic method: two-thirds of the stands showed an acceptable error (simulated values within ± 20% of observed values), against half for the Response Operator Characteristics curve method. It seems that the cut-off calculated for the cross-validation process cannot be applied to the independent dataset, and although the model still presents problems in its application to continuous cover forestry situations, the stochastic method gave better results in this case.

The models tested here for regeneration density did not give results of acceptable accuracy. Generating random numbers from Weibull distributions was, in the present study, the only option found and still produced inadequate results both during the auto-validation and independent validation. Nonetheless, even if the models were deemed too inaccurate, it is interesting to note that the effect of basal area was significant and positive in the seedling density model based only on plots belonging to Thinning Class 1, suggesting a possible mother-tree positive effect. In the model for Thinning Class 2 & 3, basal area had a negative effect, maybe because the already-lower light availability is aggravated by bigger tree size and the overstorey competition effect becomes predominant. Similar results were observed by Page et al. ([31]) in Sitka spruce forests in the UK.

A very important limitation of both models was the lack of data on the regeneration size or age. Both the regeneration presence and density model did not consider the possibility of other tree species germinating and competing with Sitka spruce, likely another crucial limitation of the use of these models in mixed forest stands resulting from continuous cover forestry practices. Presence of deer browsing, although not statistically significant in this analysis, was found in the field notes as a possible cause of limiting factor for regeneration in some sites where all the model variables were at a beneficial level for regeneration.

Concluding, the tools here described can be used to simulate regeneration presence in traditional Sitka spruce plantations in the UK. Then, the growth of the regeneration can be predicted with the light-growth models presented by Bianchi et al. ([4]). However, the regeneration occurrence tools are not adequate for forests already in an advanced stage of transformation to CCF systems, and the density results must be treated with caution.

# Acknowledgement

We wish to thank: Dr. Gary Kerr, Dr. Victoria Stokes and Barnaby Wilder (Forest Research) for collecting and providing the original dataset used in this research; Dr. Catia Arcangeli (Forest Research) and Dr. Caterine Cahalan (Bangor University) for their comments on the study; and the Forestry Commission and the Scottish Forestry Trust for funding the study.

# References

::Online::Google Scholar::

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::

::Google Scholar::

::Google Scholar::

::CrossRef::Google Scholar::

::Google Scholar::

::CrossRef::Google Scholar::

::Google Scholar::

::Online::Google Scholar::

::Online::Google Scholar::

::Online::Google Scholar::

*Picea sitchensis*(Bong.) Carr) managed under different silviculture systems and assess balance between overstorey and understorey carbon. MSc thesis, Bangor University, Bangor, UK, pp. 111.

::Google Scholar::

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::

*Picea-Tsuga*forests of Oregon and Washington. Ecology 70: 48-59.

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::

::Google Scholar::

::Online::Google Scholar::

::Online::Google Scholar::

*versus*substrate limitation of seedling recruitment in northern temperate forests of British Columbia. Canadian Journal of Forest Research 30 (3): 415-427.

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::

::Google Scholar::

*Picea sitchensis*) in the British Isles. Forests 6 (4): 879-902.

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::

::Google Scholar::

::CrossRef::Google Scholar::

*Quercus suber*forest in the eastern Iberian Peninsula. Journal of Vegetation Science 17 (6): 729-738.

::CrossRef::Google Scholar::

::Online::Google Scholar::

::CrossRef::Google Scholar::

::Google Scholar::

::Online::Google Scholar::

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::

::Google Scholar::

*Picea abies*(L.) Karst.) in Austria. Forest Ecology and Management 97: 107-118.

::CrossRef::Google Scholar::

::Google Scholar::

*Picea sitchensis*(Bong) Carr.) in coastal forests of the pacific Northwest, North America. Journal of Biogeography 17: 47-58.

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::

::CrossRef::Google Scholar::