Taper equations are indispensable tools for characterizing the stem profile of trees, providing valuable insights for forest management, timber inventory, and optimal assortments allocation. The recent progress in Terrestrial Laser Scanning (TLS) has revolutionized forest inventory practices by enabling nondestructive data collection. In this study, four taper models from three different model categories were established based on point cloud data of 219 Pinus nigra trees. The taper equations fitted with TLS data were used to predict the diameter at specific stem heights and the total stem volume. The results show that among fitted models, the Max and Burkhart segmented model calibrated by the means of a mixedeffects approach provided the best estimate of the diameter at different heights and the total stem volume evaluated for different diameter at breast height (DBH) classes. In numerical terms, this model estimated the diameter and the volume with a respective overall error of 0.781 cm and 0.021 m^{3}. The predicted profile also shows that above a relative height of 0.7, the diameter error tends to increase due to the low reliability of data collected beyond the base of the crown primarily caused by interference from branches and leaves. Nevertheless, this study shows that TLS technology presents a compelling opportunity and a promising nondestructive alternative for generating taper profiles and estimating tree volume.
Taper EquationsVolume EquationsForest MensurationForest AssessmentEnvironmental ManagementMax and BurkhartBSplinesRandom ForestTLSForest InventoryIntroduction
Precision forestry is becoming increasingly important in the face of the urgent challenges posed by climate change and the rising call for sustainable forest management (SFM) practices (Kovácsová & Antalová 2010). This emerging trend depends on leveraging precise data obtained using advanced technologies to guide wellinformed decisionmaking (Fardusi et al. 2017). In the realm of SFM, a noteworthy potential exists for value creation through implementing improved practices. Besides the ecological advantages related to enhanced productivity, which contribute to carbon sequestration and alleviate pressure on forests, the adoption of this paradigm shift is fundamentally linked with substantial economic and social value (Choudhry & O’Kelly 2018).
One concrete and increasingly prevalent instance of the potential being realized is precision harvesting (PH), which aims to maximize the efficiency of the harvesting operations while minimizing the impact on the environment and overcoming several limitations inherent to conventional logging practices (e.g., lack of precision and inefficient resource utilization due to lack of treelevel data  Olivera Farias & Visser 2016). Within the scope of PH, the optimal allocation of assortments is a crucial component of the wood products supply chain and the carbon stock projections since different wood products hold different economical values and carbon storage potential (BrunetNavarro et al. 2017, Marchi et al. 2020). Stem taper equations, which predict the change in stem form from ground to tip can accurately disaggregate trees into specific products based on certain specifications like log length and diameter and can help reach a better optimization of wood products (Calders et al. 2015, Calders et al. 2022, Puletti et al. 2019).
In the academic literature, a panoply of taper model formulations and methods of parameter estimation is currently in use and the selection of a proper model form is more important than the actual fitting method (Weiskittel et al. 2011). Following the classification of McTague & Weiskittel (2020), taper models could be grouped into three main categories: (i) parametric models (e.g., simpletaper, variableexponent, and segmented equations) which include most taper equations, are calibrated by the means of parametric approaches (e.g., nonlinear least squares  NLS  and nonlinear mixed effects  NLME  methods) and can ensure biologically consistent behavior (Mäkelä & Valentine 2020). The Max and Burkhart model stands as an instance of this class, particularly within the segmented equations category, and has been demonstrated to exhibit high accuracy in predicting stem taper (Gordon 1983, Kozak & Smith 1993, Max & Burkhart 1976). (ii) Semiparametric models (e.g., Bsplines and Psplines) which offer greater flexibility in the fit without requiring an extensive addition of parameters, may provide a better representation for complex form species (Kuzelka & Marušák 2014). The Bspline regression model is a particular instance of semiparametric model that was, first introduced by Kublin et al. (2013), and has been successfully used to predict stem taper (Erber et al. 2022, Hansen et al. 2023). (iii) Nonparametric models (e.g., Random Forest  RF  and Artificial Neural Network ANN) can offer a strong predictive performance without requiring the testing of statistical hypotheses (Baker et al. 2018). In particular, RF, an ensemble learning method in machine learning, has been gaining popularity in modeling taper by providing precise predictions and demonstrating generalizable performance (Nunes & Görgens 2016, Salekin et al. 2021).
Developing taper equations requires multiple observations on upperstem diameters and other covariates collected on standing or felled trees (GómezGarcía et al. 2013). Precision forestry in general offers substantial opportunities that may be beneficial for the forest sector including the development of decision tools such as taper equations. The advancements in Terrestrial Laser Scanning (TLS) technology present an unprecedented opportunity for acquiring essential data with high precision and nondestructive methods, thereby significantly reducing the time required compared to conventional approaches (Calders et al. 2015, Pitkänen et al. 2019). Data acquisition for TLSbased forest missions can be performed following single or multiscan approaches. The singlescan approach has the simplest data acquisition setting and the fastest speed, however, a major problem with that scan method is that only parts of the trees are covered due to occlusion effects by other objects. On the other hand, the multiscan approach appears the most accurate technique and has the potential to fully cover trees within the sampling area, since scans are performed from multiple directions to overcome the occlusion issue (Liang et al. 2016). Many studies using the multiscan mode reported a stem detection rate between 62.1% and 100% depending on the forest structure and the scanning setup (Liang & Hyyppä 2013, Maas et al. 2008).
Pinus nigra Arn. is a fastgrowing conifer with a wide though fragmented distribution across Europe and Asia Minor, predominantly in mountainous areas. Due to its ecological flexibility, it is one of the most widely used tree species for reforestation worldwide, and it is considered a potential substitute for native coniferous species in Central Europe under future climate change (Thiel et al. 2012). Economically, it is one of the most important conifers in Southern Europe, as its wood is highly suitable for general construction, indoor flooring, furniture industry, fuelwood, and paper pulp (Enescu et al. 2016). In Italy, Pinus nigra Arn. stands cover an area of 444.785 hectares, representing 4.25% of the total national forest area, with age ranging from 50 to 95 years (Marchi et al. 2020). The previously developed yield equations for Pinus nigra Arn. were based on data collected from young trees and have been shown to underestimate the volume of mature trees in the current conditions (Bernetti et al. 1969). Therefore, the primary aim of this study was to compare the performance of four taper models from tree class categories in predicting diameter over bark (DOB) and total stem volume of Pinus nigra Arn. specifically using data collected from multiscan approach with TLS. The developed tool will provide forest managers with more accurate estimates of volume and enable better optimization of assortments for industrial purposes.
Materials and methodsStudy area and sampling protocol
The study was conducted in the Vallombrosa forest (43° 44′ N, 11° 34′ E), a biogenetic reserve located about 50 km eastsoutheast of Florence, Tuscany, Italy. The native vegetation of this forest is mainly represented by beech (Fagus sylvatica) at higher elevations, oakhornbeam stands (Quercus spp. mixed with Carpinus betulus L. and Ostrya carpinifolia Scop.), and chestnut (Castanea sativa Mill.) at lower altitudes (Dálya et al. 2019). As part of the main reforestation program that occurred after the World War II, the European black pine (Pinus nigra Arn.) was introduced in the extreme western side of this forest (Bottalico et al. 2012). Covering 11% of the total forest area, Pinus nigra Arn. stands were specifically studied here.
The Pinus nigra Arn. area was stratified as part of the sampling scheme based on two criteria: stand density and total tree height. Each criterion was divided into three levels to ensure optimal representation of forest stand variability, resulting in a total of nine strata. An airborne lidar flight was utilized to achieve this stratification, providing highresolution data on forest structure and height. The ALS survey was carried out under leafon condition using a Eurocopter AS350 B3 equipped with a LiDAR RIEGL LMSQ680i sensor. The flight height was 1000 m above ground level. Fullwaveform LiDAR data were registered and discretized to a point density of 10 points m^{2}. Standard procedures for preprocessing ALS data (e.g., outliers and noise removal, classification of ground/nonground, and computation of height on the ground) were carried out with the LAStools software (Isenburg 2017) to obtain ALS normalized point clouds. Subsequently, one scan center was randomly placed in each stratum; then, a multiscan TLS mission was performed on each scan center to collect detailed information on forest characteristics (Fig. 1).
TLS data collection and processing
TLS data were acquired using a FARO Focus 3D × 130 (FARO Technologies Inc., Lake Mary, FL, USA). The instrument uses a phaseshiftbased technology with a maximum range of 120 m and acquires data with an azimuth scan angle of 360°. It collects the x, y, and z coordinates, and the intensity of the laser returns with a scanranging noise of ± 1 mm.
Using the TLS device, the data acquisition was set to a reasonable scan configuration providing a good tradeoff between a sufficient density and the required time for a single scan (resolution of 7.6 mm at 10 m and 1/5 4× overall quality), for a total of about 28 million pulses per scan. With the aim of reducing occlusion due to other obstacles or vegetation and ensuring sufficient coverage, scanning was performed in multiple positions, at least 8 scans per plot subjectively distributed based on the density and structure. Each of the trees in the stand was at least scanned from 3 positions and up to 12 white polystyrene registration spheres (14 cm diameter) were placed in each plot to aid in the digital registration of individual scans.
Individual scans were merged on the plot level using the automatic registration algorithm implemented in Trimble Real Works^{®} (TRW) software. The software automatically joins overlapping redundant points to create one seamless 3D point cloud suited for the analysis. Details of the operation (settings, criteria, and thresholds) performed by the software are not declared nor accessible. The process attained a very low plotlevel registration error, and TRW achieved a high precision scan placement (mean = 2.33 mm, sd = 3.18 mm, max = 8.21 mm). Using the TRW tool, each pine tree stem was manually separated from the rest (i.e., from the ground, the crown, and other nonpine points). Diameter measurements were acquired at approximately 1meter intervals through the fitting of cylinders to point cloud data (Fig. 2). To determine the volume, Huber’s formula was employed, which entails using the diameter measured at the midpoint of a log segment (eqn. 1):
V = h \cdot s_{0.5}
where V is the log volume, H is the log length, and s_{0.5} is the midsection sectional area.
Detailed descriptive statistics are provided for the trees inventoried in this study (Tab. 1). Statistics are derived from stem attributes estimated from point cloud data collected by the TLS device. The diameters and total height of the stems were estimated using the method described earlier, while the stem volume was determined by applying the Huber formula to each of the 1mlength cylinders. In total, 219 trees were extracted from the inventoried area and were systematically allocated to 5 folds. Four folds were exclusively designated for model training, while the remaining fold was reserved for rigorous testing during each iteration of the crossvalidation procedure.
Used models
The data acquired by TLS was used to fit 4 taper models from 3 different model categories, namely parametric, semiparametric, and nonparametric. Thereafter, the fitted models were evaluated in terms of their ability to predict diameters at different heights and the total stem volume as well. In the following, the different used taper models, as well as the evaluation metrics, are explicitly introduced.
Max and Burkhart (1976)
The segmented stem profile model developed by Max & Burkhart (1976) was used to represent the parametric model’s category. Numerous studies have shown that this model is accurate in estimating upper stem diameters of many species (Jiang & Liu 2011, Ozçelik & Brooks 2012, Pang et al. 2016). The model consists of three quadratic functions representing the stem’s lower, middle, and upper parts grafted together at two points (a_{1} and a_{2}) with continuous polynomials and continuous first derivatives at each point. In the present study the taper equation was constrained to pass through diameter at breast height (dbh) while it is by construction constrained for the total height (ht).
Max & Burkhart (1976) taper equation can be expressed as follows (eqn. 2):
where y_{ij} = (d_{ij} / dbh_{i})^{2}, dbh_{i} is the dbh of tree i, with i = [1, 2, …, N], N is the number of trees in the sample, d_{ij }is the ith upperstem diameter over bark (dob) at height h_{ij} on tree i, with j = [1, 2, …, n_{i}], n_{i }is the number of diameter measurements for tree i, z_{ij} = 1  (h_{ij}/ht) is the complement of the relative height, a_{1} and a_{2} are the join points to be estimated from the data, I_{k} = 1 if z > a_{i}, and 0 otherwise with k = [1, 2], b_{p}’s are regression coefficients with p = [1, 2, 3, 4] and ε_{ij} is the residual error term associated to the prediction of y_{ij}.
To constrain the taper equation to pass through dbh at the breast height, only one regression coefficient is needed to be changed. In this case, the b_{1 }parameter was modified as suggested in Cao (2009) and replaced by the modified b_{1}* which can be expressed as follows (eqn. 3):
where z_{bh} = 1  (1.3/ht). To estimate the parameters of eqn. 2, we used the LevenbergMarquardt algorithm, which is implemented within the “nls” function in the R stats package (R Core Team 2022).
Mixedeffects model
The stem taper models may show a high degree of multicollinearity, autocorrelation, and heteroscedasticity, all of which can be addressed with appropriate statistical methods. Many studies reported a reduction in the withintree autocorrelation when including a treelevel random effect (Garber & Maguire 2003, Cao & Wang 2011). Using a mixed model for example, the autocorrelation within the stem can be accounted for at each height measurement level.
In the mixed effect formulation, all parameters of eqn. 2 are expressed as fixed effect (population level coefficients) while some parameters may contain additional random effect (for each tree within the population). In the case of the Max & Burkhart (1976) taper equation, different combinations of the parameters (b_{1}, b_{2}, b_{3}, b_{4}, a_{1}, a_{2}) could be a potential candidate for random parameters.
Here is an expression for the stem profile including the fixed and random effects (eqn. 4):
where a_{1i} and a_{2i} are join points to be estimated from the data for each tree i, I_{k} = 1 if z > a_{ki}, and 0 otherwise with k = [1, 2], and b_{pi}’s are the regression coefficients to be estimated for each tree i, with p = [1, 2, 3, 4].
To define the statistical properties of the terms in eqn. 4, its vectorial formulation presented in the following offers a more simplified notation and will be considered (eqn. 5):
Y_{i} = (A_{i} B +B_{i}U_{i})X_{i} + \varepsilon_{i}
where Y_{i} is a vector of predictions, X_{i} is a matrix of predictors, A_{i} is a matrix for the fixed effects, B_{i }is a matrix for random effects, B is a vector of fixed parameters, U_{i} is the vector for random effects and ε_{i} is a vector of random errors, with the assumptions that ε_{i} = N(0,R) and U_{i} = N(0,D) where R is the diagonal variancecovariance matrix of ε_{i} and D is a diagonal variancecovariance matrix of U_{i}.
Using the “saemix” package in R (Comets et al. 2017), the stochastic approximation expectation maximization algorithm was used to estimate the parameters of eqn. 4.
The volume equation derived from the integration of the Max and Burkhart taper equation can be expressed as follows (eqn. 6):
where z_{y} = 1  (h_{y}/ht) and z_{x} = 1  (h_{x}/ht), h_{x }and h_{y }being respectively the lower height of interest (m) and the upper height of interest, K is a conversion factor (= π/40.000), I_{k} = 1 if z_{y} > a_{i}, and 0 otherwise, J_{k} = 1 if z_{x} > a_{i}, and 0 otherwise, with k = [1, 2].
Bsplines
Semiparametric models are a class of statistical models that provide a flexible compromise between the rigidity of fully parametric models and the complexity of fully nonparametric models. By avoiding rigid assumptions about functional form, semiparametric models offer the potential to capture the underlying relationship between variables more accurately, particularly for unknown or complex functional forms, while parametric approaches can be used for estimating model coefficients. Basissplines (Bsplines) are a type of semiparametric model that possess favorable numerical properties. To construct a Bspline model, a fixed number of splines, also called polynomials, are connected at specific points known as knots while ensuring the condition of continuity of their second derivatives across the knots.
To formalize, here we consider that the population mean diameter for a given relative height (h_{r}) may be approximated by a Bspline function of degree p and can be expressed as follows (eqn. 7):
where f(hr) is the response of the model which renders the diameter value d_{i} at a given h_{r}, β_{l}’s are the parameters of the model, d_{1 }is the number of parameters in the Bspline function with the condition that d_{1} ≤ k_{1} + 1 + p; k_{1} corresponds to the number of internal knots, B_{l,p}^{(1)} (hr) is the Bspline basis function computed based on the “de Boor recurrence relation”.
Due to the nonlinear nature of stem tapering and the hierarchy of data acquired from sampled trees, semiparametric models can be calibrated more efficiently using mixed approaches, which can also consider the random deviation of an individual tree from the population average. Linear mixed effect models can be used for the calibration if knot values are fixed a priori, which has the advantage of being less computationally expensive and more numerically stable than nonlinear mixed models.
The Bspline based taper equation considering both fixed and random effects can be written as (eqn. 8):
where f(hr_{ij}) corresponds to the population mean response, g(hr_{ij}) represents the treespecific deviation from f(hr_{ij}), ε_{ij} is the treespecific residual error, d_{ij} corresponds to the jth diameter from the ith sample tree, hr_{ij} corresponds to the jth relative height from the ith sample tree, β_{l}’s are the fixed regression coefficient of the model, θ_{i,l}’s are the random effect parameters to be estimated for each tree, d_{2} is the number of random parameters in the g function with the condition that d_{2 }≤ k_{2} + 1 + p; k_{2 }corresponds to the number of internal knots in the random effect part of the model, B_{l,p}^{(2)}(hr_{ij}) is the Bspline basis function for the random effect part of the model.
The vectorial formulation of eqn. 8 can be expressed as follows (eqn. 9):
where Y_{i }is the vector of diameter predictions at the level of the ith tree, X_{i} is the matrix of Bspine basis values for the fixed effect part of the model, Z_{i} is the matrix of Bspine basis values for the random effect part of the model, β is a vector of fixed parameters, θ_{i} is the vector of random effect parameters, ε_{i} is the vector of residual error with the assumption that ε_{i} = N(0, R) and θ_{i} = N(0, G) where R is the diagonal variancecovariance matrix of ε_{i} and G is a positive nondiagonal matrix of θ_{i}. For more details on this method, see Kublin et al. (2013).
According to the suggestions by Kublin et al. (2013). The parameters of eqn. 8 were estimated considering knots [0.0, 0.1, 0.75, 1.0] corresponding to the Bspline basis function B^{(1)}(hr) for the population mean function and [0.0, 0.1, 1.0] for B^{(2)}(hr) representing the deviation from the population average. Hence, the regression model depends only on relative height. Measurements of dbh and further diameter measurements of a specific tree are merely used to calibrate the tree specific deviation from the average taper curve by estimating the corresponding random effects. To ensure diameter estimates of zero at tree top, the last spline in both B^{(1)}(hr) and B^{(2)}(hr) are omitted (see Fig. S1 in Supplementary material for a graphical visualization and Kublin et al. 2013).
The R package “TapeR” was used to calibrate the model by the Restricted Maximum Likelihood (REML) method (default in the internally used R package “nlme”  Pinheiro et al. 2023) providing the dbh values for each sampling tree of training data. To estimate stem volumes, the implemented numerical integration method included in the “TapeR” package was used (Kublin et al. 2023).
Random Forest
Nonparametric methods have been recently used for taper predictions providing reasonable performances (Nunes & Görgens 2016, Yang & Burkhart 2020). The present study used Random Forest to estimate the diameter at different heights of the stem. This algorithm works by training many decision trees on random subsets of the features, then averaging out their predictions. Regarding the structure of the model, the diameter was set as the dependent variable, which were predicted by four independent predictors namely dbh, h, ht and h/ht (abbreviations as above). For finetuning the model, the grid search method was used to evaluate all possible combinations of hyperparameters (viz., the number of decision trees; the number of features to consider when looking for the best split; the minimum number of samples required to split an internal node) within the search space as well as the crossvalidation method to prevent overfitting. The Python library “scikitlearn” (ver. 0.24.2) which includes an implementation of the Random Forest algorithm was used here (Pedregosa et al. 2011).
Since nonparametric methods are not algebraically integrable, including the random forest algorithm, the Monte Carlo integration, which is a numerical integration approach, was selfimplemented and used to predict the stem volume for each sample tree (Fig. 3). This technique relies on a random choice of points at which the integrand is evaluated. Monte Carlo integration could be expressed as follows (eqn. 10):
where h_{a} and h_{b} being respectively the lower and the upper heights of interest (m), K is the conversion factor (= π/40.000), h_{i }corresponds to the ith height on the tree generated randomly from a uniform distribution U(h_{a}, h_{b}), i = [1, 2, …, N], and N is the number of samples generated with the distribution U.
Evaluation of the methods
Taper models should yield unbiased estimates with minimal variance in both diameter outside bark (dob) and stem volume. A single overall measure of bias or residual error of estimation calculated for dob and stem volume does not provide an exclusive indicator of the goodness in evaluating several taper equations for a set of trees of a given species. This holds true when calculating bias metrics at different parts of stems and for different trees can result in a zero or closetozero average given that bias can be positive or negative. Alternatively, the standard error which can report the variability of the biases is considered of better reliability. Moreover, when dealing with small sample size, the residual variance can be influenced by the degrees of freedom, making the correlation index, also known as the fitness index, a useful metric to address this concern. All those metrics were used to rank the fitted models.
The most reliable measure of mean bias (B), overall residual variance (SSE) and the fitness index (FI) can be expressed as follows (eqn. 11  eqn. 13):
where Y_{i} is the actual observation, Ŷ_{i} is the predicted value of the actual observation, bar{Y}_{i} is the mean of the actual observations, n is the number of observation and k corresponds to the number of estimated parameters.
As part of our evaluation process, we employed a grouping methodology to assess the performance of the taper models in conjunction with the overall assessment presented earlier. For dob, biases and standard errors were calculated for each 10% step of the height along the stem, while for the total stem volume (from ground to top), the biases and standard errors were calculated by DBH classes. This allowed us to gain a comprehensive understanding of the accuracy of the taper equations across various tree dimensions as well as across different parts within the trees.
ResultsTaper prediction
The model by Max & Burkhart (1976) was fit using both fixed and mixedeffects approaches. Different mixedeffects combinations of parameters were tested (Tab. S1 in Supplementary material), and the combination involving a_{1}, a_{2}, b_{1}, b_{2} and b_{4} produced the lowest values of Akaike’s information criterion (AIC) and Bayesian information criterion (BIC), resulting in the following parameter values (Tab. 2) and model form (eqn. 14):
The mixed Bsplines model was fit using the linear mixedeffects approach implemented in the TapeR package and reached convergence. In order to constrain the model predictions to equal zero at the tree top, d_{1 }which is the number of fixed parameters was set to 5, while d_{2} which corresponds to the number of random parameters was set to 4, omitting the topmost spline, respectively (see Fig. S1 in Supplementary material). The results of Bsplines model fitting are represented in Tab. 3.
Regarding the RF model, after rigorous finetuning, the model reached a state of convergence as the number of decision trees stabilized at an optimal value of 267. In addition, the hyperparameter tuning process led to the following optimized configuration: number of features to best split = 3; minimum samples to split internal nodes = 4.
For each taper equation, overall statistics of fit (Bias, SEE and FI) for the entire stem were calculated and are represented in Tab. 4 for dob by model. Among the four models under investigation, discernible trends in bias emerged. The Max and Burkhart models exhibited a noteworthy positive bias, suggestive of their propensity to overestimate diameters. In contrast, the Bspline and Random Forest models displayed a discernible negative bias, indicating a tendency to underestimate diameters. Notably, the Max and Burkhart fixed effect model demonstrated a particularly pronounced inclination towards overestimation. The results suggest that mixedeffect models outperformed models considering only fixed effects. In particular, the Max and Burkhart and Bsplines mixedeffects models showed relatively comparable SEE values of 0.78 cm and 0.96 cm, respectively. In contrast, the Max and Burkhart fixedeffect and random forest models had comparable SEE values of 1.90 cm and 2.26 cm, respectively. This comparison holds true for FI as well, although to a lesser extent than for SEE.
To assess the performance of taper models at different positions throughout the stem, the statistics of fit, derived from the 5fold crossvalidation results, were analyzed by relative height classes. These statistics are documented in Tab. S2 (Supplementary material) and graphically depicted in Fig. 1. To comprehensively assess the taper of all models, the average bias, SEE, and FI were calculated for each model by relative height class along the stem. The results show that models including mixed effects outperformed those considering only fixed effect roughly along all the section heights. Noticeably, a significant increase of SEE was observed for all the models from a relative height class of (7080%) and was shown to coincide with the crown base height (CBH) for most of the trees (Fig. 4).
Bias measurements indicate that the Max and Burkhart fixedeffect model tends to consistently overestimate upper stem diameters throughout the entire stem. On the other hand, the random forest model tends to underestimate upper stem diameters. Bsplinebased predictions tend to overestimate upper stem diameters in the lower and upper portions while underestimating in the middle portion. In contrast, predictions from the Max and Burkhart mixedeffects models exhibit an opposing bias pattern in the lower and middle sections but demonstrate similar behavior in the upper portion of the stem.
Volume prediction
The predictions of volume (over bark) by four taper models were compared with each other. Overall statistics of fit (Bias, SEE, and FI) for the total stem volume are represented in Tab. 5. The results show that mixedeffects models outperformed models considering only fixed effect. Specifically, Max and Burkhart and Bsplines models had a comparable SEE values of respectively 0.021 and 0.034 m^{3}, while the Max and Burkhart fixed effect and the Random Forest models had comparable SEE values of respectively 0.221 and 0.273 m^{3}. In addition, the overall results show that all the four models tend to generally overestimate the volume as the bias resulted in positive values for all the candidate models. Notably, the Max and Burkhart models (both fixed and mixed effects) exhibited the smallest overprediction, with an absolute bias value of less than 0.01 m^{3}.
To assess the performance of stem volume prediction for different tree dimensions, the statistics of fit (SEE, bias, and FI) were analyzed by DBH class categories. These statistics are documented in Tab. S3 (Supplementary material) and graphically depicted in Fig. 6. In general, results show that prediction errors are larger in large diameter trees for both Max and Burkhart fixed effect and Random Forest models, while the prediction error tend to be constant over all the DBH classes for Max and Burkhart mixed effects and Bsplines models. The results also demonstrate a strong correlation between volume predictions and volume observations (FI > ~ 0.5) for all four models across all DBH classes, except for the DBH class category (5565 cm) and (> 65 cm), where both the M&B fixed effect and Random Forest models exhibit limitations (FI ~ 0.42 and 0.21, SEE ~ 0.39 m^{3} and 0.71 m^{3}, respectively).
Model ranking
A taperestimating system should provide an unbiased estimation with minimum variance of both diameter outside bark (dob) and total stem volume outside bark. Using the results from Tab. 4 and Tab. 5, it is possible to rank the four estimating systems. The rank sums were created by ranking the performance by dob and stem volume which were equally weighted here in this study. The rank sums reported in Tab. 6 were generated in such a way that each estimating system was assigned a rank separately for every relative height for dob and for every DBH class for the volume. These ranks were also summed for standard errors and biases. As a result, Max and Burkhart mixedeffects system ranked first in most of all the ranking categories. Bsplines based system ranked second in estimating diameter and volume by standard error of estimates while random forest system ranked third and Max and Burkhart fixed effect ranked fourth with regards to the same previously discussed estimates.
Discussion
Taper equations are an indispensable tool in the field of forest management and industry. However, the development of an accurate taper model present inherent challenges, mainly due to the need for comprehensive customization to account for the characteristics exhibited by each species and sitespecific environmental conditions, and the availability of highquality data required for the model development. In the present study, four taper models from three different class categories were compared in terms of their capacity to predict diameter over bark (dob) and total stem volume of Pinus nigra Arn. Across models, the Max and Burkhart and Bsplines mixedeffects models demonstrated superior accuracy and reliability in predicting both dob and total stem volume, with a small advance for the Max and Burkhart model.
The Terrestrial Laser Scanner (TLS) has previously been employed as an effective tool for measuring trees in forest environments and generating taper equations (Saarinen et al. 2019, Sun et al. 2016). The case study conducted on black pine forests, which is presented here, demonstrated the feasibility of using TLS for surveys and highlighted the high quality of the acquired data. In the Italian context, where tree measurements for quantitative purposes have significantly decreased in recent decades, technological tools like TLS represent a pivotal advancement for rejuvenating our understanding of the Italian forest sector.
In the present study, biases and standard errors of estimate were evaluated for diameter outside bark (dob) and total stem volume. These biases and standard errors were also evaluated for different heights within the trees and across different tree sizes. The overall goodnessoffit statistics (Bias, SEE, and FI) were calculated and reported in Tab. 4, Tab. 5 and Fig. 5, Fig. 6 for the four tested models. The results indicate that the Max and Burkhart and Bsplines mixedeffects models explained more than 99% of the total variation in predicting the upperstem diameter and total stem volume. The Max and Burkhart fixedeffect model explained more than 99% of this total variation, while the Random Forest model explained more than 96% for the diameter variation and 96% for the total volume variation.
The superiority of models that include random effects over those that only consider fixed effects has been demonstrated in various previous studies, and our findings are in agreement with this. For example, a study conducted by Leites & Robinson (2004) showed that the inclusion of random effects considerably improved the fit of the Max and Burkhart taper equation for Pinus taeda. In a different study by Bronisz & Zasada (2020), the Kozak’s taper equation fitted using a mixedeffects approach provided better results for the upper diameter and total stem volume for Pinus sylvestris. In our study, the Max & Burkhart (1976) model was fitted using both fixed and mixed effects approaches. Notably, the error was reduced by 52% for the upperstem diameter and by 89% for the total stem volume when random effects were additionally considered.
It is also noteworthy that the evaluation of the proposed models exhibited a higher Standard Error of Estimation (SEE) in predicting the upperstem diameter at a relative height of [0.7, 0.8] (Tab. S2  Supplementary material). The analysis of the distribution of the tree’s relative crown base height reveals that 50% of the observations are located between a value of 0.7 and 0.8, and the maximum relative crown base height was found to be approximately equal to 0.8 (Fig. 4). This implies that all observations above a relative height of 0.8 were necessarily measured inside the crown. This last result may partly explain the notable increase in error in predicting the upperstem diameter at a relative height of 0.7. In fact, many studies have reported the inability of Terrestrial Laser Scanners to collect reliable data beyond the base of the crown due to interference from branches and leaves as well as the mutual occlusion between canopies. Practically, when approaching higher parts of the stem, occlusion starts to deteriorate the quality of cylinder fitting (Liu et al. 2018, Pitkänen et al. 2019). Additionally, as the distance from the scanner increases, spatial resolution decreases, which is a wellknown effect in scanning that particularly impact the upper parts of trees (Liang et al. 2016). The interference from the crown, compounded with the increased distance, may explain the models’ inability to detect a welldefined taper pattern above the crown base.
In this study, Monte Carlo numerical integration was employed for estimating the volume of tree stems using the nondifferentiable and complex Random Forest model. This decision was justified by the model’s inherent characteristics mentioned earlier, which render traditional integration methods less suitable. It is imperative to underscore the significance of comparing our chosen method with alternative numerical integration techniques to gain insights into result robustness and accuracy, potentially informing future research. Notably, the RF model, unlike the parametric and semiparametric models used in this study, typically demands a substantial amount of training data. In our analysis, the RF model obtained the lowest score in terms of SEE (dob and volume). This highlights the potential impact of our relatively small dataset (219 trees) on RF performance. In addition, the use of various integration methods (Huber, Monte Carlo, and analytical integration) in our study may introduce minor biases affecting our results. Further research is needed to assess and potentially mitigate these biases, as their combined impact on the findings remains uncertain.
Considering the results of this study, the Max & Burkhart (1976) taper model, calibrated by means of a mixedeffects approach, and its compatible volume integration could be considered for operational use in estimating diameter outside bark at different heights, as well as total stem volume, respectively, with an overall precision of 0.781 cm and 0.021 m³ in Pinus nigra Arn.
Conclusion
This study explored the application of precise and nondestructive techniques, such as Terrestrial Laser Scanning, for collecting data to develop taper equations specific to Pinus nigra Arn., in line with the principles of precision forestry. The collected data were used to fit four taper models from three different class categories, evaluated for their ability to predict diameter at various heights and total stem volume. The Max & Burkhart (1976) taper model, calibrated with a mixedeffects approach, proved superior in estimating diameter and volume compared to other models. However, a common limitation among all models was their inability to accurately predict diameter beyond a relative height of 0.7 due to crown interference. The selected model, within the scope of this study, can be operationally utilized by forest managers to enhance predictions of diameter and stem volume, aiding in optimizing wood allocation to different harvested products. Future research perspectives may focus on refining algorithms and methods for extracting diameter at the upper parts of the stem.
Data, code availability and setup
The whole dataset is made available at https://doi.org/10.5281/zenodo.8414408. The study was performed on a laptop with Win10 22H2 operating system, an Intel Core i5 7200U CPU with maximal capacity of 2.70 GHz, and 8 Gb of RAM.
Authors contribution
NP and CV contributed equally to the manuscript.
ReferencesBaker ER, Peña JM, Jayamohan J, Jé AMechanistic models versus machine learning, a fight worth fighting for the biological community? Biology Letters 14 (5): 20170660.2018Bernetti G, Cantiani M, Hellrigl BRicerche alsometriche e dendrometriche sulle pinete di pino nero e laricio della Toscana [Alsometric and dendrometric research on the black pine and larch pine forests of Tuscany]. L’Italia Forestale e Montana 26 (1): 1040. [in Italian]1969Bottalico F, Chirici G, Travaglini DThe forest management of Vallombrosa from 1876 to 2006: Analysis of forest maps. L’Italia Forestale e Montana 67 (6): Art. 6.2012Bronisz K, Zasada MCorrection: Bronisz K. and Zasada M., Comparison of fixed and mixedeffects approaches to taper modeling for Scots pine in West Poland. Forests 2019, 10, 975. Forests 11 (4): 437.2020BrunetNavarro P, Jochheim H, Muys BThe effect of increasing lifespan and recycling rate on carbon storage in wood products from theoretical model to application for the European wood sector. Mitigation and Adaptation Strategies for Global Change 22 (8): 11931205.2017Calders K, Newnham G, Burt A, Murphy S, Raumonen P, Herold M, Culvenor D, Avitabile V, Disney M, Armston J, Kaasalainen MNondestructive estimates of aboveground biomass using terrestrial laser scanning. Methods in Ecology and Evolution 6 (2): 198208.2015Calders K, Verbeeck H, Burt A, Origo N, Nightingale J, Malhi Y, Wilkes P, Raumonen P, Bunce RGH, Disney MLaser scanning reveals potential underestimation of biomass carbon in temperate forest. Ecological Solutions and Evidence 3 (4): e12197.2022Cao QVCalibrating a segmented taper equation with two diameter measurements. Southern Journal of Applied Forestry 33 (2): 5861.2009Cao QV, Wang JCalibrating fixed and mixedeffects taper equations. Forest Ecology and Management 262 (4): 671673.2011Choudhry H, O’Kelly GPrecision forestry: a revolution in the woods. McKinsey & Co., Singapore, pp. 111.2018Comets E, Lavenu A, Lavielle MParameter estimation in nonlinear mixed effect models using saemix, an R Implementation of the SAEM algorithm. Journal of Statistical Software 80 (3): 141.2017Dálya LB, Capretti P, Ghelardini L, Jankovsky LAssessment of presence and distribution of Armillaria and Heterobasidion root rot fungi in the forest of Vallombrosa (Apennines Mountains, Italy) after severe windstorm damage. iForest  Biogeosciences and Forestry, 12 (1): 118124.2019Enescu C, De Rigo D, Caudullo G, Durrant TPinus nigra in Europe: distribution, habitat, usage and threats. In: “European Atlas of Forest Tree Species” (SanMiguelAyanz J, De Rigo D, Caudullo G, Houston Durrant T eds). EU Publication Office, Luxembourg, pp. 126127.2016Erber G, Gollob C, Krassnitzer R, Nothdurft A, Stampfer KStemlevel bucking pattern optimization in chainsaw bucking based on terrestrial laser scanning data. Croatian Journal of Forest Engineering 43 (2): 287301.2022Fardusi MJ, Chianucci F, Barbati AConcept to practice of geospatialinformation tools to assist forest management and planning under precision forestry framework: a review. Annals of Silvicultural Research 41 (1): 314.2017Garber SM, Maguire DAModeling stem taper of three central Oregon species using nonlinear mixed effects models and autoregressive error structures. Forest Ecology and Management 179 (1): 507522.2003Gordon AComparison of compatible polynomial taper equations. New Zealand Journal of Forestry Science 13 (2): 146155. 1983GómezGarcía E, CrecenteCampo F, DiéguezAranda USelection of mixedeffects parameters in a variableexponent taper equation for birch trees in northwestern Spain. Annals of Forest Science 70 (7): 707715.2013Hansen E, Rahlf J, Astrup R, Gobakken TTaper, volume, and bark thickness models for spruce, pine, and birch in Norway. Scandinavian Journal of Forest Research 38 (6): 413428.2023Isenburg MLAStools  Efficient LiDAR processing software. Version 141017 academic, web site.2017Jiang L, Liu RSegmented taper equations with crown ratio and stand density for Dahurian larch (Larix gmelinii) in Northeastern China. Journal of Forestry Research 22 (3): 347352.2011Kovácsová P, Antalová MPrecision forestrydefinition and technologies. Sumarski List 34 (1112): 603610.2010Kozak A, Smith JHGStandards for evaluating taper estimating systems. The Forestry Chronicle 69 (4): 438444.1993Kublin E, Breidenbach J, Kändler GA flexible stem taper and volume prediction method based on mixedeffects Bspline regression. European Journal of Forest Research 132 (5): 983997.2013Kublin E, Breidenbach J, Vonderach CTapeR: flexible tree taper curves based on semiparametric mixed models (0.5.2). Computer software, web site.2023Kuzelka K, Marušák RComparison of selected splines for stem form modeling: a case study in Norway spruce. Annals of Forest Research 57 (1): 1371482014Leites LP, Robinson APImproving taper equations of loblolly pine with crown dimensions in a mixedeffects modeling framework. Forest Science 50 (2): 204212.2004Liang X, Hyyppä JAutomatic stem mapping by merging several terrestrial laser scans at the feature and decision levels. Sensors 13 (2): 16141634.2013Liang X, Kankare V, Hyyppä J, Wang Y, Kukko A, Haggrén H, Yu X, Kaartinen H, Jaakkola A, Guan F, Holopainen M, Vastaranta MTerrestrial laser scanning in forest inventories. ISPRS Journal of Photogrammetry and Remote Sensing 115: 6377.2016Liu G, Wang J, Dong P, Chen Y, Liu ZEstimating individual tree height and diameter at breast height (DBH) from Terrestrial Laser Scanning (TLS) data at plot level. Forests 9 (7): 398.2018Maas HG, Bienert A, Scheller S, Keane EAutomatic forest inventory parameter determination from terrestrial laser scanner data. International Journal of Remote Sensing 29 (5): 15791593.2008Marchi M, Scotti R, Rinaldini G, Cantiani PTaper function for Pinus nigra in Central Italy: is a more complex computational system required? Forests 11 (4): 405.2020Max T, Burkhart HSegmented polynomial regression applied to taper equations. Forest Science 22: 283289.1976McTague JP, Weiskittel AEvolution, history, and use of stem taper equations: a review of their development, application, and implementation. Canadian Journal of Forest Research 51 (2): 210235.2020Mäkelä A, Valentine HTApplications and future outlook. In: “Models of Tree and Stand Dynamics: Theory, Formulation and Application” (Mäkelä A, Valentine HT eds). Springer International Publishing, Cham, Switzerland, pp. 245266.2020Nunes MH, Görgens EBArtificial intelligence procedures for tree taper estimation within a complex vegetation mosaic in Brazil. PLoS One 11 (5): e0154738.2016Olivera Farias A, Visser RUsing the harvester onboard computer capability to move towards precision forestry. New Zealand Journal of Forestry 60: 37.2016Ozçelik R, Brooks JRCompatible volume and taper models for economically important tree species of Turkey. Annals of Forest Science 69 (1): 105118.2012Pang L, Ma Y, Sharma RP, Rice S, Song X, Fu LDeveloping an improved parameter estimation method for the segmented taper equation through combination of constrained twodimensional optimum seeking and least square regression. Forests 7 (9): 194.2016Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay EScikitlearn: machine learning in Python. Journal of Machine Learning Research 12 (85): 28252830.2011Pinheiro J, Bates D, DebRoy S, Sarkar D, Heisterkamp S, Willigen BV, Ranke J, R Core Teamnlme: linear and nonlinear mixed effects models. Version 3:1162, web site.2023Pitkänen TP, Raumonen P, Kangas AMeasuring stem diameters with TLS in boreal forests by complementary fitting procedure. ISPRS Journal of Photogrammetry and Remote Sensing 147: 294306.2019Puletti N, Grotti M, Scotti REvaluating the eccentricities of poplar stem profiles with terrestrial laser scanning. Forests 10 (3): 239.2019R Core TeamR: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.2022Saarinen N, Kankare V, Pyörälä J, Yrttimaa T, Liang X, Wulder MA, Holopainen M, Hyyppä J, Vastaranta MAssessing the effects of sample size on parametrizing a taper curve equation and the resultant stemvolume estimates. Forests 10 (10): 848.2019Salekin S, Catalán CH, Boczniewicz D, Phiri D, Morgenroth J, Meason DF, Mason EGGlobal tree taper modelling: a review of applications, methods, functions, and their parameters. Forests 12 (7): 913.2021Sun Y, Liang X, Liang Z, Welham C, Li WDeriving merchantable volume in poplar through a localized tapering function from nondestructive terrestrial laser scanning. Forests 7 (4): 87.2016Thiel D, Nagy L, Beierkuhnlein C, Huber G, Jentsch A, Konnert M, Kreyling JUniform drought and warming responses in Pinus nigra provenances despite specific overall performances. Forest Ecology and Management 270: 200208.2012Weiskittel AR, Kershaw JA, Vanclay JKIndividualtree static equations. In: “Forest Growth and Yield Modeling”. John Wiley and Sons, Ltd., Hoboken, NJ, USA, pp. 115137.2011Yang SI, Burkhart HERobustness of parametric and nonparametric fitting procedures of treestem taper with alternative definitions for validation data. Journal of Forestry 118 (6): 576583.2020
Map of the geographical situation of the Vallombrosa forest and the location of Pinus nigra Arn. stands in this forest (Orange color). Stratification of those stands by density (D1D3) and canopy height (H1H3; down left subfigure).
Schematic illustration of the point cloud data exploration. (a): The resulting point cloud after the preprocessing steps from a small region within one stratification unit; (b): identification of a representative tree within the point cloud; (c): zoomedin view of the stem of the identified tree; (d): removal of branches for stem cleaning; (e): fitting a cylinder to a specific section of the tree; (f): crosssectional view of the point cloud for the section depicted in panel (e).
Schematic tree form representation for the three applied models. (a) the Max and Burkhart parametric model; (b) Bsplines semiparametric model; (c) Random Forest nonparametric model.
Distribution of the relative crown base height (in gray color). The distribution fit curve (in red color), The boxplot of the relative crown base height (in blue color).
Bias, SEE and FI of the estimate of diameter over bark (dob) by relative class for the four models.
Bias, SEE and FI of the estimate of the stem volume by DBH class for the four models.
Mean, standard deviation (SD), and range for tree characteristics from all the samples. (*): Disk refers to a given crosssection of the stem, at a given height level, for which the diameter is measured.
Parameter
Mean
SD
min
max
DBH (cm)
45.05
11.22
21.0
82.0
Total height (m)
30.7
4.0
20.5
40.5
Disk* dob (cm)
29.3
12.5
0
82.5
Disk height (m)
14.5
8.7
0.9
40.5
Crown base height (m)
21.0
4.0
11.7
31.3
Volume (m^{3})
2.27
1.32
0.38
8.04
Estimates of parameters (± standard errors) for fixedeffects and mixedeffects taper models, based on the validation data set.
Parameters
Fixedeffect model
Mixedeffects model
a_{1}
0.6384 ± 0.0242
0.71 ± 0.007
a_{2}
0.8756 ± 0.0043
0.89 ± 0.001
b_{1}
0.9577 ± 0.0085
0.96 ± 0.014
b_{2}
0.2546 ± 0.0166
0.25 ± 0.017
b_{3}
1.1693 ± 0.2123
2.06 ± 0.083
b_{4}
25.5313 ± 1.9310
38.67 ± 1.428
var(a_{1})

0.006 ± 0.00074
var(a_{2})

0.0001 ± 0.00002
var(b_{1})

0.042 ± 0.0042
var(b_{2})

0.063 ± 0.0065
var(b_{4})

52.250 ± 20.10
Estimates of parameters (± standard errors) for Bsplines model, based on the validation data set.
Parameters
Bsplines model
β_{1}
59.10 ± 0.89
β_{2}
41.58 ± 0.52
β_{3}
31.76 ± 0.22
β_{4}
29.27 ± 0.44
β_{5}
13.88 ± 0.05
var(θ_{1})
172.95 ± 13.15
var(θ_{2})
114.94 ± 10.72
var(θ_{3})
38.71 ± 6.22
var(θ_{4})
136.16 ± 11.66
Total stem fit statistics for the four taper models from crossvalidation results. The underlined values refer to the best statistics. M&B is used here as the acronym of Max and Burkhart.
Model
Bias ± SD(cm)
SEE ± SD(cm)
FI ± SD
M&B fixed effect
0.274 ± 0.292
1.90 ± 0.213
0.97 ± 0.030
M&B mixed effects
0.027 ± 0.073
0.78 ± 0.094
0.99 ± 0.021
Bsplines
0.015 ± 0.082
0.96 ± 0.102
0.99 ± 0.006
Random forest
0.010 ± 0.357
2.26 ± 0.286
0.96 ± 0.046
Total stem volume fit statistics for the four taper models from crossvalidation results. The underlined values refer to the best statistics. M&B is used here as the acronym of Max and Burkhart.
Model
Bias ± SD (m^{3})
SEE ± SD (m^{3})
FI ± SD
M&B fixed effect
0.0085 ± 0.065
0.221 ± 0.044
0.97 ± 0.207
M&B mixed effects
0.0080 ± 0.001
0.021 ± 0.002
0.99 ± 0.001
Bsplines
0.0240 ± 0.006
0.034 ± 0.005
0.99 ± 0.005
Random forest
0.0560 ± 0.066
0.273 ± 0.066
0.91 ± 0.169
Rank sum analysis for the four estimating systems.
Model Parameters
Max and Burkhart
Bsplines
Random forest
Fixed effect
Mixed effects
Overall SEE for diameter and volume
6 (3)
2 (1)
4 (2)
8 (4)
Overall bias for diameter and volume
6 (3)
4 (1)
5 (2)
5 (2)
SEE for diameter by relative height
31 (3)
17 (1)
16 (2)
36 (4)
Bias for diameter by relative height
29 (3)
21 (2)
30 (4)
20 (1)
SEE for volume by DBH classes
15 (3)
6 (1)
8 (2)
18 (4)
Bias for volume by DBH classes
16 (4)
9 (1)
13 (3)
12 (2)
Total
103 (4)
59 (1)
76 (2)
99 (3)
Tab. S1  Model selection statistics for evaluating the inclusion of random parameters in the Max and Burkhart taper model.
Fig. S1  Population meanfixed effects Bspline basis.
Tab. S2  Bias (cm), standard error (cm), and fitness index of the estimate of dob by relative height for four models.
Tab. S3  Bias (m^{3}), standard error (m^{3}), and fitness index of the estimate of stem volume by DBH class for four models.