Knowledge of the aboveground biomass (AGB) and of the distribution of diameter at breast height (DBH) supports precise estimation of carbon density and forest structure, which is important for ecological studies, especially those concerning climate change. In this study, we propose to predict the DBH and AGB of individual trees using tree height (H), crown diameter (CD), and other metrics extracted from airborne laser scanning (ALS) data as input. In the proposed approach, regression methods such as support vector machine for regression (SVR) and random forests (RF) were used to find a transfer function that links the input parameters (H, CD, and other ALS metrics) to the output (DBH and AGB). The developed approach was tested on two datasets collected in southern Norway comprising 3970 and 9467 recorded trees, respectively. The results demonstrate that the developed approach outperforms a state-of-the-art method (based on a linear model fitted with the standard least-squares method), with RMSE equal to 81.4 kg and 92.0 kg, respectively (compared to 94.2 kg and 110.0 kg) for the prediction of AGB, and 5.16 cm and 4.93 cm, respectively (compared to 5.49 cm and 5.30 cm) for DBH.
Forests are considered a major component of the global carbon cycle. A precise characterization of forest ecosystems in terms of carbon stock density and forest structure is an important key in international efforts to mitigate climate change. Carbon density can be estimated directly from the aboveground biomass (AGB) of trees, while the knowledge about the distribution of diameter at breast height (DBH) can be useful in understanding the forest structure (
In forest inventories in general, DBH and height (H) are measured and registered in the field in order to predict the AGB using allometric models. AGB is then converted to carbon density for each field-reference tree (
The objective of the current study is to analyze the use of machine learning methods, such as Support Vector Machines for Regression (SVR) and Random Forest (RF) to predict DBH and AGB at the ITC level using metrics extracted from ALS data. In the first part of the experiment, only H and CD are used as input in order to compare with the work of
In this study, two datasets located in boreal forests of southeastern Norway were used: Hadeland and Våler (
The field data acquired in the Hadeland district (
ALS data were acquired on 21^{st} and 22^{nd} of August 2015 using a Leica ALS70 laser scanner operated at a pulse repetition frequency of 270 kHz. The flying altitude was 1100 m above ground level. Up to four echoes per pulse were recorded, and the resulting density of single and first echoes was 5 m^{-2}.
The data were acquired in the Våler municipality in the southern part of Norway (
The ALS data were acquired on 9^{th} September 2011 using a Leica ALS70 system operating with a pulse repetition frequency of 180 kHz. The flying altitude was 1500 m above ground level. Up to four echoes per pulse were recorded, and the resulting density of single and first echoes was 2.4 m^{-2}.
In
ITCs were delineated using an approach based on the ALS data and the delineation algorithm of the R package “itcSegment”. The algorithm starts first by finding the local maxima within a rasterized Canopy Height Model (CHM) and designates them as tree tops, and then uses a decision tree method to grow individual crowns around the local maxima. The different steps for this adopted approach are as follows (
application of a 3 × 3 low-pass filter to the rasterized CHM in order to smooth the surface and reduce the number of local maxima;
localization of local maxima using a circular moving window of variable size. The user provides a minimum and a maximum window size, and the window size is linearly related to the CHM height of the central pixel of the window. A pixel of the CHM is considered a local maximum if its value is greater than all other values in the window and greater than a minimum height above ground;
labeling each local maximum as an “initial region” around which a tree crown can grow;
extraction of the heights of the four neighboring pixels from the CHM and adding them (the pixels) to the region if their vertical distance from the local maximum is less than a predefined percentage of the local maximum height, and less than a predefined maximum difference;
reiteration of the previous step for all the neighboring cells included in the region until no further pixels are added to the region;
extraction of single and first echoes from the ALS data from each identified region (having first removed low elevation echoes,
application of a 2D convex hull to these echoes. The resulting polygons become the final ITCs. For each ITC, CD and H are provided. The CD is computed as 2 · √(ITC_{area}/π), while the height is computed as the 99th percentile of the elevation of the single and first return ALS echoes inside each ITC.
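The region-growing step and the crown diameter formula can be illustrated with a minimal sketch. This is not the code of the "itcSegment" package: the function names, the parameter names `rel_drop` and `max_drop`, and their default values are assumptions chosen for illustration, the tree top is taken as given rather than detected with the moving window, and neighbors higher than the tree top are not treated specially.

```python
import math
from collections import deque


def grow_crown(chm, seed, rel_drop=0.3, max_drop=5.0):
    """Grow a crown region around a tree-top pixel.

    chm      : 2D list of canopy heights (m)
    seed     : (row, col) of the local maximum
    rel_drop : hypothetical fraction of the tree-top height a pixel may lie below the top
    max_drop : hypothetical absolute limit (m) on that vertical distance
    """
    rows, cols = len(chm), len(chm[0])
    top = chm[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the four neighbors of every pixel already in the region.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in region:
                continue
            drop = top - chm[nr][nc]
            # Both conditions from the text must hold for the pixel to join.
            if drop < rel_drop * top and drop < max_drop:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def crown_diameter(itc_area):
    """CD = 2 * sqrt(area / pi): diameter of a circle with the ITC's area."""
    return 2.0 * math.sqrt(itc_area / math.pi)


chm = [
    [2, 3, 2, 1],
    [3, 20, 18, 2],
    [2, 17, 16, 1],
    [1, 2, 1, 1],
]
crown = grow_crown(chm, (1, 1))
```

In this toy CHM the crown grows from the 20 m tree top to the three adjacent pixels of 16 to 18 m and stops at the low surrounding pixels. In the approach described above, the area passed to `crown_diameter` would be the area of the convex hull of the echoes, not the pixel count.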
The delineated ITCs were automatically matched to the trees in the field datasets. If only one field-measured tree was included inside an ITC, then that tree was associated with that ITC. If more than one field-measured tree was included in a segmented ITC, the field-measured tree with the height most similar to the ITC height was chosen.
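The matching rule can be written as a short sketch. The function name and the `(tree_id, height)` tuple layout are hypothetical, and returning `None` for an ITC that contains no field-measured tree is an assumption, since the text does not describe that case.

```python
def match_itc(itc_height, trees_inside):
    """Return the id of the field tree matched to an ITC.

    itc_height   : ITC height (m)
    trees_inside : list of (tree_id, field_height) for trees located inside the ITC
    """
    if not trees_inside:
        return None  # assumption: an ITC without field trees stays unmatched
    # With a single tree this returns that tree; with several trees it returns
    # the one whose field height is closest to the ITC height.
    best = min(trees_inside, key=lambda tree: abs(tree[1] - itc_height))
    return best[0]


matched = match_itc(18.0, [("a", 12.5), ("b", 17.2)])
```

Here tree "b" is selected because its height differs from the ITC height by 0.8 m, against 5.5 m for tree "a".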
From each delineated ITC, metrics were extracted in order to build the regression models. In particular, two sets of metrics were considered. The first set, called H+CD, contained two geometric metrics of the extracted ITCs: the height and the crown diameter. The second set contained metrics extracted from the ALS points falling inside each ITC. This set comprised 50 statistics, which are summarized in
Let us consider a matrix of training observations
Support Vector machine for Regression (SVR 
Therefore, SVR is formulated as minimization of the following cost function (
subject to (
where
The aforementioned optimization problem can be transformed through a Lagrange function into a dual optimization problem expressed in the original dimensional feature space in order to lead to the following dual prediction model (
where
The Random Forest (RF) method, which was proposed by
Given training set
The evaluation of the Random Forest regression is done through the minimization of the mean square error (MSE) in order to select the optimum trees in the forest. In this study, the Random Forest regression implemented in the “randomForest” library of the software R was used.
In order to evaluate our methods, each dataset (
Regarding the SVR, the Radial Basis Function (RBF) was used as the kernel function. To compute the best parameter values, we used a cross-validation technique with a number of folds equal to 3. During the cross-validation, the parameter of regularization of the SVR
For the RF method, we fixed the number of trees to grow to 100, while the number of variables randomly sampled as candidates at each split was fixed to 1 when using only CD and H as features and to 25 when using all the features (CD + H + ALS metrics).
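The configuration described above can be sketched with scikit-learn, which is a swapped-in analogue of the R tools used in the study and not the authors' code. The function name `build_models`, the parameter grid values, the scoring choice, and the synthetic data are assumptions for illustration only. `max_features` plays the role of the number of variables sampled at each split.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR


def build_models(n_features):
    """Return (RF, SVR grid search) configured as in the text."""
    # 1 candidate variable per split for H+CD, 25 for the full metric set.
    mtry = 1 if n_features == 2 else min(25, n_features)
    rf = RandomForestRegressor(n_estimators=100, max_features=mtry, random_state=0)
    # Hypothetical search grid: the values tested in the study are not known here.
    grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0], "epsilon": [0.1, 1.0]}
    svr = GridSearchCV(SVR(kernel="rbf"), grid, cv=3,
                       scoring="neg_root_mean_squared_error")
    return rf, svr


# Synthetic stand-in data: columns are H (m) and CD (m), target is DBH (cm).
rng = np.random.default_rng(0)
X = rng.uniform([5.0, 1.0], [30.0, 8.0], size=(60, 2))
y = 1.2 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 1.0, 60)

rf, svr = build_models(X.shape[1])
rf.fit(X, y)
svr.fit(X, y)
predictions = svr.predict(X)
```

After fitting, `svr.best_params_` holds the parameter combination selected by the 3-fold cross-validation.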
In order to evaluate the developed method of prediction and perform a direct comparison with results of the state-of-the-art methods, we adopted the Root Mean Square Error (RMSE), which measures the differences between values predicted by our model and the ground-reference values (
where
We also adopted the percentage improvement ratio measure (
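As a minimal sketch, the two accuracy measures can be computed as follows. The RMSE is the standard definition. The improvement ratio is written here as the relative RMSE reduction with respect to the reference method, in percent, which is an assumption that is consistent with the values reported in the accuracy tables. The function names are hypothetical.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between predictions and ground-reference values."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)


def improvement_ratio(rmse_reference, rmse_method):
    """Percentage reduction in RMSE relative to the reference method (assumed form)."""
    return 100.0 * (rmse_reference - rmse_method) / rmse_reference


# RMSE values for Hadeland with H+CD as input: reference method versus SVR.
dbh_gain = round(improvement_ratio(5.49, 5.16), 2)    # DBH, cm
agb_gain = round(improvement_ratio(94.19, 81.43), 2)  # AGB, kg
```

With these inputs, `dbh_gain` is 6.01 and `agb_gain` is 13.55, matching the SVR row for Hadeland.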
In the first part of the analysis, only H and CD were used as input in order to compare the obtained results with those reported in
From
In greater detail, considering the AGB prediction,
To see visually the quality of the results, we show in
In order to improve the results, a set of ALS metrics was used together with the previous metrics (H and CD). The obtained results (
In
In this work, we proposed an approach to predict DBH and AGB of trees from remote sensing data by using SVR and RF regression methods. The developed approach was tested on two datasets. In the first part of the experiments, the metrics H and CD were used in order to predict DBH and AGB. The obtained results were promising and the improvements were noticeable, especially in terms of
Finally, to improve the quality of the results and to obtain better predictions for old trees with large values of DBH and AGB, we think it would be advantageous to preprocess the data so that the samples are more evenly distributed over the full range of the different metrics.
This work was supported by the HyperBio project (project 244599) financed by the BIONR program of the Research Council of Norway and TerraTec AS, Norway.
Location of the two study areas. (A) Hadeland; (B) Våler.
Architecture of the prediction system used.
Field-reference DBH
Field-reference AGB
Field-reference DBH
Field-reference AGB
Summary statistics of the field data for all datasets. For the tree height, DBH and AGB the data range and the mean (in brackets) are provided. For the species the number of trees and the percentage (in brackets) are provided.
| Variable | Species | Hadeland | Våler |
|---|---|---|---|
| Tree height (m) | Spruce | 5.8-25.4 (15.8) | 3.5-33.3 (18.6) |
| | Pine | 4.7-23.1 (16.0) | 4.4-26.0 (15.1) |
| | Broadleaves | 5.1-22.9 (13.8) | 5.8-26.3 (15.3) |
| DBH (cm) | Spruce | 5.1-44.1 (19.3) | 4.3-50.3 (21.1) |
| | Pine | 4.7-51.1 (25.6) | 4.0-47.9 (20.2) |
| | Broadleaves | 4.0-49.5 (14.8) | 4.7-38.9 (16.2) |
| Tree AGB (kg) | Spruce | 6.1-681.4 (155.1) | 4.2-1232.9 (216.3) |
| | Pine | 3.1-691.0 (214.3) | 2.4-728.4 (146.3) |
| | Broadleaves | 2.4-738.2 (103.2) | 4.1-680.4 (125.5) |
| Species | Spruce | 737 (59.7%) | 1326 (50.2%) |
| | Pine | 315 (25.5%) | 956 (36.2%) |
| | Broadleaves | 182 (14.8%) | 361 (13.6%) |
Metrics extracted from the ALS points.
| Metric | Description |
|---|---|
| Zmax | Maximum Z |
| Zmean | Mean Z |
| Zsd | Standard deviation of Z distribution |
| Zskew | Skewness of Z distribution |
| Zkurt | Kurtosis of Z distribution |
| Zentropy | Entropy of Z distribution |
| ZqP | P^{th} percentile of height distribution, with P from 5 to 95 at steps of 5 |
| ZpcumP | Cumulative percentage of points in the P^{th} layer, with P from 5 to 95 at steps of 5 |
| Itot | Sum of intensities for each return |
| Imax | Maximum intensity |
| Imean | Mean intensity |
| Isd | Standard deviation of intensity |
| Iskew | Skewness of intensity distribution |
| Ikurt | Kurtosis of intensity distribution |
| IpcumzqP | Percentage of intensity returned below the P^{th} percentile of Z, with P from 5 to 95 |
| pRth | Percentage of R^{th} return, with R from 1 to 4 |
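A few of the metrics listed above can be computed from the echoes of one ITC as in the following sketch. It uses NumPy and covers only a subset of the metrics. The function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) and of NumPy's default percentile interpolation are assumptions, since the exact definitions used in the study are not given here.

```python
import numpy as np


def height_metrics(z):
    """Zmax, Zmean, Zsd and the ZqP percentiles (P = 5, 10, ..., 95) of echo heights."""
    z = np.asarray(z, dtype=float)
    metrics = {
        "Zmax": z.max(),
        "Zmean": z.mean(),
        "Zsd": z.std(ddof=1),  # assumption: sample standard deviation
    }
    for p in range(5, 100, 5):
        metrics[f"Zq{p}"] = np.percentile(z, p)
    return metrics


def return_percentages(return_number):
    """pRth: percentage of echoes that are the R-th return, for R = 1..4."""
    rn = np.asarray(return_number)
    return {f"p{r}th": 100.0 * np.mean(rn == r) for r in range(1, 5)}


heights = height_metrics([2.0, 4.0, 6.0, 8.0, 10.0])
returns = return_percentages([1, 1, 2, 3])
```

For the five example heights, `heights` contains 22 entries (three summary statistics and 19 percentiles).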
Accuracy statistics for DBH and AGB predictions using H+CD as input.
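| Dataset | Method | DBH RMSE (cm) | DBH improvement (%) | AGB RMSE (kg) | AGB improvement (%) |
|---|---|---|---|---|---|
| Hadeland | State-of-the-art linear model | 5.49 | - | 94.19 | - |
| | RF | 5.42 | 1.28 | 84.35 | 10.45 |
| | SVR | 5.16 | 6.01 | 81.43 | 13.55 |
| Våler | State-of-the-art linear model | 5.30 | - | 109.99 | - |
| | RF | 5.19 | 2.08 | 95.46 | 13.21 |
| | SVR | 4.93 | 6.98 | 92.04 | 16.32 |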
Accuracy statistics for DBH and AGB predictions using H+CD+ALS metrics.
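| Dataset | Method | DBH RMSE (cm) | DBH improvement (%) | AGB RMSE (kg) | AGB improvement (%) |
|---|---|---|---|---|---|
| Hadeland | RF | 4.79 | 12.75 | 75.97 | 19.34 |
| | SVR | 4.93 | 10.20 | 78.54 | 16.62 |
| Våler | RF | 4.88 | 7.92 | 88.93 | 19.15 |
| | SVR | 4.87 | 8.11 | 91.15 | 17.13 |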