Climate is one of the main causes of forest fires in Europe. In addition, forest fires are influenced by other factors, such as the reconstruction of tree stands with a uniform species composition and increasing human pressure. At the same time, the growing number of fires is accompanied by a steady increase in the amount and quality of spatial information collected, which makes more accurate studies of forest fires possible. The appropriate use of geographic information systems (GIS) together with all the collected information on fires could provide new insights into their causes and, in further steps, allow the development of new, more accurate predictive models. The objectives of the study were: (i) to estimate the probability of fire occurrence in the period 2007-2016; (ii) to evaluate the performance of the developed model; (iii) to identify and quantify anthropogenic, topographic and stand factors affecting the probability of fire occurrence in forest areas in Poland. To achieve these objectives, a statistical model based on logistic regression was built using the nationwide forest fire database for the period from 2007 to 2016. The information in the database was obtained from the Polish State Forest Information System (SILP) and then supplemented with spatial, topographic and socio-economic information from various spatial and statistical databases. The results showed that fire probability is significantly positively affected by population density and distance from buildings. In addition, fire probability decreases with increasing distance from roads, railways, watercourses, water bodies and the forest edge, and with increasing elevation above sea level and slope steepness. Analysis of spatial, ecological and socio-economic factors provides new insights that contribute to a better understanding of fire occurrence in Poland.

Forest fires are a major cause of forest environmental degradation and an important contributor to global carbon emissions (

Poland is classified as one of the countries with average forest fire risk in the European Union (

Poland is not the only country in which human factors are of great importance in evaluating fire occurrence. Accidents, negligence and arson are also cited among the main causes of forest fires in many other European countries.

Because the number of forest fires in Poland has increased in recent years, it is important to identify additional groups of factors that may influence fire occurrence.

So far, spatial, topographic, and socio-economic factors have been largely ignored in research in Poland. Considering this, and taking into account previous research results from other European countries (

The aim of the study was: (i) to estimate the probability of fire occurrence over the period 2007-2016; (ii) to evaluate the performance of the model; (iii) to identify and quantify anthropogenic, topographic and stand factors affecting the probability of fire occurrence in forest areas in Poland.

The study area included Polish State Forests (7354.8 thousand ha, 76.9% of all forests in the country). The analyses were performed on the basis of data on forest fires collected between Jan 1, 2007 and May 31, 2016 (

Information on forest fires was obtained from the Polish State Forest Information System (SILP). In the analyzed period (Jan 01, 2007 to May 31, 2016 -

The location of each fire in relation to topography was compiled based on freely available geospatial data (see Tab. S1 in Supplementary material). The topographic variables: elevation above sea level, terrain aspect, and slope were obtained from Shuttle Radar Topography Mission (SRTM) data. The average accuracy of the SRTM elevation model for Europe is 6.2 m. According to

The planar distance of the fire from the nearest railroad tracks, roads, fire access roads, surface water bodies, and rivers was determined based on the combined OpenStreetMap and Forest Digital Map databases. Two other freely available data sources, CORINE Land Cover 2012 and the protected areas layer (nature reserve, landscape park, protected landscape area) of the Polish General Directorate of Environmental Protection, made it possible to determine the distance of the fire from urban areas and protected natural objects. The distance to the nearest populated cell (number of people > 0) of the 1 km grid with demographic data from Statistics Poland was also calculated.

Municipality level data (level 2 according to Local Administrative Unit - LAU), such as population density (persons km^{-2}), net migration rate (number of persons), number of persons living in urban areas, unemployment rate (%), average ^{-1}), were extracted from the Statistics Poland Local Data Bank and aggregated to the forest district level. Aggregation was done by calculating the weighted average of each characteristic for the given forest districts, where the weight corresponded to the share of the municipality area in the total area of the forest district.
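The area-weighted aggregation described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually run by the authors: the function name `area_weighted_mean` and all input values are hypothetical.

```python
def area_weighted_mean(values, area_shares):
    """Weighted average of a municipality-level characteristic.

    values      -- characteristic per municipality (e.g. persons per km^2)
    area_shares -- share of each municipality in the forest district area
    """
    if len(values) != len(area_shares):
        raise ValueError("values and area_shares must have equal length")
    total = sum(area_shares)
    if total <= 0:
        raise ValueError("area shares must sum to a positive number")
    # Divide by the total so the shares need not sum exactly to 1.
    return sum(v * w for v, w in zip(values, area_shares)) / total


# Hypothetical forest district overlapping three municipalities.
density = [50.0, 120.0, 80.0]  # persons per km^2 (made-up values)
shares = [0.5, 0.3, 0.2]       # share of district area (made-up values)
district_density = area_weighted_mean(density, shares)
print(round(district_density, 1))
```

Normalising by the sum of the weights keeps the result valid even when a district is only partly covered by the municipalities supplied.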

The data on fires contained information on the location of their occurrence, so it was possible to use the “Extract Values to Points” tool of ArcGIS® ver. 10.5 (ESRI, Redlands, CA, USA) to obtain variables from grids that contained data on topography. The distance of the fire to the selected spatial objects was calculated using the “Near” tool in ArcGIS®. Statistical information was assigned to the forest fire layer by joining the attribute tables of the forest district layer and the forest fire layer on the spatial identifier.
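The idea behind a nearest-distance calculation of this kind can be illustrated with a short planar-geometry sketch. It is not the ArcGIS implementation; it assumes projected coordinates in metres, and the function names and coordinates are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Planar distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both end points coincide.
        return math.hypot(px - ax, py - ay)
    # Position of the orthogonal projection along the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_distance(p, polyline):
    """Smallest distance from point p to any segment of a polyline."""
    return min(
        point_segment_distance(p, a, b)
        for a, b in zip(polyline, polyline[1:])
    )


# Hypothetical road geometry and fire location (metres, made-up values).
road = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
fire = (50.0, 30.0)
distance_road = nearest_distance(fire, road)
print(distance_road)
```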

In the first step of data processing, a content analysis of the collected material was conducted, including elimination of incomplete observations from the database, aggregation of some variables,

Qualitative variables were coded as dummy variables after aggregation of detailed values (indicator coding, see
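Indicator coding can be sketched as below: each level of a qualitative variable except the reference level receives its own 0/1 column. The helper `indicator_code` and the sample records are hypothetical; the level names only mimic the naming pattern of the variables reported in the parameter table (e.g., species_2).

```python
def indicator_code(values, reference):
    """Return one 0/1 column per level, omitting the reference level."""
    levels = sorted(set(values) - {reference})
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
    }


# Made-up records; species_1 plays the role of the reference group.
species_group = ["species_1", "species_2", "species_1", "species_3"]
dummies = indicator_code(species_group, reference="species_1")
print(dummies)
```

Omitting the reference level avoids perfect collinearity with the intercept, so each coefficient is read relative to that level.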

In analyzing the empirical material, several statistical models for a binary qualitative variable (fire/no fire) were tested. First, the usefulness of the following models was analyzed: logit, probit, LPM (Linear Probability Model), and Poisson and Cox regression models. Since Cox models estimate the time until an event occurs and Poisson regression models are designed for count variables, they were not used. The content analysis supported by the statistical analysis showed that the logistic regression model was the most suitable for the empirical material. In the study, we used logit models for qualitative dependent variables in the single-equation econometric model.

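$$P_i = F(Z_i) = \frac{1}{1 + e^{-Z_i}}$$

where $P_i$ is the probability of the event occurring, and $Z_i = x_i^{\prime}\beta$ is the linear combination of the explanatory variables. For the logistic distribution with mean 0 and variance $\pi^{2}/3$, the probability density function has the following form:

$$f(Z_i) = \frac{e^{-Z_i}}{\left(1 + e^{-Z_i}\right)^{2}}$$

and the inverse of the function $F$ is:

$$F^{-1}(P_i) = \ln\frac{P_i}{1 - P_i} = Z_i$$

This expression is called the logit, and the model a logit model. The logit is the logarithm of the odds that the dependent variable takes the value 1 rather than 0 (log-odds). For $P_i = 0.5$ the logit equals 0, for $P_i < 0.5$ it is negative, and for $P_i > 0.5$ it is positive. The final form of the logit model used is the following:

$$\ln\frac{P_i}{1 - P_i} = \beta_0 + \sum_{j} \beta_j x_{ij}$$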
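The relationship between a probability and its logit can be checked numerically. The sketch below uses the standard logistic function and its inverse; the function names are illustrative and not taken from the study.

```python
import math


def logistic(z):
    """Logistic cumulative distribution function, 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    """Log-odds of a probability p, the inverse of logistic()."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


for p in (0.3, 0.5, 0.7):
    print(p, round(logit(p), 4))
```

The loop illustrates the sign rule stated in the text: the logit is negative below 0.5, zero at 0.5, and positive above 0.5.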

Estimation of the logit model parameters was done using maximum likelihood method (MLM) in the GRETL software package (

The maximum likelihood method (MLM) is a commonly used estimation method for models of various types in large samples. It is convenient, available in all statistical packages (which use some version of a numerical algorithm to estimate the parameters, such as the Newton-Raphson method or the scoring method, depending on the package), and has a number of desirable theoretical properties. The MLM estimator is consistent, asymptotically efficient (it attains the Cramér-Rao lower bound asymptotically), asymptotically unbiased, and invariant.
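To make the numerical side concrete, the following sketch fits a logit model with an intercept and a single predictor by Newton-Raphson iterations. It is a teaching example under stated assumptions, not the GRETL routine used in the study; the function name and the data are made up.

```python
import math


def fit_logit_newton(x, y, max_iter=50, tol=1e-10):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("information matrix is singular")
        # Newton step: solve (information matrix) * step = gradient.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0 += s0
        b1 += s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


# Made-up, non-separable data: the outcome tends to 1 as x grows.
x_demo = [0, 1, 2, 3, 4, 5, 6, 7]
y_demo = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logit_newton(x_demo, y_demo)
print(round(b0_hat, 3), round(b1_hat, 3))
```

At the maximum the gradient of the log-likelihood vanishes, which is a convenient check that the iterations converged.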

The results of the estimation were interpreted in terms of marginal effects on $P_i$ (the sensitivity of the probability $P_i$ to changes in the exogenous variables) according to the following formula:

$$\frac{\partial P_i}{\partial x_{ij}} = \beta_j P_i \left(1 - P_i\right)$$
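For a logit model, the marginal effect of a continuous predictor is the coefficient multiplied by P(1 - P), a standard result. The sketch below evaluates it at an arbitrary value of the linear predictor; the evaluation point is an assumption made for illustration and is not the point at which the reported marginal effects were computed.

```python
import math


def marginal_effect(beta_j, z):
    """Marginal effect of predictor j at linear predictor value z."""
    p = 1.0 / (1.0 + math.exp(-z))
    return beta_j * p * (1.0 - p)


# Coefficient chosen for illustration; z = 0 corresponds to P = 0.5,
# where the factor P * (1 - P) reaches its maximum of 0.25.
effect_at_half = marginal_effect(-0.3183, 0.0)
effect_in_tail = marginal_effect(-0.3183, 6.0)
print(round(effect_at_half, 4), round(effect_in_tail, 6))
```

Because P(1 - P) shrinks quickly as the predicted probability approaches 0 or 1, marginal effects evaluated far from P = 0.5 can be very small even when the coefficient itself is sizeable.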

Estimated marginal effects shown in the

The results of the estimation have also been interpreted for the odds ratio defined as follows (

An advantage of the logit model is the convenient interpretation of $e^{\beta}$ as an odds ratio, where the odds are the ratio of the probability that the event occurs to the probability that it does not occur. For the intercept, $e^{\beta_0}$ is usually interpreted as the odds of the event in the reference group (all variables and their products equal to zero).
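The conversion from a coefficient to an odds ratio, and from odds to a probability, can be verified with the habitat_8 row of the parameter table (coefficient -2.1868). The helper names below are illustrative.

```python
import math


def odds_ratio(beta):
    """Odds ratio implied by a logit coefficient."""
    return math.exp(beta)


def odds_to_probability(odds):
    """Probability corresponding to given odds."""
    return odds / (1.0 + odds)


coef_habitat_8 = -2.1868  # value taken from the parameter table
or_habitat_8 = odds_ratio(coef_habitat_8)
prob_habitat_8 = odds_to_probability(or_habitat_8)
print(round(or_habitat_8, 4), round(prob_habitat_8, 4))
```

The rounded results agree with the "Odds ratio" and "Prob." columns reported for habitat_8, which suggests that the "Prob." column is computed as odds / (1 + odds).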

Probability of the fire occurrence was estimated using the following formula (

In general, there are two approaches to measure the choice of the right model. According to

Three popular large-sample significance tests based on MLM were used: the likelihood ratio test (LR), the Wald test, and the Rao score test (also known as the Lagrange multiplier test, LM). In all three tests, the null hypothesis assumes that the parameters of the model meet the overall condition g(

The goodness-of-fit of the model to the empirical data was assessed using the following measures: (i) the McFadden pseudo-R^{2} coefficient based on the likelihood ratio:

$$R^{2}_{McFadden} = 1 - \frac{\ln L_{ww}}{\ln L_{p}}$$

where $L_{ww}$ is the maximum of the likelihood function when it is maximised over all parameters, while $L_{p}$ is the maximum obtained under the restriction $\beta_i = 0$ for all slope parameters (i = 1, 2, …), that is, for the intercept-only model.
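A minimal sketch of the McFadden pseudo-R^{2} is given below. It compares the log-likelihood of a fitted model with that of an intercept-only model, whose predicted probability is simply the share of events. The function names, outcomes and fitted probabilities are made up for illustration.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y given probabilities p."""
    return sum(
        yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )


def mcfadden_r2(y, p_model):
    """1 - lnL(full model) / lnL(intercept-only model)."""
    p_null = sum(y) / len(y)  # intercept-only prediction
    ll_model = log_likelihood(y, p_model)
    ll_null = log_likelihood(y, [p_null] * len(y))
    return 1.0 - ll_model / ll_null


# Made-up outcomes and fitted probabilities.
y_obs = [1, 0, 1, 0]
p_fit = [0.9, 0.1, 0.8, 0.2]
r2 = mcfadden_r2(y_obs, p_fit)
print(round(r2, 4))
```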

It shows a combination of both types of errors for different c-value thresholds,
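The ROC curve and the area under it (AUC) can be illustrated as follows. The sketch computes one ROC point for a chosen threshold c and obtains the AUC as the share of (fire, no-fire) pairs in which the fire case receives the higher score, an equivalent rank-based definition. The function names and the data are hypothetical.

```python
def roc_point(y, score, c):
    """False positive rate and true positive rate at threshold c."""
    tp = sum(1 for yi, s in zip(y, score) if yi == 1 and s >= c)
    fp = sum(1 for yi, s in zip(y, score) if yi == 0 and s >= c)
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    return fp / n_neg, tp / n_pos


def auc(y, score):
    """Probability that a random positive outranks a random negative."""
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in pos
        for sn in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up outcomes (1 = fire) and predicted probabilities.
y_obs = [1, 1, 0, 0, 1, 0]
p_hat = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
fpr, tpr = roc_point(y_obs, p_hat, c=0.5)
area = auc(y_obs, p_hat)
print(round(fpr, 3), round(tpr, 3), round(area, 3))
```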

After estimating the parameters of the logit model using the maximum likelihood method (MLM), which maximizes the logarithm of the likelihood with respect to the model parameters (

In all the models obtained, the coefficients marked “

For dummy variables, the coefficients can be interpreted relative to the reference category captured by the intercept, assuming that all other predictors are held constant. A positive coefficient means that the log-odds of fire are higher for the category indicated by the dummy variable than for the reference category. A negative coefficient means the opposite.

The summary of the logit model MLM estimation is shown in

The analysis results presented in

The study showed that, among the factors examined, the odds of fire occurrence were most strongly associated with stands in which oaks have a dominant share (share_2). Fire outbreak on montane broadleaved, mixed broadleaved and riparian sites, and mixed coniferous, broadleaved and mixed broadleaved sites (habitat_8 and habitat_9) is much less likely than on the reference sites (10% and 13%

Factors that may reduce the likelihood of fire occurrence (value of the logit function) and consequently the likelihood of fire in relation to the intercept (

Compared to coniferous forest sites, the probability of fire occurrence is lower at sites such as montane broadleaved, mixed broadleaved and riparian sites, upland mixed coniferous, broadleaved and mixed broadleaved sites, and montane forest sites. For the remaining sites, the probability of fire occurrence is not significantly different from the probability for coniferous forest sites. Compared to the reference species group 1 (pines and larch), a significant decrease in the modelled fire probability was found for species group 2 (oaks). Species group 3 (European beech) has an even lower fire probability, and the lowest value is characteristic of black alder and grey alder stands, although the difference is very small compared to beech. For the other species groups, the logit value is not significantly different from the value observed in pine stands (group 1). In addition, the probability of fire is also influenced by the soil cover. A significantly lower modelled fire probability was found for a heavily turfed cover (cover_5).

The fire probability is significantly positively affected by “population_density”, “population_grid” and “distance_buiding” (the more people and the greater the distance from buildings, the higher the probability -

There are two acceptable approaches to statistical modeling in the literature: traditional data modeling and algorithmic modeling (

Anthropogenic factors that cause increases in fire probability are inextricably linked to human activities in forested areas (

A significant effect of population density on fire probability was also demonstrated by ^{2} and 85% occur within 500 m of buildings. In Portugal, 70% of all fires occurred in municipalities with more than 100 persons km^{-2}, although they represent only 21% of the territory (

Our research has confirmed that as distance from roads increases, the probability of fire decreases. This is a consequence of the greater accessibility of forest land close to roads, which leads to more intensive use by society.

A more detailed analysis of the impact of human activities on fire occurrence would be possible thanks to the use of real data on tourist/car traffic in forest areas. Taking into account current technologies and data (

The probability of fire occurrence is also influenced by factors related to landform, such as elevation, slope, aspect, and topographic position. These factors are partly responsible for shaping soil and climatic conditions in a given area and consequently affect vegetation (fuel). In our study, fire probability decreases with increasing mean elevation above sea level and with increasing slope. This may be related to both forest accessibility and population distribution.

According to the previous study of

Our analysis provided new information on fire occurrence in Polish forests. The presented method made it possible to reveal the factors influencing forest fire occurrence in Poland. The results of this study can be used by the bodies responsible for forest management in Poland, especially the State Forests, to revise the approach to classifying forest areas by fire risk category, a classification that is established every 10 years. Until now, the classification has taken into account only one anthropogenic factor, namely the number of people per 0.01 km^{2} of forest area. The presented analysis has shown that there are many other factors that increase the probability of fire and that can easily be included in the classification. The results of this work can help to better understand and adapt future investments in forest fire prevention systems. More accurate models will make it possible to better locate fire prevention monitoring, properly design tourism infrastructure, and, last but not least, properly adjust species composition. Moreover, the results of this work can initiate further scientific research on the probability patterns of forest ignition in Poland (predictive model).

Polish forests (A), location of forest fires in Polish State Forests (1.01.2007 - 31.05.2016) (B) and regional variability of forest area per inhabitant (C).

Number of forest fires in each year of the studied period. Data for 2016 come only from January-May.

Spatial information assigned to fires and random points. (SRTM): Shuttle Radar Topography Mission; (OSM/LMN): OpenStreetMap/Forest Digital Map; (CLC): CORINE LandCover.

Estimated marginal effects of the logit model. Prediction accuracy: y = 1 - n11/n1; y = 0 - n00/n0; Odds ratio(OR) = (n11·n00)/(n01·n10), OR>1 denotes better classification. ROC (receiver operating characteristic) curve - minimisation of the

Observed | Predicted Ŷ = 1 | Predicted Ŷ = 0 | Total |
---|---|---|---|
Y = 1 | n11 | n10 | n1. |
Y = 0 | n01 | n00 | n0. |
Total | n.1 | n.0 | N |
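The classification measures defined in the caption above (prediction accuracy for y = 1 and y = 0, and the odds ratio of correct to incorrect classifications) follow directly from the four cells of the table. The sketch below uses hypothetical counts, not the counts of the study.

```python
def classification_summary(n11, n10, n01, n00):
    """Accuracy measures derived from a 2x2 classification table.

    n11 -- observed 1, predicted 1    n10 -- observed 1, predicted 0
    n01 -- observed 0, predicted 1    n00 -- observed 0, predicted 0
    """
    n1 = n11 + n10  # all observed events
    n0 = n01 + n00  # all observed non-events
    wrong = n01 * n10
    return {
        "sensitivity": n11 / n1,  # accuracy for y = 1
        "specificity": n00 / n0,  # accuracy for y = 0
        "accuracy": (n11 + n00) / (n1 + n0),
        # OR > 1 denotes better classification; infinite if no errors.
        "odds_ratio": (n11 * n00) / wrong if wrong else float("inf"),
    }


# Hypothetical counts for illustration only.
summary = classification_summary(n11=90, n10=10, n01=5, n00=95)
print(summary)
```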

Summary of the logit model MLM estimation

Statistics | Value |
---|---|
Average dependent variable | 0.4869 |
Standard deviation | 0.4998 |
McFadden R^{2} | 0.9891 |
Sensitivity | 0.9968 |
Specificity | 0.9984 |
AUC | 0.9998 |
Number of correctly predicted cases | 37,214 (99.8%) |

Parameters of the logit model. For the explanation of variables, see text.

Variable | Coefficient | Std. Error | Z | P>z | Odds ratio | Prob. | Marginal effect |
---|---|---|---|---|---|---|---|
intercept | 0.2754 | 0.3253 | 0.8464 | 0.3973 | 1.3170 | 0.5684 | - |
habitat_8 | -2.1868 | 0.6847 | -3.1940 | 0.0014 | 0.1123 | 0.1009 | -0.0077 |
habitat_9 | -1.9395 | 0.4628 | -4.1900 | <0.0001 | 0.1438 | 0.1257 | -0.0047 |
species_2 | -0.9857 | 0.4154 | -2.3730 | 0.0177 | 0.3732 | 0.2718 | -0.0016 |
species_3 | -2.0216 | 0.7371 | -2.7420 | 0.0061 | 0.1324 | 0.1170 | -0.0064 |
species_6 | -3.8933 | 1.9607 | -1.9860 | 0.0471 | 0.0204 | 0.0200 | -0.0416 |
population_density | 0.0012 | 0.0003 | 3.6990 | 0.0002 | 1.0012 | 0.5003 | 0.0000 |
distance_road | -0.0116 | 0.0025 | -4.5600 | <0.0001 | 0.9885 | 0.4971 | 0.0000 |
distance_railway | -0.0002 | 0.0000 | -6.7280 | <0.0001 | 0.9998 | 0.4999 | 0.0000 |
distance_buiding | 0.0192 | 0.0019 | 9.9740 | <0.0001 | 1.0194 | 0.5048 | 0.0000 |
asl | -0.0034 | 0.0011 | -3.0540 | 0.0023 | 0.9966 | 0.4992 | 0.0000 |
slope | -0.3183 | 0.0317 | -10.0500 | <0.0001 | 0.7274 | 0.4211 | -0.0003 |
aspect | 0.3410 | 0.0296 | 11.5100 | <0.0001 | 1.4064 | 0.5844 | 0.0004 |
distance_water | -0.0008 | 0.0002 | -5.3200 | <0.0001 | 0.9992 | 0.4998 | 0.0000 |
distance_edge | -0.0156 | 0.0019 | -8.1910 | <0.0001 | 0.9845 | 0.4961 | 0.0000 |
population_grid | 0.0018 | 0.0003 | 6.3600 | <0.0001 | 1.0018 | 0.5004 | 0.0000 |
cover_5 | -1.2335 | 0.4127 | -2.9890 | 0.0028 | 0.2913 | 0.2256 | -0.0022 |
share_2 | 1.9055 | 0.9144 | 2.0840 | 0.0372 | 6.7230 | 0.8705 | 0.0009 |

Tab. S1 - Description of the independent variables: socio-economic (quantitative), geographical (interval and ratio scales), qualitative and quantitative forest variables.

Tab. S2 - Basic characteristics of the distribution (mean, standard deviation, coefficient of variation, asymmetry, kurtosis).