Comparison of supervised classifiers in the discrimination of preservation areas in a hydroelectric reservoir

The maintenance of riparian forests is considered one of the main vegetative practices for mitigating the degradation of water resources and is mandatory by law. However, in Brazil these areas are still being progressively and constantly decharacterized. Given this reality, it is necessary to expand research that identifies the changes taking place and provides efficient, fast and low-cost solutions. Remote sensing techniques show great potential for characterizing natural resources. The objective of this work was to map and characterize land use and occupation, and to determine the best method for classifying high spatial resolution images, in the Permanent Preservation Areas of the Funil Hydroelectric Power Plant reservoir, located between the municipalities of Lavras, Perdões, Bom Sucesso, Ibituruna, Ijací and Itumirim, in the state of Minas Gerais. Three methods were used to classify the high spatial resolution Quickbird satellite image: visual, object-oriented and pixel-by-pixel. The best method for mapping land use and occupation in the study area was object-oriented classification using the K-nearest neighbor algorithm, with a kappa coefficient of 0.88 and a global accuracy of 91.40%.


Introduction
Permanent Preservation Areas (PPA) are vitally important for the environmental and ecological balance of the planet, since they must be covered with their original vegetation and are not suitable for land-use alterations (DIETZOLD; WENDEL, 2004). The concept of PPA comes from society's acknowledgement of the importance of conserving vegetation in localities near watersheds and their components (BRASIL, CONAMA no. 303, 2002). Due to their importance, these areas need to be monitored efficiently (MASCARENHAS et al., 2009): even though suppression is prohibited, riparian forests are directly affected by the construction of hydroelectric dams, the opening of roads in places with rugged relief, and the implementation of agricultural crops and pasture (DIAS, 2004). Riparian forests are transition areas between terrestrial and aquatic ecosystems and regulate energy and nutrient transfers from one ecosystem to the other (CAVALCANTI; LOCKABY, 2006; COLLINS et al., 2010; CORRELL, 2001; KAGEYAMA et al., 2002; LIMA, 1998). When located in agricultural areas, they prevent or minimize the sediment movement generated by erosive processes (ADDISCOTT, 1997).
Among the many initiatives from social and governmental organizations, certain actions are considered promising, such as the laws governing the use of natural resources. In Brazil, the Brazilian Forest Code, regulated by Law No. 12,651 of May 25, 2012, establishes the vegetation strips to be protected around artificial lakes as permanent preservation areas (BRASIL, 2012).
The definition of the area around the reservoir of the Funil Hydroelectric Power Plant (Funil HPP) is stipulated in the Environmental Plan for Conservation and Use of the Artificial Reservoir Environment (PACUERA, 2011), which defines it as a direct contribution area, i.e. the reservoir slopes, which physically and directly interfere in the conservation of water quality. This area, although not flooded, has a close relationship with the reservoir and may be affected by it. It is delimited by an imaginary line that connects the marginal ridges, parallel to the flood quota (PACUERA, 2011).
Despite the laws that regulate the maintenance and restoration of reservoir areas, these areas are still being progressively and constantly decharacterized. The main causes include the absence of an official demarcation, the state's failure to promote efficient environmental monitoring, and the unavailability of knowledge and operational methods that enable these inspections.
Geographic Information Systems (GIS) can serve as an auxiliary tool in the control and enforcement of PPAs by joining remote sensor data with digital image processing techniques (VALLE JUNIOR et al., 2010; SOARES et al., 2011; ROVANI; CASSOL, 2012; COSTA et al., 2013).
GIS allows spatial analysis through the use of various environmental elements in an integrated manner, generating efficient results quickly and at low cost (PORTES et al., 2009).
Santos et al. (2014) showed the importance of remote sensing for biomass estimation and structural analysis of tropical forests. In their studies, remote sensing techniques were used to analyze regional phenological patterns and to quantify the impacts of natural and man-made environmental changes on these ecosystems, and the combined use of these techniques proved important for understanding such changes quickly and efficiently.
A difficulty commonly encountered in remote sensing studies is the lack of fast and precise methods for obtaining land use and land cover maps. The visual classification of high-resolution satellite images is a very precise technique. However, it becomes unfeasible over large areas because of the time it demands. A more recent alternative was digital classification, in which algorithms perform an automatic pixel-by-pixel classification. However, the maps resulting from this classification were of poor quality and required constant post-classification edits.
To overcome limitations such as the time demanded by classification and the size of the study areas, segmentation-based supervised image classification techniques are emerging. In addition to spectral information, these techniques consider the spatial and texture information available in high spatial resolution images. Therefore, this study aimed to compare different high spatial resolution image classifiers in order to evaluate and quantify the differences in accuracy of each classifier in the PPA of the Funil Hydroelectric Power Plant (Funil HPP) artificial reservoir.

Study area
The study area surrounds the Funil HPP, as shown in Figure 1. It belongs to the upper Grande river basin, in the south of Minas Gerais state. The Funil HPP encompasses 33 municipalities, covers an area of approximately 9,000 km², has a drainage area of 240 km and reaches a population of about 365,000 inhabitants (UPGRH-GD1, 2015). Located in an area with undulating relief, its dominant vegetation is the transition between Atlantic Forest and Cerrado (FERREIRA, 2005). According to the Köppen classification, the climate of the region is of the Cwb type (subtropical highland). The Funil HPP lies upstream of the Furnas reservoir. Its most important tributaries are the Mortes and Capivari rivers, which form the Funil HPP reservoir where they meet the Grande river.

Quickbird imaging
For this study, two images provided by the Funil HPP company were used, shown in Figure 2. The product supplied is an orthorectified mosaic of Quickbird satellite images, dated July 30, 2009 and August 7, 2009, containing three multispectral bands (RGB-321) in true color rendering. The spectral bands, whose characteristics are shown in Table 1, were fused using the Gram-Schmidt method, which simulates a panchromatic band from the lower spatial resolution multispectral bands (RSI, 2009), generating an image with a spatial resolution of 61 centimeters and a radiometric resolution of 11 bits. The image mosaic covers a total area of 470 km². The project execution steps were: data collection; information gathering; image segmentation; image classification; and statistical analysis of the results, according to the flow chart presented in Figure 3.

Planialtimetric survey
For the definition of the PPA, a planialtimetric survey of the reservoir was converted into the shapefile format. This survey measured the difference between the geoid and the ellipsoid to define the normal operating quota (808.00 m quota) and the maximum maximorum quota (810.61 m) already established by the venture project. The survey was processed using the AutoCAD 2010® program and was performed using GPS/GNSS Trimble R6 (L1/L2) receivers. The survey was based on the specific dimension reference ellipsoid. The geoid undulation model used in the survey was the IBGE's MAPGEO2010.
The equipment features a Real Time Kinematic (RTK) radio correction system. The RTK positioning concept is based on the instantaneous transmission of satellite signal correction data from the receiver installed at the reference vertex to the receiver that visits the vertices of interest. This provides precise coordinates for the survey vertices in real time.

Field reference
To create the field reference map, we used the visual classification performed in the ENVI EX 4.8® software, with field corrections in areas considered doubtful during the classification. For the visual classification, the quality requirements imposed by Brazilian legislation in Decree No. 89,817 of June 20, 1984 were followed, mapping on a fixed classification scale of 1:1,000.
Visual classification was performed throughout the area surrounding the reservoir, based on the acquired images. For the surrounding area, eight types of land use were discriminated: "water bodies"; "anthropic use"; "pasture"; "natural vegetation"; "exposed soil"; "mining"; "rocky outcrop" and "crops".

Pixel-by-pixel Maximum Likelihood image classification (MAXVER)
For pixel-by-pixel classification, the MAXVER algorithm from the ENVI EX 4.8® software was used. Samples of 200 pixels were collected for each land use class, representing 7.6 × 10⁻⁸ % of the total image area. In this classification, only the spectral information of each pixel is used to classify the images. After the classification, the raster was clipped based on the PPA delimitation shapefile generated by the planialtimetric survey. As a result, only the classes "water bodies"; "anthropic use"; "pasture"; "natural vegetation"; "exposed soil" and "crops" were identified in the intended area.
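The decision rule behind a maximum likelihood classifier can be illustrated with a minimal sketch. It is not the ENVI implementation: it assumes independent bands (a diagonal covariance, whereas maximum likelihood classifiers normally use the full class covariance matrix), and the training pixels, function names and class subset are hypothetical.

```python
import math

def fit_class_stats(samples):
    """Per-band mean and variance for each class from training pixels.

    samples: dict mapping class name -> list of (band1, band2, band3) tuples.
    """
    stats = {}
    for label, pixels in samples.items():
        n = len(pixels)
        bands = list(zip(*pixels))  # one tuple of values per band
        means = [sum(band) / n for band in bands]
        # A small floor keeps every variance strictly positive.
        variances = [
            max(sum((v - m) ** 2 for v in band) / (n - 1), 1e-6)
            for band, m in zip(bands, means)
        ]
        stats[label] = (means, variances)
    return stats

def log_likelihood(pixel, means, variances):
    """Log of a Gaussian density, assuming independent bands (simplification)."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        for x, m, var in zip(pixel, means, variances)
    )

def classify_pixel(pixel, stats):
    """Assign the class whose Gaussian model gives the highest likelihood."""
    return max(stats, key=lambda label: log_likelihood(pixel, *stats[label]))

# Hypothetical training pixels (digital numbers), not values from the study.
training = {
    "water bodies": [(20, 40, 90), (22, 43, 95), (18, 38, 88), (21, 41, 92)],
    "pasture": [(120, 150, 80), (125, 155, 85), (118, 148, 78), (122, 152, 83)],
    "exposed soil": [(200, 170, 140), (205, 175, 150), (198, 168, 138), (202, 172, 145)],
}
stats = fit_class_stats(training)
print(classify_pixel((121, 151, 82), stats))
```

Because each pixel is labeled from its spectral values alone, neighboring pixels of the same object can receive different classes, which is one reason pixel-based maps tend to need post-classification editing.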

Object oriented image classification
In this stage, segmentation and object-oriented classification were performed using the three visible bands (RGB-321) fused with the panchromatic band (PAN) of the Quickbird satellite.

Segmentation
Image segmentation algorithms are used to extract image information, especially in high resolution images. These algorithms aim to divide an image into spatially continuous, separate and homogeneous regions.
The segmentation was performed in the ENVI EX 4.8® software. This process partitions the image, subdividing it. The level of detail is associated with the detection of objects and regions of interest, which in this study were the identified classes. The process of segmentation by feature extraction is based on the so-called "watersheds by immersion" algorithm (SARMIENTO et al., 2014; CAMPOS et al., 2013; VINCENT; SOILLE, 1991). The initial step is the extraction of image features, using an object-based approach to segment images, defined from a region of interest with the spatial, spectral (brightness and color) and textural characteristics that define the region (AGUIRRE-GUTIÉRREZ; SEIJMONSBERGEN; DUIVENVOORDEN, 2012; BLASCHKE, 2010; YAN et al., 2006).
The algorithm requires only one input parameter, the scale level, which defines pixel similarity. A higher scale level produces fewer segments, while a lower scale level produces more: on a scale from 0 to 100, 0 corresponds to oversegmentation and 100 to no segmentation. Segment fusion was then performed to aggregate small segments within larger areas. The fusion level also ranges from 0 to 100 and controls the merging of adjacent segments based on a combination of spatial and spectral information. The values assigned to the scale and fusion levels were set based on the preview provided by the software.
The values used were reached by trial and error in search of the best results. For this study, the merge value used was 90%, the scale level in the segmentation was 40%, and the segmentation was refined using contrast to compare other areas of the image.

Classification using K-nearest neighbor (KNN) and Support vector machine (SVM)
For the classification, samples of "water bodies"; "anthropic use"; "pasture"; "natural vegetation"; "exposed soil"; "mining"; "rocky outcrop" and "crops" were collected. Afterwards, all parameters provided by the ENVI software for spatial, texture, spectral and custom attributes, described below, were considered:
• Spatial: area, length, compactness, convexity, solidity, roundness, form factor, elongation, rectangle measurement, major direction, major axis length, minor axis length, number of polygon holes, and total polygon area of the external contour;
• Texture: texture distance, texture average, texture variance, texture entropy;
• Spectral: minimum, maximum, average values and standard deviation of the pixels that make up bands 1, 2 and 3 (RGB).
In order to extract information and recognize homogeneous patterns and objects, the KNN and SVM methods available in the software were adopted.
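As an illustration of the spectral attributes listed above, the sketch below computes the minimum, maximum, average and standard deviation per band for the pixels of one segment. The function name and pixel values are hypothetical, and the use of the population standard deviation is an assumption, since the source does not state which estimator the software applies.

```python
import statistics

def spectral_attributes(segment_pixels):
    """Minimum, maximum, mean and standard deviation per band for one segment.

    segment_pixels: list of (band1, band2, band3) tuples belonging to the segment.
    """
    attributes = {}
    for index, band in enumerate(zip(*segment_pixels), start=1):
        attributes[f"band{index}"] = {
            "min": min(band),
            "max": max(band),
            "mean": statistics.mean(band),
            # Population standard deviation (assumed estimator).
            "std": statistics.pstdev(band),
        }
    return attributes

# Hypothetical pixel values for a single segment.
segment = [(100, 140, 60), (104, 138, 62), (98, 142, 58), (102, 140, 60)]
attrs = spectral_attributes(segment)
for band_name, values in attrs.items():
    print(band_name, values)
```

Each segment is then described by one attribute vector rather than by individual pixel values, which is the input an object-oriented classifier works with.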
The KNN classification algorithm is a pattern recognition technique that requires the selection of the parameter k, the number of neighbors considered during the classification (XU et al., 2013). The values 1, 3, 5 and 7 (only odd values are used) were tested for k, and the value 1 was selected because it gave the best results.
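A minimal sketch of the nearest-neighbor rule may help here. It assumes plain Euclidean distance and an unweighted majority vote over segment attribute vectors; the attribute values, class subset and function name are hypothetical, and the software's own implementation may differ.

```python
import math
from collections import Counter

def knn_classify(features, training, k=1):
    """Label a segment by majority vote among its k nearest training segments.

    features: attribute vector of the segment to classify.
    training: list of (attribute_vector, class_label) pairs.
    """
    by_distance = sorted(training, key=lambda item: math.dist(features, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    # On a tie, the label of the nearest tied neighbor is returned.
    return votes.most_common(1)[0][0]

# Hypothetical (mean brightness, texture variance) vectors, scaled to 0-1.
training = [
    ((0.10, 0.02), "water bodies"),
    ((0.12, 0.03), "water bodies"),
    ((0.55, 0.20), "pasture"),
    ((0.60, 0.25), "pasture"),
    ((0.35, 0.70), "natural vegetation"),
    ((0.30, 0.65), "natural vegetation"),
]

for k in (1, 3):
    print(k, knn_classify((0.57, 0.22), training, k=k))
```

With k = 1, as selected in the study, each segment simply takes the class of its single most similar training sample.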
The SVM algorithm determines decision boundaries for class separation while minimizing classification error (MOUNTRAKIS; JUNGHOIM, 2010). The radial basis function kernel was selected, as recommended by Pereira et al. (2011) and Roza and Ribeiro (2013). The gamma and penalty parameter values used were 0.03 and 100, which are the defaults suggested by the program.
After the classification, the raster was clipped based on the PPA delimitation shapefile generated by the planialtimetric survey. As a result, only the classes "water bodies"; "anthropic use"; "pasture"; "natural vegetation"; "exposed soil" and "crops" were identified in the intended area.
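The role of gamma in the radial basis function kernel can be shown with a short sketch. It uses the standard form K(x, y) = exp(-gamma * ||x - y||²) with the gamma of 0.03 reported above; the function name and input vectors are hypothetical, and this shows only the kernel, not a full SVM.

```python
import math

def rbf_kernel(x, y, gamma=0.03):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Identical vectors give the maximum similarity of 1.0.
print(rbf_kernel((1.0, 2.0, 3.0), (1.0, 2.0, 3.0)))
# Similarity decays as the vectors move apart.
print(rbf_kernel((0.0, 0.0), (3.0, 4.0)))
```

A small gamma makes the similarity decay slowly with distance and yields smoother decision boundaries, while a larger gamma makes each training sample influence only its immediate neighborhood.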

Post classification
After the classifications using the KNN, SVM and MAXVER algorithms, a post-classification was performed: the thematic maps resulting from the classifications were compared with the field reference map in order to analyze the accuracy of the classifications.
In the post-classification process, the accuracy of the thematic maps was assessed based on confusion matrices, resulting in the following analyses: kappa coefficient, overall accuracy, user accuracy (from the user's point of view) and producer accuracy (from the producer's point of view).
Global (overall) accuracy is obtained by dividing the sum of the correctly classified pixels, found on the main diagonal of the confusion matrix, by the total number of pixels, and is expressed as a percentage of hits. The minimum accepted for use in maps is 85% (JENSEN, 1996). It was calculated using Equation 1:

$$OA = \frac{\sum_{i=1}^{r} x_{ii}}{N} \times 100 \qquad (1)$$

where OA = overall accuracy (%); r = number of classes (rows of the confusion matrix); x_{ii} = correctly classified elements on row i and column i (main diagonal); N = total number of elements.
User accuracy is calculated by dividing the number of correctly classified pixels in a class by the total number of pixels classified in that class. It relates to commission errors and expresses the probability that a pixel classified in the image actually represents that class in the field. Producer accuracy is calculated by dividing the number of correctly classified pixels in a class by the total number of pixels of that class indicated in the field reference.
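The three accuracy measures can be sketched from a confusion matrix as follows. The matrix values and the function name are hypothetical, and the layout (rows = classified map, columns = field reference) is an assumption that must match however the matrix was actually built.

```python
def accuracy_metrics(matrix, labels):
    """Overall, user and producer accuracy (in %) from a confusion matrix.

    matrix[i][j]: elements classified as labels[i] whose reference class
    is labels[j] (rows = classification, columns = field reference).
    """
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(len(labels))]
    row_totals = [sum(row) for row in matrix]            # classified per class
    column_totals = [sum(col) for col in zip(*matrix)]   # reference per class

    overall = 100.0 * sum(diagonal) / total
    user = {lab: 100.0 * d / r for lab, d, r in zip(labels, diagonal, row_totals)}
    producer = {lab: 100.0 * d / c for lab, d, c in zip(labels, diagonal, column_totals)}
    return overall, user, producer

# Hypothetical confusion matrix, not data from the study.
labels = ["pasture", "natural vegetation", "crops"]
matrix = [
    [45, 3, 2],
    [4, 35, 1],
    [1, 2, 7],
]
overall, user, producer = accuracy_metrics(matrix, labels)
print(overall, user, producer)
```

User accuracy divides by row totals (what the map claims), while producer accuracy divides by column totals (what exists in the reference), so the two can differ substantially for the same class.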
The kappa coefficient proposed by Landis and Koch (1977) accounts for the entire confusion matrix in its calculation, including the elements outside the main diagonal, which represent the disagreements in the classification. It was calculated using Equation 2:

$$k = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+} \, x_{+i}}{N^{2} - \sum_{i=1}^{r} x_{i+} \, x_{+i}} \qquad (2)$$

where: k = kappa coefficient of agreement; N = number of observations (field truths); r = number of rows of the confusion matrix; x_{ii} = observations on row i and column i (main diagonal); x_{i+} = marginal total of row i; x_{+i} = marginal total of column i.
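A minimal sketch of the kappa calculation from a confusion matrix, using the marginal totals defined above, is given below. The matrix is hypothetical and the function name is illustrative only.

```python
def kappa_coefficient(matrix):
    """Kappa coefficient of agreement from a square confusion matrix."""
    n = sum(sum(row) for row in matrix)                      # N: observations
    observed = sum(matrix[i][i] for i in range(len(matrix)))  # sum of x_ii
    row_totals = [sum(row) for row in matrix]                # x_i+
    column_totals = [sum(col) for col in zip(*matrix)]       # x_+i
    chance = sum(r * c for r, c in zip(row_totals, column_totals))
    return (n * observed - chance) / (n * n - chance)

# Hypothetical confusion matrix, not data from the study.
matrix = [
    [45, 3, 2],
    [4, 35, 1],
    [1, 2, 7],
]
kappa = kappa_coefficient(matrix)
print(round(kappa, 4))
```

Unlike overall accuracy, the result discounts the agreement that would be expected by chance given the row and column totals, which is why kappa is usually lower than the overall accuracy for the same matrix.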

Reference map
The field reference map was classified encompassing the entire reservoir surrounding area, totaling 21,195.91 hectares. In this map, eight land use classes were identified, of which 13,107.67 ha belonged to "pasture" class; 3,792.98 ha "natural vegetation" class; 2,817.97 ha "crops" class; 573.53 ha "water bodies" class; 349.85 ha "anthropic use" class; 345.01 ha "exposed soil" class; 170.67 ha "mining" class and 38.23 ha "rocky outcrop" class, according to Table 2.

Funil HPP's PPA visual classification
The area of interest considered in the study (PPA) is the area comprised between the operation quotas of the Funil HPP (between the 808 and 810.61 quotas). In this area of 1043 ha, five land use classes were found: 631.930 ha of the "pasture" class; 294.763 ha of the "natural vegetation" class; 47.786 ha of the "crops" class; 45.408 ha of the "exposed soil" class and 23.597 ha of the "anthropic use" class.
The classes were identified through contiguous visual classification, and the field checks served as a guide for creating the field reference map that was used to analyze the accuracy of the classifiers. In many studies, visual classification is taken as the field reference and as the most reliable means of identifying classes in an area. All values and classes obtained by the different classification methods are described in Table 3.
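As a quick arithmetic check, the class areas of the visual classification can be summed and expressed as shares of the PPA. The sketch reads the reported figures with a period as the decimal separator, an interpretation supported by the fact that they then add up to the stated total of about 1043 ha; the variable names are illustrative.

```python
# Class areas (ha) from the visual classification of the PPA,
# read with a period as the decimal separator.
areas_ha = {
    "pasture": 631.930,
    "natural vegetation": 294.763,
    "crops": 47.786,
    "exposed soil": 45.408,
    "anthropic use": 23.597,
}

total_ha = sum(areas_ha.values())
shares = {name: 100.0 * area / total_ha for name, area in areas_ha.items()}

print(f"total: {total_ha:.3f} ha")
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The same check can be repeated for the areas reported by each classifier to confirm that they cover the same total extent as the reference.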

Classification using MAXVER algorithm
The MAXVER classifier algorithm found five classes that discriminate land use. The area values for these classes were: 501.633 ha of the "pasture" class; 376.460 ha of the "natural vegetation" class; 35.018 ha of the "crops" class; 82.102 ha of the "exposed soil" class and 48.271 ha of the "anthropic use" class.
For the MAXVER classification of land use and occupation in the PPA of the Funil HPP reservoir, the kappa coefficient (0.68) was rated "Very Good" according to Landis and Koch (1977), and the overall accuracy was 79.89%. Bolfe et al. (2004), when performing supervised classifications using the MAXVER method in an attempt to quantify different populations of three vegetation classes, reached a kappa coefficient of 0.84 and an overall accuracy of 85.23%. The authors concluded that the kappa coefficient demonstrated consistency in assessing the accuracy of the mappings produced by the Maximum Likelihood method.

Object oriented classification using the Support Vector Machine (SVM) algorithm
The SVM classifier algorithm found five classes that discriminate land use. The area values for these classes were: 600.2254 ha of the "pasture" class; 256.2016 ha of the "natural vegetation" class; 116.2529 ha of the "crops" class; 57.2037 ha of the "exposed soil" class and 13.5994 ha of the "anthropic use" class.
For the SVM classification of land use in the PPA of the Funil HPP reservoir, the kappa coefficient (0.80) was rated "Very Good" according to Landis and Koch (1977), and the overall accuracy was 86.29%.

Image classification using the K-nearest neighbor (KNN) algorithm
The KNN classifier algorithm found five classes that discriminate land use. The area values for these classes were: 501.6337 ha of the "pasture" class; 304.1596 ha of the "natural vegetation" class; 107.7757 ha of the "crops" class; 44.0180 ha of the "exposed soil" class and 24.5970 ha of the "anthropic use" class.
In the present work, the KNN classification of land use and occupation classes reached a kappa coefficient of 0.88, rated "Excellent" according to Landis and Koch (1977), and an overall accuracy of 91.40%. Figure 4 shows the results obtained by the classifications in a given area, which makes it possible to compare the classes found by each classification algorithm with the visual classification.
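The qualitative ratings attached to the three kappa values can be reproduced with a simple lookup. The 0.20-wide bands follow the intervals of Landis and Koch (1977); only the "Very Good" and "Excellent" labels come from this text, and the labels for the lower bands, like the function name, are assumptions added for illustration.

```python
def kappa_quality(kappa):
    """Map a kappa value to a qualitative label (upper bound inclusive)."""
    bands = [
        (0.80, "Excellent"),    # label used in this study
        (0.60, "Very Good"),    # label used in this study
        (0.40, "Good"),         # assumed label
        (0.20, "Reasonable"),   # assumed label
        (0.00, "Poor"),         # assumed label
    ]
    for lower_bound, label in bands:
        if kappa > lower_bound:
            return label
    return "Bad"  # assumed label for kappa <= 0

# Kappa values reported for the three classifiers.
reported = {"MAXVER": 0.68, "SVM": 0.80, "KNN": 0.88}
for name, value in reported.items():
    print(name, value, kappa_quality(value))
```

Note that the SVM value of 0.80 sits exactly on the boundary and falls in the lower band, consistent with the "Very Good" rating reported for it.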

Kappa coefficient and overall accuracy coefficient
The statistical values obtained from the classifiers are described in Table 2, which shows that the kappa coefficient and overall accuracy resulting from the K-nearest neighbor classification were higher than the others. The object-oriented classification, in which spectral, spatial and texture attributes are considered, obtained a better result than the pixel-by-pixel classification, in which the classifier algorithm considers only the spectral information. This result shows that classifiers based on object-oriented image analysis perform satisfactorily in the PPA classification.
Figure 5 shows the results obtained by the three classification algorithms covered in this study: MAXVER, KNN and SVM. The classifier whose values were closest to the reference map was the KNN algorithm, which also achieved the best statistical results for kappa coefficient and overall accuracy. However, the other classifiers also performed satisfactorily, each with its particularities: for certain classes, such as "pasture" and "exposed soil", the other algorithms reached higher values than KNN.

User and Producer Accuracy
For the KNN classification, the best user accuracy values were observed for the "natural vegetation", "pasture", "anthropic use" and "exposed soil" classes, with lower values for the "crops" class. These values relate to commission errors and express the probability that a pixel classified in the image represents that class in the field, as shown in Table 3. For producer accuracy, the KNN classifier obtained better results for the "pasture", "exposed soil" and "anthropic use" classes, and lower values for the "natural vegetation" and "crops" classes. These values relate to omission errors and express the probability that a reference pixel is correctly classified.

Conclusion
The K-nearest neighbor classifier was the algorithm that showed the greatest potential for discriminating Permanent Preservation Areas in high spatial resolution images. Thus, segmentation combined with object-oriented classification is recommended for evaluating different classes of land use and for the inspection and monitoring of protected areas.