Predicting height growth in bean plants using non-linear and polynomial models
Ariana Campos Frühauf, Edilson Marcelino Silva, Tales Jesus Fernandes, Joel Augusto Muniz

Brazil is one of the world's main producers and consumers of beans, which makes the crop important for the country's economic and social development. Because the bean plant has a short growth cycle, modeling its growth is essential for optimizing management plans for this crop. Growth can be modeled with linear or non-linear models, but the latter stand out for providing more information to the researcher, mainly through the practical interpretation of their parameters. In this study, the third-degree polynomial (linear) model and the non-linear Logistic and Gompertz models were fitted, in the R statistical software, to plant height data (cm) as a function of time (days after emergence), totaling 11 observations. Goodness of fit was assessed with the adjusted coefficient of determination, the corrected Akaike information criterion and the residual standard deviation. The Logistic model fitted the data best.


Introduction
Common bean (Phaseolus vulgaris L.) is one of the main crops of Brazilian agribusiness and makes the country the third largest producer of beans in the world, behind only India and Myanmar (FAOSTAT, 2019). Among the Mercosur countries, Brazil is the main producer and consumer of this legume, producing around 3.1 million tons per year (CONAB, 2019), which stimulates family farming and the local economy, since the crop is grown by small and large producers alike (MALTA et al., 2017).
The importance of beans goes beyond the economic aspect, as it is a highly nutritious food and one of the basic components of the Brazilian diet, consumed daily by different social classes across the country. According to the food guide for the Brazilian population (BRASIL, 2014), common bean grains bring numerous health benefits, as they are an excellent source of protein, carbohydrates, fiber, B vitamins, iron, calcium and other minerals.
According to Lima et al. (2019), because the common bean plant has a short growth cycle, modeling its growth is essential for optimizing the management techniques of this crop. Growth curves can be studied with linear and non-linear models; however, non-linear models stand out for their parsimony and for the practical interpretation of their parameters, which summarize a great deal of information in a few values and help the researcher find practical applications (OLIVEIRA et al., 2013; ARCHONTOULIS; MIGUEZ, 2015; FERNANDES et al., 2017; LIMA et al., 2017). Several researchers have used non-linear models with satisfactory results to fit growth curves of plant species. Frühauf et al. (2020) fitted the Logistic, Gompertz, von Bertalanffy and Brody models to the diameter growth of cedar (Cedrela fissilis); the same models were used by Jane et al. (2020) to describe the height and diameter of sugarcane variety RB92579; Prado et al. (2020) modeled the growth of green dwarf coconut fruit with the Logistic and Gompertz models; and Silva et al. (2020a) described the growth of blackberry fruit with the double Logistic and double Gompertz models. Martins Filho et al. (2008) evaluated the growth of common bean cultivars using non-linear models and Bayesian inference.
Fitting non-linear models gives the researcher a broader perspective on the growth of the plant, but the difficulty of fitting them and achieving convergence leads some researchers to choose linear models. This can be seen in studies such as those of Batista et al. (2019), who fitted first- and second-degree polynomial models to describe the initial growth of the melon plant; Pineda-Herrera et al. (2019), who used the third-degree polynomial model to fit the diameter growth of three tree species; and Saldaña et al. (2017), who used the same model to describe the growth of tomato leaf area. Therefore, this study aimed to compare the fits of the third-degree polynomial (linear) model and the non-linear Logistic and Gompertz models in describing the height growth of BRS MG Talismã bean plants.

Material and methods
Data were obtained from an experiment conducted by Vieira et al. (2008) with the BRS MG Talismã bean cultivar under conventional planting. The heights of ten plants were measured at 7 days after emergence (DAE) and every 7 days thereafter until 77 DAE.
The third-degree polynomial model, which is linear in its parameters (Eq. 1), and the non-linear Logistic (Eq. 2) and Gompertz (Eq. 3) models, in the parameterization of Fernandes et al. (2015), were fitted to plant height as a function of days after emergence:

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \varepsilon_i \qquad (1)$$

$$y_i = \frac{\alpha}{1 + e^{\kappa(\gamma - x_i)}} + \varepsilon_i \qquad (2)$$

$$y_i = \alpha \, e^{-e^{\kappa(\gamma - x_i)}} + \varepsilon_i \qquad (3)$$

where $y_i$ is the plant height (cm) at the i-th measurement; $x_i$ is the time at the i-th measurement, given in days after emergence, with i = 1, 2, …, 11; $\beta_j$ are the parameters of the linear model, with j = 0, 1, 2, 3; $\alpha$ is the expected value for the maximum height of the plant; $\gamma$ is the abscissa of the inflection point; $\kappa$ is the maturity index, that is, the larger its value, the less time the plant takes to reach its maximum size; and $\varepsilon_i$ is the random error, which is assumed to be independent and normally distributed with constant variance, that is, $\varepsilon_i \sim N(0, \sigma^2)$.
Parameters were estimated by the least squares method, which consists of minimizing the sum of squared residuals and gives rise to a system of normal equations. For the linear model, this system has an explicit solution, which makes parameter estimation straightforward. For the non-linear models, the system has no direct solution, so iterative methods are needed to obtain the estimates. Among the various iterative methods described in the literature, the Gauss-Newton algorithm was used, and the initial values for the iterative process were chosen based on an initial exploratory data analysis (SILVEIRA et al., 2018; PAULA et al., 2019; SILVA et al., 2019a; SILVA et al., 2019b; PAULA et al., 2020; SILVA et al., 2020b).
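As an illustration of the iterative estimation described above, the sketch below fits the Logistic model with a hand-written Gauss-Newton loop in plain Python. It is a minimal sketch under stated assumptions, not the R code used in the study: the data are synthetic and noise-free, generated from a Logistic curve with α = 96.93, γ = 37.19 and a hypothetical κ = 0.1; the form α / (1 + exp(κ(γ − x))) is assumed from the parameter definitions; all function names and the step-halving safeguard are additions made for illustration.

```python
import math

def logistic(x, alpha, gamma, kappa):
    """Logistic growth curve (assumed form): alpha / (1 + exp(kappa * (gamma - x)))."""
    return alpha / (1.0 + math.exp(kappa * (gamma - x)))

def jacobian_row(x, alpha, gamma, kappa):
    """Partial derivatives of the curve with respect to (alpha, gamma, kappa)."""
    s = 1.0 / (1.0 + math.exp(kappa * (gamma - x)))
    return [s, -alpha * kappa * s * (1.0 - s), alpha * (x - gamma) * s * (1.0 - s)]

def solve(A, b):
    """Solve the small linear system A d = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    d = [0.0] * n
    for i in reversed(range(n)):
        d[i] = (M[i][n] - sum(M[i][j] * d[j] for j in range(i + 1, n))) / M[i][i]
    return d

def ssr(x, y, theta):
    """Sum of squared residuals for parameter vector theta."""
    return sum((yi - logistic(xi, *theta)) ** 2 for xi, yi in zip(x, y))

def gauss_newton(x, y, theta, max_iter=100, tol=1e-10):
    """Gauss-Newton iterations; the step is halved while it would increase the SSR."""
    theta = list(theta)
    k = len(theta)
    for _ in range(max_iter):
        r = [yi - logistic(xi, *theta) for xi, yi in zip(x, y)]
        J = [jacobian_row(xi, *theta) for xi in x]
        # Normal equations of the linearized problem: (J'J) step = J'r
        JtJ = [[sum(row[a] * row[b] for row in J) for b in range(k)] for a in range(k)]
        Jtr = [sum(row[a] * ri for row, ri in zip(J, r)) for a in range(k)]
        step = solve(JtJ, Jtr)
        factor, current = 1.0, ssr(x, y, theta)
        while factor > 1e-6:
            trial = [t + factor * s for t, s in zip(theta, step)]
            if ssr(x, y, trial) <= current:
                break
            factor /= 2.0
        theta = trial
        if max(abs(factor * s) for s in step) < tol:
            break
    return theta

# Synthetic, noise-free heights at 7, 14, ..., 77 DAE (NOT the experimental data).
days = list(range(7, 78, 7))
heights = [logistic(d, 96.93, 37.19, 0.1) for d in days]

# Starting values from a quick look at the data: the largest height for alpha,
# the day whose height is closest to half of it for gamma, a rough guess for kappa.
alpha0 = max(heights)
gamma0 = min(days, key=lambda d: abs(logistic(d, 96.93, 37.19, 0.1) - alpha0 / 2))
estimate = gauss_newton(days, heights, [alpha0, gamma0, 0.08])
print(estimate)
```

Because the synthetic data contain no noise, the estimates should return to the generating values. With real measurements the residuals are not zero, and poor starting values can prevent convergence, which is why the starting values are taken from an exploratory look at the data.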
According to Silva et al. (2021), after adjusting the models, it is necessary to check the assumptions of normality, independence and homoscedasticity of residuals, which ensures the correct inference about the parameters. Among the various tests described in the literature, the Shapiro-Wilk test for normality, Durbin-Watson for independence and Breusch-Pagan for homoscedasticity were used.
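To make the independence check concrete, the sketch below computes the Durbin-Watson statistic, the sum of squared successive differences of the residuals divided by their sum of squares. It is only an illustration: the residual vectors are hypothetical, the function name is an assumption, and the p-value of the test (as well as the Shapiro-Wilk and Breusch-Pagan tests) would normally come from a statistics package rather than from hand-written code.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals ordered in time."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals, for illustration only (not from the experiment).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips at every step
trending = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]      # neighbors resemble each other

print(durbin_watson(alternating))  # well above 2: negative autocorrelation
print(durbin_watson(trending))     # well below 2: positive autocorrelation
```

The statistic lies between 0 and 4, and values near 2 are compatible with uncorrelated residuals.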
Goodness of fit was compared using the adjusted coefficient of determination (Eq. 4), the corrected Akaike information criterion (Eq. 5) and the residual standard deviation (Eq. 6):

$$R^2_{aj} = 1 - \frac{(1 - R^2)(n - i)}{n - p} \qquad (4)$$

$$AIC_c = n \ln\left(\frac{SSR}{n}\right) + 2p + \frac{2p(p + 1)}{n - p - 1} \qquad (5)$$

$$RSD = \sqrt{MSE} \qquad (6)$$

where $R^2$ is the coefficient of determination; n is the number of observations; p is the number of parameters of the fitted model; i is linked to the fit of the intercept on the curve, equal to 1 if there is an intercept and 0 otherwise; SSR is the sum of squares of the residuals; and MSE is the mean square of the error.
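The three criteria can be computed directly from quantities defined above. The sketch below assumes the usual forms of these criteria, written out in the docstrings, and takes MSE = SSR / (n − p), which the text does not state explicitly. The function names and the numeric inputs are hypothetical and are not results from the study.

```python
import math

def adjusted_r2(r2, n, p, i=1):
    """Adjusted R^2 (assumed form): 1 - (1 - R^2)(n - i) / (n - p)."""
    return 1.0 - (1.0 - r2) * (n - i) / (n - p)

def aicc(ssr, n, p):
    """Corrected AIC (assumed form): n ln(SSR/n) + 2p + 2p(p + 1) / (n - p - 1)."""
    aic = n * math.log(ssr / n) + 2 * p
    return aic + 2 * p * (p + 1) / (n - p - 1)

def rsd(ssr, n, p):
    """Residual standard deviation: sqrt(MSE), with MSE assumed to be SSR / (n - p)."""
    return math.sqrt(ssr / (n - p))

# Hypothetical inputs: 11 observations, a 3-parameter model, R^2 = 0.99, SSR = 50.
n, p = 11, 3
print(adjusted_r2(0.99, n, p))
print(aicc(50.0, n, p))
print(rsd(50.0, n, p))
```

With only 11 observations the small-sample correction term in the corrected criterion is not negligible, which is one reason to prefer it to the uncorrected AIC here.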
The model with the best adherence to the data is the one with the highest value of $R^2_{aj}$ and the lowest values of $AIC_c$ and RSD. All analyses were carried out in the R statistical software (R CORE TEAM, 2021).
Figure 1 shows the sigmoidal pattern of the data relating bean plant height to age, in days after emergence. The third-degree polynomial model and the Logistic and Gompertz models were fitted. These models have represented data with this type of dispersion well, as can be seen in Jane et al. (2019), who used them to describe the growth of pepper plants with satisfactory results.

Results and discussion
After the models were fitted, the residuals were analyzed. This is an important step in modeling because, if the residuals do not meet any one of the assumptions, the model can generate imprecise estimates, which makes it inadequate for representing the dataset (ARCHONTOULIS; MIGUEZ, 2015; FERNANDES et al., 2014). Table 1 lists the results of the Shapiro-Wilk, Durbin-Watson and Breusch-Pagan tests, applied to check the assumptions of residual normality, independence and homoscedasticity, respectively. All tests were non-significant (p-value > 0.01) for the polynomial and Logistic models, which indicates that the residuals are independent and identically distributed, following a normal distribution with zero mean and constant variance. For the Gompertz model, the tests for normality and constant variance were non-significant (p-value > 0.01), indicating that the residuals are normally distributed with constant variance, but the Durbin-Watson test was significant (p-value < 0.01), indicating residual autocorrelation.
Despite the result of the Durbin-Watson test, an autocorrelation parameter was not incorporated into the Gompertz model, because it proved non-significant, with a confidence interval that included zero. Figure 2 also shows visually that the Gompertz model, like the polynomial and Logistic models, met the assumption of residual independence, so it was not necessary to incorporate an autoregressive parameter into the model. Figure 3 shows the graphical analysis of the residuals, which corroborates the results of the Shapiro-Wilk and Breusch-Pagan tests in Table 1, indicating homoscedasticity and normality of the residuals of the polynomial, Logistic and Gompertz models fitted to the height of BRS MG Talismã bean plants.
Table 2 lists the parameter estimates and their respective 95% confidence intervals for the polynomial, Logistic and Gompertz models fitted to BRS MG Talismã bean plant height (cm) as a function of days after emergence. All estimated parameters were significant by the t-test at the 5% significance level. None of the confidence intervals in Table 2 includes zero, which indicates that the parameters are not null and that the estimates are of good quality. The Logistic model had the narrowest intervals, which, according to Muianga et al. (2016), indicates greater precision in the parameter estimates. Based on the estimates of the α parameter, the expected maximum height of the bean plants was 96.93 cm for the Logistic model and 107.64 cm for the Gompertz model, which, according to Vieira et al. (2008), is consistent with the growth of common bean plants, whose maximum height ranges from 55 cm to 140 cm.

According to Mischan and Pinho (2014), the inflection point is a transition point of growth and is very important for analyzing the development of the object under study, because from it onward growth slows down, decreasing in speed and tending to stability. The inflection point of the fitted model can be found from the parameter estimates of the non-linear model. According to Jane et al. (2019), for the Logistic model this point occurs at 50% of the horizontal asymptote α, that is, exactly in the middle of the curve, and for the Gompertz model at approximately 37% of the same asymptote. Therefore, the coordinates of the inflection point (IP) of these models are given by $IP_{Logistic} = (\gamma, \alpha/2)$ and $IP_{Gompertz} = (\gamma, \alpha/e)$. From Table 2, the inflection point of the Logistic model has coordinates IP (37.19; 48.46), indicating that bean plant growth began to decelerate at approximately 37 days after emergence, when the plant reached around 48.46 cm in height. For the Gompertz model, the coordinates were IP (31.65; 39.60), that is, growth began to decelerate at approximately 32 days after emergence, when the plant reached around 39.60 cm.
Table 3 lists the results of the criteria used to assess the goodness of fit. All models fitted the data well; however, the Logistic model had the lowest values of RSD and $AIC_c$ and the highest value of $R^2_{aj}$, indicating the superiority of this model for describing the growth of BRS MG Talismã bean plants. Authors such as Mangueira et al. (2016) and Prado et al. (2013) also obtained better results with the Logistic model when describing the height growth of corn plants and the growth of dwarf coconut fruits, respectively.
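The inflection-point coordinates reported above can be checked from the estimates of α and γ alone. The sketch below assumes the height at the inflection point is α/2 for the Logistic model and α/e (about 37% of α) for the Gompertz model; the function names are hypothetical, and the inputs are the estimates quoted in the text.

```python
import math

def logistic_inflection(alpha, gamma):
    """Inflection point of the Logistic curve: (gamma, alpha / 2)."""
    return gamma, alpha / 2.0

def gompertz_inflection(alpha, gamma):
    """Inflection point of the Gompertz curve: (gamma, alpha / e)."""
    return gamma, alpha / math.e

# Estimates of alpha and gamma as quoted in the text.
ip_logistic = logistic_inflection(96.93, 37.19)
ip_gompertz = gompertz_inflection(107.64, 31.65)
print(ip_logistic)
print(ip_gompertz)
```

Both heights agree with the reported values of 48.46 cm and 39.60 cm to within rounding.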
Figure 4 shows the polynomial, Logistic and Gompertz models fitted to the height (cm) data of BRS MG Talismã bean plants over time (DAE).
Both the visual analysis and the criteria used to check the quality of the fit indicated that the polynomial and Logistic models had similar adherence to the data and were superior to the Gompertz model in describing bean plant height growth over time. The choice of the appropriate model is therefore up to the researcher, who should take into account that non-linear models are more parsimonious and allow broader inference about the object under study. According to Archontoulis and Miguez (2015), one of the main advantages of non-linear models over linear models is the possibility of practical interpretation of their parameters. According to Tholon et al. (2012), this should be considered when choosing the model, because a great deal of information useful to the researcher may be lost depending on this choice.
Information such as the maximum growth of the object under study and the point at which growth slows down, among other information that non-linear models provide, could have been added to studies such as those of Batista et al. (2019), Pineda-Herrera et al. (2019) and Saldaña et al. (2017), who fitted linear models to describe the growth of melon plants, the diameter of tree species and the leaf area of tomato plants, respectively, and would have contributed to their research.

Conclusions
The tested models were adequate for describing the height growth of common bean BRS MG Talismã plants over days after emergence; however, the third-degree polynomial model and the Logistic model achieved similar fits, both superior to that of the Gompertz model. The Logistic model fitted the data best. In general, non-linear models are more parsimonious and provide more information than linear models, mainly due to the practical interpretation of their parameters.