Prediction of Nickel Concentrations in Suburban and Urban Soils Using Mixed Empirical Bayesian Kriging and Support Vector Machine Regression

Soil pollution caused by human activities is a major problem. The spatial distribution of potentially toxic elements (PTEs) varies across most urban and peri-urban areas, which makes it difficult to predict the PTE content of such soils spatially. A total of 115 samples were obtained from Frydek Mistek in the Czech Republic. Calcium (Ca), magnesium (Mg), potassium (K) and nickel (Ni) concentrations were determined using inductively coupled plasma optical emission spectrometry. The response variable was Ni and the predictors were Ca, Mg and K. The correlation matrix between the response variable and the predictor variables shows a satisfactory correlation between the elements. The prediction results showed that Support Vector Machine Regression (SVMR) performed well, although its root mean square error (RMSE) (235.974 mg/kg) and mean absolute error (MAE) (166.946 mg/kg) were higher than those of the other methods applied. Hybrid Empirical Bayesian Kriging-Multiple Linear Regression (EBK-MLR) models performed poorly, with coefficients of determination below 0.1. The Empirical Bayesian Kriging-Support Vector Machine Regression (EBK-SVMR) model was the best model, with low RMSE (95.479 mg/kg) and MAE (77.368 mg/kg) values and a high coefficient of determination (R² = 0.637). The output of the EBK-SVMR modeling technique was visualized using a self-organizing map. Clustered neurons in the component plane of the hybrid Ca_Mg_K-EBK_SVMR model show multiple color patterns that predict Ni concentrations in urban and peri-urban soils. The results demonstrate that combining EBK and SVMR is an effective technique for predicting Ni concentrations in urban and peri-urban soils.
Nickel (Ni) is considered a micronutrient for plants because it contributes to atmospheric nitrogen (N) fixation and urea metabolism, both of which are required for seed germination. In addition to its role in seed germination, Ni can act as a fungal and bacterial inhibitor and promote plant development. A lack of nickel in the soil limits its uptake by plants, resulting in leaf chlorosis. For example, cowpeas and green beans require nickel-based fertilizers to optimize nitrogen fixation2. Continued application of nickel-based fertilizers to enrich the soil and increase the ability of legumes to fix nitrogen steadily raises the nickel concentration in the soil. Although nickel is a micronutrient for plants, excessive amounts in the soil can do more harm than good. Nickel toxicity in soil reduces soil pH and hinders the uptake of iron, an essential nutrient for plant growth1. According to Liu3, Ni has been found to be the 17th essential element for plant development and growth. Beyond its role in plant development and growth, nickel is needed by humans for a variety of applications. Electroplating, the production of nickel-based alloys, and the manufacture of ignition devices and spark plugs in the automotive industry all rely on nickel across various industrial sectors. In addition, nickel-based alloys and electroplated articles have been widely used in kitchenware, bathroom accessories, food industry supplies, electrical wire and cable, jet turbines, surgical implants, textiles and shipbuilding5. Elevated Ni levels in soils (i.e., surface soils) have been attributed to both anthropogenic and natural sources, but Ni derives primarily from natural rather than anthropogenic sources4,6. Natural sources of nickel include volcanic eruptions, vegetation, forest fires and geological processes, whereas anthropogenic sources of nickel accumulation include nickel/cadmium batteries, the steel industry, electroplating, arc welding, diesel and fuel oils, and atmospheric emissions from coal combustion and from waste and sludge incineration7,8. According to Freedman and Hutchinson9 and Manyiwa et al.10, the main sources of topsoil pollution in the immediate and adjacent environment are nickel-copper smelters and mines. The topsoil around the Sudbury nickel-copper refinery in Canada had the highest levels of nickel contamination, at 26,000 mg/kg11. Elsewhere, pollution from nickel production in Russia has resulted in elevated nickel concentrations in Norwegian soils11. According to Alms et al.12, HNO3-extractable nickel in the topsoil of arable land in the region affected by Russian nickel production ranged from 6.25 to 136.88 mg/kg, with a mean of 30.43 mg/kg and a baseline concentration of 25 mg/kg. According to Kabata-Pendias11, the application of phosphorus fertilizers to agricultural soils in urban or peri-urban areas over successive cropping seasons can introduce nickel into, and contaminate, the soil. In humans, nickel may lead to cancer through mutagenesis, chromosomal damage, Z-DNA formation, blocked DNA excision repair or epigenetic processes13. In animal experiments, nickel has been found to cause a variety of tumors, and carcinogenic nickel complexes may exacerbate such tumors.
Soil contamination assessments have flourished recently because of a wide range of health-related issues arising from soil-plant relationships, relationships between soil and soil biota, ecological degradation and environmental impact assessment. To date, spatial prediction of potentially toxic elements (PTEs) such as Ni in soil using traditional methods has been laborious and time-consuming. The advent of digital soil mapping (DSM) and its current success15 have greatly improved predictive soil mapping (PSM). According to Minasny and McBratney16, digital soil mapping has proven to be a prominent subdiscipline of soil science. Lagacherie and McBratney (2006) define DSM as "the creation and population of spatial soil information systems by the use of field and laboratory observational methods, coupled with spatial and non-spatial soil inference systems". McBratney et al.17 outline that contemporary DSM or PSM is the most effective technique for predicting or mapping the spatial distribution of PTEs, soil types and soil properties. Geostatistics and machine learning algorithms (MLA) are DSM modeling techniques that create digitized maps with the help of computers, using both large and minimal datasets.
Deutsch18 and Olea19 define geostatistics as "the collection of numerical techniques that deal with the characterization of spatial attributes, employing primarily random models in a manner similar to the way in which time series analysis characterizes temporal data." Primarily, geostatistics involves the evaluation of variograms, which quantify and describe the spatial dependence of the values in each dataset20. Gumiaux et al.20 further illustrate that variogram evaluation in geostatistics rests on three principles: (a) computing the scale of data correlation, (b) identifying and computing anisotropy in the dataset, and (c) accounting for the inherent error of the measured data, separated from local effects, while also estimating area effects. Building on these concepts, many interpolation techniques are used in geostatistics, including universal kriging, co-kriging, ordinary kriging, empirical Bayesian kriging, simple kriging and other well-known interpolation techniques, to map or predict PTEs, soil characteristics and soil types.
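Variogram evaluation starts from an empirical semivariogram. The following is a minimal sketch, not the estimator of any particular software: it applies the classical (Matheron) estimator, assumes planar coordinates (e.g. km), ignores direction (so anisotropy is not modelled), and uses fixed-width lag bins whose width and number (`lag_width`, `n_lags`) are hypothetical parameters.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Classical (Matheron) estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)),
    where N(h) is the number of point pairs whose separation falls in lag bin h."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(values)), 2):
        h = math.dist(coords[i], coords[j])   # isotropic: direction is ignored
        k = int(h // lag_width)                # index of the lag bin for this pair
        if k < n_lags:
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    # Return (bin centre, semivariance, pair count); empty bins are skipped.
    return [((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]
```

For example, four points spaced 2 km apart along a line with values 1, 2, 3 and 4 give semivariances of 0.5, 2.0 and 4.5 in successive 2.5 km lag bins; the increase with distance is the spatial dependence that a fitted semivariogram model then describes.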
Machine learning algorithms (MLA) are a relatively new technique that handles large, non-linear classes of data. They are driven by algorithms used primarily for data mining and for identifying patterns in data, and they have been applied repeatedly to classification and regression tasks in scientific fields such as soil science. Numerous research papers rely on MLA models to predict PTEs in soils, such as Tan et al.22 (random forests for estimating heavy metals in agricultural soils) and Sakizadeh et al.23 (modelling soil pollution using support vector machines and artificial neural networks). In addition, Vega et al.24 (CART for modeling heavy metal retention and adsorption in soil), Sun et al.25 (application of Cubist to the distribution of Cd in soil) and other studies using algorithms such as k-nearest neighbors, generalized boosted regression and boosted regression trees have applied MLA to predict PTEs in soil.
The application of DSM algorithms in prediction or mapping faces several challenges. Some authors consider MLA superior to geostatistics, while others hold the opposite view. Whichever performs better, combining the two improves the accuracy of mapping or prediction in DSM15. Woodcock and Gopal26, Finke27, Pontius and Cheuk28 and Grunwald29 comment on deficiencies and errors in predicted soil maps. Soil scientists have tried a variety of techniques to optimize the effectiveness, accuracy and predictive power of DSM mapping and forecasting. Uncertainty assessment and validation are among the many aspects integrated into DSM to optimize effectiveness and reduce defects. However, Agyeman et al.15 outline that validation and the uncertainty introduced by map creation and prediction should be assessed independently to improve map quality. The limitations of DSM stem from the spatial variability of soil properties, which carries a component of uncertainty; moreover, uncertainty in DSM may arise from multiple sources of error, namely covariate error, model error, location error and analytical error31. Modelling inaccuracies in MLA and geostatistical processes are associated with an incomplete understanding of the real process, ultimately leading to its oversimplification32. Regardless of the nature of the modeling, inaccuracies can be attributed to modeling parameters, mathematical model predictions or interpolation33. Recently, a new DSM trend has emerged that promotes the integration of geostatistics and MLA in mapping and forecasting. Several soil scientists, such as Sergeev et al.34, Subbotina et al.35, Tarasov et al.36 and Tarasov et al.37, have exploited the accuracy of geostatistics and machine learning to generate hybrid models that improve the efficiency and quality of forecasting and mapping. Some of these hybrid or combined models are Artificial Neural Network Residual Kriging (ANN-RK), Multilayer Perceptron Residual Kriging (MLP-RK), Generalized Regression Neural Network Residual Kriging (GR-NNRK)36, Artificial Neural Network Kriging-Multilayer Perceptron (ANN-K-MLP)37, and co-kriging combined with Gaussian process regression38.
According to Sergeev et al., combining various modeling techniques can eliminate defects and make the resulting hybrid model more efficient than its individual component models. In this context, this paper argues that combining geostatistics and MLA is necessary to create optimal hybrid models for predicting Ni enrichment in urban and peri-urban areas. This study uses Empirical Bayesian Kriging (EBK) as the base model and combines it with Support Vector Machine (SVM) and Multiple Linear Regression (MLR) models. To our knowledge, EBK has not previously been hybridized with any MLA; the hybrid models reported so far combine ordinary, residual or regression kriging with MLA. EBK is a geostatistical interpolation method that models the spatial stochastic process as a stationary or non-stationary random field with localized parameters defined over the field, allowing for spatial variation39. EBK has been used in a variety of studies, including analyzing the distribution of organic carbon in farm soils40, assessing soil pollution41 and mapping soil properties42.
The Self-Organizing Map (SeOM), on the other hand, is a learning algorithm that has been applied in various studies, such as Li et al.43, Wang et al.44, Hossain Bhuiyan et al.45 and Kebonye et al.46, to determine the spatial attributes and groupings of elements. Wang et al.44 outline that SeOM is a powerful learning technique known for its ability to cluster and visualize non-linear problems. Compared with other pattern recognition techniques such as principal component analysis, fuzzy clustering, hierarchical clustering and multi-criteria decision making, SeOM is better at organizing and identifying PTE patterns. According to Wang et al.44, SeOM can spatially group the distribution of related neurons and provide high-resolution data visualization. In this study, SeOM is used to visualize the Ni predictions of the best model so that the results can be interpreted directly.
This paper aims to generate a robust mapping model with optimal accuracy for predicting nickel content in urban and peri-urban soils. We hypothesize that the reliability of the hybrid model depends mainly on the influence of the other models attached to the base model. We acknowledge the challenges facing DSM; while these are being addressed on multiple fronts, combining advances in geostatistics and MLA appears to offer incremental progress. We therefore address the following research questions about the resulting hybrid models: How accurately does a hybrid model predict the target element? And how efficient is it, as judged by validation and accuracy assessment? The specific goals of this study were to (a) create hybrid models combining SVMR or MLR with EBK as the base model, (b) compare the resulting models, (c) propose the best hybrid model for predicting Ni concentrations in urban or peri-urban soils, and (d) apply SeOM to create a high-resolution map of the spatial variation of nickel.
The study was carried out in the Czech Republic, specifically in the Frydek Mistek district of the Moravia-Silesian region (see Figure 1). The terrain of the study area is very rugged, and most of it belongs to the Moravia-Silesian Beskidy region, part of the outer rim of the Carpathian Mountains. The study area is located at 49° 41′ 0″ N, 18° 20′ 0″ E, at altitudes between 225 and 327 m. Under the Köppen classification system, the climate of the region is rated Cfb (temperate oceanic climate), with considerable rainfall even in the dry months. Temperatures typically range between −5 °C and 24 °C over the year, rarely falling below −14 °C or rising above 30 °C, and average annual precipitation is between 685 and 752 mm47. The whole district covers an estimated 1,208 km², of which 39.38% is cultivated land and 49.36% is forest; the area used in this study is about 889.8 km². In and around Ostrava, the steel industry and metalworks are very active. Metal mills, the steel industry, in which nickel is used in stainless steels (e.g., for resistance to atmospheric corrosion) and alloy steels (nickel increases the strength of the alloy while maintaining good ductility and toughness), and intensive agriculture, such as phosphate fertilizer application and livestock production (e.g., nickel supplementation to increase growth rates in lambs and in poorly fed cattle), are potential sources of nickel in the region. Other industrial uses of nickel in the study area include electroplating, both nickel electroplating and electroless nickel plating. The soils are readily distinguished by color, structure and carbonate content. Soil texture is medium to fine and is inherited from the parent materials, which are colluvial, alluvial or aeolian in origin. Some soils appear mottled in the topsoil and subsoil, often with concretions and bleaching. Cambisols and stagnosols are the most common soil types in the region48. Cambisols dominate in the Czech Republic at elevations ranging from 455.1 to 493.5 m49.
Study area map [The study area map was created using ArcGIS Desktop (ESRI, Inc, version 10.7, URL: https://desktop.arcgis.com).]
A total of 115 topsoil samples were obtained from urban and peri-urban soils in the Frydek Mistek district. Samples were collected on a regular 2 × 2 km grid at a depth of 0 to 20 cm, and sampling locations were recorded with a handheld GPS device (Leica Zeno 5 GPS). Samples were packed in Ziploc bags, properly labeled and shipped to the laboratory. There, the samples were air-dried, pulverized with a mechanical disc mill (Fritsch) and sieved through a 2 mm sieve. One gram of each dried, homogenized and sieved soil sample was placed in a clearly labeled Teflon vessel. To each vessel, 7 ml of 35% HCl and 3 ml of 65% HNO3 were added using automatic dispensers (one for each acid); the vessels were loosely covered and the samples were left to react overnight (aqua regia digestion). The vessels were then placed on a hot plate (100 W, 160 °C) for 2 h to complete digestion and allowed to cool. The digest was transferred to a 50 ml volumetric flask, made up to 50 ml with deionized water, and filtered into a 50 ml PVC tube. In addition, 1 ml of this solution was diluted with 9 ml of deionized water and filtered into a 12 ml tube for the determination of pseudo-total PTE concentrations. The concentrations of PTEs (As, Cd, Cr, Cu, Mn, Ni, Pb, Zn, Ca, Mg, K) were determined by ICP-OES (inductively coupled plasma optical emission spectroscopy; Thermo Fisher Scientific, USA) following standard methods and protocols. Quality assurance and quality control (QA/QC) were ensured using the standard reference material SRM NIST 2711a (Montana II Soil). PTEs with more than half of their values below the detection limit were excluded from this study. The detection limit for the PTEs used in this study was 0.0004. In addition, quality assurance and quality control for each analysis were ensured by analyzing reference standards, and duplicate analyses were performed to minimize errors.
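The dilution steps determine how an instrument reading is converted back to a soil concentration. A minimal sketch, assuming the ICP-OES reports solution concentrations in mg/L and that no blank correction is applied (the function name and defaults are hypothetical): 1 g digested and made up to 50 ml gives a factor of 50, and the extra 1 + 9 ml dilution multiplies this by 10.

```python
def pseudo_total_mg_per_kg(reading_mg_per_l, digest_volume_ml=50.0,
                           sample_mass_g=1.0, dilution_factor=10.0):
    """Convert a solution reading (assumed mg/L) to a soil concentration (mg/kg).

    mg/kg = reading (mg/L) * digest volume (L) / sample mass (kg) * extra dilution.
    Defaults follow the procedure described: 1 g made up to 50 ml, then a
    1 ml + 9 ml (10-fold) dilution. Blank correction is not included.
    """
    volume_l = digest_volume_ml / 1000.0
    mass_kg = sample_mass_g / 1000.0
    return reading_mg_per_l * volume_l / mass_kg * dilution_factor
```

With these defaults, a reading of 0.01 mg/L corresponds to about 5 mg/kg in the soil; for a solution measured without the extra dilution, `dilution_factor=1.0` would be passed instead.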
Empirical Bayesian Kriging (EBK) is one of many geostatistical interpolation techniques used for modeling in diverse fields such as soil science. EBK differs from traditional kriging methods by accounting for the error introduced when the semivariogram model is estimated. Rather than a single semivariogram, EBK computes many semivariogram models during interpolation. In this way, the method automates semivariogram fitting and accounts for its uncertainty; in traditional kriging, this fitting is a highly complex part of building an adequate model. The EBK interpolation process follows the three steps proposed by Krivoruchko50: (a) a semivariogram is estimated from the input dataset; (b) using this semivariogram, a new value is generated at each input data location; and (c) a new semivariogram model is estimated from the simulated dataset. Bayes' rule gives the posterior as

\[Prob\left(A\mid B\right)=\frac{Prob\left(B\mid A\right)Prob\left(A\right)}{Prob\left(B\right)}\]

where \(Prob\left(A\right)\) represents the prior, \(Prob\left(B\right)\) is the marginal probability, which is ignored in most cases, and \(Prob\left(B\mid A\right)\) is the likelihood. In the semivariogram calculation, A is the semivariogram and B the observed dataset: Bayes' rule weights each semivariogram by how likely it is that the observed data could have been generated from it.
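To make the Bayesian weighting concrete, here is a simplified sketch. It is not ESRI's EBK implementation, which works on local data subsets and many simulated semivariograms; it only shows how Bayes' rule turns the likelihood of the observed data under each candidate semivariogram into posterior weights. Assumptions: zero-mean Gaussian data, an exponential covariance model, and hypothetical candidate ranges; the normalising sum plays the role of the marginal Prob(B).

```python
import math


def exp_covariance(h, sill, range_):
    """Exponential covariance model C(h) = sill * exp(-h / range_)."""
    return sill * math.exp(-h / range_)


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_likelihood(coords, z, sill, range_, nugget=1e-6):
    """log Prob(data | semivariogram) for zero-mean Gaussian data."""
    n = len(z)
    cov = [[exp_covariance(math.dist(coords[i], coords[j]), sill, range_)
            + (nugget if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(cov)
    y = []  # forward substitution solves L y = z, so z' C^-1 z = y'y
    for i in range(n):
        y.append((z[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in y) + log_det + n * math.log(2.0 * math.pi))


def posterior_weights(log_likelihoods, priors=None):
    """Bayes' rule over candidate semivariograms; the normalising sum is Prob(B)."""
    k = len(log_likelihoods)
    priors = priors or [1.0 / k] * k
    log_post = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    top = max(log_post)                       # subtract the max for numerical stability
    unnormalised = [math.exp(v - top) for v in log_post]
    marginal = sum(unnormalised)
    return [u / marginal for u in unnormalised]


if __name__ == "__main__":
    coords = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]  # hypothetical 2 km grid
    z = [0.5, 0.3, -0.2, -0.6]                                 # hypothetical centred values
    ranges = [1.0, 3.0, 6.0]                                   # candidate semivariogram ranges
    lls = [log_likelihood(coords, z, sill=1.0, range_=r) for r in ranges]
    for r, w in zip(ranges, posterior_weights(lls)):
        print(f"range {r:.0f} km: weight {w:.3f}")
```

Because the marginal only rescales the weights so that they sum to one, it does not change which semivariogram is most probable, which is why it can be ignored when models are merely being ranked.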
A support vector machine (SVM) is a machine learning algorithm that constructs an optimal separating hyperplane to distinguish between classes, including classes that are not linearly separable in the original input space. Vapnik51 originally developed the algorithm for classification, but it has since been used to solve regression problems. According to Li et al.52, SVM is one of the best classification techniques and has been used in various fields. The regression form of SVM (Support Vector Machine Regression, SVMR) was used in this analysis. Cherkassky and Mulier53 pioneered SVMR as a kernel-based regression in which a linear regression model is computed in a multidimensional feature space. John et al.54 report that SVMR fits a linear regression hyperplane in that feature space, which captures nonlinear relationships in the original variables. According to Vohland et al.55, epsilon (ε)-SVMR uses the training dataset to obtain a model in the form of an ε-insensitive function, which is then applied to map the data independently with the best ε deviation learned from the correlated training data. Errors smaller than the preset distance ε from the actual value are ignored, whereas errors larger than ε are penalized. The model also reduces the training data to a subset of support vectors. The equation proposed by Vapnik51 is

\[y=\sum_{k=1}^{N}{\alpha }_{k}K\left(x,{x}_{k}\right)+b\]

where b represents the scalar threshold (bias), \(K\left(x,{x}_{k}\right)\) represents the kernel function, \(\alpha\) represents the Lagrange multipliers, N represents the number of data points, \({x}_{k}\) represents the input data and \(y\) is the output. The kernel used for the SVMR in this study is the Gaussian radial basis function (RBF), \(K\left(x,{x}_{k}\right)=\exp\left(-\gamma {\Vert x-{x}_{k}\Vert }^{2}\right)\). Obtaining the optimal RBF-SVMR model requires tuning the penalty factor C and the kernel parameter gamma (γ) on the PTE training data. The model was first fitted on the training set, and its performance was then tested on the validation set. The tuning parameter used was sigma (kernlab's name for γ; caret's svmRadial method is built on kernlab), and the caret method was svmRadial.
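The prediction equation and the ε-insensitive loss can be written out directly. This is a minimal sketch of evaluating an already trained RBF-SVMR, not of training one (fitting the coefficients requires a quadratic-programming solver such as those in kernlab or e1071). Here each `alpha` is the signed dual coefficient of a support vector (in ε-SVR, the difference of its two Lagrange multipliers); the support vectors, coefficients and γ are hypothetical inputs.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2).
    kernlab's rbfdot (used by caret's svmRadial) calls this parameter sigma."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_vectors, alphas, b, gamma):
    """y(x) = sum_k alpha_k K(x, x_k) + b, summed over the support vectors."""
    return sum(a * rbf_kernel(x, sv, gamma)
               for a, sv in zip(alphas, support_vectors)) + b


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors inside the epsilon tube cost nothing; larger errors grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)
```

Because γ (sigma) multiplies the squared distance, a larger value makes each support vector's influence more local, while C (not shown) controls how strongly errors outside the ε tube are penalized during training.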
A multiple linear regression (MLR) model represents the relationship between a response variable and several predictor variables through linearly combined parameters estimated by the least squares method. In MLR, once the explanatory variables have been selected, the least-squares model serves as a predictive function for the soil property, establishing a linear relationship between the response and the explanatory variables. Here, the PTE (Ni) was used as the response variable. The MLR equation is

\[y=a+\sum_{i=1}^{n}{b}_{i}{x}_{i}+\varepsilon\]

where y is the response variable, \(a\) is the intercept, n is the number of predictors, \({b}_{i}\) are the partial regression coefficients, \({x}_{i}\) represents a predictor or explanatory variable, and \(\varepsilon\) represents the model error, also known as the residual.
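A minimal least-squares sketch of fitting the MLR equation, solving the normal equations by Gauss-Jordan elimination in plain Python. It is adequate for a handful of well-conditioned predictors; R's `lm()` uses a QR decomposition, which is numerically more robust. The data in the usage lines are hypothetical.

```python
def fit_mlr(X, y):
    """Least-squares estimates [a, b_1, ..., b_n] for y = a + sum_i b_i x_i + e.
    Solves the normal equations (A'A) beta = A'y by Gauss-Jordan elimination."""
    A = [[1.0] + list(row) for row in X]             # prepend the intercept column
    p = len(A[0])
    # Augmented matrix [A'A | A'y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:                              # eliminate col from every other row
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict_mlr(beta, x):
    """Evaluate a + sum_i b_i x_i for one row of predictors."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


# Hypothetical data generated exactly from y = 1 + 2 x1 + 3 x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1 + 2 * a + 3 * b for a, b in X]
beta = fit_mlr(X, y)
```

Here `beta` recovers approximately `[1.0, 2.0, 3.0]`, the intercept a followed by the partial regression coefficients.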
Hybrid models were obtained by coupling EBK with SVMR and MLR. This was done by extracting the predicted values from the EBK interpolation. The EBK-predicted values of Ca, K and Mg were combined to obtain new variables, namely CaK, CaMg and KMg, and all three elements were then combined to obtain a fourth variable, CaKMg. Overall, the variables obtained were Ca, K, Mg, CaK, CaMg, KMg and CaKMg. These variables served as covariates (predictors) of Ni concentrations in urban and peri-urban soils. The SVMR algorithm was run on the predictors to obtain the hybrid Empirical Bayesian Kriging-Support Vector Machine Regression model (EBK_SVMR). Similarly, the variables were passed through the MLR algorithm to obtain the hybrid Empirical Bayesian Kriging-Multiple Linear Regression model (EBK_MLR). The better of the two models (EBK_SVMR or EBK_MLR) was then visualized using a self-organizing map. The workflow of this study is shown in Figure 2.
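The text does not state exactly how Ca, K and Mg were "combined". The sketch below assumes, consistent with model labels such as Ca_Mg_K-EBK_SVMR, that each combined variable is a set of EBK-predicted elements used jointly as predictors rather than an arithmetic sum or product. It enumerates the seven predictor sets and builds feature rows for SVMR or MLR; the dictionary layout of `ebk_predictions` is hypothetical.

```python
from itertools import combinations

ELEMENTS = ["Ca", "Mg", "K"]


def predictor_sets(elements=ELEMENTS):
    """All non-empty combinations of the EBK-interpolated elements,
    labelled in the style of the model names (e.g. 'Ca_Mg_K')."""
    return {"_".join(combo): list(combo)
            for r in range(1, len(elements) + 1)
            for combo in combinations(elements, r)}


def design_rows(ebk_predictions, columns):
    """Turn per-site EBK predictions into feature rows.

    ebk_predictions: list of dicts such as {"Ca": ..., "Mg": ..., "K": ...}
    (hypothetical structure holding EBK values extracted at the sample sites)."""
    return [[site[c] for c in columns] for site in ebk_predictions]
```

Each of the seven sets would then be passed to both SVMR and MLR, giving fourteen candidate hybrid models to compare.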
SeOM has become a popular tool for organizing, evaluating and forecasting data in finance, healthcare, industry, statistics, soil science and other fields. SeOM is an artificial neural network trained by unsupervised learning for organization, evaluation and prediction. In this study, SeOM was used to visualize Ni concentrations based on the best model for predicting Ni in urban and peri-urban soils. The data processed in the SeOM evaluation are treated as n-dimensional input vectors43,56. Melssen et al.57 describe how an input vector is connected through a single input layer to the output layer of the network, in which each output neuron carries a single weight vector. The output generated by SeOM is a two-dimensional map of neurons (nodes) arranged in hexagonal, circular or square topologies according to their proximity. After map sizes were compared using the quantization error (QE) and topographic error (TE) metrics, the SeOM with QE = 0.086 and TE = 0.904, a 55-unit map (5 × 11), was selected. The number of neurons was determined with the empirical equation

\[M=5\sqrt{N}\]

where M is the number of map units and N the number of samples; for N = 115, this gives M ≈ 54, close to the 55 units of the selected map.
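A compact, pure-Python sketch of the SOM workflow: sizing the map with the common M = 5√N heuristic, online training with a shrinking Gaussian neighbourhood, and computing QE and TE. It is not the kohonen package's algorithm; for simplicity it assumes a rectangular grid with 8-neighbour adjacency for TE (the map in this study may use a hexagonal topology), and the learning-rate schedule and epoch count are hypothetical.

```python
import math
import random


def som_units(n_samples):
    """Heuristic map size M = 5 * sqrt(N), rounded to the nearest integer."""
    return round(5 * math.sqrt(n_samples))


def best_two_units(weights, x):
    """Grid positions of the best- and second-best-matching units for x."""
    ranked = sorted(weights, key=lambda node: math.dist(weights[node], x))
    return ranked[0], ranked[1]


def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Online SOM on a rectangular grid with a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    # Initialise each unit's weight vector with a copy of a random sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    sigma0, total, step = max(rows, cols) / 2.0, epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):          # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighbourhood radius
            bmu = min(weights, key=lambda node: math.dist(weights[node], x))
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def quantization_error(weights, data):
    """Mean distance between each sample and its best-matching unit."""
    return sum(math.dist(weights[best_two_units(weights, x)[0]], x)
               for x in data) / len(data)


def topographic_error(weights, data):
    """Share of samples whose two best units are not grid neighbours (8-neighbourhood)."""
    def adjacent(a, b):
        return max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1
    return sum(not adjacent(*best_two_units(weights, x)) for x in data) / len(data)
```

For 115 samples, `som_units(115)` returns 54; a 5 × 11 map would be trained with `train_som(data, 5, 11)`, and candidate map sizes compared through their QE and TE values.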
The dataset comprised 115 samples. They were split at random into a training dataset (75%, for calibration) and a test dataset (25%, for validation). The training dataset was used to build the regression models (calibration), and the test dataset was used to verify their generalization ability58. This was done to assess the suitability of the various models for predicting nickel content in soils. All models were evaluated by ten-fold cross-validation repeated five times. The variables produced by EBK interpolation were used as predictors (explanatory variables) of the target variable (PTE). Modeling was performed in RStudio using the R packages kohonen, caret, modelr, e1071, plyr, caTools, prospectr and Metrics.
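The random 75/25 split and the repeated ten-fold cross-validation can be sketched with the standard library alone. The seeds and the interleaved fold assignment are illustrative assumptions; in caret, the same resampling scheme is usually requested with `trainControl(method = "repeatedcv", number = 10, repeats = 5)`.

```python
import random


def train_test_split(n, test_fraction=0.25, seed=42):
    """Randomly assign sample indices to calibration (training) and validation (test) sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def repeated_kfold(indices, k=10, repeats=5, seed=42):
    """Yield (repeat, fold, train_idx, held_out_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for rep in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)                  # a fresh random partition for each repeat
        for fold in range(k):
            held_out = shuffled[fold::k]       # every k-th index forms one fold
            held = set(held_out)
            train = [i for i in shuffled if i not in held]
            yield rep, fold, train, held_out
```

With 115 samples, the split yields 86 training and 29 test samples, and cross-validating on the training set produces 50 (repeat, fold) pairs, each sample being held out exactly once per repeat.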
Various validation parameters were used to determine the model best suited to predicting nickel concentrations in soil and to evaluate model accuracy. The hybrid models were evaluated using the mean absolute error (MAE), the root mean square error (RMSE) and R-squared, the coefficient of determination (R²). R² is the proportion of the variance in the response that is explained by the regression model. RMSE summarizes the magnitude of the prediction errors and hence the predictive power of the model, while MAE gives the average absolute error in the units of the measured variable. When R² is used to select the best hybrid model, higher is better: the closer the value is to 1, the higher the accuracy. According to Li et al.59, an R² of 0.75 or greater indicates a good predictor, values from 0.5 to 0.75 indicate acceptable model performance, and values below 0.5 indicate unacceptable performance. For RMSE and MAE, lower values indicate better performance, and the model with the lowest values was considered the best choice. RMSE and MAE are defined as

\[RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({Y}_{i}-{\widehat{Y}}_{i}\right)}^{2}}\]

\[MAE=\frac{1}{n}\sum_{i=1}^{n}\left|{Y}_{i}-{\widehat{Y}}_{i}\right|\]

where n represents the number of observations, \({Y}_{i}\) represents the measured response and \({\widehat{Y}}_{i}\) represents the predicted response for the i-th observation.
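The validation metrics are short functions. This sketch computes RMSE and MAE and two common definitions of R²: the traditional 1 − SS_res/SS_tot and the squared Pearson correlation, which is what caret reports as Rsquared. Which definition lies behind the reported R² values is not stated in the text; the two can differ sharply when predictions are biased, which matters whenever a model combines a high R² with a large RMSE.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2_traditional(obs, pred):
    """1 - SS_res / SS_tot: penalises systematic bias as well as scatter."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def r2_corr(obs, pred):
    """Squared Pearson correlation between observed and predicted: ignores bias."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy ** 2 / (sxx * syy)
```

For example, predictions `[11, 12, 13]` for observations `[1, 2, 3]` give `r2_corr` = 1.0 but `r2_traditional` = -149.0, with RMSE and MAE both 10.0: a perfect correlation with a large, systematic error.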
Statistical descriptions of the predictor and response variables are presented in Table 1, showing the mean, standard deviation (SD), coefficient of variation (CV), minimum, maximum, kurtosis and skewness. The minimum and maximum values of the elements decrease in the order Mg > Ca > K > Ni and Ca > Mg > K > Ni, respectively. Concentrations of the response variable (Ni) in the study area ranged from 4.86 to 42.39 mg/kg. Comparison with the world average (29 mg/kg) and the European average (37 mg/kg) showed that the overall geometric mean Ni concentration for the study area was within the tolerable range. Nonetheless, compared with agricultural soils in Sweden, as reported by Kabata-Pendias11, the average nickel concentration in the current study is higher. Likewise, the mean Ni concentration in urban and peri-urban soils of Frydek Mistek in the current study (16.15 mg/kg) was higher than the 10.2 mg/kg reported for Ni in Polish urban soils by Różański et al.60. Furthermore, Bretzel and Calderisi61 recorded a very low mean Ni concentration (1.78 mg/kg) in urban soils in Tuscany compared with the current study. Jim62 also found a lower nickel concentration (12.34 mg/kg) in Hong Kong urban soils than in this study. Birke et al.63 reported an average Ni concentration of 17.6 mg/kg in an old mining and urban industrial area in Saxony-Anhalt, Germany, which is 1.45 mg/kg higher than the average Ni concentration in the current study (16.15 mg/kg). The elevated nickel content in some urban and peri-urban soils of the study area may be attributed mainly to the iron and steel industry and the metal industry. This is consistent with the study by Khodadoust et al.64, which found that the steel industry and metalworking are the main sources of nickel contamination in soils. The predictors ranged from 538.70 to 69,161.80 mg/kg for Ca, 497.51 to 3535.68 mg/kg for K, and 685.68 to 5970.05 mg/kg for Mg. Jakovljevic et al.65 investigated the total Mg and K contents of soils in central Serbia and found total concentrations (410 mg/kg and 400 mg/kg, respectively) lower than the Mg and K concentrations of the current study. Similarly, in eastern Poland, Orzechowski and Smolczynski66 assessed the total contents of Ca, Mg and K and reported average topsoil concentrations of Ca (1100 mg/kg), Mg (590 mg/kg) and K (810 mg/kg), each lower than the corresponding concentrations in this study. Compared with the total Ca contents analyzed by Pongrac et al.67 in three different soils in Scotland, UK (Mylnefield, Balruddery and Hartwood soils), the Ca content in this study was higher.
Because the measured concentrations of the sampled elements differ, their distributions exhibit different degrees of skewness. The skewness and kurtosis of the elements ranged from 1.53 to 7.24 and from 2.49 to 54.16, respectively. All elements have skewness and kurtosis above +1, indicating that their distributions are non-normal, right-skewed and peaked. The estimated CVs also show that K, Mg and Ni exhibit moderate variability, while Ca has extremely high variability. The CVs of K, Ni and Mg suggest relatively uniform distributions, whereas the Ca distribution is non-uniform, and external sources may affect its level of enrichment.
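The summary statistics of Table 1 can be computed in a few lines. This sketch uses the sample SD for the CV and moment-based skewness (g1) and excess kurtosis (g2); statistical packages differ in their small-sample adjustments and in whether 3 is subtracted from the kurtosis, so values may not match a given software output exactly.

```python
import statistics


def describe(values):
    """Summary statistics of the kind reported in Table 1 (values must be positive)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)                      # sample SD (n - 1 denominator)
    m2 = sum((v - mean) ** 2 for v in values) / n      # central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
        "geometric_mean": statistics.geometric_mean(values),
        "min": min(values),
        "max": max(values),
        "skewness": m3 / m2 ** 1.5,                    # g1: > 0 means right-skewed
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,         # g2: 0 for a normal distribution
    }
```

A long right tail, such as a few strongly enriched Ca samples, pushes both skewness and kurtosis well above zero and raises the arithmetic mean above the geometric mean.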
The correlations between the predictor variables and the response element were satisfactory (see Figure 3). Ca and K showed a moderate correlation (r = 0.53), and Ca and Ni a similar one (r = 0.52). Although Ca and K are only modestly associated here, researchers such as Kingston et al.68 and Santo69 suggest that their levels in soil are inversely related. Ca and Mg are generally antagonistic to K, yet Ca and K correlate well in this dataset. This may be due to the application of fertilizers such as potassium carbonate, which contains about 56% potassium. Potassium was moderately correlated with magnesium (K-Mg, r = 0.63). In the fertilizer industry, these two elements are closely linked, because potassium magnesium sulfate, potassium magnesium nitrate and potash are applied to soils to correct their deficiencies. Nickel is moderately correlated with Ca, K and Mg, with r values of 0.52, 0.63 and 0.55, respectively. The relationships between calcium, magnesium and PTEs such as nickel are complex; nonetheless, magnesium inhibits calcium absorption, calcium reduces the effects of excess magnesium, and both magnesium and calcium reduce the toxic effects of nickel in soil.
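Pearson's r and the usual test of its significance are straightforward to compute. A minimal sketch; under the null hypothesis of zero correlation, the t statistic follows Student's t distribution with n − 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), compared with Student's t on n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
```

For example, if r = 0.52 were computed on all 115 samples, t ≈ 6.47, well beyond the two-tailed critical value for p < 0.001 at 113 degrees of freedom (roughly 3.4).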
Correlation matrix for the elements, showing the relationship between predictors and response (Note: this figure includes scatterplots between elements; significance levels are based on p < 0.001).
Figure 4 illustrates the spatial distribution of the elements. According to Burgos et al.70, spatial distribution mapping is a technique used to quantify and highlight hotspots in polluted areas. Ca enrichment can be seen in the northwestern part of the spatial distribution map (Fig. 4), which shows moderate to high Ca enrichment hotspots. The calcium enrichment in the northwest is likely due to the use of quicklime (calcium oxide) to reduce soil acidity and to its use in steel mills as a flux in the basic oxygen steelmaking process. Some farmers also use calcium hydroxide on acidic soils to neutralize pH, which likewise increases the calcium content of the soil71. Potassium also shows hotspots in the northwest and east of the map. The northwest is a major agricultural area, and the moderate-to-high pattern of potassium may be due to NPK and potash applications. This is consistent with other studies, such as Madaras and Lipavský72, Madaras et al.73, Pulkrabová et al.74 and Asare et al.75, who observed that soil stabilization and treatment with KCl and NPK resulted in high K content in the soil. Potassium enrichment in the northwest of the spatial distribution map may be due to the use of potassium-based fertilizers such as potassium chloride, potassium sulfate, potassium nitrate and potash to increase the potassium content of poor soils. Zádorová et al.76 and Tlustoš et al.77 outlined that the application of K-based fertilizers increased the K content of the soil and would significantly increase soil nutrient contents in the long run, especially K. Magnesium shows relatively moderate hotspots in the northwest and southeast of the map. Colloidal fixation in soil depletes the concentration of magnesium in the soil, and magnesium deficiency causes plants to exhibit yellowish interveinal chlorosis. Magnesium-based fertilizers, such as potassium magnesium sulfate, magnesium sulfate and kieserite, treat deficiencies (plants appearing purple, red or brown indicate magnesium deficiency) in soils with a normal pH range6. The accumulation of nickel in urban and peri-urban surface soils may be due to anthropogenic activities such as agriculture and to the use of nickel in stainless steel production78.
Spatial distribution of elements. [The spatial distribution map was created using ArcGIS Desktop (ESRI, Inc., version 10.7, URL: https://desktop.arcgis.com).]
The model performance indices for the elements used in this study are shown in Table 2. The RMSE and MAE of Ni are both close to zero (RMSE = 0.86, MAE = -0.08), and the RMSE and MAE values of K are acceptable. RMSE and MAE were greater for calcium and magnesium; these larger Ca and Mg errors likely reflect the different value ranges of the datasets, because both indices are expressed in the units of the data. The RMSE and MAE obtained in this study using EBK to predict Ni were better than those of John et al.54, who used cokriging to predict S concentrations in soil from the same collected data. Our EBK outputs are consistent with those of Fabijaczyk et al.41, Yan et al.79, Beguin et al.80, Adhikary et al.81, and John et al.82, especially for K and Ni.
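Indices such as those in Table 2 summarize how well an interpolated surface reproduces the measured values. Assuming they come from leave-one-out cross-validation (a common way geostatistical software reports kriging accuracy), the loop below sketches how RMSE, MAE, and a signed mean error are obtained. EBK itself is not reproduced: `idw_predict` is an inverse-distance-weighting stand-in chosen only to make the sketch runnable, and the coordinates and values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error; always >= 0."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    """Mean absolute error; always >= 0."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_error(observed, predicted):
    """Signed mean error (bias); unlike MAE, this can be negative."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def idw_predict(coords, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    Stand-in interpolator for illustration only; it is NOT Empirical Bayesian Kriging.
    """
    num = den = 0.0
    for (x, y), v in zip(coords, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # target coincides with a sample location
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def leave_one_out(coords, values, predictor=idw_predict):
    """Predict each sample from all the others (leave-one-out cross-validation)."""
    predictions = []
    for i in range(len(values)):
        train_coords = coords[:i] + coords[i + 1:]
        train_values = values[:i] + values[i + 1:]
        predictions.append(predictor(train_coords, train_values, coords[i]))
    return predictions


# Hypothetical sample locations (m) and Ni concentrations (mg/kg)
coords = [(0, 0), (100, 0), (0, 100), (100, 100), (50, 50), (150, 50)]
ni = [20.0, 24.0, 18.0, 30.0, 22.0, 35.0]
pred = leave_one_out(coords, ni)
print(f"RMSE={rmse(ni, pred):.2f}  MAE={mae(ni, pred):.2f}  ME={mean_error(ni, pred):.2f}")
```

Because RMSE and MAE are averages of squared or absolute errors, both are in the data's units (here mg/kg) and can never be negative; only a signed statistic such as the mean error can take negative values.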
The performance of the individual methods for predicting nickel content in urban and peri-urban soils was evaluated using the model performance indices (Table 3). Model validation and accuracy evaluation confirmed that the Ca_Mg_K predictor combination with the EBK_SVMR model yielded the best performance. For the calibrated Ca_Mg_K-EBK_SVMR model, R2, root mean square error (RMSE), and mean absolute error (MAE) were 0.637, 95.479 mg/kg, and 77.368 mg/kg, respectively; for Ca_Mg_K-SVMR they were 0.663, 235.974 mg/kg, and 166.946 mg/kg. Although good R2 values were also obtained for Ca_Mg_K-SVMR (R2 = 0.663) and Ca_Mg-EBK_SVMR (R2 = 0.643), their RMSE and MAE were higher than those of Ca_Mg_K-EBK_SVMR (R2 = 0.637) (see Table 3). The RMSE and MAE of the Ca_Mg-EBK_SVMR model (RMSE = 1664.64, MAE = 1031.49) are about 17.4 and 13.3 times, respectively, those of Ca_Mg_K-EBK_SVMR. Likewise, the RMSE and MAE of the Ca_Mg_K-SVMR model (RMSE = 235.974, MAE = 166.946) are about 2.5 and 2.2 times, respectively, those of Ca_Mg_K-EBK_SVMR. RMSE indicates how closely the data cluster around the line of best fit, and according to Kebonye et al.46 and John et al.54, the closer RMSE and MAE are to zero, the better the results. Some SVMR and EBK_SVMR combinations yielded comparatively high RMSE and MAE values. RMSE was consistently higher than MAE; since RMSE can never be smaller than MAE, the informative quantity is the size of this gap, and according to Legates and McCabe83, the extent to which RMSE exceeds MAE is an indicator of the presence of outliers. Thus, the more heterogeneous the dataset, the higher the MAE and RMSE values. The cross-validated accuracy of the Ca_Mg_K-EBK_SVMR hybrid model for predicting Ni content in urban and peri-urban soils was 63.70%. According to Li et al.59, this level of accuracy is an acceptable model performance. The present results were compared with a previous study by Tarasov et al.36, whose hybrid MLPRK (Multilayer Perceptron Residual Kriging) model had higher RMSE (210) and MAE (167.5) than the EBK_SVMR model in the current study (RMSE 95.479, MAE 77.368). Moreover, the coefficient of determination of the current study (R2 = 0.637) is higher than that of Tarasov et al.36 (R2 = 0.544), and the errors (RMSE and MAE) of the EBK_SVMR hybrid model are about two times lower. Similarly, Sergeev et al.34 reported R2 = 0.28 for their hybrid model (Multilayer Perceptron Residual Kriging), whereas the Ni model in the current study achieved R2 = 0.637; that is, the prediction accuracy of EBK_SVMR is 63.7%, compared with 28% for Sergeev et al.34. The final map (Fig. 5), created using the EBK_SVMR model with Ca_Mg_K as predictors, shows predicted nickel hot spots and moderate concentrations across the study area, indicating that nickel concentrations are mainly moderate, with higher concentrations in some specific areas.
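Because RMSE squares each error before averaging, RMSE is always at least as large as MAE, and the gap widens as the error concentrates in a few large residuals, which is the Legates and McCabe indicator cited above. The sketch below illustrates this with hypothetical error values, then recomputes the error ratios between models from the RMSE and MAE values reported in the text.

```python
import math


def rmse_from_errors(errors):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae_from_errors(errors):
    """Mean absolute value of a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


# Two hypothetical error sets with the same MAE: in the second, a single large
# error (an outlier) carries all of the error, and RMSE doubles.
uniform_errors = [2.0, -2.0, 2.0, -2.0]
outlier_errors = [0.0, 0.0, 0.0, 8.0]
for label, errs in (("uniform", uniform_errors), ("one outlier", outlier_errors)):
    print(f"{label}: MAE={mae_from_errors(errs):.1f}, RMSE={rmse_from_errors(errs):.1f}")
# uniform: MAE=2.0, RMSE=2.0
# one outlier: MAE=2.0, RMSE=4.0


def error_ratios(reference, others):
    """Ratio of each model's (RMSE, MAE) to a reference model's (RMSE, MAE)."""
    ref_rmse, ref_mae = reference
    return {name: (r / ref_rmse, m / ref_mae) for name, (r, m) in others.items()}


# (RMSE, MAE) as reported in the text
best = (95.479, 77.368)  # Ca_Mg_K-EBK_SVMR
reported = {
    "Ca_Mg-EBK_SVMR": (1664.64, 1031.49),
    "Ca_Mg_K-SVMR": (235.974, 166.946),
    "MLPRK (Tarasov et al.)": (210.0, 167.5),
}
for name, (r_ratio, m_ratio) in error_ratios(best, reported).items():
    print(f"{name}: RMSE x{r_ratio:.1f}, MAE x{m_ratio:.1f}")
# Ca_Mg-EBK_SVMR: RMSE x17.4, MAE x13.3
# Ca_Mg_K-SVMR: RMSE x2.5, MAE x2.2
# MLPRK (Tarasov et al.): RMSE x2.2, MAE x2.2
```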
Final prediction map produced by the hybrid EBK_SVMR model using Ca_Mg_K as the predictors. [The spatial distribution map was created using RStudio (version 1.4.1717: https://www.rstudio.com/).]
Figure 6 presents the PTE concentrations as component planes consisting of individual neurons. None of the component planes exhibited the same color pattern. The appropriate number of neurons per map was 55. The SeOM encodes values as colors, and the more similar the color patterns, the more comparable the properties of the samples. On their respective color scales, the individual elements (Ca, K, and Mg) showed similar color patterns, with single high-value neurons and mostly low-value neurons. CaK and CaMg share some similarities, with very high-value neurons and low-to-moderate color patterns. Both models predict the concentration of Ni in soil by displaying medium to high hues such as red, orange, and yellow. The KMg model displays many high-value color patterns together with low-to-medium color patches. On the color scale from low to high, the component-plane distribution pattern of the models showed high values indicating the potential concentration of nickel in the soil (see Figure 4). The CaKMg model component plane shows a diverse color pattern from low to high on its color scale. Furthermore, the nickel content predicted by the CaKMg model is similar to the spatial distribution of nickel shown in Figure 5; both show high, medium, and low nickel concentrations in urban and peri-urban soils. Figure 7 depicts the k-means clustering of the map into three clusters based on the predicted values of each model, with the optimal number of clusters determined by the silhouette method. Of the 115 soil samples collected, cluster 1 contained the most samples (74), cluster 2 contained 33, and cluster 3 contained 8. The seven predictor combinations shown in the component planes were simplified to allow correct cluster interpretation. Because numerous anthropogenic and natural processes affect soil formation, it is difficult to obtain clearly differentiated cluster patterns in a SeOM map78.
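The component planes described above come from a self-organizing map. As a minimal pure-Python sketch of the general technique (not the authors' SeOM implementation in R; the linear learning-rate schedule, Gaussian neighbourhood, 5 × 11 grid giving 55 neurons, and the data are all assumptions), each neuron holds a weight vector, and a component plane is simply one coordinate of those vectors laid out on the grid.

```python
import math
import random


def standardize(rows):
    """Z-score each column so that no variable dominates the distance metric."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rectangular self-organizing map; returns {(r, c): weight vector}."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2
    # Initialise every neuron with a randomly chosen (standardized) sample
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the samples
            frac = step / total_steps
            lr = lr0 * (1 - frac)  # learning rate decays linearly
            sigma = max(sigma0 * (1 - frac), 0.5)  # neighbourhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for i, xi in enumerate(x):
                    w[i] += lr * h * (xi - w[i])  # pull the neuron toward the sample
            step += 1
    return weights


def component_plane(weights, rows, cols, k):
    """Grid of the k-th weight component, i.e. the component plane of variable k."""
    return [[weights[(r, c)][k] for c in range(cols)] for r in range(rows)]


# Hypothetical rows of [Ca, K, Mg, predicted Ni] (mg/kg), for illustration only
raw = [[1200, 310, 260, 18], [950, 280, 240, 15], [1800, 420, 330, 27],
       [700, 250, 200, 12], [1500, 390, 300, 22], [1100, 300, 250, 17]]
data = standardize(raw)
som = train_som(data, rows=5, cols=11, epochs=50)  # 5 x 11 = 55 neurons (assumed layout)
for row in component_plane(som, 5, 11, k=3):  # component plane of predicted Ni
    print(" ".join(f"{v:+.1f}" for v in row))
```

Coloring each grid of values on a low-to-high scale gives one component plane per variable; similar color patterns across planes indicate variables that co-vary over the same neurons.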
Component planes for each Empirical Bayesian Kriging Support Vector Machine (EBK_SVM_SeOM) variable. [SeOM maps were created using RStudio (version 1.4.1717: https://www.rstudio.com/).]
Different cluster classification components [SeOM maps were created using RStudio (version 1.4.1717: https://www.rstudio.com/).]
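The three-cluster grouping in Figure 7 was selected with the silhouette method. A minimal pure-Python sketch of that selection follows: Lloyd's k-means with a deterministic farthest-first seeding (an assumption made here for reproducibility, not necessarily the study's choice) and the mean silhouette as the criterion. It clusters hypothetical one-dimensional predicted values directly; the study clustered the SeOM map, which this sketch does not reproduce.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def silhouette_score(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b); singleton clusters contribute 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates):
    """Number of clusters with the highest mean silhouette."""
    return max(candidates, key=lambda k: silhouette_score(points, kmeans(points, k)[0]))


# Hypothetical predicted Ni values (mg/kg), one per sample, as 1-D points
predicted = [[12.0], [13.5], [11.8], [25.0], [26.2], [24.1], [60.0], [58.5]]
k = best_k(predicted, range(2, 5))
labels, centres = kmeans(predicted, k)
print(k, labels)
```

The silhouette compares each point's mean distance to its own cluster (a) with that to the nearest other cluster (b), so values near 1 indicate compact, well-separated clusters; the candidate k with the highest mean is retained.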
This study compared modeling techniques for predicting nickel concentrations in urban and peri-urban soils, testing different combinations of elements and modeling techniques to identify the best way to predict soil Ni. The SeOM component planes of the modeling technique exhibited color patterns from low to high on the color scale, indicating Ni concentrations in the soil, and the spatial distribution map produced by EBK_SVMR is consistent with these component-plane patterns (see Figure 5). The results show that the support vector machine regression model (Ca_Mg_K-SVMR) can predict the concentration of Ni in soil as a single model, but its validation and accuracy parameters show very high errors in terms of RMSE and MAE. The EBK_MLR models are also unsatisfactory because of their low coefficients of determination (R2). Good results were obtained using EBK_SVMR with the combined elements (CaKMg), which gave low RMSE and MAE and an accuracy of 63.7%. Combining the EBK algorithm with a machine learning algorithm can therefore produce a hybrid algorithm capable of predicting the concentration of PTEs in soil. The results also show that using Ca, Mg, and K as predictors can improve the prediction of Ni in the soils of the study area. This suggests that the continuous application of nickel-based fertilizers and industrial pollution of the soil by the steel industry tend to increase the concentration of nickel in the soil. This study revealed that the EBK model can reduce error levels and improve the accuracy of soil spatial distribution models in urban or peri-urban soils. In general, we propose applying the EBK-SVMR model to assess and predict PTEs in soil, and using EBK in hybrids with various machine learning algorithms. Ni concentrations were predicted using only elements as covariates; using more covariates could greatly improve model performance, which is a limitation of the current work. Another limitation of this study is that the dataset contains only 115 samples; with more data, the performance of the proposed optimized hybrid method could be improved.
PlantProbs.net. Nickel in Plants and Soil. https://plantprobs.net/plant/nutrientImbalances/sodium.html (Accessed 28 April 2021).
Kasprzak, K. S. Nickel. Adv. Mod. Environ. Toxicol. 11, 145–183 (1987).
Cempel, M. & Nikel, G. Nickel: a review of its sources and environmental toxicology. Pol. J. Environ. Stud. 15, 375–382 (2006).
Freedman, B. & Hutchinson, T. C. Pollutant inputs from the atmosphere and accumulations in soils and vegetation near a nickel–copper smelter at Sudbury, Ontario, Canada. Can. J. Bot. 58(1), 108–132 (1980). https://doi.org/10.1139/b80-014
Manyiwa, T. et al. Heavy metals in soil, plants and risks associated with grazing ruminants near the Selebi-Phikwe copper–nickel mine in Botswana. Environ. Geochem. Health (2021). https://doi.org/10.1007/s10653-021-00918-x
Kabata-Pendias, A. Trace Elements in Soils and Plants, 4th edn (CRC Press, New York, 2011).
Almås, A. & Singh, B. Effects of the Russian nickel industry on heavy metal concentrations in agricultural soils and grasses in Soer-Varanger, Norway. Norw. J. Agric. Sci. (1995). agris.fao.org.
Nielsen, G. D. et al. Absorption and retention of nickel from drinking water in relation to food intake and nickel sensitivity. Toxicol. Appl. Pharmacol. 154, 67–75 (1999).
Costa, M. & Klein, C. B. Nickel carcinogenesis, mutation, epigenetics, or selection. Environ. Health Perspect. 107, 2 (1999).
Agyeman, P. C. et al. Trend analysis of potentially toxic elements: a bibliometric review. Environ. Geochem. Health (2020). https://doi.org/10.1007/s10653-020-00742-9
Minasny, B. & McBratney, A. B. Digital soil mapping: a brief history and some lessons. Geoderma 264, 301–311 (2016). https://doi.org/10.1016/j.geoderma.2015.07.017
McBratney, A. B., Mendonça Santos, M. L. & Minasny, B. On digital soil mapping. Geoderma 117(1–2), 3–52 (2003). https://doi.org/10.1016/S0016-7061(03)00223-4
Deutsch, C. V. Geostatistical Reservoir Modeling (Oxford University Press, 2002).