The aim of this study was to evaluate the viability of near infrared spectroscopy (NIRS) for detecting the anomaly known as yellow stain in cork granulate. Detecting this anomaly is crucial to the cork granulate stopper industry, since it is associated with the presence of 2,4,6-trichloroanisole (TCA), the compound identified as the main agent responsible for cork off-flavours. Samples for the NIRS spectra were prepared by mixing cork granulate of high visual quality with cork granulate with yellow stain in different proportions, obtaining 120 samples with 8 different percentages of yellow stain (0, 5, 10, 15, 25, 35, 50 and 100%). Two spectra per sample were collected using a Bruker MPA spectrophotometer and the partial least squares (PLS) method was used to obtain numerous equations. The best equation was obtained using standard normal variate (SNV) spectral preprocessing and only one part of the near infrared spectral range: 9400-4250 cm^{-1}. This equation shows a coefficient of determination (R²) of 99.42%, a root mean square error of cross-validation (RMSECV) of 2.34%, and a residual prediction deviation (RPD) of 13.10. The critical level and the limit of detection are 3.8% and 7.6%, respectively. The calculated receiver operating characteristic (ROC) curves show great discrimination capacity and the area under the ROC curve (AUC) is higher than 0.93 in every case. This study demonstrates that NIRS provides a viable technique for detecting yellow stain in cork granulate.
Cork is obtained from the cork oak tree (
One of the main problems for the cork industry is the presence of 2,4,6-trichloroanisole (TCA
The cork defect known as “yellow stain” was identified as far back as 1900 (
Hence, the industry would greatly benefit from the development of a method to control the presence of yellow stain in cork granulate. If cork granulate affected by yellow stain, and therefore by TCA, is not removed from the production line, the cork granulate stoppers produced from this defective material will have a mould-like taste and will not be suitable for wine bottle closure. Near infrared spectroscopy (NIRS) is potentially suitable for detecting yellow stain in cork granulate, since it is widely used in quality control of granulated products in the food and agriculture industry (
The first application of NIRS technology to cork was a viability study to assess its potential for characterizing cork planks according to visual quality, porosity and moisture content, and for predicting the geographical origin of cork planks (
The aim of this study is to develop calibration equations to predict the percentage of yellow stain in samples of cork granulate and thereby evaluate the viability of NIRS as a method to detect cork with yellow stain on the production line. Numerous spectral preprocessing methods and spectral ranges were studied to determine the most suitable combination. Lastly, the critical level, the limit of detection and the receiver operating characteristic (ROC) curves of the equations were also studied.
Cork pieces used in this study were collected in a sampling carried out in Catalonia (Spain) in 1991 and form part of the INIA-CIFOR cork laboratory collection. Two groups were selected: pieces classified as being of the highest visual quality (HQ
Five strips of 0.5 cm thickness were cut from the cross section of every piece in both groups and the corkback (phloemic tissue remaining on the outer side of the cork) was removed. In the case of YS pieces, areas presenting yellow stain were separated, so that areas of “pure cork” and “stained cork” were ground and sieved (0.5-1 mm) separately in order to obtain two types of cork granulate: one comprising 100% highest visual quality cork with no defects, and the other 100% yellow stained cork. Both types of cork granulate were dried at 103 °C to constant weight and later conditioned in a container at constant temperature. When the samples were scanned, average moisture content was 4.5%.
Samples for the NIRS spectra were prepared by mixing both types of cork granulate in different proportions, obtaining samples with different percentages of yellow stain (YSP). These percentages were established such that the range was as large as possible while at the same time having a greater incidence of lower values: 0, 5, 10, 15, 25, 35, 50 and 100%. The amount of granulate per sample was fixed at 2.5 grams and the number of samples per percentage of yellow stain was 15, making a total of 120 samples.
Samples were scanned using a Bruker MPA^{®} spectrophotometer (Bruker Analytical Systems, Billerica, MA, USA) that measures diffuse reflectance. Spectra were collected every 16 cm^{-1} from 12,500 cm^{-1} to 3,600 cm^{-1} using OPUS software.
Each sample was weighed prior to analysis on a balance with a precision of 0.1 mg. Two spectra per sample were obtained, making a total of 240 spectra. These spectra were stored as log (1/R) and were used to determine the percentage of yellow stain. The integrating sphere with a rotating system was used as the measuring channel and the scanned area was a circular crown of 35.34 cm^{2}.
Spectra were collected and quantitative analysis was performed using OPUS software ver. 7.5. Prior to calibration, the two spectra taken for each sample were averaged, and calibration was performed with the 120 average spectra. The partial least squares (PLS) method was used to obtain the equations and the maximum number of PLS vectors was set at 10. Numerous equations were obtained using an algorithm from the OPUS software, which allows around 200 combinations of different preprocessing methods to be tested along with various preset spectral ranges. The distance of each spectrum from the center of the space defined by the entire population, known as the Mahalanobis distance, was calculated. If the Mahalanobis distance is greater than 3 for a given sample, the software classifies it as a spectral outlier. OPUS software also flags chemical anomalies, defined as samples with significant differences between their actual and predicted values.
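The duplicate averaging described above, followed by standard normal variate (SNV) scaling (the preprocessing reported for the best equation), can be sketched as follows. This is a minimal illustration and not the OPUS implementation: the function names and the four-point spectra are hypothetical, and dividing by the sample standard deviation (n - 1) is an assumption about how the scaling is defined.

```python
from statistics import mean, stdev


def average_spectra(first, second):
    """Average two replicate spectra of one sample point by point."""
    if len(first) != len(second):
        raise ValueError("replicate spectra must have the same length")
    return [(a + b) / 2 for a, b in zip(first, second)]


def snv(spectrum):
    """Standard normal variate: centre a spectrum on its own mean and
    scale it by its own sample standard deviation (assumed n - 1)."""
    centre = mean(spectrum)
    spread = stdev(spectrum)
    if spread == 0:
        raise ValueError("a flat spectrum cannot be SNV-scaled")
    return [(value - centre) / spread for value in spectrum]


# Hypothetical log(1/R) readings for two scans of the same sample.
scan_a = [0.42, 0.47, 0.55, 0.49]
scan_b = [0.44, 0.49, 0.57, 0.51]

averaged = average_spectra(scan_a, scan_b)
print(snv(averaged))
```

Because each spectrum is scaled by its own statistics, SNV removes additive offsets and multiplicative scaling between scans, which is why it is commonly applied to diffuse reflectance spectra of granular material.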
Equations were validated by means of cross-validation in order to include all of the spectral variability of the data set. The best equations were selected taking into account the lowest value of the root mean square error of cross-validation (RMSECV) and the highest value of the coefficient of determination (R^{2}). Another statistic used to evaluate the calibration was the residual prediction deviation (RPD). It is calculated as the ratio of two standard deviations: the standard deviation of the reference data as measured using the conventional method and the standard error of cross-validation (
In order to evaluate the discrimination capacity of the different equations, that is, the ability to differentiate between presence and absence of yellow stain rather than to determine the percentage present, we used detection probability (also termed true positive rate or sensitivity) and false positive probability (1 - specificity) by means of the critical level (
In our case, critical level (
where
ROC curves represent sensitivity
All spectra show the same absorption bands though the intensity differs for each of them. The spectrum of 100% yellow stain shows the highest absorption, spectra of 0, 5, 10, 15, 25 and 35% show the lowest absorption and the spectrum of 50% shows an intermediate absorption. Therefore, the absorption decreases as the percentage of yellow stain declines.
The number of PLS vectors, or rank, was 8 for Equations 1 and 3 and 6 for Equation 2.
The RMSECV obtained in the three equations was similar, ranging from 3.28% (Equation 1) to 2.34% (Equation 3), which indicates that the equations predict the percentage of yellow stain with a high level of precision. The R^{2} was greater than 98% for Equations 1 and 2 and greater than 99% for Equation 3, so the equations explain more than 98% and 99% of the data variability, respectively. The RPD values were above 8 (between 9.35 and 9.68) for Equations 1 and 2, indicating that the equations are satisfactory for quality assurance (QA). In Equation 3, the RPD value was above 13 (13.1), indicating that the equation is like the reference (
To calculate the critical level and the limit of detection it is necessary to estimate the standard deviation of the predicted values for the 15 samples with 0% yellow stain for each of the equations (
The best result for critical level and limit of detection was also obtained with Equation 3. The critical level was 3.8%. Therefore, if the predicted value of yellow stain is higher than 3.8%, it can be stated at a 95% confidence level that the sample is not a white sample and that yellow stain is present. There is a 5% probability of committing a type I or false positive error (analysis of a white sample giving a positive for yellow stain).
The limit of detection is 7.6%, meaning that this is the minimum percentage of yellow stain for which we are able to state at a 95% confidence level that the sample is not white. As described in the previous section, there is a 5% probability of making a mistake, but in this case it would be a false negative error (analysis of a sample with yellow stain giving a negative for yellow stain).
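The relationship between the standard deviation of the white (0%) samples, the critical level and the limit of detection can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' formula: it assumes a one-sided Student t value for 14 degrees of freedom (15 white samples, α = 0.05, t ≈ 1.761), a limit of detection equal to twice the critical level (equal type I and type II error rates), and that the first numeric column of the critical level table holds the standard deviations (4.2, 3.7 and 2.2). Under those assumptions the tabulated values are reproduced to within rounding of the standard deviations.

```python
# One-sided Student t for alpha = 0.05 and 14 degrees of freedom.
# Hard-coded assumption: 15 white samples give 15 - 1 = 14 degrees of freedom.
T_ONE_SIDED_95_DF14 = 1.761


def critical_level(sd_white, t_value=T_ONE_SIDED_95_DF14):
    """Predicted percentage above which a sample is declared not white."""
    return t_value * sd_white


def limit_of_detection(sd_white, t_value=T_ONE_SIDED_95_DF14):
    """Smallest true percentage detected with the same 5% error rate,
    assuming the same spread at the detection limit as for white samples."""
    return 2 * critical_level(sd_white, t_value)


# Standard deviations of the predictions for the 0% samples (assumed column).
sd_by_equation = {"Equation 1": 4.2, "Equation 2": 3.7, "Equation 3": 2.2}

for name, sd in sd_by_equation.items():
    print(name, round(critical_level(sd), 1), round(limit_of_detection(sd), 1))
```

The calculation makes explicit that both thresholds scale linearly with the scatter of the white-sample predictions, so any reduction in that scatter lowers the critical level and the limit of detection proportionally.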
In addition to the graphical interpretation of the ROC curves, the area under the ROC curve (AUC) has also been calculated. The AUC values are between 0.9378 and 0.9911. This statistic confirms that the discrimination capacity of Equation 3 is very high, since values above 0.9 are considered excellent (
The results obtained demonstrate that NIRS technology is able to detect the presence of yellow stain, and consequently an application could be developed to detect this anomaly continuously on the production lines of the cork granulate industry. However, it must be taken into account that the models were obtained using just one geographical origin (Catalonia, Spain) and one granulometry (0.5-1 mm). Therefore, in order to generalize the model on a broader scale, different geographical origins and granulometries should be employed.
This study shows that near infrared spectroscopy (NIRS) is a viable technique for detecting yellow stain in cork granulate. The NIRS equations obtained have a coefficient of determination (R^{2}) between 98.86% and 99.42%, a root mean square error of cross-validation (RMSECV) ranging from 3.28% to 2.34% and a residual prediction deviation (RPD) above 9. The best result is achieved using standard normal variate (SNV) as spectral preprocessing and entering data exclusively from the near infrared region lying between 9400 cm^{-1} and 4250 cm^{-1}. This region coincides with the principal absorption peaks of cork granulate. The critical level (
The receiver operating characteristic (ROC) curves show a high discrimination capacity. This capacity is maintained even when samples with a high content of yellow stain are progressively removed. The ROC curve calculated using only 0 and 5% yellow stain samples still shows a high discrimination capacity. Bearing in mind that most of the cork contaminated with yellow stain is removed in the post-harvest preprocessing, it is important that the equation maintains its discrimination capacity even at the lower percentages, given that the presence of this anomaly is always low in cork used for the production of wine stoppers.
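The discrimination capacity summarized by the area under the ROC curve can be computed directly from predicted percentages. The sketch below uses the rank-based (Mann-Whitney) estimator, which equals the trapezoidal area under the empirical ROC curve; the study does not state which procedure was used, so this choice is an assumption, and the predicted values are invented for illustration only.

```python
def roc_auc(white_predictions, stained_predictions):
    """Probability that a stained sample receives a higher predicted value
    than a white sample, counting ties as one half (Mann-Whitney estimator)."""
    if not white_predictions or not stained_predictions:
        raise ValueError("both groups need at least one prediction")
    score = 0.0
    for stained in stained_predictions:
        for white in white_predictions:
            if stained > white:
                score += 1.0
            elif stained == white:
                score += 0.5
    return score / (len(white_predictions) * len(stained_predictions))


def detection_rates(white_predictions, stained_predictions, threshold):
    """Sensitivity and false positive rate for one decision threshold."""
    hits = sum(value > threshold for value in stained_predictions)
    false_alarms = sum(value > threshold for value in white_predictions)
    return hits / len(stained_predictions), false_alarms / len(white_predictions)


# Hypothetical predicted yellow stain percentages, for illustration only.
white = [-1.2, 0.4, 1.9, -0.6, 2.5]
stained_5_percent = [3.1, 6.0, 4.4, 2.2, 7.3]

print(roc_auc(white, stained_5_percent))
print(detection_rates(white, stained_5_percent, threshold=3.8))
```

Repeating the calculation on progressively narrower subsets (for example, only the 0% and 5% samples) mirrors the way the discrimination capacity is examined at low percentages of yellow stain.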
The results suggest that NIRS technology may provide a useful method for detecting low concentrations of yellow stain in cork granulate. Batches in which the anomaly has been detected could then be removed at the start of the production line, thus assuring the production of yellow-stain-free cork stoppers. However, further research must be undertaken focusing particularly on the lower percentages of yellow stain in order to improve the accuracy of this technique.
The authors would like to thank the INIA-CIFOR Cork Laboratory assistants María Luisa Cáceres and Lorenzo Ortiz Buiza for all their work in the laboratory.
(a) Piece of cork classified as highest visual quality (HQ); (b) piece of cork with yellow stain clearly present (YS).
(a) Example plots of the probability distributions of total samples with disease (right) and number of samples without disease (left). (
Mean spectra for each percentage of yellow stain (0, 5, 10, 15, 25, 35, 50 and 100%) in the zone with the absorption peaks.
Root mean square error of cross-validation (RMSECV) versus number of PLS vectors (Rank) for Equation 3.
Predicted values versus actual values for Equation 3. The solid line represents the regression line. The dark-shaded region shows the 95% confidence interval and the dashed lines are the upper and lower 95% prediction intervals of the regression.
Receiver operating characteristic (ROC) curves for Equation 3. From left to right and from top to bottom: ROC curve with all samples; ROC curve with samples of 0-50%; ROC curve with samples of 0-35%; ROC curve with samples of 0-25%; ROC curve with samples of 0-15%; ROC curve with samples of 0-10%; and ROC curve with samples of 0-5%. The area under the curve (AUC) is shown for each of the calculated ROC curves.
Statistics obtained for the three best NIRS calibration equations. Number of PLS vectors (Rank), root mean square error of cross validation (RMSECV), coefficient of determination (R²), residual prediction deviation (RPD) and systematic error (BIAS).
| Equation | Rank | RMSECV (%) | R^{2} (%) | RPD | BIAS |
|---|---|---|---|---|---|
| Equation 1 | 8 | 3.28 | 98.86 | 9.35 | 0.0356 |
| Equation 2 | 6 | 3.16 | 98.93 | 9.68 | 0.0377 |
| Equation 3 | 8 | 2.34 | 99.42 | 13.10 | 0.0429 |
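As a consistency check on the statistics above, the RPD can be recomputed from the sample design, since it is the ratio of the standard deviation of the reference values to the cross-validation error. The sketch below assumes the sample standard deviation (n - 1) of the 120 nominal percentages and uses the tabulated RMSECV directly; the software may apply slightly different corrections, so agreement is expected only to within about 0.5%.

```python
from statistics import stdev

# Nominal design: 8 percentages of yellow stain, 15 samples each (120 in total).
LEVELS = [0, 5, 10, 15, 25, 35, 50, 100]
SAMPLES_PER_LEVEL = 15
reference = [level for level in LEVELS for _ in range(SAMPLES_PER_LEVEL)]

# RMSECV values (%) as tabulated for the three best equations.
rmsecv = {"Equation 1": 3.28, "Equation 2": 3.16, "Equation 3": 2.34}

sd_reference = stdev(reference)  # assumed: sample standard deviation
rpd = {name: sd_reference / error for name, error in rmsecv.items()}

print(round(sd_reference, 2))
for name, value in rpd.items():
    print(name, round(value, 2))
```

The recomputed ratios land close to the tabulated RPD values of 9.35, 9.68 and 13.10, which confirms that the wide 0 to 100% design (standard deviation of roughly 31 percentage points) is what drives the high RPD figures.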
Critical level (
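| Equation | Standard deviation (%) | Critical level (%) | Limit of detection (%) |
|---|---|---|---|
| Equation 1 | 4.2 | 7.4 | 14.8 |
| Equation 2 | 3.7 | 6.5 | 13.0 |
| Equation 3 | 2.2 | 3.8 | 7.6 |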