22 jun. 2017

How to check the cooling liquid - NIRS™ DS2500

Filter Replacement - NIRS™ DS2500

Lamp Replacement - NIRS™ DS2500

Instrument Calibration - NIRS™ DS2500

Checking Temperatures in DS2500 (Lamp)

For the DS2500 to perform at its best, we have to pay attention to the lamp temperature when running the diagnostics; around 35 ºC is fine.
Sometimes we find high temperatures like the one in the picture, and even though the report says it is OK, such a temperature can affect both the instrument itself and the results.
One cause of this temperature increase is that the pump tank has lost water, so it is a good idea to check the level and top it up if necessary.

Checking pump level video
Check that the pump is pumping: we should see some turbulence in the water and hear a faint noise from the pump.
Check whether the water is dirty or contains algae.
Check that the fan is working (its job is to keep the water cool) and that its filter is clean so the fan can do that job well.

Changing the filter

The temperature of the room or laboratory where the instrument sits also matters: a higher ambient temperature will raise the lamp temperature as well.

After checking all these points, and being sure that the lamp is fine, it may be the moment to run an instrument calibration:

Instrument Calibration


19 jun. 2017

Comparing Residuals, GH and T when validating

When looking at the validation statistics it is important to look at three values at the same time: the Residual, GH and T value for every sample. From this data (fiber), we can check whether a sample is extrapolating badly, is not robust, or has other issues.

In this case, as we can see, there are samples with a very high GH, and those samples have in common that the T statistic is negative (in the left tail of the Gaussian bell) and quite large in absolute value.
These samples also have the highest residual values.
Something is telling us that these samples are special and are not well represented by the equation. The PCA is working fine and detects these samples as outliers, but we need to know what makes them special.

These samples are soy meal and have the highest fat values of any samples in the calibration, so the model did not learn enough about the interaction between the fiber bands and the fat bands. These samples are therefore very interesting for making the calibration more robust.

After checking this, we can add these samples to the calibration to improve the results of the next validation.

Graphically, in Excel, we can see the interaction between the Residuals, GHs and T values:
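The same interaction can be sketched in R. This is a toy example with made-up numbers (not real instrument output), assuming the common WinISI-style conventions that GH is the Mahalanobis distance scaled by the number of PCs and T is the residual scaled by a standard error:

```r
# Toy sketch linking Residuals, GH and T (illustrative numbers only).
residual <- c(0.2, -0.1, -1.8, -2.1, 0.3)
sec <- 0.5                        # hypothetical standard error
t_val <- residual / sec           # negative T -> left tail of the Gaussian bell
gh <- c(1.2, 0.8, 4.5, 5.1, 1.0)  # hypothetical GH values
suspect <- which(gh > 3 & abs(t_val) > 2.5)  # samples flagged on both statistics
print(suspect)
```

Samples flagged by both a high GH and an extreme T are the ones worth inspecting (and possibly adding to the calibration).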

22 may. 2017

Mosaic 7.12 is now available on our Europe server

Mosaic version 7.12 is now available on our Europe server.
When you try to connect, you should be asked to automatically download and install the new client.
User accounts and passwords remain the same.

Ports used for NOVA:
Configure the ports correctly with your IT department for a successful synchronization.

7 may. 2017

Easy way to check the eigenvalues with the T (scores) matrix

Another interesting matrix multiplication is the product of the transpose of the score matrix T by T itself, in this way:


This product gives us a square matrix (a × a), where “a” is the number of loadings or PCs chosen, and its diagonal holds the eigenvalues, which are related to the amount of variance explained by every loading.

If we plot the diagonal we can see how the eigenvalue decreases with every loading. This plot can help us decide how many loadings or PCs to choose.
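A minimal sketch in base R with toy spectra (resemble users would take the scores from orthoProjection instead): the diagonal of TᵗT holds the sums of squared scores, which divided by n − 1 give the eigenvalues of the covariance matrix.

```r
set.seed(1)
X <- matrix(rnorm(100 * 20), nrow = 100)   # toy spectra: 100 samples x 20 points
pca <- prcomp(X, center = TRUE, scale. = FALSE)
a <- 5                                     # number of PCs chosen
T_scores <- pca$x[, 1:a]                   # score matrix T (n x a)
E <- t(T_scores) %*% T_scores              # a x a matrix; off-diagonals ~ 0
eig <- diag(E) / (nrow(X) - 1)             # eigenvalues of the covariance matrix
plot(eig, type = "b", xlab = "PC", ylab = "Eigenvalue")
```

The decreasing curve of eigenvalues is the plot used to decide how many PCs to keep.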


6 may. 2017

Checking the orthogonality of P (loadings) matrix

One of the values we got in the script of the post "Tutorials with Resemble (Part 3 - orthoProjection)" was the loadings matrix (X.loadings), or what we usually call in this blog the P matrix.

One of the characteristics of the loadings matrix “P”, when we develop the PCA, is that if we multiply it by its transpose we get the identity matrix “I”:



P%*%Pt = I

In the “I” matrix the diagonal is all “1”s, with “0” in every other cell, indicating that all the loadings are orthogonal to each other.

  • Check it by yourself and extract the diagonal from the product matrix.
  • Represent the first loadings in a graphic:
    • 1 vs 2: a plane
    • 1, 2 and 3: a cube
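You can check this orthogonality in base R with toy data (with resemble you would use oP$X.loadings). With the loadings stored as the rows of P, the product P %*% t(P) should be the identity matrix:

```r
set.seed(2)
X <- matrix(rnorm(50 * 12), nrow = 50)            # toy spectra
a <- 4                                            # number of loadings kept
P <- t(prcomp(X, center = TRUE)$rotation[, 1:a])  # loadings as rows (a x p)
I_a <- P %*% t(P)                                 # should be the a x a identity
round(I_a, 10)
```

The diagonal comes out as 1 and every off-diagonal cell as 0 (up to floating-point noise), confirming the loadings are orthonormal.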

19 abr. 2017

How to load a REP file in a MOSAIC LOCAL Prediction Model

If we use the MONITOR in Win ISI or a LOCAL Prediction Model in ISI Scan, there is a field to load the REP file (a ".nir" file which includes the variation we want to minimize in the model, such as temperature, differences between instruments, differences between the pathlengths of the gold reflectors, ...). This way LOCAL uses the REP file when developing the calibration.

In MOSAIC the REP file must be loaded in a different way.

As usual we load the ".RED" file, reduced with the appropriate math treatment, and we set the maximum and minimum numbers of factors and samples, ... but where do I load the repeatability file (.NIR)?

😏...Easy but tricky.

Rename the extension of the repeatability file from ".NIR" to ".REP", give this file the same name as the ".RED" file, and put them both in the same folder. Now when you import the ".RED" file into the LOCAL Prediction Model, the ".REP" file will go with it. Just check it on the Links tab of the LOCAL P.M.
As you know, something similar happens when we load a ".EQA" file: the ".PCA" and ".LIB" files are loaded with it.

Thanks to Montse for testing this feature...😉

24 mar. 2017

Tutorials with Resemble (Part 3 - orthoProjection)

Using orthoProjection:
One of the different functions in Resemble is “orthoProjection”, and we can use it with different options. Let's check the simplest one in this post:
oP<-orthoProjection(Xr=der.Xr, X2 = NULL,
                    Yu = NULL,method = "pca",
                    pcSelection = list("cumvar",0.99),
                    center = TRUE, scaled = FALSE,
                    cores = 1)
We can use the training data from the previous post, with the SG filter (just for smoothing) and the first derivative: der.Xr.
The method we use is “pca”, so we don't have to supply the reference data “Yr”. We don't use any additional set, so X2 = NULL.
The number of terms will explain a cumulative variance of 99%.
We center the spectra, and we don't scale them.
Now run this script in R (be sure that the Resemble package is loaded: library(resemble)).

Now we can check the values we get:
[1] "scores" "X.loadings" "variance" "sc.sdv" "n.components"
[6] "pcSelection" "center" "scale" "method"

scores: the matrix T of scores
X.loadings: the matrix P of loadings
variance: the eigenvalues and the explained and cumulative variance
n.components: the number of terms chosen to explain 99% of the variance
pcSelection: cumvar 0.99
center: the average spectrum

Check all these values and matrices.
3.1. Practice plotting the average spectrum (page Exercises)
3.2. Play with the cumulative variance (page Exercises)
3.3. Plot the loadings (page Exercises)
3.4. Plot combinations of score maps (page Exercises)

And enjoy chemometrics with R!

23 mar. 2017

Tutorials with Resemble (part 2)

If you have practised with the post Tutorials with Resemble (part 1), you can continue adding more script following the recommendations of the Resemble package. This time we add another math treatment on top of the previous SG filter.
Once the "sg" function is applied, we can calculate the first derivative to define the variance in the spectra better. The Resemble manual shows how to convert the spectra to a first derivative using differences. We can do it for the calibration and the validation sets:

der.Xr <- t(diff(t(Xr), lag = 1, differences = 1))
der.Xu <- t(diff(t(Xu), lag = 1, differences = 1))

In this case we lose one data point on the left of the spectra, so we have to redefine the wavelength axis to plot the first derivative.
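A small sketch with toy data makes the lost point explicit: diff() along the wavelength axis drops the first point, so the wavelength vector must be trimmed the same way (the wavelength values here are made up for illustration):

```r
set.seed(3)
Xr <- matrix(rnorm(10 * 8), nrow = 10)            # toy spectra: 10 samples x 8 points
der.Xr <- t(diff(t(Xr), lag = 1, differences = 1))
wavelength <- seq(1100, by = 2, length.out = ncol(Xr))
wavelength_der <- wavelength[-1]                  # drop the first wavelength
matplot(wavelength_der, t(der.Xr), type = "l", col = "black")
```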


and we get this plot:

Practise doing the same for the validation set Xu, overplotting its spectra with the training set Xr.
Do you see significant differences?
Enjoy using Chemometrics with R.

20 mar. 2017

Tutorials with Resemble (part 1)

I see that some of you are interested in the "Resemble" package, so I am going to rewrite some of the posts using it, so we can better understand the LOCAL concept we have been treating with Win ISI.

The examples use the NIRsoil data that we can get from the package "prospectr".
To plot the raw spectra, just write this script:
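The plotting script could look like this (a sketch assuming prospectr's NIRsoil data, whose 700 wavelengths run from 1100 to 2498 nm in 2 nm steps):

```r
library(prospectr)
data(NIRsoil)
wavelength <- seq(1100, 2498, by = 2)   # NIRsoil wavelength axis
matplot(wavelength, t(NIRsoil$spc), type = "l", col = "black",
        xlab = "Wavelength (nm)", ylab = "Absorbance")
```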

The Resemble manual recommends applying an SG filter without derivatives to smooth the spectra, so we proceed as the manual suggests:
sg <- savitzkyGolay(NIRsoil$spc, p = 3, w = 11, m = 0)
NIRsoil$spc <- sg
Now the spectra are truncated on both sides (w = 11 drops 5 points at each end), so we have to create the matching wavelength vector:
wavelength_sg <- seq(1110, 2488, by = 2)
and we can plot the filtered spectra:
matplot(wavelength_sg, t(NIRsoil$spc), type = "l", col = "black")

You won't see much difference from the raw spectra.

Now we split the data into a training set (Xr, Yr) and a validation set (Xu, Yu):
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train),]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]    

Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]   

Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]     
and we take out the samples without reference values from both sets:
Xu <- Xu[!is.na(Yu),]    
Xr <- Xr[!is.na(Yr),]    
Yu <- Yu[!is.na(Yu)]     
Yr <- Yr[!is.na(Yr)]

Practise making plots of the spectra of the different sets again. Overlap the training and validation sets with different colors, ..., and enjoy using R for chemometrics.

6 mar. 2017

Neighborhood Mahalanobis distance matrix

Working with the chemometric packages in R helps us understand other commercial chemometric software better.

In Resemble we can use the function fDiss to get a matrix of distances between all the samples in a spectral data set: a square, symmetric matrix with zeroes on the diagonal, because the distance between a sample and itself in the PCA space is zero. This way we can see redundant information and remove it from the spectra set. Finally we get a well distributed cloud of samples whose average spectrum is more representative of all of them.

Here I just trim the matrix to see how close the spectra of the first 10 samples are to each other.
The spectra used were the NIRsoil data from R.
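The idea can be sketched in base R with dist() as a stand-in for fDiss (toy spectra, not the real function): pairwise distances between samples form exactly this kind of square, symmetric matrix with a zero diagonal.

```r
set.seed(4)
spc <- matrix(rnorm(25 * 50), nrow = 25)   # 25 toy "spectra"
D <- as.matrix(dist(spc))                  # Euclidean distance matrix (25 x 25)
D[1:10, 1:10]                              # how close the first 10 samples are
```

Small off-diagonal values point to near-duplicate samples, which are candidates for removal as redundant information.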

5 mar. 2017

Weighted Average (LOCAL)

We saw in the post LOCAL optimization how, when giving a prediction, LOCAL uses the whole range of PLS terms we have fixed in the options (Min to Max number of terms), and the result is a weighted average of the predictions from all those models. So choosing the right range is important to get more accurate predictions.
Looking in the Resemble R package documentation you can see some explanations about how the calculations are made:

"Weighted average pls ("wapls1"): It uses multiple models generated by multiple pls components (i.e. between a minimum and a maximum number of pls components). At each local partition the final predicted value is a weighted average of all the predicted values generated by the multiple pls models. The weight for each component is calculated as follows:

w_j = 1 / (s_1:j × g_j)

where s_1:j is the root mean square of the spectral residuals of the unknown (or target) sample when a total of j pls components are used, and g_j is the root mean square of the regression coefficients corresponding to the jth pls component (see Shenk et al., 1997 for more details).
"wapls1" is not compatible with valMethod = "loc_crossval", since the weights are computed based on the sample to be predicted at each local iteration."
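A toy numeric sketch of this weighted average (made-up numbers; resemble computes all of this internally): each model's prediction gets the weight w_j = 1 / (s_j × g_j), and the weights are normalized to sum to one.

```r
# Hypothetical predictions from local models with j = 5..8 PLS terms:
preds <- c(10.2, 10.5, 10.4, 10.8)
s <- c(0.30, 0.22, 0.15, 0.12)   # RMS spectral residuals of the target sample
g <- c(0.80, 1.10, 1.50, 2.30)   # RMS of the jth regression coefficients
w <- 1 / (s * g)                 # raw weights: small residuals -> big weight
w <- w / sum(w)                  # normalized weights
final <- sum(w * preds)          # the weighted-average prediction
round(final, 3)
```

Note how models with small spectral residuals and small coefficients dominate the final prediction, which is why fixing a sensible Min-Max range of terms matters.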