Lipometrix - plots

Abbreviations

SM	Sphingomyelins
CE	Cholesterol esters
CER	Ceramides
DCER	Dihydroceramides
HexCer	Hexosylceramides^*
TG	Triacylglycerols
DG	Diacylglycerols
MG	Monoacylglycerols
PC	Phosphatidylcholine
PC-O	1-alkyl,2-acylphosphatidylcholine
PC-P	1-alkenyl,2-acylphosphatidylcholine
LPC	Lysophosphatidylcholine
PE	Phosphatidylethanolamine
PE-O	1-alkyl,2-acylphosphatidylethanolamines
PE-P	1-alkenyl,2-acylphosphatidylethanolamines
LPE	Lysophosphatidylethanolamine
PG	Phosphatidylglycerol
PI	Phosphatidylinositol
PS	Phosphatidylserine

^*The hexose can be either glucose or galactose.

Lipid species quantification

Lipid species are quantified using one deuterated standard per lipid class and are corrected for the overlap of isotopomers from other species. Factors other than the lipid class that influence the analytical response (number of double bonds, acyl chain length) are not considered. This is the most common method of quantification and is referred to by the Lipidomics Standards Initiative (LSI) as “level 2” quantification. Relative quantification, where each species is normalised to the total amount of lipid of its class, is also available and can be selected from the dropdown menu at the top right of the page. Unless requested otherwise, liquid samples are normalised to volume, cell culture and tissue samples to DNA concentration, and isolated organelle preparations to protein concentration.
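The two quantification modes can be illustrated with a minimal Python sketch. It is not the Lipometrix implementation: the function names, the example values and the single-point ratio formula (species signal divided by standard signal, multiplied by the amount of standard added) are assumptions made for illustration, and the isotopomer correction is left out.

```python
from collections import defaultdict


def quantify(intensities, standard_intensity, standard_amount):
    """Scale each species signal by the signal of its class standard.

    Assumption: a single-point ratio against one internal standard,
    without any isotopomer correction.
    """
    return {
        species: intensity / standard_intensity * standard_amount
        for species, intensity in intensities.items()
    }


def lipid_class(species):
    """Return the class prefix of a name in sum notation, e.g. 'PC'."""
    return species.split("(")[0]


def relative_to_class_total(amounts):
    """Normalise each species to the total amount of its lipid class."""
    totals = defaultdict(float)
    for species, amount in amounts.items():
        totals[lipid_class(species)] += amount
    return {
        species: amount / totals[lipid_class(species)]
        for species, amount in amounts.items()
    }


# Hypothetical example values, not measured data.
pc_amounts = quantify({"PC(34:1)": 6000.0, "PC(36:2)": 2000.0},
                      standard_intensity=1000.0, standard_amount=5.0)
fractions = relative_to_class_total({**pc_amounts, "SM(34:1)": 5.0})
```

With these example values, PC(34:1) makes up 0.75 of the PC class and SM(34:1), being the only SM species, has a fraction of 1.0.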

Lipid species plots

These plots show pairwise comparisons of different conditions. For each pairwise comparison there are 3 plots. The first plot shows lipid species denoted by their sum notation. For example, PC(36:2) refers to a phosphatidylcholine species with a total of 36 carbon atoms and a total of 2 double bonds in its 2 acyl chains. If you hover the mouse over this plot, the plot below it on the right updates to show the more detailed composition of that species. For example, PC(36:2) could be PC(18:1/18:1), PC(16:0/20:2), etc. The plot below it on the left shows the log2 of the fold change between the 2 conditions. In these plots the lipid species are sorted from left to right by degree of saturation and, within groups of identical saturation, by chain length. The lipid class of the displayed data can be changed in the menu at the top of the screen.
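The sum notation, the sort order and the log2 fold change described above can be sketched in a few lines of Python. The function names are hypothetical, and the direction of the fold change (second condition divided by the first) is an assumption, since the text does not state it.

```python
import math
import re

# Matches sum notation such as "PC(36:2)" or "PC-O(34:1)".
SUM_NOTATION = re.compile(r"^(?P<cls>[A-Za-z-]+)\((?P<carbons>\d+):(?P<db>\d+)\)$")


def parse_species(name):
    """Split a sum-notation name into (class, total carbons, total double bonds)."""
    match = SUM_NOTATION.match(name)
    if match is None:
        raise ValueError(f"not in sum notation: {name!r}")
    return match["cls"], int(match["carbons"]), int(match["db"])


def sort_key(name):
    """Sort by number of double bonds first, then by chain length."""
    _, carbons, double_bonds = parse_species(name)
    return (double_bonds, carbons)


def log2_fold_change(mean_a, mean_b):
    """log2 of condition B over condition A (direction is an assumption)."""
    return math.log2(mean_b / mean_a)


species = ["PC(36:2)", "PC(34:1)", "PC(32:0)", "PC(36:1)"]
ordered = sorted(species, key=sort_key)
# ordered == ["PC(32:0)", "PC(34:1)", "PC(36:1)", "PC(36:2)"]
```

A log2 fold change of 1 corresponds to a doubling and -1 to a halving, so increases and decreases of the same magnitude appear symmetrically around 0.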

p-values were calculated using a one-way ANOVA, assuming equal variance (homoscedasticity) in the 2 groups. If the groups are of similar size, violations of this assumption typically have little effect, since one-way ANOVA is then not very sensitive to heteroscedasticity. FDR-adjusted p-values were calculated using the Benjamini–Hochberg procedure.
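As an illustration of the two steps, the following Python sketch computes the one-way ANOVA F statistic and applies the Benjamini–Hochberg adjustment to a list of p-values. It is a sketch under assumptions, not the code used to produce the plots: the function names and numbers are made up, and converting F into a p-value is omitted because it requires the F distribution from a statistics library. For 2 groups, this F statistic equals the square of the equal-variance t statistic.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        ss_between += len(group) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def benjamini_hochberg(p_values):
    """Return FDR-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values never decrease with increasing rank.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical values for illustration only.
f_value = f_statistic([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])   # 1.5
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
# q_values is approximately [0.02, 0.04, 0.04, 0.02]
```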

Feature plots

The first of these plots shows the fatty acid composition of the selected lipid class. In the two plots below it, species with the same number of unsaturations (plot on the left) or with the same chain length (plot on the right) are summed together. These plots show what percentage of the acyl chains in a given lipid class contain 0, 1, 2, … unsaturations, or what percentage of the acyl chains have a length of 14, 16, 18, etc. carbon atoms. The lipid class of the displayed data can be changed in the menu at the top of the screen. Click on the legend to toggle the display of individual features. Double-click on the legend to isolate individual features.
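The summation behind the two lower plots can be sketched in Python as follows. The sketch assumes that every acyl chain is weighted by the amount of the species it belongs to and that names contain only plain chains separated by "/" (no ether-linked chains). The function names and amounts are hypothetical.

```python
from collections import defaultdict


def acyl_chains(species):
    """Return [(carbons, double_bonds), ...] for a name like 'PC(16:0/20:2)'."""
    inside = species[species.index("(") + 1:species.rindex(")")]
    chains = []
    for chain in inside.split("/"):
        carbons, double_bonds = chain.split(":")
        chains.append((int(carbons), int(double_bonds)))
    return chains


def chain_percentages(amounts, feature):
    """Percentage of acyl chains per chain length or per unsaturation count.

    feature is either "length" or "unsaturation".
    """
    if feature not in ("length", "unsaturation"):
        raise ValueError(f"unknown feature: {feature!r}")
    totals = defaultdict(float)
    for species, amount in amounts.items():
        for carbons, double_bonds in acyl_chains(species):
            key = carbons if feature == "length" else double_bonds
            totals[key] += amount
    grand_total = sum(totals.values())
    return {key: 100.0 * value / grand_total
            for key, value in sorted(totals.items())}


# Hypothetical amounts for two PC species.
amounts = {"PC(18:1/18:1)": 10.0, "PC(16:0/20:2)": 10.0}
by_unsaturation = chain_percentages(amounts, "unsaturation")
# {0: 25.0, 1: 50.0, 2: 25.0}
by_length = chain_percentages(amounts, "length")
# {16: 25.0, 18: 50.0, 20: 25.0}
```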

Dimensionality reduction

These plots show a Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbour Embedding (t-SNE) of the full set of samples, and were calculated based on all the measured lipid species across all lipid classes.

In a (2D) scatterplot, each sample is represented by a dot that is placed on a coordinate system. The placement is determined by the values of 2 variables. For example, each dot could represent a person and the 2 variables could be the BMI and the body fat percentage of that person. The BMI would determine the placement of the sample on (for example) the x-axis and the body fat percentage would determine the placement on the other axis. It is easy to see that points that are located close to each other represent samples that are similar (in terms of BMI and body fat %). A very useful property of scatterplots is that they can easily reveal patterns such as outliers or clusters of similar samples. A typical BMI/body fat % scatterplot, for example, reveals 2 clusters that correspond to the male and female population.

Let’s say we want to expand our BMI/body fat % study and also study the relation with age. Now there are 3 variables per sample/subject. One option to visualise this data would be to add a third axis (the z-axis) to our scatterplot for a 3D visualisation. Another option would be to create a 2D scatterplot for each possible combination of our variables: BMI vs body fat, BMI vs age and body fat vs age. But what if there are more than 3 variables? The first option is no longer possible and, due to the increasing number of possible combinations, the second option quickly becomes impractical.

This is a typical problem in omics experiments. In lipidomics, typically hundreds to thousands of variables (lipid species) are measured in each sample. With so many variables, how could you plot the samples in a scatterplot? Dimensionality reduction techniques such as principal component analysis (PCA) or t-SNE offer a solution to this problem (‘dimensions’ in this context refers to the measurement parameters). These techniques reduce the thousands of measurement variables to two new variables that allow each data point to be plotted on an x- and y-axis.

This is sometimes misunderstood as ‘these techniques somehow select two measurement parameters out of the many and put these on a scatterplot’. In reality they construct entirely new parameters that take into account all the measurement parameters. PCA creates these new parameters as linear combinations of the measurement parameters, and it favours parameters that show high variance. t-SNE, on the other hand, is a non-linear algorithm that can perform different transformations in different regions of the data. t-SNE only tries to map local relationships accurately, while global distances (e.g. distances between different clusters) are not preserved, which makes it unsuitable for outlier detection. While both techniques aim to give a faithful representation of high-dimensional data in a 2D plane, neither technique is perfect.
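To make the idea of ‘new parameters as linear combinations’ concrete, here is a minimal PCA sketch in plain Python. It is not the implementation behind the plots: it uses power iteration with deflation on the covariance matrix rather than the decomposition routines a numerical library would provide, it centres but does not scale the variables (an assumption), and the data are made up. Power iteration can fail if the start vector happens to be orthogonal to the direction being sought, so this is for illustration only.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the mean of every variable (column) from each sample."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centred data."""
    n = len(centered)
    columns = list(zip(*centered))
    return [[dot(ci, cj) / (n - 1) for cj in columns] for ci in columns]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration: repeatedly multiply and renormalise a start vector."""
    v = [float(i + 1) for i in range(len(matrix))]  # arbitrary start vector
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pca_2d(rows):
    """Return 2D scores per sample plus the two component vectors."""
    centered = center(rows)
    cov = covariance(centered)
    pc1 = leading_eigenvector(cov)
    eigenvalue = dot(pc1, [dot(row, pc1) for row in cov])
    size = len(cov)
    # Deflation: remove the variance explained by the first component.
    deflated = [[cov[i][j] - eigenvalue * pc1[i] * pc1[j] for j in range(size)]
                for i in range(size)]
    pc2 = leading_eigenvector(deflated)
    # Each new coordinate is a weighted sum of ALL original variables.
    scores = [(dot(row, pc1), dot(row, pc2)) for row in centered]
    return scores, pc1, pc2


# Four made-up samples with three variables each.
samples = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
scores, pc1, pc2 = pca_2d(samples)
```

In this toy data set the first variable has the largest variance, so the first component points along it and the first sample lands at approximately (2, 0) in the 2D plot. With real lipidomics data each component mixes many lipid species.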
