FCS data format

Flow cytometry data are stored in FCS (Flow Cytometry Standard) files. The FCS standard has several versions, each with its own specification (e.g. the FCS 3.1 specification).

An FCS file contains HEADER, TEXT and DATA segments. The TEXT segment describes the data by means of so-called keywords. FCS keywords include, for example:

$CYT Type of flow cytometer.

$DATE Date of data set acquisition.

$EXP Name of investigator initiating the experiment.

$PROJ Name of the experiment project.

$SMNO Specimen (tube or well) label.

$SRC Source of the specimen (patient name, cell types).

$PnF Name of optical filter for parameter n, e.g. $P2F/520LP/

$PnN Short name for parameter n, e.g. $P3N/FL1/

$PnS Name used for parameter n, e.g. $PnS/CD45 FITC Fluorescence/

$SPILLOVER Fluorescence spillover matrix. Often described as the compensation matrix, but it is really a spillover table.

It can be confusing to open an FCS file and find that some keywords start with $ while others do not. The reason is that keywords starting with $ are defined by the FCS standard; the others are custom keywords added by the instrument or software.
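The HEADER/TEXT layout and the $ convention can be sketched in a few lines of Python. This is a minimal sketch, not a full reader: it assumes the delimiter character never occurs inside a keyword or value (the standard allows that with escaping, which is ignored here), and the toy file, including the custom CREATOR keyword, is made up for illustration.

```python
def parse_header(raw: bytes) -> dict:
    """Read the version string and the TEXT/DATA offsets from the HEADER."""
    version = raw[0:6].decode("ascii")  # e.g. "FCS3.0"
    # After the version and 4 spaces come right-justified ASCII integers
    # in 8-byte fields: TEXT start, TEXT end, DATA start, DATA end.
    fields = [int(raw[i:i + 8].decode("ascii").strip()) for i in range(10, 42, 8)]
    text_start, text_end, data_start, data_end = fields
    return {"version": version, "text_start": text_start, "text_end": text_end,
            "data_start": data_start, "data_end": data_end}


def parse_text(segment: str) -> dict:
    """Split the TEXT segment into keyword/value pairs.

    The first character of the segment is the delimiter. Assumption of this
    sketch: the delimiter does not appear inside keywords or values.
    """
    delimiter = segment[0]
    parts = segment[1:].split(delimiter)
    if parts and parts[-1] == "":
        parts.pop()  # the segment ends with a trailing delimiter
    return dict(zip(parts[0::2], parts[1::2]))


def split_keywords(keywords: dict):
    """Separate FCS standard keywords ($...) from custom ones."""
    standard = {k: v for k, v in keywords.items() if k.startswith("$")}
    custom = {k: v for k, v in keywords.items() if not k.startswith("$")}
    return standard, custom


def build_toy_file(text: str) -> bytes:
    """Hypothetical helper: a 58-byte HEADER followed by a TEXT segment only."""
    text_start = 58
    text_end = text_start + len(text) - 1  # offset of the LAST byte of TEXT
    offsets = [text_start, text_end, 0, 0, 0, 0]
    header = "FCS3.0" + " " * 4 + "".join(f"{o:>8d}" for o in offsets)
    return (header + text).encode("ascii")


raw = build_toy_file("/$CYT/FACSCalibur/$P3N/FL1/CREATOR/CELLQuest 3.3/")
header = parse_header(raw)
text = raw[header["text_start"]:header["text_end"] + 1].decode("ascii")
standard, custom = split_keywords(parse_text(text))
print(header["version"], standard, custom)
```

For real files the delimiter escaping and the optional ANALYSIS segment need handling too, so an existing reader (e.g. flowCore in R) is the safer choice.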

How to get data from FCS to txt

Use FCSExtract

A Windows .exe file by Earl F Glynn; it works fine under Wine on Ubuntu. It is the easiest solution I have found so far. If you want to do more with the data, try R.
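If neither option is at hand, the conversion itself can be sketched in Python for the simplest case. This is only a sketch under stated assumptions: the TEXT keywords are already available as a dictionary, the data are list-mode 32-bit floats ($DATATYPE is F), and the function name events_to_text is my own, not part of any tool mentioned here.

```python
import struct


def events_to_text(keywords: dict, data: bytes) -> str:
    """Turn a float-typed DATA segment into tab-delimited text.

    Assumes list-mode data: events are stored one after another,
    each as $PAR consecutive 32-bit floats.
    """
    if keywords["$DATATYPE"] != "F":
        raise ValueError("this sketch only handles $DATATYPE = F (32-bit float)")

    byteord = keywords["$BYTEORD"].replace(" ", "")
    if byteord == "1,2,3,4":
        endian = "<"  # little endian
    elif byteord == "4,3,2,1":
        endian = ">"  # big endian
    else:
        raise ValueError(f"unsupported $BYTEORD: {byteord}")

    n_par = int(keywords["$PAR"])  # number of parameters per event
    n_tot = int(keywords["$TOT"])  # number of events
    n_values = n_par * n_tot
    values = struct.unpack(f"{endian}{n_values}f", data[:4 * n_values])

    # Header row from the short parameter names $P1N, $P2N, ...
    names = [keywords[f"$P{i}N"] for i in range(1, n_par + 1)]
    lines = ["\t".join(names)]
    for event in range(n_tot):
        row = values[event * n_par:(event + 1) * n_par]
        lines.append("\t".join(f"{v:g}" for v in row))
    return "\n".join(lines) + "\n"


# Made-up example: two events, two parameters.
example_keywords = {
    "$DATATYPE": "F", "$BYTEORD": "1,2,3,4", "$PAR": "2", "$TOT": "2",
    "$P1N": "FSC-H", "$P2N": "FL1",
}
example_data = struct.pack("<4f", 1.0, 2.5, 3.0, 4.5)
print(events_to_text(example_keywords, example_data), end="")
```

Integer data ($DATATYPE I) need the per-parameter bit widths from $PnB as well, which is left out here.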

Linear, logarithmic, logicle

The best way to display the data depends on how the data were generated. If the signal was linearly amplified (as is usual for forward and side scatter), then a linear scale is used.

If the signal was logarithmically amplified, then the default choice is a logarithmic (log) scale. However, a log scale cannot display events at or below zero. To solve this problem, several other scales came into use in flow cytometry (logicle, hyperlog, biexponential). They provide a transition from linear scaling around zero to log scaling at higher values. The big advantage is that the shape of the distribution of near-zero events can be visualised.

This approach also has a downside: because the transform is data-driven, it differs from dataset to dataset, which can make comparisons difficult. If gates for negative/low events need consistent absolute positions, then the parameters of the transforming function must be the same across samples, or a log scale must be used. However, positioning gates relative to where the true biological population lies may be preferable anyway, and that again means a logicle display.
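The linear-to-log transition is easiest to see with arcsinh, the simplest of the log-like transforms (this is not the logicle function itself, only a stand-in with the same qualitative behaviour). A minimal sketch; the cofactor of 150 is an arbitrary value picked for illustration, and keeping it fixed is what makes the scale identical across samples.

```python
import math


def log_scale(x: float) -> float:
    """log10 display value; zero and negative events cannot be shown."""
    return math.log10(x) if x > 0 else float("nan")


def arcsinh_scale(x: float, cofactor: float = 150.0) -> float:
    """arcsinh(x / cofactor).

    Roughly linear for |x| much smaller than the cofactor,
    roughly logarithmic for |x| much larger than the cofactor.
    """
    return math.asinh(x / cofactor)


for value in (-100.0, 0.0, 10.0, 1000.0, 100000.0):
    print(f"{value:>10g}  log10: {log_scale(value):8.3f}  "
          f"arcsinh: {arcsinh_scale(value):8.3f}")
```

Negative and zero events get a finite position on the arcsinh scale, while log10 has nothing to show for them.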

Comparison of logarithmic and logicle scaling

Comparison of log scale (left) and biexponential scale (right). On the biexponential scale the Gaussian-like distribution of negative cells is very clear. Their median fluorescence can also be assessed much more accurately (it is close to zero; compare with the "negative" population on the log scale).

Logicle-like transformations:

  • biexponential (I like how DiVa displays data, but that seems to be industrial top secret)
  • logicle?
  • arcsinh (Seems to be the main one for flowCore & Co., but sometimes a bit weird.)
  • Hyperlog

It seems that DiVa estimates the parameters differently from Parks et al.?

Note: If the transformation parameters are derived from a particular dataset, then the transform is best for that dataset. However, the parameters will then vary across samples!

A new "Logicle" display method avoids deceptive effects of logarithmic scaling for low signals and compensated data. Parks DR, Roederer M, Moore WA. Cytometry A. 2006 Jun;69(6):541-51.

Hyperlog-a flexible log-like transform for negative, zero, and positive valued data. Bagwell CB. Cytometry A. 2005 Mar;64(1):34-42.

Pulse height, width, area

… record them all if you have the chance.

Different pulse shapes from a single cell and a doublet.

Cells passing through the cytometer generate voltage pulses of a characteristic shape (Gaussian for round cells). The ratio of the pulse area (the integral under the curve) to its width (or height) differs between single cells (upper panel) and doublets (lower panel). It can therefore be used to exclude doublets from the analysis.
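A toy simulation shows why the ratio works. This is a sketch under simple assumptions: a single cell gives one Gaussian pulse, a doublet is modelled as two identical Gaussian pulses slightly offset in time, and width is measured as the time spent above 10% of the pulse height. All numbers are arbitrary.

```python
import math


def pulse(t: float, centers, sigma: float = 1.0) -> float:
    """Signal at time t: a sum of unit-amplitude Gaussian pulses."""
    return sum(math.exp(-0.5 * ((t - c) / sigma) ** 2) for c in centers)


def pulse_features(centers, t_min=-10.0, t_max=10.0, dt=0.01, threshold=0.1):
    """Return (height, area, width) of the simulated pulse."""
    n_samples = int(round((t_max - t_min) / dt)) + 1
    samples = [pulse(t_min + i * dt, centers) for i in range(n_samples)]
    height = max(samples)                  # pulse height (peak value)
    area = sum(samples) * dt               # pulse area (numerical integral)
    # pulse width: time spent above a fraction of the peak height
    width = sum(1 for s in samples if s > threshold * height) * dt
    return height, area, width


single = pulse_features([0.0])         # one cell
doublet = pulse_features([-1.5, 1.5])  # two cells passing close together

for name, (height, area, width) in (("single", single), ("doublet", doublet)):
    print(f"{name:8s} height={height:.2f} area={area:.2f} "
          f"width={width:.2f} area/height={area / height:.2f}")
```

The doublet has about the same height as the single cell but roughly twice the area and a larger width, so it falls off the singlet diagonal on an area vs. height plot.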


Different instruments

  • FACSCalibur - FCS 2, beads (rainbow?) data from FACSCalibur cytometer (CELLQuest 3.3 software), 27075 events.
  • CyAn - FCS 3.0, rainbow beads data from CyAn cytometer (Summit V4.3.01 software), 10123 events.


Datasets (description) used in the 4 FlowCAP-I challenges (GvHD is part of flowCore).


three_popul - 6-dimensional data of three normally distributed and overlapping "populations". The "populations" follow one another in the file, each consisting of 700 datapoints. This dataset was used for algorithm comparison in the paper.

TODO: Need easy multidimensional data (normal distributions of e.g. 25 parameters), both big (100 000 events) and small (1000 events), to show that even heavily overlapping populations can be separated if there are enough dimensions. So e.g. show how 5, 10, 25 parameters resolve populations.

TODO: Need difficult data: non-normal, tailed, skewed, (+/- bent), with outliers and noise. Make it a mixture of nice and "not-so-nice" populations. Small, but at least 500 (?) events per population.
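A possible starting point for this TODO, as a rough sketch only: two equally sized normal populations with unit variance in every parameter, whose means differ by the same small shift (delta, here 0.5, an arbitrary choice) in each parameter. For this idealised case the best achievable misclassification rate has a closed form, and a small simulation can be checked against it.

```python
import math
import random


def expected_error(delta: float, n_dims: int) -> float:
    """Best achievable error for two unit-variance normal populations
    whose means differ by delta in each of n_dims parameters."""
    distance = delta * math.sqrt(n_dims)  # distance between the two means
    return 0.5 * math.erfc(distance / (2.0 * math.sqrt(2.0)))


def simulated_error(delta: float, n_dims: int, n_events: int = 1000, seed: int = 0) -> float:
    """Draw events from both populations and classify each by its mean coordinate."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_events):
        from_shifted = rng.random() < 0.5  # which population the event comes from
        mean = delta if from_shifted else 0.0
        score = sum(rng.gauss(mean, 1.0) for _ in range(n_dims)) / n_dims
        predicted_shifted = score > delta / 2.0  # midpoint between the two means
        errors += predicted_shifted != from_shifted
    return errors / n_events


for n_dims in (5, 10, 25):
    print(f"{n_dims:2d} parameters: expected error {expected_error(0.5, n_dims):.3f}, "
          f"simulated {simulated_error(0.5, n_dims):.3f}")
```

In any single parameter the two populations overlap almost completely, yet the error drops steadily as parameters are added.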



