Coverage-based Datasets

Coverage-based Datasets contain calculated values corresponding to the (possibly normalized) number of reads at coordinates along the chromosome. These values can be integers or floating-point numbers. In EaSeq they are stored as single-precision floating-point numbers and can be positive or negative, with magnitudes roughly in the range of 1E-38 to 1E38.
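As an illustration of what single-precision storage means for precision, the following minimal Python sketch rounds values to the nearest single-precision number. It is not EaSeq code; the helper name `to_float32` is hypothetical and only demonstrates the general behaviour of the format.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    # Packing as a 4-byte float and unpacking again discards the extra precision.
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Whole numbers are exact up to 2**24 (16,777,216); above that, gaps appear.
exact = to_float32(16777216.0)
rounded = to_float32(16777217.0)
print(exact == 16777216.0)    # True: still representable
print(rounded == 16777217.0)  # False: rounded to a neighbouring value

# Most decimal fractions are only stored approximately.
tenth = to_float32(0.1)
print(tenth == 0.1)           # False: differs slightly from the double-precision 0.1
```

Single precision carries about seven significant decimal digits, so very large normalized values or long running sums can lose detail.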

They are typically imported from wig or bedgraph (.bg) files. In EaSeq these Datasets take up more memory, are slower, provide fewer options for normalization, give values in arbitrary units, and have reduced precision, so whenever possible we strongly encourage users to import read-based Datasets instead.
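For orientation, a bedGraph file lists one interval per line as chromosome, start, end, and value, with optional track, browser, or comment lines. The sketch below reads such lines in Python. It does not reflect how EaSeq imports files; the names `Interval` and `parse_bedgraph` and the example data are made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int    # 0-based start coordinate
    end: int      # end coordinate (exclusive)
    value: float  # coverage value in whatever units the file provides


def parse_bedgraph(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield one Interval per data line, skipping header and comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield Interval(chrom, int(start), int(end), float(value))


# Hypothetical example content, not taken from a real dataset.
example = """track type=bedGraph name="example"
chr1\t100\t150\t2.5
chr1\t150\t200\t-0.75
"""
records = list(parse_bedgraph(example.splitlines()))
for record in records:
    print(record)
```

Note that the file carries only the final values, not the reads behind them, which is why the units remain arbitrary once the file has been written.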

Furthermore, in some data types such as DNA methylation data, the signal values only correspond to the 2 bp position of the potentially methylated CpG. Heatmaps and tracks will then contain large gaps between the CpGs, and the average values will be very low. In such cases, it is not possible to judge whether a low value in a track is due to the absence of a CpG or the absence of methylation.

EaSeq therefore offers to extend the values from a coordinate halfway to the next upstream and downstream coordinates to improve visualization, although this clearly introduces a numerical and spatial bias. Again, whenever possible, we recommend importing read-based Datasets (e.g. bed files) instead.
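The idea of extending each value halfway to its neighbours can be sketched as follows. This is a minimal Python illustration, not EaSeq's implementation: the function name `extend_halfway`, the integer rounding of the midpoint, and the decision to leave the outer edges of the first and last sites unchanged are all assumptions, and the CpG values are invented.

```python
from typing import List, Tuple

Site = Tuple[int, int, float]  # (start, end, value), end exclusive


def extend_halfway(sites: List[Site]) -> List[Site]:
    """Extend each site to the midpoint of the gap to its neighbours.

    Assumes the sites are on one chromosome, sorted, and non-overlapping.
    """
    # One shared boundary per gap, so adjacent extended sites meet exactly.
    boundaries = [
        left_end + (right_start - left_end) // 2
        for (_, left_end, _), (right_start, _, _) in zip(sites, sites[1:])
    ]
    extended = []
    for i, (start, end, value) in enumerate(sites):
        new_start = boundaries[i - 1] if i > 0 else start
        new_end = boundaries[i] if i < len(sites) - 1 else end
        extended.append((new_start, new_end, value))
    return extended


# Three hypothetical CpGs, each 2 bp wide, with made-up methylation values.
cpgs = [(100, 102, 0.9), (200, 202, 0.1), (500, 502, 0.8)]
print(extend_halfway(cpgs))
# [(100, 151, 0.9), (151, 351, 0.1), (351, 502, 0.8)]
```

In this example the middle value, measured over 2 bp, ends up covering 200 bp, which shows the spatial bias: a sparse region gives a single measurement far more weight in a track or average than a dense one.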
