Where to obtain data?

If you do not plan to analyze your own data, or would like to relate your data to existing data, you can use Datasets from published papers. These are usually stored in a repository, such as the Gene Expression Omnibus (GEO) or the European Nucleotide Archive (ENA). However, repository data are frequently not mapped/aligned and need to be mapped first.

For Genesets, EaSeq has an integrated menu to download the most current RefSeq annotation from UCSC. We recommend using this by default to get coordinates for gene features.

A good starting point for finding Regionsets is to download them from the UCSC Table Browser or to download e.g. a published peak set from GEO. Regionsets that correspond to a gene feature, such as transcription start sites (TSSs) or gene bodies, can also be generated from a Geneset using the ‘Extract’ tool, and Regionsets can be identified from Datasets using the ‘Peakfind’ tool within EaSeq.
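To illustrate what deriving a TSS Regionset from a Geneset means, here is a minimal Python sketch. It is not EaSeq's ‘Extract’ tool and makes no claim about how that tool is implemented. The `Gene` and `Region` types, the `tss_regions` function, and the `flank` parameter are all hypothetical names introduced for this example, and the sketch ignores the difference between 0-based and 1-based coordinate conventions.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; field names are assumptions for this sketch."""
    name: str
    chrom: str
    start: int   # leftmost coordinate of the gene
    end: int     # rightmost coordinate of the gene
    strand: str  # "+" or "-"


class Region(NamedTuple):
    """Hypothetical region record."""
    chrom: str
    start: int
    end: int
    name: str


def tss_regions(genes, flank=0):
    """Return one region per gene, centred on its transcription start site."""
    regions = []
    for g in genes:
        # The TSS is the leftmost coordinate for plus-strand genes
        # and the rightmost coordinate for minus-strand genes.
        tss = g.start if g.strand == "+" else g.end
        # Clamp the left edge at 0 so the region never starts before the chromosome.
        regions.append(Region(g.chrom, max(0, tss - flank), tss + flank, g.name))
    return regions


# Made-up example genes, not real annotation.
genes = [
    Gene("geneA", "chr1", 1000, 5000, "+"),
    Gene("geneB", "chr1", 8000, 12000, "-"),
]

for region in tss_regions(genes, flank=500):
    print(region)
```

The point of the sketch is the strand handling: for genes on the minus strand the TSS is at the end coordinate, not the start coordinate.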

It is important to only relate data that are from the same reference genome, so that the coordinates are comparable. UCSC has an online tool (LiftOver) to translate coordinates from one reference genome to another, but not all coordinates are transferable, so it is best to choose one reference genome early on and use it for all data that will be analyzed together.
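As a rough sanity check before combining files, one could test whether every region fits within the chromosome sizes of the intended reference genome. The Python sketch below shows the idea. The function name `regions_outside_assembly` and the toy chromosome sizes are hypothetical and do not correspond to a real assembly. In practice you would supply the sizes for your chosen reference genome. Passing this check does not prove that two files share a reference genome; it only catches obvious mismatches such as different chromosome naming or out-of-range coordinates.

```python
def regions_outside_assembly(regions, chrom_sizes):
    """Return regions whose chromosome is unknown or whose coordinates
    fall outside the given chromosome length."""
    problems = []
    for chrom, start, end in regions:
        size = chrom_sizes.get(chrom)
        if size is None or start < 0 or end > size:
            problems.append((chrom, start, end))
    return problems


# Hypothetical toy chromosome sizes, not a real assembly.
toy_sizes = {"chr1": 10_000, "chr2": 8_000}

# Made-up regions: one valid, one past the chromosome end,
# and one using a different chromosome naming scheme ("1" instead of "chr1").
regions = [("chr1", 100, 200), ("chr2", 7_900, 8_100), ("1", 100, 200)]

print(regions_outside_assembly(regions, toy_sizes))
```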

Once you have imported the Datasets, Genesets, and/or Regionsets that you would like to work on, we recommend that you save them all as a session file. This will speed up loading later on and save space.
