Understand output

Overview

Naming output files
  • Each output filename consists of an outstem, a function name, an optional file-specific ID, and a file extension; the first three parts are joined by underscores '_'. The outstem is specified via the keyword argument outstem.
  • logfile: outstem*"_"*funcname*".log"
  • genofile: outstem*"_"*funcname*"_geno.vcf.gz"
  • pedfile: outstem*"_"*funcname*"_ped.csv"
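The naming rule above can be sketched in Julia. `rabbit_outfiles` is a hypothetical helper written for illustration, not a RABBIT function. It only concatenates strings following the listed patterns, and it assumes the function name is written in lowercase, as in the `_magicmap_` and `_magicreconstruct_` filenames used later on this page.

```julia
# Hypothetical helper (not part of RABBIT): builds output filenames following
# the outstem_funcname[_fileID].extension pattern described above.
function rabbit_outfiles(outstem::AbstractString, funcname::AbstractString)
    return (
        logfile  = outstem * "_" * funcname * ".log",
        genofile = outstem * "_" * funcname * "_geno.vcf.gz",
        pedfile  = outstem * "_" * funcname * "_ped.csv",
    )
end

files = rabbit_outfiles("outstem", "magicimpute")
println(files.genofile)  # outstem_magicimpute_geno.vcf.gz
```

Only the logfile lacks a file-specific ID, which is why it ends directly in ".log".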

The RABBIT output files can be categorized into four groups:

  • logfile: records input argument values, output filenames, and computational time.
  • genofile and pedfile: see the flowchart above and the section Prepare Inputs.
  • PNG plot files: see the section Pipeline.
  • CSV text files (other than pedfile).

The main output files of MagicSimulate, MagicFilter, MagicCall, and MagicImpute are genofile and/or pedfile, which have the same formats as the corresponding input files. Most CSV output files contain a single table, whose columns are explained in the leading comment lines (starting with "##").
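A minimal sketch of reading such a single-table CSV with only Julia Base, assuming plain uncompressed text, comma-separated cells without quoted commas, and comment lines that all start with "##". The function name `read_commented_csv` and the toy file content are made up for illustration; they are not RABBIT output or API.

```julia
# Sketch: separate the "##" comment lines (column descriptions) from the
# header row and content rows of a single-table CSV file.
function read_commented_csv(io::IO)
    comments = String[]
    rows = Vector{Vector{String}}()
    for line in eachline(io)
        if startswith(line, "##")
            push!(comments, line)                 # column description
        elseif !isempty(strip(line))
            push!(rows, String.(strip.(split(line, ","))))
        end
    end
    header = isempty(rows) ? String[] : popfirst!(rows)
    return (comments = comments, header = header, rows = rows)
end

# Made-up toy input, only to show the call
io = IOBuffer("""
## markerno: row index starting from 1
## marker: marker ID
markerno,marker
1,snp1
2,snp2
""")
tab = read_commented_csv(io)
tab.header  # ["markerno", "marker"]
```

For real files a dedicated CSV reader that supports comment lines is likely more robust than this hand-rolled split.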

In the following, we mainly describe the three CSV text files that contain multiple tables: ldfile and linkagefile, produced by MagicMap, and ancestryfile, produced by MagicReconstruct. In each of these files, each table starts with a row containing only two cells, "RABBIT" and the table name, followed by a header row and then the content rows.
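The multi-table layout just described can be split apart with a short loop. This is a sketch, not RABBIT's own reader: `split_rabbit_tables` is a hypothetical name, and it assumes the text is already decompressed (a .csv.gz file would first need decompressing, for example with the CodecZlib.jl package), that cells are comma-separated without quoted commas, and that any "##" lines can be skipped. The sample input is the viterbipath example given later on this page.

```julia
# Sketch: map each table name to its rows; the first stored row is the header.
function split_rabbit_tables(lines)
    tables = Dict{String, Vector{Vector{String}}}()
    current = nothing
    for line in lines
        # skip blank lines and any "##" comment lines
        (isempty(strip(line)) || startswith(line, "##")) && continue
        cells = String.(strip.(split(line, ",")))
        if length(cells) == 2 && cells[1] == "RABBIT"
            current = cells[2]                  # a new table starts here
            tables[current] = Vector{String}[]
        elseif current !== nothing
            push!(tables[current], cells)       # header row, then content rows
        end
    end
    return tables
end

txt = """
RABBIT, viterbipath
individual, viterbipath
offspring1, 1-12-96-11-101-16-122|1-16-143
"""
tables = split_rabbit_tables(split(txt, "\n"))
tables["viterbipath"][1]  # ["individual", "viterbipath"]
```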

MagicMap

ldfile

MagicMap performs pairwise linkage disequilibrium (LD) analyses via MagicLD.magicld. If marker binning is performed (isbinning = true), LD analyses are performed only for the representative marker of each bin. The output ldfile (outstem*"_magicmap_magicld.csv.gz") contains three tables:

  • offspringinfo: same as the offspringinfo of pedfile
  • markerinfo: table with each row denoting a (representative) marker. It consists of 5 columns:
    • markerno: row index starting from 1
    • marker_represent: representative marker ID
    • physchrom_represent: physical map chromosome ID for the representative marker
    • physposbp_represent: physical map position (in base pair) for the representative marker
    • ld_bin: list of marker IDs in the bin containing the representative marker. "NA" if isbinning = false
  • pairwiseld: table with each row denoting the LD analysis of a pair of representative markers. It consists of 4 columns:
    • marker1: index of 1st representative marker, corresponding to markerno in table markerinfo
    • marker2: index of 2nd representative marker, corresponding to markerno in table markerinfo
    • ld_r2: squared allelic correlation between the two representative markers
    • ld_lod: LOD score of the LD analysis for the two representative markers
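Because marker1 and marker2 index rows of table markerinfo, table pairwiseld is effectively a sparse, symmetric marker-by-marker matrix. The sketch below rebuilds it with the SparseArrays standard library; the toy values are made up, and taking `max` when a pair occurs twice is a defensive assumption, not documented RABBIT behavior.

```julia
using SparseArrays

# Toy values standing in for the columns of table pairwiseld (not real output)
marker1 = [1, 1, 2]
marker2 = [2, 3, 3]
ld_r2   = [0.92, 0.15, 0.40]
nmarker = 3   # number of rows in table markerinfo

# Mirror each pair to get a symmetric r^2 matrix; duplicate entries are
# combined with max. Pairs not saved in the file stay structural zeros.
R2 = sparse(vcat(marker1, marker2), vcat(marker2, marker1),
            vcat(ld_r2, ld_r2), nmarker, nmarker, max)
```

The same construction works for ld_lod, giving a LOD matrix with the same sparsity pattern.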

The results are saved only if ld_r2 >= minldsave and ld_lod >= minlodsave. By default, the keyword arguments minldsave and minlodsave are set to nothing and are reset internally by MagicLD.magicld according to the number of (representative) markers; see the logfile for the reset values.
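The save rule can be illustrated on toy values. The threshold values below are arbitrary choices for this sketch; RABBIT's actual values are the internally reset ones reported in the logfile.

```julia
# Toy columns standing in for ld_r2 and ld_lod (not real RABBIT output)
ld_r2  = [0.92, 0.15, 0.40]
ld_lod = [25.0, 1.2, 8.5]

# Arbitrary thresholds chosen only for this illustration
minldsave, minlodsave = 0.2, 3.0

# A pair is saved only if both conditions hold
keep  = (ld_r2 .>= minldsave) .& (ld_lod .>= minlodsave)
saved = findall(keep)   # indices of the pairs that would be saved
```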

linkagefile

MagicMap performs pairwise linkage analyses via MagicLinkage.magiclinkage for the pairs of (representative) markers in the ldfile. The output linkagefile (outstem*"_magicmap_magiclinkage.csv.gz") contains three tables:

  • offspringinfo: same as the offspringinfo of pedfile
  • markerinfo: table with each row denoting a (representative) marker. It consists of 5 columns:
    • markerno: row index starting from 1
    • marker: marker ID
    • physchrom: physical map chromosome ID for the marker
    • physposbp: physical map position (in base pair) for the marker
    • nmissing: number of missing genotypes at the marker
  • pairwiselinkage: table with each row denoting the linkage analysis of a pair of markers. It consists of 4 columns:
    • marker1: index of 1st marker, corresponding to markerno in table markerinfo
    • marker2: index of 2nd marker, corresponding to markerno in table markerinfo
    • linkage_rf: recombination fraction between the two markers (scaled from 0 to 1)
    • linkage_lod: LOD score of the linkage analysis for the two markers
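Since markerno is simply the row index of table markerinfo, the marker IDs of each saved pair can be recovered by plain vector indexing. The marker IDs and values below are made up for illustration.

```julia
# Toy stand-ins for column marker of table markerinfo (indexed by markerno)
# and for columns of table pairwiselinkage
marker     = ["snpA", "snpB", "snpC"]
marker1    = [1, 2]
marker2    = [3, 3]
linkage_rf = [0.05, 0.12]

# Replace the row indices by marker IDs
pairs = [(marker[i], marker[j], rf) for (i, j, rf) in zip(marker1, marker2, linkage_rf)]
```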

The results are saved only if linkage_rf <= maxrfsave and linkage_lod >= minlodsave. By default, the keyword arguments maxrfsave and minlodsave are set to nothing and are reset internally by MagicLinkage.magiclinkage according to the number of (representative) markers; see the logfile for the reset values.

MagicReconstruct

ancestryfile

The output ancestryfile (outstem*"_magicreconstruct_ancestry.csv.gz") is a CSV file containing 13 tables:

  • designinfo: same as the designinfo of pedfile
  • founderinfo: founder information.
    • individual: founder ID
    • gender: "notapplicable"
  • offspringinfo: same as the offspringinfo of pedfile
  • foundergeno: genotypic data for founders. Columns 1-13 are the same as those in refinedmapfile, and each remaining column contains the genotypic data of one founder.
  • haplotype: ancestral haplotype states
    • haplotypeindex: row index starting from 1
    • haploindex: haplotype index, same as haplotypeindex
    • haplotype: founder genome labels. If a founder is inbred, its label is its founder ID. If a founder is outbred, it has two labels, founderID_1 and founderID_2, one for each of its two gametes.
  • genotype: ancestral genotype states
    • genotypeindex: row index starting from 1
    • genoindex: unordered combinations of haploindex in table haplotype
    • genotype: unordered combinations of haplotype in table haplotype
  • diplotype: ancestral diplotype states
    • diplotypeindex: row index starting from 1
    • diploindex: ordered combinations of haploindex in table haplotype
    • diplotype: ordered combinations of haplotype in table haplotype
  • inbredcoef: inbreeding coefficient
    • linkagegroup: linkage group ID
    • offspring: offspring ID
    • inbredcoef: inbreeding coefficients at all markers in the linkage group for the offspring
  • loglike: log likelihood for each offspring in each linkage group (chromosome)
    • linkagegroup: linkage group ID
    • 1st offspring ID: marginal log likelihood of the markers in the linkage group for that offspring
    • ...
    • last offspring ID
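The relation between tables haplotype, genotype, and diplotype can be sketched by enumeration: with n haplotype labels there are n(n+1)/2 unordered pairs (genotypes) and n^2 ordered pairs (diplotypes). The helper `ancestral_states` is hypothetical; it assumes all founders are either inbred or outbred, and its ordering of combinations is an assumption, so the actual indices must be read from the tables in the ancestryfile.

```julia
# Sketch: haplotype labels plus unordered (genotype) and ordered (diplotype)
# combinations of haploindex. The ordering of combinations is an assumption.
function ancestral_states(founders::Vector{String}, inbred::Bool)
    if inbred
        haplotypes = copy(founders)                     # one label per founder
    else
        haplotypes = [f * "_" * string(g) for f in founders for g in 1:2]
    end
    n = length(haplotypes)
    genotypes  = [(i, j) for i in 1:n for j in i:n]     # unordered pairs
    diplotypes = [(i, j) for i in 1:n for j in 1:n]     # ordered pairs
    return (haplotypes = haplotypes, genotypes = genotypes, diplotypes = diplotypes)
end

s = ancestral_states(["P1", "P2", "P3", "P4"], true)
(length(s.haplotypes), length(s.genotypes), length(s.diplotypes))  # (4, 10, 16)
```

For 4 inbred founders this gives 16 diplotype states, which matches the range of state indices in the Viterbi example on this page.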

There are 4 more tables: haploprob, genoprob, diploprob, and viterbipath, which will be described in the following.

ancestryfile from forward-backward

In the ancestryfile resulting from the forward-backward HMM algorithm (hmmalg="forwardbackward"), the main results are saved in tables haploprob, genoprob, and diploprob, and table viterbipath is nothing.

  • haploprob: haplotype probability
    • linkagegroup: linkage group ID
    • offspring: offspring ID
    • nmarker: number of rows (markers)
    • nhaplotype: number of columns (haplotypes)
    • markerindex: row index, i.e., index of the marker in table foundergeno
    • haplotypeindex: column index; see table haplotype
    • haploprob: non-zero haplotype probability, rounded to 4 digits after the decimal point
  • genoprob: genotype probability. It is nothing if model = "depmodel"
    • linkagegroup: linkage group ID
    • offspring: offspring ID
    • nmarker: number of rows (markers)
    • ngenotype: number of columns (genotypes)
    • markerindex: row index, i.e., index of the marker in table foundergeno
    • genotypeindex: column index; see table genotype
    • genoprob: non-zero genotype probability, rounded to 4 digits after the decimal point
  • diploprob: diplotype probability. It is nothing if model = "depmodel"
    • linkagegroup: linkage group ID
    • offspring: offspring ID
    • nmarker: number of rows (markers)
    • ndiplotype: number of columns (diplotypes)
    • markerindex: row index, i.e., index of the marker in table foundergeno
    • diplotypeindex: column index; see table diplotype
    • diploprob: non-zero diplotype probability, rounded to 4 digits after the decimal point
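Tables haploprob, genoprob, and diploprob
  • Each row stores a sparse matrix of posterior probabilities for the offspring in the linkage group, given by its dimensions and the row indices, column indices, and values of its non-zero entries. Note that an offspring from a biparental sub-population has at most two non-zero probabilities, even though the whole population has many founders.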

For example, consider an offspring produced from 4 inbred founders. Its haploprob row for 8 markers in linkage group LG1 looks like:

RABBIT, haploprob
linkagegroup, offspring, nmarker, nhaplotype, markerindex, haplotypeindex, haploprob
LG1, offspring1, 8, 4, 3|6|8|1|2|3|4|6|7|1|4|5|6|7|8|2|3|5|7|8, 1|1|1|2|2|2|2|2|2|3|3|3|3|3|3|4|4|4|4|4, 0.01|0.08|0.06|0.85|0.9|0.9|0.82|0.02|0.04|0.15|0.18|0.88|0.9|0.9|0.85|0.1|0.09|0.12|0.06|0.09
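...

The markerindex, haplotypeindex, and haploprob cells list the row indices, column indices, and values of the non-zero entries of the 8×4 haplotype probability matrix in column-major order, which is what findnz from the SparseArrays standard library returns: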
using SparseArrays
# Dense 8×4 matrix of haplotype probabilities: rows are markers, columns are haplotypes
A = [0.0 0.85 0.15 0.0
     0.0 0.9 0.0 0.1
     0.01 0.9 0.0 0.09
     0.0  0.82 0.18 0.0
     0.0  0.0  0.88 0.12
     0.08  0.02 0.9 0.0
     0.0   0.04 0.9  0.06
     0.06  0.0 0.85 0.09
     ]
B = sparse(A)
# Row indices, column indices, and values of the non-zero entries, in column-major order
Is, Js, Vs = findnz(B)
m, n = size(B)
println(string("nmarker, nhaplotype = ", (m, n), "\n",
  "markerindex=", join(Is, "|"), "\n",
  "haplotypeindex=", join(Js, "|"), "\n",
  "haploprob=", join(Vs, "|")))
nmarker, nhaplotype = (8, 4)
markerindex=3|6|8|1|2|3|4|6|7|1|4|5|6|7|8|2|3|5|7|8
haplotypeindex=1|1|1|2|2|2|2|2|2|3|3|3|3|3|3|4|4|4|4|4
haploprob=0.01|0.08|0.06|0.85|0.9|0.9|0.82|0.02|0.04|0.15|0.18|0.88|0.9|0.9|0.85|0.1|0.09|0.12|0.06|0.09
# Rebuild the sparse matrix from the stored columns
sparse(Is, Js, Vs, m, n)
8×4 SparseArrays.SparseMatrixCSC{Float64, Int64} with 20 stored entries:
  ⋅    0.85  0.15   ⋅ 
  ⋅    0.9    ⋅    0.1
 0.01  0.9    ⋅    0.09
  ⋅    0.82  0.18   ⋅ 
  ⋅     ⋅    0.88  0.12
 0.08  0.02  0.9    ⋅ 
  ⋅    0.04  0.9   0.06
 0.06   ⋅    0.85  0.09
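Going the other way, a row of table haploprob (or genoprob, diploprob) can be decoded back into a matrix by splitting its "|"-joined cells. The function name `decode_condprob` is hypothetical; the cell strings are copied from the example row above. In that example, each marker's probabilities sum to 1.

```julia
using SparseArrays

# Sketch: rebuild the nmarker × ncolumn matrix from the cells of one row.
function decode_condprob(nrow::Int, ncol::Int, rowstr, colstr, valstr)
    rows = parse.(Int, split(rowstr, "|"))       # markerindex
    cols = parse.(Int, split(colstr, "|"))       # haplotypeindex (or genotype/diplotype index)
    vals = parse.(Float64, split(valstr, "|"))   # non-zero probabilities
    return sparse(rows, cols, vals, nrow, ncol)
end

P = decode_condprob(8, 4,
    "3|6|8|1|2|3|4|6|7|1|4|5|6|7|8|2|3|5|7|8",
    "1|1|1|2|2|2|2|2|2|3|3|3|3|3|3|4|4|4|4|4",
    "0.01|0.08|0.06|0.85|0.9|0.9|0.82|0.02|0.04|0.15|0.18|0.88|0.9|0.9|0.85|0.1|0.09|0.12|0.06|0.09")

# Per-marker sums, as a dense vector
rowsums = vec(Matrix(sum(P, dims = 2)))
```

Because stored values are rounded to 4 digits, sums in real files may deviate slightly from 1, so compare them with a tolerance.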

ancestryfile from Viterbi

In the ancestryfile resulting from the Viterbi HMM algorithm (hmmalg="viterbi"), the main results are saved in table viterbipath, and tables haploprob, genoprob, and diploprob are nothing.

Table viterbipath
  • Table viterbipath has two columns: individual and viterbipath.
  • Each row contains the Viterbi paths for all linkage groups of an offspring; the paths of different linkage groups are joined by "|".
  • The Viterbi path for a linkage group in an offspring has the form m(1)-s(1)-m(2)-s(2)-...-m(k)-s(k)-m(k+1), where m(1), m(2), ..., m(k+1) denote marker indices and s(1), s(2), ..., s(k) denote indices of HMM hidden states. It always holds that m(1) = 1 and m(k+1) = 1 + the number of markers in the linkage group. The i-th segment has state s(i) and spans markers m(i) to m(i+1)-1.
  • If model = "depmodel", s(i) refers to haplotypeindex; if model = "jointmodel" or "indepmodel", it refers to diplotypeindex.

For example, consider an offspring produced from 4 inbred founders that has two linkage groups:

RABBIT, viterbipath
individual, viterbipath
offspring1, 1-12-96-11-101-16-122|1-16-143
...

The Viterbi path for the 1st linkage group is 1-12-96-11-101-16-122: segment 1 has state 12 for marker indices 1 to 95, segment 2 has state 11 for marker indices 96 to 100, and segment 3 has state 16 for marker indices 101 to 121.

The Viterbi path for the 2nd linkage group is 1-16-143: a single segment has state 16 for all 142 markers.
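A viterbipath cell can be decoded into segments with a few lines of Julia. This sketch treats each path as alternating marker and state indices that end with a marker index one past the last marker, as in the example above; `decode_viterbipath` and `expand_states` are hypothetical helper names, not RABBIT functions.

```julia
# Sketch: decode one viterbipath cell into (first marker, last marker, state)
# segments, one vector of segments per linkage group.
function decode_viterbipath(cell::AbstractString)
    groups = Vector{Vector{NTuple{3,Int}}}()
    for path in split(cell, "|")                  # one path per linkage group
        x = parse.(Int, split(path, "-"))
        isodd(length(x)) || error("malformed path: $path")
        segs = [(x[i], x[i+2] - 1, x[i+1]) for i in 1:2:length(x)-2]
        push!(groups, segs)
    end
    return groups
end

# Expand a group's segments into one state per marker
expand_states(segs) = reduce(vcat, [fill(s, b - a + 1) for (a, b, s) in segs])

g = decode_viterbipath("1-12-96-11-101-16-122|1-16-143")
g[1]  # [(1, 95, 12), (96, 100, 11), (101, 121, 16)]
```

Whether a state refers to haplotypeindex or diplotypeindex still depends on the model used, as noted above.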