Understand output

Overview

Naming output files

Each output filename consists of up to four parts: outstem, function name, individual file ID, and file extension. The first three parts are joined by underscores ("_"). The outstem is specified via the keyword argument outstem.
logfile: outstem*"_"*funcname*".log"
genofile: outstem*"_"*funcname*"_geno.vcf.gz"
pedfile: outstem*"_"*funcname*"_ped.csv"
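
As a minimal sketch of the naming rule above, the following builds the three filenames by string concatenation. The values of outstem and funcname are made-up examples; in particular, the lowercase form of the function name is an assumption based on the filenames shown elsewhere on this page.

```julia
# Hypothetical example values; replace them with your own outstem and function name.
outstem = "example"
funcname = "magicimpute"   # assumed lowercase form of the function name

# Parts are joined by underscores, followed by the file ID and extension.
logfile  = outstem * "_" * funcname * ".log"
genofile = outstem * "_" * funcname * "_geno.vcf.gz"
pedfile  = outstem * "_" * funcname * "_ped.csv"

println(logfile)
println(genofile)
println(pedfile)
```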

The RABBIT output files can be categorized into four groups:

logfile. Information includes input argument values, output filenames, and computational time.
genofile and pedfile. See the above flowchart and the section Prepare Inputs.
PNG plot files. See the section Pipeline.
CSV text files (except pedfile).

The main output files of MagicSimulate, MagicFilter, MagicCall, and MagicImpute are genofile and/or pedfile, which have the same formats as the input files. Most CSV output files contain a single table, with each column explained in the comment lines at the beginning of the file (starting with "##").

In the following, we will mainly describe the three CSV text files that contain multiple tables: ldfile and linkagefile resulting from MagicMap and ancestryfile resulting from MagicReconstruct. In each of these files, each table starts with a row including only two cells: "RABBIT" and table name, followed by a header row and then content rows.
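
The table layout described above can be split with a few lines of Julia. This is only a sketch under stated assumptions: the text has already been decompressed, no cell contains an embedded comma, and any line starting with "##" is skipped as a precaution. The function name split_rabbit_tables and the sample text are hypothetical and not part of RABBIT.

```julia
# Split the text of a multi-table file into a Dict: table name => rows (header row first).
function split_rabbit_tables(text::AbstractString)
    tables = Dict{String,Vector{Vector{String}}}()
    current = ""
    for line in split(text, '\n')
        isempty(strip(line)) && continue          # skip blank lines
        startswith(line, "##") && continue        # skip comment lines, if any
        cells = String.(strip.(split(line, ",")))
        if length(cells) == 2 && cells[1] == "RABBIT"
            current = cells[2]                    # marker row: a new table starts here
            tables[current] = Vector{Vector{String}}()
        elseif !isempty(current)
            push!(tables[current], cells)         # header row, then content rows
        end
    end
    return tables
end

# Made-up sample text with two small tables.
text = """
RABBIT,haplotype
haplotypeindex,haploindex,haplotype
1,1,P1
2,2,P2
RABBIT,viterbipath
individual,viterbipath
offspring1,1-2-143
"""
tables = split_rabbit_tables(text)
println(sort(collect(keys(tables))))
println(tables["haplotype"][1])
```

For real files, a CSV parser that handles quoting would be a safer choice than splitting on commas.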

MagicMap

ldfile

MagicMap performs pairwise linkage disequilibrium (LD) analyses via MagicLD.magicld. If marker binning is performed (isbinning = true), LD analyses are performed only for the representative marker of each bin. The output ldfile (outstem*"_magicmap_magicld.csv.gz") contains three tables:

offspringinfo: same as the offspringinfo of pedfile
markerinfo: table with each row denoting a (representative) marker. It consists of 5 columns:
- markerno: row index starting from 1
- marker_represent: representative marker ID
- physchrom_represent: physical map chromosome ID for the representative marker
- physposbp_represent: physical map position (in base pair) for the representative marker
- ld_bin: list of marker IDs in the bin containing the representative marker. "NA" if isbinning = false
pairwiseld: table with each row denoting LD analysis for two representative markers. It consists of 4 columns:
- marker1: index of 1st representative marker, corresponding to markerno in table markerinfo
- marker2: index of 2nd representative marker, corresponding to markerno in table markerinfo
- ld_r2: squared allelic correlation between the two representative markers
- ld_lod: LOD score of the LD analysis for the two representative markers

The results are saved only if ld_r2 >= minldsave and ld_lod >= minlodsave. By default, the values of the keyword arguments minldsave and minlodsave are nothing, and they are reset internally by MagicLD.magicld according to the number of (representative) markers; see the logfile for the reset values.

linkagefile

MagicMap performs pairwise linkage analyses via MagicLinkage.magiclinkage for the pairs of (representative) markers in the ldfile. The output linkagefile (outstem*"_magicmap_magiclinkage.csv.gz") contains three tables:

offspringinfo: same as the offspringinfo of pedfile
markerinfo: table with each row denoting a (representative) marker. It consists of 5 columns:
- markerno: row index starting from 1
- marker: marker ID
- physchrom: physical map chromosome ID for the marker
- physposbp: physical map position (in base pair) for the marker
- nmissing: number of missing genotypes at the marker
pairwiselinkage: table with each row denoting linkage analysis for two markers. It consists of 4 columns:
- marker1: index of 1st marker, corresponding to markerno in table markerinfo
- marker2: index of 2nd marker, corresponding to markerno in table markerinfo
- linkage_rf: recombination fraction between the two markers (scaled from 0 to 1)
- linkage_lod: LOD score of the linkage analysis for the two markers

The results are saved only if linkage_rf <= maxrfsave and linkage_lod >= minlodsave. By default, the values of the keyword arguments maxrfsave and minlodsave are nothing, and they are reset internally by MagicLinkage.magiclinkage according to the number of (representative) markers; see the logfile for the reset values.

MagicReconstruct

ancestryfile

The output ancestryfile (outstem*"_magicreconstruct_ancestry.csv.gz") is a CSV file containing 13 tables:

designinfo: same as the designinfo of pedfile
founderinfo: founder information.
- individual: founder ID
- gender: "notapplicable"
offspringinfo: same as the offspringinfo of pedfile
foundergeno: genotypic data for founders. Columns 1-13 are the same as those in refinedmapfile, and each of the remaining columns contains the genotypic data for one founder.
haplotype: ancestral haplotype states
- haplotypeindex: row index starting from 1
- haploindex: haplotype index, same as haplotypeindex
- haplotype: founder genomic labels. If a founder is inbred, its label is the founderID. If a founder is outbred, it has two labels: founderID_1 and founderID_2 for each of the two gametes.
genotype: ancestral genotype states
- genotypeindex: row index starting from 1
- genoindex: un-ordered combinations of haploindex in table haplotype
- genotype: un-ordered combinations of haplotype in table haplotype
diplotype: ancestral diplotype states
- diplotypeindex: row index starting from 1
- diploindex: ordered combinations of haploindex in table haplotype
- diplotype: ordered combinations of haplotype in table haplotype
inbredcoef: inbreeding coefficient
- linkagegroup: linkage group ID
- offspring: offspring ID
- inbredcoef: inbreeding coefficients at all markers in the linkage group for the offspring
loglike: log likelihood for each offspring in each chromosome
- linkagegroup: linkage group ID
- 1st offspringID: marginal log likelihood for markers in the chromosome for the offspring
- ...
- last offspringID

There are 4 more tables: haploprob, genoprob, diploprob, and viterbipath, which will be described in the following.
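
The sizes of tables haplotype, genotype, and diplotype follow from their definitions. The sketch below counts the states, assuming that both the un-ordered and the ordered combinations include pairing a haplotype with itself. The function and argument names are hypothetical and not part of RABBIT.

```julia
# Count ancestral states for a given number of inbred and outbred founders.
function nstates(ninbred::Integer, noutbred::Integer)
    nhaplo = ninbred + 2 * noutbred        # one label per inbred founder, two per outbred founder
    ngeno = nhaplo * (nhaplo + 1) ÷ 2      # un-ordered pairs of haplotypes (assumed to include identical pairs)
    ndiplo = nhaplo^2                      # ordered pairs of haplotypes
    return (nhaplotype = nhaplo, ngenotype = ngeno, ndiplotype = ndiplo)
end

println(nstates(4, 0))
```

Under these assumptions, 4 inbred founders give 4 haplotypes, 10 genotypes, and 16 diplotypes.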

ancestryfile from forward-backward

In the ancestryfile resulting from the forward-backward HMM algorithm (hmmalg="forwardbackward"), the main results are saved in tables haploprob, genoprob, and diploprob, and table viterbipath is nothing.

haploprob: haplotype probability
- linkagegroup: linkage group ID
- offspring: offspring ID
- nmarker: number of rows (markers)
- nhaplotype: number of columns (haplotypes)
- markerindex: row index, index for marker in table foundergeno
- haplotypeindex: column index, see table haplotype.
- haploprob: non-zero haplotype probability, rounded to 4 decimal places.
genoprob: genotype probability. It is nothing if model = "depmodel"
- linkagegroup: linkage group ID
- offspring: offspring ID
- nmarker: number of rows (markers)
- ngenotype: number of columns (genotypes)
- markerindex: row index, index for marker in table foundergeno
- genotypeindex: column index, see table genotype.
- genoprob: non-zero genotype probability, rounded to 4 decimal places.
diploprob: diplotype probability. It is nothing if model = "depmodel"
- linkagegroup: linkage group ID
- offspring: offspring ID
- nmarker: number of rows (markers)
- ndiplotype: number of columns (diplotypes)
- markerindex: row index, index for marker in table foundergeno
- diplotypeindex: column index, see table diplotype.
- diploprob: non-zero diplotype probability, rounded to 4 decimal places.

condprob

Each row denotes a sparse matrix of posterior probabilities for the offspring in the linkage group. Note that an offspring from a biparental sub-population has at most two non-zero probabilities, despite the many founders of the whole population.

For example, consider an offspring produced from 4 inbred founders. The haploprob for 8 markers in linkage group LG1 looks like:

RABBIT, haploprob
linkagegroup, offspring, nmarker, nhaplotype, markerindex, haplotypeindex, haploprob
LG1, offspring1, 8, 4, 3|6|8|1|2|3|4|6|7|1|4|5|6|7|8|2|3|5|7|8, 1|1|1|2|2|2|2|2|2|3|3|3|3|3|3|4|4|4|4|4, 0.01|0.08|0.06|0.85|0.9|0.9|0.82|0.02|0.04|0.15|0.18|0.88|0.9|0.9|0.85|0.1|0.09|0.12|0.06|0.09
...

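using SparseArrays

# Dense 8×4 matrix of haplotype probabilities: rows are markers, columns are haplotypes
A = [0.0  0.85 0.15 0.0
     0.0  0.9  0.0  0.1
     0.01 0.9  0.0  0.09
     0.0  0.82 0.18 0.0
     0.0  0.0  0.88 0.12
     0.08 0.02 0.9  0.0
     0.0  0.04 0.9  0.06
     0.06 0.0  0.85 0.09
     ]
B = sparse(A)
# Row indices, column indices, and values of the non-zero entries (column by column)
Is, Js, Vs = findnz(B)
m, n = size(B)
# Print the fields in the "|"-joined form used in table haploprob
println(string("nmarker, nhaplotype = ", (m, n), "\n",
  "markerindex=", join(Is, "|"), "\n",
  "haplotypeindex=", join(Js, "|"), "\n",
  "haploprob=", join(Vs, "|")))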

nmarker, nhaplotype = (8, 4)
markerindex=3|6|8|1|2|3|4|6|7|1|4|5|6|7|8|2|3|5|7|8
haplotypeindex=1|1|1|2|2|2|2|2|2|3|3|3|3|3|3|4|4|4|4|4
haploprob=0.01|0.08|0.06|0.85|0.9|0.9|0.82|0.02|0.04|0.15|0.18|0.88|0.9|0.9|0.85|0.1|0.09|0.12|0.06|0.09

# Rebuild the sparse matrix from the saved fields
sparse(Is, Js, Vs, m, n)

8×4 SparseArrays.SparseMatrixCSC{Float64, Int64} with 20 stored entries:
  ⋅    0.85  0.15   ⋅ 
  ⋅    0.9    ⋅    0.1
 0.01  0.9    ⋅    0.09
  ⋅    0.82  0.18   ⋅ 
  ⋅     ⋅    0.88  0.12
 0.08  0.02  0.9    ⋅ 
  ⋅    0.04  0.9   0.06
 0.06   ⋅    0.85  0.09
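
When reading an ancestryfile, the fields arrive as text rather than as Julia vectors. The following sketch parses the "|"-joined fields of the example row back into a matrix and then picks the most probable haplotype at each marker. The helper name decode_condprob is hypothetical and not part of RABBIT.

```julia
using SparseArrays

# Rebuild a probability matrix from the "|"-joined text fields of one table row.
function decode_condprob(nrow::Integer, ncol::Integer, rowstr, colstr, valstr)
    Is = parse.(Int, split(rowstr, "|"))       # markerindex
    Js = parse.(Int, split(colstr, "|"))       # haplotypeindex
    Vs = parse.(Float64, split(valstr, "|"))   # haploprob
    return sparse(Is, Js, Vs, nrow, ncol)
end

# Fields copied from the example row for offspring1 in LG1.
P = decode_condprob(8, 4,
    "3|6|8|1|2|3|4|6|7|1|4|5|6|7|8|2|3|5|7|8",
    "1|1|1|2|2|2|2|2|2|3|3|3|3|3|3|4|4|4|4|4",
    "0.01|0.08|0.06|0.85|0.9|0.9|0.82|0.02|0.04|0.15|0.18|0.88|0.9|0.9|0.85|0.1|0.09|0.12|0.06|0.09")

D = Matrix(P)                                   # dense copy for simple row-wise queries
best = [argmax(D[i, :]) for i in axes(D, 1)]    # most probable haplotypeindex per marker
println(best)
```

For the example row, the most probable haplotype is 2 at markers 1 to 4 and 3 at markers 5 to 8.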

ancestryfile from Viterbi

In the ancestryfile resulting from the Viterbi HMM algorithm (hmmalg="viterbi"), the main results are saved in table viterbipath, and tables haploprob, genoprob, and diploprob are nothing.

Table viterbipath

Table viterbipath has two columns: individual and viterbipath.
Each row denotes the Viterbi paths for all linkage groups in an offspring; paths for different linkage groups are joined by "|".
The Viterbi path for a linkage group in an offspring has the form m(1)-s(1)-m(2)-s(2)-...-m(k)-s(k)-m(k+1), where m(1), m(2), ..., m(k+1) denote marker indices, and s(1), s(2), ..., s(k) denote the indices of HMM hidden states. It always holds that m(1) = 1 and m(k+1) = 1 + number of markers in the linkage group. The i-th segment has state s(i), starts at marker m(i), and ends at marker m(i+1) - 1.
If model = "depmodel", s(i) refers to haplotypeindex; for model = "jointmodel" or "indepmodel", it refers to diplotypeindex.

For example, consider an offspring produced from 4 inbred parents that has two linkage groups.

RABBIT, viterbipath
individual, viterbipath
offspring1, 1-12-96-11-101-16-122|1-16-143
...

The Viterbi path for the 1st linkage group is 1-12-96-11-101-16-122, meaning that segment 1 has state 12 from marker indices 1 to 95, segment 2 has state 11 from marker indices 96 to 100, and segment 3 has state 16 from marker indices 101 to 121.

The Viterbi path for the 2nd linkage group is 1-16-143, meaning that a single segment has state 16 for all 142 markers.
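
The decoding of the example above can be sketched in Julia as follows. It assumes, as in the example, that each path alternates marker indices and state indices and ends with a marker index. The function name viterbi_segments and the field names state, from, and to are hypothetical and not part of RABBIT.

```julia
# Decode the Viterbi path of one linkage group into (state, from, to) segments.
function viterbi_segments(path::AbstractString)
    v = parse.(Int, split(path, "-"))
    markers = v[1:2:end]     # odd positions: segment start markers, plus the final end marker
    states = v[2:2:end]      # even positions: hidden-state indices
    return [(state = states[i], from = markers[i], to = markers[i+1] - 1)
            for i in eachindex(states)]
end

row = "1-12-96-11-101-16-122|1-16-143"    # viterbipath cell of offspring1
for (lg, path) in enumerate(split(row, "|"))
    for seg in viterbi_segments(path)
        println("linkage group ", lg, ": state ", seg.state,
                " for markers ", seg.from, " to ", seg.to)
    end
end
```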