Understand output
Overview
- Each output filename consists of the outstem, the function name, an individual file ID (if any), and the file extension; the first three parts are joined by underscores '_'. The first part, outstem, is specified via the keyword argument outstem. For example:
  - logfile: outstem*"_"*funcname*".log"
  - genofile: outstem*"_"*funcname*"_geno.vcf.gz"
  - pedfile: outstem*"_"*funcname*"_ped.csv"
The RABBIT output files can be categorized into four groups:
- logfile. Information includes input argument values, output filenames, and computational time.
- genofile and pedfile. See the above flowchart and the section Prepare Inputs.
- PNG plot files. See the section Pipeline.
- CSV text files (except pedfile).
The main output files of MagicSimulate, MagicFilter, MagicCall, and MagicImpute are genofile and/or pedfile, which have the same formats as the corresponding input files. Most CSV output files contain a single table whose columns are explained in the comment lines (starting with "##") at the beginning of the file.
In the following, we mainly describe the three CSV text files that contain multiple tables: ldfile and linkagefile from MagicMap, and ancestryfile from MagicReconstruct. In each of these files, every table starts with a row containing only two cells, "RABBIT" and the table name, followed by a header row and then the content rows.
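The table layout described above (a two-cell "RABBIT, tablename" row, then a header row, then content rows) can be split into separate tables with a few lines of code. This is a sketch under stated assumptions: the helper split_rabbit_tables is hypothetical, cells are split naively on ",", and the input is an already-decompressed IO. For the .csv.gz files you would first wrap the file in a decompression stream (for example CodecZlib.jl's GzipDecompressorStream; an assumption about your environment, not shown here). The sample rows are copied from the haploprob and viterbipath examples in this section.

```julia
# Hypothetical helper: split a multi-table RABBIT CSV into a Dict mapping
# table name => rows (row 1 is the header, the rest are content rows).
function split_rabbit_tables(io::IO)
    tables = Dict{String, Vector{Vector{String}}}()
    current = nothing                           # name of the table being filled
    for line in eachline(io)
        startswith(line, "##") && continue      # skip comment lines, if any
        isempty(strip(line)) && continue        # skip blank lines
        cells = strip.(split(line, ","))        # naive split: no quoted commas
        if length(cells) == 2 && cells[1] == "RABBIT"
            current = String(cells[2])          # start of a new table
            tables[current] = Vector{Vector{String}}()
        elseif current !== nothing
            push!(tables[current], String.(cells))
        end
    end
    return tables
end

txt = """
RABBIT, haploprob
linkagegroup, offspring, nmarker, nhaplotype, markerindex, haplotypeindex, haploprob
LG1, offspring1, 8, 4, 3|6|8|1|2|3|4|6|7|1|4|5|6|7|8|2|3|5|7|8, 1|1|1|2|2|2|2|2|2|3|3|3|3|3|3|4|4|4|4|4, 0.01|0.08|0.06|0.85|0.9|0.9|0.82|0.02|0.04|0.15|0.18|0.88|0.9|0.9|0.85|0.1|0.09|0.12|0.06|0.09
RABBIT, viterbipath
individual, viterbipath
offspring1, 1-12-96-11-101-16-122|1-16-143
"""
tables = split_rabbit_tables(IOBuffer(txt))
tables["viterbipath"][1]   # header row: ["individual", "viterbipath"]
```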
MagicMap
ldfile
MagicMap performs pairwise linkage disequilibrium (LD) analyses via MagicLD.magicld. If marker binning is performed (isbinning = true), LD analyses are performed only for the representative marker of each bin. The output ldfile (outstem*"_magicmap_magicld.csv.gz") contains three tables:
- offspringinfo: same as the offspringinfo of pedfile.
- markerinfo: table with each row denoting a (representative) marker. It consists of 5 columns:
  - markerno: row index starting from 1
  - marker_represent: representative marker ID
  - physchrom_represent: physical map chromosome ID of the representative marker
  - physposbp_represent: physical map position (in base pairs) of the representative marker
  - ld_bin: list of marker IDs in the bin containing the representative marker; "NA" if isbinning = false
- pairwiseld: table with each row denoting the LD analysis for two representative markers. It consists of 4 columns:
  - marker1: index of the 1st representative marker, corresponding to markerno in table markerinfo
  - marker2: index of the 2nd representative marker, corresponding to markerno in table markerinfo
  - ld_r2: squared allelic correlation between the two representative markers
  - ld_lod: LOD score of the LD analysis for the two representative markers
A pair is saved only if ld_r2 >= minldsave and ld_lod >= minlodsave. By default, the keyword arguments minldsave and minlodsave are nothing, in which case MagicLD.magicld resets them internally according to the number of (representative) markers; see the logfile for the values actually used.
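Since marker1 and marker2 are markerno indices into table markerinfo, resolving them to marker IDs is a simple lookup. Because pairs below minldsave or minlodsave were never written, thresholds can only be tightened after the fact, not relaxed. The sketch below assumes the two tables have already been parsed into plain Julia vectors; the helper strong_ld_pairs and all values are hypothetical.

```julia
# Hypothetical in-memory copies of two ldfile tables (values are made up).
# markerinfo: marker_represent, indexed by markerno (1, 2, 3, ...)
marker_represent = ["snpA", "snpB", "snpC"]
# pairwiseld rows: (marker1, marker2, ld_r2, ld_lod)
pairwiseld = [(1, 2, 0.92, 35.1), (1, 3, 0.05, 0.4), (2, 3, 0.40, 8.2)]

# Keep pairs passing (possibly stricter) thresholds and replace the
# markerno indices by the representative marker IDs.
function strong_ld_pairs(pairs, ids; minld = 0.0, minlod = 0.0)
    keep = filter(p -> p[3] >= minld && p[4] >= minlod, pairs)
    return [(ids[p[1]], ids[p[2]], p[3], p[4]) for p in keep]
end

res = strong_ld_pairs(pairwiseld, marker_represent; minld = 0.3, minlod = 3.0)
```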
linkagefile
MagicMap performs pairwise linkage analyses via MagicLinkage.magiclinkage for the pairs of (representative) markers in the ldfile. The output linkagefile (outstem*"_magicmap_magiclinkage.csv.gz") contains three tables:
- offspringinfo: same as the offspringinfo of pedfile.
- markerinfo: table with each row denoting a (representative) marker. It consists of 5 columns:
  - markerno: row index starting from 1
  - marker: marker ID
  - physchrom: physical map chromosome ID of the marker
  - physposbp: physical map position (in base pairs) of the marker
  - nmissing: number of missing genotypes at the marker
- pairwiselinkage: table with each row denoting the linkage analysis for two markers. It consists of 4 columns:
  - marker1: index of the 1st marker, corresponding to markerno in table markerinfo
  - marker2: index of the 2nd marker, corresponding to markerno in table markerinfo
  - linkage_rf: recombination fraction between the two markers (scaled from 0 to 1)
  - linkage_lod: LOD score of the linkage analysis for the two markers
A pair is saved only if linkage_rf <= maxrfsave and linkage_lod >= minlodsave. By default, the keyword arguments maxrfsave and minlodsave are nothing, in which case MagicLinkage.magiclinkage resets them internally according to the number of (representative) markers; see the logfile for the values actually used.
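Downstream steps such as marker ordering typically work with a marker-by-marker matrix rather than a list of pairs. The following hedged sketch assembles the saved (marker1, marker2, linkage_rf) triples into a symmetric matrix indexed by markerno; the helper rf_matrix and the sample rows are hypothetical, and pairs that were not saved are marked NaN rather than guessed.

```julia
# Hypothetical sketch: assemble saved pairwise recombination fractions into a
# symmetric nmarker x nmarker matrix. Pairs absent from pairwiselinkage
# (not saved) stay NaN; the diagonal is set to 0.
function rf_matrix(nmarker::Integer, pairs)
    R = fill(NaN, nmarker, nmarker)
    for i in 1:nmarker
        R[i, i] = 0.0
    end
    for (m1, m2, rf) in pairs          # (marker1, marker2, linkage_rf)
        R[m1, m2] = rf
        R[m2, m1] = rf
    end
    return R
end

# Made-up pairwiselinkage rows for 3 markers
R = rf_matrix(3, [(1, 2, 0.05), (2, 3, 0.12)])
```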
MagicReconstruct
ancestryfile
The output ancestryfile (outstem*"_magicreconstruct_ancestry.csv.gz") is a CSV file containing 13 tables:
- designinfo: same as the designinfo of pedfile.
- founderinfo: founder information, with columns:
  - individual: founder ID
  - gender: "notapplicable"
- offspringinfo: same as the offspringinfo of pedfile.
- foundergeno: genotypic data for founders. Columns 1-13 are the same as those in refinedmapfile, and each of the remaining columns contains the genotypic data of one founder.
- haplotype: ancestral haplotype states, with columns:
  - haplotypeindex: row index starting from 1
  - haploindex: haplotype index, same as haplotypeindex
  - haplotype: founder genomic labels. If a founder is inbred, its label is the founder ID. If a founder is outbred, it has two labels, founderID_1 and founderID_2, one for each of its two gametes.
- genotype: ancestral genotype states, with columns:
  - genotypeindex: row index starting from 1
  - genoindex: un-ordered combinations of haploindex in table haplotype
  - genotype: un-ordered combinations of haplotype in table haplotype
- diplotype: ancestral diplotype states, with columns:
  - diplotypeindex: row index starting from 1
  - diploindex: ordered combinations of haploindex in table haplotype
  - diplotype: ordered combinations of haplotype in table haplotype
- inbredcoef: inbreeding coefficients, with columns:
  - linkagegroup: linkage group ID
  - offspring: offspring ID
  - inbredcoef: inbreeding coefficients at all markers in the linkage group for the offspring
- loglike: log likelihood for each offspring in each chromosome, with columns:
  - linkagegroup: linkage group ID
  - 1st offspring ID: marginal log likelihood of the markers in the chromosome for that offspring
  - ...
  - last offspring ID
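The sizes of the haplotype, genotype, and diplotype tables follow from the description above: one haplotype label per inbred founder and two per outbred founder, then un-ordered and ordered pairs of haplotypes. The sketch below assumes that both kinds of combinations include homozygous pairs (i, i); the enumeration order it uses is illustrative only and is not claimed to match RABBIT's row order. All function names are hypothetical.

```julia
# Number of haplotype labels: one per inbred founder,
# two (founderID_1, founderID_2) per outbred founder.
nhaplotype(ninbred, noutbred) = ninbred + 2 * noutbred

# Genotypes: un-ordered pairs (i <= j); diplotypes: ordered pairs.
# ASSUMPTION: homozygous pairs (i, i) are included; the order of the
# pairs here is illustrative, not necessarily RABBIT's row order.
unordered_pairs(n) = [(i, j) for i in 1:n for j in i:n]
ordered_pairs(n) = [(i, j) for i in 1:n for j in 1:n]

n = nhaplotype(4, 0)                                   # 4 inbred founders
counts = (length(unordered_pairs(n)), length(ordered_pairs(n)))   # (10, 16)
```

In general, n haplotypes give n(n+1)/2 genotypes and n^2 diplotypes under this assumption.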
There are 4 more tables, haploprob, genoprob, diploprob, and viterbipath, which are described in the following.
ancestryfile from forward-backward
In the ancestryfile resulting from the forward-backward HMM algorithm (hmmalg="forwardbackward"), the main results are saved in tables haploprob, genoprob, and diploprob, and table viterbipath is nothing.
- haploprob: haplotype probability, with columns:
  - linkagegroup: linkage group ID
  - offspring: offspring ID
  - nmarker: number of rows (markers)
  - nhaplotype: number of columns (haplotypes)
  - markerindex: row index, i.e., the index of the marker in table foundergeno
  - haplotypeindex: column index; see table haplotype
  - haploprob: non-zero haplotype probabilities, rounded to 4 decimal places
- genoprob: genotype probability. It is nothing if model = "depmodel".
  - linkagegroup: linkage group ID
  - offspring: offspring ID
  - nmarker: number of rows (markers)
  - ngenotype: number of columns (genotypes)
  - markerindex: row index, i.e., the index of the marker in table foundergeno
  - genotypeindex: column index; see table genotype
  - genoprob: non-zero genotype probabilities, rounded to 4 decimal places
- diploprob: diplotype probability. It is nothing if model = "depmodel".
  - linkagegroup: linkage group ID
  - offspring: offspring ID
  - nmarker: number of rows (markers)
  - ndiplotype: number of columns (diplotypes)
  - markerindex: row index, i.e., the index of the marker in table foundergeno
  - diplotypeindex: column index; see table diplotype
  - diploprob: non-zero diplotype probabilities, rounded to 4 decimal places
- Each row of these tables encodes, for one offspring in one linkage group, a sparse nmarker × nstate matrix of posterior probabilities in coordinate form: the k-th "|"-separated entries of markerindex, of the state index column (haplotypeindex, genotypeindex, or diplotypeindex), and of the probability column give the row, column, and value of the k-th non-zero element. Note that an offspring from a biparental sub-population has at most two non-zero probabilities, despite the many founders of the whole population.
For example, consider an offspring produced from 4 inbred founders. Its haploprob row for 8 markers in the first linkage group (LG1) looks like:
RABBIT, haploprob
linkagegroup, offspring, nmarker, nhaplotype, markerindex, haplotypeindex, haploprob
LG1, offspring1, 8, 4, 3|6|8|1|2|3|4|6|7|1|4|5|6|7|8|2|3|5|7|8, 1|1|1|2|2|2|2|2|2|3|3|3|3|3|3|4|4|4|4|4, 0.01|0.08|0.06|0.85|0.9|0.9|0.82|0.02|0.04|0.15|0.18|0.88|0.9|0.9|0.85|0.1|0.09|0.12|0.06|0.09
...
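This encoding can be reproduced in Julia from the dense 8×4 probability matrix A (rows are markers, columns are haplotypes) via findnz from SparseArrays:
using SparseArrays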
A = [0.0 0.85 0.15 0.0
0.0 0.9 0.0 0.1
0.01 0.9 0.0 0.09
0.0 0.82 0.18 0.0
0.0 0.0 0.88 0.12
0.08 0.02 0.9 0.0
0.0 0.04 0.9 0.06
0.06 0.0 0.85 0.09
]
B = sparse(A)
Is,Js, Vs = findnz(B)
m, n = size(B)
println(string("nmarker, nhaplotype = ", (m,n), "\n",
"markerindex=",join(Is,"|"),"\n",
"haplotypeindex=",join(Js,"|"), "\n",
"haploprob=",join(Vs,"|")))
nmarker, nhaplotype = (8, 4)
markerindex=3|6|8|1|2|3|4|6|7|1|4|5|6|7|8|2|3|5|7|8
haplotypeindex=1|1|1|2|2|2|2|2|2|3|3|3|3|3|3|4|4|4|4|4
haploprob=0.01|0.08|0.06|0.85|0.9|0.9|0.82|0.02|0.04|0.15|0.18|0.88|0.9|0.9|0.85|0.1|0.09|0.12|0.06|0.09
Conversely, the sparse matrix is recovered from the saved columns:
sparse(Is,Js,Vs,m,n)
8×4 SparseArrays.SparseMatrixCSC{Float64, Int64} with 20 stored entries:
⋅ 0.85 0.15 ⋅
⋅ 0.9 ⋅ 0.1
0.01 0.9 ⋅ 0.09
⋅ 0.82 0.18 ⋅
⋅ ⋅ 0.88 0.12
0.08 0.02 0.9 ⋅
⋅ 0.04 0.9 0.06
0.06 ⋅ 0.85 0.09
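In practice the three columns are read from the file as "|"-joined strings, so they must be parsed before calling sparse. The following sketch does that for the haploprob row shown above and then performs two simple checks: the probabilities at each marker should sum to approximately 1 (up to the 4-digit rounding), and argmax per row gives the most probable haplotypeindex at each marker. The helper decode_prob is hypothetical; the same call works for genoprob and diploprob rows with ngenotype or ndiplotype as the column count.

```julia
using SparseArrays

# Hypothetical helper: rebuild the sparse nmarker x nstate posterior matrix
# from one row of haploprob / genoprob / diploprob. The three "|"-joined
# strings hold the row indices, column indices, and non-zero values.
function decode_prob(nmarker::Integer, nstate::Integer,
                     markerindex::AbstractString, stateindex::AbstractString,
                     prob::AbstractString)
    Is = parse.(Int, split(markerindex, "|"))
    Js = parse.(Int, split(stateindex, "|"))
    Vs = parse.(Float64, split(prob, "|"))
    return sparse(Is, Js, Vs, nmarker, nstate)
end

# The haploprob row of offspring1 in LG1 from the example above
P = decode_prob(8, 4,
    "3|6|8|1|2|3|4|6|7|1|4|5|6|7|8|2|3|5|7|8",
    "1|1|1|2|2|2|2|2|2|3|3|3|3|3|3|4|4|4|4|4",
    "0.01|0.08|0.06|0.85|0.9|0.9|0.82|0.02|0.04|0.15|0.18|0.88|0.9|0.9|0.85|0.1|0.09|0.12|0.06|0.09")

M = Matrix(P)
rowsums = vec(sum(M; dims = 2))                 # close to 1 at every marker
best = [argmax(M[i, :]) for i in axes(M, 1)]    # most probable haplotypeindex per marker
```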
ancestryfile from Viterbi
In the ancestryfile resulting from the Viterbi HMM algorithm (hmmalg="viterbi"), the main results are saved in table viterbipath, and tables haploprob, genoprob, and diploprob are nothing.
- Table viterbipath has two columns: individual and viterbipath.
- Each row contains the Viterbi paths of all linkage groups for one offspring; paths of different linkage groups are joined by "|".
- The Viterbi path for a linkage group in an offspring has the form m(1)-s(1)-m(2)-s(2)-...-m(k)-s(k)-m(k+1), where m(1), m(2), ..., m(k+1) denote marker indices and s(1), s(2), ..., s(k) denote indices of HMM hidden states. It always holds that m(1) = 1 and m(k+1) = 1 + the number of markers in the linkage group. The i-th segment has state s(i); it starts at marker m(i) and ends at marker m(i+1) - 1.
- If model = "depmodel", s(i) refers to haplotypeindex; if model = "jointmodel" or "indepmodel", it refers to diplotypeindex.
For example, consider an offspring produced from 4 inbred parents, with two linkage groups:
RABBIT, viterbipath
individual, viterbipath
offspring1, 1-12-96-11-101-16-122|1-16-143
...
The Viterbi path for the 1st linkage group is 1-12-96-11-101-16-122, meaning that segment 1 has state 12 at markers 1 to 95, segment 2 has state 11 at markers 96 to 100, and segment 3 has state 16 at markers 101 to 121.
The Viterbi path for the 2nd linkage group is 1-16-143, i.e., a single segment with state 16 at all 142 markers.
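The segment rule above (state s(i) on markers m(i) to m(i+1) - 1) expands each path into one hidden-state index per marker. The sketch below applies it to the viterbipath cell of offspring1; the helper decode_viterbi is hypothetical. The resulting indices refer to haplotypeindex or diplotypeindex depending on model, as described above.

```julia
# Hypothetical helper: expand a viterbipath cell into one vector of
# hidden-state indices per linkage group (groups are separated by "|").
function decode_viterbi(cell::AbstractString)
    return map(split(cell, "|")) do lgpath
        v = parse.(Int, split(lgpath, "-"))       # m(1), s(1), ..., s(k), m(k+1)
        isodd(length(v)) || error("malformed Viterbi path: $lgpath")
        states = Vector{Int}(undef, v[end] - 1)   # one entry per marker
        for i in 1:2:length(v)-2
            states[v[i]:v[i+2]-1] .= v[i+1]       # markers v[i] .. v[i+2]-1 get state v[i+1]
        end
        states
    end
end

paths = decode_viterbi("1-12-96-11-101-16-122|1-16-143")
length.(paths)   # [121, 142]
```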