Statistical framework

See Zheng et al. 2014, Zheng 2015, Zheng et al. 2015, and Zheng et al. 2018 for a detailed description of the hidden Markov model (HMM) framework of RABBIT. See Zheng et al. 2018(2) and Zheng et al. 2025 for the genotype imputation algorithm, and Zheng et al. 2019 and Zheng et al. 2025(2) for the genetic map construction algorithm.

The RABBIT HMM framework consists of two basic components: the hidden Markov process and the genotype data model.
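The sketch below shows how the two components combine in a standard HMM computation, the scaled forward algorithm for the data log-likelihood. The prior origin process supplies the initial distribution and the marker-to-marker transition matrices, and the data model supplies the emission probabilities. The function name forward_loglik and its array layout are hypothetical. This illustrates the general technique and is not RABBIT's implementation.

```python
import numpy as np

def forward_loglik(init, transitions, emissions):
    """Scaled forward algorithm (illustrative sketch, not RABBIT's code).

    init:        (S,) prior over hidden origin states at the first marker
    transitions: list of (S, S) matrices; transitions[t] maps marker t to t+1
    emissions:   (T, S) array; emissions[t, s] = P(data at marker t | state s)
    Returns log P(data at all T markers).
    """
    alpha = init * emissions[0]          # joint of state and first observation
    c = alpha.sum()
    alpha = alpha / c                    # rescale to avoid underflow
    loglik = np.log(c)
    for t in range(1, emissions.shape[0]):
        # propagate through the prior origin process, then weight by the data model
        alpha = (alpha @ transitions[t - 1]) * emissions[t]
        c = alpha.sum()
        alpha = alpha / c
        loglik += np.log(c)
    return loglik
```

For example, take two states with init = [0.5, 0.5], a single transition matrix [[0.9, 0.1], [0.2, 0.8]], and emissions [[0.2, 0.4], [0.5, 0.1]]. The likelihood of the two markers is then 0.082.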

Ancestral origin process

The hidden Markov process is the prior ancestral origin process, which describes how ancestral origins change along the two homologous chromosomes of a diploid offspring.
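To make the prior process concrete, the sketch below simulates the ancestral origins at markers along one chromosome as a Markov chain. The jump rule is a hypothetical simplification chosen for illustration. Between markers d Morgans apart, the origin is redrawn uniformly among the founders with probability 1 - exp(-rho d), where rho is a hypothetical junction-density parameter. In RABBIT the prior process depends on the mating design, as described in the papers cited above. The function name simulate_origin_path is also hypothetical.

```python
import numpy as np

def simulate_origin_path(positions, nfounder, rho, rng):
    """Simulate founder-origin labels (0..nfounder-1) at markers on one chromosome.

    Hypothetical model: between adjacent markers separated by d Morgans, the
    origin is redrawn uniformly from all founders with probability
    1 - exp(-rho * d); the redraw may return the same founder.
    """
    path = np.empty(len(positions), dtype=int)
    path[0] = rng.integers(nfounder)            # uniform origin at the first marker
    for t in range(1, len(positions)):
        d = positions[t] - positions[t - 1]
        if rng.random() < 1.0 - np.exp(-rho * d):
            path[t] = rng.integers(nfounder)    # a junction may occur: redraw origin
        else:
            path[t] = path[t - 1]               # no junction: origin is inherited
    return path

rng = np.random.default_rng(1)
path = simulate_origin_path(np.linspace(0.0, 1.0, 101), 4, 3.0, rng)
```

Runs of identical labels in path correspond to ancestry blocks. A diploid offspring has two such paths, one per homolog, and the models below differ in how those paths depend on each other.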

RABBIT has a keyword argument model that specifies the dependence of the prior ancestral origin processes between the two homologous chromosomes. It must be "depmodel", "indepmodel", or "jointmodel", denoting complete dependence, complete independence, or intermediate dependence, respectively. RABBIT uses the general "jointmodel" by default. For magicimpute, "depmodel" is recommended for nearly homozygous populations because it is much faster than the default "jointmodel".
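One way to see why "depmodel" is faster is to count hidden states. The sketch below assumes ordered (phased) origin pairs for "indepmodel" and "jointmodel" and takes a user-supplied haploid transition matrix P. Under "depmodel" both homologs share one origin path. Under "indepmodel" the diploid transition is the Kronecker product of two independent haploid chains. "jointmodel" has the same ordered-pair state space, but its transition is not a product of haploid chains, so it is left unimplemented here. The function names are hypothetical.

```python
import numpy as np

def origin_state_space(model, nfounder):
    """Hidden origin states per marker for a diploid offspring (sketch).

    depmodel: both homologs carry the same origin -> nfounder states.
    indepmodel / jointmodel: ordered origin pairs -> nfounder**2 states.
    """
    if model == "depmodel":
        return nfounder
    if model in ("indepmodel", "jointmodel"):
        return nfounder ** 2
    raise ValueError(f"unknown model: {model}")

def diploid_transition(model, P):
    """Diploid transition matrix from a haploid one, P (rows sum to 1).

    Only the two limiting cases are covered; a jointmodel transition is
    population-specific and is not a function of P alone.
    """
    if model == "depmodel":
        return P                    # one shared origin path
    if model == "indepmodel":
        return np.kron(P, P)        # two independent paths over ordered pairs
    raise NotImplementedError("jointmodel is not a product of haploid chains")
```

With 8 founders there are 8 states under "depmodel" and 64 under the other two. A forward pass with dense transition matrices costs on the order of S^2 operations per marker, that is 64 versus 4096, a 64-fold difference.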

Genotype data model

The genotype data model describes the (emission) probability of the observed genotype data given the hidden ancestral origin state, and it depends on the genotype format. See Zheng et al. 2025 for a detailed description of the data model.

Discrete genotype (GT)

The data model for "GT" has a parameter for the allelic error rate, that is, the probability that an error occurs on a single allele. An error on an allele changes it to the other allele.

RABBIT introduces two likelihood parameters for "GT": foundererror and offspringerror, denoting the allelic error rates for founders and offspring, respectively.
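Under this error model, assume that the two alleles of a biallelic marker are affected independently. The probability of an observed genotype given the true genotype can then be written as a 3 x 3 matrix indexed by alternative-allele count. The sketch below builds it, with eps set to foundererror for founders or offspringerror for offspring. The function name gt_error_matrix is hypothetical, and RABBIT's own formulas may differ in detail.

```python
import numpy as np

def gt_error_matrix(eps):
    """P(observed alt-allele count | true alt-allele count) for a biallelic marker.

    Sketch assuming each of the two alleles flips to the other allele
    independently with probability eps (the allelic error rate).
    """
    q = 1.0 - eps
    return np.array([
        [q * q,     2 * eps * q,       eps * eps],  # true 0/0
        [eps * q,   q * q + eps * eps, eps * q],    # true 0/1: both alleles may flip
        [eps * eps, 2 * eps * q,       q * q],      # true 1/1
    ])
```

For example, with eps = 0.01 a true heterozygote is observed as homozygous for either allele with probability 0.0099 each.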

Allelic depth (AD)

Sequence reads are assumed to be generated in two steps: (1) true genotypes are mis-aligned according to the random allelic error model, with allelic error rates foundererror and offspringerror; and (2) conditional on the mis-aligned genotypes, sequence reads are sampled with parameters baseerror, allelicbias, allelicoverdispersion, and allelicdropout.
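The two-step read model can be sketched as follows, under assumptions. This is a minimal illustration and not RABBIT's exact formulas. Step (1) uses the allelic error model with independent flips of the two alleles. In step (2), reads follow a beta-binomial distribution, and the four read parameters are given assumed interpretations:

- allelicoverdispersion is the intra-class correlation.
- allelicbias is the probability that a read from a heterozygote carries the reference allele.
- allelicdropout is the probability that one allele of a heterozygote is not sampled.
- baseerror is a per-read base-calling error.

All function names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def betabinom_pmf(k, n, p, phi):
    """Beta-binomial pmf with mean n*p and intra-class correlation phi in [0, 1)."""
    if phi == 0 or p in (0.0, 1.0):
        return binom_pmf(k, n, p)      # no overdispersion, or degenerate mean
    s = (1 - phi) / phi                # a + b, so that phi = 1 / (a + b + 1)
    a, b = p * s, (1 - p) * s
    logp = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    return math.exp(logp)

def allelic_error_matrix(eps):
    # step (1): rows are true alt-allele counts 0, 1, 2; columns are mis-aligned counts
    q = 1 - eps
    return [[q * q, 2 * eps * q, eps * eps],
            [eps * q, q * q + eps * eps, eps * q],
            [eps * eps, 2 * eps * q, q * q]]

def read_likelihood(nref, nalt, g, baseerror, allelicbias,
                    allelicoverdispersion, allelicdropout):
    """Step (2): P(nref, nalt | mis-aligned genotype g), g = alt-allele count."""
    n = nref + nalt

    def pref(frac):
        # P(read shows the reference base) when a fraction frac of sampled
        # alleles are reference; each base is mis-read with probability baseerror
        return frac * (1 - baseerror) + (1 - frac) * baseerror

    def pmf(frac):
        return betabinom_pmf(nref, n, pref(frac), allelicoverdispersion)

    if g == 0:
        return pmf(1.0)
    if g == 2:
        return pmf(0.0)
    # heterozygote: with probability allelicdropout one allele (either, equally
    # likely) is not sampled; otherwise the reference fraction is allelicbias
    dropped = 0.5 * pmf(1.0) + 0.5 * pmf(0.0)
    return (1 - allelicdropout) * pmf(allelicbias) + allelicdropout * dropped

def ad_likelihood(nref, nalt, true_g, allelicerror, **readpars):
    """P(reads | true genotype): sum over the mis-aligned genotype."""
    M = allelic_error_matrix(allelicerror)
    return sum(M[true_g][g] * read_likelihood(nref, nalt, g, **readpars)
               for g in range(3))
```

Here allelicerror plays the role of foundererror or offspringerror. Setting allelicoverdispersion to 0 reduces the read sampling to a binomial.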

References

Zheng, Chaozhi, Martin P Boer, and Fred A Van Eeuwijk. 2014. “A General Modeling Framework for Genome Ancestral Origins in Multiparental Populations.” Genetics 198 (1): 87–101. https://doi.org/10.1534/genetics.114.163006.

———. 2015. “Reconstruction of Genome Ancestry Blocks in Multiparental Populations.” Genetics 200 (4): 1073–87. https://doi.org/10.1534/genetics.115.177873.

———. 2018. “Recursive Algorithms for Modeling Genomic Ancestral Origins in a Fixed Pedigree.” G3 Genes|Genomes|Genetics 8 (10): 3231–45. https://doi.org/10.1534/G3.118.200340.

———. 2018(2). “Accurate Genotype Imputation in Multiparental Populations from Low-Coverage Sequence.” Genetics 210 (1): 71–82. https://doi.org/10.1534/genetics.118.300885.

———. 2019. “Construction of Genetic Linkage Maps in Multiparental Populations.” Genetics 212 (4): 1031–44. https://doi.org/10.1534/genetics.119.302229.

Zheng et al. 2025. “Genotype imputation and error estimation in connected multiparental populations.” In preparation.

Zheng et al. 2025(2). “Efficient consensus map construction in connected multiparental populations.” In preparation.