I will try to explain what I know about heritability using human height as an example phenotype. Height is helpful not only because it is very tangible, but also because it is considered a “model” trait for quantitative genetics, since it is easy to measure (Visscher, 2008). Height can be modeled as the sum of the contributions of an unobserved genotype $G$ and an unobserved environment $E$: $P = G + E$ (Visscher et al., 2008). This definition ignores both genotype-environment covariation and genotype-environment interactions, which usually cannot be estimated. Because $G$ and $E$ differ between individuals, height also varies: its standard deviation is about 7 cm within human populations. As before, we can decompose the variance in height, $\sigma^2_P$, into the fraction attributable to genetic factors and the fraction attributable to the environment:

$$\sigma^2_P = \sigma^2_G + \sigma^2_E$$
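A quick simulation can make this decomposition concrete. The following is a minimal sketch in Python using only the standard library; `simulate_phenotypes` is a hypothetical helper, and the variance values (39.2 and 9.8 cm², i.e. 80% of a total of 49 cm², a 7 cm standard deviation) are illustrative assumptions, not estimates. It first draws independent $G$ and $E$ and compares $\mathrm{Var}(P)$ with $\mathrm{Var}(G) + \mathrm{Var}(E)$, then adds a genotype-environment covariance to show the extra $2\,\mathrm{Cov}(G, E)$ term that the simple model ignores.

```python
import random
import statistics


def simulate_phenotypes(n, var_g, var_e, cov_ge=0.0, seed=1):
    """Simulate P = G + E for n individuals (hypothetical helper).

    G and E are drawn from a bivariate normal distribution with variances
    var_g and var_e and covariance cov_ge (0 = no genotype-environment
    covariation).
    """
    rng = random.Random(seed)
    sd_g, sd_e = var_g ** 0.5, var_e ** 0.5
    rho = cov_ge / (sd_g * sd_e)  # correlation implied by cov_ge
    g_vals, e_vals = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        g_vals.append(sd_g * z1)
        # Mix z1 into E so that corr(G, E) = rho while var(E) stays var_e.
        e_vals.append(sd_e * (rho * z1 + (1 - rho ** 2) ** 0.5 * z2))
    p_vals = [g + e for g, e in zip(g_vals, e_vals)]
    return g_vals, e_vals, p_vals


# Illustrative components in cm^2: 80% genetic, 20% environmental, total 49.
VAR_G, VAR_E = 39.2, 9.8

g, e, p = simulate_phenotypes(20_000, VAR_G, VAR_E)
var_p_indep = statistics.pvariance(p)
sum_indep = statistics.pvariance(g) + statistics.pvariance(e)

g2, e2, p2 = simulate_phenotypes(20_000, VAR_G, VAR_E, cov_ge=5.0)
var_p_cov = statistics.pvariance(p2)
sum_cov = statistics.pvariance(g2) + statistics.pvariance(e2)

print(f"Cov(G,E) = 0: var(P) = {var_p_indep:.1f}, var(G) + var(E) = {sum_indep:.1f}")
print(f"Cov(G,E) = 5: var(P) = {var_p_cov:.1f}, var(G) + var(E) = {sum_cov:.1f}")
```

In the second run, $\mathrm{Var}(P)$ should exceed $\mathrm{Var}(G) + \mathrm{Var}(E)$ by roughly $2 \times 5 = 10$ cm², which is exactly the covariance term that the simple additive model drops.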

$\sigma^2_P$ is the total observed phenotypic variance, usually after removing the variation due to known fixed effects and covariates (age, sex, cohort…).

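Heritability is the proportion of the phenotypic variance that is explained by the genotype. The heritability of height was assessed more than a century ago, when Galton and Fisher observed a pattern of family resemblance consistent with a polygenic additive model of inheritance (Visscher, 2008). There are two definitions of heritability:

Broad-sense heritability, $H^2 = \sigma^2_G / \sigma^2_P$, is the proportion of phenotypic variance due to all genetic effects: additive, dominance and epistatic (gene-gene interaction) effects.

Narrow-sense heritability, $h^2 = \sigma^2_A / \sigma^2_P$, is the proportion of phenotypic variance due only to additive genetic effects ($\sigma^2_A$).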

Narrow-sense heritability is the estimate that applies in most situations, and hence the most widely used. This is because non-additive effects such as dominance do not contribute to parent-offspring resemblance: each parent transmits only one allele per locus to its offspring, so effects that depend on the combination of two alleles are not inherited as such. In other words, narrow-sense heritability is the proportion of phenotypic variability that can be passed from parents to offspring.
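The two definitions differ only in the numerator. As a minimal sketch, assuming the genotypic variance splits into additive, dominance and epistatic parts ($\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_I$) and that there is no genotype-environment covariance or interaction, both quantities can be computed from the variance components. The `VarianceComponents` class and the numbers below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    """Hypothetical variance components of a trait (any consistent units)."""
    additive: float     # sigma^2_A
    dominance: float    # sigma^2_D
    epistatic: float    # sigma^2_I (gene-gene interaction)
    environment: float  # sigma^2_E

    @property
    def genetic(self) -> float:
        # sigma^2_G = sigma^2_A + sigma^2_D + sigma^2_I
        return self.additive + self.dominance + self.epistatic

    @property
    def phenotypic(self) -> float:
        # sigma^2_P = sigma^2_G + sigma^2_E (no G x E covariance or interaction)
        return self.genetic + self.environment

    def broad_sense(self) -> float:
        """H^2 = sigma^2_G / sigma^2_P"""
        return self.genetic / self.phenotypic

    def narrow_sense(self) -> float:
        """h^2 = sigma^2_A / sigma^2_P"""
        return self.additive / self.phenotypic


# Made-up values in cm^2 summing to 49 (a 7 cm standard deviation).
height = VarianceComponents(additive=39.2, dominance=2.0, epistatic=0.0, environment=7.8)
print(f"H^2 = {height.broad_sense():.2f}, h^2 = {height.narrow_sense():.2f}")
```

Because the additive variance is part of the genotypic variance, $h^2 \le H^2$ always holds; the two coincide only when dominance and epistatic variance are zero.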

Estimation of heritability

Heritability is calculated from empirical data on related individuals, by comparing the expected and the observed resemblance between relatives. To estimate it, we need both the variance in additive genetic effects and the phenotypic variance. The latter must be corrected for known fixed effects such as age, sex or cohort. For example, men and women differ in height by about 15 cm on average; if this difference were not taken into account, the estimated $h^2$ would be about 0.6 instead of 0.8. Heritability is also population-specific, because both the genetic and the environmental variance depend on the population. Even so, heritability estimates for similar traits tend to be similar across populations and even across species.
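The following hypothetical simulation sketches why $\sigma^2_P$ must be corrected for known fixed effects. It assumes a within-sex standard deviation of 7 cm, a 15 cm difference between the sex means (the mean values 177.5 and 162.5 cm are made up) and a roughly 50/50 sex ratio. `adjust_for_sex` is a hypothetical helper that centres each sex on its own mean, which for a single binary covariate is the same as taking residuals from a regression of height on sex. The exact effect on an $h^2$ estimate depends on the design used; the sketch only shows how pooling the sexes inflates the phenotypic variance.

```python
import random
import statistics


def adjust_for_sex(heights, sexes):
    """Remove the sex fixed effect by centring each sex on its own mean."""
    means = {
        s: statistics.fmean(h for h, sx in zip(heights, sexes) if sx == s)
        for s in set(sexes)
    }
    return [h - means[s] for h, s in zip(heights, sexes)]


rng = random.Random(42)
# Hypothetical sample: within-sex SD of 7 cm, men 15 cm taller on average.
sexes = [rng.choice("MF") for _ in range(20_000)]
heights = [(177.5 if s == "M" else 162.5) + rng.gauss(0, 7) for s in sexes]

raw_var = statistics.pvariance(heights)
adjusted_var = statistics.pvariance(adjust_for_sex(heights, sexes))
print(f"raw variance: {raw_var:.0f} cm^2, sex-adjusted variance: {adjusted_var:.0f} cm^2")
```

With equal numbers of men and women, pooling adds about $(15/2)^2 \approx 56$ cm² to the roughly 49 cm² of within-sex variance, so the uncorrected phenotypic variance is about twice as large as it should be.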

When possible, heritability is estimated from simple, balanced designs: for example, the regression of offspring phenotypes on parental phenotypes, the correlation between full or half siblings, or the difference between the correlations of monozygotic and dizygotic twin pairs (identical and non-identical twins, respectively). When this is not possible, and we are stuck with individuals with a mixture of relationships, unbalanced designs, etc., heritability is estimated with linear mixed models.
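Two of these balanced designs reduce to simple formulas. Under a purely additive model with no shared environment and random mating, the slope of offspring phenotype on the mid-parent value (the mean of both parents) estimates $h^2$. For twins, Falconer's formula $h^2 = 2(r_{MZ} - r_{DZ})$ converts the difference in twin correlations into a heritability, assuming MZ and DZ twins share their environments to the same extent and that there is no dominance. The sketch below is illustrative only: `simulate_families` is a hypothetical helper, the simulated trait has $h^2 = 0.8$ by construction, and the twin correlations are made-up numbers, not data.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_families(n, var_a, var_e, seed=7):
    """Simulate n parent-offspring trios under a purely additive model.

    The offspring's additive value is the mean of the parental additive values
    plus a Mendelian sampling term with variance var_a / 2 (no inbreeding,
    no shared environment, no assortative mating).
    """
    rng = random.Random(seed)
    sd_a, sd_e = var_a ** 0.5, var_e ** 0.5
    midparents, offspring = [], []
    for _ in range(n):
        a_father, a_mother = rng.gauss(0, sd_a), rng.gauss(0, sd_a)
        p_father = a_father + rng.gauss(0, sd_e)
        p_mother = a_mother + rng.gauss(0, sd_e)
        a_child = (a_father + a_mother) / 2 + rng.gauss(0, (var_a / 2) ** 0.5)
        p_child = a_child + rng.gauss(0, sd_e)
        midparents.append((p_father + p_mother) / 2)
        offspring.append(p_child)
    return midparents, offspring


def falconer_h2(r_mz, r_dz):
    """Falconer's formula: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2 * (r_mz - r_dz)


# Hypothetical trait with h^2 = 0.8 and total variance 49 (e.g. cm^2).
mid, kids = simulate_families(20_000, var_a=39.2, var_e=9.8)
print(f"offspring-on-midparent slope: {slope(mid, kids):.2f} (simulated h^2 = 0.80)")
# Hypothetical twin correlations, not real data.
print(f"Falconer h^2 from r_MZ=0.85, r_DZ=0.45: {falconer_h2(0.85, 0.45):.2f}")
```

The mid-parent slope equals $h^2$ because $\mathrm{Cov}(P_{o}, \bar P_{par}) = \sigma^2_A/2$ and $\mathrm{Var}(\bar P_{par}) = \sigma^2_P/2$; regressing on a single parent would instead give $h^2/2$.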

The missing heritability problem

In the case of height, about 80% of the within-population variation is due to genetic factors ($h^2 = 0.8$) (Visscher, 2008). Three studies with a combined sample of about 63,000 individuals (roughly 14k + 16k + 34k) identified 54 variants that are reliably associated with height. However, together they explain only about 5% of the phenotypic variance (Manolio et al., 2009). The rest of the estimated genetic component is something like "dark matter": we know it is there because we can see its effects, but we cannot observe it directly or say exactly what it is. This situation is common to many complex phenotypes and has been called the missing heritability problem. There are several candidate explanations for this dark matter: a larger than expected number of variants with small effects (difficult to detect in underpowered GWAS); rarer variants with large effects; an overestimation of $h^2$ caused by unaccounted-for epigenetic, dominance and epistatic effects; and even inadequate accounting for the environment shared among relatives. Of course, different traits will have different allelic architectures (number of variants, effect sizes, variant types…), but our current knowledge is too limited to assess them reliably. The enigma of missing heritability is especially relevant for complex diseases.
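The size of the gap follows directly from the figures above. This small sketch uses only the numbers quoted in the text ($h^2 = 0.8$, 54 variants explaining 5% of the phenotypic variance); `heritability_gap` is a hypothetical helper name, and the per-variant figure is just an average, not an estimate for any real variant.

```python
def heritability_gap(h2_total, variance_explained, n_variants):
    """Summarise how much of h^2 is captured by known associated variants."""
    fraction_captured = variance_explained / h2_total  # share of h^2 explained
    missing = h2_total - variance_explained            # unexplained genetic variance
    mean_per_variant = variance_explained / n_variants
    return fraction_captured, missing, mean_per_variant


# Figures quoted in the text for height: h^2 = 0.8, 54 variants explaining 5%.
captured, missing, per_variant = heritability_gap(0.8, 0.05, 54)
print(f"captured: {captured:.2%} of h^2; missing: {missing:.2f} of phenotypic variance")
print(f"average variance explained per variant: {per_variant:.3%}")
```

In other words, the known variants capture only about one sixteenth of the heritability, and each explains on average less than 0.1% of the variance in height.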

Of the main suspects behind the missing heritability, variants not covered by standard study designs are regarded as the most likely cause. Although candidates such as structural variants have been proposed, rare SNPs (MAF < 0.05) are the option with the most support. They are often not included in genotyping arrays, while their modest effect sizes (say, odds ratios of 2-3) make them difficult to find with classical linkage analysis. Larger sample sizes and more comprehensive arrays could help find these variants.
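One way to see why rare variants are hard to find is to look at how much variance a single SNP can explain. The sketch below assumes an additive, biallelic SNP in Hardy-Weinberg equilibrium affecting a quantitative trait (the text speaks of disease odds ratios, so this is a simplification), for which the variance explained is $q^2 = 2p(1-p)\beta^2$ with $\beta$ in phenotypic standard deviations. The sample-size function uses the common approximation $n \approx (z_{1-\alpha/2} + z_{power})^2 / q^2$ at genome-wide significance ($\alpha = 5 \times 10^{-8}$). Function names and effect sizes are hypothetical.

```python
from statistics import NormalDist


def snp_variance_explained(maf, beta):
    """Fraction of phenotypic variance explained by one additive biallelic SNP.

    beta is the per-allele effect in phenotypic standard deviations;
    Hardy-Weinberg equilibrium is assumed: q^2 = 2 p (1 - p) beta^2.
    """
    return 2 * maf * (1 - maf) * beta ** 2


def approx_sample_size(q2, alpha=5e-8, power=0.8):
    """Approximate n for a single-SNP association test of a quantitative trait.

    Uses n ~ (z_{1-alpha/2} + z_{power})^2 / q^2, reasonable when q^2 is small.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / q2


# Hypothetical variants with the same per-allele effect (0.1 SD).
for maf in (0.30, 0.05, 0.01):
    q2 = snp_variance_explained(maf, 0.1)
    print(f"MAF={maf:.2f}: q^2={q2:.5f}, n needed ~ {approx_sample_size(q2):,.0f}")
```

For the same per-allele effect, the required sample size grows roughly as $1/[p(1-p)]$, so a variant with MAF = 0.01 needs about twenty times more individuals than one with MAF = 0.30, in line with the call for larger samples.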