Collagens are members of one of the most important families of structural proteins in higher organisms. In humans, 28 types of collagen, encoded by 43 genes, fall into several functional protein classes. Mutations in the major fibrillar collagen genes cause osteogenesis imperfecta (COL1A1 and COL1A2, encoding the chains of type I collagen), chondrodysplasias (COL2A1, encoding the chains of type II collagen), and vascular Ehlers-Danlos syndrome (COL3A1, encoding the chains of type III collagen). Over the past two decades, mutations in these collagen genes have been catalogued in the hope of understanding the molecular etiology of the diseases they cause, characterizing genotype-phenotype relationships, and developing robust models that predict molecular and clinical outcomes. To better achieve these goals, it is necessary to understand the natural patterns of variation in collagen genes in human populations. We screened the exons, flanking intronic regions, and conserved noncoding regions of COL1A1, COL1A2, COL2A1, and COL3A1 for variation in 48 individuals from each of four ethnically diverse populations. We identified 459 single-nucleotide polymorphisms (SNPs), more than half of which were novel and absent from public databases. Of the 52 SNPs found in coding regions, 15 caused amino acid substitutions and the remaining 37 were synonymous. Although the four collagens have similar gene and protein structures, they have different molecular evolutionary characteristics. For example, COL1A1 appears to have been under substantially stronger negative selection than the other three genes. Phylogenetic analysis also suggests that the four genes have had very different evolutionary histories across the ethnic groups studied. Our observations suggest that studies of collagen mutations and their relationships with disease phenotypes should take into account the genetic background of the subjects.