Hippocamplus My Second Memory

Segmental duplication exploration

Segmental Duplications (SD)

I downloaded the segmental duplication annotation for hg19 from UCSC. There are 51599 annotated SDs. They are defined as regions larger than 1 Kbp with at least 90% similarity to another region in the genome.

Segmental duplication regions

Many SDs are nested or located next to each other. I merge SDs that overlap (or are less than 10 bp apart) to create SD regions, i.e. longer stretches of the genome covered by SDs.

There are 7620 SD regions, which account for 166.1 Mbp of the genome.
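
The merging step can be sketched as follows. This is a minimal Python sketch, not the code used to produce the numbers above: the function name `merge_intervals`, the `max_gap` parameter, the 0-based half-open coordinates and the toy intervals are all assumptions made for illustration.

```python
from collections import defaultdict

def merge_intervals(intervals, max_gap=10):
    """Merge intervals on the same chromosome that overlap or are
    less than max_gap bp apart.

    intervals: iterable of (chrom, start, end), assumed 0-based half-open.
    Returns a list of merged (chrom, start, end), sorted by chrom and start.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start - cur_end < max_gap:
                # Overlapping, nested, or closer than max_gap: extend the region
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        if cur_start is not None:
            merged.append((chrom, cur_start, cur_end))
    return merged

# Toy intervals, not taken from the UCSC track
sds = [
    ("chr1", 100, 2000),
    ("chr1", 1500, 3000),  # overlaps the first one
    ("chr1", 3005, 5000),  # 5 bp after the previous one: merged
    ("chr1", 6000, 8000),  # 1000 bp away: separate region
    ("chr9", 100, 1500),
]
regions = merge_intervals(sds)
total_bp = sum(end - start for _, start, end in regions)
```

Summing `end - start` over the merged regions gives the total number of bases covered, which is how a figure like the 166.1 Mbp above could be obtained from real data.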

Size distribution

Similarity

Chromosome distribution

A few chromosomes are more enriched in SDs than the others. Some have long stretches of SDs, e.g. chr 9 or chr Y. These peaks are mostly made of very recent/similar SDs.

Distance to the other segment

otherSeg        segdup   segdup.prop
different chr   32740    0.635
same chr        18859    0.365

For the majority of SDs (63.5%), the similar region is on a different chromosome. When both copies are on the same chromosome, most of them are far from each other.
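
As an illustration of how such a breakdown could be computed, here is a minimal Python sketch. The tuple layout `(chrom, start, end, other_chrom, other_start, other_end)`, the helper names and the toy pairs are assumptions, not the actual columns or values of the UCSC annotation.

```python
from collections import Counter

# Toy SD pairs: (chrom, start, end, other_chrom, other_start, other_end)
pairs = [
    ("chr1", 1000, 3000, "chr5", 20000, 22000),
    ("chr1", 5000, 7000, "chr1", 900000, 902000),
    ("chr9", 100, 1500, "chrY", 4000, 5400),
    ("chr9", 40000, 42000, "chr9", 45000, 47000),
]

def other_segment_class(pair):
    """Label a pair by whether the other segment is on the same chromosome."""
    chrom, _, _, other_chrom, _, _ = pair
    return "same chr" if chrom == other_chrom else "different chr"

def distance_to_other(pair):
    """Gap in bp between the two segments of a pair.

    Returns None when the segments are on different chromosomes,
    and 0 when they overlap.
    """
    chrom, start, end, other_chrom, other_start, other_end = pair
    if chrom != other_chrom:
        return None
    return max(0, max(start, other_start) - min(end, other_end))

counts = Counter(other_segment_class(p) for p in pairs)
props = {k: round(v / len(pairs), 3) for k, v in counts.items()}
distances = [d for d in map(distance_to_other, pairs) if d is not None]
```

Here `counts` and `props` play the role of the `segdup` and `segdup.prop` columns, and `distances` holds the same-chromosome gaps whose distribution can then be plotted.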

Gene content

I downloaded Gencode v19 at ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz.

Around a thousand protein-coding genes (958) are completely within SD regions.

gene_type gene
pseudogene 3968
protein_coding 958
lincRNA 641
miRNA 301
antisense 181
snRNA 179
IG_V_pseudogene 97
misc_RNA 95
IG_V_gene 86
rRNA 62
processed_transcript 60
sense_intronic 29
TR_V_gene 25
snoRNA 23
IG_D_gene 15
IG_C_gene 9
polymorphic_pseudogene 8
TR_V_pseudogene 7
TR_J_gene 5
sense_overlapping 4
IG_C_pseudogene 2
IG_J_gene 2
TR_C_gene 2
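
The "completely within" test can be sketched like this. It is a minimal Python sketch under assumptions: the function name `genes_within_regions`, the placeholder gene names and the toy coordinates are hypothetical, the regions are assumed to be already merged (non-overlapping), and genes and regions are assumed to use the same coordinate convention (GTF files are 1-based inclusive, so real data would need converting first).

```python
from bisect import bisect_right

def genes_within_regions(genes, regions):
    """Return the names of genes fully contained in one region.

    genes:   list of (name, chrom, start, end)
    regions: list of merged, non-overlapping (chrom, start, end)
    """
    by_chrom = {}
    for chrom, start, end in sorted(regions):
        by_chrom.setdefault(chrom, []).append((start, end))
    starts = {c: [s for s, _ in ivs] for c, ivs in by_chrom.items()}

    inside = []
    for name, chrom, g_start, g_end in genes:
        if chrom not in by_chrom:
            continue
        # Last region starting at or before the gene start
        i = bisect_right(starts[chrom], g_start) - 1
        if i >= 0:
            r_start, r_end = by_chrom[chrom][i]
            if r_start <= g_start and g_end <= r_end:
                inside.append(name)
    return inside

# Toy data with placeholder gene names
regions = [("chr1", 100, 5000), ("chr1", 6000, 8000)]
genes = [
    ("geneA", "chr1", 200, 4000),   # fully inside the first region
    ("geneB", "chr1", 4500, 6500),  # only partially overlaps
    ("geneC", "chr1", 6000, 8000),  # exactly matches the second region
    ("geneD", "chr2", 100, 200),    # chromosome without any region
]
contained = genes_within_regions(genes, regions)
```

Because the regions are merged beforehand, a gene can only be contained in the single region that starts at or before it, so one binary search per gene is enough.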

Gene families

A naive way of looking for gene families is to cluster the gene names. I also remove any trailing numbers in the gene name.

example n
PRAMEF 21
USP17L 21
ZNF 15
GOLGA8I 13
NBPF 12
CT47B 12
POTEF 11
OR2T 10
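
A simplified version of this idea, sketched in Python: strip the trailing digits and count how many genes share the resulting prefix. Grouping on the exact prefix is a simplification of clustering the names, the helper name `family_prefix` is hypothetical, and the short gene list is only illustrative, not the list used for the table above.

```python
import re
from collections import Counter

def family_prefix(gene_name):
    """Remove trailing digits from a gene name, e.g. 'NBPF12' -> 'NBPF'."""
    return re.sub(r"\d+$", "", gene_name)

# Illustrative gene names only
gene_names = [
    "PRAMEF1", "PRAMEF2", "PRAMEF12",
    "NBPF1", "NBPF10",
    "ZNF717",
    "USP17L2",
]

family_counts = Counter(family_prefix(g) for g in gene_names)
top_families = family_counts.most_common(2)
```

Only the digits at the very end of the name are removed, so `USP17L2` becomes `USP17L` rather than `USP`.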

Gene Ontology

Biological process

ID Description GeneRatio qvalue
GO:0050907 detection of chemical stimulus involved in sensory perception 57/509 0.0000000
GO:0009593 detection of chemical stimulus 58/509 0.0000000
GO:0007606 sensory perception of chemical stimulus 59/509 0.0000000
GO:0007608 sensory perception of smell 53/509 0.0000000
GO:0050911 detection of chemical stimulus involved in sensory perception of smell 51/509 0.0000000
GO:0050906 detection of stimulus involved in sensory perception 57/509 0.0000000
GO:0042742 defense response to bacterium 23/509 0.0002887
GO:0098542 defense response to other organism 35/509 0.0002887
GO:0071346 cellular response to interferon-gamma 17/509 0.0002992
GO:0006805 xenobiotic metabolic process 14/509 0.0002992
GO:0071466 cellular response to xenobiotic stimulus 14/509 0.0004487
GO:0033141 positive regulation of peptidyl-serine phosphorylation of STAT protein 7/509 0.0004487
GO:0002323 natural killer cell activation involved in immune response 8/509 0.0005222
GO:0033139 regulation of peptidyl-serine phosphorylation of STAT protein 7/509 0.0005486
GO:0009410 response to xenobiotic stimulus 14/509 0.0007752
GO:0034341 response to interferon-gamma 17/509 0.0012694
GO:0006749 glutathione metabolic process 10/509 0.0014056
GO:0042501 serine phosphorylation of STAT protein 7/509 0.0019334
GO:0016579 protein deubiquitination 14/509 0.0035460
GO:0060337 type I interferon signaling pathway 11/509 0.0040381
GO:0071357 cellular response to type I interferon 11/509 0.0040381
GO:0006342 chromatin silencing 13/509 0.0046589
GO:0034340 response to type I interferon 11/509 0.0058697
GO:0030101 natural killer cell activation 10/509 0.0090640
GO:0070646 protein modification by small protein removal 14/509 0.0114298
GO:0009812 flavonoid metabolic process 6/509 0.0138310
GO:0033559 unsaturated fatty acid metabolic process 12/509 0.0138310
GO:0000042 protein targeting to Golgi 5/509 0.0138310
GO:0019373 epoxygenase P450 pathway 5/509 0.0138310
GO:0045814 negative regulation of gene expression, epigenetic 13/509 0.0138310
GO:0060338 regulation of type I interferon-mediated signaling pathway 7/509 0.0139433
GO:0006690 icosanoid metabolic process 11/509 0.0144589
GO:0009813 flavonoid biosynthetic process 5/509 0.0165692
GO:0006959 humoral immune response 15/509 0.0202359
GO:0002228 natural killer cell mediated immunity 8/509 0.0226821
GO:0006334 nucleosome assembly 13/509 0.0227629
GO:0006968 cellular defense response 8/509 0.0232698
GO:0052696 flavonoid glucuronidation 5/509 0.0232698
GO:0072600 establishment of protein localization to Golgi 5/509 0.0232698
GO:0043330 response to exogenous dsRNA 7/509 0.0234523
GO:0001906 cell killing 11/509 0.0249099
GO:0001580 detection of chemical stimulus involved in sensory perception of bitter taste 6/509 0.0322850
GO:0052695 cellular glucuronidation 5/509 0.0331023
GO:0042737 drug catabolic process 4/509 0.0378567
GO:0000301 retrograde transport, vesicle recycling within Golgi 5/509 0.0389140
GO:0031497 chromatin assembly 13/509 0.0483266

Molecular function

ID Description GeneRatio qvalue
GO:0004984 olfactory receptor activity 51/488 0.0000000
GO:0016712 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced flavin or flavoprotein as one donor, and incorporation of one atom of oxygen 8/488 0.0000499
GO:0005132 type I interferon receptor binding 7/488 0.0000499
GO:0008391 arachidonic acid monooxygenase activity 6/488 0.0002238
GO:0008392 arachidonic acid epoxygenase activity 6/488 0.0002238
GO:0036459 thiol-dependent ubiquitinyl hydrolase activity 14/488 0.0002238
GO:0101005 ubiquitinyl hydrolase activity 14/488 0.0002238
GO:0008395 steroid hydroxylase activity 8/488 0.0002715
GO:0019825 oxygen binding 9/488 0.0004195
GO:0019783 ubiquitin-like protein-specific protease activity 14/488 0.0005119
GO:0003823 antigen binding 11/488 0.0007524
GO:0004497 monooxygenase activity 12/488 0.0008659
GO:0070330 aromatase activity 6/488 0.0009112
GO:0020037 heme binding 13/488 0.0029793
GO:0048020 CCR chemokine receptor binding 7/488 0.0030635
GO:0019864 IgG binding 4/488 0.0038146
GO:0046906 tetrapyrrole binding 13/488 0.0050261
GO:0042605 peptide antigen binding 5/488 0.0078319
GO:0005506 iron ion binding 13/488 0.0176137
GO:0004364 glutathione transferase activity 5/488 0.0236924
GO:0005125 cytokine activity 16/488 0.0236924
GO:0008234 cysteine-type peptidase activity 14/488 0.0236924
GO:0042379 chemokine receptor binding 7/488 0.0364788
GO:0015020 glucuronosyltransferase activity 5/488 0.0364788
GO:0016705 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen 12/488 0.0364788
GO:0019865 immunoglobulin binding 4/488 0.0422110
GO:0008009 chemokine activity 6/488 0.0435900

Cellular component

ID Description GeneRatio qvalue
GO:0045095 keratin filament 16/589 0.0000242
GO:0005882 intermediate filament 19/589 0.0042923
GO:0042611 MHC protein complex 6/589 0.0057256
GO:0071556 integral component of lumenal side of endoplasmic reticulum membrane 6/589 0.0074647
GO:0098553 lumenal side of endoplasmic reticulum membrane 6/589 0.0074647
GO:0045111 intermediate filament cytoskeleton 19/589 0.0171090
GO:0000786 nucleosome 11/589 0.0178803
GO:0044815 DNA packaging complex 11/589 0.0258667
GO:0012507 ER to Golgi transport vesicle membrane 7/589 0.0434286
GO:0072562 blood microparticle 11/589 0.0438412
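
The post does not say which tool or test produced these tables. A common choice for this kind of over-representation analysis is a hypergeometric test, sketched below in Python as an illustration only. It assumes GeneRatio reads as k/n (k genes annotated to the term among n annotated input genes). The background size `N` and the number of background genes in the term `K` are made-up values, and the function name `hypergeom_sf` is hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background
    K: background genes annotated to the term
    n: genes in the input list
    k: input genes annotated to the term
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# k and n taken from the first row of the biological process table (57/509);
# N and K are hypothetical background values chosen for illustration.
k, n = 57, 509
N, K = 18000, 400
gene_ratio = k / n
p_value = hypergeom_sf(k, N, K, n)
```

The q-values in the tables would then come from correcting such p-values for the number of terms tested, a step not shown here.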