not invented here.

Notes on bioinformatics methods, software and discussions.

posts tagged “conference”:

9.29.2012 Exploring the Cancer Methylome

Peter Laird, University of Southern California, USA

70% of CpGs methylated. In a cancer cell a number of changes. Widespread (rather than global) hypomethylation and focal CGI hypermethylation in cancer, frequently in promoter regions. What is the relationship between those two discordant events?

125 colorectal adenocarcinoma genomes profiled for methylation, plus 29 adjacent normals. Clusters nicely, but are those clusters associated with clinical features? Align with BRAF mutation status, for one (but mutations do not induce the methylation status). Another distinct CpG-island methylator phenotype, CIMP+; micro-satellite unstable subtypes also overlap (if incompletely). Can be used to generate epigenetic subtypes of colorectal cancer, applied to TCGA with the same outcome.

Synergy between cancer genetics and epigenetics. ES-Cell polycomb repressor complex targets are prone to abnormal DNA methylation in cancer (polycomb keeps master regulators of differentiation in poised state). Large number of promoters acquire methylation in cancers compared to matched normals, enrichment of polycomb targets among those cancer-associated methylated targets. Same genes also acquire increased methylation with age.

Polycomb crosstalk likely leads to cumulative stochastic methylation. Loss of polycomb and replacement with a more permanent silencing method (i.e., methylation) means target no longer able to differentiate, gets stuck in stem cell state without full differentiation capability. Great target to accumulate additional mutations over time. Potentially not an active process in cancer, but a hallmark of the event that led to the cancer development. Not a competitive advantage to develop a cancer, but a passenger event indicating that the progenitor cell could no longer differentiate.

Would explain DN methylation of ~50% of cancer-specific methylated genes, consistent with stem cell-like behaviour of cancer cells, explains observation of epigenetic field effects adjacent to tumors: a differentiation block.

Focal hypermethylation and long-range hypomethylation: WG bisulfite sequencing of primary tumors and normal tissues. Shows sample CpG site in normal/cancer, striking difference of signal. Zooming out to ~20kb windows, nice heatmap/scatterplot of methylation signal shows partially methylated domains. Shows erosion of methylation pattern in window from ES cells to differentiated colon to cancer. Hypomethylation not uniformly distributed, clear windows. Epigenetically unstable regions close to the nuclear periphery (late replicating regions, lamin attachment regions).
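
The "zooming out to ~20kb windows" step can be sketched as a simple binning of per-CpG methylation values. This is only an illustration, not the analysis shown in the talk: the input format (a list of position/methylation-fraction pairs for one chromosome) and the function name are assumptions.

```python
from collections import defaultdict


def window_means(cpg_calls, window=20_000):
    """Average per-CpG methylation fractions in fixed-size windows.

    cpg_calls: iterable of (position, methylation_fraction) for one chromosome.
    Returns {window_start: mean_fraction}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, fraction in cpg_calls:
        start = (pos // window) * window  # left edge of the window holding pos
        sums[start] += fraction
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sums}


# Toy values, not real data.
toy_calls = [(100, 0.9), (15_000, 0.7), (25_000, 0.2), (39_000, 0.4)]
means = window_means(toy_calls)
```

Windows whose mean drops well below the surrounding ones would be candidates for the partially methylated domains mentioned above.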

Comparison across cancer types: holds up across multiple cancers studied, with individual differences that should be interesting to tease apart. Comparison of 2200 TCGA cancers and 409 normals reveals cancer-specific profiles in unsupervised clustering and pairwise correlation analysis of all cancer types using all sites.

btg2012 ✳ Conference ✳ epigenomics ✳ Cancer 

9.29.2012 Analysis of somatic retrotransposition in human cancers

Peter Park, Harvard Medical School, USA

45% of human genome derived from transposable elements (TEs), able to replicate (copy/paste) across the genome via an RNA intermediate. Previous studies of TEs mostly in germline, >7000 insertions in 185 samples from the 1000G set. Implicated in single gene diseases with ~100 insertions reported so far (L1s, ALUs, others), e.g., neurofibromatosis type 1 contains 18 retrotransposon insertions. Some events found in cancer, for example L1 in MYC — but discovered in low-throughput studies.

Studied 43 cancer/matched normal genomes (150 billion reads) from TCGA (GBM, ovarian, colorectal, prostate, myeloma). Thousands of novel germline insertions, 194 high confidence somatic insertions identified (majority L1s).

Detecting events challenging. Numerous, often identical TE instances. Find cluster of read pairs where one end maps uniquely, other end maps to TE consensus sequence. Use of clipped reads (partially aligned reads) key. Added custom assembly of repeat elements to the genome to be able to find repeat families at once. Tool called Tea (transposable element analyzer); insertions validated by Sanger. All somatic L1/Alu insertions in cancers of epithelial cell origin, none in blood or brain cancers (Ouch: postdoc started with those tissues by chance).
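
The read-pair signal described above can be sketched roughly as follows. This is not the Tea implementation. It is a minimal illustration assuming alignment is already done and each read pair has been reduced to a hypothetical record (chromosome, anchor position, TE family of the mate); the `max_gap` and `min_support` thresholds are made-up values.

```python
from collections import namedtuple

# Hypothetical, simplified record: one end maps uniquely (the anchor),
# the mate maps to a TE consensus sequence of the given family.
AnchoredPair = namedtuple("AnchoredPair", ["chrom", "pos", "te_family"])


def _flush(cluster, candidates, min_support):
    """Record a cluster if enough read pairs support it."""
    if len(cluster) >= min_support:
        candidates.append(
            (cluster[0].chrom, cluster[0].pos, cluster[-1].pos,
             cluster[0].te_family, len(cluster))
        )


def cluster_candidate_insertions(pairs, max_gap=500, min_support=3):
    """Group anchors that lie close together and point to the same TE family.

    Returns a list of (chrom, start, end, te_family, n_supporting_pairs).
    """
    candidates = []
    cluster = []
    for pair in sorted(pairs, key=lambda p: (p.chrom, p.te_family, p.pos)):
        if cluster:
            last = cluster[-1]
            same_group = (
                pair.chrom == last.chrom
                and pair.te_family == last.te_family
                and pair.pos - last.pos <= max_gap
            )
            if not same_group:
                _flush(cluster, candidates, min_support)
                cluster = []
        cluster.append(pair)
    if cluster:
        _flush(cluster, candidates, min_support)
    return candidates


pairs = [
    AnchoredPair("chr1", 1000, "L1"),
    AnchoredPair("chr1", 1100, "L1"),
    AnchoredPair("chr1", 1250, "L1"),
    AnchoredPair("chr1", 90000, "Alu"),  # lone pair, below min_support
]
candidates = cluster_candidate_insertions(pairs)
```

A real caller would additionally use the clipped reads to pin down the exact breakpoint, and compare tumour against matched normal to separate somatic from germline insertions.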

64 out of 194 insertions located in genes, including tumor suppressors (UTRs and introns). Somatic insertions tend to occur in genes commonly mutated in cancer, disrupt expression levels of target genes, biased towards regions of cancer-specific DNA hypomethylation (all statistically significant).

Unclear where and how often these happen.

Quick shout out to Galaxy and Peter’s Refinery system, a data repository connected to the Galaxy backend for data analysis.

btg2012 ✳ Conference ✳ Cancer ✳ sequencing 

9.29.2012 MutaScope: a high sensitivity variant caller for amplicon sequencing

Shawn Yost, University of California San Diego, USA

Targeted therapy in cancer hinges on having more actionable genes: only 47 genes actionable (approved or in clinical trial). Designed ~2000 amplicons (150kb, 1000X coverage) to cover these. PCR amplicon and MiSeq allow for quick turnaround (UDT-Seq, Ultra-Deep Targeted Sequencing). Amplicon has fixed directionality, start/stop position, differs from whole exome sequencing. All mutations will have the same position in a read, making it difficult to identify false positives by usual metrics. High depth of coverage also unusual, but needed due to sample heterogeneity.

Applied to ~50 samples with wide range of percent of invasive tumour cells. MutaScope approach: BWA alignment, refine alignment, calculate experimental error rate using a germline sample, variant detection and classification.

Read refinement: each read assigned to the amplicon from which it was sequenced, which allows use of read group (RG) information to improve variant calls. Unclip soft-clipped bases to allow variant calls at the end of reads.

Error rates: based on germ line sample to correct for error rates dependent on position of read, CG content of reference. Use two models of tumor heterogeneity (spiked in tumor samples at 1/5/20% as a control or two mixed germ line samples). Tested against Samtools, Varscan, GATK. MutaScope works better for this kind of sequencing.
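
To make the "experimental error rate from a germline sample" idea concrete, here is a rough sketch of how a position-specific error rate can be turned into a detection p-value. It uses a plain one-sided binomial test as a stand-in. That is an assumption for illustration, not MutaScope's documented statistical model, and the depth and error rate are invented numbers.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_pmf)
    return min(1.0, total)


# Assumed numbers for illustration only.
depth = 1000        # reads covering the position in the tumor sample
error_rate = 0.001  # per-base error rate estimated from the germline sample

p_noise = binomial_upper_tail(2, depth, error_rate)     # 2 variant reads
p_variant = binomial_upper_tail(10, depth, error_rate)  # 10 variant reads
```

With these toy numbers, two variant reads are compatible with sequencing error while ten are very unlikely to be, which is why high depth matters for low-prevalence mutations in heterogeneous samples.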

Applied successfully to clinical samples, studying prevalence of somatic and germline variants. Designed specifically for PCR amplicon, modular workflow from FASTQ to VCF. Additional information in VCF such as detection p-value, classification p-value and mutant allele read group bias.

software ✳ sequencing ✳ btg2012 ✳ Conference 

9.29.2012 Translating cancer genomes

Lynda Chin, MD Anderson Cancer Center, USA

Cancer as a disease of the genome. Discovery of the BRAF mutation by Wellcome Trust as the poster child of what genome analysis can do (proof of concept). Know the target, what it does, have the right target, and identify the right patient population subset should lead to therapeutic success — in theory. A large scale catalogue of mutations insufficient. Overview of RAC1 mutations, biological evidence of activating function of the mutation, still need better understanding to translate this into aims for a drug design.

Another example, Prex2 in melanoma, mutated and highly re-arranged in different patients. Unclear whether this is a driver mutation or noise — large number of mutations, but scattered all over the gene, no hotspots. Need to engineer mutations and test in vivo model system. That still does not define what it does or how, and more importantly is it rate limiting for the tumor?

All model systems with strengths and weaknesses; need to run several independent tests that should converge on the same result before trusting the results to not be an artifact.

Case study: landscape of somatic mutations in melanoma (Hodis, Watson; Cell 2012). Number of patients without BRAF or NRAS mutations, no treatment for this group. Start with genetic model, NRAS mouse promoter can be turned on and off in a tumor. NRAS initiates tumor, mutations in TRRAP, GRM3, SETD2 present. Switch NRAS off, tumor shrinks — NRAS is required for maintenance and a valid therapeutic target. Genetic ablation of NRAS induces tumor regression, not achieved by inhibition of MEK. Identified genes significantly altered by MEK inhibitor and the effect of NRAS (plenty of overlap, MEKi represents partial inhibition of NRAS activity). Majority of NRAS-associated genes not affected by MEK inhibitors though.

Tested for pathways enriched in RAS-specific genes (cell cycle proliferation, opposite of expectations). P53 decreased upon NRAS extinction, not after MEK inhibition. Network modeling (TRAP) to analyze which key regulators are responsible for the pathway differences, identified CDK4 as a putative key driver (proliferation checkpoint).

Missing step: pharmacological validation with a CDK4 inhibitor (commercially available). Only works in combination with MEK inhibitor (synergistic effect); confirmed in ex vivo test. Partial inhibition uncouples apoptosis and cell cycle arrest.

Model system vital to understand how the signaling of the pathway works, bypass redundant / complex feedback systems. Enables a wider therapeutic window. Systems approach to collect data at will crucial to develop combination therapies.

btg2012 ✳ clinic ✳ sequencing ✳ drugs ✳ Conference 

9.28.2012 Genomics – Catching up to Human Genetics

Richard Gibbs, Baylor College of Medicine, USA

aka Genomic Medicine 20??. From individual variation (Watson) to population variation (Desmond Tutu project), identification of actionable variants (Jim Lupski’s project, NEJM 2010), medical management and intervention (2012), everyone (20??). What is the main utility, and should we all be sequenced?

Technical development:

Still a long way to go, not a perfect genome yet despite rapid developments. Capture technologies to stick around a bit longer as higher coverage in important regions helps.

Healthy adults:

Who wants a test without medical indication? Mike Snyder, for one. Still useful information such as site frequency spectrum (1000G useful even without detailed phenotypes).

Complex disease:

GWAS vs Mendel. Few actionable alleles vs low frequency/high impact, can we have an integrated model? Can we construct models of complex diseases (‘oligogenics’)? ARIC and CHARGE consortium with 30,000 well phenotyped individuals, 4,500 exome-seq’d, sees Mendelian alleles in the ‘normals’.

Mendelian disease:

Severe diseases, often children, collectively frequent, cites actionable case studies (40 Mendelian diseases studied right now at HGSC alone, ‘industrialized’ pipeline). Studies take away endless exhausting lists of diagnoses, treatments. Huge value in molecular diagnosis even without treatment. Problem of too many small pedigrees, need a surrogate for what a variant does as part of a functional assessment. Whole Genome Lab launched 2011, steady increase of samples (by end of 2013 10k samples/month, no way to do manual curation).

How much work will be in research vs clinical settings? Lots of additional success stories that are just impossible to cover without the pedigree pictures, gene names and impact.

Clan Genomics paper: “recent mutation may have a greater influence on disease susceptibility or protection than is conferred by variations that arose in distant ancestors”. Should be more worried about immediate family, variants not present in the broader population.


Role of inherited, acquired mutations, environmental mutations. TCGA et al ‘damn successful’ in uncovering new mutations and functions. New technologies allow analysis of low frequency cancers. Different trios: normal, primary, recurrence allow clonal analysis, trace evolution, study time dependency between mutational events.


Need to find the family or cohort first. Data ends up in medical records which can be mined for resources; likely growth in social networks / 23andMe’s to create cohorts, whereas designed population studies will decrease.

Future prediction: all the excitement will be stale in a few years. Passion will move to other fields as this becomes complete routine; it’s water coming out of the faucet.

sequencing ✳ conference ✳ btg2012 

9.28.2012 Surname leakage from personal genomes

Yaniv Erlich, Whitehead Institute for Biomedical Research, USA

Co-segregation between Y-Chr and surnames used by public services, allow you to connect to relatives; second database of interest SMGF used to extract a total of 140k surname/Y-chr-test data points.

What is the probability to recover a surname using genomic data? Success rate of ~12%. What about adding age and state as additional metadata (often included in public records, publication also allowed)? Age 40, State Colorado, surname Smith. Median size of 12 people identified by this combination.

The Venter case, putting it all together. Profile STR markers with lobSTR. Now sufficient to get Venter surname from (Try yourself!). The same person does NOT need to be in the database, sufficient for relatives to be included.

Remaining talk not tweet-able. Shame, _really fun and intriguing analysis_.

privacy ✳ sequencing ✳ Conference ✳ btg2012 

9.28.2012 The implications of clonal genome evolution for cancer medicine

Samuel Aparicio, BC Cancer Research Centre, Canada

Cancers act as evolutionary systems. Provides a brief history of tumour evolution, Hauschka in the 50’s, Richard Doll’s mathematical models, Peter Nowell in Science 1976 on the clonal evolution. Ecosystems of malignant cells, with selection operating on phenotypes resulting in growth advantages / disadvantages. Conceptualized and modeled in a large number of recent papers.

In practice:

  1. Tumours will vary in size and composition
  2. Mutations detected in bulk will not all co-occur in the same cells (clonal genotype)
  3. Powerful tool to analyze function in human tumours
  4. Most tests completely ignore this clonality

Concepts: clonal prevalence (prevalence of a given mutation across all clones), genotype (unique constellation of mutations defining a clone), lineage (related by hierarchical descent). Difficult to track as mutations occur at every scale. First demonstration of clonal evolution over nine years (primary tumour, recurrence and metastasis). Three patterns: present in primary and malignant cells, present in subset of either population, undetectable in primary. Again during AML relapse tracing clonal abundance using allelic frequencies. Patient relapse was caused by mutation already present prior to therapy, but at very low abundance. Observing this kind of evolution provides hints of the relevance (significance) of the genotype.

Triple negative breast cancer has additional subtypes. Analyzed 104 TNBC (early stages), 2164 validated SNVs (single base, small indels). Wide variation in mutation abundance by case, but would be treated all the same way, even after adjusting for differences in CNV. (Aside: unusual pattern of somatic mutations in genes downstream of ITG, cell shape, cytoskeleton processes).

Mixture model to account for copy variation, Dirichlet process mixture model to cluster mutations to predict discrete clonal frequencies. Some cancers with only 2-3 clonal groups, others with six groups that are distinct from each other. Huge variation between patients depending on how much the cancers evolved. In some cases p53 is not the initial (earliest) clone, so other events must be driving the cancer initiation.
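
Why copy number has to be accounted for before clustering can be seen from the textbook relationship between allele fraction and the fraction of tumour cells carrying a mutation. The sketch below shows only that adjustment step, under stated assumptions (known tumour purity, known local copy number, known number of mutated copies per cell). It is not the Dirichlet process mixture model from the talk, and the function and parameter names are invented for illustration.

```python
def cellular_prevalence(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Estimate the fraction of tumour cells carrying a mutation.

    Based on the expected allele fraction
        vaf = purity * multiplicity * prevalence
              / (purity * tumor_cn + (1 - purity) * normal_cn)
    solved for prevalence and capped at 1.0.
    """
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    return min(1.0, vaf * total_copies / (purity * multiplicity))


# Toy cases, not real data.
clonal = cellular_prevalence(vaf=0.5, purity=1.0, tumor_cn=2)      # all cells
subclonal = cellular_prevalence(vaf=0.25, purity=1.0, tumor_cn=2)  # half
impure = cellular_prevalence(vaf=0.2, purity=0.5, tumor_cn=2)      # diluted by normal cells
```

The same raw allele fraction can therefore mean very different clonal prevalences depending on purity and copy number, which is what the mixture model has to absorb before mutations are grouped into clones.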

(Shah et al, Nature 2012) showing clonal frequencies of TNBC organized by pathways rather than by genes, resulting in an interesting crosstalk network. (Navin et al, Nature 2011) another paper showing lots of regional variation (clonal differences) across a tumour. Single cell sequencing will help resolve the clonal genotypes. Preliminary data from seven nuclei, able to infer 4 clonal genotypes based on informative sites. Detect mutations in blood plasma before radiological progression.

Clonal evolution of cancers at the root of treatment failure. Can be tackled with modern genomics.

sequencing ✳ btg2012 ✳ conference ✳ evolution 

9.28.2012 Analyzing Genomes: is there a duty to disclose?

Amy McGuire, Baylor College of Medicine, USA

Requirement for an investigator to contact patient when incidental findings are discovered? Aka, how to return individual level research results. Guidelines include wording like ‘proven clinical value’ that are difficult to define. Some contracts extend this to secondary analysts who are expected to contact Biobanks who follow up with sample provider. There is no consensus.

Duty to reciprocity, and a right to receive (not well recognized legal principles in clinical practice). Patients when asked would prefer to receive results about themselves.

Arguments against disclosure: obligations are role-specific, no duty to rescue for researchers. Other ways (aggregate information) for participants to receive information; preliminary research results often not replicated and with unclear significance. Routine return of results imposes a burden on the research enterprise.

GWAS Investigators survey, list of published study corresponding authors; 200 completed surveys, 35 interviewed. Only 7 primary users returned results, none of the secondary researchers did. 68% felt results should be returned in some circumstances: type of study matters. Results not significant enough in large-scale GWAS, but frequently so in small family linkage studies.

Informed consent documents (ICDs) matter for what is returned to participants. ~20% of documents silent on return of results, none stated they would be returned under all circumstances, 10% when significant findings were discovered. (As @larsgt puts it: The EULA of research.) Legal obligations are being created (negligence on behalf of the investigator to return results as outlined), in line with standards of care. Moral obligation to return data can quickly turn into a legal one.

Back to Zack’s talk: incidental findings in clinical care. Fiduciary obligation to disclose, but what about variants of unknown significance (First time I have seen the VUS acronym). Is there a duty to hunt for more information? How does this mesh with the incidentalome problem discussed earlier?

Working group on Secondary Findings in Whole Exome/Genome Sequencing (article in Genetics in Medicine, Robert C Green). Bound to get more difficult as lines between research and clinical care continue to blur. Clinical Sequencing Exploratory Research (CSER) projects to assess the integration of clinical information, sequencing, utilization outcomes, clinical care.

Problem of iatrogenic harm: over-utilization of healthcare resources. No good data available.

Other ethical issues, e.g., identification of incestuous parental relationships using SNP arrays. Seven cases at Baylor (first degree, in some cases with the mother being a minor) in the last year; general consent to treat includes this kind of testing. Disclose information to authorities, child protective services?

btg2012 ✳ conference ✳ ethics ✳ sequencing 

9.28.2012 How to Avoid One Thousand Opportunities to Do Harm In Genomic Medicine

Isaac Kohane, Harvard Medical School and Children’s Hospital Boston, USA

Analogy of Google Maps GIS layer system to create useful maps, and how this will apply to the combination of ‘omes (3 billion bases to one report). Includes mandatory jab at Apple Maps.

Threat of the Incidentalome: danger of large N and small p(Disease), false (dangerous) diagnosis will become prevalent. With only 10k tests per person 60% of the population will be diagnosed falsely. Non-genomic follow-up tests will be cost prohibitive. Shout-out to Dan MacArthur’s LoF variant paper in Science: high confidence of 100 LoF alleles with 20 complete LoF events per genome without noticeable phenotype.
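
The large-N problem is easy to reproduce with a back-of-the-envelope calculation. The sketch below assumes the tests are independent and that each has a false positive rate of 0.01% (99.99% specificity). That per-test rate is an assumption chosen for illustration and is not stated in these notes.

```python
def prob_any_false_positive(n_tests, false_positive_rate):
    """Chance that a healthy person gets at least one false positive result,
    assuming independent tests with the same false positive rate."""
    return 1 - (1 - false_positive_rate) ** n_tests


# Assumed per-test false positive rate of 0.01% (99.99% specificity).
p_false_diagnosis = prob_any_false_positive(10_000, 0.0001)
```

Under these assumptions `p_false_diagnosis` comes out at roughly 0.63, in line with the ~60% figure quoted in the talk. Even very specific tests produce mostly false alarms once enough of them are run on each person.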

Four components of the Incidentalome:

  1. Wrong annotations
  2. Measurement error

Findings per genome: > 50 variants at highly conserved, disease-causing sites... in healthy individuals. Easily 20% annotation errors in OMIM, HGMD, dbSNP (e.g., due to bad genome assembly updates). Systematic errors when sequencing the same genome multiple times, let alone between different technology platforms (5-20% disagreement). For indels concordance only 20%. 10^3 to 10^5 errors per genome. Still, this problem can be fixed over time.

  3. Wrong priors (comparison group)

41,000 patients tested, 5 in 1000 homozygous for highly penetrant mutation for haemochromatosis. Only one with actual history: from 80% penetrance to less than 1%. Initially measured in families with (obviously) shared genetic background, but also shared environmental exposure. Compare to mouse strains: knockout only disease-causing in one out of three mouse strains. Clinical cohorts already heavily biased.
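
The drop in penetrance follows directly from the numbers quoted above. The sketch below just redoes that arithmetic, reading "only one with actual history" as one symptomatic person among the homozygotes. That reading is an interpretation of these notes, not a figure checked against the underlying study.

```python
tested = 41_000
homozygous_rate = 5 / 1000  # "5 in 1000" from the talk
symptomatic = 1             # "only one with actual history"

homozygotes = round(tested * homozygous_rate)      # expected carriers of two copies
observed_penetrance = symptomatic / homozygotes    # fraction of them with disease
```

That gives about 205 homozygotes and an observed penetrance of about 0.5%, far below the 80% estimated from affected families.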

I2B2 toolkit, viral dissemination of a population study system, now at 84 centers, comes with VMware image.

  4. Multiple comparisons

With one variant, picking the right comparison group is feasible. Assume only 10^4 important variants, not large enough groups around to compare to. iSnyder study as an alternative with a mechanistic evaluation of data.

All that said, what’s a genome worth? Predicted value to healthy individuals is very limited. Extremely valuable to sick patients, risk relatively slight.

ethics ✳ clinic ✳ sequencing ✳ Conference ✳ btg2012 

9.28.2012 Clinical diagnostic whole genome sequencing in a paediatric population

Elizabeth Worthey, The Medical College of Wisconsin, USA

Pediatric WGS based MDx pilot at MCW, 18 months, 3-6 cases reviewed / month, 24 approved by committee. 8-10 hours of counseling required for the consent process. A need to define the medically actionable term: treatment of manifestations, prevention of primary, secondary symptoms, etc. Where possible patients decide on what information is to be returned (within State laws). Broad range of diseases, MDx rate is ~40%, based on the pilot agreement to move forward.


  • how to ensure systematic, validated analysis? Complex data processing pipeline with controlled data updates, curated annotations, controlled development schedule, validation of each change (six month development schedule).
  • how to support clinical interpretation? Like other speakers, uses different categories of findings. Prioritize on likely error, likely pathogenicity.
  • handle regions not sequenced (false negatives); cross-sample analysis (GapMine scripts). Found ~60 genes with clinical utility always poorly covered. Most NEMO disease associated variants fall into such a coverage gap. Can use other technologies (very low coverage PacBio) to close such gaps.
  • how to handle bad reference data? WGS found candidate mutation in patient that looked promising, but turned out to be polymorphic when studying mutation databases (Goes back to previous talks on needing lots and lots of references). Tracing the annotation evidence paper trail indicates, though, that the mutation isn’t polymorphic. Data needs to be reviewed carefully (given the current state of public data).
  • when do you declare failure? Sanger confirmed variant with insignificant clinical findings. Six months later diagnosis due to new publications of patients with same mutation and similar phenotype.
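
The cross-sample gap analysis mentioned in the list above can be sketched as a set intersection: a gene is flagged only if it falls below a coverage threshold in every sample. This is an illustration of the idea, not the GapMine scripts; the input layout (mean coverage per gene per sample), the threshold and the gene names are all hypothetical.

```python
def always_poorly_covered(coverage_by_sample, min_coverage=10.0):
    """Return genes whose mean coverage is below min_coverage in every sample.

    coverage_by_sample: {sample_id: {gene: mean_coverage}}.
    Genes missing from a sample's table are not counted as poorly covered there.
    """
    low_sets = [
        {gene for gene, cov in genes.items() if cov < min_coverage}
        for genes in coverage_by_sample.values()
    ]
    return set.intersection(*low_sets) if low_sets else set()


# Toy coverage values with placeholder gene names, not real data.
toy_coverage = {
    "sample_1": {"GENE_A": 3.0, "GENE_B": 42.0, "GENE_C": 8.0},
    "sample_2": {"GENE_A": 5.5, "GENE_B": 38.0, "GENE_C": 31.0},
    "sample_3": {"GENE_A": 1.2, "GENE_B": 7.0, "GENE_C": 29.0},
}
gap_genes = always_poorly_covered(toy_coverage)
```

Genes that land in `gap_genes` are the ones where a negative result should not be trusted and where a second technology may be needed to close the gap.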

NGS is already providing great insight into human health. Need to be able to query each other’s variant data, ideally with associated phenotype information. Need to fix wrong or misleading information in databases.

Conference ✳ wgs ✳ btg2012 ✳ clinic ✳ sequencing