Primers and PrimeViews

  • Protein language models learn from diverse sequences spanning the evolutionary tree and have proven to be powerful tools for sequence design, variant effect prediction and structure prediction. What are the foundations of protein language models, and how are they applied in protein engineering?

    • Jeffrey A. Ruffolo
    • Ali Madani
    Primer
  • Models like ChatGPT and DALL-E 2 generate text and images in response to a text prompt. Protein engineering involves different data and goals, so how can generative models be useful there?

    • Chloe Hsu
    • Clara Fannjiang
    • Jennifer Listgarten
    Primer
  • A mathematical concept known as a de Bruijn graph turns the formidable challenge of assembling a contiguous genome from billions of short sequencing reads into a tractable computational problem.

    • Phillip E C Compeau
    • Pavel A Pevzner
    • Glenn Tesler
    Primer
  • Hierarchical models provide reliable statistical estimates for data sets from high-throughput experiments where measurements vastly outnumber experimental samples.

    • Hongkai Ji
    • X Shirley Liu
    Primer
  • Flux balance analysis is a mathematical approach for analyzing the flow of metabolites through a metabolic network. This primer covers the theoretical basis of the approach, several practical examples and a software toolbox for performing the calculations.

    • Jeffrey D Orth
    • Ines Thiele
    • Bernhard Ø Palsson
    Primer
  • When prioritizing hits from a high-throughput experiment, it is important to correct for random events that falsely appear significant. How is this done and what methods should be used?

    • William S Noble
    Primer
  • Networks in biology can appear complex and difficult to decipher. Merico et al. illustrate how to interpret biological networks with the help of frequently used visualization and analysis patterns.

    • Daniele Merico
    • David Gfeller
    • Gary D Bader
    Primer
  • Mapping the vast quantities of short sequence fragments produced by next-generation sequencing platforms is a challenge. What programs are available and how do they work?

    • Cole Trapnell
    • Steven L Salzberg
    Primer
  • Only a subset of single-nucleotide polymorphisms (SNPs) can be genotyped in genome-wide association studies. Imputation methods can infer the alleles of 'hidden' variants and use those inferences to test the hidden variants for association.

    • Eran Halperin
    • Dietrich A Stephan
    Primer
  • Only a subset of genetic variants can be examined in genome-wide surveys for genetic risk factors. How can a fixed set of markers account for the entire genome by acting as proxies for neighboring associations?

    • Eran Halperin
    • Dietrich A Stephan
    Primer
  • How can genome browsers help researchers to infer biological knowledge from data that might be misleading?

    • Melissa S Cline
    • W James Kent
    Primer
  • Decision trees have been applied to problems such as assigning protein function and predicting splice sites. How do these classifiers work, what types of problems can they solve and what are their advantages over alternatives?

    • Carl Kingsford
    • Steven L Salzberg
    Primer
  • The expectation maximization algorithm arises in many computational biology applications that involve probabilistic models. What is it good for, and how does it work?

    • Chuong B Do
    • Serafim Batzoglou
    Primer
  • Principal component analysis is often incorporated into genome-wide expression studies, but what is it and how can it be used to explore high-dimensional data?

    • Markus Ringnér
    Primer
  • Artificial neural networks have been applied to problems ranging from speech recognition to prediction of protein secondary structure, classification of cancers and gene prediction. How do they work and what might they be good for?

    • Anders Krogh
    Primer
  • Computational prediction of gene structure is crucial for interpreting genomic sequences. But how do the algorithms involved work and how accurate are they?

    • Michael R Brent
    Primer
  • Instrumentation aside, algorithms for matching mass spectra to proteins are at the heart of shotgun proteomics. How do these algorithms work, what can we expect of them and why is it so difficult to find protein modifications?

    • Edward M Marcotte
    Primer
  • Support vector machines (SVMs) are becoming popular in a wide variety of biological applications. But what exactly are SVMs, and how do they work? And what are their most promising applications in the life sciences?

    • William S Noble
    Primer
  • How can we computationally extract an unknown motif from a set of target sequences? What are the principles behind the major motif discovery algorithms? Which of these should we use, and how do we know we've found a 'real' motif?

    • Patrik D'haeseleer
    Primer
  • Sequence motifs are becoming increasingly important in the analysis of gene regulation. How do we define sequence motifs, and why should we use sequence logos instead of consensus sequences to represent them? Do they have any relation with binding affinity? How do we search for new instances of a motif in this sea of DNA?

    • Patrik D'haeseleer
    Primer