Miscellaneous Flashcards

(151 cards)

1
Q

ipSAE

A

ipSAE (interaction prediction Score from Aligned Errors): an interface-focused confidence score for complexes computed from predicted aligned errors (PAE); higher is better.

2
Q

pLDDT

A

pLDDT (predicted Local Distance Difference Test): per-residue confidence score (0–100); higher means the local geometry is more reliable.

3
Q

Which metrics can be used to assess predicted protein structures?

A

Per-residue/local: pLDDT, lDDT.
Global/fold: pTM, TM-score, GDT_TS, RMSD.
Complex/interface: ipTM, DockQ, (ipSAE), interface RMSD.
Model quality/physics: clashes, Ramachandran/MolProbity scores.
Relative placement: PAE heatmap.

4
Q

How does AlphaFold model protein complexes?

A

AlphaFold-Multimer predicts all chains jointly: it concatenates chain sequences with chain breaks, builds/uses paired MSAs for interacting partners, lets attention/triangle updates operate across chains, and ranks models with multimer confidence (e.g., ipTM + pTM).

5
Q

How does RoseTTAFold differ from AlphaFold?

A

RoseTTAFold uses a 3-track network (1D sequence, 2D pair/distance map, and 3D coordinates) with information exchange between tracks (incl. SE(3)-equivariant 3D updates). AlphaFold2’s trunk is the Evoformer (MSA + pair) followed by a separate structure module; AlphaFold2 achieved higher CASP14 accuracy, while RoseTTAFold is a different, more explicitly 3D-aware three-track design.

6
Q

What modules does AlphaFold have?

A

AlphaFold2: input embedder, Evoformer trunk (MSA + pair representations), Structure Module, recycling, and confidence heads. AlphaFold3: input embedder, MSA module, Pairformer trunk, Diffusion Module, and confidence heads.

7
Q

What does the Pairformer do in AlphaFold?

A

In AlphaFold3, the Pairformer is the main trunk that updates sequence/pair representations (and conditioning features) with attention + triangle-style operations, producing interaction-aware features that guide the downstream diffusion/structure generation.

8
Q

What does the Evoformer do in AlphaFold?

A

In AlphaFold2, the Evoformer iteratively updates two representations—MSA (evolutionary info) and pair (residue–residue relations)—using attention and triangle updates, producing features used by the Structure Module to place atoms.

9
Q

How does OpenFold differ from AlphaFold?

A

OpenFold is a trainable, fully open-source PyTorch reimplementation/retraining of AlphaFold2. It reproduces the AF2 architecture closely but provides training code, configurable pipelines, and engineering changes/optimizations for research and different hardware.

10
Q

How does AlphaFold3 differ from AlphaFold2?

A

AlphaFold3 uses a diffusion-based all-atom generative approach and can jointly model complexes beyond proteins (DNA/RNA, small molecules/ligands, ions, modified residues). AF2 mainly targets proteins (and protein–protein complexes via Multimer) with an Evoformer + Structure Module pipeline.

11
Q

How does AlphaFold2 differ from AlphaFold1?

A

AlphaFold2 is end-to-end: it uses attention-based Evoformer + a learned structure module (with recycling) to directly output 3D coordinates and confidence. AlphaFold1 relied more on predicting distance/angle distributions and using separate structure-building/optimization steps with stronger template/fragment-style components.

12
Q

How does Boltz differ from AlphaFold?

A

Boltz (Boltz-1/2) is an open-source family of diffusion-based biomolecular interaction models aimed at AlphaFold3-like complex prediction (proteins with ligands/nucleic acids, etc.). AlphaFold3 is the DeepMind/Isomorphic model; Boltz emphasizes openness (weights/code) and (in Boltz-2) affinity prediction.

13
Q

How does Boltz-2 differ from Boltz-1?

A

Boltz-2 extends Boltz-1 with explicit binding-affinity prediction (e.g., protein–ligand), broader multimodal performance improvements, and updated training/engineering; Boltz-1 focused primarily on high-accuracy complex structure prediction.

14
Q

How much did it cost to train AlphaFold, Boltz and OpenFold?

A

Exact $ costs are generally not disclosed; common reported compute: AlphaFold2 used ~128 TPUv3 cores and took ~11 days to converge; OpenFold reported training on 128 A100 GPUs in ~8+ days. Boltz-1/2 training hardware/time is less consistently public; Boltz-2 training is reported as enabled by Recursion’s BioHive-2 supercomputer (exact cost depends on pricing/ownership).

15
Q

What kinds of library construction methods are there?

A

Common NGS library types: (1) shotgun/fragmentation + adapter ligation, (2) amplicon-PCR libraries, (3) tagmentation-based (e.g., Nextera), (4) hybrid-capture/enrichment libraries, (5) long-read ligation/rapid kits. For synthetic DNA/protein libraries: Gibson/Golden Gate/restriction-ligation assembly, and display libraries (phage/yeast/mRNA) for variants.

16
Q

What loss function was used in training AlphaFold?

A

Primary structural loss: FAPE (Frame Aligned Point Error). Plus auxiliary losses such as distogram cross-entropy, masked-MSA prediction, torsion/angle losses, structural violation/clash losses, and confidence-head losses (pLDDT/PAE).

17
Q

What are some key PyMOL commands?

A

load/fetch, remove, select, show/hide (cartoon, sticks, spheres, surface), color, spectrum, zoom/orient, center, align/super, rms/rms_cur, distance (dist), label, save, png.

18
Q

What is a SELECT statement in SQL?

A

A query that retrieves data from one or more tables/views: SELECT <columns/expressions> FROM <table> with optional WHERE, GROUP BY, HAVING, ORDER BY, LIMIT.
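A minimal runnable sketch of such a query, using Python's sqlite3 and a hypothetical `proteins` table:

```python
import sqlite3

# Build a tiny in-memory table (illustrative data), then SELECT with
# WHERE / ORDER BY / LIMIT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proteins (name TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO proteins VALUES (?, ?)",
    [("ubiquitin", 76), ("lysozyme", 129), ("myoglobin", 153)],
)

rows = conn.execute(
    "SELECT name, length FROM proteins "
    "WHERE length > 100 ORDER BY length DESC LIMIT 2"
).fetchall()
print(rows)  # [('myoglobin', 153), ('lysozyme', 129)]
conn.close()
```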

19
Q

Which joins are there in SQL?

A

INNER JOIN, LEFT (OUTER) JOIN, RIGHT (OUTER) JOIN, FULL (OUTER) JOIN, CROSS JOIN; plus SELF JOIN (joining a table to itself).
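A sketch contrasting INNER vs LEFT JOIN on two hypothetical tables (sqlite3; note SQLite only added RIGHT/FULL OUTER JOIN in version 3.39):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genes (id INTEGER, symbol TEXT);
CREATE TABLE hits  (gene_id INTEGER, score REAL);
INSERT INTO genes VALUES (1, 'TP53'), (2, 'BRCA1'), (3, 'EGFR');
INSERT INTO hits  VALUES (1, 0.9), (3, 0.4);
""")

# INNER JOIN keeps only matching rows; LEFT JOIN keeps all left rows,
# filling missing right-side columns with NULL.
inner = conn.execute(
    "SELECT g.symbol, h.score FROM genes g "
    "JOIN hits h ON h.gene_id = g.id ORDER BY g.id"
).fetchall()
left = conn.execute(
    "SELECT g.symbol, h.score FROM genes g "
    "LEFT JOIN hits h ON h.gene_id = g.id ORDER BY g.id"
).fetchall()
print(inner)  # [('TP53', 0.9), ('EGFR', 0.4)]
print(left)   # [('TP53', 0.9), ('BRCA1', None), ('EGFR', 0.4)]
conn.close()
```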

20
Q

How are the attentions calculated?

A

Scaled dot-product attention: scores = QKᵀ/√dₖ (+ mask), weights = softmax(scores), output = weights·V. Multi-head attention repeats this in parallel heads then concatenates and projects.
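The formula above, as a single-head numpy sketch (shapes are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """weights = softmax(Q·Kᵀ/√d_k + mask); output = weights·V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)        # (..., L_q, L_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)             # masked positions → ~-inf
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(2, 5, 8)) for _ in range(3))  # (batch, length, d_k)
out, w = scaled_dot_product_attention(Q, K, V)
```

Multi-head attention would run this per head on sliced projections, then concatenate and apply an output projection.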

21
Q

What are the dimensions of matrix multiplication?

A

If A is (m×n) and B is (n×p), then C = A·B is (m×p). The inner dimensions (n) must match.

22
Q

What is the complexity of matrix multiplication?

A

Naïve multiplication is O(m·n·p). For square n×n it’s O(n³); faster algorithms exist (e.g., Strassen ~O(n^{2.81})) but are less common in practice.

23
Q

What is small o() notation?

A

f(n) = o(g(n)) means f grows strictly slower than g: limₙ→∞ f(n)/g(n) = 0.

24
Q

What is large O() notation?

A

f(n) = O(g(n)) means f is asymptotically bounded above by g up to a constant: ∃c,n₀ s.t. f(n) ≤ c·g(n) for n≥n₀.

25
How do you calculate an MSA?
1) Find homologs (BLAST/HHblits/JackHMMER against sequence DBs). 2) Build an initial alignment (progressive/HMM-based). 3) Iteratively refine/realign; optionally weight sequences and trim/filter. Output is an aligned matrix of sequences with gaps.
26
What alignment algorithms are there?
Pairwise: Needleman–Wunsch (global), Smith–Waterman (local), Gotoh (affine gaps). Heuristics: BLAST. Profile/HMM: HMMER, HHsearch/HHalign. Multiple alignment: Clustal Omega, MAFFT, MUSCLE, T-Coffee; plus iterative refinement and consistency methods.
27
Why can we not calculate MSAs for antibodies?
Antibodies are generated by V(D)J recombination and somatic hypermutation, so each sequence (especially CDR3) is often unique with few true homologs; the resulting “MSA” is shallow/poorly defined and doesn’t provide the evolutionary constraints typical protein MSAs do.
28
What experimental methods can be used to determine antibody specificity?
ELISA; Western blot; flow cytometry; immunofluorescence/IHC; immunoprecipitation; protein/peptide microarrays; competition/epitope binning; knockout/knockdown controls; cross-reactivity panels; peptide scanning/alanine scanning for epitope mapping.
29
What experimental methods can be used to determine antibody affinity?
SPR (Biacore), BLI (Octet), ITC, microscale thermophoresis (MST), KinExA, equilibrium binding titrations (e.g., flow/ELISA-based) to estimate Kd and kinetics.
30
How does flow cytometry work?
Cells/particles are labeled (often with fluorescent antibodies) and pass single-file through a laser; detectors measure forward scatter (size), side scatter (granularity), and fluorescence channels; gating/compensation yields populations and marker expression.
31
How does the MSA Transformer work?
It takes an MSA as a 2D input (sequences × positions) and applies axial attention: row attention within sequences and column attention across sequences at each site. Trained with masked-language modeling on MSAs, it learns embeddings and attention patterns that correlate with contacts/structure.
32
What is special about AbLang?
AbLang is an antibody-specific language model trained on large antibody repertoires (heavy or light chains). It captures antibody-specific patterns (framework/CDRs) and can impute missing residues and produce useful embeddings better than general protein LMs for antibody-centric tasks.
33
Which tokenizer algorithms are there?
Character/byte tokenization; WordPiece; BPE (incl. byte-level BPE); Unigram language-model tokenizer (SentencePiece); whitespace/regex tokenizers; hybrid approaches (e.g., sentencepiece with byte fallback).
34
How can positional embeddings be calculated?
Absolute: learned position embeddings or fixed sinusoidal. Relative: learned relative bias/embeddings (e.g., Shaw), ALiBi. Rotary/phase: RoPE (rotary positional embeddings) and RoPE variants/scaling.
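The fixed sinusoidal variant can be computed directly (a sketch of the classic PE[p, 2i] = sin(p/10000^(2i/d)), PE[p, 2i+1] = cos(·) scheme):

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Fixed absolute positional encodings (Transformer-style sinusoids)."""
    pos = np.arange(seq_len)[:, None]                  # (L, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)  # (L, d/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dims: sine
    pe[:, 1::2] = np.cos(angles)   # odd dims: cosine
    return pe

pe = sinusoidal_positions(seq_len=16, d_model=8)
# row 0 is [0, 1, 0, 1, ...] since sin(0)=0, cos(0)=1
```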
35
What are methods to extend the context length in transformer models?
Efficient attention (FlashAttention, sparse/windowed, block-sparse), linear/approx attention, recurrence/memory (Transformer-XL), retrieval augmentation (RAG), RoPE scaling/YaRN/NTK scaling, ALiBi, chunking + sliding window, KV-cache optimizations, and sequence compression/summarization.
36
How does a random forest model work?
An ensemble of decision trees trained on bootstrap-resampled data with random feature subsampling at each split; predictions are averaged (regression) or majority-voted (classification) to reduce variance.
37
What is the decision criterion in random forests?
For classification: maximize information gain / reduce impurity (Gini or entropy). For regression: minimize MSE/variance of targets in child nodes.
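The Gini criterion for classification can be sketched in a few lines (toy labels, illustrative split):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - Σ p_k²; 0 for a pure node, max for a uniform mix."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(parent, left, right):
    """Impurity reduction from a candidate split (what the tree maximizes)."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

parent = ["A"] * 5 + ["B"] * 5
perfect = gini_gain(parent, ["A"] * 5, ["B"] * 5)                       # pure children
mixed = gini_gain(parent, ["A", "A", "A", "B", "B"],
                  ["A", "A", "B", "B", "B"])                            # barely informative
```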
38
What is gradient boosting?
An additive ensemble method that builds weak learners (often trees) sequentially, each new learner fitting the residuals/negative gradient of the current model (e.g., XGBoost/LightGBM/CatBoost).
39
What are some clustering methods?
k-means, hierarchical (agglomerative), DBSCAN/HDBSCAN, Gaussian mixture models (EM), spectral clustering, mean-shift, affinity propagation.
40
What is performance@K?
A top‑K evaluation: measure how good the first K ranked predictions are (e.g., precision@K, recall@K, hit-rate@K, NDCG@K).
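For instance, precision@K as a small function (hypothetical document IDs):

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that are in the relevant set."""
    top_k = ranked_ids[:k]
    return sum(1 for item in top_k if item in relevant_ids) / k

ranked = ["d3", "d1", "d7", "d2", "d9"]   # model ranking, best first
relevant = {"d1", "d2", "d4"}             # ground-truth relevant set
p3 = precision_at_k(ranked, relevant, 3)  # top-3 = d3, d1, d7 → 1/3
```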
41
How can you create a model with PyTorch?
Define an nn.Module (layers + forward), choose a loss, optimizer, and DataLoader; run a training loop: forward → loss → backward → optimizer.step() with eval on a validation set.
42
How does JAX differ from PyTorch?
JAX is more functional and emphasizes composable transforms (grad, jit, vmap, pmap) compiled with XLA; arrays are typically immutable and you often write pure functions. PyTorch is imperative/eager by default (with optional torch.compile) and uses nn.Module-centric APIs.
43
What is data parallel?
Replicate the full model on each device, split the batch across devices, compute gradients locally, then all-reduce gradients to keep parameters in sync.
44
What is pipeline parallel?
Split the model layers into stages across devices and run microbatches through the stages in a pipeline (often with 1F1B scheduling) to increase throughput and fit larger models.
45
What is model parallel?
Partition the model’s parameters across devices (e.g., tensor/“sharded” weights) so a single layer’s computation is split across GPUs.
46
What is 3D parallel?
Combining data parallel + tensor/model parallel + pipeline parallel (often called 3D parallelism) to scale to very large models efficiently.
47
How would you parallelize a large model across GPUs?
Typical recipe: tensor parallel within a node (shard big matmuls), pipeline parallel across nodes (split layers), and data parallel across replicas; add optimizer/gradient sharding (ZeRO/FSDP) and activation checkpointing to reduce memory.
48
Which operations are needed to implement pipeline parallel from scratch?
Partition layers into stages; microbatch the input; send/recv activations between stages; run forward passes; run backward passes with gradient send/recv; accumulate gradients over microbatches; apply optimizer step; optionally checkpoint/recompute activations and manage pipeline scheduling (e.g., 1F1B).
49
What are the dimensions of keys, queries and values?
Common layout: Q,K,V are (B, H, L, d) where B=batch, H=#heads, L=sequence length, d=head_dim (so d_model = H·d). Attention weights are (B, H, L, L).
50
What are model embeddings?
Learned dense vector representations of discrete inputs (tokens, residues, etc.) produced by an embedding matrix; can also refer to intermediate hidden states used as representations for downstream tasks.
51
How can we get embeddings from a diffusion model?
Take intermediate activations from the denoiser/UNet (or diffusion transformer) as features; or use the latent representation (e.g., VAE latent in latent diffusion) as an embedding; optionally average‑pool across timesteps or use the final denoised latent.
52
How does LoRA fine-tuning work?
Freeze the base weights W and learn a low-rank update ΔW = (α/r)·B·A (rank r). During training only A and B are updated; at inference ΔW is added (or merged) into W.
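A minimal numpy sketch of the low-rank update (dimensions and init are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 32, 4, 8

W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection (rank r)
B = np.zeros((d_out, r))                 # trainable up-projection, init 0 so ΔW starts at 0

def lora_forward(x):
    """y = W·x + (α/r)·B·(A·x); only A and B would receive gradients."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
assert np.allclose(lora_forward(x), W @ x)  # ΔW = 0 before any training
trainable = A.size + B.size                 # 384 params vs W.size = 2048
```

At inference, ΔW = (α/r)·B·A can be merged into W so there is no extra latency.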
53
What parameter-efficient fine-tuning methods are there?
Adapters, LoRA/QLoRA, prefix tuning, prompt tuning, P-tuning, IA³, BitFit (bias-only), partial layer fine-tuning/linear probing, and low-rank/soft prompts.
54
What is the difference between SFT and RL?
SFT (supervised fine-tuning) learns from labeled targets (e.g., next-token or instruction-response pairs). RL optimizes a policy to maximize a reward signal (often from human/AI preferences) using RL algorithms (policy gradients, PPO, etc.).
55
What is Q-learning?
An off-policy RL method that learns action-value Q(s,a): Q ← (1−α)Q + α[r + γ·max_a' Q(s',a')]. The greedy policy picks argmax_a Q(s,a).
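A tabular sketch of that update on a toy chain MDP (states 0–4, reward 1 on reaching state 4; ties broken toward "right" just to keep the sketch simple):

```python
import random

random.seed(0)
GOAL = 4                                   # chain 0-1-2-3-4; terminal reward at 4
Q = [[0.0, 0.0] for _ in range(GOAL + 1)]  # Q[state][action]; 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.1

def greedy(s):
    return 0 if Q[s][0] > Q[s][1] else 1   # ties → right

for _ in range(300):                       # epsilon-greedy episodes
    s = 0
    while s != GOAL:
        a = random.randrange(2) if random.random() < eps else greedy(s)
        s_next = min(max(s + (1 if a == 1 else -1), 0), GOAL)
        r = 1.0 if s_next == GOAL else 0.0
        # Q-learning: Q(s,a) += α[r + γ·max_a' Q(s',a') − Q(s,a)]
        target = r + gamma * (0.0 if s_next == GOAL else max(Q[s_next]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
```

After training, the greedy policy moves right in every state, and values decay with distance from the goal (Q(3,right) ≈ 1, Q(0,right) ≈ γ³).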
58
How can you prevent overfitting in general?
More data; proper train/val split; regularization (L2/weight decay), dropout; early stopping; data augmentation; simpler model; ensembling; cross-validation; noise injection.
59
How can you prevent overfitting in language models?
Deduplicate/clean data; dropout + weight decay; early stopping on validation perplexity; smaller model or fewer steps; label smoothing; augmentation/noising; mixout; curriculum; regularize with constraints (e.g., KL to base) during tuning; evaluate on held-out domains.
60
What is dropout?
A regularization technique that randomly zeroes a fraction of activations (or weights) during training so the network can’t rely on any single pathway; scaled appropriately at inference.
61
What are skip connections?
Residual connections that add a block’s input to its output (x + f(x)), improving gradient flow and enabling deeper networks.
62
What are the most important hyperparameters in language models?
Learning rate (and schedule/warmup), batch size, sequence length/context, model size (layers/width/heads), optimizer settings (βs, ε), weight decay, dropout, gradient clipping, training steps, and tokenization/vocab.
63
How do diffusion models work?
They define a forward process that gradually adds noise to data, then learn a reverse denoising model that removes noise step-by-step to generate samples from noise (often by predicting noise/score).
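The forward (noising) process has a closed form, sketched here with a DDPM-style linear schedule (numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)  # ᾱ_t = Π_s (1 − β_s)

def q_sample(x0, t):
    """Closed-form forward noising: x_t = √ᾱ_t·x0 + √(1−ᾱ_t)·ε."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.normal(size=(64,))
x_early, x_late = q_sample(x0, 10), q_sample(x0, T - 1)
# early t barely perturbs x0; by t ≈ T the sample is nearly pure noise
```

Training then teaches a network to invert this, typically by predicting ε from (x_t, t).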
64
Why do diffusion models use a stepwise process?
The reverse distribution is hard to model in one jump; a Markov chain (or discretized SDE/ODE) of small denoising steps makes training stable and generation controllable, gradually refining coarse structure into detail.
65
What are the most important hyperparameters in diffusion models?
#steps/timesteps, noise schedule (β/α), model architecture/capacity, learning rate, guidance scale (if used), sampler (DDPM/DDIM/ODE), batch size, latent size/resolution, and conditioning dropout.
66
What diffusion language models exist?
Discrete/continuous text diffusion examples include D3PM (discrete diffusion), Diffusion-LM, SeqDiffuSeq (diffusion for seq2seq), and masked/iterative denoisers inspired by diffusion (e.g., MaskGIT-style).
67
What are alpha and r in LoRA fine-tuning?
r (rank) is the low-rank dimension of the update matrices; α (alpha) is a scaling hyperparameter. The effective update is typically scaled by α/r.
68
How would you host a webapp on AWS?
Static sites: S3 + CloudFront (+ Route53 + ACM). Dynamic apps: ECS/Fargate or Elastic Beanstalk or EC2 behind an ALB; serverless: Lambda + API Gateway. Add CI/CD (CodePipeline/GitHub Actions), logging (CloudWatch), secrets (SSM/Secrets Manager), and a database (RDS/DynamoDB) if needed.
69
What are Autoencoders?
Neural networks trained to reconstruct inputs via an encoder → latent code → decoder; used for dimensionality reduction, denoising, and representation learning.
70
What are VAEs?
Variational Autoencoders: probabilistic autoencoders where the encoder outputs a distribution (μ,σ) over latents; trained with reconstruction loss + KL divergence to a prior, enabling sampling via the reparameterization trick.
71
How do you calculate the standard deviation?
Population: σ = sqrt( (1/n)·Σ(xᵢ−μ)² ). Sample: s = sqrt( (1/(n−1))·Σ(xᵢ−x̄)² ).
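Both formulas, checked against the stdlib on a classic toy dataset:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # mean μ = 5

n = len(data)
mu = sum(data) / n
sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5        # population (1/n)
s = (sum((x - mu) ** 2 for x in data) / (n - 1)) ** 0.5      # sample (1/(n−1))

assert abs(sigma - statistics.pstdev(data)) < 1e-12
assert abs(s - statistics.stdev(data)) < 1e-12
print(sigma)  # 2.0
```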
72
What is accuracy?
(TP + TN) / (TP + TN + FP + FN).
73
What is precision?
TP / (TP + FP).
74
What is recall?
TP / (TP + FN) (also called sensitivity/TPR).
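The three definitions above, computed from toy confusion-matrix counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Toy example: 8 TP, 2 FP, 85 TN, 5 FN out of 100 predictions
m = classification_metrics(tp=8, fp=2, tn=85, fn=5)
# accuracy = 93/100, precision = 8/10, recall = 8/13
```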
75
What is the precision-accuracy curve?
Usually called the precision–recall (PR) curve: precision vs recall as you vary the classification threshold; area under it is Average Precision (AP).
76
What is the ROC curve?
Plot of TPR (recall) vs FPR (FP/(FP+TN)) as the threshold varies; AUC summarizes ranking quality.
77
What is overfitting?
When a model fits training data (including noise) too closely and fails to generalize, leading to a large train–test performance gap.
78
What is the bitter lesson?
Rich Sutton’s observation: methods that scale with compute and data (general learning + search) tend to outperform clever human-engineered, domain-specific tricks in the long run.
79
What is the curse of dimensionality?
As dimensions grow, space becomes sparse: distances concentrate, nearest neighbors get far away, and the amount of data needed to cover the space grows exponentially.
80
What is the variance ... tradeoff?
Bias–variance tradeoff: increasing model complexity often lowers bias but raises variance; the goal is to minimize total generalization error.
81
How do you calculate the covariance?
Sample covariance: cov(X,Y) = (1/(n−1))·Σ(xᵢ−x̄)(yᵢ−ȳ). Population uses 1/n.
82
What is n-fold cross-validation?
Split data into n folds; train on n−1 folds and validate on the held-out fold; repeat n times and average metrics.
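A minimal index-splitting sketch (no shuffling or stratification, which real pipelines usually add):

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs; each sample lands in exactly one val fold."""
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size

folds = list(kfold_indices(n_samples=10, n_folds=3))
# fold sizes 4, 3, 3; every index appears in exactly one validation fold
```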
83
What are benefits and drawbacks of n-fold or leave-one-out cross-validation?
Benefits: uses data efficiently; more stable estimate than a single split. Drawbacks: n trainings (expensive); folds may still be correlated. LOOCV (n = dataset size) has very low bias but high variance and is especially computationally expensive.
84
What’s the difference between PAE, pLDDT, pTM, and ipTM?
pLDDT: per‑residue local confidence (0–100). PAE: predicted aligned error between residue pairs (useful for domain/interface placement). pTM: predicted TM-score for overall fold (single-chain/global). ipTM: predicted TM-score focused on inter-chain interface quality (complex ranking).
85
How do you interpret a PAE heatmap for domain movement vs interface confidence?
Low PAE blocks along the diagonal indicate confident local/domain structure; low PAE between two domains/chains indicates confident relative placement. High off-diagonal PAE suggests uncertain domain orientation or flexible/alternative arrangements; interfaces with low PAE at contacting regions are more trustworthy.
86
What’s the difference between RMSD, TM-score, and lDDT (when is each misleading)?
RMSD is distance-based and sensitive to outliers/size and domain motions; TM-score is length-normalized and less sensitive to local errors, better for overall topology; lDDT is local distance agreement, robust to rigid-body domain shifts. RMSD can look bad for correct multi-domain models; TM-score can hide local errors; lDDT can be high even if domain orientation is wrong.
87
What’s the difference between model ranking vs model confidence in AF(-Multimer)?
Ranking is which model AF thinks is best among its outputs (e.g., by ipTM+pTM). Confidence is the predicted correctness level (pLDDT/PAE/ipTM). A top-ranked model can still be low-confidence if all candidates are uncertain.
88
What are common failure modes of AF/complex prediction?
Disordered regions or alternative conformations; incorrect domain orientations (hinges); wrong stoichiometry/assembly; weak/transient interfaces; missing ligands/cofactors/PTMs; induced-fit changes; membrane/low-homology targets; antibody CDR flexibility and antigen-dependent rearrangements.
89
What is recycling in AlphaFold and why does it help?
Recycling feeds the model’s own predicted representations/coordinates back into the trunk for additional refinement iterations. It helps correct errors iteratively, improving long-range consistency and interfaces.
90
What is template usage and when do templates help/hurt?
Templates provide structural priors from known related structures. They help when homologous structures exist (especially for low-MSA targets) and can guide domain arrangement; they can hurt if the template is too distant, wrong conformation, or biases the model away from the true structure.
91
What is MSA depth/effective sequences (Neff) and how does it affect accuracy?
MSA depth is the number of aligned homologs; Neff is the diversity‑weighted effective count. Higher Neff usually provides stronger coevolution signals and improves accuracy; shallow/low‑Neff MSAs often yield lower confidence and more errors.
92
What’s the difference between global docking and template-based docking?
Global docking searches relative orientations/positions of partners without assuming a known complex; template-based docking uses a known similar complex/interface as a starting point/constraint.
93
What is conformational selection vs induced fit (and why it matters for docking)?
Conformational selection: partners already sample binding-competent conformations and binding selects them. Induced fit: binding triggers conformational change. Docking is harder with large induced-fit changes because a single rigid structure may not represent the bound state.
94
What is DockQ and what does it measure?
DockQ is a composite score for protein–protein docking quality combining interface RMSD, ligand RMSD, and fraction of native contacts; it correlates with CAPRI quality categories.
95
What are common interface features and how to compute them?
Buried surface area (BSA), hydrogen bonds, salt bridges, hydrophobic contacts, shape complementarity. Compute via tools like PDBePISA/FreeSASA for BSA, and contact/H-bond analysis via PyMOL, Biopython, or dedicated interface analyzers.
96
What experimental data can be used to restrain modeling?
Cryo‑EM density maps, SAXS profiles, crosslinking mass spec (XL‑MS), NMR restraints (NOEs/RDCs), mutagenesis/epitope mapping, FRET distances, hydrogen–deuterium exchange (HDX‑MS), and co-evolution/paired MSAs.
97
What’s the difference between homologs vs orthologs vs paralogs?
Homologs share common ancestry (umbrella term). Orthologs diverged via speciation (often similar function). Paralogs diverged via gene duplication (may change function).
98
What is a profile HMM and why is it good for remote homology?
A profile Hidden Markov Model represents position-specific residue and gap probabilities from an MSA. It captures conservation patterns and indels better than pairwise alignment, improving detection of distant homologs.
99
What are gap penalties (linear vs affine) and why do they matter?
Gap penalties discourage insertions/deletions. Linear charges per gap character; affine uses gap_open + gap_extend, reflecting biology (few long gaps preferred over many short ones). They affect alignment sensitivity/specificity.
100
What is sequence weighting in MSAs and why do we do it?
Weighting down-weights redundant/closely related sequences so diverse sequences contribute more. It reduces phylogenetic bias and improves downstream statistics like coevolution and profiles.
101
What are paired MSAs and when are they useful (vs dangerous)?
Paired MSAs match sequences of two interacting proteins from the same organism to capture inter-protein coevolution. Useful for stable conserved interactions; dangerous if pairing is wrong (paralogs, promiscuous interactions), which can mislead models.
102
What are CDRs and framework regions, and common numbering schemes (IMGT/Kabat/Chothia)?
Framework regions (FR1–FR4) form the antibody scaffold; CDR1–CDR3 are hypervariable loops that often dominate binding. IMGT/Kabat/Chothia are conventions for assigning residue numbers/loop boundaries; they differ slightly in definitions, especially around CDRs.
103
What is V(D)J recombination and somatic hypermutation?
V(D)J recombination assembles variable regions from V, D (heavy only), and J gene segments to create initial diversity. Somatic hypermutation introduces point mutations in activated B cells, followed by selection to increase affinity.
104
What is clonal expansion and how do repertoires form?
After antigen activation, B cells with productive receptors proliferate (clonal expansion) and diversify via mutation; the repertoire is the population of B-cell receptor/antibody sequences shaped by recombination, selection, and exposure history.
105
What’s the difference between affinity and avidity?
Affinity is the strength of a single binding site interaction (often Kd). Avidity is the overall functional binding strength from multivalent interactions (e.g., IgG bivalent binding), which can be much stronger than affinity alone.
106
What is epitope binning and how is it measured (SPR/BLI competition)?
Epitope binning groups antibodies by whether they compete for the same/overlapping epitope. In SPR/BLI, one antibody captures antigen, then a second antibody is flowed; reduced binding indicates competition (same bin).
107
What’s the difference between neutralization and binding?
Binding means the antibody recognizes the antigen; neutralization means it blocks biological function (e.g., viral entry), often requiring binding to specific functional epitopes and sufficient potency.
108
What assays detect polyreactivity/off-target binding?
HEp‑2 cell staining, polyspecificity reagent (PSR) assays, binding to dsDNA/LPS/insulin panels, protein microarrays, tissue cross-reactivity, and off-target panels via ELISA/SPR/BLI/flow.
109
What is the difference between self-attention and cross-attention?
Self-attention uses Q,K,V from the same sequence (intra-sequence context). Cross-attention uses queries from one sequence (e.g., decoder) and keys/values from another (e.g., encoder or conditioning input).
110
What does the softmax temperature do?
Temperature scales logits before softmax. Higher temperature (T>1) makes distributions flatter (more uncertainty); lower (T<1) makes them sharper (more confident).
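A quick demonstration of the flattening/sharpening effect:

```python
import math

def softmax_with_temperature(logits, T):
    """softmax(logits / T): T > 1 flattens, T < 1 sharpens the distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, T=0.5)  # peaked: top prob ≈ 0.86
flat = softmax_with_temperature(logits, T=2.0)   # flattened: top prob ≈ 0.50
```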
111
Why does attention use √d scaling?
Dot products grow in magnitude with dimension d, which can push softmax into saturation and harm gradients. Dividing by √d keeps scores in a reasonable range for stable training.
112
What are KV caches and how do they speed up decoding?
In autoregressive generation, keys/values from previous tokens are stored (cached) so each new token only computes attention against cached K,V instead of recomputing for the whole prefix, reducing per-step cost.
113
What are pre-norm vs post-norm transformers?
Pre-norm applies LayerNorm before the sublayer (x + f(LN(x))); post-norm applies LayerNorm after the residual (LN(x + f(x))). Pre-norm generally stabilizes deep training.
114
What is masked language modeling vs causal language modeling?
MLM predicts masked tokens using bidirectional context (e.g., BERT). Causal LM predicts next tokens left-to-right with a causal mask (e.g., GPT).
115
What are common LR schedules (cosine, linear warmup, one-cycle) and why?
Warmup prevents early instability; cosine decay gradually lowers LR for convergence; one-cycle increases then decreases LR to speed training and improve generalization. Schedules balance fast learning early with fine-tuning late.
116
What is gradient clipping and when do you need it?
Clipping caps gradient norm/value to prevent exploding gradients, especially in RNNs, very deep nets, or large-batch/unstable training regimes.
117
What is label smoothing and when is it helpful/harmful?
It replaces hard one-hot targets with a slightly smoothed distribution to reduce overconfidence and improve calibration. It can harm tasks needing exact probabilities or when data is already noisy/low-signal.
118
What’s the difference between calibration and accuracy (ECE, reliability diagrams)?
Accuracy measures correctness; calibration measures whether predicted probabilities match true frequencies. ECE summarizes miscalibration; reliability diagrams plot predicted confidence vs observed accuracy.
119
What is confusion matrix and derived metrics (F1, MCC, balanced accuracy)?
Confusion matrix counts TP/FP/TN/FN. F1 = harmonic mean of precision and recall; MCC measures correlation between predictions and labels (robust to imbalance); balanced accuracy averages recall across classes.
120
What is class imbalance and how do you handle it?
When classes have very different frequencies. Handle via reweighting, resampling (over/under/SMOTE), appropriate metrics (PR-AUC), threshold tuning, focal loss, and collecting more minority data.
121
What is ZeRO / FSDP and what does it shard?
ZeRO (and PyTorch FSDP) shard training state across devices. Depending on stage/config they shard optimizer states, gradients, and parameters, reducing per-GPU memory while using collectives (all-gather/reduce-scatter).
122
What is activation checkpointing (trade compute for memory)?
Instead of storing all activations for backprop, you save a subset and recompute others during backward. It reduces memory at the cost of extra forward compute.
123
What communication ops dominate training (all-reduce, all-gather, reduce-scatter)?
All-reduce aggregates gradients across replicas (data parallel). All-gather collects sharded tensors (FSDP/tensor parallel). Reduce-scatter sums and distributes shards (often paired with all-gather for efficient sharded training).
124
What’s the difference between throughput and latency?
Throughput is samples/tokens processed per second (rate). Latency is time to produce one response/output (delay). Optimizations often trade one for the other.
125
What is mixed precision (FP16/BF16) and what can go wrong?
Using lower-precision dtypes speeds training and reduces memory. Issues include overflow/underflow and instability; mitigations include loss scaling, BF16 use, careful normalization, and keeping some ops in FP32.
126
What’s the difference between DDPM, DDIM, and ODE samplers?
DDPM uses stochastic reverse diffusion (many steps). DDIM is a deterministic (or less stochastic) variant enabling fewer steps with similar quality. ODE samplers integrate a probability-flow ODE (deterministic) and can use adaptive step solvers.
127
What is classifier-free guidance?
A conditioning technique that trains with random condition dropout; at sampling, combine conditional and unconditional predictions and scale their difference to steer samples toward the condition without an external classifier.
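The sampling-time combination can be sketched in a few lines (function name and scalar inputs are illustrative; in practice these are noise-prediction tensors):

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: start from the unconditional prediction and
    move toward (scale=1) or past (scale>1) the conditional one."""
    return [u + guidance_scale * (c - u)
            for u, c in zip(eps_uncond, eps_cond)]
```

Scale 0 recovers the unconditional model, 1 the conditional model, and values above 1 amplify the conditioning signal.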
128
What is a noise schedule and why does it matter?
It defines how much noise is added per timestep (β/α schedule). It affects training signal distribution and sampling quality/speed; good schedules improve stability and reduce required steps.
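A sketch of the common DDPM-style linear β schedule and the cumulative ᾱ (alpha-bar) it induces; the default endpoints follow the widely used DDPM setup:

```python
def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly interpolate betas over T steps and accumulate
    alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1)
             for t in range(T)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars
```

ᾱ starts near 1 (almost clean data) and decays toward 0 (almost pure noise) by the final step.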
129
What does it mean to predict ε (noise) vs x0 vs v?
Different parameterizations of the denoiser target: predict added noise ε, the clean data x0, or a v-parameterization combining both. They change loss scaling and can improve stability/quality.
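The three targets are related algebraically; a scalar sketch of the standard identities (x_t = √ᾱ·x0 + √(1−ᾱ)·ε, v = √ᾱ·ε − √(1−ᾱ)·x0):

```python
import math

def targets(x0, eps, alpha_bar):
    """Forward-diffusion sample x_t, the v-target, and x0 recovered
    from a (here exact) eps prediction, for one scalar value."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1 - alpha_bar)
    x_t = a * x0 + s * eps             # noisy sample at this timestep
    v = a * eps - s * x0               # v mixes eps and x0
    x0_from_eps = (x_t - s * eps) / a  # invert the forward process
    return x_t, v, x0_from_eps
```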
130
How do you compute RMSD properly (alignment first)?
Select comparable atoms (e.g., Cα), superpose structures (least-squares alignment) to remove rigid-body differences, then compute RMSD on the aligned coordinates; for multi-domain proteins consider domain-wise RMSD.
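Once the superposition is done, the RMSD itself is a one-liner; this sketch assumes the coordinate lists are already matched and superposed (the Kabsch fit is not shown):

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over matched (x, y, z) atom coordinates."""
    assert len(coords_a) == len(coords_b)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))
```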
131
How do you color by B-factor / confidence (pLDDT) in PyMOL?
Store pLDDT in the B-factor field (as AlphaFold PDBs do) and color with the 'spectrum' command, e.g. 'spectrum b, blue_white_red, minimum=50, maximum=100', or just 'spectrum b' for defaults.
132
How do you identify interface residues and measure distances/contacts?
Define selections for each chain, find residues within a cutoff (e.g., within 4–5 Å) of the other chain using 'byres' and 'within', then use 'distance' for specific pairs and count contacts via selection sizes or scripts.
133
What’s the difference between cartoon, surface, and sticks views (when to use each)?
Cartoon shows secondary structure/backbone topology; surface shows solvent-accessible envelope and binding pockets/interfaces; sticks shows detailed side-chain/ligand interactions (often combined with cartoon).
134
What are GROUP BY and HAVING (difference from WHERE)?
WHERE filters rows before aggregation; GROUP BY forms groups for aggregates; HAVING filters groups after aggregation (e.g., HAVING COUNT(*) > 10).
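A runnable illustration using Python's built-in sqlite3 (table name and data are made up): WHERE runs before grouping, HAVING after aggregation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("ada", 10.0), ("ada", 5.0), ("bob", 7.0)])
rows = con.execute("""
    SELECT customer, COUNT(*) AS n, SUM(amount) AS total
    FROM orders
    WHERE amount > 0          -- row filter, applied before grouping
    GROUP BY customer
    HAVING COUNT(*) > 1       -- group filter, applied after aggregation
""").fetchall()
print(rows)   # only 'ada' has more than one order
```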
135
What are aggregations and window functions (OVER/PARTITION BY)?
Aggregations (COUNT/SUM/AVG/MIN/MAX) summarize groups. Window functions compute per-row values over a window defined by OVER (PARTITION BY/ORDER BY), e.g., running totals or ranks, without collapsing rows.
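A running-total example with sqlite3 (requires SQLite >= 3.25 for window functions; table and data are made up). Note every input row survives, unlike GROUP BY:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (customer TEXT, day INTEGER, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [("ada", 1, 10.0), ("ada", 2, 5.0), ("bob", 1, 7.0)])
rows = con.execute("""
    SELECT customer, day, amount,
           SUM(amount) OVER (PARTITION BY customer ORDER BY day) AS running
    FROM sales
    ORDER BY customer, day
""").fetchall()
print(rows)   # per-customer running totals, one output row per input row
```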
136
What are indexes and when do they help/hurt?
Indexes speed lookups/joins/orderings by maintaining auxiliary data structures (e.g., B-trees). They can hurt by slowing writes/inserts/updates and consuming storage; poor indexes can increase query planner overhead.
137
When did the CASP competition start and when did AlphaFold participate?
The Critical Assessment of Structure Prediction (CASP) started in 1994 (CASP1). AlphaFold 1 entered in 2018 (CASP13); AlphaFold 2 entered in 2020 (CASP14).
138
What is RMSD?
Root-mean-square deviation of distances between corresponding atoms (all atoms or only C-alpha); range [0, infinity), lower is better; requires a global superposition.
139
What is the TM-score?
Mean of 1/(1 + (d_i/d0)^2) over corresponding C-alpha atoms, where d_i is the distance after superposition and d0 is a length-dependent distance scale; range (0, 1], higher is better; requires a global superposition.
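As a sketch, assuming one fixed superposition and the standard length-dependent d0 (the real TM-score search also optimizes the superposition):

```python
import math

def tm_score(distances, l_target):
    """TM-score from per-residue C-alpha distances (Angstrom):
    mean of 1 / (1 + (d_i / d0)^2), with d0 = 1.24*(L-15)^(1/3) - 1.8."""
    d0 = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target
```

A perfect match (all distances 0) scores 1; large deviations push each term toward 0.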
140
What is GDT-TS/HA?
Mean percentage of C-alpha atoms that fall under 4 distance thresholds. GDT-TS: 1, 2, 4, 8 Å; GDT-HA: 0.5, 1, 2, 4 Å. Requires superposition (for each threshold the optimal superposition is chosen separately).
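A simplified sketch: real GDT re-superposes for each threshold, whereas this assumes one fixed superposition and just averages the per-threshold fractions:

```python
def gdt_ts(distances):
    """GDT-TS from C-alpha deviations (Angstrom): average over the
    thresholds 1, 2, 4, 8 of the fraction of residues within each cutoff."""
    n = len(distances)
    fractions = [sum(d <= t for d in distances) / n for t in (1, 2, 4, 8)]
    return 100.0 * sum(fractions) / len(fractions)
```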
141
What is one angstrom equivalent to?
The angstrom is a unit of length equal to 10⁻¹⁰ m; that is, one ten-billionth of a metre, a hundred-millionth of a centimetre, 0.1 nanometre, or 100 picometres.
142
What is LDDT?
lDDT: mean fraction of preserved all-atom distances under 4 tolerance thresholds (0.5, 1, 2, 4 Å), considering only atom pairs within a 15 Å inclusion radius in the reference structure; superposition-free.
143
What is Gibbs free energy?
ΔG = ΔH − TΔS
- ΔH: enthalpy change
- T: temperature
- ΔS: entropy change
Binding/folding is favorable if ΔG < 0.
144
What is Enthalpy?
Enthalpy is the sum of a thermodynamic system's internal energy and the product of its pressure and volume: H(S,p)=U+pV
145
What contributes to enthalpy in proteins?
ΔH (enthalpy): "bonding/interaction energy"
- hydrogen bonds
- electrostatics (salt bridges)
- dispersion/van der Waals
These generally favor more ordered, well-packed structures and make ΔH more negative (stabilizing).
146
What contributes to entropy in proteins?
ΔS (entropy): “number of accessible microstates” - protein conformational entropy (flexibility) - solvent entropy (especially water around hydrophobics) Entropy can favor disorder for the protein itself (more conformations), but favor folding via the hydrophobic effect because burying hydrophobics can increase solvent entropy (water becomes less ordered).
147
Which components are considered in molecular forcefields of proteins?
- all bonds (stretching)
- all angles (bending)
- all torsion (dihedral) angles
- all non-bonded pairs (van der Waals)
- all partial charges (electrostatics)
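A toy sketch of how such terms add up (harmonic bonds plus non-bonded Lennard-Jones and Coulomb; all parameters and the energy constant are illustrative, not from any real force field):

```python
def toy_energy(bonds, pairs):
    """Toy force-field energy: sum of bonded and non-bonded terms.
    bonds: (r, r0, k) triples; pairs: (r, eps, sigma, q1, q2) tuples."""
    e = 0.0
    for r, r0, k in bonds:                 # bond stretching: k * (r - r0)^2
        e += k * (r - r0) ** 2
    for r, eps, sigma, q1, q2 in pairs:    # non-bonded pair terms
        sr6 = (sigma / r) ** 6
        e += 4 * eps * (sr6 ** 2 - sr6)    # Lennard-Jones 12-6
        e += 332.0 * q1 * q2 / r           # Coulomb (kcal/mol, Angstrom units)
    return e
```

Angle and torsion terms would be added analogously (harmonic angles, periodic cosine torsions).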
148
What is the Rosetta energy?
Rosetta scores a structure as a weighted sum of individual energy terms. Some terms are physics inspired (e.g. electrostatics) and others are knowledge based (e.g. torsion preferences)
149
What are rotamers?
Side chains can rotate around single bonds (the χ/chi dihedral angles: χ1, χ2, …), but often adopt a few discrete preferred angles: ~60° (g+), ~180° (t), or ~300°/−60° (g−).
150
Which angles describe the side chain rotation?
The side-chain χ (chi) dihedral angles: χ1, χ2, χ3, χ4 (and up to χ5, depending on the residue).
151