Common sequence issues to fix early

  • Unpaired cysteines can create unwanted disulfides. Remove or pair them unless they are functionally required.
  • High hydrophobic content and long hydrophobic patches often drive aggregation. Prefer balanced hydropathy and short, flexible linkers between domains.

Tools and metrics

  • NetSolP: fast sequence-based solubility screen for triage.
  • SoluProt: predicts soluble expression in E. coli from sequence features.
  • SolubleMPNN: ProteinMPNN weights trained only on soluble proteins, used to redesign constructs toward higher solubility (commonly used as a default starting point in redesign). See also ProteinMPNN.
  • ESM: protein language models; use sequence log‑likelihoods to deprioritize variants that are likely to be hard to express.
  • ipTM (AlphaFold‑Multimer): interface predicted TM‑score, a confidence metric for the predicted interface that can correlate with complex stability in binder design workflows.
  • pSAE: exposure‑based measures (e.g., percent solvent‑accessible exposed hydrophobicity) are useful for ranking aggregation risk; implementations vary across toolkits.
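
The sequence-level checks described above (odd cysteine counts, long hydrophobic stretches) can be approximated before running any predictor. The following is a minimal sketch, not a validated aggregation predictor: it uses the Kyte-Doolittle hydropathy scale, and the function names, the 9-residue window, and the 1.6 mean-hydropathy cutoff are illustrative assumptions that should be tuned per project.

```python
# Kyte-Doolittle hydropathy values per residue (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cysteine_count(seq: str) -> int:
    """Number of cysteines in the sequence."""
    return seq.upper().count("C")


def has_odd_cysteines(seq: str) -> bool:
    """An odd count means at least one cysteine cannot be in a disulfide."""
    return cysteine_count(seq) % 2 == 1


def hydrophobic_windows(seq: str, window: int = 9, threshold: float = 1.6):
    """Return (start_index, mean_hydropathy) for each window at or above threshold.

    window and threshold are assumed defaults, not established cutoffs.
    Unknown residue letters contribute 0.0 to the mean.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KD.get(aa, 0.0) for aa in segment) / window
        if mean >= threshold:
            hits.append((start, round(mean, 2)))
    return hits


def flag_sequence(seq: str, window: int = 9, threshold: float = 1.6) -> dict:
    """Collect the early-warning flags for one sequence."""
    return {
        "n_cys": cysteine_count(seq),
        "odd_cys": has_odd_cysteines(seq),
        "hydrophobic_windows": hydrophobic_windows(seq, window, threshold),
    }
```

An even cysteine count does not prove that every cysteine is paired, so flagged and unflagged sequences alike still need structural review where disulfides matter.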
Definitions and references: SolubleMPNN, ProteinMPNN, ESM, ipTM, AlphaFold‑Multimer, pSAE.
Workflow: Filter with NetSolP/SoluProt → rank by ESM log‑likelihoods → redesign with SolubleMPNN‑style constraints → re‑screen.
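
The filter-then-rank part of this workflow can be sketched as below. This is a sketch under stated assumptions: it does not call NetSolP, SoluProt, or ESM, and expects their scores to have been computed elsewhere. The `Candidate` class, the `triage` function, the 0.5 solubility cutoff, and the use of a mean per-residue log‑likelihood are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    name: str
    sequence: str
    solubility: float      # assumed: predictor score in [0, 1], computed elsewhere
    log_likelihood: float  # assumed: mean per-residue log-likelihood, computed elsewhere


def triage(
    candidates: Iterable[Candidate],
    min_solubility: float = 0.5,  # hypothetical cutoff; calibrate per predictor
    top_k: int = 10,
) -> List[Candidate]:
    """Drop low-solubility candidates, then rank survivors by log-likelihood."""
    kept = [c for c in candidates if c.solubility >= min_solubility]
    # Higher log-likelihood first; sorted() is stable, so ties keep input order.
    ranked = sorted(kept, key=lambda c: c.log_likelihood, reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    pool = [
        Candidate("v1", "MKT", solubility=0.82, log_likelihood=-1.9),
        Candidate("v2", "MKV", solubility=0.31, log_likelihood=-1.2),
        Candidate("v3", "MKS", solubility=0.67, log_likelihood=-1.5),
    ]
    for c in triage(pool, top_k=2):
        print(c.name, c.solubility, c.log_likelihood)
```

Sequences returned from the redesign step would be scored again with the same predictors and passed through `triage` a second time, which corresponds to the re‑screen step.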

See also