Common sequence issues to fix early
- Unpaired cysteines can form non‑native intra‑ or intermolecular disulfides. Mutate them (serine is a common substitute) or give them a partner cysteine, unless they are functionally required.
- High hydrophobic content and long hydrophobic patches often drive aggregation. Prefer balanced hydropathy and short, flexible linkers between domains.
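Both checks above can be approximated directly from sequence before running any predictor. The following is a minimal sketch, not a validated method: it assumes the Kyte-Doolittle hydropathy scale, a 7-residue sliding window, and a mean-hydropathy cutoff of 2.0, all of which are illustrative choices. The function names are hypothetical. An odd cysteine count only proves that at least one cysteine cannot be paired; an even count does not prove correct pairing.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def count_cysteines(seq):
    """Number of cysteines; an odd count means at least one is unpaired."""
    return seq.upper().count("C")


def hydrophobic_patches(seq, window=7, threshold=2.0):
    """Return merged (start, end) spans, 0-based and end-exclusive, covered by
    windows whose mean hydropathy is >= threshold.

    window and threshold are assumed defaults, not established cutoffs.
    Residues missing from the scale (e.g. X) are scored as 0.0.
    """
    seq = seq.upper()
    spans = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean < threshold:
            continue
        if spans and i <= spans[-1][1]:
            # Overlaps or touches the previous span: extend it.
            spans[-1] = (spans[-1][0], i + window)
        else:
            spans.append((i, i + window))
    return spans


if __name__ == "__main__":
    example = "MKTLLIVVALLIVDEKRSGC"  # made-up sequence for illustration
    print("cysteines:", count_cysteines(example))
    print("hydrophobic spans:", hydrophobic_patches(example))
```

Sequences flagged here are candidates for closer inspection, not automatic rejections; a buried hydrophobic core will also trigger the window test.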
Recommended solubility and stability predictors
- NetSolP: fast sequence-based solubility screen for triage.
- SoluProt: predicts soluble expression in E. coli from sequence features.
- SolubleMPNN: ProteinMPNN weights trained on soluble proteins only, used to redesign constructs toward higher solubility (commonly used as a default starting point in redesign). See also ProteinMPNN.
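- ESM: use sequence (pseudo‑)log‑likelihoods from the ESM protein language models as a plausibility score, and deprioritize low‑scoring variants as likely hard to express. This is a heuristic, not a direct expression prediction.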
- ipTM (AlphaFold‑Multimer): predicted interface TM‑score, a confidence metric (0 to 1) for the modeled interface. It can correlate with complex stability in binder design workflows, but it does not measure solubility.
- pSAE: exposure‑based measures (e.g., percent solvent‑accessible exposed hydrophobicity) are useful for ranking aggregation risk; implementations vary across toolkits.
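Because pSAE implementations vary, it helps to pin down one concrete definition when comparing designs. The sketch below assumes one possible definition: the percentage of total solvent-accessible surface area contributed by hydrophobic residues. It takes per-residue SASA values already computed by whatever structure tool is in use. The residue set and the function name are assumptions for illustration, not a reference implementation of any toolkit.

```python
# Assumed hydrophobic residue set; toolkits differ on membership (e.g. C, Y, P).
HYDROPHOBIC = set("AVILMFWC")


def percent_exposed_hydrophobic(seq, sasa):
    """Percent of total SASA that comes from hydrophobic residues.

    seq  : one-letter amino acid sequence
    sasa : per-residue solvent-accessible surface area, same length as seq,
           precomputed from a structure or model (any consistent unit)
    """
    if len(seq) != len(sasa):
        raise ValueError("seq and sasa must have the same length")
    total = sum(sasa)
    if total == 0:
        return 0.0
    hydrophobic_area = sum(
        area for aa, area in zip(seq.upper(), sasa) if aa in HYDROPHOBIC
    )
    return 100.0 * hydrophobic_area / total


if __name__ == "__main__":
    # Toy values: only L and V count as hydrophobic here, and V is fully buried.
    print(percent_exposed_hydrophobic("LKDV", [50.0, 100.0, 50.0, 0.0]))
```

Scores are only comparable between designs when the same SASA tool, probe radius, and residue set are used throughout.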
Definitions and references: SolubleMPNN, ProteinMPNN, ESM, ipTM, AlphaFold‑Multimer, pSAE.
Workflow: Filter with NetSolP/SoluProt → rank by ESM log‑likelihoods → redesign with SolubleMPNN‑style constraints → re‑screen.
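The workflow above can be sketched as a small filter, rank, redesign, re-screen loop. This is an outline under stated assumptions rather than a working integration: `solubility`, `log_likelihood`, and `redesign` are hypothetical callables standing in for whatever wrappers you write around NetSolP/SoluProt, ESM, and SolubleMPNN, and the 0.5 cutoff and `top_k` value are placeholders to tune.

```python
from typing import Callable, List


def triage(
    seqs: List[str],
    solubility: Callable[[str], float],      # hypothetical predictor wrapper
    log_likelihood: Callable[[str], float],  # hypothetical language-model scorer
    min_solubility: float = 0.5,             # placeholder cutoff
    top_k: int = 10,
) -> List[str]:
    """Keep sequences passing the solubility cutoff, best log-likelihood first."""
    passed = [s for s in seqs if solubility(s) >= min_solubility]
    return sorted(passed, key=log_likelihood, reverse=True)[:top_k]


def design_round(
    seqs: List[str],
    solubility: Callable[[str], float],
    log_likelihood: Callable[[str], float],
    redesign: Callable[[str], List[str]],    # hypothetical redesign wrapper
    min_solubility: float = 0.5,
    top_k: int = 10,
) -> List[str]:
    """One pass: filter and rank, redesign the shortlist, then re-screen."""
    shortlist = triage(seqs, solubility, log_likelihood, min_solubility, top_k)
    redesigned = [variant for s in shortlist for variant in redesign(s)]
    return triage(redesigned, solubility, log_likelihood, min_solubility, top_k)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs; they are not real predictors.
    toy_solubility = lambda s: s.count("K") / len(s)
    toy_loglik = lambda s: float(s.count("K"))
    toy_redesign = lambda s: [s.replace("L", "K", 1)]
    print(design_round(["LLLL", "KKLL", "KKKK"],
                       toy_solubility, toy_loglik, toy_redesign, top_k=2))
```

Passing the scorers in as callables keeps the loop independent of any particular tool version, so a predictor can be swapped without touching the ranking logic.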