Part A

  1. Sign up for HuggingFace

    1. https://huggingface.co/ChatterjeeLab/PepMLM-650M
    2. Colab Notebook (link)
  2. Find the amino acid sequence for SOD1 in UniProt (ID: P00441), a protein that, when mutated, can cause amyotrophic lateral sclerosis (ALS). In fact, the A4V mutation (changing position 4 from alanine to valine) causes the most aggressive form of ALS, so make that change in the sequence. Note that A4V is numbered without the initiator methionine, so the affected alanine is residue 5 of the UniProt sequence.

    1. Original Sequence: MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
    2. Modified Sequence: MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
  3. Enter your mutated SOD1 sequence into the PepMLM inference API and generate 4 peptides, each 12 amino acids long (Step 5 takes a while, so you can also pick just 1 or 2 peptides)

    image.png

  4. Add this known SOD1-binding peptide to your list: FLYRWLPSRRGG [from https://genesdev.cshlp.org/content/22/11/1451]

    1. List of Peptides:
    Binder Pseudo Perplexity
    0 WRSPAAAAELGX
    1 WLSPVAGAAHKE
    2 FLYRWLPSRRGG
  5. Go to AlphaFold-Multimer (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb). This is similar to what you did for homework last week, but for a protein-peptide complex.

    1. Set model_type: alphafold2_multimer_v3 (this model has been shown to recapitulate peptide-protein binding accurately: https://www.frontiersin.org/articles/10.3389/fbinf.2022.959160/full).
    2. Add your query sequence. It has the form SOD1Sequence:PeptideSequence.
  6. After running AlphaFold-Multimer with your 5 peptides alongside your mutated SOD1 sequence, plot the ipTM scores, which measure the relative confidence of the binding region.
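
Steps 2 and 5 above can be sketched in a few lines of Python. This is only an illustration, assuming the UniProt sequence quoted in step 2; the helper names `apply_substitution` and `colabfold_query` are hypothetical and are not part of PepMLM or ColabFold. The index used for A4V assumes the conventional SOD1 numbering, which omits the initiator methionine.

```python
# Wild-type SOD1 (UniProt P00441) as quoted in step 2, including the initiator Met.
SOD1_WT = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGG"
    "PKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def apply_substitution(seq: str, index: int, expected: str, new: str) -> str:
    """Replace seq[index] (0-based) with `new`, checking the original residue first."""
    if seq[index] != expected:
        raise ValueError(f"expected {expected} at index {index}, found {seq[index]}")
    return seq[:index] + new + seq[index + 1:]


# A4V counts from the residue after the initiator Met, so Ala4 is at
# 0-based index 4 of the UniProt sequence (M=0, A=1, T=2, K=3, A=4).
SOD1_A4V = apply_substitution(SOD1_WT, 4, "A", "V")

# Two PepMLM-generated peptides plus the known binder from step 4.
PEPTIDES = ["WRSPAAAAELGX", "WLSPVAGAAHKE", "FLYRWLPSRRGG"]


def colabfold_query(target: str, peptide: str) -> str:
    """Join two chains with ':' as described in step 5 (SOD1Sequence:PeptideSequence)."""
    return f"{target}:{peptide}"


queries = [colabfold_query(SOD1_A4V, p) for p in PEPTIDES]
for q in queries:
    print(q)
```

Checking the expected residue before substituting guards against off-by-one errors between the two numbering schemes.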

Binder Pseudo Perplexity ipTM Scores
0 WRSPAAAAELGX
    2025-03-29 00:48:56,439 rank_001_alphafold2_multimer_v3_model_4_seed_000 pLDDT=90.5 pTM=0.888 ipTM=0.293
    2025-03-29 00:48:56,440 rank_002_alphafold2_multimer_v3_model_1_seed_000 pLDDT=90.9 pTM=0.887 ipTM=0.28
    2025-03-29 00:48:56,440 rank_003_alphafold2_multimer_v3_model_3_seed_000 pLDDT=90.6 pTM=0.886 ipTM=0.277
    2025-03-29 00:48:56,440 rank_004_alphafold2_multimer_v3_model_2_seed_000 pLDDT=90.4 pTM=0.883 ipTM=0.24
    2025-03-29 00:48:56,441 rank_005_alphafold2_multimer_v3_model_5_seed_000 pLDDT=90.3 pTM=0.881 ipTM=0.21
1 WLSPVAGAAHKE
    2025-03-29 00:57:41,137 rank_001_alphafold2_multimer_v3_model_3_seed_000 pLDDT=91.2 pTM=0.889 ipTM=0.296
    2025-03-29 00:57:41,138 rank_002_alphafold2_multimer_v3_model_1_seed_000 pLDDT=90.9 pTM=0.887 ipTM=0.289
    2025-03-29 00:57:41,138 rank_003_alphafold2_multimer_v3_model_4_seed_000 pLDDT=90.5 pTM=0.886 ipTM=0.278
    2025-03-29 00:57:41,138 rank_004_alphafold2_multimer_v3_model_2_seed_000 pLDDT=90.8 pTM=0.885 ipTM=0.263
    2025-03-29 00:57:41,138 rank_005_alphafold2_multimer_v3_model_5_seed_000 pLDDT=90.8 pTM=0.885 ipTM=0.252
2 FLYRWLPSRRGG
    2025-03-29 01:09:04,769 rank_001_alphafold2_multimer_v3_model_2_seed_000 pLDDT=91.6 pTM=0.883 ipTM=0.215
    2025-03-29 01:09:04,770 rank_002_alphafold2_multimer_v3_model_1_seed_000 pLDDT=91.4 pTM=0.882 ipTM=0.207
    2025-03-29 01:09:04,770 rank_003_alphafold2_multimer_v3_model_5_seed_000 pLDDT=91.1 pTM=0.881 ipTM=0.203
    2025-03-29 01:09:04,770 rank_004_alphafold2_multimer_v3_model_4_seed_000 pLDDT=90.9 pTM=0.878 ipTM=0.171
    2025-03-29 01:09:04,770 rank_005_alphafold2_multimer_v3_model_3_seed_000 pLDDT=90.9 pTM=0.874 ipTM=0.13
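
To compare binders, the ipTM values can be pulled out of the log lines above rather than read off by eye. The sketch below assumes the log format shown in this write-up; the names `LOGS`, `LINE_RE`, `iptm_scores`, and `summarize` are hypothetical helpers, not part of ColabFold.

```python
import re
from statistics import mean

# Log lines copied from the ColabFold output above, grouped by peptide.
LOGS = {
    "WRSPAAAAELGX": """\
2025-03-29 00:48:56,439 rank_001_alphafold2_multimer_v3_model_4_seed_000 pLDDT=90.5 pTM=0.888 ipTM=0.293
2025-03-29 00:48:56,440 rank_002_alphafold2_multimer_v3_model_1_seed_000 pLDDT=90.9 pTM=0.887 ipTM=0.28
2025-03-29 00:48:56,440 rank_003_alphafold2_multimer_v3_model_3_seed_000 pLDDT=90.6 pTM=0.886 ipTM=0.277
2025-03-29 00:48:56,440 rank_004_alphafold2_multimer_v3_model_2_seed_000 pLDDT=90.4 pTM=0.883 ipTM=0.24
2025-03-29 00:48:56,441 rank_005_alphafold2_multimer_v3_model_5_seed_000 pLDDT=90.3 pTM=0.881 ipTM=0.21
""",
    "WLSPVAGAAHKE": """\
2025-03-29 00:57:41,137 rank_001_alphafold2_multimer_v3_model_3_seed_000 pLDDT=91.2 pTM=0.889 ipTM=0.296
2025-03-29 00:57:41,138 rank_002_alphafold2_multimer_v3_model_1_seed_000 pLDDT=90.9 pTM=0.887 ipTM=0.289
2025-03-29 00:57:41,138 rank_003_alphafold2_multimer_v3_model_4_seed_000 pLDDT=90.5 pTM=0.886 ipTM=0.278
2025-03-29 00:57:41,138 rank_004_alphafold2_multimer_v3_model_2_seed_000 pLDDT=90.8 pTM=0.885 ipTM=0.263
2025-03-29 00:57:41,138 rank_005_alphafold2_multimer_v3_model_5_seed_000 pLDDT=90.8 pTM=0.885 ipTM=0.252
""",
    "FLYRWLPSRRGG": """\
2025-03-29 01:09:04,769 rank_001_alphafold2_multimer_v3_model_2_seed_000 pLDDT=91.6 pTM=0.883 ipTM=0.215
2025-03-29 01:09:04,770 rank_002_alphafold2_multimer_v3_model_1_seed_000 pLDDT=91.4 pTM=0.882 ipTM=0.207
2025-03-29 01:09:04,770 rank_003_alphafold2_multimer_v3_model_5_seed_000 pLDDT=91.1 pTM=0.881 ipTM=0.203
2025-03-29 01:09:04,770 rank_004_alphafold2_multimer_v3_model_4_seed_000 pLDDT=90.9 pTM=0.878 ipTM=0.171
2025-03-29 01:09:04,770 rank_005_alphafold2_multimer_v3_model_3_seed_000 pLDDT=90.9 pTM=0.874 ipTM=0.13
""",
}

# Matches one ranked-model line and captures its rank, model number, and scores.
LINE_RE = re.compile(
    r"rank_(?P<rank>\d+)_\S+?_model_(?P<model>\d+)_seed_\d+ "
    r"pLDDT=(?P<plddt>[\d.]+) pTM=(?P<ptm>[\d.]+) ipTM=(?P<iptm>[\d.]+)"
)


def iptm_scores(log_text: str) -> list[float]:
    """Return the ipTM value of every ranked model found in the log text."""
    return [float(m.group("iptm")) for m in LINE_RE.finditer(log_text)]


def summarize(logs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Compute the best and mean ipTM across the five models for each peptide."""
    summary = {}
    for peptide, text in logs.items():
        scores = iptm_scores(text)
        summary[peptide] = {"best": max(scores), "mean": mean(scores)}
    return summary


summary = summarize(LOGS)
for peptide, stats in summary.items():
    print(f"{peptide}  best ipTM={stats['best']:.3f}  mean ipTM={stats['mean']:.3f}")
```

The `best` values in `summary` are one reasonable input for the plot requested in step 6, for example as a bar chart with one bar per peptide.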

Binder 0 structure:

image.png

Binder 1 structure:

image.png

Binder 2 (known) structure:

image.png

  7. Provide a one-paragraph write-up of your results

image.png

Overall, the known peptide has a lower interface confidence than the generated peptides: its top-ranked ipTM is 0.215, compared with 0.293 and 0.296 for binders 0 and 1. This could be due to the A4V mutation, which may have affected the protein's folding or disrupted the chemical interactions that usually allow binding.

Part B

Bacteriophage MS2 is a single-stranded RNA virus whose genome encodes only 4 proteins: the maturation protein (A-protein), the lysis protein (L-protein), the coat protein (cp), and the replicase protein (rep). MS2 infects E. coli. Upon infection, the L-protein forms pores in the E. coli cell membrane, which eventually leads to breakdown of the membrane (lysis). DnaJ is a chaperone protein in E. coli (chaperone proteins assist in protein folding) and is thought to be involved in the lysis mechanism. In this homework, we will explore whether the computational models we learned about in the last class are useful for designing variants/mutants of the lysis protein sequence. We will study the effects of L-protein mutants on bacteriophage infectivity.