FindMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM

Grzegorz Chojnowski, Adam J. Simpkin, Diego A. Leonardo, Wolfram Seifert-Davila, Dan E. Vivas-Ruiz, Ronan M. Keegan, Daniel J. Rigden

Research output: Contribution to journal, Article, peer-review

35 Scopus citations

Abstract

Although experimental protein-structure determination usually targets known proteins, chains of unknown sequence are often encountered. They may be purified from natural sources, appear as an unexpected fragment of a well-characterized protein, or be present as a contaminant. Regardless of its source, the unknown protein always requires characterization. Here, an automated pipeline is presented for the identification of protein sequences from cryo-EM reconstructions and crystallographic data. The method is applied to characterize the crystal structure of an unknown protein purified from snake venom. It is also shown that the approach can be successfully applied to the identification of protein sequences and the validation of sequence assignments in cryo-EM protein structures.

Original language: English
Pages (from-to): 86-97
Number of pages: 12
Journal: IUCrJ
Volume: 9
State: Published - 1 Jan 2022
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2022.

Keywords

  • SIMBAD
  • bioinformatics
  • cryo-EM
  • findMySequence
  • neural networks
  • protein sequences
  • protein structures
  • structure determination
