Coronavirus disease (COVID-19), caused by the virus SARS-CoV-2, is already responsible for more than 4.3 million confirmed cases and 295,000 deaths worldwide as of May 15, 2020. Ongoing efforts to control the pandemic include the development of peptide-based vaccines and diagnostic tests. In these approaches, HLA allelic diversity plays a crucial role. Despite its importance, current knowledge of HLA allele frequencies in South America is very limited. In this study, we performed a literature review of datasets reporting HLA frequencies of South American populations, available in the scientific literature and/or in the Allele Frequency Net Database. This allowed us to enrich the current scenario with more than 12.8 million data points. As a result, we present updated HLA allelic frequencies by country, including 91 alleles that were previously thought to have frequencies either under 5% or of an unknown value. Using alleles with an updated frequency of at least 5% in any South American country, we predicted epitopes in SARS-CoV-2 proteins using NetMHCpan (I and II) and MHCflurry. Then, the best predicted epitopes (class-I and -II) were selected based on their binding to South American alleles (Coverage Score). Class-II predicted epitopes were also filtered based on their three-dimensional exposure. We obtained 14 class-I and four class-II candidate epitopes with experimental evidence (reported in the Immune Epitope Database and Analysis Resource), having good coverage scores for South America. Additionally, we present 13 HLA-I and 30 HLA-II novel candidate epitopes without experimental evidence, including 16 class-II candidates in highly exposed conserved areas of the NTD and RBD regions of the Spike protein. These novel candidates have even better coverage scores for South America than those with experimental evidence.
Finally, we show that recent similar studies presenting candidate epitopes also predicted some of our candidates but discarded them in the selection process, resulting in candidates with suboptimal coverage for South America. In conclusion, the candidate epitopes presented provide valuable information for the development of epitope-based strategies against SARS-CoV-2, such as peptide vaccines and diagnostic tests. Additionally, the updated HLA allelic frequencies provide a better representation of South America and may impact different immunogenetic studies.
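The allele selection step described above (keeping alleles whose updated frequency is at least 5% in any South American country) can be illustrated with a minimal sketch. The country labels, frequency values, and the function name `select_alleles` are hypothetical and chosen only for illustration; they are not data or code from the study, and the Coverage Score itself is not reproduced here because its definition is not given in this abstract.

```python
from typing import Dict, Set

# Hypothetical allele frequencies per country, expressed as fractions.
# Illustrative values only; these are NOT frequencies reported by the study.
FREQS: Dict[str, Dict[str, float]] = {
    "CountryA": {"HLA-A*02:01": 0.21, "HLA-B*35:01": 0.04},
    "CountryB": {"HLA-A*02:01": 0.18, "HLA-B*35:01": 0.07, "HLA-C*04:01": 0.03},
}


def select_alleles(freqs: Dict[str, Dict[str, float]], threshold: float = 0.05) -> Set[str]:
    """Return alleles whose frequency reaches the threshold in at least one country."""
    return {
        allele
        for country_freqs in freqs.values()
        for allele, freq in country_freqs.items()
        if freq >= threshold
    }


selected = select_alleles(FREQS)
print(sorted(selected))
```

In this toy input, an allele that is rare in one country is still kept if it reaches the threshold in another, which mirrors the "in any South American country" criterion.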
Bibliographical note
Funding Information:
We gratefully acknowledge the authors and laboratories that generated and submitted genome sequences to GISAID's EpiCoV™ Database, on which this research is based. The detailed list of genome sequences used, authors, and submitting laboratories is provided in Table S1. All submitters of data may be contacted directly via www.gisaid.org. We also thank Rydberg Supo-Escalante and Eduardo Gushiken for their critical observations and recommendations during the course of this research. Finally, we warmly thank the Professional School of Genetics and Biotechnology at the National University of San Marcos (Escuela Profesional de Genética y Biotecnología de la Universidad Nacional Mayor de San Marcos) for the invaluable scientific and humanistic formation provided to all authors. Funding. The publication fee of this study was sponsored by the Vice-rectorate for Research and Postgraduate Studies of the National University of San Marcos (Vicerrectorado de Investigación y Posgrado de la Universidad Nacional Mayor de San Marcos).
Copyright © 2020 Requena, Médico, Chacón, Ramírez and Marín-Sánchez.
- South America
- allele frequency
- literature review