SHREC 2019: Special track on protein shape retrieval


Envisioned task

The aim of this track is to assess the performance of shape retrieval algorithms on a small dataset of protein surfaces.

Proteins are complex macromolecules made up of hundreds to millions of atoms, and are usually classified according to their function in the cellular environment. They display various motions reflecting (1) the relative motion of their atoms and (2) their ability to undergo small to large conformational changes in order to perform their cellular activities, notably through surface binding with other proteins (Protein-Protein Interactions, PPIs). Proteins can be described as non-rigid surfaces representing their solvent-excluded surface as defined by Connolly (Connolly, J Appl Cryst. 1983). Detecting similarities and/or dissimilarities between protein surfaces (all surfaces from all proteins of a cell, for instance) is of major importance in drug discovery pipelines, adverse drug event prediction and the characterization of molecular processes and diseases.

This track proposes a set of 5298 surfaces (provided as OFF files) representing the conformational space of 211 individual proteins. Compared to the previous SHREC'17 and SHREC'18 Protein Shape Retrieval contests, we focus on evaluating the retrieval of ortholog protein surfaces (proteins having the same activity in different organisms, e.g. the human and murine haemoglobin proteins), in addition to the usual evaluation of the retrieval of the different conformers of a given protein.

Dataset and Ground Truth

The dataset is built using the SCOPe v2.07 database (Fox et al, Nucleic Acids Research, 2014; Chandonia et al, Nucleic Acids Research, 2019) entries. Only entries corresponding to Nuclear Magnetic Resonance (NMR) structures from the Protein Data Bank (Berman et al, Nucleic Acids Research, 2000) were considered. NMR structures containing only one conformation, or whose conformers contain differing numbers of atoms, were discarded for consistency. Finally, we kept entries corresponding to ortholog proteins only when at least 4 ortholog proteins were present in the SCOPe database. Peptides and small proteins were discarded. The structures were retrieved, and each individual conformation (corresponding to the positions of the protein atoms expressed as x, y and z coordinates) was separated into an individual model in the dataset. The outer solvent-excluded surface (SES) of each model was calculated using EDTSurf (Xu et al, PLoS ONE 2009). The models in the dataset are randomly shuffled.

Each model will be used as a query against the whole dataset. The participants are asked to produce a dissimilarity matrix giving the distance between every query and every model of the dataset. The ground truth is extracted from the SCOPe v2.07 database hierarchical classification; only the two lowest levels of the database (Species and Proteins, respectively) are used to generate the ground truth.
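
As an illustration of the expected output, the following minimal sketch builds a full query-versus-model dissimilarity matrix. It assumes that each surface has already been reduced to a fixed-length numeric descriptor and that the Euclidean distance is used; both choices, as well as the function name dissimilarity_matrix and the toy descriptors, are hypothetical and not prescribed by the track.

```python
import math


def dissimilarity_matrix(descriptors):
    """Return the symmetric N x N matrix of Euclidean distances.

    `descriptors` is a list of equal-length numeric sequences, one per model
    (a hypothetical representation; any shape descriptor could be used).
    """
    n = len(descriptors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(descriptors[i], descriptors[j])
            matrix[i][j] = d  # distance from query i to model j
            matrix[j][i] = d  # the Euclidean distance is symmetric
    return matrix


# Toy 2-D descriptors for three models (illustrative values only).
toy_descriptors = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
D = dissimilarity_matrix(toy_descriptors)
```

Row i of the matrix holds the distances from query i to every model, so sorting a row in increasing order gives the retrieval ranking for that query.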

SHREC2019_proteins.cla

SHREC2019_species.cla

OFF files can be downloaded here:

protein_shape_retrieval_contest.tar.gz
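
For participants unfamiliar with the OFF format, the sketch below shows one way to read such a file into vertex and face lists. It assumes the plain ASCII layout (an "OFF" header, a line with the vertex, face and edge counts, then the vertices and the faces) with no per-face colour information; the function name parse_off and the sample triangle are illustrative and are not taken from the dataset.

```python
def parse_off(text):
    """Parse the contents of a plain ASCII OFF file.

    Returns (vertices, faces): a list of (x, y, z) float tuples and a list
    of tuples of vertex indices. Assumes faces carry no colour values.
    """
    tokens = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if line:
            tokens.extend(line.split())
    if not tokens or tokens[0] != "OFF":
        raise ValueError("missing OFF header")

    n_vertices, n_faces = int(tokens[1]), int(tokens[2])
    pos = 4  # skip "OFF", vertex count, face count and edge count

    vertices = []
    for _ in range(n_vertices):
        vertices.append(tuple(float(t) for t in tokens[pos:pos + 3]))
        pos += 3

    faces = []
    for _ in range(n_faces):
        k = int(tokens[pos])  # number of vertices in this face
        faces.append(tuple(int(t) for t in tokens[pos + 1:pos + 1 + k]))
        pos += 1 + k
    return vertices, faces


# A single triangle, used here only to exercise the parser.
sample = "OFF\n# one triangle\n3 1 3\n0 0 0\n1 0 0\n0 1 0\n3 0 1 2\n"
vertices, faces = parse_off(sample)
```

To read a file from the extracted archive, the same function can be applied to the text returned by open(path).read().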

Evaluation

The standard metrics of previous shape retrieval experiments will be used: precision-recall evaluation, Nearest Neighbor, First-tier, Second-tier and Discounted Cumulative Gain. The participants are expected to return their results as a distance matrix file in binary format.

Participants should also report the runtimes of their calculations, since runtime is critical information when processing large datasets, notably in this particular context of molecular shapes.
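
To help participants check their results before submission, the sketch below computes Nearest Neighbor, First-tier, Second-tier and Discounted Cumulative Gain from a dissimilarity matrix and a list of class labels. It follows the conventions commonly used in shape retrieval benchmarks (the query is excluded from its own ranking, and the tiers are measured over the top K and 2K results, where K is the number of other models in the query's class); the organizers' evaluation code may differ in details, and the function name retrieval_scores and the toy data are hypothetical.

```python
import math


def retrieval_scores(distances, labels):
    """Average NN, FT, ST and normalized DCG over all queries.

    `distances[q][i]` is the dissimilarity between query q and model i;
    `labels[i]` is the class of model i. Queries whose class has no other
    member are skipped.
    """
    n = len(labels)
    nn = ft = st = dcg = 0.0
    counted = 0
    for q in range(n):
        others = [i for i in range(n) if i != q]
        class_size = sum(1 for i in others if labels[i] == labels[q])
        if class_size == 0:
            continue
        ranked = sorted(others, key=lambda i: distances[q][i])
        relevant = [labels[i] == labels[q] for i in ranked]

        nn += relevant[0]
        ft += sum(relevant[:class_size]) / class_size
        st += sum(relevant[:2 * class_size]) / class_size

        # DCG: rank 1 counts fully, rank r > 1 is discounted by log2(r).
        gains = [1.0 if r else 0.0 for r in relevant]
        dcg_q = gains[0] + sum(
            g / math.log2(rank) for rank, g in enumerate(gains[1:], start=2)
        )
        ideal = 1.0 + sum(
            1.0 / math.log2(rank) for rank in range(2, class_size + 1)
        )
        dcg += dcg_q / ideal
        counted += 1

    if counted == 0:
        raise ValueError("no query has another model in its class")
    return {
        "NN": nn / counted,
        "FT": ft / counted,
        "ST": st / counted,
        "DCG": dcg / counted,
    }


# Toy data: two classes of two models, each model closest to its classmate.
toy_labels = ["a", "a", "b", "b"]
toy_distances = [
    [0.0, 1.0, 10.0, 11.0],
    [1.0, 0.0, 9.0, 10.0],
    [10.0, 9.0, 0.0, 1.0],
    [11.0, 10.0, 1.0, 0.0],
]
scores = retrieval_scores(toy_distances, toy_labels)
```

With the SHREC2019_proteins.cla or SHREC2019_species.cla ground truth, the labels would be the class of each model at the corresponding level; parsing those files is not shown here.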

Schedule timeline

Feb 4, 2019 - The dataset is made available at shrec2019.drugdesign.fr/ and the participants are free to run their calculations.

Feb 15, 2019 - Registration deadline. Registration must be sent to Matthieu Montès and Florent Langenfeld.

Mar 3, 2019 - Submission deadline of the results. Each participant is allowed a maximum of 3 runs. Results are submitted along with a summary of the method(s) used to generate the results.

March 7, 2019 - The organizers circulate the evaluation of all participants of the track, and release the ground truth.

March 11, 2019 - The participants send their comments on the results.

March 13, 2019 - The organizers send a draft of the track report to the participants for reviews, comments and feedback.

March 15, 2019 - The track report is submitted for review.

March 25, 2019 - Reviews are completed and the participants are notified.

April 5, 2019 - Camera-ready track paper is submitted for inclusion into the proceedings.

May 5-6, 2019 - Eurographics Workshop on 3D Object Retrieval 2019, featuring SHREC 2019.


Organizers 

Matthieu Montès - Conservatoire National des Arts-et-Métiers 
Florent Langenfeld - Conservatoire National des Arts-et-Métiers