Contact

ModelArchive is developed by the Computational Structural Biology group at the SIB - Swiss Institute of Bioinformatics and the Biozentrum, University of Basel. If you encounter any problems using this service, or have suggestions for improving this site, please email us at help-modelarchive@unibas.ch.

Scope of ModelArchive

ModelArchive is the archive for structural models which are not based on experimental data and complements the PDB archive for experimental structures and PDB-Dev for integrative structures. ModelArchive is being developed following a community recommendation during a workshop on applications of protein models in biomedical research (Schwede et al, 2009).

Any type of macromolecular structure which would otherwise be suitable for the PDB but whose coordinates are not based on experimental data can be deposited in ModelArchive. This includes single chains or complexes consisting of proteins, RNA, DNA, or carbohydrates, including small molecules bound to them. The modelling methods can be purely in silico predictions, as in de novo models, or can build on experimental structures, as in homology models or modified structures with docked ligands, modelled variants, post-translational modifications (e.g. glycosylated structures), etc.

The main purpose of a deposited model is to supplement the manuscript for which the model was generated and to make the model accessible to interested readers. The secondary purpose is to make all deposited models findable, accessible, interoperable and reusable (FAIR) for interested researchers. To this end, ModelArchive provides a unique, stable accession code (DOI) for each deposited model, which can be referenced directly in the corresponding manuscript. Besides the actual model coordinates, an archived model should include sufficient details about the purpose of the modelling, the source of the sequences of the macromolecules, the modelling steps performed, and estimates of model quality, so that readers can assess whether the model is suitable for a specific application.

This document provides guidelines on the minimal and desired information to be provided with each model.

File format for models

Model coordinates are preferably stored in the standard PDB archive format PDBx/mmCIF. While, for many purposes, the legacy PDB format may suffice to store model coordinates and is still widely used, the format is no longer being modified or extended. ModelArchive depositions in the legacy PDB format are internally converted to PDBx/mmCIF. The wwPDB’s page on PDBx/mmCIF includes extensive documentation on the file format, the available dictionaries and software resources to assist users in generating PDBx/mmCIF files.

Metadata for theoretical models of macromolecular structures can be provided during the deposition process in ModelArchive but should preferably be stored using the PDBx/mmCIF ModelCIF Extension Dictionary prior to deposition. The extension enables programmatic access to the metadata and its use is required to deposit large sets of models (please contact us to coordinate such depositions in ModelCIF format). The extension is being developed by the ModelCIF working group with input from the community. Please contact us if you have questions, feedback or change requests for the extension.

Data to provide with depositions

| Description | In mmCIF format * | In ModelArchive |
| --- | --- | --- |
| Coordinates of the model follow standard formats of the PDB and commonly include the main per-residue model quality estimation in place of B-factors. | Following standard PDBx/mmCIF dictionary | To be uploaded when starting a deposition |
| The name of the deposition appears in search results and should be concise and representative for the model. | _struct.title | Project Name |
| The type of model to distinguish e.g. homology and de novo models. | _ma_model_list.model_type | Model Type |
| The model description contains the purpose of the study for which the model was generated and relates to the manuscript which cites the model. | _struct.pdbx_model_details | Project Overview / Abstract |
| An image to represent the deposition. | Automatically generated from coordinates | To be uploaded in Project Overview |
| The list of authors for the deposition including email addresses for the corresponding author and the principal investigator. | _audit_author (email addresses, corresponding author and principal investigator in ModelArchive deposition process) | Authors |
| Funding information to acknowledge grants (optional). | _pdbx_audit_support | Funding Information |
| The release policy defines how the model is made available after deposition. | Only in ModelArchive deposition process | Release Policy (see deposition workflow) |
| Citation to the manuscript for which the model was generated and which refers to the model (to be added once manuscript published). | _citation and _citation_author | Citations (contact us by email to provide it for an existing, accepted entry) |
| Source for the sequence of the modelled macromolecules (e.g. link to UniProtKB incl. sequence version and checksum). | _ma_target_ref_db_details | Part of Material |
| Input data used from other sources and of relevance for modelling steps (e.g. PDB ID of template used for homology modelling). | See detailed info in best practices described below | Part of Material |
| Software used in the modelling steps. | _software and _ma_software_group | Part of Material |
| The modelling steps describe how the model was generated and (if applicable) manual interventions. | _ma_protocol_step | Part of Modelling Protocol |
| Estimates of model quality should be (if possible) provided at least on a global and local (per-residue) level. Multiple estimates can be provided as well as estimates per residue pair. See detailed info below for how to obtain those estimates. | _ma_qa_metric_global, _ma_qa_metric_local, and _ma_qa_metric_local_pairwise | Description and global scores as part of Modelling Protocol; local scores can be added in place of B-factors or as part of the accompanying data |
| Accompanying data can be provided as separate files if necessary (optional). | _ma_entry_associated_files | Can be uploaded in Materials & Modelling Protocol |

* the data item or category listed and linked here is where the main data is stored, but please check what additional items must be provided for the PDBx/mmCIF file to be valid
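
Before deposition, it can help to check which of the categories listed in the table already appear in a file. The following is only a rough sketch: it scans the text line by line for data names and is not a PDBx/mmCIF parser or validator (for example, it does not handle multi-line text fields). The function names are hypothetical and not part of any ModelArchive tool.

```python
import re

# Categories named in the table above (core PDBx/mmCIF and ModelCIF).
EXPECTED_CATEGORIES = [
    "_struct",
    "_ma_model_list",
    "_audit_author",
    "_pdbx_audit_support",
    "_citation",
    "_citation_author",
    "_ma_target_ref_db_details",
    "_software",
    "_ma_software_group",
    "_ma_protocol_step",
    "_ma_qa_metric_global",
    "_ma_qa_metric_local",
    "_ma_qa_metric_local_pairwise",
    "_ma_entry_associated_files",
]


def categories_in_mmcif(text):
    """Return the category names of all data names ("_category.item") found
    at the start of a line. Rough line-based scan, not a real parser."""
    found = set()
    for line in text.splitlines():
        match = re.match(r"\s*(_[A-Za-z0-9_\[\]]+)\.\S+", line)
        if match:
            found.add(match.group(1).lower())
    return found


def missing_categories(text, expected=EXPECTED_CATEGORIES):
    """List the expected categories that do not occur in the text."""
    present = categories_in_mmcif(text)
    return [name for name in expected if name not in present]


# Illustrative fragment only; a real file contains many more items.
example = """data_example
_struct.title 'Example model'
_ma_model_list.model_type 'Homology model'
"""
print(missing_categories(example))
```

Optional categories such as _pdbx_audit_support will show up as missing for many valid depositions, so the result is a reminder list rather than an error report.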

Best practices for protein structure predictions

Modelling steps and input data

The modelling steps and the input data provided should enable an informed reader to reproduce a qualitatively similar model. In the minimal case, a single modelling step is provided with a reference to the software used, the input provided, and a description of any manual interventions that deviate from the default modelling process of that software. In mmCIF this is stored using the ma_protocol_step category, which contains a details item for free text descriptions and which can be connected to ma_data_group for the description of input data and to ma_software_group for the software used.

The preferred level of detail varies significantly with the type of modelling method used. In all cases, a free text description can be sufficient, but finer-grained descriptions are possible to enable programmatic access to certain details, such as the template selected for homology modelling (stored in ma_template_details) and its alignment to the target (stored in ma_alignment). Recent deep-learning-based de novo methods may also make use of templates, but for them it may be more critical to have a reasonably large multiple sequence alignment (MSA) as input. Such large input files are preferably stored as part of the accompanying data and referred to in ma_entry_associated_files.

Model quality estimates

Models from protein structure prediction methods must contain estimates of the expected accuracy of the structure prediction. This is commonly referred to as “model quality” or “model confidence” and is of major relevance to determine whether a given model can be used for downstream analysis. Quality estimates should enable users to judge the expected accuracy of the prediction both globally and locally. The provided values are meant to predict the values of a similarity metric such as the lDDT (Mariani et al, 2013) or TM score (Zhang et al, 2004) comparing the model with the coordinates of the correct protein structure.
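
To illustrate what such a similarity metric measures, the sketch below evaluates the TM-score sum for one fixed superposition, given the distances between aligned residue pairs. It is a simplification: the published TM-score is the maximum of this value over superpositions, which is not attempted here, and the function name and error handling are choices made for this example.

```python
def tm_score(distances, target_length):
    """TM-score-style value for one fixed superposition.

    distances: distances in Angstrom between aligned residue pairs
    target_length: number of residues in the reference structure
    """
    if target_length <= 15:
        raise ValueError("target too short for the length-dependent scale d0")
    # Length-dependent distance scale from Zhang et al, 2004.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for the length-dependent scale d0")
    # Each aligned pair contributes between 0 and 1; unaligned residues add 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfectly superposed model covering the whole target scores 1.0.
print(tm_score([0.0] * 100, 100))
```

Because the sum is divided by the length of the reference structure, a model that covers only part of the target is penalised even if the covered part is accurate.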

The accuracy of quality estimates has increased significantly over the years (see Fig. 4 in Haas et al, 2019) and estimation methods develop in parallel to improved structure prediction methods. It is hence critical to use either a relatively recent and well-benchmarked standalone tool/service (e.g. ProQ3, ModFOLD8, QMEANDisCo or their latest variants) or the quality estimates provided by the structure prediction method itself (e.g. QMEAN/QMEANDisCo scores from SWISS-MODEL, predicted lDDT from RoseTTAFold or pLDDT from AlphaFold). Note that by convention, the main per-residue quality estimates are stored in place of B-factors in model coordinate files. In mmCIF files, any number of quality estimates can be properly described and stored in ma_qa_metric.
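
The B-factor convention can be sketched for a coordinate file in the legacy PDB format, where the B-factor occupies columns 61-66 of each ATOM record. The example below is a minimal illustration, not a full PDB parser: the function names are hypothetical, only ATOM records are read, and the threshold is an arbitrary example value. Which score the column holds, and on what scale, depends on the method (pLDDT, for instance, ranges from 0 to 100).

```python
def per_residue_scores(pdb_text):
    """Map (chain ID, residue number, insertion code) to the value stored in
    the B-factor column, taken from the first atom seen for each residue."""
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  ") or len(line) < 66:
            continue
        residue = (line[21], int(line[22:26]), line[26].strip())
        # Columns 61-66 (1-based) hold the B-factor field.
        scores.setdefault(residue, float(line[60:66]))
    return scores


def mean_score(scores):
    """Simple average over residues, usable as a rough global summary."""
    return sum(scores.values()) / len(scores)


def residues_below(scores, threshold):
    """Residues whose score is below the given (method-specific) threshold."""
    return [residue for residue, value in scores.items() if value < threshold]


# Illustrative input: fixed-width ATOM records built with a format string.
ATOM_FMT = "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
example = "\n".join([
    ATOM_FMT.format(1, " N", "MET", "A", 1, 11.104, 13.207, 2.100, 1.00, 91.25),
    ATOM_FMT.format(2, " CA", "MET", "A", 1, 12.560, 13.329, 2.282, 1.00, 91.25),
    ATOM_FMT.format(3, " N", "GLY", "A", 2, 13.100, 14.500, 3.000, 1.00, 48.75),
])
scores = per_residue_scores(example)
print(mean_score(scores))
print(residues_below(scores, 50.0))
```

Taking the first atom per residue assumes that all atoms of a residue carry the same value, which is common for predicted models but is not guaranteed for every method.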

Please check CAMEO, CASP, and CAPRI to find suitable quality estimators. In CAMEO for instance, estimates of local quality are assessed on a weekly basis and the results can be found in the QE category for standalone tools and in the “Model Confidence” metric of the 3D category for quality estimates provided by the structure prediction method itself.

Deposition Workflow

Step 1: Log in, upload the structure, complete all data fields, and submit the deposition.
Improvements to the deposition may be suggested. You can then make edits before resubmitting.
Before acceptance, we will assess whether details such as the project description, the listing of database identifiers where appropriate, and the model quality method/values are sufficient.
Once the deposition is accepted, its visibility depends on the Release Policy:
a) Wait for publication: the deposition is password protected.
b) Immediate public release: the deposition is visible immediately to all.
Step 2: You must inform ModelArchive when a paper citing the model is published.*
The citation will then be added and the Digital Object Identifier (DOI) activated, which completes the deposition process.
*Please refer to the model in your manuscript with a sentence such as the following: "The model is available in ModelArchive at https://www.modelarchive.org/doi/10.5452/ma-xx". For lists of models you can also use a sentence such as: "The models are available in ModelArchive (www.modelarchive.org) with the accession codes ma-xx, ...".
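
When many models are deposited, the reference sentences above can be generated from the accession codes. The helper below is a small sketch: the function name is hypothetical, the check that codes start with "ma-" is an assumption based on the examples above, and "ma-xx" and "ma-yy" are placeholders rather than real accession codes.

```python
import re

DOI_URL_PREFIX = "https://www.modelarchive.org/doi/10.5452/"


def citation_sentence(accession_codes):
    """Build the suggested manuscript sentence for one or several models."""
    codes = list(accession_codes)
    if not codes:
        raise ValueError("at least one accession code is required")
    for code in codes:
        # Assumed pattern: codes look like "ma-" followed by an identifier.
        if not re.fullmatch(r"ma-[A-Za-z0-9-]+", code):
            raise ValueError(f"unexpected accession code: {code!r}")
    if len(codes) == 1:
        return f"The model is available in ModelArchive at {DOI_URL_PREFIX}{codes[0]}"
    return (
        "The models are available in ModelArchive (www.modelarchive.org) "
        f"with the accession codes {', '.join(codes)}"
    )


print(citation_sentence(["ma-xx"]))
print(citation_sentence(["ma-xx", "ma-yy"]))
```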