Help | ModelArchive

Contact

ModelArchive is developed by the Computational Structural Biology group at the SIB - Swiss Institute of Bioinformatics and the Biozentrum, University of Basel. Please let us know if you encounter any problems using this service, or if you have suggestions on how we could improve this site, by emailing us at help-modelarchive@unibas.ch

Reference

If you use this resource, please cite the following paper:

ModelArchive: a deposition database for computational macromolecular structural models. JMB (2025)

Gerardo Tauriello, Andrew M. Waterhouse, Juergen Haas, Dario Behringer, Stefan Bienert, Thomas Garello, Torsten Schwede

Scope of ModelArchive

ModelArchive serves as a repository and archive for computational structure models that are not based on experimental data. It complements the PDB archive for experimental structures and PDB-IHM for integrative structures. It is being developed following a community recommendation made at a workshop on applications of protein models in biomedical research (Schwede et al, 2009).

Any type of macromolecular structure that would otherwise be suitable for the PDB, but whose coordinates are not based on experimental data, can be deposited in ModelArchive. This includes single chains or complexes consisting of proteins, RNA, DNA or carbohydrates, including small molecules bound to them. The models can be purely in silico predictions, as in de novo models, or based on experimental structures, as in homology models or modified structures such as docked ligands, modelled variants and post-translational modifications (e.g. glycosylated structures).

The main purpose of ModelArchive is to enable scientific reproducibility and maximise the potential reuse of models. Depositors are required to provide a minimum set of information about the modelling procedure and the expected accuracy of the results. The completeness and correctness of this information are checked by curators in the ModelArchive team. A unique, stable accession code (DOI) is provided for each deposited model, which can be referenced directly in the corresponding manuscripts.

The deposited models are findable, accessible, interoperable and reusable (FAIR) for interested researchers. Findability and accessibility are provided through queries on this website and external services (e.g. 3D-Beacons, RCSB PDB, ViralZone, Foldseek (BFMD)). Accessibility is subject to our terms of use and may be restricted for individual entries. Interoperability and reusability are enabled by the use of standard wwPDB file formats, accompanied by appropriate metadata.

File format for models

Model coordinates are preferably stored in the standard PDB archive format PDBx/mmCIF. While, for many purposes, the legacy PDB format may suffice to store model coordinates and is still widely used, the format is no longer being modified or extended. ModelArchive depositions in the legacy PDB format are internally converted to PDBx/mmCIF. The wwPDB’s page on PDBx/mmCIF includes extensive documentation on the file format, the available dictionaries and software resources to assist users in generating PDBx/mmCIF files.

Metadata for theoretical models of macromolecular structures can be provided during the deposition process in ModelArchive but should preferably be stored using the PDBx/mmCIF ModelCIF Extension Dictionary (Vallat et al, 2023) prior to deposition. The extension is being developed by the ModelCIF working group with input from the community. Please contact us if you have questions, feedback or change requests for the extension.

ModelCIF provides programmatic access to the metadata and its use is required for the deposition of large model sets. Model sets consist of an overview entry with a single accession code that is used to represent the entire set (e.g. ma-dm-prc) and acts as a prefix to the accession codes of individual models (e.g. ma-dm-prc-462). Work is ongoing to improve ModelCIF support in modelling tools and services. In the meantime, we provide guidelines for data harvesting and encourage depositors to contact us to coordinate the deposition of model sets in ModelCIF format. The process of converting data to ModelCIF can take from a few weeks to several months, depending on the complexity and novelty of the modelling protocol and the size of the model set. Please ensure that you plan such a deposition well in advance of your publication so that it is ready in time.
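Because the overview entry's accession code acts as a prefix for the codes of individual models, a script can group downloaded entries by set with a simple string check. The sketch below is hypothetical: the function name and the accession pattern are our own assumptions, inferred only from the example codes on this page, and the check cannot confirm that a model is actually registered in the set.

```python
import re

# Assumed shape of an accession code, based only on the examples on this page
# (e.g. "ma-dm-prc", "ma-dm-prc-462"); this is not an official specification.
ACCESSION_RE = re.compile(r"ma(-[a-z0-9]+)+")


def belongs_to_set(model_code: str, set_code: str) -> bool:
    """Return True if model_code carries set_code as its prefix.

    This is a purely string-level check: it cannot tell whether the model is
    actually registered in the set.
    """
    for code in (model_code, set_code):
        if not ACCESSION_RE.fullmatch(code):
            raise ValueError(f"not a ModelArchive-style accession code: {code!r}")
    # Require the hyphen so that "ma-dm-prc" is not treated as a prefix of "ma-dm-prcx-1".
    return model_code.startswith(set_code + "-")


print(belongs_to_set("ma-dm-prc-462", "ma-dm-prc"))  # True
print(belongs_to_set("ma-dm-prcx-1", "ma-dm-prc"))   # False
```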

Most model sets include downloadable compressed ZIP archives containing all of the coordinate files in the data set. To keep the size manageable, the associated data for the entries is not included in these archives; however, its location is recorded in the ModelCIF coordinate files and can be retrieved programmatically. For very large model sets, the ZIP archive is only created on user request. Please contact us if you require other data exports and we will try to find a solution.
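Such an archive can be inspected without unpacking it to disk using Python's standard zipfile module. The sketch below assumes that coordinate files are recognisable by their file extensions (the list in COORD_SUFFIXES is a guess, not a documented archive layout) and that the function names are hypothetical.

```python
import zipfile

# Assumed coordinate-file extensions; adjust to the contents of the actual archive.
COORD_SUFFIXES = (".cif", ".cif.gz", ".pdb", ".pdb.gz")


def list_coordinate_files(archive):
    """Return the sorted coordinate-file names in a model-set ZIP archive.

    `archive` may be a path or a binary file object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            info.filename
            for info in zf.infolist()
            if not info.is_dir() and info.filename.lower().endswith(COORD_SUFFIXES)
        )


def read_member(archive, member):
    """Return the raw bytes of one archive member without extracting it to disk."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(member)
```

For a gzipped member, gzip.decompress(read_member(path, name)) returns the uncompressed file content.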

Since PDBx/mmCIF is the default format of the PDB and ModelCIF is an extension of it, the vast majority of software tools can read ModelCIF files and extract at least the basic information (model coordinates and molecular entities). Tools for visualising and analysing molecular data, such as Mol* and ChimeraX, can also read and display additional ModelCIF information such as model quality estimates.

Data to provide with depositions

Description	In ModelCIF format *	In ModelArchive
Coordinates of the model follow standard formats of the PDB and commonly include the main per residue model quality estimation in place of B-factors.	Following standard PDBx/mmCIF dictionary	To be uploaded when starting a deposition
The name of the deposition appears in search results and should be concise and representative of the model.	_struct.title	Project Name
The type of model, to distinguish e.g. homology models from de novo models.	_ma_model_list.model_type	Model Type
The model description contains the purpose of the study for which the model was generated and relates to the manuscript which cites the model.	_struct.pdbx_model_details	Project Overview / Abstract
An image to represent the deposition	Automatically generated from coordinates	To be uploaded in Project Overview
The list of authors for the deposition including email addresses for the corresponding author and the principal investigator.	_audit_author (email addresses, corresponding author and principal investigator in ModelArchive deposition process)	Authors
Funding information to acknowledge grants (optional).	_pdbx_audit_support	Funding Information
The release policy defines how the model is made available after deposition.	Only in ModelArchive deposition process	Release Policy (see deposition workflow)
Citation to the manuscript for which the model was generated and which refers to the model (to be added once the manuscript is published).	_citation (with id == “primary”) _citation_author	Citations (contact us by email to provide it for an existing, accepted entry)
List of molecular entities with short descriptive texts.	_entity	Part of Material
Source for the sequence of the modelled macromolecules (e.g. link to UniProtKB incl. sequence version and checksum)	_ma_target_ref_db_details	Part of Material
Input data used from other sources and of relevance for modelling steps (e.g. PDB ID of template used for homology modelling).	See detailed info in best practises described below	Part of Material
Software used in the modelling steps, including citations for the software and non-default parameters.	_software _citation _citation_author _ma_software_group _ma_software_parameter	Part of Material
The modelling steps describe how the model was generated, including any manual interventions.	_ma_protocol_step	Part of Modelling Protocol
Estimates of model quality should, if possible, be provided at least on a global and a local (per-residue) level. Multiple estimates can be provided, as well as estimates per residue pair. See the detailed information below on how to obtain these estimates.	_ma_qa_metric_global, _ma_qa_metric_local, and _ma_qa_metric_local_pairwise	Description and global scores as part of Modelling Protocol; local scores can be added in place of B-factors or as part of the accompanying data
Accompanying data can be provided as separate files if necessary (optional).	_ma_entry_associated_files	Can be uploaded in Materials & Modelling Protocol

* The data item or category listed here is where the main data is stored; please check which additional items must be provided for the ModelCIF file to be valid.
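As a rough pre-deposition aid, a script can scan a ModelCIF file for some of the data names listed in the table. This is a hypothetical sketch: it only checks that the names appear and does not validate the file against the ModelCIF dictionary; the selection in EXPECTED_ITEMS and EXPECTED_CATEGORIES is our own illustrative pick from the table, and the function names are assumptions.

```python
# Data names picked from the table above (an illustrative, non-exhaustive choice).
EXPECTED_ITEMS = ("_struct.title", "_struct.pdbx_model_details",
                  "_ma_model_list.model_type")
EXPECTED_CATEGORIES = ("_audit_author", "_entity", "_ma_target_ref_db_details",
                       "_software", "_ma_protocol_step", "_ma_qa_metric_global")


def data_names(cif_text):
    """Collect the lower-cased data names (tokens starting with '_') in a CIF file."""
    names = set()
    in_text_field = False
    for line in cif_text.splitlines():
        if line.startswith(";"):  # opens or closes a semicolon text field
            in_text_field = not in_text_field
            continue
        stripped = line.strip()
        if in_text_field or not stripped:
            continue
        token = stripped.split(maxsplit=1)[0]
        if token.startswith("_"):
            names.add(token.lower())  # CIF data names are case-insensitive
    return names


def missing_metadata(cif_text):
    """Return the expected items and categories that do not appear in the file."""
    names = data_names(cif_text)
    categories = {name.split(".", 1)[0] for name in names}
    missing = [item for item in EXPECTED_ITEMS if item not in names]
    missing += [cat for cat in EXPECTED_CATEGORIES if cat not in categories]
    return missing
```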

The best starting point for a deposition is to use modelling software or services that produce valid ModelCIF files in the first place. Ideally, these files already contain the necessary information about the software used, the modelling steps, the input data and the estimates of model quality. Depositors then only need to add the name and description of the deposition, the sources and descriptions of the molecular entities, and the list of authors.

Best practises for protein structure predictions

Modelling steps and input data

The modelling steps and provided input data should enable an informed reader to reproduce a qualitatively similar model. In the minimal case, a single modelling step is provided with a reference to the software used, the input provided and a description of any manual interventions that deviate from the software's default modelling process. In ModelCIF, this is stored in the ma_protocol_step category, which contains a details item for free-text descriptions and can be linked to ma_data_group to describe the input data and to ma_software_group to describe the software used.
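For depositors writing ModelCIF files by hand, the minimal single-step case can be emitted as a CIF loop. The sketch below is hypothetical: the helper names are our own, the quoting rules cover only common cases, and the item names other than details (and the method_type value) follow our reading of the ModelCIF dictionary, so please check them against the dictionary before use.

```python
def format_value(value):
    """Format one value as a CIF token, covering the common quoting cases."""
    if value is None:
        return "?"  # '?' marks an unknown value in CIF
    text = str(value)
    if "\n" in text or ("'" in text and '"' in text):
        # Multi-line text goes into a semicolon-delimited text field.
        if any(line.startswith(";") for line in text.split("\n")[1:]):
            raise ValueError("text field lines must not start with ';'")
        return f"\n;{text}\n;\n"
    needs_quotes = (
        text == ""
        or any(ch.isspace() for ch in text)
        or text[0] in "_#$'\"[];"
        or text in (".", "?")
        or text.lower().startswith(("data_", "save_", "loop_", "global_", "stop_"))
    )
    if not needs_quotes:
        return text
    # Pick a quote character that does not occur in the value.
    return f'"{text}"' if "'" in text else f"'{text}'"


def write_loop(category, items, rows):
    """Return a CIF loop_ block for `category` with the given item names and rows."""
    lines = ["loop_"] + [f"_{category}.{item}" for item in items]
    for row in rows:
        if len(row) != len(items):
            raise ValueError("row length does not match the number of items")
        lines.append(" ".join(format_value(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical single-step protocol: software group 1, input data group 1,
# output data group 2 (the group definitions themselves are not shown).
items = ["ordinal_id", "protocol_id", "step_id", "method_type", "details",
         "software_group_id", "input_data_group_id", "output_data_group_id"]
rows = [[1, 1, 1, "modeling", "Default settings, no manual intervention", 1, 1, 2]]
print(write_loop("ma_protocol_step", items, rows))
```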

The preferred level of detail varies significantly with the type of modelling method used. In all cases, a free-text description can be sufficient, but more fine-grained descriptions are possible to enable programmatic access to certain details, such as the template selected for homology modelling (stored in ma_template_details) and its alignment to the target (stored in ma_alignment). Recent deep-learning-based de novo methods may also make use of templates, but for them a reasonably large multiple sequence alignment (MSA) may be the more critical input. If the MSA was generated by an automated procedure, it is sufficient to list the reference databases queried to generate it. If the complete MSA needs to be stored as an intermediate result, such large input files are preferably stored as part of the associated data and referenced in ma_entry_associated_files.

Model quality estimates

Models from protein structure prediction methods must contain estimates of the expected accuracy of the prediction. This is commonly referred to as “model quality” or “model confidence” and is key to determining whether a given model can be used for downstream analysis. Quality estimates should enable users to judge the expected accuracy of the prediction both globally and locally. The provided values are meant to predict the value of a similarity metric, such as lDDT (Mariani et al, 2013) or TM-score (Zhang et al, 2004), that compares the model with the coordinates of the correct protein structure.
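To make the target of such predictions concrete, the sketch below computes a simplified lDDT on C-alpha atoms only: for every residue pair closer than 15 Å in the reference, it checks whether the model preserves that distance within 0.5, 1, 2 and 4 Å and averages the results. It is a hypothetical illustration under stated simplifications (no side-chain atoms, no stereochemistry checks, residues assumed to be matched one-to-one), not the reference implementation of Mariani et al.

```python
import math

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # distance-difference tolerances in Angstrom
INCLUSION_RADIUS = 15.0            # only reference pairs closer than this are scored


def ca_lddt(model, reference):
    """Simplified C-alpha lDDT, returned as (global score, per-residue scores).

    model, reference: equal-length lists of (x, y, z) C-alpha coordinates,
    matched residue by residue. No superposition is needed because lDDT
    compares internal distances. Residues without scored pairs get None.
    """
    if len(model) != len(reference):
        raise ValueError("model and reference must have the same residues")
    n = len(reference)
    preserved = [0] * n  # per residue: number of (pair, threshold) checks passed
    checked = [0] * n    # per residue: number of (pair, threshold) checks made
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= INCLUSION_RADIUS:
                continue
            diff = abs(math.dist(model[i], model[j]) - d_ref)
            hits = sum(diff < t for t in THRESHOLDS)
            for k in (i, j):
                preserved[k] += hits
                checked[k] += len(THRESHOLDS)
    per_residue = [p / c if c else None for p, c in zip(preserved, checked)]
    total = sum(checked)
    global_score = sum(preserved) / total if total else None
    return global_score, per_residue
```

A per-residue quality estimate such as pLDDT aims to predict values like per_residue before the correct structure is known.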

The accuracy of quality estimates has increased significantly over the years (see Fig. 4 in Haas et al, 2019), and estimation methods develop in parallel with improved structure prediction methods. It is hence critical to use either a relatively recent and well-benchmarked standalone tool or service (e.g. ProQ3, ModFOLD9, QMEANDisCo or their latest variants) or the quality estimates provided by the structure prediction method itself (e.g. QMEAN/QMEANDisCo scores from SWISS-MODEL, predicted lDDT from RoseTTAFold or pLDDT from AlphaFold). Note that, by convention, the main per-residue quality estimates are stored in place of B-factors in model coordinate files. In ModelCIF files, any number of quality estimates can be properly described and stored in ma_qa_metric.
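Because of this convention, the main per-residue estimate can be read back from the fixed-width B-factor column (columns 61-66) of ATOM records in a legacy PDB file. The sketch below is a hypothetical helper that takes one atom per residue (C-alpha by default) and ignores HETATM records; if alternate locations are present, the last one read wins. In PDBx/mmCIF files the corresponding data item is _atom_site.B_iso_or_equiv.

```python
def per_residue_bfactors(pdb_text, atom_name="CA"):
    """Map (chain, residue number, insertion code) to the B-factor column value.

    Reads fixed-column ATOM records of a legacy PDB file; by convention the
    main per-residue quality estimate (e.g. pLDDT) is stored in this column.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != atom_name:  # atom name, columns 13-16
            continue
        chain = line[21]                      # chain identifier, column 22
        resnum = int(line[22:26])             # residue sequence number, columns 23-26
        icode = line[26].strip()              # insertion code, column 27
        scores[(chain, resnum, icode)] = float(line[60:66])  # B-factor, columns 61-66
    return scores
```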

Please check CAMEO, CASP and CAPRI to find suitable quality estimators. In CAMEO, for instance, estimates of local quality are assessed on a weekly basis; the results can be found in the QE category for standalone tools and in the “Model Confidence” metric of the 3D category for quality estimates provided by the structure prediction method itself.

Deposition Workflow

Step 1: Log in, upload the structure, complete all data fields and submit the deposition.

Improvements to the deposition may be suggested. You can then make edits before resubmitting.

Before acceptance, we will assess whether details such as an adequate project description, listing of database identifiers where appropriate and model quality method/values are sufficient.

Once accepted, the deposition is made available depending on the release policy:
a) Wait for publication: the deposition is password protected.
b) Immediate public release: the deposition is visible to all immediately.

Step 2: You must inform ModelArchive when a paper citing the model is published.*

We will then add the citation and activate the Digital Object Identifier (DOI), which completes the deposition process.

*Please refer to the model in your manuscript with a sentence such as the following: "The model is available in ModelArchive at https://www.modelarchive.org/doi/10.5452/ma-xx". For lists of models you can also use a sentence such as: "The models are available in ModelArchive (www.modelarchive.org) with the accession codes ma-xx, ...".
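When many models are cited, the URL and availability sentence can be generated from the accession codes. This hypothetical sketch uses the URL pattern and sentence wording given above; the DOI prefix is taken from that URL, while the accession-code pattern is an assumption based on the example codes on this page.

```python
import re

# DOI prefix taken from the citation URL shown above; the accession pattern is
# an assumption based on the example codes on this page.
DOI_PREFIX = "10.5452"
ACCESSION_RE = re.compile(r"ma(-[a-z0-9]+)+")


def modelarchive_url(accession):
    """Build the citable ModelArchive URL for one accession code."""
    code = accession.strip()
    if not ACCESSION_RE.fullmatch(code):
        raise ValueError(f"unexpected accession code: {code!r}")
    return f"https://www.modelarchive.org/doi/{DOI_PREFIX}/{code}"


def availability_sentence(accessions):
    """Return a data-availability sentence in the style suggested above."""
    if not accessions:
        raise ValueError("at least one accession code is required")
    if len(accessions) == 1:
        return f"The model is available in ModelArchive at {modelarchive_url(accessions[0])}."
    codes = [a.strip() for a in accessions]
    for code in codes:
        modelarchive_url(code)  # validate each code
    return ("The models are available in ModelArchive (www.modelarchive.org) "
            f"with the accession codes {', '.join(codes)}.")


print(modelarchive_url("ma-denv-03"))
# https://www.modelarchive.org/doi/10.5452/ma-denv-03
```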

Validation process and criteria

The validation process in ModelArchive aims to ensure scientific reproducibility and maximise the potential for reuse. The ModelArchive team validates each submission for data completeness and compliance with syntactic and semantic criteria. Syntactic validation confirms that the included data is correctly formatted, while semantic validation ensures consistency between the metadata and the model itself, e.g. checking that the modelled sequence matches the stated provenance.
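As an illustration of such a semantic check, the sketch below tests whether the modelled sequence is a contiguous stretch of the stated reference sequence and, optionally, whether the reference matches a recorded checksum. We assume UniProt's CRC64 is the reflected CRC-64 with generator polynomial x^64 + x^4 + x^3 + x + 1, zero initial value and no final XOR; please verify against a few UniProt entries before relying on it. The function names are hypothetical, and real models with documented mutations would need a more tolerant comparison.

```python
# Assumption: UniProt's "CRC64" is the reflected CRC-64 with generator polynomial
# x^64 + x^4 + x^3 + x + 1 (0xD800000000000000 in reflected form), zero initial
# value and no final XOR.
POLY = 0xD800000000000000


def _make_table():
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = _make_table()


def crc64(sequence):
    """CRC64 of a one-letter sequence as 16 upper-case hexadecimal digits."""
    crc = 0
    for byte in sequence.encode("ascii"):
        crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return f"{crc:016X}"


def check_provenance(model_seq, ref_seq, ref_checksum=None):
    """Return a list of problems found when comparing a model with its stated source."""
    problems = []
    if ref_checksum is not None and crc64(ref_seq) != ref_checksum.strip().upper():
        problems.append("reference sequence does not match the stated checksum")
    # Simplification: documented mutations or gaps would need special handling.
    if model_seq not in ref_seq:
        problems.append("modelled sequence is not a contiguous stretch of the reference")
    return problems
```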

The sections on "Data to provide with depositions" and "Best practises for protein structure predictions" above serve as guidelines for the data typically required for a submission to be considered complete. In general, we would expect a user familiar with the methodology to be able to reproduce the results, and a user familiar with structure prediction to be able to judge whether the model can be used for a desired downstream application. The availability of estimates of model quality is critical to this, and we therefore require them to be visible as part of the metadata wherever possible. However, model quality will not be used as a rejection criterion. In this way, the repository maintains a high standard of quality without excluding potentially useful models.

The ModelArchive team aims to review submissions, and to respond to requests sent to us by email, within two working days. If we have not responded within a week, please contact us to make sure there are no technical problems. Submissions will be returned to the submitter with feedback for correction if discrepancies or missing information are identified.

Our deposition web service has default file size limits that should accommodate most models (as of 2025, these are 20 MB for the model file, which may be gzipped, and 100 MB for the compressed accompanying data). If a desired submission exceeds these limits, please contact us and we will try to find a solution.
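Before uploading, the sizes can be checked locally. In this hypothetical sketch we assume that "MB" means 10**6 bytes (the stricter reading) and that the 20 MB limit applies to the gzip-compressed model file; the limits may change, so the values are parameters rather than fixed constants.

```python
import gzip
import os

# Assumptions: "MB" is read as 10**6 bytes, and the model limit is applied to the
# gzip-compressed model file. Check the current limits before relying on these.
MODEL_LIMIT_BYTES = 20 * 10**6
ASSOC_LIMIT_BYTES = 100 * 10**6
GZIP_MAGIC = b"\x1f\x8b"


def check_upload_sizes(model_path, assoc_path=None,
                       model_limit=MODEL_LIMIT_BYTES, assoc_limit=ASSOC_LIMIT_BYTES):
    """Return a list of problems; an empty list means both files fit the limits."""
    problems = []
    with open(model_path, "rb") as fh:
        data = fh.read()
    if not data.startswith(GZIP_MAGIC):
        data = gzip.compress(data)  # estimate the size after gzipping
    if len(data) > model_limit:
        problems.append(f"compressed model file is {len(data)} bytes (limit {model_limit})")
    if assoc_path is not None:
        size = os.path.getsize(assoc_path)
        if size > assoc_limit:
            problems.append(f"accompanying data is {size} bytes (limit {assoc_limit})")
    return problems
```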

Depositors should contact us as soon as the article for which the model was generated is published (preferably with its PubMed ID) so that we can add the citation. Approximately once a year, the ModelArchive team sends reminders to contributors about unpublished submissions so that missing citations can be added and submissions do not remain password protected indefinitely.

Example ModelCIF entries

ma-nmpfamsdb-f000001 is an AlphaFold monomer model with custom MSA input and storage of non-top-ranked models.
ma-kul-lams-02 is a ColabFold model that also describes a mutation in its molecular entities and has a custom extra modelling step described as free text.
ma-rap-bacsu-014 is an AlphaFold-Multimer model.
ma-denv-03 is a homology model including info on templates used.
ma-jd-viral-00007 has multiple cross-links to NCBI and UniProt for the target protein.

Labelled illustration

Auto-generated entry for a ModelCIF file in a model set (Source: ma-rap-bacsu-014)