Structures Without Legacy PDB Format Files
PDB File Formats
The primary data format for PDB data is the PDBx/mmCIF format. See a guide to the format and a complete reference of the mmCIF dictionary for more information. The data is also offered in XML format (PDBML) following the same mmCIF dictionary. Additionally, in some cases the data is offered in the legacy PDB format.
PDB entries without PDB-format files
Several types of PDB entries are no longer offered in the legacy PDB format:
- Entries containing multiple character chain ids
- Entries containing > 62 chains
- Entries containing > 99999 ATOM coordinates
- Entries that have complex beta sheet topology, see more details
- Entries containing B-factors > 999.99
Note that the entries listed above can be found using Advanced Search (under Deposition > Compatible with PDB Format > equals > N).
Also, PDB entries with extended CCD or PDB IDs will be distributed in PDBx/mmCIF format only. Learn more about the extended CCD and PDB IDs.
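The numeric and naming limits in the list above can be expressed as a small check. The following is a minimal sketch, not an official validator: the `EntrySummary` class and the `legacy_format_blockers` function are hypothetical names introduced for illustration, and the sketch does not model the beta sheet topology criterion or the extended CCD and PDB IDs.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the list of entry types above.
MAX_CHAINS = 62
MAX_ATOMS = 99999
MAX_B_FACTOR = 999.99


@dataclass
class EntrySummary:
    """Hypothetical summary of an entry; not an RCSB or wwPDB data model."""
    chain_ids: List[str]
    atom_count: int
    max_b_factor: float


def legacy_format_blockers(entry: EntrySummary) -> List[str]:
    """Return the reasons an entry exceeds the legacy PDB format limits."""
    reasons = []
    if any(len(chain_id) > 1 for chain_id in entry.chain_ids):
        reasons.append("multi-character chain ID")
    if len(entry.chain_ids) > MAX_CHAINS:
        reasons.append("more than 62 chains")
    if entry.atom_count > MAX_ATOMS:
        reasons.append("more than 99999 ATOM coordinates")
    if entry.max_b_factor > MAX_B_FACTOR:
        reasons.append("B-factor above 999.99")
    return reasons
```

An empty list means none of the modeled limits is exceeded; it does not guarantee that a legacy PDB format file exists for the entry.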
TAR files containing a collection of best effort/minimal files in the PDB format are available for some of the entries that do not have legacy PDB-format files (exceptions are: entries containing > 99999 ATOM coordinates in a single chain and entries that have complex beta sheet topology). The entire structure is divided into several minimal files. The TAR files are available for download from each Structure Summary page and in a separate directory in the PDB FTP archive (https://files.wwpdb.org/pub/pdb/compatible/pdb_bundle/) grouped by the middle two characters of the 4-character PDB ID.
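As an illustration of the grouping by the middle two characters of the PDB ID, the sketch below builds the URL of the group directory for a given entry. The function name `bundle_group_url` is hypothetical, lowercasing the ID is an assumption about the archive layout, and the sketch deliberately stops at the group directory because the layout and file names below that level are not described here.

```python
BUNDLE_ROOT = "https://files.wwpdb.org/pub/pdb/compatible/pdb_bundle/"


def bundle_group_url(pdb_id: str) -> str:
    """Return the bundle group directory URL for a 4-character PDB ID."""
    # Assumption: directory names in the archive are lowercase.
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    # The middle two characters are positions 2 and 3 of the ID.
    return f"{BUNDLE_ROOT}{pdb_id[1:3]}/"
```

For the entry 4u50, the middle two characters are `u5`, so the function returns the root URL followed by `u5/`.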
Best effort/minimal PDB format files contain only authorship, citation details and coordinate data under HEADER, AUTHOR, JRNL, CRYST1, SCALEn, ATOM, HETATM records.
The following PDB records are not included in the best effort/minimal files: OBSLTE, TITLE, CAVEAT, COMPND, SOURCE, KEYWDS, EXPDTA, REVDAT, SPRSDE, REMARK, DBREF, SEQADV, SEQRES, MODRES, HET, HETNAM, HETSYN, FORMUL, HELIX, SHEET, SSBOND, LINK, CISPEP, SITE, ORIGXn, MTRIXn, CONECT.
Example: Truncated best effort/minimal file for the entry 4u50
HEADER RIBOSOME 2014-07-24 XXXX
AUTHOR N.GARREAU DE LOUBRESSE, I.PROKHOROVA, G.YUSUPOVA, M.YUSUPOV
JRNL AUTH N.Garreau de Loubresse, I.Prokhorova, W.Holtkamp,
JRNL AUTH 2 M.V.Rodnina, G.Yusupova, M.Yusupov
JRNL TITL Structural basis for the inhibition of the eukaryotic
JRNL TITL 2 ribosome.
JRNL REF Nature V. 513 517 2014
JRNL REFN NATUAS UK 1476-4687
JRNL PMID 25209664
JRNL DOI 10.1038/nature13737
CRYST1 434.390 285.580 303.060 90.00 98.99 90.00 P 1 21 1 4
SCALE1 0.002302 0.000000 0.000364 0.00000
SCALE2 0.000000 0.003502 0.000000 0.00000
SCALE3 0.000000 0.000000 0.003341 0.00000
ATOM 1 P U A 1 -88.608 31.952 64.746 1.00 81.64 P
ANISOU 1 P U A 1 5964 13260 11795 1046 -383 -1793 P
ATOM 2 OP1 U A 1 -88.681 32.341 63.314 1.00 82.66 O
ANISOU 2 OP1 U A 1 6111 13412 11886 1075 -507 -1778 O
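To inspect which record types a downloaded minimal file actually contains, the record name can be read from the first six columns of each line. The sketch below is one possible approach under that assumption; the function names are hypothetical, and `SCALEn` is expanded to `SCALE1`, `SCALE2` and `SCALE3`. Because the 4u50 excerpt above also shows ANISOU lines, the sketch reports record types outside the documented list rather than treating them as errors.

```python
from collections import Counter
from typing import List

# Record types listed above as present in best effort/minimal files.
MINIMAL_RECORDS = {
    "HEADER", "AUTHOR", "JRNL", "CRYST1",
    "SCALE1", "SCALE2", "SCALE3", "ATOM", "HETATM",
}


def record_counts(pdb_text: str) -> Counter:
    """Count lines per record type, using columns 1-6 as the record name."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.strip():
            counts[line[:6].strip()] += 1
    return counts


def unexpected_records(pdb_text: str) -> List[str]:
    """Return record types found in the text but not in MINIMAL_RECORDS."""
    return sorted(set(record_counts(pdb_text)) - MINIMAL_RECORDS)
```

This relies on fixed-column text as found in the actual files; text whose whitespace has been collapsed will not split correctly.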
The TAR file also contains an index file with the mapping between the chains present in the large entry and the chains present in the minimal PDB format files.
Example: Truncated index file for the entry 4u50
New chain ID Original chain ID
An index of all structure entries that do not have a standard PDB format file is updated regularly on the PDB FTP site at https://files.wwpdb.org/pub/pdb/compatible/pdb_bundle/pdb_bundle_index.txt. This list will continue to grow as new large structures are deposited and released.
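Once the index file has been downloaded, its contents can be loaded into a set for quick membership checks. The sketch below assumes, without confirmation from this page, that each relevant line begins with a 4-character PDB ID; lines that do not match are skipped. The function name `parse_bundle_index` is hypothetical.

```python
import re
from typing import Set

# A 4-character PDB ID: one digit followed by three alphanumeric characters.
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def parse_bundle_index(text: str) -> Set[str]:
    """Collect lowercase PDB IDs found as the first token of each line."""
    ids = set()
    for line in text.splitlines():
        tokens = line.split()
        # Assumption: the ID, if present, is the first token on the line.
        if tokens and PDB_ID.fullmatch(tokens[0]):
            ids.add(tokens[0].lower())
    return ids
```

If the actual file uses a different layout, only the token selection inside the loop needs to change.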
Historically, large structures containing more than 62 chains and/or more than 99999 ATOM lines were "split" across multiple PDB format files. These split entries were combined into single entries at the end of 2014 (wwPDB Announcement). An index file mapping large structures to the corresponding obsoleted split entries is available at https://files.wwpdb.org/pub/pdb/compatible/pdb_bundle/large_split_mapping.tsv. This file does not list large structures that were never deposited as split entries.