The following examples show the ATOM records from the current PDB format and an example from the proposed stylized PDBx/mmCIF format. In the PDBx/mmCIF example, the order of columns places the chain, residue and atom nomenclature items in the left-most columns. Data items that depend on the experimental method (e.g. occupancy, B-value) are placed in columns to the right. All items of each atom record in the PDBx/mmCIF example are placed on a single text line and are white-space delimited.
PDB entries in PDBx/mmCIF format are stored on the FTP sites of the wwPDB partners at the following locations:
Entries containing very large structures in PDBx/mmCIF format are currently stored separately, at one of the following locations:
The PDBx/mmCIF format files are named following the convention <PDB_4-LETTER-ID_CODE>.cif.gz (e.g. 1abc.cif.gz).
Experimental data files containing X-ray structure factors are only distributed in PDBx/mmCIF format and are named following an older PDB naming convention, r<PDB_ID_CODE>sf.ent.gz (e.g. r1abcsf.ent.gz).
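Both naming conventions can be applied mechanically to a PDB ID. The shell functions below are a minimal sketch: the names mmcif_coord_name and mmcif_sf_name are hypothetical, and lower-casing the ID is an assumption based on the examples 1abc.cif.gz and r1abcsf.ent.gz. They only build file names; they do not supply the FTP locations.

```shell
# Hypothetical helpers: build file names from a 4-character PDB ID.
# Assumption: IDs are lower-cased, as in the examples 1abc.cif.gz and r1abcsf.ent.gz.

# Coordinate file, convention <PDB_4-LETTER-ID_CODE>.cif.gz
mmcif_coord_name() {
    id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf '%s.cif.gz\n' "$id"
}

# Structure-factor file, older convention r<PDB_ID_CODE>sf.ent.gz
mmcif_sf_name() {
    id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'r%ssf.ent.gz\n' "$id"
}

mmcif_coord_name 1ABC   # prints 1abc.cif.gz
mmcif_sf_name 1abc      # prints r1abcsf.ent.gz
# A downloaded file can be read without unpacking it, e.g.:
#   gzip -dc "$(mmcif_coord_name 1abc)" | head
```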
A complete description of the download options for PDB data files is maintained here by the wwPDB. A description of the special handling of PDB entries containing very large structures is available here.
The PDBx/mmCIF format has a simple appearance with only a few syntax elements. All of the syntax elements used in PDBx/mmCIF data files are shown in the following snippet, which describes a polymer sequence.
The essential syntax features include:
- Data item names have the form _category.attribute (e.g. _entity_poly.entity_id), in which the category and attribute names are separated by a period.
- Categories are written in either key-value style or tabular style. In the snippet, the categories entity_name_com and entity_poly both use the key-value style, and the entity_poly_seq category uses the tabular style. In the tabular style, the data item names corresponding to the table columns follow a reserved loop_ token and are followed by rows of white-space delimited data values.
- Character values containing white space (e.g. _entity_name_com.name) must be quoted. Character values that extend over multiple lines are quoted using leading and trailing semicolons positioned at the first character position of the lines surrounding the multi-line character value (e.g. _entity_poly.pdbx_seq_one_letter_code).
- Lines beginning with # are comments.
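To show how the key-value style and the quoting rule fit together, the hypothetical shell function get_item below looks up an item name in the first column of a line and strips one pair of enclosing quotes from its value. It is a sketch, not a CIF parser: it assumes the value sits on the same line as the item name, and it ignores semicolon-delimited text fields and tabular (loop_) categories.

```shell
# get_item FILE ITEM: print the value of a key-value data item whose value is
# on the same line as its name, removing one pair of enclosing quotes.
# Hypothetical helper, not a wwPDB tool. Values on the following line,
# semicolon-delimited text fields and loop_ items are not handled.
get_item() {
    awk -v item="$2" -v sq="'" '
        $1 == item && NF >= 2 {
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # drop the item name and the blanks after it
            sub(/[ \t]+$/, "")                # drop trailing blanks
            first = substr($0, 1, 1)
            last = substr($0, length($0), 1)
            if ((first == "\"" || first == sq) && last == first && length($0) >= 2)
                $0 = substr($0, 2, length($0) - 2)   # strip the enclosing quotes
            print
            exit
        }' "$1"
}
```

Applied to a file containing the polymer sequence snippet, get_item FILE _entity_poly.type prints polypeptide(L), and _entity_name_com.name yields the enzyme name without its double quotes.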
Look here for a more complete description of PDBx/mmCIF data file and dictionary syntax.
# <-- a comment line
_entity_name_com.entity_id   1
_entity_name_com.name        "Pantoate--beta-alanine ligase, Pantoate-activating enzyme"
_entity_poly.entity_id                  1
_entity_poly.type                       'polypeptide(L)'
_entity_poly.nstd_linkage               no
_entity_poly.nstd_monomer               no
_entity_poly.pdbx_seq_one_letter_code
;AMAIPAFHPGELNVYSAPGDVADVSRALRLTGRRVMLVPTMGALHEGHLALVRAAKRVPGSVVVVSIFVNPMQFGAGGDL
DAYPRTPDDDLAQLRAEGVEIAFTPTTAAMYPDGLRTTVQPGPLAAELEGGPRPTHFAGVLTVVLKLLQIVRPDRVFFGE
KDYQQLVLIRQLVADFNLDVAVVGVPTVREADGLAMSSRNRYLDPAQRAAAVALSAALTAAAHAATAGAQAALDAARAVL
DAAPGVAVDYLELRDIGLGPMPLNGSGRLLVAARLGTTRLLDNIAIEIGTFAGTDRPDGYR
;
#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.num
_entity_poly_seq.mon_id
_entity_poly_seq.hetero
1 1 ALA n
1 2 MET n
1 3 ALA n
1 4 ILE n
1 5 PRO n
1 6 ALA n
1 7 PHE n
# .... abbreviated ....
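The semicolon-delimited value of _entity_poly.pdbx_seq_one_letter_code can be recovered by reading the lines between the two semicolons. The function get_text_field below is a hypothetical sketch that assumes the item name stands alone on its line and the opening semicolon begins the next line, as in the snippet above. It joins the wrapped lines with no separator, which suits a sequence but would run words together in ordinary text.

```shell
# get_text_field FILE ITEM: print a multi-line value delimited by semicolons in
# column 1, with its lines joined and no separator added.
# Hypothetical helper; assumes the item name is alone on its line and the
# opening semicolon starts the next line.
get_text_field() {
    awk -v item="$2" '
        $1 == item && NF == 1 { want = 1; next }
        want && !inside {
            if ($0 !~ /^;/) exit           # value is not a text field: stop
            inside = 1
            line = substr($0, 2)           # text after the opening semicolon
            sub(/[ \t]+$/, "", line)
            printf "%s", line
            next
        }
        inside {
            if ($0 ~ /^;/) exit            # closing semicolon ends the value
            line = $0
            sub(/[ \t]+$/, "", line)
            printf "%s", line
        }
        END { if (inside) print "" }
    ' "$1"
}
```

For the snippet above, get_text_field FILE _entity_poly.pdbx_seq_one_letter_code returns the one-letter sequence as a single unbroken string.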
The atom coordinate records in the PDBx/mmCIF data files distributed by the wwPDB are stored on individual lines, each beginning with either 'ATOM' or 'HETATM', and the elements of each coordinate record are white-space delimited. For example, the PDBx/mmCIF coordinate records in PDB entries all follow the regular layout below.
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.Cartn_x_esd
_atom_site.Cartn_y_esd
_atom_site.Cartn_z_esd
_atom_site.occupancy_esd
_atom_site.B_iso_or_equiv_esd
_atom_site.pdbx_formal_charge
_atom_site.auth_seq_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_atom_id
_atom_site.pdbx_PDB_model_num
ATOM 1 N N . VAL A 1 1 ? 6.204 16.869 4.854 1.00 49.05 ? ? ? ? ? ? 1 VAL A N 1
ATOM 2 C CA . VAL A 1 1 ? 6.913 17.759 4.607 1.00 43.14 ? ? ? ? ? ? 1 VAL A CA 1
ATOM 3 C C . VAL A 1 1 ? 8.504 17.378 4.797 1.00 24.80 ? ? ? ? ? ? 1 VAL A C 1
ATOM 4 O O . VAL A 1 1 ? 8.805 17.011 5.943 1.00 37.68 ? ? ? ? ? ? 1 VAL A O 1
ATOM 5 C CB . VAL A 1 1 ? 6.369 19.044 5.810 1.00 72.12 ? ? ? ? ? ? 1 VAL A CB 1
ATOM 6 C CG1 . VAL A 1 1 ? 7.009 20.127 5.418 1.00 61.79 ? ? ? ? ? ? 1 VAL A CG1 1
ATOM 7 C CG2 . VAL A 1 1 ? 5.246 18.533 5.681 1.00 80.12 ? ? ? ? ? ? 1 VAL A CG2 1
ATOM 8 N N . LEU A 1 2 ? 9.096 18.040 3.857 1.00 26.44 ? ? ? ? ? ? 2 LEU A N 1
ATOM 9 C CA . LEU A 1 2 ? 10.600 17.889 4.283 1.00 26.32 ? ? ? ? ? ? 2 LEU A CA 1
ATOM 10 C C . LEU A 1 2 ? 11.265 19.184 5.297 1.00 32.96 ? ? ? ? ? ? 2 LEU A C 1
ATOM 11 O O . LEU A 1 2 ? 10.813 20.177 4.647 1.00 31.90 ? ? ? ? ? ? 2 LEU A O 1
ATOM 12 C CB . LEU A 1 2 ? 11.099 18.007 2.815 1.00 29.23 ? ? ? ? ? ? 2 LEU A CB 1
ATOM 13 C CG . LEU A 1 2 ? 11.322 16.956 1.934 1.00 37.71 ? ? ? ? ? ? 2 LEU A CG 1
ATOM 14 C CD1 . LEU A 1 2 ? 11.468 15.596 2.337 1.00 39.10 ? ? ? ? ? ? 2 LEU A CD1 1
ATOM 15 C CD2 . LEU A 1 2 ? 11.423 17.268 0.300 1.00 37.47 ? ? ? ? ? ? 2 LEU A CD2 1
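The following command extracts the record name, atom name, residue name, chain identifier, residue number, and Cartesian X, Y and Z coordinates from the ATOM records of the PDBx/mmCIF file for PDB entry 4HHB, whose first coordinate records are shown above. The awk field numbers follow the column order of the loop_ header: $1 is _atom_site.group_PDB, $25 is _atom_site.auth_atom_id, $23 is _atom_site.auth_comp_id, $24 is _atom_site.auth_asym_id, $22 is _atom_site.auth_seq_id, and $11 to $13 are _atom_site.Cartn_x, Cartn_y and Cartn_z. Because the pattern is '^ATOM', HETATM records are not included.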
grep '^ATOM' 4HHB.cif | awk '{print $1, $25, $23, $24, $22, $11, $12, $13}'
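The field numbers in the command above are tied to the column order of this particular loop_ header; a file whose _atom_site loop lists its items in a different order, or omits some of them, would need different numbers. A more robust sketch reads the header first and looks each column up by item name. The function atom_site_columns is hypothetical and assumes a single data block, one record per line, and no quoted values containing white space, as in the layout shown above.

```shell
# atom_site_columns FILE ITEM...: for every ATOM and HETATM record, print the
# requested _atom_site items (given without the "_atom_site." prefix), finding
# each column from its position in the loop_ header instead of a fixed awk
# field number. Hypothetical helper; assumes one data block, one record per
# line, and no quoted values containing white space. Missing items print as ?.
atom_site_columns() {
    file=$1
    shift
    awk -v want="$*" '
        BEGIN { n = split(want, names, " ") }
        /^_atom_site\./ { col[substr($1, 12)] = ++ncol; next }   # header: record column order
        $1 == "ATOM" || $1 == "HETATM" {
            out = ""
            for (i = 1; i <= n; i++) {
                v = (names[i] in col) ? $(col[names[i]]) : "?"
                out = (i == 1) ? v : out " " v
            }
            print out
        }' "$file"
}
# Usage (same fields as the grep/awk command, HETATM records included):
#   atom_site_columns 4HHB.cif group_PDB auth_atom_id auth_comp_id auth_asym_id auth_seq_id Cartn_x Cartn_y Cartn_z
```

Looking columns up by name costs one pass over the header lines, and in exchange the same call works whatever column order the file uses.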