PDBx/mmCIF Dictionary Resources

INSTALL AND BUILD INSTRUCTIONS

Download the library.
Install the pdbx module, either by adding the directory containing the pdbx module to your PYTHONPATH or by moving the pdbx directory (and its subdirectories) to a location already on your Python search path (to see that path, import sys in Python IDLE and inspect sys.path). For example, in Bash:
```
mkdir -p source/python/modules
mv pdbx source/python/modules
PYTHONPATH=$PYTHONPATH:source/python/modules
export PYTHONPATH
```
Test the installation by executing python PdbxReaderTests.py and python PdbxReadWriteTests.py in /path/to/pdbx/reader, or python PdbxWriterTests.py in /path/to/pdbx/writer.
If you do not receive any 'module not found' errors and the tests run, you should be able to import from the pdbx module anywhere.
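A quick way to confirm that the search path is set up before running the tests is to ask Python whether it can locate the module. The following is a minimal sketch, assuming Python 3.4 or later; the helper name module_available is illustrative and not part of the pdbx library.
```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module called `name` can be located."""
    # find_spec returns None when no finder on sys.path knows the module.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("pdbx"):
        print("pdbx found on the Python search path")
    else:
        # Listing sys.path helps spot a missing or mistyped PYTHONPATH entry.
        print("pdbx not found; the current search path is:")
        for entry in sys.path:
            print("  ", entry)
```
If the module is reported as missing, check that the directory holding the pdbx directory (not the pdbx directory itself) appears in the printed list.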

PYTHON EXAMPLES

Connections.py: Uses the PDBX library to interface with Chimera. Shows how to retrieve and iterate over the struct_conn category, which delineates connections in a molecule, and locate connections of interest (in this case, covalent bonds) for Chimera to emphasize and animate.
Structures.py: Uses the PDBX library to interface with Chimera. Shows how to retrieve and iterate over the struct_site_gen category, which delineates members of structurally relevant sites in a molecule, and locate all structurally relevant sites for Chimera to emphasize and animate.
Connections3.py: This example shows one way of using the information about a partner atom in a connection, detailed in the struct_conn category, to identify the atom in the atom_site category and determine its (x,y,z) Cartesian coordinates. Here, we look for partner atoms involved in covalent bonds and report their (x,y,z) coordinates.
Connections2.py: Uses the PDBX library to interface with Chimera. Shows how to find connections of certain types that involve certain entities by retrieving and iterating over the struct_conn category, which delineates connections in a molecule, and using the struct_asym and entity categories to determine the entity types involved in each connection. In this case, polymer-polymer covalent bonds are sought for Chimera to display and animate.
FASTA.py: This example shows how the (sequence) information contained in a CIF file can be readily accessed and transformed into another format. This particular example implements a FASTA converter, which reads the monomer sequences in the entity_poly_seq category and translates them into the single-letter FASTA format.
Assemblies.py: A more involved and extensive example that uses the PDBX library to generate a CIF file for each biological assembly listed in the pdbx_struct_assembly category of a CIF file. This example synthesizes information located in the pdbx_struct_assembly_gen, pdbx_struct_oper_list, and atom_site categories to accomplish this task.
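
To give a flavour of the translation step that FASTA.py describes, the following is a minimal sketch and not the code of FASTA.py itself. It assumes the three-letter monomer identifiers have already been collected from the entity_poly_seq category (they are stored in its mon_id attribute); the names THREE_TO_ONE, to_one_letter, and fasta_record are illustrative, and mapping unrecognized monomers to X is a choice made here.
```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(monomer_ids, unknown="X"):
    """Translate three-letter monomer ids into a one-letter sequence string."""
    # Monomers outside the table (e.g. modified residues) become `unknown`.
    return "".join(THREE_TO_ONE.get(m.upper(), unknown) for m in monomer_ids)


def fasta_record(header, sequence, width=80):
    """Format a header and a sequence as FASTA text, wrapped at `width`."""
    lines = [">" + header]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines)


if __name__ == "__main__":
    print(fasta_record("example", to_one_letter(["MET", "ALA", "GLY"])))
```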

Basic I/O Operations

Reading and writing are handled by the PdbxReader (in pdbx.reader.PdbxReader) and PdbxWriter (in pdbx.writer.PdbxWriter) classes, respectively.

Using PdbxReader

Imports: PdbxReader from pdbx.reader.PdbxReader, * from pdbx.reader.PdbxContainers

Call open() on a CIF file and store the file handle
```
ifh = open("/path/to/file.cif")
```
Initialize a PdbxReader object with the input file handle
```
pRd = PdbxReader(ifh)
```
Initialize a list to be populated with the DataContainer (and/or DefinitionContainer) objects parsed from the CIF file; both classes inherit from ContainerBase, and each data block maps to a DataContainer object
```
data = []
```
Call the read(self, containerList) method with your list
```
pRd.read(data)
```
Your list is now populated with one or more DataContainer objects, which represent data blocks. To get the first data block, just use list notation: block = data[0]
To retrieve a category object, use the getObj(self, name) method
```
struct_conn = block.getObj("struct_conn")
```
To retrieve a value stored in a category table, e.g., the connection type of the first linkage described in the struct_conn category table, use the getValue(self, attributeName=None, rowIndex=None) method
```
connType = struct_conn.getValue("conn_type_id", 0)
```
See below for other methods to handle blocks, and, subsequently, the contents of the category objects they contain.
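The reading steps above can be gathered into two small helpers. This is a sketch under stated assumptions rather than part of the library: load_blocks and first_value are hypothetical names, the import path follows the module location given earlier (pdbx.reader.PdbxReader), and the import is placed inside the function so the helpers can be defined even where pdbx is not installed.
```python
def load_blocks(path):
    """Parse a CIF file and return its list of container objects."""
    # Imported lazily; requires the pdbx module on the Python search path.
    from pdbx.reader.PdbxReader import PdbxReader

    blocks = []
    # The with-statement closes the file handle once parsing is done.
    with open(path) as ifh:
        PdbxReader(ifh).read(blocks)
    return blocks


def first_value(block, category_name, attribute_name, default=None):
    """Return the value of an attribute in the first row of a category."""
    category = block.getObj(category_name)  # None if the category is absent
    if category is None or category.getRowCount() == 0:
        return default
    if not category.hasAttribute(attribute_name):
        return default
    return category.getValue(attribute_name, 0)


# Example use (needs pdbx and a real file):
#   block = load_blocks("/path/to/file.cif")[0]
#   connType = first_value(block, "struct_conn", "conn_type_id")
```
Returning a default instead of raising keeps the calling code short when a file lacks the category in question; whether that is preferable to an exception depends on the application.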

Using PdbxWriter

Imports: PdbxWriter from pdbx.writer.PdbxWriter, * from pdbx.reader.PdbxContainers

Call open() on a file in write mode and store the file handle
```
ofh = open("path/to/out.cif", "w")
```
Initialize a PdbxWriter object with the output file handle
```
pWt = PdbxWriter(ofh)
```
The two major PdbxWriter write methods are write(self, containerList), which takes a list of containers, data and/or definition, and writeContainer(self, container), which takes a single data or definition container.
Now you can declare one or more DataContainer/DefinitionContainer objects and write them.
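As an illustration of how reading and writing fit together, the sketch below copies every container from one CIF file to another using read(containerList) and write(containerList). The function name copy_cif is hypothetical, as are the reader_cls and writer_cls parameters, which exist only so the function can be exercised with stand-in classes when pdbx is not installed; by default the classes are imported from the module locations named earlier.
```python
def copy_cif(src_path, dst_path, reader_cls=None, writer_cls=None):
    """Read all containers from src_path, write them to dst_path,
    and return how many containers were copied."""
    # Fall back to the real classes; this needs pdbx on the search path.
    if reader_cls is None:
        from pdbx.reader.PdbxReader import PdbxReader as reader_cls
    if writer_cls is None:
        from pdbx.writer.PdbxWriter import PdbxWriter as writer_cls

    containers = []
    with open(src_path) as ifh:
        reader_cls(ifh).read(containers)

    # Opening in "w" mode replaces any existing file at dst_path.
    with open(dst_path, "w") as ofh:
        writer_cls(ofh).write(containers)

    return len(containers)


# Example use (needs pdbx and a real file):
#   copy_cif("/path/to/file.cif", "path/to/out.cif")
```
To write a single block instead of the whole list, writeContainer(container) can be called in place of write(containerList).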

Containers and Methods

All of the containers are accessible through pdbx.reader.PdbxContainers. The DataContainer, to which data blocks map, and DefinitionContainer classes derive from ContainerBase, which maintains an internal dictionary of DataCategory (derived from DataCategoryBase) objects, to which categories map. The following are some methods of interest for these three major container objects, viz., DefinitionContainer and DataContainer, derived from ContainerBase, and DataCategory, derived from DataCategoryBase.

DefinitionContainer/DataContainer

exists(self, name) - returns a bool indicating whether or not the DataCategory object named name exists in this container
getObj(self, name) - returns the DataCategory object named name, or None if it doesn't exist
getObjNameList(self) - returns the list of category names within this container
printIt(self, fh=sys.stdout, type="brief") - prints out the contents of the container
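
The container methods above combine naturally into a quick overview of a data block. The following is a minimal sketch; summarize_container is a hypothetical helper, and it relies only on getObjNameList and getObj from the list above, together with getRowCount and getAttributeCount from the DataCategory list below.
```python
def summarize_container(container):
    """Map each category name to a (row count, attribute count) pair."""
    summary = {}
    for name in container.getObjNameList():
        category = container.getObj(name)
        summary[name] = (category.getRowCount(), category.getAttributeCount())
    return summary


# Example use with a block obtained from PdbxReader:
#   for name, (rows, cols) in summarize_container(block).items():
#       print(name, rows, cols)
```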

DataCategory

__getitem__(self, x) - special method, category[x] returns the row specified by the integer x in category
get(self) - returns 3-tuple consisting of (categoryName, attributeNameList, rowList)
getRowList(self) - returns a list of all the rows in the category table
getRowCount(self) - returns the number of rows in the category table
getRow(self, index) - attempts to fetch the row at index index and returns an empty list if it fails
getAttributeList(self) - returns a list of attribute/data item names
getAttributeCount(self) - returns the number of attributes/columns in the category table
getAttributeIndex(self, attributeName) - returns the index of the attribute specified by attributeName or -1 if not found
hasAttribute(self, attributeName) - returns a bool indicating whether or not the category has the attribute attributeName
getIndex(self, attributeName) - same as getAttributeIndex(self, attributeName)
getValue(self, attributeName=None, rowIndex=None) - returns the value of the attribute attributeName at row index rowIndex
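
Several of these methods are typically used together when scanning a category table. The sketch below finds the rows in which an attribute holds a given value, using hasAttribute, getRowCount, and getValue as listed above. The helper name rows_where is hypothetical, and the example call assumes covalent bonds are recorded in struct_conn with the conn_type_id value covale, as in the mmCIF dictionary.
```python
def rows_where(category, attribute_name, wanted):
    """Return the indices of rows whose attribute equals `wanted`."""
    # An absent attribute yields no matches rather than an error.
    if not category.hasAttribute(attribute_name):
        return []
    return [
        i
        for i in range(category.getRowCount())
        if category.getValue(attribute_name, i) == wanted
    ]


# Example use with a struct_conn category obtained via block.getObj:
#   for i in rows_where(struct_conn, "conn_type_id", "covale"):
#       print(struct_conn.getRow(i))
```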

PDBx Python Parser Examples and Tutorial