ABOUT THIS EXAMPLE:

This example shows how to use the ISTable::Search function to fulfill a query of arbitrary complexity. In this case, we find the (x, y, z) coordinates of every alpha carbon atom in chain A of the 5HVP structure, using the atom_site category. Search takes three notable optional arguments, none of which are used here: fromRowIndex (an unsigned int), which specifies the row at which the search begins; searchDir (a const eSearchDir), which specifies the direction of the search; and searchType (a const eSearchType), which specifies the type of comparison (e.g., find rows whose values are equal to, or greater than, the target values).
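
To make the roles of the target values, column names, start row, and direction concrete, here is a minimal, self-contained sketch of a multi-column equality search over a toy in-memory table. It is not cifparse-obj's implementation: ToyTable, toySearch, and Direction are hypothetical stand-ins for ISTable, ISTable::Search, and eSearchDir. Only equality matching (the eEQUAL case) is modeled, and the sketch assumes that a backward search scans from fromRowIndex down to row 0 and that res is cleared before results are added.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy table: column names plus rows of string values,
// standing in for an ISTable (this is not the cifparse-obj class).
struct ToyTable {
    std::vector<std::string> columns;
    std::vector<std::vector<std::string>> rows;

    // Index of a column by name, or -1 if the column is absent.
    int columnIndex(const std::string& name) const {
        for (std::size_t i = 0; i < columns.size(); ++i)
            if (columns[i] == name) return static_cast<int>(i);
        return -1;
    }
};

// Hypothetical stand-in for eSearchDir.
enum class Direction { Forward, Backward };

// Clears res, then appends the index of every row whose values in colNames
// equal the corresponding targets (equality only, like eEQUAL). The scan
// starts at fromRowIndex and moves in the given direction.
void toySearch(std::vector<unsigned int>& res, const ToyTable& table,
               const std::vector<std::string>& targets,
               const std::vector<std::string>& colNames,
               unsigned int fromRowIndex = 0,
               Direction dir = Direction::Forward) {
    assert(targets.size() == colNames.size());
    res.clear();

    // Resolve column names once; a missing column means no row can match.
    std::vector<int> cols;
    for (const auto& name : colNames) {
        int c = table.columnIndex(name);
        if (c < 0) return;
        cols.push_back(c);
    }

    // A row matches only if every requested column holds its target value.
    auto matches = [&](std::size_t r) {
        for (std::size_t k = 0; k < cols.size(); ++k)
            if (table.rows[r][cols[k]] != targets[k]) return false;
        return true;
    };

    const std::size_t n = table.rows.size();
    if (fromRowIndex >= n) return;
    if (dir == Direction::Forward) {
        for (std::size_t r = fromRowIndex; r < n; ++r)
            if (matches(r)) res.push_back(static_cast<unsigned int>(r));
    } else {
        // Visit fromRowIndex, fromRowIndex - 1, ..., 0.
        for (std::size_t r = std::size_t(fromRowIndex) + 1; r-- > 0;)
            if (matches(r)) res.push_back(static_cast<unsigned int>(r));
    }
}
```

With a table whose rows are (N, A), (CA, A), (CA, B), (CA, A), searching label_atom_id = "CA" and auth_asym_id = "A" from row 0 yields rows 1 and 3, which mirrors how the example combines two column conditions in one call.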

BUILD INSTRUCTIONS:

Files: Search.C, 5HVP.cif

	    Save Search.C to /path/to/cifparse-obj-vX.X-prod-src/parser-test-app-vX.X/src/
	    Save the CIF file anywhere, e.g., /path/to/cifparse-obj-vX.X-prod-src/parser-test-app-vX.X/bin/
	    Add Search.ext to the BASE_MAIN_FILES list in the Makefile in /path/to/cifparse-obj-vX.X-prod-src/parser-test-app-vX.X
	    Execute make in the same directory as the Makefile
	    cd to bin, where the executable has been made, and run ./Search /path/to/5HVP.cif
	  

Functions of Note

#include "CifFile.h"
string CifFile::GetFirstBlockName()
Returns the name of the first data block. CifFile inherits this method from TableFile. Related: CifFile::GetBlockNames(vector<string>& blockNames).
Block& CifFile::GetBlock(const string& blockName)
Retrieves the data block named blockName. CifFile inherits this method from TableFile.
ISTable& Block::GetTable(const string& name)
Retrieves a table (i.e., category) within the block, specified by some name.
#include "ISTable.h"
void ISTable::Search(vector<unsigned int>& res, const vector<string>& targets, const vector<string>& colNames, const unsigned int fromRowIndex = 0, const eSearchDir searchDir = eFORWARD, const eSearchType searchType = eEQUAL, const string& indexName = string())
Populates res with the indices of every row whose columns colNames hold the corresponding values in targets (with the default search type, eEQUAL).
void ISTable::GetRow(vector<string>& row, const unsigned int rowIndex, const string& fromColName = string(), const string& toColName = string())
Fills row with the values of the row at the zero-based index rowIndex, optionally restricted to the columns from fromColName through toColName. Related: vector<string>& ISTable::GetRow(const unsigned int rowIndex).
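
The fromColName/toColName form of GetRow is what lets the example fetch Cartn_x, Cartn_y, and Cartn_z in a single call and then read them as coords[0], coords[1], and coords[2]. The sketch below models that behavior on plain vectors, assuming the slice is inclusive at both ends, follows the table's column order, and treats an empty name as "first column" or "last column". getRowSlice is a hypothetical helper, not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: copies the values of `row` lying between the columns
// fromColName and toColName, both inclusive, in column order. Empty names
// mean "first column" / "last column" (assumed meaning of the empty-string
// defaults in ISTable::GetRow's signature).
std::vector<std::string> getRowSlice(const std::vector<std::string>& columns,
                                     const std::vector<std::string>& row,
                                     const std::string& fromColName = std::string(),
                                     const std::string& toColName = std::string()) {
    assert(columns.size() == row.size());
    if (columns.empty()) return {};

    // Look up a column by name, using `fallback` when the name is empty.
    auto indexOf = [&](const std::string& name, std::size_t fallback) {
        if (name.empty()) return fallback;
        for (std::size_t i = 0; i < columns.size(); ++i)
            if (columns[i] == name) return i;
        throw std::out_of_range("no such column: " + name);
    };

    const std::size_t from = indexOf(fromColName, 0);
    const std::size_t to = indexOf(toColName, columns.size() - 1);
    if (from > to) throw std::invalid_argument("fromColName comes after toColName");

    return std::vector<std::string>(row.begin() + static_cast<std::ptrdiff_t>(from),
                                    row.begin() + static_cast<std::ptrdiff_t>(to) + 1);
}
```

Because the slice is positional, any extra column sitting between Cartn_x and Cartn_z would also be returned; the example relies on those three columns being adjacent.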

Basic Sample Output

./Search 5HVP.cif
99 atoms found:
(29.970, 38.922, 4.561)
(29.636, 36.572, 1.593)
(28.879, 33.060, 2.829)
(29.513, 30.113, 0.468)
(27.327, 27.003, 0.728)
(29.974, 24.273, 0.856)
(28.869, 23.997, 4.484)
(25.689, 24.716, 6.375)
(25.415, 28.480, 6.613)
(25.452, 28.653, 10.471)
(25.970, 31.965, 12.332)
-------truncated-------
(13.908, 32.611, 7.947)
(13.090, 35.152, 5.233)
(11.249, 37.435, 7.675)
(14.550, 38.372, 9.439)
(16.951, 37.736, 6.539)
(15.383, 40.593, 4.616)
(15.952, 42.771, 7.692)
(19.718, 42.404, 7.594)
(19.906, 42.920, 3.781)
(20.463, 39.289, 2.946)
(20.506, 38.362, -0.735)
(21.255, 35.231, -2.813)
(23.596, 35.908, -5.720)
(24.853, 33.899, -8.704)
/*************************
* Search.C
*
* For some CIF file, determine the (x, y, z) Cartesian coordinates
* of every alpha carbon atom in the A chain.
*
* Method: Perform a Search query on the atom_site category table.
*
* Comments marked [n] refer to the numbered NOTES AND REFERENCES below.
*************************/

#include <iostream>
#include <string>
#include <vector>

#include "CifFile.h"
#include "CifParserBase.h"
#include "ISTable.h"

using std::string;
using std::vector;

int main(int argc, char **argv)
{
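    // The name of the CIF file, given as the first command-line argument
    if (argc < 2)
    {
        std::cerr << "Usage: " << argv[0] << " /path/to/file.cif" << std::endl;
        return 1;
    }
    string cifFileName = argv[1];
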
    // A string to hold any parsing diagnostics
    string diagnostics;

    // Create CIF file and parser objects
    CifFile *cifFileP = new CifFile;
    CifParser *cifParserP = new CifParser(cifFileP);

    // Parse the CIF file
    cifParserP->Parse(cifFileName, diagnostics);

    // Delete the CIF parser, as it is no longer needed
    delete cifParserP;

    // Display any diagnostics
    if (!diagnostics.empty())
    {
        std::cout << "Diagnostics: " << std::endl << diagnostics << std::endl;
    }

    // Get the first data block name in the CIF file
    string firstBlockName = cifFileP->GetFirstBlockName();

    // Retrieve the first data block
    Block &block = cifFileP->GetBlock(firstBlockName);

    // Retrieve the table for the atom_site category, which records the atom sites of the structure [1]
    ISTable& atom_site = block.GetTable("atom_site");

    // Will hold the atom_site row indices of any atoms fulfilling our search query
    vector<unsigned int> results;

    // Holds attribute names and their target values
    vector<string> colNames, targets;

    // We want alpha carbons (CA) in chain A [2]
    colNames.push_back("label_atom_id");
    targets.push_back("CA");
    colNames.push_back("auth_asym_id");
    targets.push_back("A");

    // Perform the search, populating the results vector with the matching row indices
    atom_site.Search(results, targets, colNames);

    // Retrieve and display the coordinates of every atom satisfying our query
    std::cout << results.size() << " atoms found:" << std::endl;
    vector<string> coords;
    for (unsigned int i = 0; i < results.size(); ++i)
    {
        // Retrieve the values of columns Cartn_x through Cartn_z, inclusive [3]
        atom_site.GetRow(coords, results[i], "Cartn_x", "Cartn_z");
        std::cout << "(" << coords[0] << ", " << coords[1] << ", " << coords[2] << ")" << std::endl;
    }
    // Release the CIF file object
    delete cifFileP;

    return 0;
}

NOTES AND REFERENCES

  1. PDBx/mmCIF dictionary (v4.0), atom_site category: http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Categories/atom_site.html
  2. For brevity, we assume that the author-provided chain identifier, auth_asym_id, is present. This item is not mandatory but is commonly present. A more complete program can check for it with ISTable::IsColumnPresent(const string& columnName), which returns a bool indicating whether the column named columnName exists. Note also that even when a column is present, individual values may be "?", which indicates a missing value, or ".", which indicates that there is no appropriate value for that data item or that it has been intentionally omitted.
  3. In many CIF files the x, y, and z in Cartn_x, Cartn_y, and Cartn_z are capitalized. Also note that because GetRow returns every column from Cartn_x through Cartn_z, this call assumes the three coordinate columns are adjacent and in x, y, z order.
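
The caveat in note 2 about "?" and "." values can be handled with a small filter before a value is used. In the sketch below, isMissing treats both markers as "no usable value", and parseCoordinate is a hypothetical helper that converts a Cartn_x/y/z string to a double with std::stod only when the whole string is a number. Neither function is part of cifparse-obj; both are illustrations built on the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// "?" marks a missing value; "." marks an inapplicable or intentionally
// omitted one (note 2). In either case there is no usable value.
bool isMissing(const std::string& value) {
    return value == "?" || value == ".";
}

// Hypothetical helper: converts a coordinate string such as "29.970" to a
// double, returning std::nullopt for "?", ".", or text that is not a number.
std::optional<double> parseCoordinate(const std::string& value) {
    if (isMissing(value)) return std::nullopt;
    try {
        std::size_t consumed = 0;
        double d = std::stod(value, &consumed);
        if (consumed != value.size()) return std::nullopt;  // trailing junk
        return d;
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
}
```

Returning std::optional keeps the "no value" case explicit, so a caller can skip or report an atom instead of printing a placeholder as if it were a coordinate.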