Do NOT split on whitespace
The other answers given here make a flawed assumption: that coordinates will be space-delimited. Per the PDB specification of ATOM records, this is not necessarily the case: PDB record values are specified by column indices, and may flow into one another. For instance, your first ATOM record reads:
ATOM    920  CA  GLN A 203      39.292 -13.354  17.416  1.00 55.76           C
But this is perfectly valid as well:
ATOM    920  CA  GLN A 203      39.292-13.3540  17.416  1.00 55.76           C
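To see why splitting on whitespace breaks, here is a minimal sketch using only the standard library. It assumes the standard ATOM column layout (x in columns 31-38, y in 39-46, z in 47-54) and rebuilds the second record above piece by piece so the column positions are explicit. It only illustrates the failure mode; it is not a substitute for a real parser.

```python
# Second record from above, assembled field by field so the columns are explicit.
record = (
    "ATOM    920  CA  GLN A 203    "  # columns 1-30: record name, serial, atom, residue, chain, number
    "  39.292"  # columns 31-38: x
    "-13.3540"  # columns 39-46: y, filling its whole field
    "  17.416"  # columns 47-54: z
    "  1.00 55.76           C"  # occupancy, temperature factor, element
)

# Naive approach: x and y come back fused into a single token.
fields = record.split()
try:
    float(fields[6])
    split_failed = False
except ValueError:
    split_failed = True

# Column-based approach: slice by position (0-based, end-exclusive).
x = float(record[30:38])
y = float(record[38:46])
z = float(record[46:54])

print(fields[6], split_failed)
print(x, y, z)
```

Here `fields[6]` is the fused string `39.292-13.3540`, which `float()` rejects, while the column slices recover all three coordinates.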
The better approach
Because of the column-specified indices, and the number of other problems that can occur in a PDB file, you should not write your own parser. The PDB format is messy, and there are a lot of special cases and badly formatted files to handle. Instead, use a parser that's already written for you.
I like Biopython's PDB.PDBParser. It will parse the structure for you into Python objects, complete with handy features. If you prefer Perl, check out BioPerl.
PDB.Residue objects allow keyed access to Atoms by name, and PDB.Atom objects overload the - operator to return distance between two Atoms. We can use this to write clean, concise code:
Code
from Bio import PDB

parser = PDB.PDBParser()

# Parse the structure into a PDB.Structure object
pdb_code = "1exm"
pdb_path = "pdb1exm.ent"
struct = parser.get_structure(pdb_code, pdb_path)

# Grab the first two residues from the structure
# (get_residues() returns an iterator, so use the built-in next())
residues = struct.get_residues()
res_one = next(residues)
res_two = next(residues)

try:
    # Subtracting two Atom objects gives the distance between them
    alpha_dist = res_one['CA'] - res_two['CA']
except KeyError:
    print("Alpha carbon missing, computing distance impossible!")
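If the idea of overloading - to get a distance is unfamiliar, the following toy sketch shows the mechanism with only the standard library. `SketchAtom` and the dict-based residues are hypothetical stand-ins with made-up coordinates, not Biopython's actual classes or data; they only mirror the keyed access and subtraction behaviour described above. It assumes Python 3.8+ for `math.dist`.

```python
import math


class SketchAtom:
    """Hypothetical stand-in for an atom; not Biopython's implementation."""

    def __init__(self, name, coord):
        self.name = name
        self.coord = coord  # (x, y, z) tuple

    def __sub__(self, other):
        # Overload "-" so that atom_a - atom_b yields the Euclidean distance.
        return math.dist(self.coord, other.coord)


# Residues modelled as plain dicts keyed by atom name (made-up coordinates).
res_one = {"CA": SketchAtom("CA", (0.0, 0.0, 0.0))}
res_two = {"CA": SketchAtom("CA", (3.0, 4.0, 0.0))}
res_three = {"N": SketchAtom("N", (1.0, 1.0, 1.0))}  # no alpha carbon

alpha_dist = res_one["CA"] - res_two["CA"]
print(alpha_dist)

# Looking up a missing atom name raises KeyError, as in the code above.
try:
    res_one["CA"] - res_three["CA"]
    missing_ca = False
except KeyError:
    missing_ca = True
print(missing_ca)
```

The same pattern is why the `try`/`except KeyError` in the Biopython code is enough to handle residues that lack an alpha carbon.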