Reading files with Python

For Python users involved in molecular modeling, reading files is likely a major part of analyzing results. Here I show a few potentially useful ways to approach file handling with Python. This list is in no way exhaustive. It is intended to serve as a brief overview for beginners.

1. Reading in coordinates from an XYZ file

In an earlier post, I discussed the XYZ file format and its use in molecular modeling. There are many reasons why one might need to read in the nuclear coordinates of a molecule from an XYZ file. Suppose you're trying to map a potential energy curve as a function of some internuclear distance, bond angle, or dihderal torsion. You'll want a way to calculate the distance in each structure used to map your curve, that doesn't involve opening a GUI software to measure the distance "by hand." Here is an example of how to open an XYZ file and read in its contents.

----------------------------------------------------
    f = input('Enter name of XYZ file:  ') 
    xyz = open(f)
    N = int(xyz.readline())
    header = xyz.readline()
    atom_symbol, coords = ([] for i in range (2))
    for line in xyz:
        atom,x,y,z = line.split()
        atom_symbol.append(atom)
        coords.append([float(x),float(y),float(z)])
     xyz.close()
----------------------------------------------------

2. Extracting a specific portion of an output file

Now suppose there is are specific quantities you want to extract from the results of an electronic structure calculation. These quantities could be: molecular orbital energies, nuclear coordinates of the optimized structure, atomic partial charges from population analysis, and many others. Sometimes this can be challenging given that output files contain a lot of information and are made up of: text, numbers, and symbols.

Here is an example showing one way to find the optimized cartesian coordinates in a MOPAC output (.out) file. The coordinates always appear after the second instance of the string, 'CARTESIAN COORDINATES'.

----------------------------------------------------
    import numpy as np
    with open("ouput.out", "r") as f:
        content = []
        cartesian = 0
        for line in f: 
            if cartesian >= 2:
                content.append(line)
            if 'CARTESIAN COORDINATES' in line:
                cartesian += 1
    coords = np.array(content)
    f.close()
----------------------------------------------------

In the above example, the array 'coords' can be further manipulated depending on the goal of the analysis.

3. Reading in data from an Excel sheet

Excel is a powerful tool when used correctly. Sometimes it may be useful to use both Python and Excel for analysis. Here is a simple example of how to read in values from specific columns in a chosen sheet of an Excel workbook. For further information, I point the reader to the xlrd documentation.

----------------------------------------------------
    import xlrd
    book = xlrd.open_workbook('mybook.xlsx')
    sheet = book.sheet_by_name('SHEET1')
    G = 29
    atoms = [[sheet.cell_value(r, c) for c in range(1)] for r in range(2,G+2)]
    x_vals =[[sheet.cell_value(r, c) for c in range(1,2)] for r in range(2,G+2)]
----------------------------------------------------

4. Reading in a large number of files

In some instances, one may need to read in a large number of files. It would then be impractical to specify each file name indvidually. Here is an example showing one way to approach reading in a set of .dat files, each containing one vector. For further information, I point the reader to the glob documentation.

----------------------------------------------------
  import glob
  import os
  vec  = []
  files = sorted(glob.glob('*.dat'))
  for file in files:
      f = open(file,'r')
      for line in f:
          vec.extend(line.split())
      f.close()
----------------------------------------------------
Previous
Previous

Plotting with Matplotlib

Next
Next

A slowly growing repository of code snippets