Monday, February 27, 2012

Read an ascii file (cont')


In this message we'll see other ways to do the same job than in the previous post: read an ascii file.

In the previous post we read a file of the form:

0.000000 0.000000
0.095200 0.095056
0.190400 0.189251
0.285599 0.281733
0.380799 0.371662
0.475999 0.458227
0.571199 0.540641
0.666398 0.618159
0.761598 0.690079
0.856798 0.755750
...
using 2 lists, one for each column. It can be easier to manage only one 2D variable containing the whole table. Easier to apply filter on it, for example.
We first read the whole file into a single variable:
f2 = open('datas.dat', 'r')
lines = f2.readlines()
f2.close()

One can alternatively use the with formulation, which close the file automatically once used:
with open('datas.dat', 'r') as f2:
    lines = f2.readlines()

lines is not usable as it is, as it contains the lines of the file in string form, without separating the 2 values...

We now define data as a list, which will contain the lines of the file in the right format:
data = []

Now, we loop on the lines to copy the values, but after splitting the line:
for line in lines:
    p = line.split()
    data.append(p)

One can do the 4 previous commands in a single one, using list comprehension:
data = [line.split() for line in lines]

Let's have a look at the first element of the list data:
print(data[0])
['0.000000', '0.000000']

Ok, the value of x and y are not anymore in a single variable, but to really use them, we have to:
  1. transform the list into a numpy array, to allow operations on the values.
  2. transform this into floating point.
The 2 operations can be performed one after the other:

data2 = np.array(data)
data2 = data2.astype(float)

or in a single command (notice the f in asfarray, to transform into float):
data2 = np.asfarray(data)

The numpy arrays have a lot of properties and method. One tells us the shape of the array:
print(data2.shape)
(100, 2)

To access the distinct columns, use:
x = data2[:,0]
y = data2[:,1]

As the array is a numpy object, one can perform operations on it:
z = (data2[:,0]**2 + data2[:,1]**2)**0.5

Numpy includes functions to read ascii files and return float arrays directly. The following is then very compact and efficient:
data3 = np.loadtxt('datas.dat')


An even more complete tool to read ascii file is the following:
data4 = np.genfromtxt('datas.dat')
It allows to read formated files, deal with undefined values, etc...
One can even ask for this function to automatically put the result in 2 different variables:
x, y = np.loadtxt('datas.dat', unpack = True)
x, y = np.genfromtxt('datas.dat', unpack = True)

Once you have the data2, data3, or data4, you can plot it (we'll see more on plots latter):
plt.plot(data2[:,0], data2[:,1])

A small comment on the order of the elements in arrays in Python:
There is two ways arrays can be stored: row- or column major. It has a direct impact on the way one has to loop on the arrays. IDL is like Fortran (column major) and Python is like C (row major). It means that in Python, as you move linearly through the memory of an array, the second dimension (rightmost) changes the fastest, while in IDL the first (leftmost) dimension changes the fastest.
Consequence on the loop order in Python:

for i in range(100):
   for j in range(2):
         ... data2[i,j] ...

Another comment on boundaries in arrays:
print(data2[0:4, 0])
[ 0.        0.0952    0.1904    0.285599]
Some would expect the answer to be a 5 elements sub-array. The last index of 0:4 is not included. The rule is that a[n:m] return the elements between n and m-1.

Something quite powerful: accessing the latest elements of an array:
print(data2[-1, :])
[ 9.424778  0.      ]
print(data2[-5:-1, :])
[[ 9.043979  0.371662]
 [ 9.139179  0.281733]
 [ 9.234378  0.189251]
 [ 9.329578  0.095056]]

The np.genfromtxt method allows to read formatted column tables, as for example the following file:
#  Line      Iobs    lambda  relat_error Obs_code
H  1  4861A 1.00000    4861. 0.08000  Anabel                               
H  1  6563A 2.8667     6563. 0.19467  Anabel                               
H  1  4340A 0.4933     4340. 0.03307  Anabel                               
H  1  4102A 0.2907     4102. 0.02229  Anabel                               
H  1  3970A 0.1800     3970. 0.01253  Anabel                               
N  2  6584A 2.1681     6584. 0.08686  Anabel                               
N  2 121.7m 0.00446 1217000. 0.20000  Liu                                  
O  1  6300A 0.0147     6300. 0.00325  Anabel                               
TOTL  2326A 0.07900    2326. 0.20000  Adams                                
C  2 157.6m 0.00856 1576000. 0.20000  Liu                                  
O  1 63.17m 0.13647  631700. 0.10000  Liu                                  
O  1 145.5m 0.00446 1455000. 0.200    Liu                                  
TOTL  3727A 0.77609    3727. 0.200    Torres-Peimbert                      
S II  4070A 0.06174    4070. 0.200    Torres-Peimbert                      
S II  4078A 0.06174    4078. 0.200    Torres-Peimbert                      

read using this command:
obs  = np.genfromtxt('observations.dat', skip_header=1,
                     dtype=["a11","float","float","float","a2","a2"],
                     delimiter=[11,8,9,8,2,2],
                     names = ['label', 'i_obs', 'lambda', 'e_obs', 'na', 'observer'],
                     usecols = (0,1,2,3,5)
                     )
The skip_header tells not to read the first line. The dtype describes the type of the 6 columns. delimiter is the list of the sizes (in characters) of each column. The names are used to identify the columns. The usecols is used to specify which columns to output (here I don't need the 5th column, made of 2 spaces before the observer name)
The data are accessible using for example:
obs['i_obs']


To learn more in arrays indexing:
To learn more about list comprehension:
More about np.loadtxt, np.genfromtxt:

Saturday, February 25, 2012

Read a 2 columns file and plot the result

"Reading and writing files" is available in the 2014 version of the lecture here: http://python-astro.blogspot.mx/2014/09/interacting-with-files-reading-writing.html


In this second post, we will read the file we wrote in the previous one (datas.dat), and use the data to draw a plot.

We will now have to import the plotting package, which is part of matplotlib (we will also need numpy):

import matplotlib.pyplot as plt
import numpy as np
There is a lot of different ways to read a file, depending if it is ASCII or fits or Binary, if we want to extract only some columns, if we know the format of the data, etc... Even for a given data file, for example the one we did last post, I'll present various ways, to illustrate some functions of python.

We now open the file with the 'r' parameter, to specify that we will read it:
f2 = open('datas.dat', 'r')
'r' is optional.

We read the whole file in a single variable, we will deal with it latter. If the file is really big, and/or we only want a small part of it (we are looking for some line, or only one column is of interest) we may have to proceed differently:
lines = f2.readlines()

Now that all the datas are in the lines variable, we can close the file and forget it:
f2.close()

lines  contains the 200 values from the 2 vectors x and y, but we will have to extract this information, as it is now in an unusable form.
First we can ask for the type of the variable:
type(lines)
list
A list is... a list of objects. It can handle objects of different types, even other lists.

We can print the length of the list, it is the number of elements it contains:
len(lines)
100

We can print the first element:
print(lines[0])
0.000000 0.000000 
It seems it is the first line of our file, nice!

But it is a single string...:
print(type(lines[0]))
<type 'str'>

We will have to use a method of the object string to split the line into the 2 values:
print(lines[0].split())
['0.000000', '0.000000']
The result is another list, with 2 elements, each one being a string with the value of x and y corresponding to the line.

We can now access to the value of x, and transform it into a floating point value:
print(float(lines[0].split()[0]))
0.0

We will now loop on all the lines of the file (they are stored in the lines variable) and do the same.
We want to store the result in 2 lists. We first then need to define these lists, that will later be filled by the values of x and y.
x1 = []
y1 = []

The two empty lists are now ready. We can loop on files without to know the number of elements, python take care of this:
for line in lines:
    p = line.split()
    x1.append(float(p[0]))
    y1.append(float(p[1]))

We use p as an intermediate variable, to store the list of 2 strings (like ['0.000000', '0.000000'] for the first line).
append is a method applied to the lists x1 and y1, to insert a new element to the end of the list.

We now have the x and y vectors in the 2 variables x1 and y1.
We can transform the two lists into numpy vectors, to have more flexibility on them:
xv = np.array(x1)
yv = np.array(y1)
We can plot them:
plt.plot(xv, yv)

To see the plot, one last command must be sent:
plt.show()
A window should pop-up with a nice sin(x) plot!

All the previous commands can be stored in a file, called for example Ex2.py and execute by python.
From the shell:
LINUX> python Ex2.py
or from with python
import Ex2
or from ipython
%run Ex2

Here is the script of Ex2.py:

# Need to import the plotting package:
import matplotlib.pyplot as plt

# Read the file.
f2 = open('datas.dat', 'r')
# read the whole file into a single variable, which is a list of every row of the file.
lines = f2.readlines()
f2.close()

# initialize some variable to be lists:
x1 = []
y1 = []

# scan the rows of the file stored in lines, and put the values into some variables:
for line in lines:
    p = line.split()
    x1.append(float(p[0]))
    y1.append(float(p[1]))
   
xv = np.array(x1)
yv = np.array(y1)


# now, plot the data:
plt.plot(xv, yv)

plt.show()



To learn more on lists:
To learn more on string manipulations:
To learn more on plot (but we'll coma back to this latter):

Write vectors as an ASCII file

In this first post we will create 2 vectors and write them into a file.

Python can be used interactively, with classical call to python from the command line or by running iPython. In the following, I will mainly use iPython, which provides a lot of tools to help the user interacting with its variables and programs.

Use copy-paste to execute the commands found in this color in the following, in an interactive python or ipython shell. The outputs from python are in this color.

We can also put all the commands in a file and execute it from python (we'll come back to this latter).

As we want to use vectors and perform operations on them, we need an additional library called numpy.
To import this library, we will write at the very beginning of the session (or the file):
import numpy

Everything depending on numpy will be called using numpy. before its name (Notice the point after numpy). For example, if we need to create an array with 10 elements named a, we do:

a = numpy.arange(10)

Given that it's a little heavy to have to write numpy. a lot of times, we can alias this when importing the package, so that we only need to use np.:

import numpy as np
a = np.arange(10)

You can print out the value of a by just typing "a" and ENTER:
a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 

The output can differ if you use iPython:
In [2]: a
Out[2]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

One can also print a:
print a
[0 1 2 3 4 5 6 7 8 9]

We want to create 2 vectors: x and y = sin(x). We will first define some variables holding the parameters of the vectors:
x_min = 0.
x_max = 3 * np.pi
n_steps = 100

To generate an array of n_steps values ranging from x_min to x_max, use linspace:
x = np.linspace(x_min, x_max, n_steps)

x is a numpy array, which support arithmetic operations like :
y = x**2
It can be used as argument for functions who will apply on each element of the array and return an array of the result. The trigonometrical functions are also available from numpy, so we use np. to access it:
y = np.sin(x)

We now want to store the 2 vectors as 2 columns of an ASCII file.
We first open the file, named 'datas'. We tell python that we will write to it, using 'w':
f = open('datas.dat', 'w')
f is an object (everything in Python is an object ;-) ), and one can play with some function that apply to f or some variable related to f directly by calling them using f. (notice the point after f):
print f.name, f.mode
datas w
We have to loop on the different elements of x and y to write them. The indexes in Python start at 0. The function np.arange(n) returns a list of integer from 0 to n-1, exactly what we need to scan the values of the two arrays x and y.
The loop is classically defined by a for instruction. In Python, the start of a block (for, while, if, etc.. blobk) is delimited by :, and the block is defined by indentation:
for i in np.arange(n_steps): 
    f.write('{0:f} {1:f} \n'.format(x[i], y[i]))
f.close()

We see that the f.write is indented: this is the block (here a single line, but can be very bigger) on which the loop is run. The instruction to close the file f.close() is written at the beginning of the line, no more indentation, so the loop is terminated.
The format to write x[i] and y[i] is '{0:f} {1:f} \n'.
{0:f} is to tell python to write the first variable in format(x[i], y[i]), namely x[i], as floating points.
\n is to go to the next line. We will have a post dedicated to formats latter, you can already have a look at some links below.

One can copy-paste all the previous commands in a python session, or write all in a file, to be run directly.

The python program is there:

#! /usr/bin/env python
#Need to import numpy:

import numpy as np

# define some variables:
x_min = 0.
x_max = 3 * np.pi
n_steps = 100
# create a vector from x_min to x_max, with n_steps elements
x = np.linspace(x_min, x_max, n_steps)
# create a 2nd vector, as a function of x
y = np.sin(x)

# Write x and y into a file
f = open('datas.dat', 'w')
for i in np.arange(n_steps):
# could have been: for xx, yy in zip(x,y):
    f.write('{0:f} {1:f} \n'.format(x[i], y[i]))
f.close()

Save it somewhere with the name Ex1.py.
You can run it from the shell by:
LINUX> python Ex1.py

With the help of the first line #! /usr/bin/env python, once the Ex1.py file is set to executable (chmod a+x Ex1.py), it's possible to run it directly:
LINUX> ./Ex1.py

or from python:
import Ex1
Notice that import actually also executes any instruction in the imported file.

or from ipython
In [1]: %run Ex1

The file "datas.dat" should contains the x and y values, in 2 columns, like this:
0.000000 0.000000
0.095200 0.095056
0.190400 0.189251
0.285599 0.281733
0.380799 0.371662
0.475999 0.458227
0.571199 0.540641
0.666398 0.618159
0.761598 0.690079
0.856798 0.755750
...

To lear more on files:
To learn more on string format:

Bienvenidos

Hola!

There is so many people who wants to follow the Python course at IA-UNAM that I decided to create a blog to share stuff. This way those who cannot assist the lecture can at least follow the advances and download the scripts and programs. I can also use this blog in real time while in the lecture.

The useful packages that are recommended to install and some links are listed THERE. I will try to keep this page uptodate, it is also accessible directly from the right column of the blog.

The messages are updated as we need it, so don't trust the publication date...

So, bienvenidos, comments welcome ;-)
Christophe