Sunday, March 25, 2012

Play with FITS files

The same lecture is now in the Notebook format, see there: http://python-astro.blogspot.mx/2014/09/interacting-with-files-reading-writing.html


What is the FITS format?

The FITS format is the most popular way to save and interchange astronomical data. The files are organized in units each of which contains a human readable header and a data. This structure is refereed as HDUs (Header/DATA Unit).

A FITS file can contain one or more HDUs, the first of which is called "primary" and the rest are called "extensions". The primary HDU usually contains 1D spectrum, 2D image or 3D data cube, although any dimension from 0 to 999 are possible. The data are 1, 2 or 4 bytes integers or 4 or 8 bytes real numbers.

The extensions can contain or arrays as in the primary HDU or ascii tables or binary tables.
If a FITS file contains only tables, it primary HDU does not contain data, but only header.

Both headers and data in a FITS file are organized in blocs of 2880 bytes. The header contain 80 bytes lines each of which consists of a keyword of 8 bytes followed in most of the cases by '= ' in the position 9 and 10 and then the value of the keyword. The rest of the line is a comment string beginning with '/'. Each header begins with the following lines

SIMPLE  =                    T / file conforms to FITS standard
BITPIX  =                   16 / number of bits per data pixel
NAXIS   =                    2 / number of data axes
NAXIS1  =                  440 / length of data axis 1
NAXIS2  =                  300 / length of data axis 2

which defines the format of the file as standard FITS, the data format and the dimensions of the stored data.

One block of 2880 bytes contains 36 lines of 80 characters per line. The header can have several blocks of 36 lines. The last block is identified by the presence of the keyword 'END' The next 2880 bytes block contains the first part of the data. The empty lines after 'END' keyword are filled with blanks and the unused bytes from the end of the data to the end of the 2880 bytes block are filled with NULLs.

Full description of the FITS format can be found at http://fits.gsfc.nasa.gov/fits_primer.html

PyFITS

As one can see, reading FITS files is not as simple as reading columns of data. Fortunately, there is a python package called PyFITS which makes the task much easier.

The package can be downloaded from

http://www.stsci.edu/institute/software_hardware/pyfits

The documentation can be found on the same web page.

Although it is possible to download and install the package it is much easier to install it with

easy_install pyfits
or
pip pyfits

Information on easy_install: HERE and on pip: HERE.
I did that on both Mac and Linux machines and it working nicely.

How to read FITS file

First, as usual, one has to import the pyfits package

import pyfits

then the file is read with

hdulist = pyfits.open('n10017o.fits')

where I used one of my FITS files from San Pedro Martir echelle spectrograph. The file can be downloaded from HERE.

The result hdulist is a list of HDU objects. In the case of a simple file, there is only one primary HDU so the list contains only one element

len(hdulist)
1

The information on what the file contains can be obtained by calling the info() method:

hdulist.info()
Filename: n10017o.fits
No.    Name         Type      Cards   Dimensions   Format
0    PRIMARY     PrimaryHDU      62   (2154, 2048)   int16


The table said that there is only a primary HDU which contains 2154 X 2048 image with data stored in 2 bytes (16 bits) integers.

As described above, the HDU (header/data unit) contains header and data. The header is a dictionary. To see what keywords were used in the header one can do:

hdulist[0].header.keys()
['SIMPLE',
 'BITPIX',
 'NAXIS',
 'NAXIS1',
 'NAXIS2',
 'EXTEND',
 'COMMENT',
 'BZERO',
 'BSCALE',
 'EXPTIME',
 'DETECTOR',
 'ORIGIN',
 'OBSERVAT',
 'TELESCOP',
 'LATITUDE',
 'LONGITUD',
 'ALTITUD',
 'SECONDAR',
 'TIMEZONE',
 'OBSERVER',
 'OBJECT',
 'INSTRUME',
 'GAINMODE',
 'FILTER',
 'IMGTYPE',
 'EQUINOX',
 'ST',
 'UT',
 'JD',
 'DATE-OBS',
 'CCDSUM',
 'RA',
 'DEC',
 'AH',
 'AIRMASS',
 'TMMIRROR',
 'TSMIRROR',
 'TAIR',
 'XTEMP',
 'HUMIDITY',
 'ATMOSBAR',
 'WIND',
 'WDATE',
 'DATE',
 'NAMPS',
 'CCDNAMPS',
 'AMPNAME',
 'CREATOR',
 'VERSION',
 'HISTORY']

and to get the value of a given keyword

hdulist[0].header['OBJECT'] 
'BD +59 363'

The header can be printed as it apears in the file by

print hdulist[0].header.ascardlist()
SIMPLE  =                    T / conforms to FITS standard                     
BITPIX  =                   16 / array data type                               
NAXIS   =                    2 / number of array dimensions                    
NAXIS1  =                 2154 / length of data axis 1                         
NAXIS2  =                 2048 / length of data axis 2                         
EXTEND  =                    T                                                 
COMMENT FITS (Flexible Image Transport System) format is defined in 'Astronomy 
COMMENT and Astrophysics', volume 376, page 359; bibcode: 2001A&A...376..359H  
BZERO   =                32768 / BZERO                                         
BSCALE  =                    1 / BSCALE                                        
EXPTIME =                  0.0 / Integration Time, sec.                        
DETECTOR= 'e2vm2 E2V-4240'     / CCD Type                                      
ORIGIN  = 'UNAM    '           / OAN SPM, IA-UNAM                              
OBSERVAT= 'SPM     '           / Observatory                                   
TELESCOP= '2.12m   '           / Telescope                                     
LATITUDE= '+31:02:39'          / Latitude                                      
LONGITUD= '115:27:49'          / Longitud                                      
ALTITUD =                 2800 / altitud                                       
SECONDAR=                   -1 / F/ Secondary type                             
TIMEZONE=                    8 / Time Zone                                     
OBSERVER= 'Leonid  '           / Observer's Name                               
OBJECT  = 'BD +59 363'         / Object                                        
INSTRUME= 'Echelle '           / Instrument                                    
GAINMODE=                    1 / Gain factor in the CCD                        
FILTER  = 'None    '           / Filter                                        
IMGTYPE = 'zero    '           / Image Type                                    
EQUINOX =               2011.7 / Equinox                                       
ST      = '02:19:51.2'         / Sideral Time                                  
UT      = '11:28:28'           / Universal Time                                
JD      =            2455803.5 / Julian Date                                   
DATE-OBS= '2011-08-30'         / Observation Date UTM                          
CCDSUM  = '1 1     '           / Binning [ Cols:Rows ]                         
RA      = ' 02:13:22.2'        / Right Ascension                               
DEC     = ' 52''14''44.0'      / Declination                                   
AH      = '  00:06:28.0'       / Hour Angle                                    
AIRMASS =                1.073 / Airmass                                       
TMMIRROR=                    0 / Primary Mirror Temperature (celsius degree)   
TSMIRROR=                    0 / Secundary Mirror Temperature (celsius degree) 
TAIR    =                    0 / Internal Telescope Air Temperature (celsius deg
XTEMP   =                 14.6 / Exterior Temperature (celsius degree)         
HUMIDITY=                 44.0 / % external Humidity                           
ATMOSBAR=                731.7 / Atmosferic Presure in mb                      
WIND    = 'S at 24.1 km/h'     / Wind Direction                                
WDATE   = '11:28:10, 08/30/11' / Weather Acquisition Date (Local time)         
DATE    = '2011-08-30T11:28:29' / file creation date (YYYY-MM-DDThh:mm:ss UT)  
NAMPS   =                    1 / Number of Amplifiers                          
CCDNAMPS=                    1 / Number of amplifiers used                     
AMPNAME = '1 Channel'          / Amplifier name                                
CREATOR = 'Python Oan ccds'    / Name of the software task that created the file
VERSION = '4.12D   '           / Application Software Version                  
COMMENT Visit our weather site http://www.astrossp.unam.mx/weather15           
COMMENT for complete meteorological data of your observation night             
HISTORY bin2fits V1.0                                                          
HISTORY Programmer: Enrique Colorado [ colorado@astrosen.unam.mx ]             
HISTORY Observatorio Astronomico Nacional -UNAM                                
HISTORY V1.00 By Arturo Nunez and Colorado >Ported to Python using pyfits      
HISTORY V0.50 By E. Colorado >Added interior mirrors temperatures              
HISTORY V0.49 By E. Colorado >Added BIASSEC parameter                          
HISTORY V0.48 By E. Colorado >Aditional info for autofocus calculations        
HISTORY V0.4 By E. Colorado >Now we include timezone, and remove lat. sign     
HISTORY V0.3 By E. Colorado >Now we include weather data                       
HISTORY V0.2 By E. Colorado >General OAN Working Release                       

The data in the file are accessible with

data = hdulist[0].data

and can be seen with [don't forget to import matplotlib.pyplot as plt before running this]:

plt.imshow(data)

A column from the data can be plotted with

plt.plot(data[:,1000])

where I am plotting the column number 1000. In the same way a line from the data is plotted with:

plt.plot(data[1000,:])

The data are numpy object so all manipulations are available.

Some more on displaying images

The main matplotlib function to display images is imshow. It has several parameters of which te most important are the one which controls the color scheme and the ones which control the dynamic range.

plt.imshow(data, cmap=cm.gray, vmin=1000, vmax=10000)

cmap controls the pseudocolor map (same as color= in IDL). The predefined color maps can be seen on http://matplotlib.sourceforge.net/examples/pylab_examples/show_colormaps.html

vmin and vmax controls the range of values which are mapped in [0,1] range where 0 is black (the darkest color) and 1 is white (the lightest color).

origin = 'upper' | 'lower', place the pixel with coordinates 0,0 at the upper or lower corner of the plot

extent = (xmin,xmax,ymin,ymax) - This parameter defines the numbers written on the axes. No changes on the image.

Using FITS tables

For this example I'll use a spectrum obtain with the high dispersion camera on board of IUE.
The file is opened as usual:
hdulist = pyfits.open('swp04345.mxhi')

Download the file from THERE.

but now hdulist has 2 elements (2 header/data units):

len(hdulist)
2

We can see that the primary header has dimension (), son does not contain any data. The data are in the extension.

hdulist.info()
Filename: swp04345.mxhi
No.    Name         Type      Cards   Dimensions   Format
0    PRIMARY     PrimaryHDU     421   ()           uint8
1    MEHI        BinTableHDU     61   60R x 17C    [1B, 1I, 1D, 1I, 1D, 1E, 1E, 768E, 768E, 768E, 768I, 768E, 768E, 1I, 1I, 1E, 7E]

The first header contains the minimal infirmation:

print hdulist[0].header.ascardlist()[:5]
SIMPLE  =                    T / Standard FITS Format                          
BITPIX  =                    8 / Binary data                                   
NAXIS   =                    0 / Two-dimensional image                         
EXTEND  =                    T / Extensions are present                        
TELESCOP= 'IUE     '           / International Ultraviolet Explorer            

The number of axis is 0 which means there is no data block in the primary HDU. The header of the second HDU begins with the keyword XTENSION and with the specification of the data

print hdulist[1].header.ascardlist()[:5]
XTENSION= 'BINTABLE'           / Binary table extension                        
BITPIX  =                    8 / Binary data                                   
NAXIS   =                    2 / Two-dimensional table array                   
NAXIS1  =                16961 / Width of row in bytes                         
NAXIS2  =                   60 / Number of orders                              

To progress further we need to know what is in the table. As usual, the columns have names and type of the stored data. These information can be obtained using the column attribute of hdulist:

cols = hdulist[1].columns

cols.info
<bound method ColDefs.info of ColDefs(
    name = 'ORDER'; format = '1B'; unit = ' '
    name = 'NPOINTS'; format = '1I'; unit = ' '
    name = 'WAVELENGTH'; format = '1D'; unit = 'ANGSTROM'
    name = 'STARTPIX'; format = '1I'; unit = 'PIXEL'
    name = 'DELTAW'; format = '1D'; unit = 'ANGSTROM'
    name = 'SLIT HEIGHT'; format = '1E'; unit = 'PIXEL'
    name = 'LINE_FOUND'; format = '1E'; unit = 'PIXEL'
    name = 'NET'; format = '768E'; unit = 'FN'
    name = 'BACKGROUND'; format = '768E'; unit = 'FN'
    name = 'NOISE'; format = '768E'; unit = 'FN'
    name = 'QUALITY'; format = '768I'; unit = ' '
    name = 'RIPPLE'; format = '768E'; unit = 'FN'
    name = 'ABS_CAL'; format = '768E'; unit = 'ERG/CM2/S/A'
    name = 'START-BKG'; format = '1I'; unit = 'PIXEL'
    name = 'END-BKG'; format = '1I'; unit = 'PIXEL'
    name = 'SCALE_BKG'; format = '1E'; unit = ' '
    name = 'COEFF'; format = '7E'; unit = ' '
)>


the cols.info returns the names of the columns and the information of their format and units.

The data are available using (this example is NOT the right way of plotting the data, it's just an example) [and don't forget to import numpy as np to have np.arange working]:

data1 = hdulist[1].data
DTs =  data1.ABS_CAL
WLs = data1.WAVELENGTH
DWs = data1.DELTAW
for WL, DW, DT in zip(WLs, DWs, DTs):
    plot(WL + np.arange(len(DT)) * DW, DT)

Writing FITS files.

The creation of a FITS file pass through 4 steps.

1) Creation of numpy array with the data.

x = np.arange(100)
  
2) Creation of the HDU from the data.

hdu = pyfits.PrimaryHDU(x)
   
thus created, the hdu has its basic header and the data.

print hdu.header.ascardlist()
SIMPLE  =                    T / conforms to FITS standard                     
BITPIX  =                   64 / array data type                               
NAXIS   =                    1 / number of array dimensions                    
NAXIS1  =                  100                                                 
EXTEND  =                    T                                                 

3) Adding additional keywords to the header. The automatically created header contains only the required minimum of keywords. If additional keywords are needed they are added with

hdu.header.update('testkey',0.001,'some test value')

print hdu.header.ascardlist()
SIMPLE  =                    T / conforms to FITS standard                     
BITPIX  =                   64 / array data type                               
NAXIS   =                    1 / number of array dimensions                    
NAXIS1  =                  100                                                 
EXTEND  =                    T                                                 
TESTKEY =                0.001 / some test value                               

4) Once all the keywords are ready, the final HDU list have to be created and written to the file:

hdulist = pyfits.HDUList([hdu])
hdulist.writeto('new.fits')
hdulist.close()

Alternative way

Another way to deal with FITS tables is to use the ATpy library, look there:
http://atpy.github.com/index.html

References


To learn more on PyFITS: http://packages.python.org/pyfits/