You can also read the more recent post on Numpy here: http://python-astro.blogspot.mx/2014/08/introduction-to-numpy.html
Most of the data we have to manage are in form of arrays. That's why it's quite always necessary to import the numpy library to easily deal with tables, arrays, images, cubes, etc...
In this post I will discuss some of the methods and functions one may have to use in this context.
In Python, one can use lists, tuples and dictionaries to put different elements together. They even can contain elements of different types. But nothing better than numpy arrays to really "play" with the data, extract subsets, combine them using arithmetic operations.
We already discuss a little some array commands in a previous post: read-ascii-file-cont. Here we are more complete, but not exhaustive, as numpy is a whole world... Have a look at the reference guide: http://docs.scipy.org/doc/numpy/reference/ and other references at the end of this post.
Create arrays
a = np.array([1, 2, 3, 7, 5])
a
array([1, 2, 3, 7, 5])
numpy's arrays cannot contain values of different types, the values are transformed if necessary, from integer into real, and from real into string:
b = np.array([1, 2, 3, 4.])
b
array([ 1., 2., 3., 4.])
c = np.array([1, 2, 3., '4'])
c
array(['1', '2', '3', '4'],
dtype='|S1')
dtype='|S1')
shapes and sizes of the arrays are obtained using method and functions:
print c.shape
(4,)
print len(c)
4Different ways to create arrays:
a = np.arange(1, 10, 0.5)
print a
[ 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5 6. 6.5 7. 7.5 8.
8.5 9. 9.5]
8.5 9. 9.5]
TAKE CARE, THE LATEST ELEMENT IS NOT WHAT YOU MAY EXPECT...
It's because the first element in Python is indexed by 0. The simple use of np.arange is :
a = np.arange(10)
print a
[0 1 2 3 4 5 6 7 8 9]
a starts at 0, and has 10 elements, that's ok.Easy to create linearly spaced arrays:
c = np.linspace(0, 1, 4)
print c
[ 0. 0.33333333 0.66666667 1. ]
or log spaced:
c = np.logspace(0, 1, 4)
c
c
array([ 1. , 2.15443469, 4.64158883, 10. ])
which is actually the same as :
c = 10**np.logspace(0, 1, 4)
To create arrays of 0. or 1.:
a = np.zeros(10)
b = np.ones((5, 4))
Here b is a 2D array.
b.shape
(5, 4)
You can use a list (or a tuple, or an array) and replicate it:
a = np.array([1, 2, 3])
b = np.tile(a, 5)
print b
[1 2 3 1 2 3 1 2 3 1 2 3 1 2 3]
c = np.tile(a, (5, 1))
print c
[[1 2 3]
[1 2 3]
[1 2 3]
[1 2 3]
[1 2 3]]
[1 2 3]
[1 2 3]
[1 2 3]
[1 2 3]]
You can also create 2D arrays from 1D vectors:
x = np.linspace(-1, 1, 100)
y = x
X, Y = np.meshgrid(x, y) #this create 2D arrays containing x and y for each pixel
print X.shape
(100, 100)
Outer products are obtain using... np.outer:
a = np.array([1, 2, 3])
b = np.array([1, 10, 100])
b = np.array([1, 10, 100])
np.outer(a, b)
array([[ 1, 10, 100],[ 2, 20, 200],
[ 3, 30, 300]])
np.outer(b, a)
array([[ 1, 2, 3],
[ 10, 20, 30],
[100, 200, 300]])
[ 10, 20, 30],
[100, 200, 300]])
Broadcasting
Python numpy is able to add dimensions to arrays to perform operations:
a = np.array([1, 2, 3, 4])
b = np.ones((6, 4))
a * b
array([[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.]])
Nice page on this:[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.]])
http://www.scipy.org/EricsBroadcastingDoc
Indexing and Slicing
The access to elements of arrays is done using []:
a = np.arange(10)
a[5]
0
a[-1] # last elements
9In case of N-dims arrays, one can extract slides using :
a = np.array([1, 2, 3, 4, 5])
b = np.array([1, 10, 100, 1000])
c = np.outer(a, b)
c.shape
(5, 4)
print c
[[ 1 10 100 1000]
[ 2 20 200 2000]
[ 3 30 300 3000]
[ 4 40 400 4000]
[ 5 50 500 5000]]
[ 2 20 200 2000]
[ 3 30 300 3000]
[ 4 40 400 4000]
[ 5 50 500 5000]]
print c[:,2]
[100 200 300 400 500]
print c[-1,:]
[ 5 50 500 5000]
print c[-1, -1]
5000
There is a lot of methods in the array object, the best page to learn more is there: http://scipy-lectures.github.com/intro/numpy/numpy.html#the-numpy-array-object
Filtering
Sometimes we need to extract elements from arrays following some criteria. There is basically two ways to do this: defining a set of indices where the condition is completed (a la WHERE in IDL), or defining a boolean mask.Let's first define a 2D array made of 10 times 1000 random values:
a = np.random.random((10, 1000))
We want to extract the values where the 2nd and the 4th 1000-elements vectors are greater than 0.5.
Using the numpy.where function:
w1 = np.where((a[1,:] > 0.5) & (a[3,:] > 0.5))
as the result is a tuple of indices, and in this case the dimension of the result is 1, better extract on the fly the array of indices from the tuple:
w2 = np.where((a[1,:] > 0.5) & (a[3,:] > 0.5))[0]
len(w2)
248 # your result may differ form this value, but must be close to 250.We can now reduce the initial array to the desired values:
b = a[:, w2]
b.shape
(10, 248)
One can also build an array of boolean:
mask = (a[1,:] > 0.5) & (a[3,:] > 0.5)
which is actually similar to
mask2 = np.where((a[1,:] > 0.5) & (a[3,:] > 0.5), True, False)
It can be used the same way as before:
b = a[:, mask]
b.shape
(10, 248)
These latter cases result in a 1000-element array, filled with True and False. To know the number of True values, just sum the array:
mask.sum()
248 # your result may differ form this value, but must be close to 250. One big advantage of the mask technique is that you can combine different masks:
mask1 = a[1,:] > 0.5
mask2 = a[3,:] > 0.5
mask_total = mask1 & mask2
b = a[:, mask_total]
Sorting
a = np.array([1,2,45,2,1,46,7,-1])
b1 = np.sort(a)
ib = np.argsort(a)
b2 = a[ib]
print b1
[-1 1 1 2 2 7 45 46]
print b2
[-1 1 1 2 2 7 45 46]More on numpy arrays
http://scipy.org/Numpy_Example_List_With_Dochttp://scipy-lectures.github.com/intro/numpy/numpy.html#the-numpy-array-object
Little mistake, you wanted to say lin instead of log :
ReplyDeletewhich is actually the same as :
c = 10**np.linspace(0, 1, 4)
Why a mistake? logspace is a valide numpy command.
ReplyDeleteBecause np.logspace(0, 1, 4) is not "actually the same as" 10**np.logspace(0, 1, 4).
ReplyDeleteBut it is the same as 10**np.linspace(0, 1, 4).
I'd spent so long trying to figure out how to apply 1d masks onto 2d numpy arrays, b = a[:, mask]! Thanks for a really helpful guide!
ReplyDeleteThe access to elements of arrays is done using []:
ReplyDeletea = np.arange(10)
a[5]
0 <-- it should be 5
Hey! I have to convert some lists into arrays in order to use function np.where
ReplyDeleteI have to plot indicators of different sources from a huge data file and I make the operations between those indicators using lists not arrays. So lists are already populated with data but the np.where function would be useful when I have to plot them.
Can you help me?
Thanks a lot
This comment has been removed by the author.
ReplyDelete