NumPy

This notebook was written by Kayla Leonard (June 2019) to supplement the currently existing IceCube Bootcamp Tutorials.

NumPy is an external Python package that provides very useful mathematical tools. Its documentation can be found at https://docs.scipy.org/doc/numpy/reference/.

Realistically, the best way to find a new function is to Google "numpy" plus whatever you are trying to do (for example, "numpy convert radians to degrees").

It is standard to name the package "np" when you import it.

Feel free to add additional cells and change the numbers to try your own examples.

Topics include:

  • Special numbers (pi, infinity, etc.)
  • Trigonometry (sin, cos, etc.)
  • Random Numbers
  • Arrays (NumPy's version of lists)
  • Dataset calculations (mean, median, etc.)
  • Masks
In [1]:
import numpy as np

Special Numbers

In [2]:
np.pi
Out[2]:
3.141592653589793
In [3]:
np.e
Out[3]:
2.718281828459045
In [4]:
np.inf # infinity
Out[4]:
inf

Trigonometry

If you are looking for standard trigonometric functions in Python, this package is probably what you want to use.

In [5]:
np.cos(np.pi)
Out[5]:
-1.0
In [6]:
np.sin(np.pi/2)
Out[6]:
1.0

The trig functions assume the input is in radians, so if you'd like to use degrees, there are useful functions to convert.

In [7]:
# These are equivalent functions
print(np.rad2deg(np.pi/2))
print(np.degrees(np.pi/2))
90.0
90.0
In [8]:
# These are equivalent functions
print(np.deg2rad(45))
print(np.radians(45))
0.7853981633974483
0.7853981633974483
In [9]:
np.sin(np.deg2rad(45)) # converts 45 degrees to radians, then takes the sine of that
Out[9]:
0.7071067811865475
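
One caveat about the trig functions above: np.pi is a floating-point approximation, so results that are exactly zero on paper can come out as tiny nonzero numbers. The sketch below (the variable names are only for illustration) shows this and uses np.isclose for the comparison. It also shows that the conversion helpers accept whole arrays of angles.

```python
import numpy as np

# sin(pi) is mathematically 0, but np.pi is only an approximation of pi
value = np.sin(np.pi)
print(value)                    # a tiny number close to, but not exactly, 0
print(value == 0)               # False
print(np.isclose(value, 0.0))   # True

# The conversion helpers also work on arrays of angles
angles_deg = np.array([0, 30, 90])
sines = np.sin(np.deg2rad(angles_deg))
print(sines)
```

When comparing floating-point results, np.isclose (or np.allclose for whole arrays) is usually safer than ==.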

Random Numbers

NumPy has an extensive list of functions that are helpful for random numbers, probability, and statistics.

In [10]:
np.random.random() # picks a random number in the interval [0, 1): 0 is possible, 1 is not
Out[10]:
0.3883925064755209
In [11]:
# rerun this cell several times and you will see the value change each time
np.random.random()
Out[11]:
0.9854419945815169

If you want your results to be reproducible, you can run np.random.seed(13) (or use your favorite number) at the top of the cell. Then every time you re-run the cell you will get the same number.
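
As a minimal sketch of how seeding behaves (the names first, second, and third are only for illustration), resetting the seed replays the same sequence, while drawing again without resetting continues it:

```python
import numpy as np

np.random.seed(13)            # fix the starting state of the generator
first = np.random.random()

np.random.seed(13)            # reset to the same state
second = np.random.random()
print(first == second)        # True: same seed, same draw

# Without resetting the seed, the next draw continues the sequence
third = np.random.random()
print(third == first)         # False: the sequence has moved on
```

Newer NumPy releases also provide np.random.default_rng(seed), which returns a separate generator object instead of changing global state. The cells in this notebook stick with the np.random functions.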

In [12]:
np.random.normal(0,10) # draws a random number from a Gaussian distribution centered at 0 with standard deviation 10
Out[12]:
-6.290842493187123
In [13]:
np.random.normal(0,10,10) # we can give it an additional argument that is the length of the array we want
Out[13]:
array([  3.18359367,  10.65710673,  15.70076931,   6.11426628,
        -5.56016407,   2.38773831,  -9.34114013, -12.87750992,
         3.15110509,  -1.97483647])

There are many other distributions available, such as binomial, chi-squared, Poisson, and gamma. The full list is at https://docs.scipy.org/doc/numpy/reference/routines.random.html
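
For illustration, here is a short sketch drawing from a few of those distributions. The parameter values and the seed are arbitrary choices made for this example, not recommendations.

```python
import numpy as np

np.random.seed(13)  # arbitrary seed, only so the draws are repeatable

counts = np.random.poisson(lam=3.0, size=5)            # Poisson with mean 3
heads = np.random.binomial(n=10, p=0.5, size=5)        # successes in 10 fair coin flips
flat = np.random.uniform(low=-1.0, high=1.0, size=5)   # uniform on [-1, 1)

print(counts)
print(heads)
print(flat)
```

Each function takes the distribution parameters first and an optional size, just like np.random.normal above.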

Arrays

There are several quick ways to initialize numpy arrays.

In [14]:
np.linspace(0,20,num=21) # returns an array of evenly spaced values from 0 to 20
# Note: This includes both the start and end points, so for a step size of 1 from 0 to n you need num=n+1
Out[14]:
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.,
       13., 14., 15., 16., 17., 18., 19., 20.])
In [15]:
np.linspace(0,100,num=26) # it will calculate the step size, it doesn't need to be 1
Out[15]:
array([  0.,   4.,   8.,  12.,  16.,  20.,  24.,  28.,  32.,  36.,  40.,
        44.,  48.,  52.,  56.,  60.,  64.,  68.,  72.,  76.,  80.,  84.,
        88.,  92.,  96., 100.])
In [16]:
np.linspace(0,1,num=17) # it can also handle decimal numbers
Out[16]:
array([0.    , 0.0625, 0.125 , 0.1875, 0.25  , 0.3125, 0.375 , 0.4375,
       0.5   , 0.5625, 0.625 , 0.6875, 0.75  , 0.8125, 0.875 , 0.9375,
       1.    ])
In [17]:
np.logspace(2,4,21) # similar to linspace but returns values that are evenly spaced in log space from 10**2 to 10**4
Out[17]:
array([  100.        ,   125.89254118,   158.48931925,   199.5262315 ,
         251.18864315,   316.22776602,   398.10717055,   501.18723363,
         630.95734448,   794.32823472,  1000.        ,  1258.92541179,
        1584.89319246,  1995.26231497,  2511.88643151,  3162.27766017,
        3981.07170553,  5011.87233627,  6309.5734448 ,  7943.28234724,
       10000.        ])
In [18]:
np.arange(0,100,4) # If you know the step size you want but not the number of items, you can use arange
# Note: This one does not include the end point in the list
Out[18]:
array([ 0,  4,  8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64,
       68, 72, 76, 80, 84, 88, 92, 96])
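
The difference in endpoint handling between linspace and arange is easy to trip over, so here is a small side-by-side sketch (the names a, b, and c are only for illustration). It also uses the endpoint=False option of np.linspace, which drops the final value.

```python
import numpy as np

a = np.arange(0, 100, 4)                          # stops before 100
b = np.linspace(0, 100, num=26)                   # includes 100
c = np.linspace(0, 100, num=25, endpoint=False)   # excludes 100

print(len(a), a[-1])           # 25 96
print(len(b), b[-1])           # 26 100.0
print(np.array_equal(a, c))    # True: same values, though c holds floats
```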
In [19]:
np.zeros(10) # creates an array of 10 items, where all entries are zero
Out[19]:
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
In [20]:
np.ones(10) # creates an array of 10 items, where all entries are one
Out[20]:
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

Arrays are useful because they allow elementwise manipulation of numbers, which is not possible with Python lists.

In [21]:
[1,2,3,4]**2 # This will give us an error because we can't square a list
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-e570a8d4207d> in <module>
----> 1 [1,2,3,4]**2 # This will give us an error because we can't square a list

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
In [22]:
np.array([1,2,3,4])**2 # this will square each element in the array
Out[22]:
array([ 1,  4,  9, 16])

You can also do elementwise calculations with two arrays of the same length:

In [23]:
np.array([1,2,3,4])+np.array([5,6,7,8])
Out[23]:
array([ 6,  8, 10, 12])
In [24]:
np.array([1,2,3,4])*np.array([5,6,7,8])
Out[24]:
array([ 5, 12, 21, 32])
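
To see why the "same length" condition matters, the sketch below (illustrative names only) combines an array with a single number, which works, and then with an array of a different length, which raises a ValueError.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# A single number is applied to every element
scaled = a * 10
print(scaled)        # [10 20 30 40]

# Arrays of different lengths cannot be combined elementwise
try:
    a + np.array([1, 2, 3])
except ValueError as err:
    print("ValueError:", err)
```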

You can also call NumPy functions on an array, and they will be applied to each element.

In [25]:
np.sqrt([1,4,9,10])
Out[25]:
array([1.        , 2.        , 3.        , 3.16227766])
In [26]:
np.sin([0,np.pi/4,np.pi/2])
Out[26]:
array([0.        , 0.70710678, 1.        ])
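
For comparison, Python's built-in math module only works on one number at a time. The following sketch (variable names are illustrative) contrasts math.sqrt with np.sqrt on the same list.

```python
import math
import numpy as np

values = [1, 4, 9, 10]

# math.sqrt only accepts a single number, so a list raises a TypeError
try:
    math.sqrt(values)
except TypeError as err:
    print("TypeError:", err)

# np.sqrt accepts a list (or array) and returns an array
roots = np.sqrt(values)
print(roots)

# The equivalent with the math module needs an explicit loop
roots_loop = [math.sqrt(v) for v in values]
print(np.allclose(roots, roots_loop))  # True
```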

Basic Dataset Calculations

Given an array of values, we can calculate standard summary statistics such as the mean, median, percentiles, and standard deviation.

In [27]:
scores = np.array([95,41,72,100,80,97,95])
print(scores)
print(type(scores))
[ 95  41  72 100  80  97  95]
<class 'numpy.ndarray'>
In [28]:
# Mean
print(np.mean(scores))
print(np.average(scores))
82.85714285714286
82.85714285714286
In [29]:
# Median
print(np.median(scores))
95.0
In [30]:
# calculate the value of the 90th percentile
print(np.percentile(scores,90))
98.2
In [31]:
# Standard Deviation
print(np.std(scores))
19.51869851800408
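
One detail worth knowing about np.std: by default it divides by N (the population standard deviation). If you want the sample standard deviation, which divides by N - 1, pass ddof=1. The sketch below reuses the scores array from above and checks both against the formula written out by hand. The extra variable names are only for illustration.

```python
import numpy as np

scores = np.array([95, 41, 72, 100, 80, 97, 95])

population_std = np.std(scores)        # divides by N (the default, ddof=0)
sample_std = np.std(scores, ddof=1)    # divides by N - 1

# The same quantities written out by hand
n = len(scores)
deviations = scores - np.mean(scores)
by_hand_population = np.sqrt(np.sum(deviations**2) / n)
by_hand_sample = np.sqrt(np.sum(deviations**2) / (n - 1))

print(population_std, sample_std)
print(np.isclose(population_std, by_hand_population))  # True
print(np.isclose(sample_std, by_hand_sample))          # True
```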

Masks

Masks are arrays of True or False that can be used to identify certain elements in an array.

In [32]:
# Let's figure out which elements in this array are not zero:
my_array = np.array([1,2,0,3,0,4,0])
my_mask = my_array!=0
print(my_mask)
[ True  True False  True False  True False]
In [33]:
# We can also see which elements are greater than 3
my_mask2 = my_array>3
print(my_mask2)
[False False False False False  True False]

Masks allow you to extract only certain elements using the following notation:

In [34]:
my_array[my_mask]
Out[34]:
array([1, 2, 3, 4])
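
Masks can do a bit more than simple extraction. As a sketch building on my_array from above (the names between and cleaned are only for illustration), you can combine conditions, count matches, invert a mask, and assign through a mask:

```python
import numpy as np

my_array = np.array([1, 2, 0, 3, 0, 4, 0])

# Combine conditions with & (and) or | (or); the parentheses are required
between = (my_array > 0) & (my_array < 4)
print(my_array[between])        # [1 2 3]

# True counts as 1, so summing a mask counts the matching elements
print(np.sum(my_array != 0))    # 4

# ~ inverts a mask
print(my_array[~between])       # [0 0 4 0]

# A mask can also be used to assign new values to the selected elements
cleaned = my_array.copy()
cleaned[cleaned == 0] = -1
print(cleaned)                  # [ 1  2 -1  3 -1  4 -1]
```

The copy() call keeps my_array itself unchanged, since assigning through a mask modifies the array in place.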