NumPy
- Implemented in C, stores data in contiguous memory blocks
- internal storage includes
- pointer to data
- data type (dtype), describes fixed-size value cells
- numpy dtype hierarchy
- np.issubdtype
- tuple for array shape
- tuple for strides: number of bytes to step to reach the next element along each dimension
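The storage fields above can be inspected directly on any array. A minimal sketch, assuming a small int64 array chosen only for illustration:

```python
import numpy as np

# A 3x4 array of 8-byte integers stored in C (row-major) order.
arr = np.arange(12, dtype=np.int64).reshape(3, 4)

print(arr.dtype)    # int64
print(arr.shape)    # (3, 4)
print(arr.strides)  # (32, 8): 32 bytes to the next row, 8 bytes to the next column

# Transposing swaps shape and strides without moving any data.
print(arr.T.strides)  # (8, 32)

# np.issubdtype checks where a dtype sits in the dtype hierarchy.
is_int = np.issubdtype(arr.dtype, np.integer)     # True
is_float = np.issubdtype(arr.dtype, np.floating)  # False
```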
ndarray (n-dimensional array)
- Data Types and Type Codes
- int8/uint8 to int64/uint64 (i1, u1 to i8, u8); float16 to float128 (f2 to f16; f4 = f, f8 = d, f16 = g)
- complex64 to complex256 (c8, c16, c32)
- bool (?)
- object (O) - Python object type
- string_ (S) - fixed-length ASCII string type (e.g. length 10 is "S10"); named bytes_ in NumPy 2.0 and later
- unicode_ (U) - fixed-length Unicode string type (e.g. "U10"); named str_ in NumPy 2.0 and later
- Attributes
- ndim (rank), shape (dimensions), size, dtype, itemsize (size of one item in bytes)
- Methods
- [] - indexing with m:n or logical indexing
data[name == 'Bob', 2:]  # name is another array with one entry per row of data
- fancy indexing
- take(<ind>, axis=) and put(<ind>)
- put uses C order
- shape, ndim, dtype, size
- shape (5,) is different from (5, 1); the former is a rank-1 array
- reshape(<shape>, order='C' or 'F')
- reshape(n, -1): flatten all other dimensions, giving (n, n_2*n_3*...*n_k)
- C order (row major) vs Fortran order (column major)
- .ravel(['C' or 'F']) and flatten()
- flatten() always returns a copy; ravel() returns a view when possible
- .T - transpose
- copy() - deep copy
- Aggregation
- sum(),mean()
- var(),max(),min()
- argmin(),argmax()-return index
- cumsum(),cumprod()
- argument: keepdims=True keeps each reduced axis as a length-1 dimension, so the result keeps the input's rank
- any(), all() - work on boolean arrays
- sort([axis])
- argsort([kind=]) and lexsort()
- return sort indexers (sort and lexical sort)
- kind = 'quicksort' (default), 'mergesort', 'heapsort'
- np.partition(arr, pos), np.argpartition()
- searchsorted(val)
- returns index by binary search
- unique()
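A short sketch of several of the ndarray methods listed above. The arrays `name`, `data`, `cube`, and `values` are made up for illustration:

```python
import numpy as np

name = np.array(["Bob", "Joe", "Bob", "Will"])   # one label per row of data
data = np.arange(16).reshape(4, 4)

# Boolean indexing on rows combined with a slice on columns.
bob_rows = data[name == "Bob", 2:]     # rows 0 and 2, columns 2 and 3

# Fancy indexing and take select the same rows here.
picked = data[[3, 0]]
same = data.take([3, 0], axis=0)

# reshape with -1 lets NumPy infer the remaining dimension.
cube = np.arange(24).reshape(2, 3, 4)
flat2d = cube.reshape(2, -1)           # shape (2, 12)

# ravel returns a view when it can; flatten always copies.
base = np.zeros((2, 3))
view = base.ravel()
copy = base.flatten()
view[0] = 1.0                          # also changes base[0, 0]
copy[1] = 9.0                          # leaves base untouched

# keepdims keeps the reduced axis as a length-1 dimension.
row_sums = data.sum(axis=1, keepdims=True)   # shape (4, 1) instead of (4,)

# argsort gives the sorting indices; searchsorted does a binary search
# and therefore expects sorted input.
values = np.array([30, 10, 20])
order = values.argsort()                     # [1 2 0]
pos = np.searchsorted(values[order], 25)     # 2
```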
Numpy (np.)
Creation
- array(<sequence>)
- asarray() - convert input to array; does not copy if the input is already an np array
- arange([start,] stop[, step], dtype=None) - like range
- linspace(start, stop, num, endpoint...)
- logspace()
- ones(<shape>), ones_like(<another array of same shape>), zeros, zeros_like
- empty, empty_like, full, full_like
- eye(), identity()
- fromfile(), fromfunction()
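The creation functions above in use. This is a minimal sketch with arbitrary values:

```python
import numpy as np

a = np.arange(0, 10, 2)            # [0 2 4 6 8]; stop is excluded, like range
b = np.linspace(0.0, 1.0, num=5)   # [0. 0.25 0.5 0.75 1.]; endpoint included by default
c = np.full((2, 3), 7)             # 2x3 array filled with 7
d = np.zeros_like(c)               # same shape and dtype as c, all zeros
e = np.eye(3)                      # 3x3 identity matrix

# asarray does not copy when the input is already an ndarray;
# array copies by default.
same_obj = np.asarray(a) is a      # True
new_obj = np.array(a) is a         # False

# fromfunction calls the function with index arrays, one per axis.
table = np.fromfunction(lambda i, j: i * 10 + j, (2, 3), dtype=int)
# [[ 0  1  2]
#  [10 11 12]]
```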
Operations
- shape
- a[i, j] - slices with : work as for lists (a range, or all elements)
- * + - / - element-wise operations
- NumPy broadcasts arrays of compatible shapes without copying elements (e.g. a (5,3) array + a (5,1) array gives (5,3))
- np.squeeze - remove axes of length 1
- concatenate and split
- dstack() - stack depth-wise (along the third axis)
- vstack() and row_stack
- hstack() and column_stack
- column_stack converts 1-D arrays to 2-D columns first
- r_[] - indexed with square brackets, not called
- c_[] - indexed with square brackets, not called
- split(arr, [...]), vsplit
- isnan()
- np.where() - selection
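A sketch of the operations above on small illustrative arrays; the shapes in the comments follow from NumPy's stacking and broadcasting rules:

```python
import numpy as np

a = np.arange(15).reshape(5, 3)
col = np.arange(5).reshape(5, 1)

# (5, 3) + (5, 1): col is broadcast across the three columns.
summed = a + col                        # shape (5, 3)

# squeeze drops length-1 axes.
squeezed = col.squeeze()                # shape (5,)

# Stacking and splitting.
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
v = np.vstack([x, y])                   # shape (2, 3)
h = np.hstack([x, y])                   # shape (6,)
cols = np.column_stack([x, y])          # shape (3, 2): 1-D inputs become columns
d = np.dstack([x, y])                   # shape (1, 3, 2)
first, second = np.split(h, [2])        # [1 2] and [3 4 5 6]

# r_ and c_ are indexed with square brackets, not called.
r = np.r_[x, y]                         # same as hstack here
c = np.c_[x, y]                         # same as column_stack here

# isnan builds a mask; three-argument where selects by that mask.
data = np.array([1.0, np.nan, 3.0])
cleaned = np.where(np.isnan(data), 0.0, data)   # [1. 0. 3.]
```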
Functions (np.) - ufuncs, usually implemented in C
Looping
- wrap function using np.vectorize
- apply_along_axis(<lambda>)
- all, exp, floor, ceil, clip, conj, corrcoef, cov, bincount
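A minimal sketch of the two looping helpers. The scalar function `clamp` is a hypothetical example, not something from NumPy:

```python
import numpy as np

def clamp(x, limit):
    """Hypothetical scalar function that uses Python branching."""
    if x > limit:
        return limit
    if x < -limit:
        return -limit
    return x

# np.vectorize wraps a scalar function so it accepts arrays.
# It is a convenience loop, not a compiled speed-up.
vclamp = np.vectorize(clamp, otypes=[float])
out = vclamp(np.array([-5.0, 0.5, 3.0]), 2.0)        # [-2.   0.5  2. ]

# The same result from a built-in array function.
out_fast = np.clip(np.array([-5.0, 0.5, 3.0]), -2.0, 2.0)

# apply_along_axis runs a 1-D function over each slice along an axis.
m = np.array([[1, 5, 2], [7, 3, 9]])
spread = np.apply_along_axis(lambda row: row.max() - row.min(), 1, m)  # [4 6]
```

When a built-in such as np.clip covers the logic, it is usually the faster choice; np.vectorize mainly helps with readability.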
Unary
- abs, fabs
- sqrt, square
- exp, log, log10, log2, log1p
- sign, ceil, floor
- rint
- modf - fractional and integral part
- isnan
- isfinite, isinf
- cos, sin, cosh, sinh, tan, tanh, arcsin, arccos, ...
- logical_not
Binary
- add, subtract, multiply, divide, floor_divide, dot, power
- @ is matrix multiply (matmul; same as dot for 2-D arrays)
- maximum, fmax(ignores NaN), minimum, fmin
- copysign
- greater, greater_equal, less, less_equal
- logical_and, logical_or
- outer() - Cartesian product
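Some of the binary functions above, sketched on made-up inputs to show how NaN handling and the outer method behave:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 2.0, np.nan])

mx = np.maximum(a, b)    # [ 2. nan nan]: NaN propagates
fm = np.fmax(a, b)       # [2. 2. 3.]: NaN is ignored when the other value is valid

# copysign takes the magnitude of the first argument and the sign of the second.
cs = np.copysign([1.0, 2.0], [-1.0, 1.0])   # [-1.  2.]

# Comparison and logical ufuncs return boolean arrays.
x = np.array([1, 5, 3])
y = np.array([2, 4, 3])
mask = np.logical_and(np.greater_equal(x, y), np.less_equal(x, 4))  # [False False  True]

# Every binary ufunc has an outer method that applies it to all pairs.
table = np.multiply.outer(np.arange(1, 4), np.arange(1, 4))
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]

# @ is matrix multiplication; for 2-D arrays it matches np.dot.
m = np.array([[1, 2], [3, 4]])
prod = m @ m             # [[ 7 10]
                         #  [15 22]]
```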
Multiple(aggregators)
- reduce
- operations can be chained
arr = np.arange(10)
np.add.reduce(arr)  # 45
- accumulate - cumulative reduce
- reduceat(arr, [reduce cuts])
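The three ufunc methods side by side. A small sketch; the cut points passed to reduceat are arbitrary:

```python
import numpy as np

arr = np.arange(10)

total = np.add.reduce(arr)            # 45, same as arr.sum()
running = np.add.accumulate(arr)      # [ 0  1  3  6 10 15 21 28 36 45], same as arr.cumsum()

# reduceat reduces over the slices arr[0:5], arr[5:8], and arr[8:].
chunks = np.add.reduceat(arr, [0, 5, 8])      # [10 18 17]

# reduce accepts an axis for multidimensional input.
grid = np.arange(6).reshape(2, 3)
col_prod = np.multiply.reduce(grid, axis=0)   # [ 0  4 10]
```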
Logical Indexing
np.
- unique(x)
- intersect1d(x,y)
- union1d(x,y)
- setdiff1d(x,y)
- setxor1d(x,y)
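The set functions above return sorted, de-duplicated 1-D arrays. A quick sketch with two small made-up inputs:

```python
import numpy as np

x = np.array([3, 1, 2, 3, 1])
y = np.array([2, 3, 4])

uniq = np.unique(x)            # [1 2 3]
both = np.intersect1d(x, y)    # [2 3]
either = np.union1d(x, y)      # [1 2 3 4]
only_x = np.setdiff1d(x, y)    # [1]: in x but not in y
one_side = np.setxor1d(x, y)   # [1 4]: in exactly one of the two
```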
Others
- np.meshgrid() - builds 2-D coordinate grids from 1-D arrays
- np.where( <cond>, <if array>, <else array>)
- <if array>.where(<cond>) - method form on pandas objects; ndarray itself has no .where method
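A minimal sketch of meshgrid and three-argument where, using tiny illustrative inputs:

```python
import numpy as np

xs = np.array([0, 1, 2])
ys = np.array([10, 20])

# meshgrid expands two 1-D arrays into two 2-D coordinate grids.
X, Y = np.meshgrid(xs, ys)     # both have shape (2, 3)
Z = X + Y
# [[10 11 12]
#  [20 21 22]]

# where(cond, a, b) takes from a where cond is True, otherwise from b.
picked = np.where(Z > 15, Z, -1)
# [[-1 -1 -1]
#  [20 21 22]]
```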
Linear Algebra
- dot - matrix multiply
- .outer() - outer product
- .linalg
- diag
- trace
- det, inv, solve, eig
- pinv - Moore-Penrose pseudo-inverse of a matrix
- norm(<arr>, keepdims=True)
- np.asmatrix (Matrix)
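A sketch of the linear algebra calls above on a small system. The matrices are invented for illustration, and floating-point results are compared with a tolerance rather than exactly:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)            # solves A @ x = b, about [0.8 1.4]
det = np.linalg.det(A)               # about 5.0
A_inv = np.linalg.inv(A)
eigvals, eigvecs = np.linalg.eig(A)  # eigenvalues sum to the trace

diag = np.diag(A)                    # [2. 3.]
tr = np.trace(A)                     # 5.0

# pinv also works for non-square matrices.
tall = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
tall_pinv = np.linalg.pinv(tall)     # shape (2, 3)

# keepdims keeps the reduced axis so the norms broadcast against A.
row_norms = np.linalg.norm(A, axis=1, keepdims=True)   # shape (2, 1)
unit_rows = A / row_norms
```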
Random
np.random
- seed - set seed
- RandomState() - create one generator independent of others
- permutation
- shuffle
- rand, randint
- randn (std normal), binomial, beta, normal, chisquare, gamma, uniform
- choice(<list>, size, [p]) - choose among the elements, optionally with probabilities p
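A sketch of the random functions above. The seed value 42 is arbitrary, and no specific draws are shown because they depend on the generator:

```python
import numpy as np

# Two independent generators with the same seed produce the same stream.
rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)
draw_a = rng_a.randn(3)              # standard normal samples
draw_b = rng_b.randn(3)              # identical to draw_a

ints = rng_a.randint(0, 10, size=5)  # integers in [0, 10)
perm = rng_a.permutation(5)          # shuffled copy of arange(5)

arr = np.arange(5)
rng_a.shuffle(arr)                   # shuffles in place

# choice with explicit probabilities (must sum to 1).
picks = rng_a.choice(["a", "b", "c"], size=4, p=[0.5, 0.3, 0.2])

# np.random.seed sets the seed of the global generator instead.
np.random.seed(0)
u = np.random.uniform(0.0, 1.0, size=2)
```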
Advanced
Broadcasting
- broadcasting
- automatic stretching of the smaller array to match the shape of the larger one in element-wise arithmetic and general calculations, without copying data
Numpy File I/O
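NumPy can read and write arrays in either text or binary format. np.save writes an array in uncompressed raw binary format with the file extension .npy. An .npz file is an archive holding several arrays; it is compressed only when written with savez_compressed.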
np.
- load(<npy>/<npz>)
- npz returns dict-like struct (lazily)
- save()
- savez(<file>, <var dict>)
- a=..., b=...
- savez_compressed(<file>, a = arr, b = arr)
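A minimal round trip through .npy and .npz files. The file names and the temporary directory are assumptions made so the sketch cleans up after itself:

```python
import os
import tempfile

import numpy as np

a = np.arange(5)
b = np.linspace(0, 1, 3)

with tempfile.TemporaryDirectory() as tmp:
    # Single array, uncompressed .npy file.
    npy_path = os.path.join(tmp, "a.npy")
    np.save(npy_path, a)
    a_loaded = np.load(npy_path)

    # Several arrays in one compressed .npz archive, stored by keyword name.
    npz_path = os.path.join(tmp, "both.npz")
    np.savez_compressed(npz_path, a=a, b=b)

    # load returns a dict-like object that reads each array on access.
    with np.load(npz_path) as archive:
        keys = sorted(archive.files)   # ['a', 'b']
        b_loaded = archive["b"]
```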
User-defined, C-like ufuncs, dtypes and Numba
np.
- frompyfunc(func, nin, nout)
- returns python objects
- vectorize
- can specify output type
- structured array can compress complex nested objects to single block of memory
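A sketch contrasting frompyfunc with vectorize, followed by a small structured array. The function `add_elements` and the field names `x` and `y` are hypothetical:

```python
import numpy as np

def add_elements(x, y):
    """Hypothetical scalar function used to build ufunc-like objects."""
    return x + y

# frompyfunc(func, nin, nout) always returns arrays of Python objects.
add_obj = np.frompyfunc(add_elements, 2, 1)
res_obj = add_obj(np.arange(4), np.arange(4))       # dtype=object

# vectorize lets the output type be specified.
add_typed = np.vectorize(add_elements, otypes=[np.float64])
res_typed = add_typed(np.arange(4), np.arange(4))   # dtype=float64

# A structured dtype packs heterogeneous fields into one block of memory.
point_dtype = np.dtype([("x", np.float64), ("y", np.int32)])
sarr = np.array([(1.5, 6), (2.5, -2)], dtype=point_dtype)

xs = sarr["x"]            # [1.5 2.5], a view on the x field
first_y = sarr[0]["y"]    # 6
record_bytes = sarr.itemsize   # 12: 8 bytes for x plus 4 bytes for y
```

Both wrappers still call the Python function once per element, so they are generally much slower than built-in ufuncs.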
Numba
- njit - equivalent to jit(nopython=True)
- nopython mode disallows any Python C API calls in the compiled code
- float64
Uses the LLVM project to translate Python code into compiled machine code
Advanced Array Input and Output
- memmap object
- ndarray-like object; enables a large file to be read and written in segments without loading it all into memory
- np.memmap
memmap
- flush()
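A minimal memmap sketch. The file name, shape, and temporary directory are assumptions chosen to keep the example small and self-cleaning:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.dat")

    # mode="w+" creates (or overwrites) the file, filled with zeros.
    mm = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000, 100))
    mm[:5] = 1.0          # only the touched pages are written
    mm.flush()            # push pending changes to disk
    del mm                # drop the reference so the mapping is released

    # Reopen read-only; dtype and shape are not stored in the file,
    # so they have to be supplied again.
    ro = np.memmap(path, dtype=np.float64, mode="r", shape=(1000, 100))
    head_sum = float(ro[:5].sum())   # 500.0
    tail_sum = float(ro[5:].sum())   # 0.0
    del ro
```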
Numpy Functions for Deep Learning
np.pad()
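A sketch of np.pad as it is often used for zero-padding images before a convolution. The batch layout (batch, height, width, channels) is an assumption for illustration:

```python
import numpy as np

img = np.arange(1, 5).reshape(2, 2)

# Pad one element of zeros on every side of every axis.
padded = np.pad(img, pad_width=1, mode="constant", constant_values=0)
# [[0 0 0 0]
#  [0 1 2 0]
#  [0 3 4 0]
#  [0 0 0 0]]

# Per-axis (before, after) widths: pad only height and width of a batch.
batch = np.ones((2, 3, 3, 1))
padded_batch = np.pad(batch, ((0, 0), (1, 1), (1, 1), (0, 0)), mode="constant")
# shape (2, 5, 5, 1)
```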
Numpy Performance Tips
- vectorize
- convert loops and conditional logic to array operations and boolean operations
- use contiguous memory (C-contiguous)
- arr.flags
- broadcasting if possible
- use views to avoid copy
- Utilize ufunc
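The tips above, sketched on small arrays. No timings are shown because they depend on the machine and array size:

```python
import numpy as np

x = np.arange(-5, 5)

# Loop with conditional logic ...
out_loop = np.array([v * 2 if v > 0 else 0 for v in x])
# ... and the equivalent array expression.
out_vec = np.where(x > 0, x * 2, 0)

# arr.flags reports the memory layout.
c = np.ones((100, 50))
c_is_c = c.flags["C_CONTIGUOUS"]        # True
t_is_c = c.T.flags["C_CONTIGUOUS"]      # False: the transpose is a strided view
t_is_f = c.T.flags["F_CONTIGUOUS"]      # True

# Basic slicing returns a view; fancy indexing returns a copy.
view = c[:10]
copy = c[[0, 1, 2]]
view_shares = np.shares_memory(view, c)   # True
copy_shares = np.shares_memory(copy, c)   # False
```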