NumPy
The fundamental package for scientific computing with Python
NumPy 2.1 released!
2024-08-18
Powerful N-dimensional arrays
NumPy's vectorization, indexing, and broadcasting concepts are fast and versatile, and they are the de facto standards of array computing today.
Numerical computing tools
NumPy offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more.
Open source
Distributed under a liberal BSD license, NumPy is developed and maintained publicly on GitHub by a vibrant, responsive, and diverse community.
Interoperable
NumPy supports a wide range of hardware and computing platforms, and plays well with distributed, GPU, and sparse array libraries.
Performant
The core of NumPy is well-optimized C code. Enjoy the flexibility of Python with the speed of compiled code.
Easy to use
NumPy’s high-level syntax makes it accessible and productive for programmers of any background or experience level.
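A short sketch of the vectorization, broadcasting, and indexing ideas described above, plus two of the numerical tools (a linear solve and a Fourier transform). The arrays and values are arbitrary illustrations chosen so the results are easy to check by hand.

```python
import numpy as np

# Vectorization: one expression acts on every element, with no Python loop.
a = np.arange(5)                 # [0, 1, 2, 3, 4]
squares = a ** 2                 # [0, 1, 4, 9, 16]

# Broadcasting: a (3, 1) column combined with a (4,) row yields a (3, 4) grid.
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
grid = col + row                 # rows: [1..4], [11..14], [21..24]

# Boolean indexing: select the elements that satisfy a condition.
evens = squares[squares % 2 == 0]    # [0, 4, 16]

# Linear algebra: solve the system A @ x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)        # [2.0, 3.0]

# Fourier transform: a cosine with 5 cycles over 64 samples
# produces a spectral peak in frequency bin 5.
n = 64
t = np.arange(n)
signal = np.cos(2 * np.pi * 5 * t / n)
spectrum = np.abs(np.fft.rfft(signal))
peak_bin = int(np.argmax(spectrum))
```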
Try NumPy


# The standard way to import NumPy:
import numpy as np

# Create a 3x5 array, set every second element (columns 0, 2, 4)
# in rows 1 and 2 to -99, then find the maximum of each row:

x = np.arange(15, dtype=np.int64).reshape(3, 5)
x[1:, ::2] = -99
x
# array([[  0,   1,   2,   3,   4],
#        [-99,   6, -99,   8, -99],
#        [-99,  11, -99,  13, -99]])

x.max(axis=1)
# array([ 4,  8, 13])

# Generate normally distributed random numbers:
rng = np.random.default_rng()
samples = rng.normal(size=2500)
samples
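The generator above is unseeded, so it draws different numbers on every run. A minimal follow-up sketch, assuming you want reproducible output: pass a seed to `default_rng` and summarize the draws. The seed value 42 and the 20-bin histogram are arbitrary choices.

```python
import numpy as np

# Seeding makes the generator reproducible: the same seed
# yields the same stream of numbers on every run.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=2500)

# Vectorized summary statistics over all 2500 samples.
mean = samples.mean()
std = samples.std()

# Bin the samples into a histogram: 20 counts and 21 bin edges.
counts, edges = np.histogram(samples, bins=20)

# A fresh generator with the same seed reproduces the same draws.
again = np.random.default_rng(seed=42).normal(size=2500)
same = np.array_equal(samples, again)   # True
```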

ECOSYSTEM

Nearly every scientist working in Python draws on the power of NumPy.

NumPy brings the computational power of languages like C and Fortran to Python, a language much easier to learn and use. With this power comes simplicity: a solution in NumPy is often clear and elegant.

NumPy's API is the starting point when libraries are written to exploit innovative hardware, create specialized array types, or add capabilities beyond what NumPy provides.
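One mechanism that lets other array types plug into NumPy's API is the `__array_ufunc__` protocol: when a NumPy ufunc receives an object defining it, NumPy hands the call to that object. The `LoggedArray` class below is a hypothetical toy, not how any library in the table below is implemented; it only shows NumPy functions dispatching to a non-NumPy array type. For simplicity it does not support the `out=` argument or Python operators such as `+`.

```python
import numpy as np


class LoggedArray:
    """Hypothetical duck array that wraps an ndarray and records applied ufuncs."""

    def __init__(self, data, log=None):
        self.data = np.asarray(data)
        self.log = list(log) if log is not None else []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Simplification: decline calls that use `out=`; NumPy then raises TypeError.
        if "out" in kwargs:
            return NotImplemented
        # Unwrap any LoggedArray inputs into plain ndarrays.
        raw = [x.data if isinstance(x, LoggedArray) else x for x in inputs]
        # Delegate the actual computation to NumPy's own ufunc.
        result = getattr(ufunc, method)(*raw, **kwargs)
        # Return a new wrapper carrying the previous history plus this call.
        return LoggedArray(result, self.log + [ufunc.__name__])


a = LoggedArray([1.0, 4.0, 9.0])
b = np.sqrt(a)        # dispatched to LoggedArray.__array_ufunc__
c = np.add(b, 1.0)
print(c.data)         # [2. 3. 4.]
print(c.log)          # ['sqrt', 'add']
```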

| Array Library | Capabilities & Application areas |
| --- | --- |
| Dask | Distributed arrays and advanced parallelism for analytics, enabling performance at scale. |
| CuPy | NumPy-compatible array library for GPU-accelerated computing with Python. |
| JAX | Composable transformations of NumPy programs: differentiate, vectorize, just-in-time compilation to GPU/TPU. |
| Xarray | Labeled, indexed multi-dimensional arrays for advanced analytics and visualization. |
| Sparse | NumPy-compatible sparse array library that integrates with Dask and SciPy's sparse linear algebra. |
| PyTorch | Deep learning framework that accelerates the path from research prototyping to production deployment. |
| TensorFlow | An end-to-end platform for machine learning to easily build and deploy ML-powered applications. |
| Arrow | A cross-language development platform for columnar in-memory data and analytics. |
| xtensor | Multi-dimensional arrays with broadcasting and lazy computing for numerical analysis. |
| Awkward Array | Manipulate JSON-like data with NumPy-like idioms. |
| uarray | Python backend system that decouples API from implementation; unumpy provides a NumPy API. |
| tensorly | Tensor learning, algebra and backends to seamlessly use NumPy, PyTorch, TensorFlow or CuPy. |
Diagram of Python libraries. The five categories are 'Extract, Transform, Load', 'Data Exploration', 'Data Modeling', 'Data Evaluation', and 'Data Presentation'.

NumPy lies at the core of a rich ecosystem of data science libraries. A typical exploratory data science workflow moves through the five stages shown in the diagram above.
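A minimal NumPy-only sketch of the first stages of such a workflow (load, transform, explore). Dedicated libraries usually handle these stages in practice; here the CSV text is a made-up stand-in for a real file, and the column names are hypothetical.

```python
import io

import numpy as np

# Extract/load: parse a small CSV held in memory (made-up values).
csv_text = """height_cm,weight_kg
160,55
170,68
180,77
175,72
165,60
"""
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# Transform: standardize each column to zero mean and unit variance.
z = (data - data.mean(axis=0)) / data.std(axis=0)

# Explore: per-column means and the correlation between the two columns.
col_means = data.mean(axis=0)
corr = np.corrcoef(data[:, 0], data[:, 1])[0, 1]
```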

For high data volumes, Dask and Ray are designed to scale. Stable deployments rely on data versioning (DVC), experiment tracking (MLflow), and workflow automation (Airflow, Dagster, and Prefect).

Diagram of three overlapping circles labeled 'Mathematics', 'Computer Science', and 'Domain Expertise'. The area in the middle, where all three circles overlap, is labeled 'Data Science'.

NumPy forms the basis of powerful scientific computing and machine learning libraries such as SciPy and scikit-learn. As machine learning grows, so does the list of libraries built on NumPy. TensorFlow’s deep learning capabilities have broad applications, among them speech and image recognition, text-based applications, time-series analysis, and video detection. PyTorch, another deep learning library, is popular among researchers in computer vision and natural language processing.
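As an illustration of the array operations such libraries build on, here is ordinary least-squares linear regression written directly with `np.linalg.lstsq`. This is a sketch on synthetic data (a line with slope 2 and intercept 1 plus small noise), not how scikit-learn implements its estimators.

```python
import numpy as np

# Synthetic data: y is roughly 2*x + 1 plus small Gaussian noise.
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# Design matrix: one column for x and a column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: find w minimizing ||X @ w - y||^2.
w, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = w

# Predictions for new inputs are a single matrix-vector product.
x_new = np.array([12.0, 15.0])
y_pred = np.column_stack([x_new, np.ones_like(x_new)]) @ w
```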

Ensemble methods, statistical techniques such as bagging, stacking, and boosting, are widely used in machine learning; gradient-boosting libraries such as XGBoost, LightGBM, and CatBoost implement the boosting family, and CatBoost in particular is known for fast inference. Yellowbrick and ELI5 offer machine learning visualizations.
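Bagging (bootstrap aggregating) is simple enough to sketch directly in NumPy: fit the same base model on many resamples drawn with replacement, then average the predictions. The straight-line base learner, the synthetic data, and the 25-model ensemble below are illustrative assumptions; XGBoost, LightGBM, and CatBoost instead build boosted decision trees and are far more elaborate.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic data: a noisy linear relationship y = 3x + noise.
x = np.linspace(0.0, 1.0, 40)
y = 3.0 * x + rng.normal(scale=0.3, size=x.size)


def fit_line(xs, ys):
    """Base learner: degree-1 least-squares fit returning [slope, intercept]."""
    return np.polyfit(xs, ys, deg=1)


# Bagging: train on bootstrap resamples and average the predictions.
n_models = 25
x_query = np.array([0.25, 0.75])
preds = np.empty((n_models, x_query.size))
for i in range(n_models):
    idx = rng.integers(0, x.size, size=x.size)   # bootstrap indices, with replacement
    coeffs = fit_line(x[idx], y[idx])
    preds[i] = np.polyval(coeffs, x_query)

bagged = preds.mean(axis=0)   # ensemble prediction at each query point
spread = preds.std(axis=0)    # how much the bootstrap models disagree
```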

A streamplot made in Matplotlib
A scatter plot made in ggpy
A box plot made in Plotly
A streamgraph made in Altair
A pair plot combining scatter plots and frequency histograms, made in seaborn
A 3D volume rendering made in PyVista
A multi-dimensional image made in napari
A Voronoi diagram made in VisPy

NumPy is an essential component in the burgeoning Python visualization landscape, which includes Matplotlib, Seaborn, Plotly, Altair, Bokeh, HoloViz, VisPy, napari, and PyVista, to name a few.

NumPy’s fast processing of large arrays lets researchers visualize datasets far larger than pure Python data structures could handle.
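One common way NumPy makes very large datasets plottable is to aggregate them before drawing. A sketch, assuming a synthetic cloud of one million points and an arbitrary 200 x 200 grid; the plotting step itself is left to whichever visualization library you use.

```python
import numpy as np

# Synthetic point cloud: one million correlated 2-D points.
rng = np.random.default_rng(seed=7)
n_points = 1_000_000
xs = rng.normal(size=n_points)
ys = 0.5 * xs + rng.normal(scale=0.5, size=n_points)

# Aggregate into a 200 x 200 grid of counts; a plotting library can
# render this grid as an image instead of a million separate markers.
counts, x_edges, y_edges = np.histogram2d(xs, ys, bins=200)

# Log-scale the counts so sparse outer regions remain visible.
image = np.log1p(counts)
```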

CASE STUDIES