Learning NumPy

A look into NumPy for data science purposes

Data Science and Python

Data science has been in high demand in recent years, and it underpins many areas, including machine learning. Python is one of the most popular languages for data science, and NumPy is one of its most useful libraries.

What is NumPy?

In short, NumPy is a library for numerical computing: it provides fast arrays and the mathematical functions that operate on them, which makes it crucial to data science. That is why I decided to learn the basics of the library, to further my understanding of Python and data science.

Installing NumPy

Before we can use NumPy, we have to install it. We can do this by running the following command in the terminal:

pip install numpy

Pretty self-explanatory.
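If you want to double-check that the install worked, one quick way (a minimal sketch, assuming you run it with the same Python that pip installed into) is to import the package and print its version:

```python
import numpy

# If this prints a version string instead of raising ImportError,
# NumPy is installed for this interpreter.
print(numpy.__version__)
```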

Actually using it

Now, to use it, we've got to import it into our file, so let's make a new Python file. Then, at the top of the file, we're going to import it, like so:

import numpy as np

It's standard to import NumPy as np, so we can reference it as np in the code rather than typing out numpy each time.

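While Python comes with a built-in data structure known as a list, which behaves like a dynamic array (not a linked list), NumPy implements a similar yet different data structure, simply called an array (its full type name is ndarray). Unlike a list, every element of a NumPy array has the same data type.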
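To make the difference a bit more concrete, here is a small sketch comparing the two. The variable names (py_list, np_arr, mixed) are just made up for illustration; np.array is the function that turns a list into a NumPy array.

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

# Multiplying a list repeats it; multiplying an array scales each element.
print(py_list * 2)   # [1, 2, 3, 1, 2, 3]
print(np_arr * 2)    # [2 4 6]

# A list can mix types freely; an array converts everything to one common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64
```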

A NumPy array can be created the following way:

arr = np.array([69, 420, 69420])

So, if you're familiar with programming at all, you probably already know what's going on here. We are creating a variable called arr, whose value is a NumPy array holding the values we provided (69, 420, 69420).
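If you're curious what you actually got back, a quick sketch like this inspects the array using a few of its standard attributes. The exact integer type depends on your platform, so I print it rather than assume it.

```python
import numpy as np

arr = np.array([69, 420, 69420])

print(type(arr))   # <class 'numpy.ndarray'>
print(arr.shape)   # (3,)
print(arr.ndim)    # 1

# The integer dtype is platform dependent (commonly int64),
# so check it rather than rely on a particular value.
print(arr.dtype)
```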

We can access the data how you might expect. Just like with lists in Python, you give the index you wish to access, with 0 as the first index.

print(arr[0])
# Output: 69
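The other list-style indexing tricks seem to carry over too. Here is a short sketch with negative indices, slices, and assignment by index, using the same example array:

```python
import numpy as np

arr = np.array([69, 420, 69420])

print(arr[-1])    # 69420, negative indices count from the end
print(arr[0:2])   # [ 69 420]

arr[1] = 421      # assignment by index works like it does for a list
print(arr)        # [   69   421 69420]
```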

NumPy arrays can be multi-dimensional: pass a list of nested lists instead of a single flat list, like so.

dimensional_arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
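For a quick taste of what that gives you, this sketch checks the shape of the array and pulls out a single element, a row, and a column:

```python
import numpy as np

dimensional_arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(dimensional_arr.shape)   # (3, 3)
print(dimensional_arr.ndim)    # 2

# Index with [row, column], both starting from 0.
print(dimensional_arr[1, 2])   # 6
print(dimensional_arr[0])      # [1 2 3], the first row
print(dimensional_arr[:, 0])   # [1 4 7], the first column
```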

I'm not going to go into greater detail on multi-dimensional arrays yet. However, if you wish to learn more, there is great documentation at: https://numpy.org/doc/stable/

NumPy arrays can be manipulated and accessed in many ways similar to regular Python lists. A cool feature of NumPy is being able to either copy an array to a new variable or create a second variable that stays linked to the same underlying data.

arr_copy = arr.copy()
print(arr, arr_copy)
# Output: [   69   420 69420] [   69   420 69420]

The copy of the array is exactly the same as the original array, so they output the same values when printed. However, the values of the copy can be changed and are not connected to the original array.

arr_copy[0] = 32
print(arr, arr_copy)
# Output: [   69   420 69420] [   32   420 69420]
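If you'd rather verify the independence than take the printout's word for it, NumPy has a helper for that. A minimal sketch using np.shares_memory and the array's base attribute:

```python
import numpy as np

arr = np.array([69, 420, 69420])
arr_copy = arr.copy()

# A copy owns its own data, so the two arrays share no memory.
print(np.shares_memory(arr, arr_copy))   # False

# base is None for an array that owns its data.
print(arr_copy.base is None)             # True
```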

After changing index 0 of the copy, the two arrays no longer hold the same values. However, if you want a new variable that remains connected to the original array, look no further than views.

arr_view = arr.view()
print(arr, arr_view)
# Output: [   69   420 69420] [   69   420 69420]

Now they print the same values, just like before. However, if we change the values of either the original array or the view, both variables will change.

arr_view[0] = 120
print(arr, arr_view)
# Output: [  120   420 69420] [  120   420 69420]

Now, as you can see, the values of both changed. That is because the view does not hold its own data; it is a second window onto the same underlying data as the original array, so a change made through either one is visible through both.
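To see that connection directly, here is a small sketch that checks what a view is under the hood. The name first_two is just for illustration. As far as I can tell, basic slicing hands back a view as well, so the same rule applies there.

```python
import numpy as np

arr = np.array([69, 420, 69420])
arr_view = arr.view()

print(arr_view is arr)                   # False, the view is its own array object
print(arr_view.base is arr)              # True, but it borrows its data from arr
print(np.shares_memory(arr, arr_view))   # True

# Slicing also returns a view, so writing through a slice changes the original.
first_two = arr[:2]
first_two[0] = 120
print(arr[0])                            # 120
```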

Still a lot to learn

That's all I'm going to look at for NumPy today. However, there is still a LOT more to it, which I cannot wait to explore and become familiar with.

If you wish to see all the code used in this article, and any future revisions of it, it is available on my GitHub: https://github.com/CobbCoding1/learning-numpy
