In AI applications, data is usually stored as arrays or DataFrames in which each row represents a sample and each column represents a feature of those samples. Before processing this data further or feeding it to ML models, we need to make sure it is in the correct format. For this, we can use the shape attribute (or the np.shape() function) of the NumPy library.
NumPy is one of the most commonly used Python libraries for working with N-dimensional data and numerical computations. In this article, we will learn what NumPy shape is, what it is used for, and how to use it in machine learning applications.
The NumPy shape attribute returns the size of an N-dimensional array as a tuple of integers, with one entry per dimension giving the number of elements along that dimension. In this topic, we will cover:
- Shape of Arrays
- Shape of 2D Pandas DataFrames
- Difference between shape(), size() and len() in NumPy
- Difference between shape() and reshape() in NumPy
Let's dive into NumPy shape!
Shape of Arrays
For an array, the shape gives the number of elements along each dimension. For instance, a flat list of numbers with no nesting is a one-dimensional array. Similarly, [[1,2,3,5], [2,9,7,4], [5,1,7,8]] and [[[1,5,9], [2,6,5]], [[2,4,9],[1,3,7]]] are two-dimensional and three-dimensional arrays, respectively.
We can create such arrays with NumPy as shown in the code below.
```python
import numpy as np

# Create a 1D NumPy array with a single element (the value 5 is illustrative)
array1D = np.array([5])

# Create a 2D NumPy array (3 rows, 4 columns)
array2D = np.array([[1, 2, 3, 5], [2, 9, 7, 4], [5, 1, 7, 8]])

# Create a 3D NumPy array (3 layers, each with 2 rows and 3 columns)
array3D = np.array([[[1, 5, 9], [2, 6, 5]],
                    [[2, 4, 9], [1, 3, 7]],
                    [[5, 1, 7], [5, 1, 8]]])
```
As discussed above, np.shape() returns the shape of an array as a tuple whose elements give the size of each dimension. You can print the shapes of the arrays created above with np.shape() as shown in the code below.
```python
print("Shape of Array 1:", np.shape(array1D))
print("Shape of Array 2:", np.shape(array2D))
print("Shape of Array 3:", np.shape(array3D))
```

```
Shape of Array 1: (1,)
Shape of Array 2: (3, 4)
Shape of Array 3: (3, 2, 3)
```
Here, array1D is a one-dimensional array that contains a single element, so np.shape() returns a tuple with one entry, (1,). This means one dimension of length 1, and the trailing comma is simply how Python writes a one-element tuple. For the 2D array, the output (3, 4) means the array has 3 rows and 4 columns. A 3D array's shape has three values. For array3D, (3, 2, 3) means 3 layers, each with 2 rows and 3 columns. You can also get these values separately by indexing the shape tuple.

```python
import numpy as np

# Create a 3D NumPy array
array3D = np.array([[[1, 5, 9], [2, 6, 5]],
                    [[2, 4, 9], [1, 3, 7]],
                    [[5, 1, 7], [5, 1, 8]]])
array3D_shape = np.shape(array3D)

# Index the shape tuple to get the size of each axis
print("Layers:", array3D_shape[0])
print("Rows:", array3D_shape[1])
print("Columns:", array3D_shape[2])
```

```
Layers: 3
Rows: 2
Columns: 3
```
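NumPy exposes the same information in a few equivalent ways, and it can help to see them side by side. The short sketch below reuses the values of array3D. It shows that the ndarray.shape attribute and the np.shape() function return the same tuple, and that ndim equals the length of that tuple. It also shows that np.shape() accepts plain Python lists and scalars.

```python
import numpy as np

# Same values as array3D above
array3D = np.array([[[1, 5, 9], [2, 6, 5]],
                    [[2, 4, 9], [1, 3, 7]],
                    [[5, 1, 7], [5, 1, 8]]])

# The ndarray attribute and the module-level function return the same tuple
print(array3D.shape)      # (3, 2, 3)
print(np.shape(array3D))  # (3, 2, 3)

# The number of dimensions equals the length of the shape tuple
print(array3D.ndim, len(array3D.shape))  # 3 3

# Negative indices count from the last axis
print(array3D.shape[-1])  # 3

# np.shape also works on nested Python lists and on scalars
print(np.shape([[1, 2], [3, 4]]))  # (2, 2)
print(np.shape(7))                 # ()
```

A scalar has the empty shape (), which is why a one-element 1D array prints as (1,) instead.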
In the same way, you can create arrays with other dimensions and values and check their shapes. Keep in mind that NumPy arrays are homogeneous: every element shares a single data type. Nested lists must also have the same length at each level to form a regular N-dimensional array.
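Here is a small illustration of the homogeneity rule, using made-up values. When you mix value types, NumPy converts them all to one common dtype. Rows of unequal length cannot form a 2D array. Recent NumPy versions (1.24 and later) raise a ValueError for such ragged input unless you pass dtype=object. With dtype=object you get a 1D array of Python lists, not a 2D array.

```python
import numpy as np

# Mixed ints and floats are upcast to a common floating-point dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Mixing numbers and strings converts every element to a string
text = np.array([1, "a"])
print(text.dtype.kind)  # U  (Unicode string)

# Ragged rows: with dtype=object the result is a 1D array of two lists
ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)
print(ragged.shape)  # (2,)
```

Checking the shape right after building an array is a quick way to catch a ragged input that did not become the 2D array you expected.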
Shape of 2D Pandas DataFrames
A pandas DataFrame is another commonly used structure for manipulating and analyzing tabular data. Data from different sources, such as SQL databases, Excel spreadsheets and CSV files, can be merged into a single DataFrame for further analysis. Many machine learning algorithms expect their input as a 2D array with a certain number of rows (samples) and columns (features). Knowing the structure of the DataFrame lets you preprocess and reshape the data into the format the algorithm requires.
You can also use np.shape() to find the shape of a DataFrame, and DataFrames have their own shape attribute as well. The steps are:
- Import the pandas module under its usual alias, pd.
- Read the CSV file from your directory. We downloaded a publicly available dataset of iris flowers, but you can use any other dataset.
- Use np.shape() to get the shape of the DataFrame.
- To find out how many dimensions the data has, use np.ndim(), which returns the number of dimensions of the DataFrame.
```python
import pandas as pd
import numpy as np

# Load the dataset
data = pd.read_csv("/iris_csv.csv")

# Print the first five rows of the dataset
print(data.head())
```

```
   sepallength  sepalwidth  petallength  petalwidth        class
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa
```

```python
# Print the shape of the DataFrame
print("Shape:", np.shape(data))

# Print the number of dimensions
print("Dimension of csv data:", np.ndim(data))
```

```
Shape: (150, 5)
Dimension of csv data: 2
```
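If the CSV file is not at hand, you can try the same checks on a small in-memory DataFrame. The sketch below assumes a stand-in built from the five rows printed above, not the full dataset. It compares the DataFrame's own shape and ndim attributes with np.shape(). It then splits the data into a 2D feature matrix X and a 1D label vector y, which is the layout many ML libraries expect.

```python
import numpy as np
import pandas as pd

# Stand-in for the iris CSV: only the five rows shown above
data = pd.DataFrame({
    "sepallength": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepalwidth": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petallength": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petalwidth": [0.2, 0.2, 0.2, 0.2, 0.2],
    "class": ["Iris-setosa"] * 5,
})

# The DataFrame attributes agree with np.shape() and np.ndim()
print(data.shape, np.shape(data))  # (5, 5) (5, 5)
print(data.ndim, np.ndim(data))    # 2 2

# Features as a 2D (samples, features) array, labels as a 1D array
X = data.drop(columns="class").to_numpy()
y = data["class"].to_numpy()
print(X.shape, y.shape)  # (5, 4) (5,)
```

On the full file, the same split would give X.shape == (150, 4) and y.shape == (150,).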
In the same way, np.shape() and np.ndim() work on data with any number of dimensions.
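Image data is a common example of higher-dimensional input. The sketch below assumes a hypothetical batch of 32 RGB images of 28×28 pixels, stored channels-last and filled with zeros. It checks the batch's dimensions and unpacks one variable per axis.

```python
import numpy as np

# Hypothetical batch of 32 RGB images, 28x28 pixels each, channels last
images = np.zeros((32, 28, 28, 3))
print(np.ndim(images))   # 4
print(np.shape(images))  # (32, 28, 28, 3)

# Unpack one variable per axis; this raises ValueError if the
# number of axes is not exactly four
batch_size, height, width, channels = images.shape
print(batch_size, height, width, channels)  # 32 28 28 3

# Indexing the first axis removes it: a single image is 3D
print(images[0].shape)  # (28, 28, 3)
```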
Difference between shape(), size() and len() in NumPy
A common question is how shape, size and len differ. np.size() returns the total number of elements in an array, whereas np.shape() returns the size of each dimension. len(), on the other hand, returns only the length of the first axis, i.e., the number of rows for a 2D array.
```python
import numpy as np

# Create a 4D array with random values
arr = np.random.rand(3, 2, 3, 1)

# Print the array
print(arr)

# Print the shape, size and length of the array
print("Shape:", np.shape(arr))
print("Size:", np.size(arr))
print("Length:", len(arr))
```

```
[[[[0.73479733]
   [0.85829823]
   [0.29924836]]

  [[0.43194162]
   [0.07568045]
   [0.03651994]]]


 [[[0.34761957]
   [0.36325159]
   [0.58400447]]

  [[0.80442234]
   [0.3962501 ]
   [0.35959527]]]


 [[[0.54602137]
   [0.6560841 ]
   [0.76847947]]

  [[0.16126047]
   [0.50990035]
   [0.37992998]]]]
Shape: (3, 2, 3, 1)
Size: 18
Length: 3
```
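The output shows that the three values differ. The shape gives the size of every axis, (3, 2, 3, 1). The size is the product of those values, 3 × 2 × 3 × 1 = 18. len() returns only the length of the first axis, 3.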
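The relationships above can be checked directly in code. The sketch below uses a fresh random array of the same shape. It also shows np.size() with its optional axis argument, and a zero-dimensional array, which has a shape and a size but no length.

```python
import math
import numpy as np

arr = np.random.rand(3, 2, 3, 1)

# size is the product of all entries in shape: 3 * 2 * 3 * 1 = 18
print(np.size(arr), arr.size, math.prod(arr.shape))  # 18 18 18

# len() reports only the first axis, the same as shape[0]
print(len(arr), arr.shape[0])  # 3 3

# np.size can also count the elements along a single axis
print(np.size(arr, axis=1))  # 2

# A 0-d array has shape () and size 1, but len() is undefined for it
scalar = np.array(5)
print(scalar.shape, scalar.size)  # () 1
try:
    len(scalar)
except TypeError as exc:
    print("TypeError:", exc)
```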
Difference between shape() and reshape() in NumPy
Alongside shape, NumPy provides the built-in reshape() function, which is also available as the ndarray.reshape() method. reshape() changes the dimensions of an array without altering its element values. The new shape must hold the same total number of elements as the original. For example, you can turn a 1D array into a 2D array by passing the desired shape to reshape().
```python
import numpy as np

# Create a 2D array
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Reshape the array to a 3D array
arr3d = arr2d.reshape((3, 1, 3))

# Print the shapes of the arrays
print("Shape of 2D array:", arr2d.shape)
print(arr2d)
print("Shape of 3D array:", arr3d.shape)
print(arr3d)
```

```
Shape of 2D array: (3, 3)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Shape of 3D array: (3, 1, 3)
[[[1 2 3]]

 [[4 5 6]]

 [[7 8 9]]]
```
As the example above shows, the reshape() method returns an array with the same data as the original but a different shape. Where possible, the result is a view that shares memory with the original rather than a copy. The reshaped array has shape (3, 1, 3): the first element is the number of layers, and the last two are the number of rows and columns in each layer.
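Here is a sketch of the 1D-to-2D case mentioned above, using a small arange array as made-up input. It shows three behaviours of reshape(). Passing -1 for one axis lets NumPy infer that axis from the element count. A shape with the wrong number of elements raises a ValueError. A simple reshape of a contiguous array returns a view, so changes made through one array are visible in the other.

```python
import numpy as np

# A 1D array with 6 elements: [0 1 2 3 4 5]
flat = np.arange(6)

# Reshape 1D -> 2D; -1 asks NumPy to infer that axis (6 / 3 = 2)
grid = flat.reshape(-1, 3)
print(grid.shape)  # (2, 3)
print(grid)
# [[0 1 2]
#  [3 4 5]]

# The new shape must hold exactly as many elements as the original
try:
    flat.reshape(4, 2)
except ValueError as exc:
    print("ValueError:", exc)

# Here reshape returned a view: both arrays share one buffer
print(np.shares_memory(flat, grid))  # True
grid[0, 0] = 100
print(flat[0])  # 100
```

If you need an independent array, call .copy() on the reshaped result.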
In this tutorial, we have covered how to create 1D, 2D and N-dimensional arrays and how to find their shapes. We have also seen how to use the NumPy shape function to find the shape of pandas DataFrames. The shape attribute serves several purposes. For example, you can verify an array's dimensions before performing any mathematical operation on it or feeding it into a machine learning model. Effective training and prediction depend on the dimensions of the input data being compatible with the model architecture. You can also change an array's dimensions with the reshape function. Finally, we covered the differences between the shape, size, len and reshape functions in NumPy. Overall, the shape attribute is a very important tool for data manipulation in AI applications.
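The dimension checks described in the conclusion can be written as small guard functions. The two helpers below, to_feature_matrix and checked_matmul, are hypothetical names written for this sketch and are not part of NumPy. The first one assumes the convention of a 2D (n_samples, n_features) input. It converts a 1D array into a single feature column and rejects anything with more than two dimensions. The second one checks that two 2D arrays have compatible shapes before multiplying them.

```python
import numpy as np

def to_feature_matrix(x):
    """Hypothetical helper: return x as a 2D (n_samples, n_features) array.

    A 1D input is treated as a single feature column; inputs with more
    than two dimensions are rejected.
    """
    x = np.asarray(x)
    if x.ndim == 1:
        return x.reshape(-1, 1)
    if x.ndim != 2:
        raise ValueError(f"expected 1D or 2D input, got shape {x.shape}")
    return x

def checked_matmul(a, b):
    """Hypothetical helper: multiply two 2D arrays after checking shapes."""
    a, b = np.asarray(a), np.asarray(b)
    if a.ndim != 2 or b.ndim != 2 or a.shape[1] != b.shape[0]:
        raise ValueError(f"cannot multiply shapes {a.shape} and {b.shape}")
    return a @ b

# Usage: three samples of one feature, multiplied by a (1, 2) weight matrix
X = to_feature_matrix([1.0, 2.0, 3.0])
print(X.shape)  # (3, 1)
W = np.ones((1, 2))
print(checked_matmul(X, W).shape)  # (3, 2)
```

Raising an error early with the offending shapes in the message is usually easier to debug than a failure deep inside a model's training loop.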