This tutorial explains how to display a scatterplot using Seaborn in Python. Scatterplots are an effective tool for visualizing data because they reveal the relationship between two variables, including whether the variables are correlated and how strong that relationship is. In this tutorial, we will walk through constructing scatterplots in Python and look at when it is appropriate to use Seaborn for this purpose.
Overview of the Seaborn Library
The Seaborn library makes it easy to produce statistical graphics for exploring and interpreting data. It provides a set of tools for visualizing and analyzing statistical data. By visualizing the behavior of, and correlation among, the features of a dataset, you can identify and drop redundant or irrelevant features that may be contributing to overfitting of a learning model. Seaborn’s visualization capabilities therefore help you gain insight into your data, make informed decisions, and improve your model’s performance.
Setting Up the Environment
Before proceeding with the code, it is essential to import the necessary libraries into your Python script.
import seaborn
or
import seaborn as sns
Basic Scatterplot Creation with Seaborn
Seaborn, known for its flexibility and user-friendly nature, is a good choice for data visualization. A scatterplot is a visual representation of the relationship between two numerical variables. The scatterplot() function in Seaborn plots data points on a Cartesian plane, where each point corresponds to the values of the two variables being compared, giving a clear picture of their association and any patterns. scatterplot() provides several customization options that let you tailor the visualization to your needs. These are briefly described below:
- x – specifies the feature values to be plotted on the x-axis.
- y – specifies the feature values to be plotted on the y-axis.
- hue – sets different colors for different categories.
- style – assigns different markers to different groups or categories.
- palette – sets the colour palette for the markers.
- alpha – sets the transparency level of the markers.
- size – sets the size of the markers based on a specific column.
- sizes – sets the range of sizes of the markers.
- legend – controls how the legend is drawn; legend='brief' shows a shortened legend in which numeric hue and size variables are represented by a sample of evenly spaced values.
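As a rough illustration of how several of these parameters combine in one call, here is a minimal sketch on a small hand-made DataFrame. The column names (height, weight, group, score) and all values are invented for this example and do not come from any Seaborn dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, invented for this illustration only
df = pd.DataFrame({
    "height": [150, 160, 165, 170, 175, 180, 185, 190],
    "weight": [50, 56, 61, 66, 72, 80, 85, 92],
    "group": ["a", "a", "b", "b", "a", "b", "a", "b"],
    "score": [1, 3, 2, 5, 4, 6, 8, 7],
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(
    data=df,
    x="height",        # values on the x-axis
    y="weight",        # values on the y-axis
    hue="group",       # colour by category
    style="group",     # marker shape by category
    size="score",      # marker size driven by a numeric column
    sizes=(40, 200),   # smallest and largest marker size
    palette="Set1",    # colour palette for the hue levels
    alpha=0.7,         # slight transparency
    legend="brief",    # shortened legend
    ax=ax,
)
```

In a script, calling plt.show() afterwards displays the figure; in a notebook it is usually rendered automatically.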
When working with Seaborn, we can either use one of its built-in datasets or load our own Pandas DataFrame. Let us begin with the simple built-in “iris” dataset and use scatterplots to explore it.
# import the Seaborn library
import seaborn as sns
# apply the "whitegrid" style to all plots
sns.set(style='whitegrid')
# load the dataset
data_set = sns.load_dataset("iris")
print(data_set.head())
Output:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
The Iris dataset contains flowers of three species: Setosa, Versicolor and Virginica. It comprises 50 samples from each species, and each sample has four features: sepal length, sepal width, petal length and petal width. Let’s look at the relationship between “sepal_length” and “petal_length”.
# display scatterplot
sns.scatterplot(x="sepal_length", y="petal_length", data=data_set)
Output:
<Axes: xlabel='sepal_length', ylabel='petal_length'>

Customizing Scatterplots
This plot can be improved further with the hue parameter. If we set hue to “species”, the scatterplot colors each point by species, so the relationship between the x and y features can be compared across species. The code below plots all six pairings of the four features this way.
import matplotlib.pyplot as plt
import seaborn as sns
# Create a figure with a 2x3 subplot grid
fig, axes = plt.subplots(2, 3, figsize=(11, 8))
# Plot scatterplot 1
sns.scatterplot(x="sepal_length", y="sepal_width", hue="species", data=data_set, ax=axes[0, 0])
# Plot scatterplot 2
sns.scatterplot(x="sepal_length", y="petal_length", hue="species", data=data_set, ax=axes[0, 1])
# Plot scatterplot 3
sns.scatterplot(x="sepal_length", y="petal_width", hue="species", data=data_set, ax=axes[0, 2])
# Plot scatterplot 4
sns.scatterplot(x="petal_length", y="petal_width", hue="species", data=data_set, ax=axes[1, 0])
# Plot scatterplot 5
sns.scatterplot(x="petal_length", y="sepal_width", hue="species", data=data_set, ax=axes[1, 1])
# Plot scatterplot 6
sns.scatterplot(x="sepal_width", y="petal_width", hue="species", data=data_set, ax=axes[1, 2])
axes[0, 0].set_title("sepal_length vs sepal_width")
axes[0, 1].set_title("sepal_length vs petal_length")
axes[0, 2].set_title("sepal_length vs petal_width")
axes[1, 0].set_title("petal_length vs petal_width")
axes[1, 1].set_title("petal_length vs sepal_width")
axes[1, 2].set_title("sepal_width vs petal_width")
# Adjust spacing between subplots
plt.tight_layout()
# Show the plot
plt.show()
Output:

Exploring Different Parameters of the scatterplot() Function
Let’s dive into Seaborn’s scatterplot() function and its parameters, this time using another built-in dataset, “titanic”.
- Import the Seaborn library and set the style attribute to whitegrid.
- Then, load the desired ‘titanic’ dataset using the load_dataset method and display its first five rows using the head() method.
- Lastly, display a scatterplot using the scatterplot() function, where the “survived” column is plotted on the x-axis, the “age” column is plotted on the y-axis, and the data parameter is set to the ‘titanic’ dataset. You can also add the other parameters discussed in the section above for better visualization.
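The three steps above can be sketched roughly as follows. So that the snippet runs without downloading anything, it uses a tiny stand-in DataFrame with titanic-style column names; the six rows are made up for illustration and are not real passenger records.

```python
import pandas as pd
import seaborn as sns

# Step 1: import Seaborn and set the whitegrid style
sns.set(style="whitegrid")

# Step 2: a stand-in for the titanic data (hypothetical values, same column names)
data_set = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "age": [22.0, 38.0, 26.0, 35.0, 4.0, 54.0],
    "who": ["man", "woman", "woman", "man", "child", "man"],
    "class": ["Third", "First", "Third", "First", "Second", "First"],
    "alone": [False, False, True, True, False, True],
})
print(data_set.head())

# Step 3: "survived" on the x-axis, "age" on the y-axis
ax = sns.scatterplot(x="survived", y="age", data=data_set)
```

With network access, data_set = sns.load_dataset("titanic") can replace the stand-in frame; the scatterplot() call stays the same.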
Visualize The Scatterplot By Adding Style Parameter
The syntax below produces a scatterplot with different markers for the “alone” values, with the x-axis set to “survived” and the y-axis set to “age”.
# load the titanic dataset and display the scatterplot
data_set = sns.load_dataset("titanic")
sns.scatterplot(x="survived", y="age", style="alone", data=data_set)
Creating a scatterplot with Seaborn’s scatterplot() function is straightforward. Here is what each argument in the call does:
- x="survived" – sets the x-axis to the “survived” column.
- y="age" – sets the y-axis to the “age” column.
- style="alone" – draws a different marker for each value of the “alone” column.
The output of scatterplot() function using seaborn in Python
<AxesSubplot:xlabel='survived', ylabel='age'>
Visualize The Scatterplot By Adding Style And Hue Parameter
This scatterplot uses different markers for the “survived” values and different colors for the “age” values, with the x-axis set to “survived” and the y-axis set to “age”.
- x="survived" – sets the x-axis to the “survived” column.
- y="age" – sets the y-axis to the “age” column.
- hue="age" – colors each data point according to its “age” value.
- style="survived" – draws a different marker for each value of the “survived” column: a dot marker for “not survived” and a cross marker for “survived”.
# display scatterplot
sns.scatterplot(x="survived", y="age", hue="age", style="survived", data=data_set)
The output of scatterplot() function using seaborn in Python
<AxesSubplot:xlabel='survived', ylabel='age'>
Visualize The Scatterplot By Adding Style, Hue, And Palette Parameter
The scatterplot() function from Seaborn is used to create the plot with the following arguments:
- x="survived" – sets the x-axis of the plot to the “survived” column from the dataset.
- y="age" – sets the y-axis of the plot to the “age” column from the dataset.
- hue="who" – colors the data points by each unique value in the “who” column.
- style="survived" – sets the marker shape of each data point by each unique value in the “survived” column.
- palette='Set1' – sets the colour palette used for the plot; here the “Set1” palette is used.
- data=data_set – specifies the dataset used to create the plot.
sns.scatterplot(x="survived", y="age", hue="who", style="survived", palette='Set1', data=data_set)
Overall, the above syntax produces a scatterplot that shows the relationship between survival, age, and who in the dataset. The data points are coloured according to the “who” column, with a separate marker for each value of the “survived” column: one for ‘survived’ and another for ‘not survived’.
<AxesSubplot:xlabel='survived', ylabel='age'>
Visualize The Scatterplot By Adding Style, Hue, Palette And Size Parameter
The scatterplot() function from Seaborn is used to create the plot with the following additional argument:
- size="class" – sets the size of each data point by each unique value in the “class” column.
# display scatterplot
sns.scatterplot(x="survived", y="age", hue="who", style="survived", palette='Set1', data=data_set, size="class")
The scatterplot shows the relationship between survival, age, who, and class in data_set. The colour of each data point represents the “who” column, the marker represents the “survived” column, and the size represents the “class” column.
<AxesSubplot:xlabel='survived', ylabel='age'>
Using lmplot() and setting the scatter parameter to True
The following code uses the Seaborn library to create a scatter plot with a linear regression line from a dataset without using the scatterplot() function.
- import seaborn as sns – first, import the Seaborn library under the shorthand sns.
- sns.lmplot() – the lmplot() function from Seaborn creates a scatter plot with a linear regression line.
- line_kws={'color': 'r'} – specifies that the regression line is red.
- markers=["*"] – specifies the marker for the data points.
- scatter=True – specifies that the plot should include the data points in addition to the regression line.
g = sns.lmplot(x='survived', y='age', data=data_set, line_kws={'color': 'r'}, markers=["*"], scatter=True)
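One possible variation, sketched here on a small made-up frame rather than the real titanic data: when hue is passed to lmplot(), markers takes a list with one marker per hue level, and a separate regression line is fitted for each level. The column values below are invented for illustration, and ci=None is set only to skip the confidence band.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data: two groups with roughly linear age/fare trends
demo = pd.DataFrame({
    "age": [5, 12, 20, 28, 35, 43, 50, 61, 8, 17, 25, 33, 41, 48, 56, 64],
    "fare": [10, 14, 19, 25, 28, 36, 40, 47, 30, 33, 41, 44, 52, 58, 61, 70],
    "sex": ["male"] * 8 + ["female"] * 8,
})

g = sns.lmplot(
    data=demo,
    x="age",
    y="fare",
    hue="sex",           # one colour and one regression line per level
    markers=["o", "s"],  # one marker per hue level, in order
    scatter=True,        # draw the points as well as the lines
    ci=None,             # no confidence band
)
```

Unlike scatterplot(), lmplot() is a figure-level function: it returns a FacetGrid rather than a single Axes, which is why the result is stored in g.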
Display a scatter plot without using the scatterplot() function
The following code uses the Matplotlib library to create a scatter plot from a dataset called data_set.
- import matplotlib.pyplot as plt – imports the pyplot module from the matplotlib library under the shorthand plt, which is used to draw the plot.
- import numpy as np – imports the NumPy library under the shorthand np for mathematical operations.
- x='survived' and y='age' – x and y are variables assigned the column names “survived” and “age”, respectively.
- plt.scatter() – uses the scatter() function from matplotlib to create a scatter plot; because data=data_set is passed, x and y are looked up as columns of data_set.
- plt.xlabel("Survived") and plt.ylabel("Age") – add labels to the x-axis and y-axis of the plot for better readability.
- plt.show() – displays the plot.
import matplotlib.pyplot as plt
import numpy as np
x='survived'
y='age'
plt.scatter(x, y, data=data_set, marker='*')
plt.xlabel("Survived")
plt.ylabel("Age")
plt.show()
Calculate the relation between the dataset variables.
The corr() method calculates the correlation coefficient between variables of the dataset. The coefficient ranges from -1 to 1: its sign gives the direction of the relationship and its magnitude gives the strength, with values close to 1 or -1 indicating a strong correlation and values close to 0 a weak one.
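To make the range of the coefficient concrete, here is a minimal sketch using pandas on made-up numbers. The column names (x, up, down, mixed) are hypothetical and chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
df["up"] = 2 * df["x"] + 1      # rises exactly in step with x
df["down"] = 10 - 3 * df["x"]   # falls exactly in step with x
df["mixed"] = [2, 1, 4, 3, 5]   # rises with x overall, but not perfectly

# Series.corr() computes the Pearson correlation by default
r_up = df["x"].corr(df["up"])
r_down = df["x"].corr(df["down"])
r_mixed = df["x"].corr(df["mixed"])
print(r_up, r_down, r_mixed)
```

Here x and up give a coefficient of 1, x and down give -1, and x and mixed give 0.8 (up to floating-point rounding), so the sign follows the direction of the relationship and the magnitude follows how closely the points track a straight line.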
Negative correlation
- data_set['who_len'] = data_set['who'].apply(len) – creates a new column in data_set called “who_len”, which contains the length of each row’s “who” value. The apply() method is called with the len function as its argument, so len is applied to each item of the “who” column and the length of each string is returned.
- data_set['who_len'].corr(data_set['age']) – calculates the correlation coefficient between the “who_len” column and the “age” column. The corr() method calculates the correlation coefficient between two columns.
#First, generate a new column who_len for length of the who.
#Then, find out the correlation between who_len and age.
data_set['who_len'] =data_set['who'].apply(len)
data_set['who_len'].corr(data_set['age'])
-0.2803275326031991
The output -0.2803275326031991 is the correlation coefficient between the “who_len” column and the “age” column in data_set. The negative value indicates a negative correlation between the two columns, and its relatively small magnitude indicates that the correlation is weak.
Moderate correlation
#First, generate a new column who_len for length of the who.
#Then, find out the correlation between who_len and survived.
data_set['who_len'] =data_set['who'].apply(len)
data_set['who_len'].corr(data_set['survived'])
0.5570800422053258
The output 0.5570800422053258 is the correlation coefficient between the “who_len” column and the “survived” column in data_set. The positive value indicates a positive correlation, and its magnitude indicates a moderate correlation between the two columns.
Positive correlation
#First, generate a new column class_len for the length of the class.
#Then, find out the correlation between class_len and age.
data_set['class_len'] =data_set['class'].apply(len)
data_set['class_len'].corr(data_set['age'])
0.006954025298685509
The output 0.006954025298685509 is the correlation coefficient between the “class_len” column and the “age” column in data_set, calculated using the corr() method. The positive value indicates a positive correlation between the two columns, but the coefficient is very small, close to 0, so the relationship between them is very weak.
Conclusion
To conclude, the most straightforward way to display a scatterplot using Seaborn in Python is the scatterplot() function; lmplot() with scatter=True and Matplotlib’s scatter() function are alternatives. In each case, pass the independent and dependent variables to the function to visualize how they relate within a dataset. Use the method that best fits your problem. We hope this tutorial helps you explore your own data.
Learn more about Python Programming at Entechin.com.