Thursday, October 3, 2024

How to create a bar plot?

In matplotlib, bar plots (bar charts) are used to display categorical data. The data is shown as rectangular bars, where the height or length of each bar corresponds to the value it represents. They are very useful for comparing different categories or showing changes in a variable over time.
The key features of bar plots are:
  • Bar representation - each bar represents a category or a group; its height (for vertical bars) or length (for horizontal bars) is proportional to the value it represents.
  • Orientation - the bars can be plotted vertically or horizontally. Use the plt.bar() function for vertical bars and the plt.barh() function for horizontal bars.
  • Categorical data - as stated previously, bar plots are useful for visualizing categorical data, for example types of fruit, cities, etc.
  • Grouped and stacked bars - several data series can be shown in one bar plot, either as a grouped bar plot or as a stacked bar plot. A grouped bar plot shows multiple bars side by side for each category, which is useful for comparing different groups. A stacked bar plot stacks multiple data series on top of each other for each category.
  • Customization - as in other matplotlib plots, bar plots can be fully customized. The bar color, width, edge color, and transparency can all be set. When the data has variability, error bars can be added to represent it. More advanced features include annotations and subplots. Annotations are labels, text, or arrows that emphasize parts of the plot. Subplots enable the creation of multiple bar plots in a single figure using the plt.subplot() function.
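The customization options listed above can be sketched in a few lines. The following is a minimal illustration, not a definitive recipe: the error values are made up for the example, and the colors are arbitrary choices. It uses the yerr and capsize arguments of bar() for the error bars and the bar_label() method (available in Matplotlib 3.4 and newer) for the value annotations.

```python
import matplotlib.pyplot as plt

# Hypothetical data: one value per category plus an assumed uncertainty.
categories = ['A', 'B', 'C', 'D']
means = [5, 7, 3, 9]
errors = [0.5, 1.0, 0.4, 0.8]

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(
    categories, means,
    yerr=errors,        # error bars representing the variability
    capsize=4,          # small caps at the ends of the error bars
    color='skyblue', edgecolor='black', alpha=0.8,
)
# Annotate each bar with its value; returns one text label per bar.
labels = ax.bar_label(bars, padding=3)
ax.set_title("Customized Bar Plot")
# plt.show()  # uncomment to display the figure
```

The call to ax.bar() returns a container holding one rectangle per bar, which is what bar_label() uses to place the text.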

Vertical and horizontal bar plots

The matplotlib library enables the creation of vertical and horizontal bar plots. Vertical bar plots are created using the bar() function of the pyplot module. The full form of the bar() function is given below.
matplotlib.pyplot.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
The matplotlib.pyplot.bar() function consists of the following parameters:
  • x (required) - the x-coordinates of the bars, given as a list, tuple, or NumPy array. The elements represent the positions along the x-axis where the bars are placed. For example, if x = [1,2,3,4,5] the bars will be centered at the x-coordinates 1, 2, 3, 4, and 5. If x = [2,4,6,8,10] the bars will be centered at the x-coordinates 2, 4, 6, 8, and 10.
  • height (required) - the heights of the bars. Like x, it can be a list, tuple, or NumPy array holding the value (y-value) for each bar. For example, height = [10,20,30] means that the heights of the bars will be 10, 20, and 30 units, respectively.
  • width (optional) - the width of the bars. The default value is 0.8. A single float value sets the same width for all bars, while an array-like input allows a different width for each bar. If width is equal to 0.5, all bars will have a width of 0.5 units.
  • bottom (optional) - the y-coordinate of the bottom edges of the bars. With the default value None the bars start at y = 0. This parameter is useful if you need to stack the bars on top of each other. For example, if bottom = 5, all bars will start at y = 5.
  • align (optional) - this option determines how the bars will be aligned relative to the x-coordinates. There are two options available:
    • center (default value) - bars are centered on the x-coordinates.
    • edge - the left edges of the bars are aligned with the x-coordinates. To align the right edges instead, pass a negative width together with align='edge'.
  • data (optional) - a dictionary or a pandas DataFrame whose keys or column names can be passed as strings to the x, height, width, and bottom parameters. If data = {'x': [1, 2, 3], 'height': [10, 15, 7]}, you can pass the strings 'x' and 'height' instead of the lists themselves.
  • **kwargs (optional) - Additional keyword arguments that can be used to customize the appearance of the bars. Common options include:
    • color - the colors of the bars
    • edgecolor - the color of the edges of the bars.
    • linewidth - The width of the edges of the bars
    • alpha - transparency level of the bars (0 to 1 where 0 is fully transparent and 1 is fully opaque)
    • label - the label of the bars, useful when creating legends.
    • hatch - the pattern drawn inside the bars, for example '-', '+', '*', or 'o'.
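To make the parameters above more concrete, here is a small sketch that combines several of them in a single call. The dictionary reuses the keys from the data example in the list; the styling values are arbitrary and only serve as an illustration.

```python
import matplotlib.pyplot as plt

# Dictionary from the `data` description above; keys are looked up by name.
data = {'x': [1, 2, 3], 'height': [10, 15, 7]}

fig, ax = plt.subplots()
bars = ax.bar(
    'x', 'height',      # strings are resolved against `data`
    width=0.5,          # every bar is 0.5 units wide
    bottom=5,           # every bar starts at y = 5
    align='edge',       # left edge of each bar sits on its x position
    data=data,
    color='tab:orange', edgecolor='black', linewidth=1.5, hatch='/',
)

# Each bar is a Rectangle patch; inspect where the first one was drawn.
first = bars.patches[0]
print(first.get_x(), first.get_y(), first.get_width(), first.get_height())
# plt.show()  # uncomment to display the figure
```

Inspecting the returned rectangles is a quick way to check how width, bottom, and align were interpreted without looking at the rendered figure.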

Example 1 - Vertical bar plot

In this example we will create a bar plot using the matplotlib.pyplot.bar() function. The dataset consists of categories (A, B, C, D) with values 5, 7, 3, and 9. The example consists of the following steps:
  • importing libraries,
  • defining categories and values,
  • defining and showing the bar plot.

Import libraries

To create the bar plot we have to import the required libraries. In this case we only need the matplotlib.pyplot module.
import matplotlib.pyplot as plt 
The matplotlib.pyplot module contains the bar() function which is required to create the bar plot.

Defining the categories and values

After the libraries are imported, the dataset must be created. It consists of the categories and their values.
categories = ['A', 'B', 'C', 'D']
values = [5,7,3,9]
The categories variable contains the categories 'A', 'B', 'C', and 'D'. These categories will be placed on the x-axis. The values variable contains the value for each category, so 'A' has a value of 5, 'B' has a value of 7, 'C' has a value of 3, and 'D' has a value of 9.

Defining and showing the bar plot

After the dataset is defined it is time to define and show the bar plot. The first step is to create the pyplot figure and set its size. The figure is created using the plt.figure() function and the figure size is passed as an argument of that function. The figsize argument is equal to the tuple (12,8), which represents a figure of 12 by 8 inches.
plt.figure(figsize=(12,8))
The second step is to define the bar plot using the plt.bar() function. The arguments of the bar function are the x and y values, i.e. the categories and values variables.
plt.bar(categories, values)
The third step is to define the title and the x-axis and y-axis names. The title is "Vertical Bar Plot Example", the x-axis name is "Category", and the y-axis name is "Value". To show the plot we will use the plt.show() function.
plt.title("Vertical Bar Plot Example") 
plt.xlabel("Category")
plt.ylabel("Value")
plt.show()
The entire code used in this example is shown below.
import matplotlib.pyplot as plt
categories = ['A', 'B', 'C', 'D']
values = [5, 7, 3, 9]

plt.figure(figsize=(12,8))
plt.bar(categories, values)
plt.title("Vertical Bar Plot Example")
plt.xlabel("Category")
plt.ylabel("Value")
plt.show()
After executing the previous code the result is shown in Figure 1.
Figure 1 - Vertical Bar plot Example

Horizontal bar plots

To create a horizontal bar plot, use the pyplot barh() function. The full form of the function is shown below.
matplotlib.pyplot.barh(y, width, height=0.8, left=None, *, align='center', data=None, **kwargs)
The parameters of the barh function are described below:
  • y (required) - can be a scalar or array-like. The y parameter represents the positions of the bars along the y-axis. If you pass an array, it should specify the position of each bar. For example, [0,1,2,3] would place bars at positions 0, 1, 2, and 3 on the y-axis.
  • width (required) - can be a scalar or array-like. Specifies the lengths of the bars, i.e. how far they extend along the x-axis. A scalar gives all bars the same length, while an array assigns a different length to each bar.
  • height (optional) - can be a scalar or array-like; the default value is 0.8. This parameter controls the height (thickness) of the bars along the y-axis. With the default value of 0.8, each bar takes up 80% of the vertical space between positions that are one unit apart. You can make the bars thicker or thinner by changing this value.
  • left (optional) - can be a scalar or array-like. The default value of this parameter is None. It sets the x-coordinate of the left edges of the bars. By default, bars start from x = 0. If an array is passed, each bar starts at its own position along the x-axis.
  • align (optional) - can be 'center' or 'edge'. The default value of this parameter is 'center'. This parameter determines how the bars are aligned relative to the positions specified by y: 'center' (bars are centered on the y positions) or 'edge' (the bottom edges of the bars are aligned with the y positions).
  • data (optional) - indexable object. This parameter allows for plotting directly from data structures like DataFrames or dictionaries. You can refer to column names or dictionary keys directly when using data.
  • **kwargs - Any additional arguments passed here will be forwarded to matplotlib.patches.Rectangle, allowing for further customization. Common parameters include: color, edgecolor, linewidth, label, and alpha.
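As an illustration of the left and height parameters described above, the sketch below draws bars that start at different x positions. The task names and numbers are invented for the example; the only assumption about the library is the barh() signature shown earlier.

```python
import matplotlib.pyplot as plt

# Hypothetical schedule: each bar starts at `starts` and is `durations` long.
tasks = ['Design', 'Build', 'Test']
starts = [0, 3, 8]
durations = [3, 5, 2]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(
    tasks, durations,
    height=0.5,         # thinner than the default of 0.8
    left=starts,        # x-coordinate where each bar begins
    color='seagreen', edgecolor='black', alpha=0.7,
)
ax.set_xlabel("Day")
# plt.show()  # uncomment to display the figure
```

Because left is an array, every bar gets its own starting point, while the scalar height applies to all bars.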

Example 2 - Creating horizontal bar plot

In this example we will create a horizontal bar plot with the following data: categories = ['A', 'B', 'C', 'D'] and values = [3, 7, 5, 9]. The example consists of the following steps:
  • Importing required libraries,
  • Defining data,
  • Defining and showing the plot.

Importing libraries

For this example we will require matplotlib.pyplot module.
import matplotlib.pyplot as plt 

Defining data

A horizontal bar plot requires two variables: categories and values. The categories are A, B, C, and D, with the values 3, 7, 5, and 9, respectively.
categories = ['A', 'B', 'C', 'D']
values = [3,7,5,9]

Defining and showing the horizontal bar plot

To define the plot we first create the figure using plt.figure() and set its size to 12 by 8 inches with the parameter figsize=(12,8). Then we define the horizontal bar plot using the plt.barh() function. The arguments of the function are categories and values. Next we set the title "Horizontal Bar Plot Example" using the plt.title() function, and the x-axis name "Value" and y-axis name "Category" using the plt.xlabel() and plt.ylabel() functions. Finally, the horizontal bar plot is shown using the plt.show() function.
plt.figure(figsize=(12,8))
plt.barh(categories, values)
plt.title("Horizontal Bar Plot Example")
plt.xlabel("Value")
plt.ylabel("Category")
plt.show()
The entire code created in this example is shown below.
import matplotlib.pyplot as plt 
categories = ['A', 'B', 'C', 'D']
values = [3,7,5,9]
plt.figure(figsize = (12,8))
plt.barh(categories, values)
plt.title("Horizontal Bar Plot Example")
plt.xlabel("Value")
plt.ylabel("Category")
plt.show()
The plot generated using this code is shown below.
Figure 2 - Horizontal Bar Plot
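To compare the two orientations directly, the same data can be drawn with bar() and barh() in one figure. The sketch below uses plt.subplots() and the Axes methods instead of the pyplot functions used in the examples; this is a stylistic choice made here for convenience, not something the examples above require.

```python
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
values = [3, 7, 5, 9]

# One row, two columns: vertical bars on the left, horizontal on the right.
fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(12, 5))

ax_v.bar(categories, values)
ax_v.set_title("Vertical")
ax_v.set_xlabel("Category")
ax_v.set_ylabel("Value")

ax_h.barh(categories, values)
ax_h.set_title("Horizontal")
ax_h.set_xlabel("Value")      # the axis labels swap roles
ax_h.set_ylabel("Category")

fig.tight_layout()
# plt.show()  # uncomment to display the figure
```

Note that the value ends up as the bar height in the vertical plot and as the bar width in the horizontal plot.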

Grouped and stacked bar plots

A grouped bar plot is used to visually compare multiple datasets across various categories by placing bars for each dataset side by side within each category. This allows for clear comparison of values between datasets for each category, making it particularly useful when analyzing differences or trends across different groups. For instance, in a business context, you might use a grouped bar plot to compare sales figures for different years across various product categories. Each category (such as a product or a region) would have multiple bars, each representing a different year’s sales. The bars are displayed next to each other for easy comparison within each category, allowing you to observe how the values differ or align between datasets.
On the other hand, a stacked bar plot is designed to display the composition of data within each category by stacking the values for multiple datasets on top of each other. Instead of placing the bars side by side, the values for each dataset are stacked vertically (or horizontally in the case of horizontal bar plots), allowing the total height (or length) of the stack to represent the cumulative value for each category. This type of plot is useful when you want to understand the total combined value of multiple datasets for each category, while still being able to see the contribution of each dataset. Stacked bar plots are often used when you want to emphasize the overall totals while still showing how individual parts contribute, such as analyzing the composition of sales from different regions or products within the overall company revenue for each year.
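The horizontal variant of stacking mentioned above can be sketched with barh(), where the left parameter takes over the role that bottom plays for vertical bars. The regions and revenue figures below are hypothetical and only illustrate the idea.

```python
import matplotlib.pyplot as plt

# Hypothetical yearly revenue (arbitrary units) split by region.
years = ['2021', '2022', '2023']
north = [4, 6, 5]
south = [3, 2, 6]

fig, ax = plt.subplots(figsize=(8, 4))
lower = ax.barh(years, north, label='North')
# Start each 'South' bar where the matching 'North' bar ends.
upper = ax.barh(years, south, left=north, label='South')
ax.set_xlabel("Revenue")
ax.legend()

# The right end of each stack is the combined value for that year.
totals = [p.get_x() + p.get_width() for p in upper.patches]
# plt.show()  # uncomment to display the figure
```

The total length of each stack equals the sum of both regions, while the colors show how much each region contributes.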

Example 3 - Grouped bar plot

This example demonstrates how to create a grouped bar plot using Matplotlib in Python, which is useful for comparing multiple datasets side by side across different categories. This example consists of the following steps:
  • Importing libraries
  • Defining values for each dataset
  • Setting Bar width and defining index position
  • Creating the Grouped Bar plot
  • Adding Labels and Title
  • Customizing X-axis Tick Labels
  • Adding a Legend
  • Displaying the Plot

Importing libraries

In this example we will need the numpy library for generating arrays and the matplotlib.pyplot module for creating the grouped bar plot.
import numpy as np
import matplotlib.pyplot as plt
The statement import numpy as np imports the NumPy library, which is used for numerical operations in Python. In this example it is used to create an array of evenly spaced values (indices) for the categories. The pyplot module of the matplotlib library is imported to use the functions for creating the figure (plt.figure()), the bar plot (plt.bar()), and the other plot elements.

Defining values for each dataset

In this example there are two datasets and both datasets have the same categories. The categories and values for both datasets are shown in the following code block.
categories = ['A','B','C','D']
values1 = [5,7,3,9]
values2 = [2,8,5,7]
As stated, both datasets have four categories labeled 'A', 'B', 'C', and 'D'. The values1 and values2 are lists containing the values to be plotted for two different datasets. These values will be represented as bars in the plot.

Setting Bar Width and Defining Index Positions

Each bar in the bar plot will have a width of 0.35. We also have to create an array containing the positions of the bars (one for each category). Both definitions are shown in the following code block.
bar_width = 0.35
index = np.arange(len(categories))
The bar_width variable defines the width of each bar; here the bars will be 0.35 units wide. The np.arange(len(categories)) call generates an array of positions, one for each category. Since the categories variable contains 4 elements (A, B, C, and D), the index variable will be [0,1,2,3].

Creating the Grouped Bar Plot

Before creating the bar plots we will define the figure and the figure size using plt.figure().
plt.figure(figsize=(12,8))
The plt.figure() function creates an empty figure, and the parameter figsize=(12,8) sets the size of the figure to 12 by 8 inches. After the empty figure is created we can define the grouped bar plot using the plt.bar() function. Since the grouped bar plot contains two datasets, we have to call plt.bar() twice: the first time for the first dataset (values1) and the second time for the second dataset (values2). The following code shows the definition of the grouped bar plot.
plt.bar(index, values1, bar_width, label='Dataset 1')
plt.bar(index + bar_width, values2, bar_width, label = 'Dataset 2')
In the previous code block plt.bar() is used to create the bar plot. The first call plots the bars for values1 at the positions specified by index ([0,1,2,3]). The second call shifts the bars for values2 to the right by bar_width so they do not overlap and appear side by side. The index array used in the first plt.bar() call is [0, 1, 2, 3], while the index + bar_width array used in the second call is equal to [0.35, 1.35, 2.35, 3.35]. Since two datasets are shown in the grouped bar plot, each plt.bar() call is given a label so that the legend can tell them apart: label='Dataset 1' in the first call and label='Dataset 2' in the second. Labels are generally necessary to distinguish between two or more datasets.

Adding Labels and Title

The x-axis name will be "Category", the y-axis name will be "Values", and the title will be "Grouped Bar Plot Example". The x-axis label is created using plt.xlabel(), the y-axis label is created using plt.ylabel(), and the title is created using the plt.title() function.
plt.xlabel("Category")
plt.ylabel("Values")
plt.title("Grouped Bar Plot Example")
To summarize, in the previous code block plt.xlabel() and plt.ylabel() set the labels of the x-axis and y-axis to "Category" and "Values", respectively. The plt.title() function places the title "Grouped Bar Plot Example" above the grouped bar plot.

Customizing X-axis Tick Labels

In a grouped bar plot the tick labels (category names) usually have to be repositioned so that they sit under the center of each group of bars.
plt.xticks(index + bar_width / 2, categories)
The plt.xticks() function adjusts the positions of the tick labels (category names). The tick labels are placed in the center of each group of bars (halfway between index and index + bar_width). The index values are 0, 1, 2, and 3, and bar_width/2 is 0.175 (0.35/2), so the new ticks will be placed at 0.175, 1.175, 2.175, and 3.175, respectively.
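The same offset arithmetic extends to more than two datasets. The sketch below is one possible generalization, not part of the original example: the third dataset is hypothetical, and the bar width is derived from the number of datasets so that each group stays 0.8 units wide. For n datasets the tick for each group is placed at index + bar_width * (n - 1) / 2, which reduces to index + bar_width / 2 when n = 2.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
# 'Dataset 3' is a hypothetical addition to the two datasets of this example.
datasets = {
    'Dataset 1': [5, 7, 3, 9],
    'Dataset 2': [2, 8, 5, 7],
    'Dataset 3': [4, 1, 6, 3],
}

n = len(datasets)
bar_width = 0.8 / n                     # each group is 0.8 units wide in total
index = np.arange(len(categories))

fig, ax = plt.subplots(figsize=(12, 8))
for i, (label, values) in enumerate(datasets.items()):
    # Shift the i-th dataset by i bar widths so the bars sit side by side.
    ax.bar(index + i * bar_width, values, bar_width, label=label)

# Bar centers in a group run from index to index + (n - 1) * bar_width,
# so the middle of the group is halfway between those two values.
tick_positions = index + bar_width * (n - 1) / 2
ax.set_xticks(tick_positions)
ax.set_xticklabels(categories)
ax.legend()
# plt.show()  # uncomment to display the figure
```

Keeping the datasets in a dictionary means that adding another series only requires one more entry; the offsets and tick positions follow automatically.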

Adding a legend

We need to find a way of displaying labels ("Dataset 1" and "Dataset 2") defined in the plt.bar() functions. This will be done using the plt.legend() function.
plt.legend()
The plt.legend() function adds a legend to the plot, using the label parameters specified earlier, i.e. 'Dataset 1' and 'Dataset 2', to differentiate between the two datasets.

Displaying the plot

After the libraries are imported, the dataset is created, and the grouped bar plot is defined, the plot can finally be shown using the plt.show() function.
plt.show()
The entire code created in this example is shown below.
import numpy as np
import matplotlib.pyplot as plt
categories = ['A', 'B', 'C', 'D']
values1 = [5, 7, 3, 9]
values2 = [2, 8, 5, 7]
bar_width = 0.35
index = np.arange(len(categories))

plt.figure(figsize=(12,8))
plt.bar(index, values1, bar_width, label='Dataset 1')
plt.bar(index + bar_width, values2, bar_width, label='Dataset 2')
plt.xlabel('Category')
plt.ylabel('Values')
plt.title('Grouped Bar Plot Example')
plt.xticks(index + bar_width / 2, categories)
plt.legend()
plt.show()
When the previous code is executed the obtained plot is shown in Figure 3.
Figure 3 - Grouped Bar Plot

Example 4 - Stacked Bar Plot

This example demonstrates how to create a stacked bar plot using Matplotlib in Python. The plot will show how two different datasets (represented as bars) stack on top of each other across multiple categories, allowing you to visualize both individual contributions and the cumulative totals for each category. The entire example consists of the following steps:
  • Importing necessary libraries,
  • Defining the categories,
  • Defining the values for each dataset,
  • Creating the stacked bar plot,
  • Adding the labels and title,
  • Adding a legend,
  • Displaying the plot.

Importing necessary libraries

Only the matplotlib.pyplot module is required in this example. NumPy is not needed here because the bars are positioned directly by their category names.
import matplotlib.pyplot as plt 
The pyplot module of matplotlib (plt) is imported to generate the bar plot and customize it with labels, a title, and a legend.

Defining the Categories

The first step after importing the required libraries is to define the x-axis values. In this case these are the categories 'A', 'B', 'C', and 'D'.
categories = ['A','B','C','D']
The categories is a list of four labels ('A', 'B', 'C', 'D') that represent the categories along the x-axis. These could represent different groups, products, or any other categorical data.

Defining the Values of Each Dataset

In this example we have two datasets, and the y-values for each dataset are defined here in the form of lists.
values1 = [5, 7, 3, 9]
values2 = [2, 8, 5, 7]
The values1 and values2 are lists containing the numerical data for two different datasets. Each list has four values, corresponding to the four categories ('A', 'B', 'C', 'D'). The values1 will be plotted first, representing the base layer (bottom part) of each stacked bar. The values2 will be plotted on top of values1, completing the stacked bar for each category.

Creating the Stacked Bar Plot

After the dataset is defined, the next step is to create an empty figure and define the bar plots for both datasets. The empty figure in matplotlib is created with the plt.figure() function.
plt.figure(figsize=(12,8))
The size of the figure is defined through the figsize parameter, which is equal to the tuple (12,8). This tuple represents a figure size of 12 by 8 inches.
Plotting the first dataset (values1):
plt.bar(categories, values1, label='Dataset 1')
This creates the first set of bars using the plt.bar() function. It uses the categories ('A', 'B', 'C', 'D') for the x-axis, and the values from values1 for the height of each bar. The label='Dataset 1' parameter assigns a label to these bars for use in the legend later. The code for plotting the second dataset (values2) is shown in the following code block.
plt.bar(categories, values2, bottom=values1, label='Dataset 2')
This creates the second set of bars using plt.bar() again. However, this time the bottom=values1 parameter is used, which means that the values from values2 will be stacked on top of the corresponding values from values1. The bars for Dataset 2 will start at the top of Dataset 1's bars, effectively stacking them. The label='Dataset 2' assigns a label to these stacked bars for the legend.
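With more than two datasets, the bottom of each new layer has to be the sum of all layers below it, not just the previous one. A minimal sketch of that idea is shown here; the third dataset is hypothetical, and keeping a running total in a NumPy array is one convenient way to do it rather than the only one.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
# 'Dataset 3' is a hypothetical addition to the two datasets of this example.
datasets = {
    'Dataset 1': [5, 7, 3, 9],
    'Dataset 2': [2, 8, 5, 7],
    'Dataset 3': [1, 2, 4, 3],
}

fig, ax = plt.subplots(figsize=(12, 8))
bottom = np.zeros(len(categories))      # running height of each stack
for label, values in datasets.items():
    ax.bar(categories, values, bottom=bottom, label=label)
    bottom += np.array(values)          # the next layer starts where this one ends
ax.legend()
# plt.show()  # uncomment to display the figure
```

After the loop, the bottom array holds the total height of each stack, which can be handy for setting the y-axis limit or labeling the totals.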

Adding Labels and Title

The x-axis label is "Category", the y-axis label is "Values", and the title is "Stacked Bar Plot Example". The x-axis and y-axis labels will be created using the plt.xlabel() and plt.ylabel() functions while the bar plot title will be created using the plt.title() function.
plt.xlabel('Category')
plt.ylabel('Values')
plt.title('Stacked Bar Plot Example')
To summarize the plt.xlabel('Category') will add the label to the x-axis, indicating that the horizontal axis represents different categories. The plt.ylabel('Values') will add a label to the y-axis, showing that the vertical axis represents the numerical values of the datasets. The plt.title("Stacked Bar Plot Example") adds a title to the plot.

Adding a Legend

The purpose of adding the legend is to show the labels defined in the plt.bar() functions and to distinguish between plotted datasets.
plt.legend()
This command adds the legend to the plot. The labels specified earlier ('Dataset 1' and 'Dataset 2') will appear in the legend, making it clear which bars represent which dataset.

Displaying the Plot

After the required libraries are imported, the dataset is defined, and the stacked bar plot is defined, the final step is to show the plot.
plt.show()
This final line of code will display the plot. The entire code created in this example is shown below.
import matplotlib.pyplot as plt
categories = ['A','B','C','D']
values1 = [5, 7, 3, 9]
values2 = [2, 8, 5, 7]

plt.figure(figsize=(12,8))
plt.bar(categories, values1, label='Dataset 1')
plt.bar(categories, values2, bottom=values1, label='Dataset 2')
plt.xlabel('Category')
plt.ylabel('Values')
plt.title('Stacked Bar Plot Example')
plt.legend()
plt.show()
The result is shown in Figure 4.
Figure 4 - Stacked Bar Plot
