Convert CSV Into Dictionary in Python

A CSV (Comma-Separated Values) file is a plain text file that stores tabular data in a simple, structured format. Each line in a CSV file represents a row, and the values within each row are separated by commas or by other delimiters, such as semicolons or tabs. CSV files are commonly used to store datasets, but the raw text is not always in a convenient form for processing. Python helps here: the built-in csv module and the pandas library both make it easy to read and parse CSV files for data analysis and manipulation. This tutorial guides you through converting a CSV file into a dictionary so that you can manipulate and analyze your data efficiently.

To convert a CSV file into a dictionary in Python, you have several options: you can use the csv.DictReader class from the standard library, or read the file with pandas and call the DataFrame's to_dict() method. Both create dictionaries from the CSV data, with the column headers as keys. You can also iterate through the file manually with csv.reader() and build each dictionary with a dictionary comprehension.


To demonstrate these techniques, we use the Iris dataset, a classic benchmark for evaluating machine learning algorithms. The dataset consists of measurements of four features (sepal length, sepal width, petal length, and petal width) from three species of Iris flowers (Setosa, Versicolor, and Virginica). It contains 150 samples, 50 for each species.

There are several ways to convert the data in a CSV file into a list of dictionaries. In this article, we demonstrate the following methods:

  1. Using the to_dict() approach
  2. Using the DictReader() approach
  3. Using the dictionary comprehension approach

Convert CSV Into a Dictionary in Python using the to_dict() approach

The to_dict() method, provided by the pandas library, converts a DataFrame into a dictionary. Its orient parameter determines the structure of the result, and each value produces a different layout. The table below summarizes the values and the structure each one returns.

| Parameter value | Description |
|---|---|
| dict (default) | A dictionary whose keys are column names and whose values are dictionaries mapping each row index to that column's value: {column: {index: value}}. |
| list | A dictionary whose keys are column names and whose values are lists of that column's values: {column: [values]}. |
| series | A dictionary whose keys are column names and whose values are pandas Series objects holding that column's values: {column: Series(values)}. |
| split | A dictionary with three keys, 'index', 'columns', and 'data', holding the row labels, the column names, and the row values as a list of lists. |
| records | A list of dictionaries, one per row, with column names as keys: [{column: value}, ...]. |
| index | A dictionary whose keys are row indices and whose values are dictionaries mapping column names to values: {index: {column: value}}. |
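To make the table concrete, here is a minimal sketch that calls to_dict() with each orient value on a tiny two-row DataFrame. The DataFrame is illustrative (two values borrowed from the Iris data), and the comments describe the shape of each result rather than exact printed output.

```python
import pandas as pd

# A tiny illustrative DataFrame using two rows of Iris values
df = pd.DataFrame({
    "sepallength": [5.1, 7.0],
    "class": ["Iris-setosa", "Iris-versicolor"],
})

as_dict = df.to_dict()                     # {column: {index: value}}
as_list = df.to_dict(orient="list")        # {column: [values]}
as_series = df.to_dict(orient="series")    # {column: Series(values)}
as_split = df.to_dict(orient="split")      # {'index': [...], 'columns': [...], 'data': [[...], ...]}
as_records = df.to_dict(orient="records")  # [{column: value}, ...]
as_index = df.to_dict(orient="index")      # {index: {column: value}}

# Print the plain-Python results (Series objects are left out for readability)
for name, value in [("dict", as_dict), ("list", as_list), ("split", as_split),
                    ("records", as_records), ("index", as_index)]:
    print(f"{name}: {value}")
```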

The choice of orient depends on how you want the resulting dictionary to be structured and how you plan to work with the data. To get the CSV data into a DataFrame in the first place, use the pandas read_csv() function: it takes the path to the CSV file as an argument and returns a DataFrame object.

import pandas as pd

# Read the CSV file into a DataFrame
df = pd.read_csv("/content/drive/MyDrive/iris_csv.csv")

# Select two samples from each class
data_subset = df.groupby('class').head(2)

# Convert the subset DataFrame to a dictionary
result = data_subset.to_dict(orient='records')

# Print the list of dictionaries
print(result)
[{'sepallength': 5.1, 'sepalwidth': 3.5, 'petallength': 1.4, 'petalwidth': 0.2, 'class': 'Iris-setosa'}, {'sepallength': 4.9, 'sepalwidth': 3.0, 'petallength': 1.4, 'petalwidth': 0.2, 'class': 'Iris-setosa'}, {'sepallength': 7.0, 'sepalwidth': 3.2, 'petallength': 4.7, 'petalwidth': 1.4, 'class': 'Iris-versicolor'}, {'sepallength': 6.4, 'sepalwidth': 3.2, 'petallength': 4.5, 'petalwidth': 1.5, 'class': 'Iris-versicolor'}, {'sepallength': 6.3, 'sepalwidth': 3.3, 'petallength': 6.0, 'petalwidth': 2.5, 'class': 'Iris-virginica'}, {'sepallength': 5.8, 'sepalwidth': 2.7, 'petallength': 5.1, 'petalwidth': 1.9, 'class': 'Iris-virginica'}]

In the above example, we use groupby('class').head(2) to keep the first two samples of each class, and then call to_dict() with orient='records' to convert that subset into a list of dictionaries, one per row. Because read_csv() infers column types, the measurements come back as floats rather than strings. The to_dict() method is especially convenient when your tabular data is already in a DataFrame.
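If you don't have the file at the Google Drive path used above, the following sketch reproduces the example with the six rows from the output embedded as text. It assumes wrapping that text in io.StringIO, which read_csv() accepts in place of a path, and it uses head(1) rather than head(2) to show that the number of rows per group is adjustable. It also contrasts orient='records' with orient='list' on the same subset.

```python
import io

import pandas as pd

# Six rows copied from the output above (two per class), embedded as text
# so the example runs without the original CSV file.
csv_text = """sepallength,sepalwidth,petallength,petalwidth,class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

# read_csv accepts any file-like object, not just a path
df = pd.read_csv(io.StringIO(csv_text))

# head(1) keeps the first row of each class, in original order
first_per_class = df.groupby("class").head(1)

records = first_per_class.to_dict(orient="records")  # one dict per row
columns = first_per_class.to_dict(orient="list")     # one list per column

print(records)
print(columns)
```

The records form suits row-by-row processing, while the list form is handy when you want all values of one feature at once.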

Convert CSV Into a Dictionary in Python using the DictReader() approach

csv.DictReader is another way to convert a CSV file into dictionaries in Python. It transforms each row of the CSV file into a dictionary, with the column headers as keys and the row values as values.

Here’s an example of how to use csv.DictReader:

import csv

with open("/content/drive/MyDrive/iris_csv.csv", "r", newline="") as file:
    # Create a DictReader object; the first row supplies the keys
    csv2dict = csv.DictReader(file)

    # Read every row into a list of dictionaries
    dictionary = list(csv2dict)

# Print the first five rows from the list
print("Output: ",dictionary[:5])
Output:  [{'sepallength': '5.1', 'sepalwidth': '3.5', 'petallength': '1.4', 'petalwidth': '0.2', 'class': 'Iris-setosa'}, {'sepallength': '4.9', 'sepalwidth': '3.0', 'petallength': '1.4', 'petalwidth': '0.2', 'class': 'Iris-setosa'}, {'sepallength': '4.7', 'sepalwidth': '3.2', 'petallength': '1.3', 'petalwidth': '0.2', 'class': 'Iris-setosa'}, {'sepallength': '4.6', 'sepalwidth': '3.1', 'petallength': '1.5', 'petalwidth': '0.2', 'class': 'Iris-setosa'}, {'sepallength': '5.0', 'sepalwidth': '3.6', 'petallength': '1.4', 'petalwidth': '0.2', 'class': 'Iris-setosa'}]

In this example, we first open the CSV file with the open() function and bind it to the file variable. Then we create a DictReader object, csv2dict, by passing the file to csv.DictReader(). Finally, we call list() on csv2dict to read every row into a list of dictionaries. Note that, unlike read_csv(), DictReader does no type conversion: every value, including the measurements, is returned as a string.
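Because DictReader returns strings, you will usually want to convert the numeric columns yourself before doing arithmetic. The following is a minimal sketch, assuming the four measurement columns are the only numeric ones; NUMERIC_FIELDS is a hypothetical name introduced for this example, and io.StringIO stands in for the real file.

```python
import csv
import io

csv_text = """sepallength,sepalwidth,petallength,petalwidth,class
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
"""

# Hypothetical set of columns assumed to hold numbers; others stay strings
NUMERIC_FIELDS = {"sepallength", "sepalwidth", "petallength", "petalwidth"}

rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Convert the measurement columns from str to float, keep the rest as-is
    converted = {key: (float(value) if key in NUMERIC_FIELDS else value)
                 for key, value in row.items()}
    rows.append(converted)

print(rows[0])
# {'sepallength': 5.1, 'sepalwidth': 3.5, 'petallength': 1.4, 'petalwidth': 0.2, 'class': 'Iris-setosa'}
```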

Note that csv.DictReader assumes that the first row of the CSV file contains the column headers. If your CSV file doesn’t have a header row, you can pass the fieldnames parameter to csv.DictReader() to specify the column headers manually.
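As a sketch of the fieldnames parameter, the example below reads a header-less version of two Iris rows. The column names passed in are an assumption for illustration (they mirror the header of the file used above), and io.StringIO again stands in for an open file.

```python
import csv
import io

# A header-less CSV: the first line is already data
csv_text = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""

# Column names supplied by hand (assumed to match the data's layout)
fieldnames = ["sepallength", "sepalwidth", "petallength", "petalwidth", "class"]

# With fieldnames given, DictReader treats every line as a data row
reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fieldnames)
rows = list(reader)

print(len(rows))               # 2
print(rows[1]["sepalwidth"])   # 3.0 (still a string)
```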

Convert CSV Into a Dictionary in Python using the dictionary comprehension approach

You can also convert a CSV file into dictionaries by combining the csv module's reader() function with a dictionary comprehension. csv.reader() returns each row as a list of strings, and the dictionary comprehension pairs each value with the corresponding header to produce one dictionary per row.

To illustrate, consider the following example. We open the CSV file with the open() function and create a reader object by passing the file to csv.reader(). Next, we read the header row with next() and store it in the header variable; its values serve as the keys of each resulting dictionary.

import csv

with open("/content/drive/MyDrive/iris_csv.csv", "r", newline="") as file:
    # Create a reader object
    reader = csv.reader(file)

    # Extract the header row
    header = next(reader)

    # Initialize an empty list to store the dictionaries
    dictionary_list = []

    # Convert each row into a dictionary and append to the list
    for row in reader:
        dictionary = {header[i]: value for i, value in enumerate(row)}
        dictionary_list.append(dictionary)

# Print the list of dictionaries
print(dictionary_list[:5])
[{'sepallength': '5.1', 'sepalwidth': '3.5', 'petallength': '1.4', 'petalwidth': '0.2', 'class': 'Iris-setosa'}, {'sepallength': '4.9', 'sepalwidth': '3.0', 'petallength': '1.4', 'petalwidth': '0.2', 'class': 'Iris-setosa'}, {'sepallength': '4.7', 'sepalwidth': '3.2', 'petallength': '1.3', 'petalwidth': '0.2', 'class': 'Iris-setosa'}, {'sepallength': '4.6', 'sepalwidth': '3.1', 'petallength': '1.5', 'petalwidth': '0.2', 'class': 'Iris-setosa'}, {'sepallength': '5.0', 'sepalwidth': '3.6', 'petallength': '1.4', 'petalwidth': '0.2', 'class': 'Iris-setosa'}]

The code above builds one dictionary per row, with the column (feature) names as keys and that row's values as values, and appends each dictionary to a list. The result is a list of dictionaries in which each dictionary represents one row of the CSV file. As with DictReader, all values are strings.
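The loop above can be condensed into a single list comprehension. This sketch uses zip() to pair each header with its value instead of indexing with enumerate(), and checks that the result matches csv.DictReader on the same text; the embedded sample rows and io.StringIO are stand-ins for the real file.

```python
import csv
import io

csv_text = """sepallength,sepalwidth,petallength,petalwidth,class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # the first row holds the keys

# zip() pairs each header with the value in the same position
dictionary_list = [{key: value for key, value in zip(header, row)} for row in reader]

# For well-formed rows, the result matches csv.DictReader on the same text
assert dictionary_list == list(csv.DictReader(io.StringIO(csv_text)))
print(dictionary_list[0]["class"])  # Iris-setosa
```

One difference to be aware of: zip() stops at the shorter sequence, so a malformed row with extra values is silently truncated, whereas the enumerate() version above raises an IndexError for it.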

Conclusion

In conclusion, converting a CSV file to a Python dictionary is a common task in data analysis and manipulation. This article covered three methods: reading the file with pandas and calling to_dict(), using the csv module’s DictReader, and combining csv.reader() with a dictionary comprehension. Each gives you easy access to the data for analysis and processing. Choose the method that best fits your dataset’s size and complexity and your existing tooling: pandas is convenient when you already work with DataFrames and want automatic type inference, while the csv module needs no third-party dependency. If you have any queries, let us know in the comments.
