Chat With CSV Data Using LangChain and OpenAI

Michael Jacobsen
2 min read · Sep 12, 2023


I regularly work with clients who have years of data stored in their systems. Typically, the tools used to extract and view this data include CSV exports or custom reports, with Excel often being the final stop for further analysis. But what if we could simplify this? What if we could just ask the data what we wanted to know? This article delves into using LangChain and OpenAI to transform traditional data interaction, making it more like a casual chat. Let’s see how we can make this shift and streamline the way we understand our data.

Prerequisites

  1. Python 3.10+
  2. OpenAI API Key
  3. Sample CSV Data — I used an export from https://www.mockaroo.com/
  4. Install libraries:
pip install langchain
pip install openai
pip install pandas tabulate
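If you don't have an export handy, a small stand-in file is enough to try the script. The following is a minimal sketch that writes one with the standard library; the column names, values, and the write_sample_csv helper are all made up for illustration and do not reflect the Mockaroo export I used.

```python
import csv
import random

# Hypothetical columns for illustration only; a real export will differ.
FIELDS = ["id", "first_name", "department", "salary"]

def write_sample_csv(path, rows=20, seed=0):
    """Write a small CSV of made-up rows and return the number of rows written."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    names = ["Ada", "Grace", "Linus", "Margaret", "Alan"]
    departments = ["Sales", "Engineering", "Support"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for i in range(1, rows + 1):
            writer.writerow({
                "id": i,
                "first_name": rng.choice(names),
                "department": rng.choice(departments),
                "salary": rng.randrange(40_000, 120_000, 1_000),
            })
    return rows

# Example: write_sample_csv("sample.csv")
```

Point csv_file_path in the script at the generated file to use it.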

Script

In the script below, replace the OPEN_AI_API_KEY placeholder with your OpenAI API key and SAMPLE_CSV with the path to your CSV file.

We’re using create_csv_agent from LangChain, which loads the CSV into a pandas DataFrame and lets the model answer questions by running Python against it.
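To make that idea concrete, here is a rough sketch of the kind of work a question turns into once the data is loaded. It involves no LLM and no LangChain: the rows are made up, the load_rows helper is hypothetical, and the two expressions are hand-written stand-ins for code the agent might produce, not what it actually generates.

```python
import csv
import io
from statistics import mean

# Made-up rows standing in for a CSV export.
SAMPLE = """id,department,salary
1,Sales,50000
2,Engineering,90000
3,Engineering,70000
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(SAMPLE)

# "How many rows are there?" is a question about the shape of the data.
row_count = len(rows)

# "What is the average Engineering salary?" needs a filter and a computation.
eng_avg = mean(int(r["salary"]) for r in rows if r["department"] == "Engineering")

print(row_count, eng_avg)
```

The agent's job is to go from the natural-language question to an expression like these, run it, and phrase the result as an answer.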

I’ve also tested this successfully with a local LLM, a Llama 2 variant (https://huggingface.co/TheBloke/Llama-2-13B-GGML), although performance suffers.

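#!/usr/bin/env python3
import os
import time

# Replace the placeholder with your own key; it must be set before the LLM is created.
os.environ['OPENAI_API_KEY'] = "OPEN_AI_API_KEY"

from langchain.llms.openai import OpenAI
from langchain.agents.agent_types import AgentType
# In newer LangChain releases this import may need to come from
# langchain_experimental.agents instead.
from langchain.agents import create_csv_agent

# Path to the CSV file you want to query
csv_file_path = "SAMPLE_CSV"

def main():
    # temperature=0 keeps the answers as deterministic as possible
    llm = OpenAI(temperature=0)

    agent = create_csv_agent(
        llm,
        csv_file_path,
        verbose=False,
        agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    )

    # Simple prompt loop: type "exit" to quit, blank input is ignored
    while True:
        query = input("\nWhat would you like to know?: ")
        if query == "exit":
            break
        if query.strip() == "":
            continue

        # Get the answer
        start = time.time()
        answer = agent.run(query)
        end = time.time()
        print(answer)
        print(f"\n> Answer (took {round(end - start, 2)} s.)")

if __name__ == "__main__":
    main()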
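The script hardcodes the API key placeholder for simplicity. If you'd rather not keep a key in the source file, one option is to export OPENAI_API_KEY in your shell and have the script check for it at startup. The following is a minimal sketch of that check; get_api_key is a hypothetical helper of my own, not part of LangChain or the OpenAI library.

```python
import os

def get_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error if it is missing."""
    env = os.environ if env is None else env  # env can be any dict-like, which eases testing
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key
```

Calling get_api_key() at the top of main() would replace the os.environ assignment and fail early with a readable message when the variable is not set.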

Conclusion

When running locally, metadata-related questions were answered quickly, whereas computation-based questions took somewhat longer, so in this form it is not exactly a replacement for Excel. However, I think it opens the door to new possibilities as we look for ways to gain insight into our data.
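If you want to compare response times per question yourself, a small timing helper keeps the measurement in one place. This is only a sketch: timed is a hypothetical helper, and fake_agent_run is a stand-in that sleeps briefly so the example runs without an API key. In the real script you would pass agent.run instead.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def fake_agent_run(query):
    """Hypothetical stand-in for agent.run; sleeps briefly and echoes the query."""
    time.sleep(0.01)
    return f"answer to: {query}"

for question in ["How many rows are there?", "What is the average salary?"]:
    answer, seconds = timed(fake_agent_run, question)
    print(f"{question!r}: {seconds:.2f} s")
```

I use time.perf_counter here rather than time.time because it is intended for measuring short durations.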

While this is a simple attempt to explore chatting with your CSV data, LangChain offers a variety of agents and toolkits to explore: https://python.langchain.com/docs/integrations/toolkits/
