
Every senior AI Engineer started in the exact same place: staring at an empty code file, trying to figure out how to make a Python script talk to an OpenAI server. The OpenAI documentation is robust, but it often assumes you already know how to manage API keys securely and handle conversation state.

This is the unskippable tutorial. In the next 30 minutes, you are going to write a terminal-based Chatbot. It won't have a web UI. It won't have a database. It will be pure, unadulterated interaction logic. Once you understand how this underlying script works, plugging it into a Flask backend or a React frontend becomes trivial.

Prerequisites

Before writing code, ensure you have three things:

  1. Python 3.10+ installed on your machine.
  2. An OpenAI Developer Account with billing enabled. You cannot use the API without attaching a credit card. Don't worry, this tutorial will cost you less than $0.05.
  3. An API Key. Go to platform.openai.com, navigate to API Keys, generate a new one, and copy it immediately. You will not be able to see it again.

Step 1: The Environment Variables (Don't leak your key)

The fastest way to ruin your day is hardcoding your API key into your Python script and accidentally pushing it to GitHub. OpenAI's automated bots will revoke the key immediately, but not before you panic.

We use the python-dotenv library to manage secrets.

Open your terminal, create a new directory, and execute these commands:

mkdir my_ai_assistant
cd my_ai_assistant
python -m venv venv

# On Mac/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

pip install openai python-dotenv

Inside that folder, create a file named exactly .env (no filename, just the extension). Paste your key inside:

OPENAI_API_KEY=sk-proj-YoUrACTuALapIkEyGoeSHere
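If you want a belt-and-braces check that the key actually loaded, you can fail fast with a small guard. The function name `require_api_key` below is our own invention, not part of `openai` or `python-dotenv` — call it after `load_dotenv()` so the `.env` file has already been read:

```python
import os

def require_api_key():
    """Return the API key from the environment, or stop with a clear error.

    Call load_dotenv() before this so the .env file has been read into
    the process environment.
    """
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Create a .env file or export the variable."
        )
    return key
```

This turns a confusing `AuthenticationError` deep inside a request into an obvious, immediate message at startup.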

Step 2: The Bare Minimum Connection

Create a file named app.py. We are going to start by making a single, one-off call to the API just to prove our connection works.

import os
from dotenv import load_dotenv
from openai import OpenAI

# Load the API key from the .env file
load_dotenv()

# Initialize the client. It automatically looks for OPENAI_API_KEY in the environment.
client = OpenAI()

def make_simple_call():
    print("Calling OpenAI...")
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": "Write a haiku about Python programming."}
        ]
    )
    
    # The response object is deeply nested. We have to dig into choices[0]
    print(response.choices[0].message.content)

if __name__ == "__main__":
    make_simple_call()

Run python app.py. If you get a haiku about indentation, congratulations. You are officially interacting with an LLM programmatically.


Step 3: Understanding the Messages Array

Look closely at the messages parameter in the code above. It takes an array of dictionaries. This is the entire secret to how LLMs manage conversations: they don't. The API is entirely stateless.

Every time you send a request, OpenAI forgets everything you ever told it. To create a "conversation," you have to send the entire conversation history back to the API every single time you make a new request.

There are three specific roles you must use in this array:

  1. system — sets the assistant's persona and ground rules. It is usually the first message and is never shown to the end user.
  2. user — anything the human types.
  3. assistant — the model's own previous replies, which you send back so it can "remember" them.
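Putting the roles together, a single request carrying a short conversation looks like this (the content here is illustrative — the structure is what matters):

```python
# A complete messages array for one stateless request.
# The system message sets behavior; the user/assistant pairs replay the
# conversation so far; the final user message is the new turn.
messages = [
    {"role": "system", "content": "You are a terse Python tutor."},
    {"role": "user", "content": "What does enumerate() do?"},
    {"role": "assistant", "content": "It yields (index, item) pairs from an iterable."},
    {"role": "user", "content": "Show me a one-line example."},  # the new turn
]

# Sanity checks you can run locally, before ever calling the API:
assert messages[0]["role"] == "system"
assert messages[-1]["role"] == "user"
```

Every request you send in the rest of this tutorial is just a longer version of this exact list.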

Step 4: Building the Continuous Chat Loop

Now we will delete make_simple_call() and build a loop that maintains state. We initialize a list called conversation_history with a powerful system prompt, and then continuously append the user's input and the model's responses to it.

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI()

def start_chat():
    print("\nTerminal Assistant Booting Up. Type 'exit' to quit.\n")
    print("-" * 50)
    
    # Initialize the memory with our system prompt
    conversation_history = [
        {"role": "system", "content": "You are a concise, highly technical AI assistant. Never output pleasantries. Answer directly in markdown code blocks when possible."}
    ]
    
    while True:
        # 1. Get user input
        user_input = input("\nYou: ")
        
        if user_input.lower() in ['exit', 'quit']:
            print("Shutting down...")
            break
            
        # 2. Append user input to history
        conversation_history.append({"role": "user", "content": user_input})
        
        try:
            # 3. Send entire history to the API
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=conversation_history,
                temperature=0.7 # Read our guide on Top-P & Temperature to understand this
            )
            
            # 4. Extract the reply
            ai_reply = response.choices[0].message.content
            print(f"\nAI: {ai_reply}")
            
            # 5. Append AI reply to history so the model "remembers" it next time
            conversation_history.append({"role": "assistant", "content": ai_reply})
            
        except Exception as e:
            print(f"\n[ERROR] The API call failed: {e}")

if __name__ == "__main__":
    start_chat()

Step 5: Handling the Context Window Crash

If you run the script above and chat for three hours, it will eventually crash. Why?

Because the conversation_history list is growing indefinitely. Eventually, the list of messages will exceed GPT-4o's context window (128,000 tokens), and the API will throw an error. Even before it crashes, sending 100,000 tokens back and forth will cost you ridiculous amounts of money.
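You don't need exact token counts to spot the problem growing. A rough rule of thumb for English text is about 4 characters per token; OpenAI's tiktoken library gives exact counts, but this stdlib-only approximation (our own helper, not an official API) is enough for a ballpark guard:

```python
def estimate_tokens(conversation_history):
    """Very rough token estimate: ~4 characters per token for English text.

    For exact counts, use OpenAI's tiktoken library instead. This is only
    a cheap early-warning check before the history gets expensive.
    """
    total_chars = sum(len(m["content"]) for m in conversation_history)
    return total_chars // 4

history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Hello there!"},
]
print(estimate_tokens(history))  # a small number, nowhere near 128,000
```

Printing this estimate once per loop iteration makes it obvious how quickly an unbounded history climbs toward the limit.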

In a production application, you manage this by implementing a sliding window. You keep the system prompt plus only the last ~10 exchanges (20 messages). Here is the logic you would add to the end of the while loop to protect your context window:

# Keep the System Prompt (index 0), but slice the rest to keep only the last 10 interactions
# An interaction is 2 items (User + Assistant), so keep 20 total.
if len(conversation_history) > 21:
    # Keep index 0, plus the last 20 elements
    conversation_history = [conversation_history[0]] + conversation_history[-20:]
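You can exercise this trimming logic on a fake history without calling the API at all. This is the same slice as above, wrapped in a function so it is easy to test:

```python
def trim_history(conversation_history, max_messages=20):
    """Keep the system prompt (index 0) plus at most the last max_messages entries."""
    if len(conversation_history) > max_messages + 1:
        return [conversation_history[0]] + conversation_history[-max_messages:]
    return conversation_history

# Build a fake long chat: 1 system prompt + 50 user/assistant exchanges (101 messages)
fake = [{"role": "system", "content": "You are terse."}]
for i in range(50):
    fake.append({"role": "user", "content": f"question {i}"})
    fake.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(fake)
print(len(trimmed))           # 21: system prompt + last 20 messages
print(trimmed[0]["role"])     # "system" — the persona always survives
print(trimmed[1]["content"])  # "question 40" — everything older is dropped
```

Note what the window sacrifices: the model will genuinely forget anything said before message 40. If long-term recall matters, you would summarize the dropped messages instead of discarding them, but that is a tutorial for another day.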

Next Steps: From Terminal to Production

You now have a functioning, stateful conversational agent. The barrier between what you just built in the terminal and a massive B2B SaaS application is mostly plumbing, not logic.

In a real application, you replace the input() function with an HTTP POST endpoint in Flask or FastAPI. You replace the print() statement with a JSON response sent to a React frontend. You replace the Python list with a PostgreSQL database storing the message arrays keyed by user ID.
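That last swap is less scary than it sounds. Here is a minimal in-memory stand-in for the database layer — one history per user ID. The dict would become a PostgreSQL table in production, but the shape of the data and the append logic are identical to the terminal loop above:

```python
# One conversation history per user, keyed by user ID.
# In production this dict becomes a database table; the structure is the same.
SYSTEM_PROMPT = {"role": "system", "content": "You are a concise, highly technical AI assistant."}
histories = {}  # user_id -> list of message dicts

def get_history(user_id):
    """Fetch (or create) the conversation history for one user."""
    if user_id not in histories:
        histories[user_id] = [dict(SYSTEM_PROMPT)]
    return histories[user_id]

def record_turn(user_id, user_msg, ai_reply):
    """Append one user/assistant exchange, exactly as the terminal loop did."""
    history = get_history(user_id)
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": ai_reply})
    return history

record_turn("alice", "hi", "Hello.")
record_turn("bob", "what is a list?", "An ordered, mutable sequence.")
print(len(get_history("alice")))  # 3: system prompt + one exchange
```

Your POST endpoint then just calls get_history(), appends the incoming message, makes the same client.chat.completions.create() call, and records the reply. The core loop never changes.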

If you want to take this to the next level, start experimenting with Prompt Chaining to give your chatbot reasoning capabilities before it replies.
