Thinking in Loops: Mastering ReAct Prompting

ReAct (Reasoning and Acting) is a prompting technique for large language models (LLMs) that interleaves reasoning steps with tool use. This approach enables LLMs to solve complex tasks by thinking step-by-step, taking actions (like calling functions or APIs), and integrating the results of those actions into further reasoning.

The ReAct Loop: Core Components

  • Thought: The model explains its current reasoning or what it needs to do next.
  • Action: The model specifies an action to take, such as calling a function or querying a tool. In code, this is often formatted as a JSON object for easy parsing.
  • Observation: The model receives and processes the result of the action, which informs its next step.
  • Repeat: The loop continues until the model can provide a final answer.
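
Before looking at a full implementation, it helps to see the control flow in miniature. The following is a minimal, runnable sketch of the loop—fake_llm and get_time are hard-coded stand-ins for a real model call and a real tool, used only to make the Thought → Action → Observation → Final Answer cycle concrete:

import json

def fake_llm(history):
    """Stand-in for a model call: uses one tool, then answers."""
    if "Observation:" in history:
        return "Final Answer: It is 12:00."
    return 'Thought: I need the time.\nAction: {"tool": "get_time", "args": {}}'

def run_tool(name, args):
    return "12:00" if name == "get_time" else f"unknown tool '{name}'"

history = "User: What time is it?"
for _ in range(5):  # hard cap on iterations
    reply = fake_llm(history)
    history += "\n" + reply
    if reply.startswith("Final Answer"):
        break  # the loop ends once the model produces a final answer
    action = json.loads(reply.split("Action:", 1)[1])  # parse the Action JSON
    history += f"\nObservation: {run_tool(action['tool'], action['args'])}"

print(history)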

The ReAct Structure: Who Does What?

When using the ReAct (Reasoning and Acting) approach with AI and Python, each step in the loop is handled by a different part of the system. Here’s a clear breakdown of which steps are performed by the user, the AI, and the Python code.

Step-by-Step Breakdown

| Step | Who Performs This Step? | What Happens |
|------|-------------------------|--------------|
| 1. User Input | User | The user asks a question or gives a task. |
| 2. Thought | AI | The AI explains its reasoning about what to do next. |
| 3. Action | AI | The AI decides which tool or function to use and formats the action (often as JSON). |
| 4. Observation | Python Code | The Python program runs the function/tool, gets the result, and provides it as an observation. |
| 5. Loop | AI & Python Code | The AI uses the new observation to decide if more steps are needed, repeating as necessary. |
| 6. Final Answer | AI | The AI gives a clear, final answer to the user. |

Example Walkthrough

Let’s see how this works in practice with a simple weather query.

1. User Input
  • User: What’s the weather in Tokyo?
2. Thought (AI)
  • AI: Thought: I need to find out the current weather in Tokyo.
3. Action (AI)
  • AI: Action: {"tool": "get_weather", "args": {"location": "Tokyo"}}
4. Observation (Python Code)
  • Python Code: Runs the get_weather function with the argument "Tokyo" and gets the result.
  • Python Code: Observation: The weather in Tokyo is sunny and 75°F.
5. Final Answer (AI)
  • AI: Final Answer: The weather in Tokyo is sunny and 75°F.

| Step | Example Output | Who Does It? |
|------|----------------|--------------|
| User Input | What’s the weather in Tokyo? | User |
| Thought | Thought: I need to find out the current weather in Tokyo. | AI |
| Action | Action: {"tool": "get_weather", "args": {"location": "Tokyo"}} | AI |
| Observation | Observation: The weather in Tokyo is sunny and 75°F. | Python Code |
| Final Answer | Final Answer: The weather in Tokyo is sunny and 75°F. | AI |

Key Points
  • User: Asks the question or gives the task.
  • AI: Handles the reasoning (Thought), decides on actions (Action), and provides the final answer.
  • Python Code: Executes the actual function/tool and returns the result (Observation) to the AI.

Does the AI See the Full ReAct History Each Turn?

When an AI agent uses the ReAct (Reasoning and Acting) framework and iterates through multiple cycles of Thought → Action → Observation, the way previous steps are included in the prompt is crucial for maintaining context and producing accurate answers.

How History Is Handled in ReAct
  • Full History by Default: In most ReAct implementations, the entire sequence of previous Thought/Action/Observation blocks is included in the prompt to the AI at each step. This means the model always has access to the full reasoning and tool-use history up to that point, allowing it to build on prior steps and avoid repeating actions unnecessarily.
  • Why Include Full History?
    • Contextual Reasoning: The AI can reference earlier thoughts, actions, and observations to inform its next move.
    • Avoiding Redundancy: By seeing what has already been tried, the AI is less likely to repeat the same action or make the same mistake.
    • Coherent Final Answers: The model can synthesize all previous information for a well-supported final answer.
Practical Considerations
  • Prompt Length Limits: As the number of steps grows, the prompt can become very long. If the conversation or task is especially lengthy, some systems may:
    • Trim Older Steps: Remove the oldest blocks to stay within the model’s context window (see the sketch after this list).
    • Summarize History: Replace earlier steps with a summary to save space while preserving key information.
  • Agent Frameworks: Popular frameworks (like LangChain, Agno, and others) typically manage this history automatically, either by including the full message history or by applying strategies to keep the prompt within token limits.
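
To make the trimming idea concrete, here is a minimal sketch of one possible strategy: keep the system prompt and the original question, and drop the oldest intermediate steps until the transcript fits a rough budget. The character count is a crude stand-in for a real tokenizer, and trim_history is an illustrative name, not part of any framework:

def trim_history(messages, max_chars=8000):
    """Drop the oldest Thought/Action/Observation turns until the
    transcript fits a rough character budget. The system prompt
    (index 0) and the original user question (index 1) are kept."""
    head, body = messages[:2], messages[2:]
    while body and sum(len(m["content"]) for m in head + body) > max_chars:
        body.pop(0)  # remove the oldest intermediate step first
    return head + body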

Example: What the AI Sees

At each iteration, the AI’s prompt might look like this:

User: What's the weather in Tokyo?
Thought: I need to find out the current weather in Tokyo.
Action: {"tool": "get_weather", "args": {"location": "Tokyo"}}
Observation: The weather in Tokyo is sunny and 75°F.
Thought: Now I can provide the answer.
Action: {"tool": "finish", "args": {"answer": "The weather in Tokyo is sunny and 75°F."}}

Each new cycle appends to this history, and the full block is sent to the AI until a final answer is produced or a maximum number of iterations is reached.
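
In a chat-style API, that transcript is typically carried as a growing list of messages rather than one long string—each Thought/Action block becomes an assistant message and each Observation is fed back as the next message. Here is a sketch of how the example above might be represented (exact role conventions vary by API; the agent code later in this article uses this same pattern):

messages = [
    {"role": "system", "content": "...ReAct instructions and tool list..."},
    {"role": "user", "content": "What's the weather in Tokyo?"},
    # first cycle: the model's Thought + Action, then the tool result
    {"role": "assistant", "content": 'Thought: I need to find out the current weather in Tokyo.\nAction: {"tool": "get_weather", "args": {"location": "Tokyo"}}'},
    {"role": "user", "content": "Observation: The weather in Tokyo is sunny and 75°F."},
    # the entire list is re-sent to the model on every cycle
]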

| Step in Cycle | Included in Prompt Each Turn? | Notes |
|---------------|-------------------------------|-------|
| User Input | Yes | Always included |
| Thought | Yes | Each new thought is appended |
| Action | Yes | Each action is appended |
| Observation | Yes | Each observation is appended |
| Final Answer | Yes (when present) | Marks the end of the cycle |

Key Takeaways
  • The full previous Thought/Action/Observation blocks are included in the prompt to the AI at each step, unless the system trims or summarizes history to fit within context limits.
  • This approach ensures the AI has all necessary context to reason effectively and produce high-quality answers.

If you’re building or using a ReAct agent, you can generally expect the model to “see” the entire step-by-step history during its reasoning process, up to the point where technical limits require trimming or summarization.

Ollama ReAct Agent in Python

Let’s create a reasoning agent using Python 3 and any language model from Ollama—like phi4-mini, qwen, or any other LLM. This agent follows the ReAct paradigm, combining thought and action steps to solve problems interactively. It doesn’t require special tool-calling abilities from the LLM; instead, it orchestrates tool usage externally, allowing the model to tap into real-time system information and date/time utilities.

The agent works by:

  • Accepting user questions or tasks
  • Reasoning through them step-by-step
  • Selecting and invoking tools (like get_system_info or get_datetime)
  • Updating its reasoning based on live observations
  • Producing final answers grounded in current, factual data

Before you begin, make sure you have Ollama installed and the desired LLM downloaded (e.g., phi4-mini). This setup enables a flexible and expandable framework for interacting with the real world through language and logic.
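
For example, once Ollama is installed you can fetch a model from the command line (substitute whichever model you want to use):

ollama pull phi4-mini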

Full Code

import json
import ollama
import re
import inspect
import platform
import psutil
from datetime import datetime
import asyncio

# Define your tools
async def get_system_info():
    """Retrieve basic system information.

    Returns:
        dict: A dictionary containing the platform name, OS release version,
              number of CPU cores, and total memory in bytes.
    """

    return {
        "platform": platform.system(),
        "release": platform.release(),
        "cpu_count": psutil.cpu_count(),
        "memory": psutil.virtual_memory().total
    }



async def get_datetime():
    """Get the current date and time.

    Returns:
        str: The current date and time formatted as YYYY-MM-DD HH:MM:SS.
    """

    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


# Tool registry
TOOLS = {
    "get_system_info": get_system_info,
    "get_datetime": get_datetime
}


tool_descriptions = []
for i, (tool_name, tool_func) in enumerate(TOOLS.items(), 1):
    # Extract the function signature and docstring for comprehensive documentation
    sig = inspect.signature(tool_func)
    docstring = inspect.getdoc(tool_func) or "No description available."

    # Format the tool description with signature and full docstring
    tool_desc = f"{i}. {tool_name}{sig}\n   {docstring}"
    tool_descriptions.append(tool_desc)

tools_text = "\n\n".join(tool_descriptions)  # Use double newlines for better separation


# System prompt to guide ReAct behavior
SYSTEM_PROMPT = """
    You are a reasoning agent. You can use tools to solve problems step by step.
    Available tools:
    {tools_text}

    Use this format:
    Thought: [your reasoning]
    Action: {{"tool": "tool_name", "args": {{"arg1": "val1", ...}}}}

    IMPORTANT: After outputting an Action, STOP and wait for the Observation. Do NOT generate the Observation yourself.
    The Observation will be provided to you automatically after the tool executes.

    Only after receiving real Observations can you output a Final Answer.

    When you have enough information, output:
    Final Answer: [your final answer to the user]

    Never generate fake Observations. Only output Thought and Action, then wait.
"""


def is_final_answer(content):
    """Detect if the agent has produced a final answer."""
    return bool(re.search(r"Final Answer\s*:", content, re.IGNORECASE))

def parse_action(content):
    """Extract the Action JSON from the agent's output."""
    for line in content.splitlines():
        if line.strip().startswith("Action:"):
            action_json = line.replace("Action:", "").strip()
            return json.loads(action_json)
    return None


async def run_agent(llm='qwen3:1.7b'):
    user_input = input("User: ")

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT.format(tools_text=tools_text)},
        {"role": "user", "content": user_input}
    ]

    MAX_ITER = 15
    iter_count = 0
    recent_actions = []

    while iter_count < MAX_ITER:
        response = ollama.chat(model=llm, messages=messages)
        content = response['message']['content']
        print(f"\nAgent:\n{content}")

        # Check for final answer
        if is_final_answer(content):
            break

        # Parse Action JSON from the response
        try:
            tool_call = parse_action(content)
            if not tool_call:
                observation = "Observation: No valid Action found."
            else:
                tool_name = tool_call["tool"]
                args = tool_call["args"]

                # Loop detection: stop if repeating the same action 3 times in a row
                recent_actions.append(tool_name)
                if recent_actions[-3:] == [tool_name]*3:
                    observation = f"Observation: Repeated action '{tool_name}' detected. Stopping."
                    print(f"\n{observation}")
                    break

                if tool_name in TOOLS:
                    result = await TOOLS[tool_name](**args)
                    observation = f"Observation: {result}"
                else:
                    observation = f"Observation: Unknown tool '{tool_name}'"
        except Exception as e:
            observation = f"Observation: Error parsing tool call - {e}"

        # Display the observation to follow ReAct format
        print(f"\n{observation}")

        # Append LLM output and observation to the conversation
        messages.append({"role": "assistant", "content": content})
        messages.append({"role": "user", "content": observation})

        iter_count += 1

    if iter_count >= MAX_ITER:
        print("\nAgent stopped: iteration limit reached.")


if __name__ == "__main__":
    asyncio.run(run_agent(llm='phi4-mini'))

Installing the necessary libraries

pip install ollama psutil

Importing the Libraries

import json
import ollama
import re
import inspect
import platform
import psutil
from datetime import datetime
import asyncio

Here’s a quick breakdown of what each import does:

  • json: Handles JSON data—parsing and formatting for config, messaging, or API responses.
  • ollama: Interacts with local language models; used for chatting with models like phi, qwen, etc.
  • re: Provides regex (regular expression) operations—great for pattern matching in strings.
  • inspect: Introspects live objects—used here to pull each tool’s signature and docstring for the auto-generated documentation.
  • platform: Gives system info like OS name and version.
  • psutil: Lets you access hardware details—CPU count, memory, system performance.
  • datetime: Manages and formats date/time values.
  • asyncio: Enables asynchronous programming—useful for efficient I/O operations and concurrent tasks.

Tool Definition Section

The next part of the code sets up two asynchronous tools that your reasoning agent can use to retrieve system information or get the current date and time. Let’s break down what each function does and how the tool registry works.

get_system_info

async def get_system_info():
    """Retrieve basic system information.

    Returns:
        dict: A dictionary containing the platform name, OS release version,
              number of CPU cores, and total memory in bytes.
    """
    return {
        "platform": platform.system(),
        "release": platform.release(),
        "cpu_count": psutil.cpu_count(),
        "memory": psutil.virtual_memory().total
    }
  • Purpose: Gathers essential information about the computer where the code is running.
  • What it returns: A dictionary containing:
    • platform: The OS name (e.g., ‘Windows’, ‘Linux’, ‘Darwin’ for macOS).
    • release: The OS release version.
    • cpu_count: Number of CPU cores detected.
    • memory: Total physical memory (RAM) in bytes.
  • How it works: Uses Python’s platform module for OS details and psutil for hardware information.
  • Async: Declared as async def so it can be used asynchronously in an event loop, improving responsiveness in concurrent scenarios.
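
If you want to try the tool on its own, outside the agent, a quick standalone check might look like this (assuming the function and its imports from above are in scope; the printed values will differ per machine):

import asyncio

info = asyncio.run(get_system_info())
print(info)
# e.g. {'platform': 'Linux', 'release': '6.8.0-45-generic', 'cpu_count': 8, 'memory': 16689848320}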

get_datetime

async def get_datetime():
    """Get the current date and time.

    Returns:
        str: The current date and time formatted as YYYY-MM-DD HH:MM:SS.
    """
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
  • Purpose: Returns a nicely formatted string with the current local date and time.
  • Format: YYYY-MM-DD HH:MM:SS (e.g., "2025-07-26 09:00:00").
  • Async: Even though getting the system time is a quick operation, marking this as async makes tool usage uniform.

Tool Registry

TOOLS = {
    "get_system_info": get_system_info,
    "get_datetime": get_datetime
}
  • What this does: Creates a dictionary that maps tool names (as string keys) to the actual function objects.
  • Why it’s useful: Allows your agent to look up and call a tool dynamically by name, supporting scalable expansion if you add more tools in the future.
  • How it’s used: The agent can select and invoke tools by referencing this registry, enabling flexible reasoning and acting behavior.
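
Here is a small demonstration of that dynamic lookup—the same pattern the agent loop uses once it has parsed a tool name and arguments out of an Action (the hard-coded name and empty args are just for illustration):

import asyncio

name, args = "get_datetime", {}            # as if parsed from an Action JSON
result = asyncio.run(TOOLS[name](**args))  # look up the coroutine by name and await it
print(result)                              # e.g. '2025-07-26 09:00:45'

Because every tool lives in the registry under its string name, the agent never needs a hard-coded branch per tool.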

Auto-Generating Tool Documentation

tool_descriptions = []
for i, (tool_name, tool_func) in enumerate(TOOLS.items(), 1):
    # Extract the function signature and docstring for comprehensive documentation
    sig = inspect.signature(tool_func)
    docstring = inspect.getdoc(tool_func) or "No description available."

    # Format the tool description with signature and full docstring
    tool_desc = f"{i}. {tool_name}{sig}\n   {docstring}"
    tool_descriptions.append(tool_desc)

tools_text = "\n\n".join(tool_descriptions)  # Use double newlines for better separation

This block of code creates human-readable documentation for each of your agent’s tools by programmatically extracting information such as function names, inputs, and documentation strings.

1. Prepare the List

tool_descriptions = []

2. Iterate Over Each Tool

for i, (tool_name, tool_func) in enumerate(TOOLS.items(), 1):
  • Loops through all tools defined in the TOOLS dictionary.
  • enumerate(..., 1) gives you both a counter (i, starting from 1) and each key-value pair, where tool_name is the name (like 'get_system_info') and tool_func is the actual function.

3. Extract Function Details

sig = inspect.signature(tool_func)
docstring = inspect.getdoc(tool_func) or "No description available."
  • Uses Python’s built-in inspect module (imported at the top of the script):
    • signature: Gets the function’s parameter list.
    • getdoc: Pulls the function’s docstring; if the function doesn’t have one, it falls back to "No description available."

4. Format Each Tool’s Description

tool_desc = f"{i}. {tool_name}{sig}\n   {docstring}"
tool_descriptions.append(tool_desc)
  • Combines the tool’s number, name, its signature (what parameters it accepts), and the docstring (the function’s own description and return value info).
  • Adds this formatted string to the list.

5. Combine Into a Single Text Block

tools_text = "\n\n".join(tool_descriptions)
  • Joins all the individual tool descriptions into a single string, inserting double newlines for readability.

This section generates a neat, numbered reference guide of all available tools and their documentation. By doing this automatically, any new tool you add to the TOOLS dictionary is instantly documented for the agent or any human reader—no manual updates required. This greatly improves transparency for users and maintainability for developers.
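
With the two tools defined above, the generated tools_text comes out roughly like this (whitespace approximate):

1. get_system_info()
   Retrieve basic system information.

Returns:
    dict: A dictionary containing the platform name, OS release version,
          number of CPU cores, and total memory in bytes.

2. get_datetime()
   Get the current date and time.

Returns:
    str: The current date and time formatted as YYYY-MM-DD HH:MM:SS.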

System Prompt

# System prompt to guide ReAct behavior
SYSTEM_PROMPT = """
    You are a reasoning agent. You can use tools to solve problems step by step.
    Available tools:
    {tools_text}

    Use this format:
    Thought: [your reasoning]
    Action: {{"tool": "tool_name", "args": {{"arg1": "val1", ...}}}}

    IMPORTANT: After outputting an Action, STOP and wait for the Observation. Do NOT generate the Observation yourself.
    The Observation will be provided to you automatically after the tool executes.

    Only after receiving real Observations can you output a Final Answer.

    When you have enough information, output:
    Final Answer: [your final answer to the user]

    Never generate fake Observations. Only output Thought and Action, then wait.
"""

This prompt instructs the agent to reason through problems by thinking step-by-step and using external tools when needed. It follows the ReAct format:

  • Thought: The agent explains its reasoning.
  • Action: It selects a tool and provides input arguments.
  • Observation: Waits for real output from the tool before continuing.
  • Final Answer: Once enough info is gathered, the agent responds with a complete answer.

The agent must not make up observations or jump ahead. It only proceeds once actual data from the tool is received.

Final Answer Checking and Action Parsing

def is_final_answer(content):
    """Detect if the agent has produced a final answer."""
    return bool(re.search(r"Final Answer\s*:", content, re.IGNORECASE))

def parse_action(content):
    """Extract the Action JSON from the agent's output."""
    for line in content.splitlines():
        if line.strip().startswith("Action:"):
            action_json = line.replace("Action:", "").strip()
            return json.loads(action_json)
    return None
  • is_final_answer(content):
    Checks if the string content contains “Final Answer:” (case-insensitive), indicating the agent is ready to output its final response.
  • parse_action(content):
    Goes through each line of content to find one that starts with “Action:”. If found, it parses the rest of that line as JSON and returns it (representing a tool call). If not found, returns None.
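
A quick sanity check of both helpers (assuming the functions above and their imports are in scope):

sample = 'Thought: I need the time.\nAction: {"tool": "get_datetime", "args": {}}'
print(parse_action(sample))                   # {'tool': 'get_datetime', 'args': {}}
print(is_final_answer(sample))                # False
print(is_final_answer("Final Answer: done"))  # True

Note that because parse_action reads one line at a time, the Action JSON must stay on a single line—something the system prompt’s format example encourages.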

Main Agent Loop

This function runs the core reasoning loop, where the agent interacts with the user and uses tools to solve the problem step by step.

async def run_agent(llm='qwen3:1.7b'):
    user_input = input("User: ")

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT.format(tools_text=tools_text)},
        {"role": "user", "content": user_input}
    ]

    MAX_ITER = 15
    iter_count = 0
    recent_actions = []

    while iter_count < MAX_ITER:
        response = ollama.chat(model=llm, messages=messages)
        content = response['message']['content']
        print(f"\nAgent:\n{content}")

        # Check for final answer
        if is_final_answer(content):
            break

        # Parse Action JSON from the response
        try:
            tool_call = parse_action(content)
            if not tool_call:
                observation = "Observation: No valid Action found."
            else:
                tool_name = tool_call["tool"]
                args = tool_call["args"]

                # Loop detection: stop if repeating the same action 3 times in a row
                recent_actions.append(tool_name)
                if recent_actions[-3:] == [tool_name]*3:
                    observation = f"Observation: Repeated action '{tool_name}' detected. Stopping."
                    print(f"\n{observation}")
                    break

                if tool_name in TOOLS:
                    result = await TOOLS[tool_name](**args)
                    observation = f"Observation: {result}"
                else:
                    observation = f"Observation: Unknown tool '{tool_name}'"
        except Exception as e:
            observation = f"Observation: Error parsing tool call - {e}"

        # Display the observation to follow ReAct format
        print(f"\n{observation}")

        # Append LLM output and observation to the conversation
        messages.append({"role": "assistant", "content": content})
        messages.append({"role": "user", "content": observation})

        iter_count += 1

    if iter_count >= MAX_ITER:
        print("\nAgent stopped: iteration limit reached.")
  1. User Input & Setup
    • Prompts the user for input.
    • Sets up the initial chat context with a system prompt (which includes tool documentation and usage instructions) and the user’s question.
  2. Loop Control
    • Uses a maximum iteration counter (MAX_ITER) to prevent infinite loops, and tracks how many iterations have occurred.
  3. Agent Interaction Loop
    • For up to MAX_ITER times:
      • Agent Response: The agent processes the conversation using ollama.chat and generates a response (usually a Thought or an Action).
      • Final Answer Check: If the response contains a “Final Answer:”, the loop exits—this means the agent believes it’s solved the user’s question.
      • Tool Call Handling:
        • Attempts to parse an Action (a tool call with arguments) from the agent’s message.
        • If no valid Action is found, returns an observation noting the issue.
        • If an Action is found:
          • Loop Detection: Checks if the same tool is called three times in a row—if so, it stops, considering the agent is stuck.
          • Tool Execution: If the tool exists, it is called asynchronously with any provided arguments, and the result is formatted as an Observation. If the tool does not exist, a relevant error is noted.
        • Handles any errors in parsing or execution by returning an appropriate error observation.
      • Observation & Memory: Shows the Observation, and appends both the agent’s output and the observation to the messages list (maintains conversation memory).
      • Repeat: Increments iteration count and loops unless finished.
  4. Termination
    • If the agent hits the iteration limit without reaching a final answer, it notifies that it stopped due to too many steps.

Summary:
This loop carefully orchestrates the agent’s step-by-step reasoning:

  • Alternates between agent thinking and tool use,
  • Checks for completion or stuck situations,
  • Records the full conversation history,
  • And ensures the agent cannot run indefinitely or get stuck in a repetitive cycle.

This design is robust and suitable for complex, multi-step problem-solving agents.

Calling the Main Loop

This last section ensures the agent only runs when the script is executed directly (not imported as a module). It starts the asynchronous main loop by calling run_agent() with the ‘phi4-mini’ language model. This line effectively launches the interactive reasoning agent for user input and conversation.

if __name__ == "__main__":
    asyncio.run(run_agent(llm='phi4-mini'))

Running the Agent

python3 agent.py

User: What is the current date and time?

Agent:
Thought: To answer the user's question, I need to find out the current date and time as reported by the system. I have a tool specifically for this purpose.

Action: {"tool": "get_datetime", "args": {}}

Observation: 2025-07-26 09:00:45

Agent:
Thought: The observation gives me the current date and time as returned by the system tool.

Final Answer: The current date and time is 2025-07-26 09:00:45.

Conclusion

This article walks through building a ReAct-style agent using Python 3 and any language model from Ollama. What sets this approach apart is its flexibility: the agent can use any Python function as a tool and any Ollama-served LLM as its reasoning engine, which means you can create agents for virtually any domain or task.

Because the reasoning and tool-calling logic is orchestrated outside the language model, you’re not limited to models that natively support tool usage. Instead, you define how the agent thinks (using the ReAct format) and how it interacts with tools—whether that’s fetching the weather, querying a database, analyzing files, or controlling a robot arm. If you can write a function for it in Python, the agent can use it.
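
As a concrete illustration of that plug-and-play quality, here is a hypothetical get_disk_usage tool—write the function, add one registry entry, and the auto-generated documentation and agent loop pick it up with no other changes (the name and docstring are just an example):

async def get_disk_usage(path: str = "/"):
    """Report disk usage for a path.

    Returns:
        dict: Total, used, and free space in bytes for the given path.
    """
    usage = psutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}

TOOLS["get_disk_usage"] = get_disk_usage  # register it; nothing else changes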

Why it’s versatile:

  • Works with any LLM you load into Ollama (e.g., phi4-mini, deepseek-r1)
  • Plug-and-play tools: Just write Python functions and register them
  • Clear reasoning flow (Thought → Action → Observation → Final Answer)
  • No need for special tool-calling capabilities in the model itself

This makes it ideal for developers who want to build agents tailored to specific real-world tasks—whether you’re automating workflows, handling customer service queries, or building a personal assistant.
