Tools, Actions, and Function Calling

Overview

Tools are the primary mechanism through which AI agents interact with external systems and take actions beyond text generation. Function calling enables LLMs to produce structured outputs that name a function and its arguments; the application, not the model, then executes the call [13].

Function Calling

Function calling lets models generate structured JSON outputs that conform to predefined schemas, so applications can parse the model's intent programmatically instead of extracting arguments from free-form text.

How It Works

  1. Define function schemas with parameters and descriptions
  2. Send schemas to the model along with user input
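  3. The model decides whether a function is needed and, if so, generates JSON arguments matching its schema
  4. The application parses and validates the JSON, then executes the function
  5. The function's result is sent back to the model, which uses it to continue the task or compose a final answer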
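The five steps can be sketched in Python without calling any model provider. Everything here is an assumption for illustration: the `get_weather` tool, the `SCHEMAS` and `FUNCTIONS` tables, and the shape of the model's call (an object with `name` and `arguments`). The model's output in step 3 is hard-coded, and validation only checks required and unexpected keys; a full JSON Schema validator (for example the `jsonschema` package) would also enforce types and enums.

```python
import json

# Step 1: a function schema with JSON Schema parameters (hypothetical tool).
SCHEMAS = {
    "get_weather": {
        "description": "Get the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    }
}


def get_weather(city: str, unit: str = "celsius") -> dict:
    # Stub standing in for a real weather service.
    return {"city": city, "temperature": 21, "unit": unit}


FUNCTIONS = {"get_weather": get_weather}


def execute_tool_call(raw_call: str) -> str:
    """Steps 4-5: parse the model's JSON, check it against the schema,
    run the function, and return a JSON string to send back to the model."""
    call = json.loads(raw_call)
    name, args = call["name"], call.get("arguments", {})
    if isinstance(args, str):  # some APIs deliver arguments as a JSON string
        args = json.loads(args)
    if name not in FUNCTIONS:
        raise ValueError(f"unknown function: {name}")
    params = SCHEMAS[name]["parameters"]
    missing = [p for p in params["required"] if p not in args]
    unexpected = [a for a in args if a not in params["properties"]]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    return json.dumps(FUNCTIONS[name](**args))


# Step 3 normally comes from the model; this output is hard-coded.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(execute_tool_call(model_output))
# → {"city": "Paris", "temperature": 21, "unit": "celsius"}
```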

File System Tools

File system tools enable agents to read, write, and manipulate files, providing persistent storage and the ability to work with documents and data.

| Operation | Description | Use Case |
|---|---|---|
| Read | Read file contents | Analyzing documents, loading data |
| Write | Create or overwrite files | Saving results, generating reports |
| Append | Add content to existing files | Logging, incremental updates |
| List | List directory contents | Exploring file structures |
| Search | Find files matching patterns | Locating relevant files |
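A minimal sketch of the operations in the table, assuming a hypothetical `FileTools` helper that confines every path to one root directory. Resolving the path before the containment check is what blocks `../` and symlink escapes; the method names and the root-confinement policy are illustrative choices, not a standard interface.

```python
import tempfile
from pathlib import Path
from typing import List


class FileTools:
    """File operations confined to one root directory (hypothetical helper)."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def _resolve(self, rel_path: str) -> Path:
        # Resolve symlinks and ".." first, then reject anything outside the root.
        path = (self.root / rel_path).resolve()
        if path != self.root and self.root not in path.parents:
            raise PermissionError(f"path outside root: {rel_path}")
        return path

    def read(self, rel_path: str) -> str:
        return self._resolve(rel_path).read_text(encoding="utf-8")

    def write(self, rel_path: str, content: str) -> None:
        path = self._resolve(rel_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")  # creates or overwrites

    def append(self, rel_path: str, content: str) -> None:
        path = self._resolve(rel_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as f:
            f.write(content)

    def list_dir(self, rel_path: str = ".") -> List[str]:
        return sorted(p.name for p in self._resolve(rel_path).iterdir())

    def search(self, pattern: str) -> List[str]:
        # Glob relative to the root, e.g. "**/*.md"; ".." segments are refused.
        if ".." in Path(pattern).parts:
            raise PermissionError(f"pattern escapes root: {pattern}")
        return sorted(p.relative_to(self.root).as_posix()
                      for p in self.root.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    fs = FileTools(tmp)
    fs.write("reports/summary.md", "# Summary\n")
    fs.append("reports/summary.md", "- first finding\n")
    print(fs.read("reports/summary.md"))
    print(fs.search("**/*.md"))  # → ['reports/summary.md']
```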

Web Search / Deep Research

Web search tools give agents access to current information from the internet, including content published after the model's training data was collected [14].

Capabilities

  • General Search: Query search engines for information
  • News Search: Access recent news articles
  • Academic Search: Find research papers and publications
  • Site-Specific Search: Search within specific domains
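One way to expose these capabilities to a model is a single search tool whose parameters select the mode and optionally restrict the domain. In this sketch the `ROUTES` table, its endpoint paths, and `build_search_request` are hypothetical; site restriction is expressed with the `site:` query operator that many web search engines accept, though a real search API may offer a dedicated parameter instead.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical backend routes; a real search provider defines its own.
ROUTES = {"general": "/search", "news": "/news", "academic": "/papers"}


@dataclass(frozen=True)
class SearchRequest:
    route: str
    query: str


def build_search_request(mode: str, query: str,
                         site: Optional[str] = None) -> SearchRequest:
    """`mode` picks the backend; `site` narrows any mode to one domain."""
    if mode not in ROUTES:
        raise ValueError(f"unsupported mode: {mode!r}")
    query = query.strip()
    if not query:
        raise ValueError("empty query")
    if site:
        # Site-specific search via the widely supported "site:" operator.
        query = f"{query} site:{site}"
    return SearchRequest(ROUTES[mode], query)


print(build_search_request("general", "agent evaluation", site="arxiv.org"))
# → SearchRequest(route='/search', query='agent evaluation site:arxiv.org')
```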

Deep Research Pattern

Deep research involves iterative search and synthesis:

  1. Initial broad search to understand the topic
  2. Identify key subtopics and questions
  3. Targeted searches for each subtopic
  4. Cross-reference and validate information
  5. Synthesize findings into coherent output
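The deep research steps can be sketched as a single pass over a toy index. `FAKE_INDEX` and `search` are hypothetical stand-ins for a web search API, subtopics are pre-tagged in the results where a real agent would ask the LLM to extract them, and cross-referencing is reduced to one assumed rule: keep a claim only if at least `min_sources` distinct sources report it. A real agent would typically repeat steps 2 to 4 until coverage is sufficient.

```python
from collections import defaultdict

# Hypothetical stand-in for a web search API: query -> [(source, text)].
FAKE_INDEX = {
    "topic": [("s1", "subtopic:alpha"), ("s2", "subtopic:beta")],
    "topic alpha": [("s1", "claim A"), ("s3", "claim A"), ("s4", "claim B")],
    "topic beta": [("s2", "claim C"), ("s5", "claim C")],
}


def search(query: str):
    return FAKE_INDEX.get(query, [])


def deep_research(topic: str, min_sources: int = 2) -> str:
    # 1. Broad search on the topic.
    overview = search(topic)
    # 2. Identify subtopics (an LLM would do this; here they are tagged).
    subtopics = [text.split(":", 1)[1] for _, text in overview
                 if text.startswith("subtopic:")]
    # 3. Targeted search per subtopic, recording which sources back each claim.
    support = defaultdict(set)
    for sub in subtopics:
        for source, claim in search(f"{topic} {sub}"):
            support[(sub, claim)].add(source)
    # 4. Cross-reference: keep claims confirmed by >= min_sources sources.
    confirmed = {key: srcs for key, srcs in support.items()
                 if len(srcs) >= min_sources}
    # 5. Synthesize the confirmed claims into a short report.
    lines = [f"# {topic}"]
    for sub, claim in sorted(confirmed):
        srcs = ", ".join(sorted(confirmed[(sub, claim)]))
        lines.append(f"- [{sub}] {claim} ({srcs})")
    return "\n".join(lines)


print(deep_research("topic"))
# → # topic
#   - [alpha] claim A (s1, s3)
#   - [beta] claim C (s2, s5)
```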

Code Interpreter

Code interpreter tools allow agents to write and execute code in a sandboxed environment [15].

Capabilities

  • Data Analysis: Process and analyze datasets
  • Visualization: Generate charts and graphs
  • File Processing: Convert, transform, and manipulate files
  • Mathematical Computation: Perform complex calculations
  • Code Generation: Write and test code snippets

Supported Languages

| Language | Primary Use Cases |
|---|---|
| Python | Data analysis, ML, general scripting |
| JavaScript | Web development, JSON processing |
| Bash | System operations, file manipulation |
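A minimal sketch of a multi-language runner, assuming code arrives as a string plus a language name. It writes the code to a throwaway directory and runs it in a child process with a wall-clock timeout. This gives process isolation only: the child can still read files and use the network, so a production code interpreter needs a real sandbox such as a container or VM. The `INTERPRETERS` mapping is an assumption, and `node` and `bash` must be installed for those languages.

```python
import os
import subprocess
import sys
import tempfile

# Interpreter command per language; "node" and "bash" are assumed to be on PATH.
INTERPRETERS = {
    "python": [sys.executable, "-I"],  # -I: isolated mode (no PYTHON* env vars, no user site)
    "javascript": ["node"],
    "bash": ["bash"],
}
EXTENSIONS = {"python": ".py", "javascript": ".js", "bash": ".sh"}


def run_code(language: str, code: str, timeout: float = 10.0) -> dict:
    """Run `code` in a child process inside a throwaway directory.
    This is process isolation with a timeout, not a security sandbox."""
    if language not in INTERPRETERS:
        raise ValueError(f"unsupported language: {language}")
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "main" + EXTENSIONS[language])
        with open(script, "w", encoding="utf-8") as f:
            f.write(code)
        try:
            proc = subprocess.run(
                INTERPRETERS[language] + [script],
                cwd=workdir, capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout}s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}


result = run_code("python", "print(sum(range(10)))")
print(result["stdout"].strip())  # → 45
```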

Computer Use

Computer use tools enable agents to interact with graphical user interfaces [16].

Capabilities

  • Screen Capture: View current screen state
  • Mouse Control: Click, drag, scroll
  • Keyboard Input: Type text, use shortcuts
  • Window Management: Open, close, resize windows
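Before executing GUI actions, an agent harness can convert the model's proposed action into a typed, validated value. The action vocabulary below (`screenshot`, `click`, `type`, `key`) and its JSON shape are assumptions for illustration, not the format of any particular computer-use API; a separate executor, not shown, would turn each action into OS input events and capture a fresh screenshot afterwards.

```python
import json
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Screenshot:
    pass


@dataclass(frozen=True)
class Click:
    x: int
    y: int
    button: str = "left"


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class KeyPress:
    keys: Tuple[str, ...]  # a chord such as ("ctrl", "s")


Action = Union[Screenshot, Click, TypeText, KeyPress]


def parse_action(raw: str, screen_w: int, screen_h: int) -> Action:
    """Turn a model-proposed JSON action into a typed, validated action."""
    data = json.loads(raw)
    kind = data.get("action")
    if kind == "screenshot":
        return Screenshot()
    if kind == "click":
        x, y = int(data["x"]), int(data["y"])
        if not (0 <= x < screen_w and 0 <= y < screen_h):
            raise ValueError(f"click outside the screen: ({x}, {y})")
        button = data.get("button", "left")
        if button not in ("left", "right", "middle"):
            raise ValueError(f"unknown mouse button: {button!r}")
        return Click(x, y, button)
    if kind == "type":
        return TypeText(str(data["text"]))
    if kind == "key":
        return KeyPress(tuple(str(k) for k in data["keys"]))
    raise ValueError(f"unknown action: {kind!r}")


print(parse_action('{"action": "click", "x": 100, "y": 200}', 1280, 800))
# → Click(x=100, y=200, button='left')
```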

Use Cases

  • Automating repetitive GUI tasks
  • Testing web applications
  • Interacting with legacy systems
  • Data entry and form filling

Shell

Shell tools provide command-line access for system operations [17].

Capabilities

| Category | Operations |
|---|---|
| File Operations | cp, mv, rm, mkdir, chmod |
| Text Processing | grep, sed, awk, sort, uniq |
| Network | curl, wget, ping, ssh |
| Process Management | ps, kill, top, nohup |
| Package Management | apt, pip, npm, cargo |

Security Considerations

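  • Run commands in sandboxed environments (e.g., containers or virtual machines)
  • Restrict available commands to an explicit allowlist
  • Validate all inputs, including command arguments such as file paths
  • Monitor and log every execution
  • Set resource limits (CPU, memory, wall-clock time)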
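Several of these measures can be combined in a small Python wrapper: an allowlist, `shlex.split` with no shell so pipes and `;` are not interpreted, a timeout, POSIX resource limits via the `resource` module, and logging. The allowlist contents and the limit values are assumptions. This runs only on POSIX systems, and it does not stop an allowed command like `cat` from reading any file the process can access, so it complements a sandbox rather than replacing one.

```python
import logging
import resource  # POSIX only (Linux, macOS)
import shlex
import subprocess

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shell_tool")

# Explicit allowlist of executables (an assumption; tune per deployment).
ALLOWED_COMMANDS = {"ls", "cat", "grep", "wc", "sort", "uniq", "echo"}


def _limit_resources() -> None:
    # Runs in the child process just before exec.
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))  # 5 s of CPU time
    resource.setrlimit(resource.RLIMIT_FSIZE, (10 * 1024 * 1024,) * 2)  # 10 MiB per file


def run_shell(command: str, cwd: str = ".", timeout: float = 10.0) -> str:
    # shlex.split + no shell: pipes, redirection, globbing, ";" are not interpreted.
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {command!r}")
    log.info("exec argv=%s cwd=%s", argv, cwd)
    proc = subprocess.run(
        argv, cwd=cwd, capture_output=True, text=True,
        timeout=timeout,  # wall-clock limit; raises TimeoutExpired
        preexec_fn=_limit_resources,
    )
    log.info("exit=%d stderr=%r", proc.returncode, proc.stderr[:200])
    return proc.stdout


print(run_shell("echo hello agent"))     # → hello agent
print(run_shell("echo a; touch pwned"))  # ";" reaches echo as plain text
```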

Tool Integration Patterns

Single Tool Invocation

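# Simple tool call: `agent` is any object exposing call_tool(name, args);
# the tool name and arguments are illustrative.
result = agent.call_tool("search", {"query": "AI agents"})
print(result)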

Tool Chaining

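# Chain multiple tools: each call's output feeds the next call's input.
# `agent` and the tool names are illustrative, not a specific framework.
search_results = agent.call_tool("search", {"query": "Python best practices"})
# Embed the results with repr() (the !r conversion) so quotes or newlines in
# the text cannot break the generated code; `analyze_text` must already be
# defined inside the interpreter session.
analysis = agent.call_tool("code_interpreter", {
    "code": f"analyze_text({search_results!r})"
})
agent.call_tool("file_write", {
    "path": "analysis.md",
    "content": str(analysis)
})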
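The snippets above assume an `agent` object with a `call_tool` method. A minimal registry that would make them runnable might look like this; `Agent`, `register`, `call_tool`, and the `history` log are hypothetical names, not a specific framework's API. Checking arguments with `inspect.signature(...).bind` before the call keeps a mistyped argument from being confused with an error raised inside the tool.

```python
import inspect
from typing import Any, Callable, Dict, List


class Agent:
    """Minimal tool registry. `Agent`, `register`, and `call_tool` are
    hypothetical names chosen to match the snippets above."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        self.history: List[Dict[str, Any]] = []  # audit log of calls

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call_tool(self, name: str, args: Dict[str, Any]) -> Any:
        if name not in self._tools:
            raise KeyError(f"no tool named {name!r}")
        fn = self._tools[name]
        try:
            # Validate arguments against the signature before calling, so errors
            # raised inside the tool are not mistaken for bad arguments.
            inspect.signature(fn).bind(**args)
        except TypeError as exc:
            raise ValueError(f"bad arguments for {name!r}: {exc}") from exc
        self.history.append({"tool": name, "args": args})
        return fn(**args)


agent = Agent()
agent.register("search", lambda query: [f"result for {query}"])
agent.register("word_count", lambda text: len(text.split()))

hits = agent.call_tool("search", {"query": "AI agents"})
print(agent.call_tool("word_count", {"text": " ".join(hits)}))  # → 4
```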

References

  13. OpenAI Function Calling
  14. OpenAI Deep Research
  15. OpenAI Code Interpreter
  16. OpenAI Computer-Using Agent
  17. OpenAI Shell Tool