Tools, Actions, and Function Calling
Overview
Tools are the primary mechanism through which AI agents interact with external systems and perform actions beyond text generation. Function calling enables LLMs to generate structured outputs that can be used to invoke external functions [13].
Function Calling
Function calling lets a model emit structured JSON that conforms to a predefined schema, so the application can invoke external systems from the model's output instead of parsing free-form text.
How It Works
- The developer defines function schemas with parameter names, types, and descriptions
- The application sends the schemas to the model along with the user input
- The model generates structured JSON matching one of the schemas
- The application parses the JSON and executes the corresponding function
- The application sends the result back to the model for further processing
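
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's API: the `get_weather` function, its schema, the `TOOLS` registry, and the shape of the model output (`name` plus `arguments`) are all assumptions made for the example, and the model response is simulated with a hard-coded string.

```python
import json

# Hypothetical schema, written in the JSON-Schema style that many
# function-calling APIs use. The exact envelope differs between providers.
GET_WEATHER_SCHEMA = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stand-in implementation; a real tool would call a weather service."""
    return {"city": city, "temperature": 21, "unit": unit}


# Registry mapping function names to (schema, callable).
TOOLS = {"get_weather": (GET_WEATHER_SCHEMA, get_weather)}


def execute_tool_call(raw_call: str) -> str:
    """Parse the model's JSON tool call, check it, run it, return a JSON result."""
    call = json.loads(raw_call)
    if call.get("name") not in TOOLS:
        return json.dumps({"error": f"unknown function: {call.get('name')}"})
    schema, func = TOOLS[call["name"]]
    args = call.get("arguments", {})
    params = schema["parameters"]
    # Lightweight check only: required names present, no unexpected names.
    missing = [p for p in params["required"] if p not in args]
    unknown = [a for a in args if a not in params["properties"]]
    if missing or unknown:
        return json.dumps({"error": {"missing": missing, "unknown": unknown}})
    return json.dumps(func(**args))


# Simulated model output; in practice this string comes from the LLM response.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
tool_result = execute_tool_call(model_output)
print(tool_result)  # this JSON string would be sent back to the model
```

Returning errors as JSON rather than raising lets the application pass the failure back to the model, which can then retry with corrected arguments.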
File System Tools
File system tools enable agents to read, write, and manipulate files, providing persistent storage and the ability to work with documents and data.
| Operation | Description | Use Case |
|---|---|---|
| Read | Read file contents | Analyzing documents, loading data |
| Write | Create or overwrite files | Saving results, generating reports |
| Append | Add content to existing files | Logging, incremental updates |
| List | List directory contents | Exploring file structures |
| Search | Find files matching patterns | Locating relevant files |
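
One way the five operations in the table might be exposed to an agent is a small class confined to a single working directory. The sketch below uses only the standard library; the `FileTools` class and its method names are hypothetical, and the root-directory check is an added assumption rather than something the table prescribes.

```python
from pathlib import Path
import tempfile


class FileTools:
    """File operations confined to a single root directory (hypothetical helper)."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def _resolve(self, relative: str) -> Path:
        # Reject paths that escape the root, such as "../secrets".
        path = (self.root / relative).resolve()
        if path != self.root and self.root not in path.parents:
            raise PermissionError(f"path escapes root: {relative}")
        return path

    def read(self, path: str) -> str:
        return self._resolve(path).read_text(encoding="utf-8")

    def write(self, path: str, content: str) -> None:
        # Creates the file or overwrites it if it already exists.
        target = self._resolve(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")

    def append(self, path: str, content: str) -> None:
        with self._resolve(path).open("a", encoding="utf-8") as handle:
            handle.write(content)

    def list_dir(self, path: str = ".") -> list[str]:
        return sorted(p.name for p in self._resolve(path).iterdir())

    def search(self, pattern: str) -> list[str]:
        # Glob pattern matched recursively under the root.
        return sorted(p.relative_to(self.root).as_posix() for p in self.root.rglob(pattern))


with tempfile.TemporaryDirectory() as workdir:
    tools = FileTools(workdir)
    tools.write("notes/log.txt", "first line\n")
    tools.append("notes/log.txt", "second line\n")
    print(tools.read("notes/log.txt"))
    print(tools.list_dir("notes"))
    print(tools.search("*.txt"))
```

Resolving every path against a fixed root keeps a mistaken or manipulated path from reaching files outside the agent's workspace.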
Web Search / Deep Research
Web search tools give agents access to current information from the internet, including material published after the model's training data was collected [14].
Capabilities
- General Search: Query search engines for information
- News Search: Access recent news articles
- Academic Search: Find research papers and publications
- Site-Specific Search: Search within specific domains
Deep Research Pattern
Deep research involves iterative search and synthesis:
- Initial broad search to understand the topic
- Identify key subtopics and questions
- Targeted searches for each subtopic
- Cross-reference and validate information
- Synthesize findings into coherent output
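
The loop above might look roughly like the following. Everything here is illustrative: `fake_search` and its `FAKE_INDEX` corpus stand in for a real search tool, the subtopics are passed in by the caller (an agent would normally ask the model to propose them), and "validation" is reduced to keeping claims that appear in at least two result sets.

```python
from collections import Counter
from typing import Callable

# Toy corpus standing in for a search backend; every entry is made up.
FAKE_INDEX = {
    "ai agents": ["Agents use tools.", "Agents plan multi-step tasks."],
    "ai agents tools": ["Agents use tools.", "Tools include search and code execution."],
    "ai agents planning": ["Agents plan multi-step tasks.", "Planning splits goals into steps."],
}


def fake_search(query: str) -> list[str]:
    """Hypothetical search tool: returns the canned claims for a query."""
    return FAKE_INDEX.get(query.lower(), [])


def deep_research(
    topic: str,
    subtopics: list[str],
    search: Callable[[str], list[str]],
    min_sources: int = 2,
) -> str:
    # Step 1: initial broad search on the topic.
    findings = Counter(search(topic))
    # Steps 2-3: one targeted search per subtopic.
    for subtopic in subtopics:
        findings.update(search(f"{topic} {subtopic}"))
    # Step 4: cross-reference by keeping claims seen in enough result sets.
    validated = [claim for claim, count in findings.items() if count >= min_sources]
    # Step 5: synthesize; a real agent would hand the claims to the model.
    return f"{topic}: " + " ".join(sorted(validated))


report = deep_research("AI agents", ["tools", "planning"], fake_search)
print(report)
```

Passing the search function in as a parameter keeps the loop independent of any particular search provider.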
Code Interpreter
Code interpreter tools allow agents to write and execute code in a sandboxed environment [15].
Capabilities
- Data Analysis: Process and analyze datasets
- Visualization: Generate charts and graphs
- File Processing: Convert, transform, and manipulate files
- Mathematical Computation: Perform complex calculations
- Code Generation: Write and test code snippets
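
As a rough sketch of the execution step, the snippet below runs model-written Python in a separate process with a timeout and a throwaway working directory. The `run_python` helper is hypothetical. A child process alone is not a security boundary, so a production interpreter would add container or VM isolation on top of this.

```python
import subprocess
import sys
import tempfile


def run_python(code: str, timeout_s: float = 5.0) -> dict:
    """Run a code string in a child interpreter and capture its output."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            completed = subprocess.run(
                # -I starts Python in isolated mode (ignores user site and env vars).
                [sys.executable, "-I", "-c", code],
                cwd=workdir,          # files the code writes land in a temp dir
                capture_output=True,
                text=True,
                timeout=timeout_s,    # kill runaway code
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "stdout": "", "stderr": "timed out"}
    return {
        "ok": completed.returncode == 0,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


outcome = run_python("print(sum(range(10)))")
print(outcome["stdout"])
```

The returned dictionary carries both output streams, so an error traceback can be fed back to the model for a corrected attempt.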
Supported Languages
| Language | Primary Use Cases |
|---|---|
| Python | Data analysis, ML, general scripting |
| JavaScript | Web development, JSON processing |
| Bash | System operations, file manipulation |
Computer Use
Computer use tools enable agents to interact with graphical user interfaces [16].
Capabilities
- Screen Capture: View current screen state
- Mouse Control: Click, drag, scroll
- Keyboard Input: Type text, use shortcuts
- Window Management: Open, close, resize windows
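
Computer use tools typically run an observe-act loop: the agent receives the screen state, chooses an action, and sees the result. The sketch below imitates that loop against an in-memory stand-in for a GUI. `FakeScreen`, the action dictionaries, and the coordinate layout are all invented for illustration and do not correspond to a real automation library.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """In-memory stand-in for a GUI with one text field and a Submit button."""
    text_field: str = ""
    focus: str = "none"
    submitted: list[str] = field(default_factory=list)

    def capture(self) -> dict:
        """Return the current screen state (the 'screenshot' of this toy GUI)."""
        return {
            "text_field": self.text_field,
            "focus": self.focus,
            "submitted": list(self.submitted),
        }


def apply_action(screen: FakeScreen, action: dict) -> dict:
    """Apply one agent action and return the new screen state."""
    kind = action["type"]
    if kind == "click":
        # Hypothetical layout: the text field occupies y < 50, Submit is below it.
        if action["y"] < 50:
            screen.focus = "text_field"
        else:
            screen.focus = "submit_button"
            screen.submitted.append(screen.text_field)
            screen.text_field = ""
    elif kind == "type":
        # Keystrokes only land if the text field has focus.
        if screen.focus == "text_field":
            screen.text_field += action["text"]
    elif kind != "screenshot":
        raise ValueError(f"unsupported action type: {kind}")
    return screen.capture()


screen = FakeScreen()
plan = [
    {"type": "click", "x": 10, "y": 20},   # focus the text field
    {"type": "type", "text": "hello"},     # keyboard input
    {"type": "click", "x": 10, "y": 80},   # press Submit
]
for action in plan:
    state = apply_action(screen, action)
print(state)
```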
Use Cases
- Automating repetitive GUI tasks
- Testing web applications
- Interacting with legacy systems
- Data entry and form filling
Shell
Shell tools provide command-line access for system operations [17].
Capabilities
| Category | Operations |
|---|---|
| File Operations | cp, mv, rm, mkdir, chmod |
| Text Processing | grep, sed, awk, sort, uniq |
| Network | curl, wget, ping, ssh |
| Process Management | ps, kill, top, nohup |
| Package Management | apt, pip, npm, cargo |
Security Considerations
- Run in sandboxed environments
- Limit available commands
- Validate all inputs
- Monitor and log all executions
- Set resource limits (CPU, memory, time)
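
Several of these considerations can be combined in a thin wrapper around `subprocess`. The following is a minimal sketch assuming a POSIX system: the `run_shell` helper and the contents of `ALLOWED_COMMANDS` are hypothetical, only a time limit is enforced, and CPU or memory limits and real sandboxing would need operating-system or container support that is not shown.

```python
import logging
import shlex
import subprocess

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("shell_tool")

# Hypothetical allowlist; anything not named here is refused.
ALLOWED_COMMANDS = {"echo", "ls", "grep", "sort", "uniq"}


def run_shell(command_line: str, timeout_s: float = 5.0) -> str:
    """Run an allowlisted command without a shell and return its stdout."""
    argv = shlex.split(command_line)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        logger.warning("rejected command: %r", command_line)
        raise PermissionError(f"command not allowed: {command_line!r}")
    logger.info("running: %r", argv)
    # Passing a list (no shell=True) means ';', '|' and '&&' are plain arguments.
    completed = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired on overrun
        check=False,
    )
    logger.info("exit code: %d", completed.returncode)
    return completed.stdout


print(run_shell("echo hello"))
```

Because the command never passes through a shell, an input such as `echo a; rm -rf x` only prints the text after `echo` and does not run `rm`.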
Tool Integration Patterns
Single Tool Invocation
# Simple tool call
result = agent.call_tool("search", {"query": "AI agents"})
print(result)

Tool Chaining
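# Chain multiple tools: search, analyze the results, then save the analysis.
# Assumes analyze_text() is defined inside the code interpreter environment.
search_results = agent.call_tool("search", {"query": "Python best practices"})
analysis = agent.call_tool("code_interpreter", {
    # !r embeds the results as a Python literal, so quotes in the text cannot break the code.
    "code": f"analyze_text({search_results!r})"
})
agent.call_tool("file_write", {
    "path": "analysis.md",
    "content": analysis
})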
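
The snippets in this section call `agent.call_tool(name, arguments)` without showing where `agent` comes from. One plausible shape for such an object is a small registry that maps tool names to Python callables. The `Agent` class, `register_tool`, and the stub tools below are assumptions made for illustration, not the interface of a particular framework.

```python
from typing import Any, Callable


class Agent:
    """Minimal tool registry exposing call_tool(name, arguments)."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}

    def register_tool(self, name: str, func: Callable[..., Any]) -> None:
        self._tools[name] = func

    def call_tool(self, name: str, arguments: dict[str, Any]) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        # Arguments arrive as a dict and are passed as keyword arguments.
        return self._tools[name](**arguments)


# Stub tools so the example runs without network or disk access.
written: dict[str, str] = {}


def search(query: str) -> list[str]:
    return [f"result for {query}"]


def file_write(path: str, content: str) -> str:
    written[path] = content
    return path


agent = Agent()
agent.register_tool("search", search)
agent.register_tool("file_write", file_write)

# Chain two tools: the output of the first feeds the second.
results = agent.call_tool("search", {"query": "AI agents"})
saved = agent.call_tool("file_write", {"path": "analysis.md", "content": "\n".join(results)})
print(saved, written[saved])
```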