Phantom is a powerful, multi-modal AI assistant framework that combines multiple Large Language Models (LLMs) with advanced automation tools for code generation, web browsing, GUI automation, and computer control.
- Multi-LLM Support: OpenAI, Gemini, Anthropic Claude, Groq
- Code Generation: Automatic Python code generation and execution with error recovery
- Cross-Platform Computer Control: Powered by Orgo - works on Windows, macOS, and Linux
- GUI Automation: Advanced computer control using AgentS2 and OSWorldACI
- Web Search: Intelligent web search and information gathering
- File Operations: Read, write, and manipulate files
- Threaded Execution: Fast, non-blocking operations with progress indicators
- Error Recovery: Automatic code debugging and fixing by AI
- Modular Architecture: Easily customizable tools and LLM configurations
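The error-recovery feature listed above can be pictured as a run, inspect, fix loop. The sketch below is not Phantom's actual implementation: `run_python`, `run_with_recovery`, and the `fix_code` callback (a stand-in for an LLM call) are hypothetical names chosen for illustration. It only shows the general shape of retrying generated code until it runs cleanly.

```python
import subprocess
import sys


def run_python(code, timeout=30):
    """Run code in a fresh interpreter and return (succeeded, combined output)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0, result.stdout + result.stderr


def run_with_recovery(code, fix_code, max_attempts=3):
    """Retry failing code, handing each error output to fix_code.

    fix_code is a hypothetical callback (for example, an LLM request) that
    takes the failing source and its error output and returns new source.
    """
    output = ""
    for attempt in range(max_attempts):
        ok, output = run_python(code)
        if ok:
            return code, output
        # Only ask for a fix if another attempt is still available.
        if attempt < max_attempts - 1:
            code = fix_code(code, output)
    raise RuntimeError(f"still failing after {max_attempts} attempts:\n{output}")
```

In a real setup, `fix_code` would send the source and traceback to the configured LLM; here any callable with that shape works, which keeps the loop easy to test.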
- Python 3.8 or higher
- Cross-platform support: Windows, macOS, Linux (thanks to Orgo)
- Clone the repository:
    git clone https://github.com/your-username/phantom.git
    cd phantom
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables:
    cp .env.example .env
  Edit the .env file with your API keys (see API Keys section below).
Copy .env.example to .env and configure the following:
# MANDATORY for cross-platform computer control via Orgo
ORGO_API_KEY=your_orgo_api_key_here
# MANDATORY for advanced GUI automation
ANTHROPIC_API_KEY=your_anthropic_api_key_here
# CHOOSE based on your preferred LLM
GEMINI_API_KEY=your_gemini_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
GROQ_API_KEY=your_groq_api_key_here
- Anthropic Claude: https://console.anthropic.com/
- Google Gemini: https://aistudio.google.com/app/apikey
- OpenAI: https://platform.openai.com/api-keys
- Groq: https://console.groq.com/keys
- Orgo: https://orgo.ai/ (cross-platform computer control for Windows/macOS/Linux)
For basic functionality, you need:
- At least one LLM API key (Gemini, OpenAI, Groq, or Anthropic)
- ORGO_API_KEY (essential for cross-platform computer control)
- ANTHROPIC_API_KEY (for advanced GUI automation)
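The requirements above can be checked before starting Phantom. The following is a minimal sketch, assuming the keys are exposed as environment variables with the names shown in this README; `check_api_keys` and its `gui_automation` flag are hypothetical helpers, not part of Phantom.

```python
import os

# Key names are taken from the configuration section of this README.
LLM_KEYS = ("GEMINI_API_KEY", "OPENAI_API_KEY", "GROQ_API_KEY", "ANTHROPIC_API_KEY")


def check_api_keys(env=None, gui_automation=True):
    """Return a list of problems; an empty list means the setup looks complete."""
    env = os.environ if env is None else env
    required = ["ORGO_API_KEY"]
    if gui_automation:
        # Advanced GUI automation additionally needs the Anthropic key.
        required.append("ANTHROPIC_API_KEY")
    problems = [f"{name} is not set" for name in required if not env.get(name)]
    if not any(env.get(name) for name in LLM_KEYS):
        problems.append("no LLM API key is set")
    return problems
```

Calling `check_api_keys()` with no arguments inspects the current process environment; passing a plain dictionary makes the check easy to try without touching real keys.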
Simply run the main script:
python main.py
This will start Phantom with the default configuration, which uses Gemini as the main LLM.
Once running, you can ask Phantom to:
# Code generation and automation
>>> Create a Python script to organize my desktop files by extension
# Web search and research
>>> Search for the latest stock price of tesla
# GUI automation
>>> Open Notepad and write "Hello World"
# File operations
>>> Read the contents of config.txt and explain what it does
# Complex tasks
>>> Create a web scraper for news headlines and save to CSV
Edit main.py to use a different LLM:
# Using OpenAI instead of Gemini
llm = Openai('gpt-4')
# Using Anthropic Claude
llm = Anthropic('claude-3-sonnet')
# Using Groq
llm = GroqLLM('llama-3.1-70b-versatile')
Add or remove tools in main.py:
from os import environ  # standard library; provides environ.get used below

# Basic setup
tools = [
    CodesmithTool(llm_instance=Gemini()),
    WebSearchTool(),
    ReadFileTool()
]
# Advanced setup with GUI automation
tools = [
    CodesmithTool(llm_instance=Gemini()),
    WebSearchTool(),
    Orgotool(Computer=Computer(api_key=environ.get('ORGO_API_KEY')), agent=agent_s2),
    ReadFileTool()
]
For advanced GUI automation, configure AgentS2:
# Configure different LLMs for different components
engine_params = {"engine_type": "openai", "model": "gpt-4"}
grounding_params = {"engine_type": "anthropic", "model": "claude-3-sonnet"}
grounding_agent = OSWorldACI(
    platform='windows',
    engine_params_for_generation=engine_params,
    engine_params_for_grounding=grounding_params
)
agent_s2 = AgentS2(
    engine_params=engine_params,
    grounding_agent=grounding_agent,
    platform="windows",
    action_space="pyautogui",
    observation_type="screenshot"
)
CodesmithTool:
- Purpose: Python code generation and execution
- Features: Automatic error recovery, module installation, threaded execution
- Usage: "Create a script to...", "Write code that..."
WebSearchTool:
- Purpose: Web search and information gathering
- Features: Real-time web search, content extraction
- Usage: "Search for...", "Find information about..."
Orgotool:
- Purpose: Cross-platform computer and GUI automation
- Features: Click, type, screenshot, app control on Windows/macOS/Linux
- Orgo Integration: Computer control API that works across Windows, macOS, and Linux
- Usage: "Open Chrome", "Click the submit button", "Take a screenshot", "Control my desktop"
ReadFileTool:
- Purpose: File system operations
- Features: Read files, directory listing
- Usage: "Read the file...", "Show me the contents of..."
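Each tool described above pairs a purpose with something that can be invoked. The sketch below illustrates that modular idea only; it does not reproduce Phantom's base tool interface in tool.py. `SimpleTool`, `build_registry`, and `dispatch` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a tool: a name, a purpose, and a callable."""
    name: str
    purpose: str
    run: Callable[[str], str]


def build_registry(*tools: SimpleTool) -> Dict[str, SimpleTool]:
    """Index tools by name so an orchestrator can look them up."""
    return {tool.name: tool for tool in tools}


def dispatch(registry: Dict[str, SimpleTool], tool_name: str, argument: str) -> str:
    """Run the named tool on the argument, failing loudly for unknown names."""
    if tool_name not in registry:
        raise KeyError(f"unknown tool: {tool_name}")
    return registry[tool_name].run(argument)


# Example: a trivial tool that upper-cases its input.
registry = build_registry(SimpleTool("shout", "Upper-case text", str.upper))
result = dispatch(registry, "shout", "hello")
```

Keeping tools behind a name-to-object mapping is one simple way to make them easy to add or remove, which is the property the tool list in main.py relies on.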
>>> Create a simple website about Artificial Intelligence
>>> make a calculator in calculator.py file
>>> Research the latest AI developments and create a summary
>>> Find Python libraries for data visualization
>>> Compare different cloud hosting providers
phantom/
├── app.py              # Main PhantomAssistant class
├── codesmith.py        # Code generation and execution
├── tool.py             # Base tool interface
├── func/               # Function tools
│   ├── automation.py   # Basic automation functions
│   ├── orgotool.py     # Advanced GUI automation
│   └── websearch.py    # Web search capabilities
├── llms/               # LLM interfaces
│   ├── openaillm.py    # OpenAI integration
│   ├── genai.py        # Gemini integration
│   ├── anthropicllm.py # Claude integration
│   └── groqllm.py      # Groq integration
└── prompts/            # System prompts
    ├── base.md         # Main orchestrator prompt
    └── codesmith.md    # Code generation prompt
| Variable | Required | Purpose |
|---|---|---|
| ORGO_API_KEY | YES | Cross-platform computer control (Windows/macOS/Linux) |
| ANTHROPIC_API_KEY | YES | Claude LLM + advanced GUI automation |
| GEMINI_API_KEY | Optional* | Google Gemini LLM |
| OPENAI_API_KEY | Optional* | OpenAI GPT models |
| GROQ_API_KEY | Optional* | Groq LLM inference |
*At least one LLM API key is required
Configure specific models in your setup:
# High-performance setup
llm = Openai('gpt-4-turbo')
codesmith_llm = Gemini('gemini-2.0-flash-lite-001')
# Cost-effective setup
llm = GroqLLM('llama-3.1-70b-versatile')
codesmith_llm = Gemini('gemini-2.5-flash-lite')
# Balanced setup
llm = Anthropic('claude-3-sonnet')
codesmith_llm = Gemini('gemini-2.0-flash-lite-001')
- "No API key found"
  - Ensure your .env file is in the root directory
  - Check that API keys are correctly formatted
  - Verify the API key is valid and has credits
- "Module not found"
  - Phantom will automatically prompt to install missing modules
  - Ensure you have pip permissions for installations
- GUI automation not working
  - Verify ORGO_API_KEY is set (required for all computer control)
  - Verify ANTHROPIC_API_KEY is set (for advanced GUI features)
  - Orgo works on Windows, macOS, and Linux
  - Ensure screen resolution is supported
  - Check Orgo service status at orgo.ai
- Slow execution
  - Threading is enabled by default for better performance
  - Consider using faster models like Groq for code generation
  - Check your internet connection for API calls
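For the "No API key found" case, a quick diagnostic can confirm that .env exists and that no key still holds its template value. This is a rough sketch assuming the simple KEY=value layout of .env.example shown earlier; `parse_env` and `find_placeholder_keys` are hypothetical helpers, and the parser deliberately ignores advanced syntax such as multi-line values.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks, # comments, and malformed lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_placeholder_keys(values):
    """Return keys that are empty or still hold a your_..._here template value."""
    return sorted(
        key
        for key, value in values.items()
        if not value or (value.startswith("your_") and value.endswith("_here"))
    )


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print(f".env not found in {Path.cwd()}")
    else:
        for key in find_placeholder_keys(parse_env(env_path.read_text(encoding="utf-8"))):
            print(f"{key} is empty or still a placeholder")
```

Run it from the project root, since that is where Phantom expects the .env file to live.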
To get more verbose logging, modify the console output in the code, or use an LLM that produces more detailed responses.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Powered by Orgo for universal cross-platform computer control
- Built with Rich for beautiful terminal UI
- Uses AgentS2 for advanced GUI automation
- Integrates multiple LLM providers for maximum flexibility
Happy Automating! 🤖✨
