This guide shows you how to run the Data Preparation Agent using Docker.
- Go to https://makersuite.google.com/app/apikey
- Sign in with your Google account
- Click "Create API Key"
- Copy your key (looks like: AIzaSyC_xxxxxxxxxxxxxxxxx)
docker pull ghcr.io/emergenceai/em-data-preparation-agent:latest
Option A: Enter API key later via UI (Easiest)
docker run -d -p 8000:8000 --name data-prep-agent ghcr.io/emergenceai/em-data-preparation-agent:latest
Option B: Provide API key at startup
docker run -d \
  --name data-prep-agent \
  -p 8000:8000 \
  -e GEMINI_API_KEY="your-gemini-api-key-here" \
  -v "$(pwd)/data:/app/data" \
  ghcr.io/emergenceai/em-data-preparation-agent:latest
That's it! Open your browser to http://localhost:8000
Note: If you didn't provide the API key, you'll be prompted to enter it in the UI when you first use the app.
IMPORTANT - Please read before using:
- Your uploaded Excel files are sent to Google Gemini API for AI-powered analysis
- File contents and your transformation requests are processed on Google's servers
- No data is stored by us - everything stays local except API calls to Google
- Review Google Gemini API Privacy Policy
- AI Code Execution: This tool generates and executes Python code. Only process trusted files.
- Code Obfuscation: The application code is obfuscated and cannot be audited
- API Key Security: Your Gemini API key stays in your local environment
- Internet Required: Active internet connection needed for AI processing
Read the full Terms of Use for complete legal details, especially regarding third-party LLM provider data handling.
Before using this software, please review:
- Terms of Use: Legal agreement governing your use of this software
- Security Policy: How to report vulnerabilities and security best practices
By downloading or using the Data Preparation Agent, you agree to the Terms of Use.
- Docker
- Gemini API Key
- 4 GB RAM minimum (8 GB recommended)
- 10 GB disk space
Once the container starts:
- Open your browser to http://localhost:8000
- Upload your Excel file (drag & drop)
- The system automatically detects all tables in the file and displays them
- You can clean or optimize the data using the transformation options
- Describe what you want - e.g., "Get the top 10 customers by revenue"
- Review and confirm the transformation plan
- View clean CSV files ready for analysis!
Features:
- Automatically detects tables in messy Excel sheets
- Emergence GenAI-powered task transformation
- Preview and approve before transforming
- Export to clean CSV files
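docker ps                          # List running containers
docker logs data-prep-agent -f     # Follow the container logs
docker stop data-prep-agent        # Stop the container
docker start data-prep-agent       # Start it again
docker rm -f data-prep-agent       # Remove the container
When you start the container with the ./data volume mount (as in Option B or the Docker Compose setup), all your files are saved in the ./data folder on your computer: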
./data/
├── uploads/           # Your uploaded Excel files
├── temp_processing/   # Temporary working files
└── outputs/           # Transformed CSV outputs
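If the host folder does not exist when the container starts, Docker on Linux typically creates the bind-mount source itself, owned by root, which can make the files awkward to manage afterwards. A minimal sketch of pre-creating the layout as your own user is shown here. It assumes the subfolder names match the tree above, and the `DATA_DIR` variable is a hypothetical name introduced for this example, not something the container reads.

```shell
# Pre-create the host-side folders that are mounted at /app/data.
# Assumption: the subfolder names follow the layout shown above.
# DATA_DIR is a hypothetical helper variable for this sketch only.
DATA_DIR="${DATA_DIR:-./data}"

mkdir -p "$DATA_DIR/uploads" \
         "$DATA_DIR/temp_processing" \
         "$DATA_DIR/outputs"

# Show what was created
ls "$DATA_DIR"
```

Running this before `docker run` or `docker-compose up` keeps the folders owned by your user. Whether the application would also create the subfolders on its own is not confirmed here.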
If you prefer Docker Compose for easier management:
1. Create .env file:
echo 'GEMINI_API_KEY="your-gemini-api-key-here"' > .env
2. Create docker-compose.yml:
version: '3.8'
services:
  data-prep-agent:
    image: ghcr.io/emergenceai/em-data-preparation-agent:latest
    container_name: data-prep-agent
    ports:
      - "8000:8000"
    environment:
      - GEMINI_API_KEY=${GEMINI_API_KEY}  # Optional - can also be entered in UI
    volumes:
      - ./data:/app/data
    restart: unless-stopped
Note: The GEMINI_API_KEY environment variable is optional. If not provided, you'll be prompted to enter it in the UI.
3. Run:
docker-compose up -d      # Start
docker-compose logs -f    # View logs
docker-compose down       # Stop
Error: Container exits immediately after starting
Solution:
# Check logs for error message
docker logs data-prep-agent
# Note: API key is optional - you can enter it in the UI
# Only set GEMINI_API_KEY if you want to pre-configure it
Error: Large error message about missing API key
Solution:
- Verify your .env file exists and contains the correct key
- Check that you're passing the environment variable correctly:
docker run -e GEMINI_API_KEY="your-actual-key" ...
Error: Authentication failures when making API calls
Solution:
- Verify your Gemini API key is valid at https://makersuite.google.com/app/apikey
- Check if you've exceeded API quota (unlikely with free tier)
- Ensure no extra spaces or quotes in the API key
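Stray spaces or quote characters often slip in when a key is pasted from a browser. The sketch below is one way to clean a pasted value and run a rough sanity check before passing it to the container. Both function names are hypothetical helpers, and the `AIza` prefix check is only an assumption based on the example key shown in this guide. It does not prove that a key is valid.

```shell
# Remove spaces, tabs, newlines, and quote characters from a pasted key.
# (Hypothetical helper for this sketch.)
normalize_key() {
  printf '%s' "$1" | tr -d ' \t\r\n"'"'"
}

# Rough format check. Assumption: keys start with "AIza", as in the
# example key in this guide. This does not contact Google.
looks_like_gemini_key() {
  case "$1" in
    AIza?*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Example with the placeholder key from this guide
clean_key=$(normalize_key ' "AIzaSyC_xxxxxxxxxxxxxxxxx" ')
if looks_like_gemini_key "$clean_key"; then
  echo "Key format looks plausible"
else
  echo "Key does not match the expected prefix"
fi
```

The cleaned value can then be passed with `-e GEMINI_API_KEY="$clean_key"`. A key that passes this check can still be rejected by the API, so the key page remains the place to confirm it.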
Error: Bind for 0.0.0.0:8000 failed: port is already allocated
Solution:
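# Remove the container left over from the failed start so the name is free
docker rm -f data-prep-agent
# Map a different host port (8001 here instead of 8000)
docker run -d \
  -p 8001:8000 \
  --name data-prep-agent \
  ghcr.io/emergenceai/em-data-preparation-agent:latest
# Then open http://localhost:8001
Error: The UI does not load in the browser
Diagnosis: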
# Check if container is running
docker ps
# Check logs for errors
docker logs data-prep-agent
# Test backend API directly
curl http://localhost:8000/health
Solution:
- If backend responds but UI doesn't load, wait 30-60 seconds for startup
- Check if port 8000 is accessible from your browser
- Try accessing http://127.0.0.1:8000 instead of localhost
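Instead of waiting a fixed 30-60 seconds, the health check can be polled until it succeeds. The following is a minimal sketch of a generic retry helper. `wait_until` is a hypothetical function name, and the attempt count and delay are illustrative values rather than documented startup times.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# (Hypothetical helper for this sketch.)
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Run the probe command, discarding its output
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (not run here): poll the health endpoint for up to about 60 seconds
#   wait_until 30 2 curl -fsS http://localhost:8000/health \
#     && echo "Backend is up" \
#     || echo "Backend did not respond; check: docker logs data-prep-agent"
```

Passing the probe as a command keeps the helper reusable: the same function works with `http://127.0.0.1:8000/health` or a different host port.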
- Never commit .env files to git
  echo ".env" >> .gitignore
- Use environment variables
  # Set in your shell profile or systemd service
  export GEMINI_API_KEY="your-key"
- Restrict file permissions
  chmod 600 .env   # Only owner can read/write
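Running the `echo ".env" >> .gitignore` command more than once adds a duplicate entry each time. A small sketch of an idempotent variant is shown below. `ensure_line` is a hypothetical helper name, and the snippet assumes it is run from the project folder that holds `.env` and `.gitignore`.

```shell
# Append a line to a file only if that exact line is not already there.
# (Hypothetical helper for this sketch.)
ensure_line() {
  line=$1
  file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Keep .env out of version control
ensure_line ".env" .gitignore

# Tighten permissions only if a .env file is present
if [ -f .env ]; then
  chmod 600 .env
fi
```

The `grep -qxF` call matches the whole line as a fixed string, so an existing entry such as `.env.example` does not count as a match for `.env`.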
For production deployments:
# Run on private network
docker network create em-private
docker run --network em-private ...
# Use reverse proxy (nginx, traefik) for SSL
# Expose only ports 443 and 80 to internet
Don't:
- Share your API key publicly
- Commit it to GitHub
- Post it in screenshots
Do:
- Use .env files (more secure than command line)
- Keep .env files out of version control
- Regenerate your key if accidentally exposed
- First time? Start with a small Excel file to test
- Sample files? Check the data/sample_files/ folder for example Excel files to try
- Big files? The app can handle Excel files up to 50 MB
- Multiple tables? The AI detects and processes them separately
- Need help? Check the logs with docker logs data-prep-agent -f
Thank you for your interest in the Data Preparation Agent. This repository provides container access and documentation for evaluation and integration purposes. Support is provided on a best-effort basis.
If you believe you have found a reproducible issue, please open a GitHub Issue using the Bug Report template and include:
- Container version
- Deployment environment (OS, cloud, runtime)
- Steps to reproduce
- Relevant logs or error messages (sanitized)
Issues that cannot be reproduced or lack sufficient information may be closed.
Feature requests and enhancement suggestions are welcome. Please open a GitHub Issue using the Feature Request template and describe:
- The problem you are trying to solve
- The expected behavior
- Any relevant context or constraints
Feature requests are reviewed periodically. Implementation is not guaranteed.
Please review the documentation in this repository before opening an issue. If your question relates to architecture, enterprise integration, or production deployment, please contact us directly.
Please do not report security vulnerabilities through public GitHub issues.
Instead, email us at: security@emergence.ai
See our Security Policy for full details on reporting vulnerabilities.
- Issues are reviewed periodically
- Response times are not guaranteed
- Support via GitHub is limited to reproducible defects and documented behavior
For enterprise deployments, production use cases, or integration discussions, please contact: support@emergence.ai
Last Updated: February 2026
Version: Latest
Legal: Usage subject to Terms of Use