this post was submitted on 20 Feb 2025
1 points (100.0% liked)

Self-Hosted Alternatives to Popular Services

224 readers
2 users here now

A place to share, discuss, discover, assist with, gain assistance for, and critique self-hosted alternatives to our favorite web apps, web...

founded 2 years ago
MODERATORS
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/selfhosted by /u/Electrical-Two9833 on 2025-02-19 22:46:18+00:00.


If you deal with documents and images and want to save time on parsing, analyzing, or describing them, PyVisionAI is for you. It unifies multiple Vision LLMs (GPT-4 Vision, Claude Vision, or local Llama2-based models) under one workflow, so you can extract text and images from PDF, DOCX, PPTX, and HTML—even capturing fully rendered web pages—and generate human-like explanations for images or diagrams.

Why It’s Useful

  • All-in-One: Handle text extraction and image description across various file types—no juggling separate scripts or libraries.
  • Flexible: Go with cloud-based GPT-4/Claude for speed, or local Llama models for privacy.
  • CLI & Python Library: Use simple terminal commands or integrate PyVisionAI right into your Python projects.
  • Multiple OS Support: Works on macOS (via Homebrew), Windows, and Linux (via pip).
  • No More Dependency Hassles: On macOS, just run one Homebrew command (plus a couple optional installs if you need advanced features).

Quick macOS Setup (Homebrew)

brew tap mdgrey33/pyvisionai
brew install pyvisionai

# Optional: Needed for dynamic HTML extraction
playwright install chromium

# Optional: For Office documents (DOCX, PPTX)
brew install --cask libreoffice

This leverages Python 3.11+ automatically (as required by the Homebrew formula). If you’re on Windows or Linux, you can install via pip install pyvisionai (Python 3.8+).

Core Features (Confirmed by the READMEs)

  1. Document Extraction
    • PDFs, DOCXs, PPTXs, HTML (with JS), and images are all fair game.
    • Extract text, tables, and even generate screenshots of HTML.
  2. Image Description
    • Analyze diagrams, charts, photos, or scanned pages using GPT-4, Claude, or a local Llama model via Ollama.
    • Customize your prompts to control the level of detail.
  3. CLI & Python API
    • CLI: file-extract for documents, describe-image for images.
    • Python: create_extractor(...) to handle large sets of files; describe_image_* functions for quick references in code.
  4. Performance & Reliability
    • Parallel processing, thorough logging, and automatic retries for rate-limited APIs.
    • Test coverage sits above 80%, so it’s stable enough for production scenarios.

Sample Code

from pyvisionai import create_extractor, describe_image_claude

# 1. Extract content from PDFs
extractor = create_extractor("pdf", model="gpt4")  # or "claude", "llama"
extractor.extract("quarterly_reports/", "analysis_out/")

# 2. Describe an image or diagram
desc = describe_image_claude(
    "circuit.jpg",
    prompt="Explain what this circuit does, focusing on the components"
)
print(desc)

Choose Your Model

  • Cloud:export OPENAI_API_KEY="your-openai-key" # GPT-4 Vision export ANTHROPIC_API_KEY="your-anthropic-key" # Claude Vision
  • Local:brew install ollama ollama pull llama2-vision # Then run: describe-image -i diagram.jpg -u llama

System Requirements

  • macOS (Homebrew install): Python 3.11+
  • Windows/Linux: Python 3.8+ via pip install pyvisionai
  • 1GB+ Free Disk Space (local models may require more)

Want More?

Help Shape the Future of PyVisionAI

If there’s a feature you need—maybe specialized document parsing, new prompt templates, or deeper local model integration—please ask or open a feature request on GitHub. I want PyVisionAI to fit right into your workflow, whether you’re doing academic research, business analysis, or general-purpose data wrangling.

Give it a try and share your ideas! I’d love to know how PyVisionAI can make your work easier.

no comments (yet)
sorted by: hot top controversial new old
there doesn't seem to be anything here