Self-Hosted Alternatives to Popular Services

PyVisionAI: Instantly Extract & Describe Content from Documents with Vision LLMs (Now with Claude and Homebrew)

submitted 6 months ago by bot@lemmit.online to c/selfhosted@lemmit.online

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/selfhosted by /u/Electrical-Two9833 on 2025-02-19 22:46:18+00:00.

If you deal with documents and images and want to save time on parsing, analyzing, or describing them, PyVisionAI is for you. It unifies multiple Vision LLMs (GPT-4 Vision, Claude Vision, or local Llama2-based models) under one workflow, so you can extract text and images from PDF, DOCX, PPTX, and HTML—even capturing fully rendered web pages—and generate human-like explanations for images or diagrams.

Why It’s Useful

All-in-One: Handle text extraction and image description across various file types—no juggling separate scripts or libraries.
Flexible: Go with cloud-based GPT-4/Claude for speed, or local Llama models for privacy.
CLI & Python Library: Use simple terminal commands or integrate PyVisionAI right into your Python projects.
Multiple OS Support: Works on macOS (via Homebrew), Windows, and Linux (via pip).
No More Dependency Hassles: On macOS, a Homebrew tap and install covers the basics (plus a couple of optional installs if you need advanced features).

Quick macOS Setup (Homebrew)

brew tap mdgrey33/pyvisionai
brew install pyvisionai

# Optional: Needed for dynamic HTML extraction
playwright install chromium

# Optional: For Office documents (DOCX, PPTX)
brew install --cask libreoffice

The Homebrew formula requires Python 3.11+ and pulls it in automatically. On Windows or Linux, install with pip install pyvisionai (Python 3.8+).

Core Features (Confirmed by the READMEs)

Document Extraction
- PDFs, DOCXs, PPTXs, HTML (with JS), and images are all fair game.
- Extract text, tables, and even generate screenshots of HTML.
Image Description
- Analyze diagrams, charts, photos, or scanned pages using GPT-4, Claude, or a local Llama model via Ollama.
- Customize your prompts to control the level of detail.
CLI & Python API
- CLI: file-extract for documents, describe-image for images.
- Python: create_extractor(...) to process whole folders of files; describe_image_* functions (such as describe_image_claude) for one-off image descriptions in code.
Performance & Reliability
- Parallel processing, thorough logging, and automatic retries for rate-limited APIs.
- Test coverage sits above 80%, so it’s stable enough for production scenarios.
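
The post mentions automatic retries for rate-limited APIs but does not show how PyVisionAI implements them. As a rough illustration of the general technique only, here is a minimal exponential-backoff sketch. RateLimitError and call_with_retries are hypothetical names made up for this example and are not part of PyVisionAI.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever error a provider raises when rate-limited."""


def call_with_retries(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            # Delay doubles on each retry: base, 2*base, 4*base, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Up to 10% random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The sleep parameter is injectable so the function can be exercised without actually waiting.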

Sample Code

from pyvisionai import create_extractor, describe_image_claude

# 1. Extract content from PDFs
extractor = create_extractor("pdf", model="gpt4")  # or "claude", "llama"
extractor.extract("quarterly_reports/", "analysis_out/")

# 2. Describe an image or diagram
desc = describe_image_claude(
    "circuit.jpg",
    prompt="Explain what this circuit does, focusing on the components"
)
print(desc)
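
If you have a whole folder of images rather than a single file, the same call can be wrapped in a small loop. This is only a sketch: describe_folder and IMAGE_SUFFIXES are hypothetical helpers, not part of PyVisionAI, and the describe function is passed in as a parameter on the assumption that it accepts a path and a prompt keyword, as in the sample above.

```python
from pathlib import Path

# Assumption: extend this set to match the image types you actually have.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def describe_folder(folder, describe_fn, prompt):
    """Return {filename: description} for each image directly inside folder."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # Call shape mirrors the sample: describe_fn(path, prompt=...)
            results[path.name] = describe_fn(str(path), prompt=prompt)
    return results


# Usage with PyVisionAI (needs the package installed and ANTHROPIC_API_KEY set):
#   from pyvisionai import describe_image_claude
#   results = describe_folder("diagrams/", describe_image_claude, "Summarize this diagram")
```

Passing the describe function in keeps the loop independent of the backend, so swapping models means changing only the argument.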

Choose Your Model

Cloud:
export OPENAI_API_KEY="your-openai-key"        # GPT-4 Vision
export ANTHROPIC_API_KEY="your-anthropic-key"  # Claude Vision

Local:
brew install ollama
ollama pull llama2-vision
# Then run:
describe-image -i diagram.jpg -u llama
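
In a script, you might choose the model based on which API key is present. The sketch below is one way to do that. pick_model is a hypothetical helper, the model names are taken from the comment in the sample code ("gpt4", "claude", "llama"), and the precedence (OpenAI first, then Anthropic, then local) is an arbitrary assumption. The post does not say whether PyVisionAI does any selection of its own.

```python
import os


def pick_model(env=None):
    """Return "gpt4", "claude", or "llama" depending on which API key is set."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "gpt4"
    if env.get("ANTHROPIC_API_KEY"):
        return "claude"
    # No cloud key found (or the value is empty): fall back to the local model.
    return "llama"
```

The result could then be passed along, for example as create_extractor("pdf", model=pick_model()).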

System Requirements

macOS (Homebrew install): Python 3.11+
Windows/Linux: Python 3.8+ via pip install pyvisionai
1GB+ Free Disk Space (local models may require more)
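
To check these requirements before installing, something like the following should work using only the standard library. check_requirements is a hypothetical helper written for this post, and the defaults simply mirror the pip numbers listed above (Python 3.8+, 1 GB free).

```python
import shutil
import sys


def check_requirements(min_python=(3, 8), min_free_gb=1.0, path="."):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    # Free space on the filesystem that contains `path`, in GiB.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(
            f"{min_free_gb} GB free disk space required, found {free_gb:.1f} GB"
        )
    return problems
```

For the Homebrew route, call it with min_python=(3, 11) instead.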

Want More?

Official Site: pyvisionai.com
GitHub: MDGrey33/pyvisionai – open issues or PRs if you spot bugs!
Docs: Full README & Usage
Homebrew Formula: mdgrey33/homebrew-pyvisionai

Help Shape the Future of PyVisionAI

If there’s a feature you need—maybe specialized document parsing, new prompt templates, or deeper local model integration—please ask or open a feature request on GitHub. I want PyVisionAI to fit right into your workflow, whether you’re doing academic research, business analysis, or general-purpose data wrangling.

Give it a try and share your ideas! I’d love to know how PyVisionAI can make your work easier.
