Enable image and visual search with CLIP embeddings in Memvid
Memvid supports CLIP (Contrastive Language-Image Pre-training) embeddings for visual search. This lets you search documents, PDFs, and images by their visual content, including charts, diagrams, and photos, using natural language queries.
The default provider uses MobileCLIP-S2, a lightweight CLIP model optimized for mobile and edge devices.
Characteristics:
Dimensions: 512
Size: ~200 MB (downloaded on first use)
Inference: CPU-based, no GPU required
Privacy: All processing happens locally
Offline: Works without internet after initial download
from memvid_sdk.clip import get_clip_provider, LocalClip

# Using factory
clip = get_clip_provider('local')

# Or direct instantiation
clip = LocalClip(model='mobileclip-s2')

# Embed an image
image_embedding = clip.embed_image('photo.jpg')

# Embed text for search
text_embedding = clip.embed_text('sunset over ocean')

# Batch embed multiple images
embeddings = clip.embed_images(['img1.jpg', 'img2.jpg', 'img3.jpg'])
import { getClipProvider, LocalClip } from '@memvid/sdk';

// Using factory
const clip = getClipProvider('local');

// Or direct instantiation
const clip2 = new LocalClip({ model: 'mobileclip-s2' });

// Embed an image
const imageEmbedding = await clip.embedImage('photo.jpg');

// Embed text for search
const textEmbedding = await clip.embedText('sunset over ocean');

// Batch embed multiple images
const embeddings = await clip.embedImages(['img1.jpg', 'img2.jpg', 'img3.jpg']);
Local CLIP is supported in memvid-core and the Python SDK. In Node.js, LocalClip requires a native build that exports ClipModel (the prebuilt npm binaries may not include it). Cloud providers work out of the box.
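Because CLIP places images and text in a shared vector space, a visual search is just a nearest-neighbour lookup over image embeddings. The sketch below ranks images against a text query with cosine similarity; it assumes numpy and the local provider from the examples above, and the rank_images helper is illustrative glue code, not part of the SDK.

import numpy as np
from memvid_sdk.clip import get_clip_provider

clip = get_clip_provider('local')

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_images(query, image_paths):
    # Embed the query text and every image into the shared 512-dim space,
    # then sort images by cosine similarity to the query.
    query_vec = clip.embed_text(query)
    image_vecs = clip.embed_images(image_paths)
    scored = [(path, cosine(query_vec, vec)) for path, vec in zip(image_paths, image_vecs)]
    return sorted(scored, key=lambda item: item[1], reverse=True)

for path, score in rank_images('bar chart of quarterly revenue', ['img1.jpg', 'img2.jpg', 'img3.jpg']):
    print(f"{score:.3f}  {path}")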
OpenAI’s embedding models provide excellent quality for visual search queries.
Setup:
export OPENAI_API_KEY=sk-your-key-here
Usage:
from memvid_sdk.clip import get_clip_provider, OpenAIClip

# Using factory
clip = get_clip_provider('openai')

# Or with specific model
clip = get_clip_provider('openai:text-embedding-3-large')

# Direct instantiation
clip = OpenAIClip(model='text-embedding-3-small')

# Embed text for visual search
embedding = clip.embed_text('executive team photo')
print(f"Dimensions: {len(embedding)}")
import { getClipProvider, OpenAIClip } from '@memvid/sdk';

// Using factory
const clip = getClipProvider('openai');

// Override embedding/vision models via config
const clip2 = getClipProvider('openai', {
  embeddingModel: 'text-embedding-3-large',
  visionModel: 'gpt-4o-mini',
});

// Direct instantiation
const clip3 = new OpenAIClip({
  embeddingModel: 'text-embedding-3-small',
  visionModel: 'gpt-4o-mini',
});

// Embed text for visual search
const embedding = await clip.embedText('executive team photo');
console.log(`Dimensions: ${embedding.length}`);
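The OpenAI provider needs OPENAI_API_KEY at runtime, so a common pattern is to fall back to the local model when no key is configured. A minimal sketch, assuming only the get_clip_provider factory shown above; the provider-selection logic is the application's, not the SDK's:

import os
from memvid_sdk.clip import get_clip_provider

# Prefer the cloud provider when a key is available, otherwise stay fully local.
provider = 'openai' if os.environ.get('OPENAI_API_KEY') else 'local'
clip = get_clip_provider(provider)

embedding = clip.embed_text('executive team photo')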
Google’s Gemini provides multimodal embeddings for visual search.
Setup:
export GEMINI_API_KEY=your-key-here
Usage:
from memvid_sdk.clip import get_clip_provider, GeminiClip

# Using factory
clip = get_clip_provider('gemini')

# Or with specific model
clip = get_clip_provider('gemini:embedding-001')

# Direct instantiation
clip = GeminiClip(model='embedding-001')

# Embed text
embedding = clip.embed_text('data visualization dashboard')
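Cloud providers bill per request, so repeated queries are worth caching. A small illustrative memoization wrapper, assuming only the embed_text method shown above; the embed_query helper is not part of the SDK:

from functools import lru_cache
from memvid_sdk.clip import get_clip_provider

clip = get_clip_provider('gemini')

@lru_cache(maxsize=1024)
def embed_query(text: str):
    # Cache query embeddings so repeated searches don't re-call the API.
    return tuple(clip.embed_text(text))

first = embed_query('data visualization dashboard')   # API call
second = embed_query('data visualization dashboard')  # served from the cache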
Combine visual and text search for richer retrieval:
# Store documents with visual embeddings
for pdf in pdfs:
    # Text for lexical search
    frame_id = mem.put(title=pdf.name, file=str(pdf))

    # Visual embedding for image search
    visual_embedding = clip.embed_image(pdf.thumbnail_path)

# Hybrid search combines both modalities
results = mem.find(query, mode='auto')
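How the visual embeddings are persisted depends on your setup. One illustrative approach is to keep a side index of frame_id to visual embedding yourself and rank frames against the query's CLIP text embedding, then merge those hits with the lexical results from mem.find. The visual_index dict and visual_search helper below are hypothetical application code, not SDK APIs; only mem.put, mem.find, embed_image, and embed_text come from the examples above, and numpy is assumed.

import numpy as np

# Hypothetical side index built during ingestion: frame_id -> visual embedding.
visual_index = {}
for pdf in pdfs:
    frame_id = mem.put(title=pdf.name, file=str(pdf))
    visual_index[frame_id] = np.asarray(clip.embed_image(pdf.thumbnail_path))

def visual_search(query, top_k=5):
    # Rank stored frames by cosine similarity between their visual embedding
    # and the CLIP embedding of the text query.
    query_vec = np.asarray(clip.embed_text(query))
    query_vec /= np.linalg.norm(query_vec)
    scores = {
        frame_id: float(vec @ query_vec / np.linalg.norm(vec))
        for frame_id, vec in visual_index.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]

# Visual hits can then be merged with the lexical results of mem.find(query, mode='auto').
print(visual_search('architecture diagram with load balancer'))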