Extract entities and relationships for structured knowledge management
Memvid supports Named Entity Recognition (NER) for extracting structured entities from documents. This enables building knowledge graphs, entity-based search, and relationship mapping, turning unstructured text into structured knowledge.
```typescript
import { getEntityExtractor } from '@memvid/sdk';

// Initialize entity extractor
const ner = getEntityExtractor('openai', {
  entityTypes: ['COMPANY', 'PERSON', 'MONEY', 'DATE'],
});

console.log(`Provider: ${ner.name}`);
console.log(`Entity types: ${ner.entityTypes}`);

// Extract entities from text
const text = `Microsoft CEO Satya Nadella announced a $50 million investment in Seattle.
The deal closes December 2024 with Pinnacle Financial as lead investor.`;

// Second argument: minimum confidence threshold
const entities = await ner.extract(text, 0.5);

for (const entity of entities) {
  console.log(`  ${entity.name} (${entity.type}, ${entity.confidence.toFixed(2)})`);
}
```
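Each extracted entity carries a name, a type, and a confidence score. The Python sketch below, assuming each entity is a dict with `name`, `type`, and `confidence` keys, drops hits below a threshold and collapses repeated mentions of the same entity, keeping the highest score. The `dedupe_entities` helper and the sample data are hypothetical, not part of the SDK, and whether the SDK's own threshold is inclusive is not stated here.

```python
from typing import Any, Dict, List, Tuple

Entity = Dict[str, Any]


def dedupe_entities(entities: List[Entity], min_confidence: float = 0.5) -> List[Entity]:
    """Keep entities at or above min_confidence, one per (name, type); highest score wins."""
    best: Dict[Tuple[str, str], Entity] = {}
    for e in entities:
        if e["confidence"] < min_confidence:
            continue  # below threshold: discard
        key = (e["name"], e["type"])
        if key not in best or e["confidence"] > best[key]["confidence"]:
            best[key] = e
    # Sort by descending confidence for display
    return sorted(best.values(), key=lambda e: e["confidence"], reverse=True)


# Hypothetical extractor output with a repeated mention and a weak, mistyped hit
raw = [
    {"name": "Microsoft", "type": "COMPANY", "confidence": 0.91},
    {"name": "Satya Nadella", "type": "PERSON", "confidence": 0.97},
    {"name": "Microsoft", "type": "COMPANY", "confidence": 0.95},
    {"name": "Seattle", "type": "COMPANY", "confidence": 0.31},
]
for e in dedupe_entities(raw):
    print(f"  {e['name']} ({e['type']}, {e['confidence']:.2f})")
```

Sorting by confidence mirrors the quick-start output loop, so the most reliable entities print first.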
The default provider, `local`, uses DistilBERT-NER, a lightweight model for offline entity extraction.

Characteristics:
Model: DistilBERT fine-tuned on CoNLL-03
Size: ~261 MB (downloaded on first use)
Entity types: PERSON, ORG, LOCATION, MISC (fixed)
Inference: CPU-based, no GPU required
Privacy: All processing happens locally
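DistilBERT-NER is a token-classification model: it labels each word with a CoNLL-03 BIO tag (`B-PER`, `I-PER`, `B-ORG`, `O`, and so on), and those tags have to be merged into whole entities before you get results such as `Tim Cook`. The sketch below illustrates that general merging step, not Memvid's internal code. It assumes the input is already split into words, that CoNLL-03 labels map to the type names listed above via a hypothetical `LABEL_MAP`, and that an entity's confidence is the mean of its word scores.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumed mapping from CoNLL-03 labels to the type names reported by the local provider
LABEL_MAP = {"PER": "PERSON", "ORG": "ORG", "LOC": "LOCATION", "MISC": "MISC"}


def merge_bio(tagged: List[Tuple[str, str, float]]) -> List[Dict[str, Any]]:
    """Merge (word, BIO tag, score) triples into entity dicts."""
    entities: List[Dict[str, Any]] = []
    words: List[str] = []
    scores: List[float] = []
    label: Optional[str] = None

    def flush() -> None:
        # Emit the entity currently being built, if any, then reset the buffers
        if words:
            entities.append({
                "name": " ".join(words),
                "type": LABEL_MAP[label],
                "confidence": round(sum(scores) / len(scores), 2),
            })
        words.clear()
        scores.clear()

    for word, tag, score in tagged:
        prefix, _, tag_label = tag.partition("-")
        if prefix == "I" and tag_label == label and words:
            # Continuation of the current entity
            words.append(word)
            scores.append(score)
        elif prefix in ("B", "I"):
            # A new entity starts (a stray I- tag is treated like B-)
            flush()
            label = tag_label
            words.append(word)
            scores.append(score)
        else:
            # "O": outside any entity
            flush()
            label = None
    flush()
    return entities


# Hypothetical per-word model output for the sentence used below
tagged = [
    ("Apple", "B-ORG", 0.98), ("CEO", "O", 0.99),
    ("Tim", "B-PER", 0.98), ("Cook", "I-PER", 0.96),
    ("visited", "O", 0.99), ("Paris", "B-LOC", 0.95),
    ("headquarters", "O", 0.99), (".", "O", 0.99),
]
print(merge_bio(tagged))
```

Averaging word scores is only one way to assign a span confidence; taking the minimum is a stricter alternative.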
```python
from memvid_sdk.entities import get_entity_extractor, LocalNER

# Using factory
ner = get_entity_extractor('local')

# Or direct instantiation
ner = LocalNER(model='distilbert-ner')

# Extract entities
entities = ner.extract("Apple CEO Tim Cook visited Paris headquarters.")
# [
#     {'name': 'Apple', 'type': 'ORG', 'confidence': 0.98},
#     {'name': 'Tim Cook', 'type': 'PERSON', 'confidence': 0.97},
#     {'name': 'Paris', 'type': 'LOCATION', 'confidence': 0.95},
# ]
```
```typescript
import { getEntityExtractor, LocalNER } from '@memvid/sdk';

const ner = getEntityExtractor('local');
const entities = await ner.extract('Apple CEO Tim Cook visited Paris headquarters.');
```
Local NER uses a fixed set of entity types (PERSON, ORG, LOCATION, MISC). For custom entity types, use a cloud provider such as OpenAI, Claude, or Gemini. In Node.js, LocalNER requires a native build that exports NerModel; the prebuilt npm binaries may not include it.
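Because the local model's types are fixed, one practical pattern is to stay offline by default and fall back to a cloud provider only when a request asks for types the local model cannot produce. The helper below is a hypothetical sketch: `choose_provider` and its `'openai'` default are illustrative choices, and the returned string is meant to be passed to `get_entity_extractor`.

```python
from typing import Iterable

# Fixed label set of the local DistilBERT-NER provider
LOCAL_TYPES = frozenset({"PERSON", "ORG", "LOCATION", "MISC"})


def choose_provider(entity_types: Iterable[str], cloud: str = "openai") -> str:
    """Return 'local' when every requested type is supported locally, else the cloud provider."""
    requested = {t.upper() for t in entity_types}
    # Subset check: all requested types must be in the local label set
    return "local" if requested <= LOCAL_TYPES else cloud


print(choose_provider(["person", "ORG"]))      # local
print(choose_provider(["COMPANY", "PERSON"]))  # openai
```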
```python
from memvid_sdk.entities import get_entity_extractor, GeminiEntities

text = "Microsoft CEO Satya Nadella announced a $50 million investment in Seattle."

# Using factory
ner = get_entity_extractor('gemini', entity_types=['COMPANY', 'PERSON'])

# With specific model
ner = get_entity_extractor('gemini:gemini-2.0-flash', entity_types=['COMPANY'])

# Filter out low-confidence entities
entities = ner.extract(text, min_confidence=0.5)
```
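```typescript
import { getEntityExtractor, GeminiEntities } from '@memvid/sdk';

const text = 'Microsoft CEO Satya Nadella announced a $50 million investment in Seattle.';

const ner = getEntityExtractor('gemini', {
  entityTypes: ['COMPANY', 'PERSON'],
});

// Second argument: minimum confidence threshold
const entities = await ner.extract(text, 0.5);
```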
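The factory accepts either a bare provider name (`'gemini'`) or a `provider:model` string (`'gemini:gemini-2.0-flash'`). The sketch below shows how such a spec can be split. `parse_provider_spec` is hypothetical, and treating everything after the first colon as the model name is an assumption, not documented SDK behavior.

```python
from typing import Optional, Tuple


def parse_provider_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'provider[:model]' into (provider, model or None)."""
    # partition splits on the first ':' only, so model names may contain ':'
    provider, sep, model = spec.partition(":")
    return provider.strip(), (model.strip() or None) if sep else None


print(parse_provider_spec("gemini:gemini-2.0-flash"))  # ('gemini', 'gemini-2.0-flash')
print(parse_provider_spec("gemini"))                   # ('gemini', None)
```

Returning `None` for a missing model leaves the choice of default model to the provider.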
Extract structured data from unstructured documents:
```python
from memvid_sdk.entities import get_entity_extractor

# Process legal contracts with custom entity types
ner = get_entity_extractor('claude', entity_types=[
    'PARTY', 'DATE', 'MONEY', 'TERM', 'JURISDICTION',
])

contract_text = "Agreement between Acme Corp and Beta Inc dated January 15, 2024..."
entities = ner.extract(contract_text)

# Build structured contract summary
parties = [e['name'] for e in entities if e['type'] == 'PARTY']
dates = [e['name'] for e in entities if e['type'] == 'DATE']
```
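To turn the flat entity list into a complete contract summary, you can group names by type in one pass instead of writing one list comprehension per type. A minimal sketch, assuming the entity dict shape used above; `group_by_type` and the sample entities are made up for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_type(entities: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each entity type to its distinct names, in first-seen order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for e in entities:
        names = grouped[e["type"]]
        if e["name"] not in names:  # skip repeated mentions
            names.append(e["name"])
    return dict(grouped)


# Hypothetical output of ner.extract(contract_text)
entities = [
    {"name": "Acme Corp", "type": "PARTY", "confidence": 0.96},
    {"name": "Beta Inc", "type": "PARTY", "confidence": 0.95},
    {"name": "January 15, 2024", "type": "DATE", "confidence": 0.99},
    {"name": "Acme Corp", "type": "PARTY", "confidence": 0.90},
]
summary = group_by_type(entities)
print(summary)
# {'PARTY': ['Acme Corp', 'Beta Inc'], 'DATE': ['January 15, 2024']}
```

Types with no matches simply have no key, so use `summary.get('MONEY', [])` when a field may be absent.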
```python
from collections import defaultdict
from itertools import combinations

# Assumes `documents` (objects with .id and .text) and `ner` from the examples above

# Extract entities from multiple documents, tagging each with its source
all_entities = []
for doc in documents:
    entities = ner.extract(doc.text)
    for e in entities:
        e['source_doc'] = doc.id
    all_entities.extend(entities)

# Collect the distinct entity names found in each document
names_by_doc = defaultdict(set)
for e in all_entities:
    names_by_doc[e['source_doc']].add(e['name'])

# Build co-occurrence graph: count the documents in which each pair appears together
co_occurrences = defaultdict(int)
for names in names_by_doc.values():
    for pair in combinations(sorted(names), 2):
        co_occurrences[pair] += 1
```
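Once `co_occurrences` maps sorted name pairs to counts, it can be read as a weighted, undirected graph. The sketch below turns the pair counts into an adjacency map and ranks an entity's strongest neighbors; `build_adjacency`, `related_entities`, and the counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_adjacency(co_occurrences: Dict[Tuple[str, str], int]) -> Dict[str, Dict[str, int]]:
    """Turn pair counts into an undirected adjacency map: name -> {neighbor: weight}."""
    graph: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (a, b), count in co_occurrences.items():
        graph[a][b] = count
        graph[b][a] = count
    return graph


def related_entities(graph: Dict[str, Dict[str, int]], name: str,
                     top_n: int = 3) -> List[Tuple[str, int]]:
    """Return the top_n neighbors of name, strongest first (ties broken alphabetically)."""
    neighbors = graph.get(name, {})  # .get avoids inserting unknown names
    return sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical counts in the shape produced by the loop above
co_occurrences = {
    ("Microsoft", "Satya Nadella"): 4,
    ("Microsoft", "Seattle"): 2,
    ("Pinnacle Financial", "Seattle"): 1,
    ("Microsoft", "Pinnacle Financial"): 2,
}
graph = build_adjacency(co_occurrences)
print(related_entities(graph, "Microsoft"))
# [('Satya Nadella', 4), ('Pinnacle Financial', 2), ('Seattle', 2)]
```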
```python
# Store entities with documents
# (assumes `mem`, `ner`, and `documents` are already set up as in the examples above)
for doc in documents:
    entities = ner.extract(doc.text)
    frame_id = mem.put(
        title=doc.title,
        label='document',
        metadata={
            'entities': entities,
            'companies': [e['name'] for e in entities if e['type'] == 'COMPANY'],
            'people': [e['name'] for e in entities if e['type'] == 'PERSON'],
        },
        text=doc.text,
    )

# Search by entity
results = mem.find('Microsoft', k=10)
```
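The search above goes through `mem.find`. To also get exact lookups on the extracted entity lists (for example, every document whose entities include both `Microsoft` and `Satya Nadella`), one option is to keep an inverted index alongside the memory, built from the same per-document lists stored in `metadata`. A minimal in-memory sketch; `build_entity_index`, `docs_with_all`, and the sample documents are hypothetical, and the index lives outside Memvid.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_entity_index(docs: Dict[str, List[dict]]) -> Dict[str, Set[str]]:
    """Map each entity name to the set of document ids whose entity list contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, entities in docs.items():
        for e in entities:
            index[e["name"]].add(doc_id)
    return index


def docs_with_all(index: Dict[str, Set[str]], *names: str) -> Set[str]:
    """Ids of documents that mention every given entity (a new set, never the index's own)."""
    sets = [index.get(n, set()) for n in names]
    return set.intersection(*sets) if sets else set()


# Hypothetical per-document entity lists, as stored under metadata['entities']
docs = {
    "doc-1": [{"name": "Microsoft", "type": "COMPANY"}, {"name": "Satya Nadella", "type": "PERSON"}],
    "doc-2": [{"name": "Microsoft", "type": "COMPANY"}],
    "doc-3": [{"name": "Pinnacle Financial", "type": "COMPANY"}],
}
index = build_entity_index(docs)
print(sorted(docs_with_all(index, "Microsoft")))                   # ['doc-1', 'doc-2']
print(sorted(docs_with_all(index, "Microsoft", "Satya Nadella")))  # ['doc-1']
```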