A comprehensive Python framework inspired by Mayo Clinic's MedTagger, implementing dictionary matching, pattern extraction, ML-based NER, section detection, and concept normalization.
High-performance flashtext matching against CSV, JSON, MedLex, UMLS, and OMOP dictionaries
Regex and spaCy patterns for clinical entities like vital signs, medications, and demographics
spaCy and Transformers integration with BioBERT/ClinicalBERT support
Automatic detection of clinical document sections (Chief Complaint, HPI, Assessment, etc.)
pyConText-style analysis for clinical context understanding
Concept mapping to standard medical vocabularies
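As a rough illustration of the pattern-extraction and section-detection features listed above, the sketch below uses only the standard library. The regular expressions and function names (`SECTION_RE`, `BP_RE`, `find_sections`, `find_blood_pressure`) are assumptions made for this example and are not taken from the framework's own rule set.

```python
import re

# Hypothetical patterns for illustration only; the framework's real rules may differ.
# A section header is assumed to be an upper-case phrase at line start followed by a colon.
SECTION_RE = re.compile(r"^(?P<name>[A-Z][A-Z /]+):", re.MULTILINE)
# A blood pressure reading is assumed to look like "140/90 mmHg".
BP_RE = re.compile(
    r"(?P<systolic>\d{2,3})\s*/\s*(?P<diastolic>\d{2,3})\s*mmHg", re.IGNORECASE
)


def find_sections(text):
    """Return (section_name, start_offset) for each header such as 'CHIEF COMPLAINT:'."""
    return [(m.group("name").strip(), m.start()) for m in SECTION_RE.finditer(text)]


def find_blood_pressure(text):
    """Return (systolic, diastolic) integer pairs for each reading found in the text."""
    return [
        (int(m.group("systolic")), int(m.group("diastolic")))
        for m in BP_RE.finditer(text)
    ]


note = "CHIEF COMPLAINT: Chest pain\nBlood pressure is 140/90 mmHg.\n"
sections = find_sections(note)
readings = find_blood_pressure(note)
print(sections)
print(readings)
```

Keeping each pattern as a named, compiled expression makes it easy to test rules one at a time before combining them in a pipeline.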
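The pyConText-style context analysis mentioned in the feature list can be pictured with a much-simplified negation check. This is a sketch under stated assumptions: the trigger list, the token window, and the `is_negated` helper are hypothetical and far smaller than a real pyConText rule set, which also handles scope termination and other context types.

```python
import re

# Illustrative trigger terms only; real rule sets are considerably larger.
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")


def is_negated(sentence, target, window=5):
    """Return True if a trigger appears within `window` words before `target`."""
    lowered = sentence.lower()
    idx = lowered.find(target.lower())
    if idx == -1:
        return False  # target not present in this sentence
    # Keep only the last `window` words that precede the target mention.
    preceding = re.findall(r"[a-z]+", lowered[:idx])[-window:]
    scope = " ".join(preceding)
    return any(
        re.search(rf"\b{re.escape(trigger)}\b", scope)
        for trigger in NEGATION_TRIGGERS
    )


print(is_negated("Patient denies chest pain.", "chest pain"))
print(is_negated("Patient reports chest pain.", "chest pain"))
```

Limiting the search to a short window before the mention is a common simplification; it avoids marking a finding as negated because of a trigger that appears much earlier in a long sentence.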
# Clone the repository
git clone https://github.com/sonishsivarajkumar/Agentic-MedTagger.git
cd Agentic-MedTagger
# Install dependencies
pip install -r requirements.txt
# Download spaCy models
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_trf
import asyncio

from agentic_medtagger.core.pipeline import create_medtagger_pipeline
from agentic_medtagger.core.document import Document

async def main():
    # Create pipeline
    pipeline = create_medtagger_pipeline()
    # Process clinical text
    text = """
CHIEF COMPLAINT: Chest pain
Blood pressure is 140/90 mmHg.
"""
    document = Document(text=text, document_id="note_001")
    # process_document is awaited, so it has to run inside a coroutine
    processed_doc = await pipeline.process_document(document)
    # Access annotations
    for annotation in processed_doc.annotations:
        print(f"{annotation.text} -> {annotation.label}")

# Run the coroutine from a regular script
asyncio.run(main())
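Once annotations are available, it is often convenient to group them by label. The sketch below assumes only the `text` and `label` attributes used in the example above; `StubAnnotation` is a hypothetical stand-in so the snippet runs without the framework installed, and the label values are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StubAnnotation:
    """Hypothetical stand-in exposing only the `text` and `label` attributes."""
    text: str
    label: str


def group_by_label(annotations):
    """Collect annotation texts under their label, preserving input order."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.label].append(annotation.text)
    return dict(grouped)


# Label names here are made up for the example, not the framework's label set.
sample = [
    StubAnnotation("Chest pain", "PROBLEM"),
    StubAnnotation("140/90 mmHg", "VITAL_SIGN"),
]
grouped = group_by_label(sample)
for label, texts in grouped.items():
    print(f"{label}: {texts}")
```

Because the helper relies on attribute access alone, the same function should work on `processed_doc.annotations` as long as those objects expose `text` and `label`.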
# Run comprehensive demo
python demo_medtagger.py
# Run tests
pytest tests/ -v