Production-Ready Clinical NLP

A comprehensive Python framework inspired by Mayo Clinic's MedTagger, implementing dictionary matching, pattern extraction, ML-based NER, section detection, and concept normalization.

Key Features

📚

Dictionary Matching

High-performance flashtext matching against CSV, JSON, MedLex, UMLS, and OMOP dictionaries
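
To make the idea of dictionary matching concrete, here is a minimal standard-library sketch of greedy longest-match lookup over word tokens. It is a stand-in for illustration only, not the framework's flashtext-backed matcher; `build_index`, `match_terms`, and the labels in `terms` are hypothetical names chosen for this example.

```python
import re

def build_index(terms):
    """Map each lower-cased term's token tuple to its label."""
    return {tuple(term.lower().split()): label for term, label in terms.items()}

def match_terms(text, index):
    """Greedy longest-match lookup of dictionary terms over word tokens."""
    tokens = [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    max_len = max((len(key) for key in index), default=0)
    matches, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first so "chest pain" beats "pain"
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in index:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append((text[start:end], index[key], start, end))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return matches

# Hypothetical mini-dictionary; labels are illustrative, not real vocabulary codes
terms = {"chest pain": "SYMPTOM", "pain": "SYMPTOM", "blood pressure": "VITAL_SIGN"}
index = build_index(terms)
matches = match_terms("Chest pain noted. Blood pressure is 140/90 mmHg.", index)
```

Each match carries the surface text, its label, and character offsets, which is the kind of information an annotation usually needs.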

🔍

Pattern Extraction

Regex and spaCy patterns for clinical entities like vital signs, medications, and demographics
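
As a rough sketch of regex-based pattern extraction, the example below pulls a blood pressure reading out of free text. The pattern and the `extract_blood_pressure` helper are assumptions made for illustration; the framework's own patterns may differ.

```python
import re

# Illustrative pattern: a "blood pressure"/"BP" cue, then systolic/diastolic, then an optional mmHg unit
BP_PATTERN = re.compile(
    r"\b(?:blood pressure|bp)\b\D{0,15}?"
    r"(?P<systolic>\d{2,3})\s*/\s*(?P<diastolic>\d{2,3})"
    r"(?:\s*mm\s?hg)?",
    re.IGNORECASE,
)

def extract_blood_pressure(text):
    """Return one dict per blood pressure mention found in the text."""
    return [
        {
            "text": m.group(0),
            "systolic": int(m.group("systolic")),
            "diastolic": int(m.group("diastolic")),
            "span": m.span(),
        }
        for m in BP_PATTERN.finditer(text)
    ]

readings = extract_blood_pressure("Blood pressure is 140/90 mmHg.")
```

Requiring the cue phrase before the numbers keeps the pattern from firing on unrelated fractions such as dates.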

🤖

ML-Based NER

spaCy and Transformers integration with BioBERT/ClinicalBERT support

📄

Section Detection

Automatic detection of clinical document sections (Chief Complaint, HPI, Assessment, etc.)
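
One simple way to approach section detection is to look for known headers at the start of a line and treat everything up to the next header as the section body. The sketch below assumes a small, hypothetical header list (`SECTION_HEADERS`) and is not the framework's actual detector.

```python
import re

# Hypothetical header list; a real system would use a larger, configurable set
SECTION_HEADERS = ["CHIEF COMPLAINT", "HPI", "HISTORY OF PRESENT ILLNESS", "ASSESSMENT", "PLAN"]
HEADER_RE = re.compile(
    r"^[ \t]*(?P<header>" + "|".join(re.escape(h) for h in SECTION_HEADERS) + r")[ \t]*:",
    re.IGNORECASE | re.MULTILINE,
)

def detect_sections(text):
    """Return (header, body) pairs, where each body runs to the next header."""
    hits = list(HEADER_RE.finditer(text))
    sections = []
    for i, m in enumerate(hits):
        end = hits[i + 1].start() if i + 1 < len(hits) else len(text)
        sections.append((m.group("header").upper(), text[m.end():end].strip()))
    return sections

note = "CHIEF COMPLAINT: Chest pain\nBlood pressure is 140/90 mmHg.\nASSESSMENT: Hypertension"
sections = detect_sections(note)
```

Text that appears before the first recognized header is ignored in this sketch.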

💭

Assertion & Negation

pyConText-style analysis for clinical context understanding
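
The core idea behind this style of analysis can be sketched in a few lines: a trigger term opens a negation scope that extends forward until a terminating word. The example below is a heavily simplified illustration with hypothetical trigger lists, not pyConText itself and not the framework's implementation.

```python
import re

# Small illustrative trigger lists; curated rule sets are far richer
NEGATION_TRIGGERS = ["no", "denies", "without", "negative for"]
TERMINATORS = ["but", "however", "except"]

def is_negated(sentence, concept):
    """Return True if a negation trigger precedes the concept with no terminator in between."""
    lowered = sentence.lower()
    pos = lowered.find(concept.lower())
    if pos == -1:
        return False
    before = lowered[:pos]
    # Only the text after the last terminator is still inside a trigger's scope
    cuts = [m.end() for t in TERMINATORS for m in re.finditer(rf"\b{re.escape(t)}\b", before)]
    scope = before[max(cuts):] if cuts else before
    return any(re.search(rf"\b{re.escape(t)}\b", scope) for t in NEGATION_TRIGGERS)

negated = is_negated("Patient denies chest pain.", "chest pain")
affirmed = is_negated("No fever but reports chest pain.", "chest pain")
```

In the second sentence "but" closes the scope opened by "No", so the chest pain mention is treated as affirmed while the fever mention stays negated.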

🏷️

OMOP/UMLS Normalization

Concept mapping to standard medical vocabularies
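
At its simplest, normalization is a lookup from surface mentions to a shared concept record, so that synonyms resolve to the same identifier. The sketch below uses a hypothetical in-memory table; the concept IDs are placeholders, not real OMOP or UMLS codes, and `normalize` is a name invented for this example.

```python
# Hypothetical lookup table: the concept IDs are placeholders, not real OMOP or UMLS codes
CONCEPT_TABLE = {
    "chest pain": {"concept_id": "EXAMPLE:0001", "preferred_name": "Chest pain"},
    "high blood pressure": {"concept_id": "EXAMPLE:0002", "preferred_name": "Hypertension"},
    "hypertension": {"concept_id": "EXAMPLE:0002", "preferred_name": "Hypertension"},
}

def normalize(mention):
    """Map a surface mention to a concept record, or None if unmapped."""
    key = " ".join(mention.lower().split())  # case-fold and collapse whitespace
    return CONCEPT_TABLE.get(key)

record = normalize("High  Blood Pressure")
```

Returning `None` for unmapped mentions lets the caller decide whether to keep the raw annotation or drop it.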

Performance Metrics

Processing time per document: ~0.1s

Comprehensive tests: 27

Core NLP components: 6

Test pass rate: 100%

Getting Started

Quick Installation

# Clone the repository
git clone https://github.com/sonishsivarajkumar/Agentic-MedTagger.git
cd Agentic-MedTagger

# Install dependencies
pip install -r requirements.txt

# Download spaCy models
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_trf

Basic Usage

import asyncio

from agentic_medtagger.core.pipeline import create_medtagger_pipeline
from agentic_medtagger.core.document import Document

# Clinical text to process
text = """
CHIEF COMPLAINT: Chest pain
Blood pressure is 140/90 mmHg.
"""


async def main():
    # Create pipeline
    pipeline = create_medtagger_pipeline()

    # process_document is awaited, so it has to run inside an async function
    document = Document(text=text, document_id="note_001")
    processed_doc = await pipeline.process_document(document)

    # Access annotations
    for annotation in processed_doc.annotations:
        print(f"{annotation.text} -> {annotation.label}")


# A bare top-level await only works in notebooks or the asyncio REPL
asyncio.run(main())

Run the Demo

# Run comprehensive demo
python demo_medtagger.py

# Run tests
pytest tests/ -v