Installation Guide
Requirements
pyCTAKES requires Python 3.8 or higher and is tested on:
- Python 3.8, 3.9, 3.10, 3.11, 3.12
- Linux, macOS, and Windows
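A quick way to confirm the interpreter meets the stated minimum before installing is sketched below. The helper name `python_is_supported` is hypothetical and is not part of pyCTAKES; it uses only the standard library.

```python
import sys

# Minimum version stated in the requirements above.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the stated minimum."""
    return tuple(version_info[:2]) >= minimum


current = ".".join(str(part) for part in sys.version_info[:3])
if python_is_supported():
    print(f"Python {current} meets the 3.8+ requirement")
else:
    print(f"Python {current} is too old; pyCTAKES requires 3.8 or higher")
```

Run it with the same `python` executable you plan to use for `pip`, since several interpreters may be installed side by side.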
Basic Installation
Install from PyPI (Recommended)
pip install pyctakes
This installs pyCTAKES with basic dependencies for rule-based processing.
Development Installation
For the latest development version:
Optional Dependencies
pyCTAKES supports multiple NLP backends. Install additional packages for enhanced functionality:
spaCy (Recommended)
For advanced tokenization, POS tagging, and model-based NER:
# Install spaCy
pip install spacy
# Download English model
python -m spacy download en_core_web_sm
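# For clinical models (optional)
pip install scispacy
# scispaCy models are not served by `spacy download`; install the release
# archive that matches your scispacy version with pip (0.5.4 shown here)
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_core_sci_sm-0.5.4.tar.gz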
Stanza
Alternative NLP backend:
pip install stanza
UMLS Integration
For comprehensive concept mapping:
Complete Installation
For all features:
# Install with all optional dependencies
# (quotes keep shells such as zsh from expanding the brackets)
pip install "pyctakes[all]"
# Or install components separately
pip install pyctakes spacy scispacy stanza
python -m spacy download en_core_web_sm
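# scispaCy models install from a release archive, not `spacy download` (match the version to your scispacy)
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_core_sci_sm-0.5.4.tar.gz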
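To see which of the optional packages named in this guide are actually present in the current environment, a small standard-library check can help. This is a sketch, not a pyCTAKES feature: the `installed_modules` helper is hypothetical, and it assumes the spaCy and scispaCy models are installed as importable packages under their model names, which is how spaCy distributes them.

```python
from importlib.util import find_spec

# Import names of the optional packages mentioned in this guide.
# Installed spaCy models are importable under their model name.
OPTIONAL_MODULES = ["spacy", "scispacy", "stanza", "en_core_web_sm", "en_core_sci_sm"]


def installed_modules(names):
    """Map each top-level module name to True if it can be located for import."""
    return {name: find_spec(name) is not None for name in names}


for name, present in installed_modules(OPTIONAL_MODULES).items():
    print(f"{name:16} {'installed' if present else 'missing'}")
```

The check only locates the packages; it does not import them, so it stays fast even when large models are installed.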
Verification
Verify your installation:
import pyctakes
# Test basic functionality
pipeline = pyctakes.create_basic_pipeline()
result = pipeline.process_text("Patient has diabetes.")
print(f"Found {len(result.document.annotations)} annotations")
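If the import above fails, it can help to confirm whether the package is installed in the interpreter you are running. The sketch below uses the standard-library `importlib.metadata` module (available from Python 3.8); `installed_version` is a hypothetical helper, and `"pyctakes"` is assumed to be the distribution name because it is the name used in the pip commands in this guide.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "pyctakes" is the distribution name used in the pip commands in this guide.
found = installed_version("pyctakes")
print(f"pyctakes {found}" if found else "pyctakes is not installed in this environment")
```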
Docker Installation
Run pyCTAKES in Docker:
# Pull the image
docker pull sonish777/pyctakes:latest
# Run interactively
docker run -it sonish777/pyctakes:latest python
# Process a file
docker run -v "$(pwd)":/data sonish777/pyctakes:latest \
  pyctakes annotate /data/clinical_note.txt
Troubleshooting
Common Issues
1. spaCy model not found
# Download the missing model
python -m spacy download en_core_web_sm
2. Permission errors
# Install into your user site-packages instead of the system location
pip install --user pyctakes
3. Environment conflicts
# Use virtual environment
python -m venv pyctakes_env
source pyctakes_env/bin/activate # Linux/macOS
# or pyctakes_env\Scripts\activate # Windows
pip install pyctakes
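When diagnosing environment conflicts, it is useful to confirm that the virtual environment is actually active for the interpreter running your code. The following sketch relies on the standard behavior of `venv`, where `sys.prefix` differs from `sys.base_prefix` inside an environment. The `in_virtualenv` helper is hypothetical, and the check does not detect conda environments.

```python
import sys


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment and
    # sys.base_prefix at the interpreter it was created from.
    return prefix != base_prefix


print("interpreter:", sys.executable)
print("virtual environment active:", in_virtualenv())
```

If the printed interpreter path is not inside `pyctakes_env`, the environment was probably not activated in the current shell.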
Platform-Specific Notes
macOS Apple Silicon (M1/M2)
Windows
Linux
# Debian/Ubuntu: building native extensions may need additional system packages
sudo apt-get install python3-dev build-essential
pip install pyctakes
Performance Optimization
For optimal performance:
- Install spaCy models: enables model-based NER in addition to the rule-based processing of the basic install
- Use SSD storage: faster model loading
- Allocate sufficient RAM: 4 GB or more recommended for large models
- GPU support: install CUDA-compatible packages for transformer models
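Whether a CUDA GPU is visible to Python can be checked along these lines. This sketch assumes the transformer models run on PyTorch, which this guide does not state; the `cuda_available` helper is hypothetical, and `torch.cuda.is_available()` is the PyTorch call it wraps.

```python
from importlib.util import find_spec


def cuda_available():
    """Return True/False from PyTorch, or None when PyTorch is not installed."""
    if find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check also works without PyTorch

    return torch.cuda.is_available()


status = cuda_available()
if status is None:
    print("PyTorch is not installed; GPU availability was not checked")
else:
    print("CUDA GPU available:", status)
```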