Command Line Interface
pyCTAKES provides a comprehensive command-line interface for processing clinical text without writing code.
Basic Usage
# Process a single file
pyctakes process input.txt
# Process with output file
pyctakes process input.txt --output output.json
# Process multiple files
pyctakes process *.txt --output-dir results/
# Use custom configuration
pyctakes process input.txt --config my_config.json
Commands
process
Process clinical text files through the pyCTAKES pipeline.
Arguments:
- INPUT_FILES
: One or more input text files to process
Options:
- --output, -o TEXT
: Output file path (default: stdout)
- --output-dir, -d TEXT
: Output directory for multiple files
- --config, -c TEXT
: Configuration file path
- --pipeline, -p TEXT
: Pipeline name (default, fast, basic, custom)
- --format, -f TEXT
: Output format (json, xml, text)
- --verbose, -v
: Enable verbose logging
- --quiet, -q
: Suppress all output except results
Examples:
# Basic processing
pyctakes process note.txt
# Save to file
pyctakes process note.txt --output results.json
# Process multiple files
pyctakes process notes/*.txt --output-dir processed/
# Use fast pipeline
pyctakes process note.txt --pipeline fast
# Custom configuration
pyctakes process note.txt --config my_config.json --verbose
configure
Create and manage configuration files.
Options:
- --create, -c TEXT
: Create new configuration file
- --template, -t TEXT
: Use template (default, fast, basic, medication)
- --edit, -e TEXT
: Edit existing configuration
- --validate, -v TEXT
: Validate configuration file
Examples:
# Create default configuration
pyctakes configure --create config.json
# Create from template
pyctakes configure --create fast_config.json --template fast
# Validate configuration
pyctakes configure --validate my_config.json
info
Display information about pyCTAKES installation and capabilities.
Options:
- --annotators
: List available annotators
- --pipelines
: List available pipeline templates
- --version
: Show version information
- --dependencies
: Show dependency status
Examples:
# General information
pyctakes info
# List annotators
pyctakes info --annotators
# Check dependencies
pyctakes info --dependencies
demo
Run interactive demonstrations and examples.
Options:
- --example, -e TEXT
: Run specific example (basic, comprehensive, custom)
- --interactive, -i
: Interactive mode
- --sample-text, -s
: Use built-in sample text
Examples:
# Interactive demo
pyctakes demo --interactive
# Run basic example
pyctakes demo --example basic
# Use sample text
pyctakes demo --sample-text
Output Formats
JSON Format (default)
{
"text": "Patient has diabetes and hypertension.",
"sentences": [
{
"start": 0,
"end": 37,
"text": "Patient has diabetes and hypertension."
}
],
"tokens": [
{"start": 0, "end": 7, "text": "Patient"},
{"start": 8, "end": 11, "text": "has"},
{"start": 12, "end": 20, "text": "diabetes"}
],
"entities": [
{
"start": 12,
"end": 20,
"text": "diabetes",
"label": "CONDITION",
"assertion": {
"polarity": "POSITIVE",
"uncertainty": "CERTAIN"
}
}
],
"sections": [
{
"start": 0,
"end": 37,
"section_type": "ASSESSMENT_AND_PLAN"
}
]
}
XML Format
<?xml version="1.0" encoding="UTF-8"?>
<document>
<text>Patient has diabetes and hypertension.</text>
<annotations>
<sentence start="0" end="37">Patient has diabetes and hypertension.</sentence>
<entity start="12" end="20" label="CONDITION" polarity="POSITIVE">diabetes</entity>
<entity start="25" end="37" label="CONDITION" polarity="POSITIVE">hypertension</entity>
</annotations>
</document>
Text Format
TEXT: Patient has diabetes and hypertension.
ENTITIES:
- diabetes (CONDITION) [12-20] POSITIVE/CERTAIN
- hypertension (CONDITION) [25-37] POSITIVE/CERTAIN
SECTIONS:
- ASSESSMENT_AND_PLAN [0-37]
Pipeline Templates
default
Complete clinical NLP pipeline with all annotators: - Tokenization (spaCy) - Section detection - Named entity recognition (rule-based) - Assertion detection - UMLS concept mapping
fast
Optimized for speed with basic functionality: - Tokenization (rule-based) - Named entity recognition (rule-based) - Basic assertion detection
basic
Minimal pipeline for simple entity extraction: - Tokenization (rule-based) - Named entity recognition (rule-based)
custom
User-defined pipeline from configuration file:
Configuration Examples
Create Basic Configuration
Creates:
{
"tokenization": {
"backend": "rule_based"
},
"ner": {
"approach": "rule_based",
"entity_types": ["MEDICATION", "CONDITION", "SYMPTOM"]
}
}
Medication-Focused Configuration
Creates configuration optimized for medication extraction.
Batch Processing
Process Directory
# Process all .txt files in a directory
pyctakes process input_dir/*.txt --output-dir results/
# Process with specific pattern
pyctakes process "notes_*.txt" --output-dir processed/
Parallel Processing
# Process files in parallel (if supported)
pyctakes process *.txt --output-dir results/ --parallel --workers 4
Advanced Usage
Custom Output
# Only output entities
pyctakes process note.txt --format json --filter entities
# Include confidence scores
pyctakes process note.txt --include-confidence
# Minimal output
pyctakes process note.txt --minimal
Logging and Debugging
# Verbose logging
pyctakes process note.txt --verbose
# Debug mode
pyctakes process note.txt --debug
# Log to file
pyctakes process note.txt --log-file processing.log
Performance Monitoring
# Show timing information
pyctakes process note.txt --timing
# Profile performance
pyctakes process note.txt --profile
Integration Examples
Shell Scripting
#!/bin/bash
# Process all patient notes
for file in patient_notes/*.txt; do
echo "Processing $file..."
pyctakes process "$file" --output "results/$(basename "$file" .txt).json"
done
Pipeline Integration
# Use in pipeline
cat input.txt | pyctakes process - --format json | jq '.entities[].text'
# With other tools
pyctakes process notes/*.txt --output-dir results/ && \
python analyze_results.py results/
Error Handling
# Continue on errors
pyctakes process *.txt --continue-on-error
# Skip invalid files
pyctakes process *.txt --skip-invalid
# Error reporting
pyctakes process *.txt --error-report errors.log