Changelog
All notable changes to pyCTAKES will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Added
- Comprehensive API documentation with mkdocstrings
- Advanced topics documentation (custom annotators, performance tuning, UMLS integration)
- Development and contributing guidelines
- GitHub Actions workflow for documentation deployment
Changed
- Enhanced mkdocs configuration with material theme
- Improved navigation structure for documentation
[1.0.0] - 2025-01-15
Added
- Core pipeline architecture with configurable annotator chains
- Clinical text processing annotators:
ClinicalSentenceSegmenter
- Clinical-aware sentence segmentationClinicalTokenizer
- Advanced tokenization with POS taggingClinicalSectionAnnotator
- Clinical document section detection- Named entity recognition:
ClinicalNERAnnotator
- Hybrid rule-based and model-based NERSimpleClinicalNER
- Fast pattern-based entity recognition- Assertion and negation detection:
NegationAssertionAnnotator
- pyConText-style context detection- UMLS concept mapping:
UMLSConceptMapper
- Framework for UMLS integrationSimpleDictionaryMapper
- Fast dictionary-based mapping- Three pre-configured pipeline types:
- Default pipeline - Full clinical NLP features
- Fast pipeline - Speed-optimized with rule-based components
- Basic pipeline - Minimal entity extraction
- Command-line interface with multiple output formats
- Comprehensive type system for clinical text annotations
- Configuration-driven annotator behavior
- Extensive test suite with unit and integration tests
- Performance benchmarking and metrics
- Documentation with installation, quickstart, and user guides
- Example configurations and sample clinical notes
Technical Features
- Multiple backend support (spaCy, Stanza, rule-based)
- Clinical abbreviation awareness in tokenization
- Comprehensive clinical entity dictionaries (100+ terms per category)
- Bidirectional context analysis for assertion detection
- Configurable pipeline composition and annotator parameters
- Error handling and recovery throughout processing pipeline
- JSON serialization for all annotation types
- Memory-efficient processing with streaming capabilities
Documentation
- Complete installation and setup instructions
- Quickstart guide with basic usage examples
- Comprehensive user guide covering all features
- API reference documentation
- Configuration examples and best practices
- Performance optimization guidelines
Performance
- Basic Pipeline: 39 annotations in 0.010s
- Fast Pipeline: 36 annotations in 0.001s
- Full clinical note processing: 81 annotations in 0.504s
[0.1.0] - 2025-01-01
Added
- Initial project structure and setup
- Basic pipeline framework
- Core type definitions
- Development environment configuration
- CI/CD pipeline setup
Release Notes
Version 1.0.0 - "Foundation Release"
pyCTAKES v1.0.0 represents the first stable release of our Python-native clinical NLP framework. This release delivers comprehensive feature parity with Apache cTAKES while providing modern Python APIs and superior usability.
Key Highlights:
- 🏥 Clinical-First Design: Built specifically for clinical text processing with medical abbreviation awareness, section detection, and assertion analysis
- 🚀 Multiple Pipeline Types: Choose from default, fast, or basic pipelines based on your accuracy/speed requirements
- 🔧 Flexible Architecture: Modular annotator system allows custom pipeline composition and easy extensibility
- 📊 Production Ready: Comprehensive testing, error handling, and performance optimization for real-world deployment
- 📖 Complete Documentation: Full user guides, API documentation, and practical examples
Migration from cTAKES: pyCTAKES provides a drop-in replacement for Apache cTAKES with simpler deployment (no Java required) and modern NLP backends. Existing cTAKES users can migrate their pipelines using our configuration system.
Next Steps: - Enhanced UMLS integration with full QuickUMLS support - Relation extraction for temporal and dosage relationships - REST API service wrapper - Performance optimizations and Docker containers
Community: pyCTAKES is open source and welcoming contributions. Join us in advancing clinical NLP with modern Python tools!
Version Scheme
pyCTAKES follows Semantic Versioning:
- MAJOR version for incompatible API changes
- MINOR version for backwards-compatible functionality additions
- PATCH version for backwards-compatible bug fixes
Pre-release Identifiers
- alpha (a): Early development versions
- beta (b): Feature-complete but potentially unstable
- rc: Release candidates, stable and ready for release
Example: 1.1.0a1
(first alpha of version 1.1.0)
Release Schedule
- Major releases: Annually or for significant architectural changes
- Minor releases: Quarterly for new features
- Patch releases: As needed for critical bug fixes
Deprecation Policy
- Features marked as deprecated will be removed in the next major version
- Deprecation warnings will be issued for at least one minor version before removal
- Migration guides will be provided for deprecated features
Breaking Changes
All breaking changes will be clearly documented with:
- Description of the change
- Reason for the change
- Migration instructions
- Timeline for old behavior removal
Support
- Current major version: Full support with bug fixes and new features
- Previous major version: Security updates and critical bug fixes for 12 months
- Older versions: Community support only