Changelog

All notable changes to pyCTAKES will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

Comprehensive API documentation with mkdocstrings
Advanced topics documentation (custom annotators, performance tuning, UMLS integration)
Development and contributing guidelines
GitHub Actions workflow for documentation deployment

Changed

Enhanced mkdocs configuration with material theme
Improved navigation structure for documentation

[1.0.0] - 2025-01-15

Added

Core pipeline architecture with configurable annotator chains
Clinical text processing annotators:
ClinicalSentenceSegmenter - Clinical-aware sentence segmentation
ClinicalTokenizer - Advanced tokenization with POS tagging
ClinicalSectionAnnotator - Clinical document section detection
Named entity recognition:
ClinicalNERAnnotator - Hybrid rule-based and model-based NER
SimpleClinicalNER - Fast pattern-based entity recognition
Assertion and negation detection:
NegationAssertionAnnotator - pyConText-style context detection
UMLS concept mapping:
UMLSConceptMapper - Framework for UMLS integration
SimpleDictionaryMapper - Fast dictionary-based mapping
Three pre-configured pipeline types:
Default pipeline - Full clinical NLP features
Fast pipeline - Speed-optimized with rule-based components
Basic pipeline - Minimal entity extraction
Command-line interface with multiple output formats
Comprehensive type system for clinical text annotations
Configuration-driven annotator behavior
Extensive test suite with unit and integration tests
Performance benchmarking and metrics
Documentation with installation, quickstart, and user guides
Example configurations and sample clinical notes

Technical Features

Multiple backend support (spaCy, Stanza, rule-based)
Clinical abbreviation awareness in tokenization
Comprehensive clinical entity dictionaries (100+ terms per category)
Bidirectional context analysis for assertion detection
Configurable pipeline composition and annotator parameters
Error handling and recovery throughout processing pipeline
JSON serialization for all annotation types
Memory-efficient processing with streaming capabilities

Documentation

Complete installation and setup instructions
Quickstart guide with basic usage examples
Comprehensive user guide covering all features
API reference documentation
Configuration examples and best practices
Performance optimization guidelines

Performance

Basic Pipeline: 39 annotations in 0.010s
Fast Pipeline: 36 annotations in 0.001s
Full clinical note processing: 81 annotations in 0.504s

[0.1.0] - 2025-01-01

Added

Initial project structure and setup
Basic pipeline framework
Core type definitions
Development environment configuration
CI/CD pipeline setup

Release Notes

Version 1.0.0 - "Foundation Release"

pyCTAKES v1.0.0 represents the first stable release of our Python-native clinical NLP framework. This release delivers comprehensive feature parity with Apache cTAKES while providing modern Python APIs and superior usability.

Key Highlights: - 🏥 Clinical-First Design: Built specifically for clinical text processing with medical abbreviation awareness, section detection, and assertion analysis - 🚀 Multiple Pipeline Types: Choose from default, fast, or basic pipelines based on your accuracy/speed requirements
- 🔧 Flexible Architecture: Modular annotator system allows custom pipeline composition and easy extensibility - 📊 Production Ready: Comprehensive testing, error handling, and performance optimization for real-world deployment - 📖 Complete Documentation: Full user guides, API documentation, and practical examples

Migration from cTAKES: pyCTAKES provides a drop-in replacement for Apache cTAKES with simpler deployment (no Java required) and modern NLP backends. Existing cTAKES users can migrate their pipelines using our configuration system.

Next Steps: - Enhanced UMLS integration with full QuickUMLS support - Relation extraction for temporal and dosage relationships - REST API service wrapper - Performance optimizations and Docker containers

Community: pyCTAKES is open source and welcoming contributions. Join us in advancing clinical NLP with modern Python tools!

Version Scheme

pyCTAKES follows Semantic Versioning:

MAJOR version for incompatible API changes
MINOR version for backwards-compatible functionality additions
PATCH version for backwards-compatible bug fixes

Pre-release Identifiers

alpha (a): Early development versions
beta (b): Feature-complete but potentially unstable
rc: Release candidates, stable and ready for release

Example: 1.1.0a1 (first alpha of version 1.1.0)

Release Schedule

Major releases: Annually or for significant architectural changes
Minor releases: Quarterly for new features
Patch releases: As needed for critical bug fixes

Deprecation Policy

Features marked as deprecated will be removed in the next major version
Deprecation warnings will be issued for at least one minor version before removal
Migration guides will be provided for deprecated features

Breaking Changes

All breaking changes will be clearly documented with: - Description of the change - Reason for the change
- Migration instructions - Timeline for old behavior removal

Support

Current major version: Full support with bug fixes and new features
Previous major version: Security updates and critical bug fixes for 12 months
Older versions: Community support only