Contributing to pyCTAKES
Thank you for your interest in contributing to pyCTAKES! This guide will help you get started with contributing to the project.
Development Setup
Prerequisites
- Python 3.8 or higher
- Git
- GitHub account
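Before setting up, the first two prerequisites can be checked from Python itself. This is only a minimal sketch: the helper name `check_prerequisites` is hypothetical and not part of pyCTAKES, and it tests only that a `git` executable is on the PATH, not which version it is.

```python
import shutil
import sys
from typing import Dict

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites above


def check_prerequisites() -> Dict[str, bool]:
    """Report whether the local Python and Git meet the prerequisites.

    Hypothetical helper for illustration; not part of pyCTAKES.
    """
    return {
        # sys.version_info compares element-wise against a plain tuple
        "python": sys.version_info >= MIN_PYTHON,
        # shutil.which returns None when the executable is not on PATH
        "git": shutil.which("git") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'missing or too old'}")
```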
Setup Development Environment
- Fork and Clone
- Create Virtual Environment
- Install Development Dependencies
- Install Pre-commit Hooks
- Verify Setup
Development Workflow
Creating a Feature Branch
# Create and switch to a new branch
git checkout -b feature/your-feature-name
# Make your changes
# ... edit files ...
# Commit changes
git add .
git commit -m "Add: description of your changes"
# Push branch
git push origin feature/your-feature-name
Code Style and Quality
pyCTAKES uses several tools to maintain code quality:
- Black: Code formatting
- isort: Import sorting
- flake8: Linting
- mypy: Type checking
- pytest: Testing
Run quality checks:
# Format code
black src/ tests/
isort src/ tests/
# Check linting
flake8 src/ tests/
# Type checking
mypy src/
# Run tests
pytest tests/
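If you prefer a single entry point, the commands above could be wrapped in a small Python runner. This is a sketch under assumptions: `run_checks` and `QUALITY_CHECKS` are hypothetical names, and the formatters are run in check-only mode (`black --check`, `isort --check-only`) so that the runner reports problems without rewriting files, which differs from the formatting commands shown above.

```python
import subprocess
from typing import List, Sequence

# Check-only variants of the commands listed above, so nothing is rewritten.
# Using --check / --check-only is a choice made for this sketch.
QUALITY_CHECKS: List[List[str]] = [
    ["black", "--check", "src/", "tests/"],
    ["isort", "--check-only", "src/", "tests/"],
    ["flake8", "src/", "tests/"],
    ["mypy", "src/"],
    ["pytest", "tests/"],
]


def run_checks(commands: Sequence[Sequence[str]]) -> List[str]:
    """Run every command and return the ones that failed, as display strings."""
    failed = []
    for cmd in commands:
        try:
            ok = subprocess.run(list(cmd)).returncode == 0
        except FileNotFoundError:
            # The tool is not installed or not on PATH
            ok = False
        if not ok:
            failed.append(" ".join(cmd))
    return failed
```

Calling `run_checks(QUALITY_CHECKS)` from the repository root returns an empty list when every check passes. Running all commands instead of stopping at the first failure gives one complete report per run.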
Testing
Running Tests
# Run all tests
pytest
# Run specific test file
pytest tests/test_pipeline.py
# Run with coverage
pytest --cov=pyctakes tests/
# Run integration tests
pytest tests/test_integrated_pipeline.py -v
Writing Tests
Create tests for new functionality:
# tests/test_new_feature.py
import pytest

from pyctakes.types import Document
from pyctakes.your_module import YourClass


class TestYourClass:
    def setup_method(self):
        """Setup for each test method."""
        self.instance = YourClass()

    def test_basic_functionality(self):
        """Test basic functionality."""
        doc = Document(text="Test input")
        result = self.instance.process(doc)
        assert result is not None
        assert len(result.annotations) > 0

    def test_edge_cases(self):
        """Test edge cases."""
        # Empty input
        doc = Document(text="")
        result = self.instance.process(doc)
        assert len(result.annotations) == 0
        # Very long input
        long_text = "word " * 10000
        doc = Document(text=long_text)
        result = self.instance.process(doc)
        assert result is not None

    def test_error_handling(self):
        """Test error handling."""
        with pytest.raises(ValueError):
            YourClass(invalid_param="invalid")
Contribution Types
Bug Fixes
- Identify the Issue
  - Search existing issues
  - Create new issue if needed
  - Discuss approach with maintainers
- Fix the Bug
  - Write failing test that reproduces bug
  - Implement fix
  - Ensure test passes
  - Verify no regressions
- Submit Pull Request
  - Reference issue number
  - Describe fix clearly
  - Include test coverage
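The test-first flow in the "Fix the Bug" step can be sketched as follows. Everything here is hypothetical and invented for illustration: `normalize_whitespace` is not a pyCTAKES function, and the bug described in its docstring is imagined. The point is the shape: one test that reproduces the reported input, and one that guards behavior that already worked.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends.

    Hypothetical function. The imagined buggy version only replaced
    double spaces, which left tabs and newlines untouched.
    """
    return re.sub(r"\s+", " ", text).strip()


def test_tabs_and_newlines_are_collapsed():
    """Regression test: reproduces the reported input, fails before the fix."""
    assert normalize_whitespace("BP:\t120/80\n\nHR: 72") == "BP: 120/80 HR: 72"


def test_plain_text_is_unchanged():
    """Guard against regressions in the case that already worked."""
    assert normalize_whitespace("Patient has diabetes.") == "Patient has diabetes."
```

Committing the failing test before the fix makes it easy for reviewers to confirm that the test really exercises the bug.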
New Features
- Propose Feature
  - Create feature request issue
  - Discuss design with maintainers
  - Get approval before implementing
- Implement Feature
  - Follow existing code patterns
  - Add comprehensive tests
  - Update documentation
  - Add examples if needed
- Submit Pull Request
  - Reference feature request
  - Explain implementation
  - Demonstrate usage
Documentation
- Identify Documentation Needs
  - Missing documentation
  - Unclear explanations
  - Outdated information
- Improve Documentation
  - Update markdown files
  - Add code examples
  - Improve docstrings
  - Test documentation examples
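One way to test documentation examples is the standard library's `doctest` module, which runs the `>>>` lines in a docstring and compares the actual output with the text that follows them. The sketch below is an assumption about how this could be done, not a description of the project's tooling; `count_words` and `run_docstring_examples` are hypothetical helpers.

```python
import doctest


def count_words(text: str) -> int:
    """Count whitespace-separated words in a text.

    Hypothetical helper used only to demonstrate doctest.

    Example:
        >>> count_words("Patient has diabetes.")
        3
        >>> count_words("")
        0
    """
    return len(text.split())


def run_docstring_examples(func) -> doctest.TestResults:
    """Run the >>> examples in func's docstring and return the counts."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # globs makes the function visible under its own name inside the examples
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_docstring_examples(count_words)
print(f"{results.attempted} examples run, {results.failed} failed")
```

If an example drifts out of date, `results.failed` becomes non-zero and the runner prints the expected and actual output for the mismatch.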
Code Guidelines
Python Style
Follow PEP 8 and project conventions:
# Good: Clear, descriptive names
class ClinicalNERAnnotator:
    def __init__(self, approach: str = "rule_based"):
        self.approach = approach

    def process(self, doc: Document) -> Document:
        """Process document to extract clinical entities."""
        return self._extract_entities(doc)


# Good: Type hints and docstrings
def create_default_pipeline(config: Optional[Dict[str, Any]] = None) -> Pipeline:
    """
    Create default clinical NLP pipeline.

    Args:
        config: Optional configuration dictionary

    Returns:
        Configured pipeline instance
    """
    pipeline = Pipeline()
    # ... implementation
    return pipeline
Architecture Patterns
Follow established patterns:
# Annotator pattern
class NewAnnotator(BaseAnnotator):
    def __init__(self, **kwargs):
        super().__init__()
        self.config = kwargs

    def process(self, doc: Document) -> Document:
        # Implementation
        return doc


# Configuration pattern
def create_annotator_from_config(config: Dict[str, Any]) -> BaseAnnotator:
    annotator_type = config.get("type", "default")
    if annotator_type == "new":
        return NewAnnotator(**config.get("options", {}))
    else:
        raise ValueError(f"Unknown annotator type: {annotator_type}")
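The two patterns above can be tried end to end with stand-in types. In this sketch `Document` and `BaseAnnotator` are simplified placeholders, not the real pyCTAKES classes, and `KeywordAnnotator` and the `ANNOTATOR_TYPES` registry are hypothetical. The registry is a variation on the `if`/`else` chain shown above: adding an annotator type means adding one dictionary entry.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple, Type


@dataclass
class Document:
    """Stand-in for the real Document type (simplified placeholder)."""
    text: str
    annotations: List[Tuple[int, int, str]] = field(default_factory=list)


class BaseAnnotator:
    """Stand-in base class; the real one may differ."""

    def process(self, doc: Document) -> Document:
        raise NotImplementedError


class KeywordAnnotator(BaseAnnotator):
    """Hypothetical annotator that marks each occurrence of a keyword."""

    def __init__(self, keyword: str = "diabetes", label: str = "CONDITION"):
        super().__init__()
        if not keyword:
            raise ValueError("keyword must be non-empty")
        self.keyword = keyword.lower()
        self.label = label

    def process(self, doc: Document) -> Document:
        lowered = doc.text.lower()
        start = lowered.find(self.keyword)
        while start != -1:
            end = start + len(self.keyword)
            # Store (start offset, end offset, label) for each match
            doc.annotations.append((start, end, self.label))
            start = lowered.find(self.keyword, end)
        return doc


# Registry mapping the "type" config value to an annotator class
ANNOTATOR_TYPES: Dict[str, Type[BaseAnnotator]] = {"keyword": KeywordAnnotator}


def create_annotator_from_config(config: Dict[str, Any]) -> BaseAnnotator:
    annotator_type = config.get("type", "default")
    try:
        annotator_cls = ANNOTATOR_TYPES[annotator_type]
    except KeyError:
        raise ValueError(f"Unknown annotator type: {annotator_type}") from None
    return annotator_cls(**config.get("options", {}))


annotator = create_annotator_from_config(
    {"type": "keyword", "options": {"keyword": "diabetes"}}
)
doc = annotator.process(Document(text="Patient has diabetes."))
print(doc.annotations)  # [(12, 20, 'CONDITION')]
```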
Error Handling
Use consistent error handling:
from pyctakes.exceptions import pyCTAKESError, AnnotationError


class CustomAnnotator(BaseAnnotator):
    def process(self, doc: Document) -> Document:
        try:
            return self._process_safely(doc)
        except Exception as e:
            self.logger.error(f"Processing failed: {e}")
            raise AnnotationError(f"CustomAnnotator failed: {e}") from e
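To see what `raise ... from e` gives callers, here is a self-contained sketch. The exception classes are redefined locally as stand-ins, and the assumption that `AnnotationError` derives from `pyCTAKESError` is mine rather than something stated above; `risky_step` is a hypothetical low-level function.

```python
import logging

logger = logging.getLogger("pyctakes.example")


class pyCTAKESError(Exception):
    """Stand-in for the project's base exception."""


class AnnotationError(pyCTAKESError):
    """Stand-in; assumed here to derive from pyCTAKESError."""


def risky_step(text: str) -> int:
    """Hypothetical low-level step that fails on empty input."""
    return 100 // len(text)


def process(text: str) -> int:
    try:
        return risky_step(text)
    except Exception as e:
        logger.error("Processing failed: %s", e)
        # "from e" stores the original exception on __cause__
        raise AnnotationError(f"process failed: {e}") from e


try:
    process("")
except pyCTAKESError as err:
    # Callers can catch the base class and still inspect the root cause
    print(type(err).__name__, "caused by", type(err.__cause__).__name__)
```

Chaining keeps the original traceback attached, so a log of the wrapped error still shows where the underlying failure happened.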
Documentation Guidelines
Docstring Format
Use Google-style docstrings:
def process_clinical_text(text: str, pipeline: Pipeline) -> Document:
    """
    Process clinical text using specified pipeline.

    Args:
        text: Clinical text to process
        pipeline: pyCTAKES pipeline instance

    Returns:
        Processed document with annotations

    Raises:
        AnnotationError: If processing fails

    Example:
        >>> pipeline = create_default_pipeline()
        >>> doc = process_clinical_text("Patient has diabetes.", pipeline)
        >>> print(len(doc.entities))
        1
    """
README Updates
Update README.md for significant changes:
- New features in feature list
- Installation instructions
- Usage examples
- API changes
API Documentation
Update API docs for new classes/methods:
## NewAnnotator
::: pyctakes.annotators.new.NewAnnotator
    options:
      show_source: false
      heading_level: 3
Pull Request Process
Before Submitting
- Ensure Quality
- Update Documentation
  - Add docstrings to new functions/classes
  - Update user guides if needed
  - Add examples for new features
- Update Changelog
Pull Request Template
## Description
Brief description of changes
## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update
- [ ] Performance improvement
- [ ] Refactoring
## Testing
- [ ] Added tests for new functionality
- [ ] All tests pass
- [ ] Manual testing completed
## Documentation
- [ ] Updated docstrings
- [ ] Updated user documentation
- [ ] Added examples
## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] No breaking changes (or clearly documented)
Review Process
- Automated Checks
  - CI/CD pipeline runs
  - Code quality checks
  - Test coverage
- Manual Review
  - Code review by maintainers
  - Documentation review
  - Testing verification
- Feedback Integration
  - Address review comments
  - Update code as needed
  - Re-request review
Release Process
Version Numbering
pyCTAKES follows semantic versioning:
- Major: Breaking changes (1.0.0 → 2.0.0)
- Minor: New features (1.0.0 → 1.1.0)
- Patch: Bug fixes (1.0.0 → 1.0.1)
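The numbering rule can be written down as a short function. This is an illustrative sketch, not project tooling: `parse_version` and `bump_version` are hypothetical names, and pre-release or build suffixes such as `1.0.0-rc1` are deliberately not handled.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split "MAJOR.MINOR.PATCH" into integers; a leading "v" is allowed."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def bump_version(version: str, part: str) -> str:
    """Return the next version for a "major", "minor", or "patch" change."""
    major, minor, patch = parse_version(version)
    if part == "major":
        # Breaking change: lower components reset to zero
        return f"{major + 1}.0.0"
    if part == "minor":
        # New feature: patch resets to zero
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part}")


print(bump_version("1.0.0", "major"))  # 2.0.0
print(bump_version("1.0.0", "minor"))  # 1.1.0
print(bump_version("1.0.0", "patch"))  # 1.0.1
```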
Release Checklist
- Prepare Release
  - Update version in pyproject.toml
  - Update CHANGELOG.md
  - Update documentation
- Create Release
  - Tag release: git tag v1.0.0
  - Push tags: git push --tags
  - GitHub Actions handles PyPI release
- Post-Release
  - Verify PyPI package
  - Update documentation site
  - Announce release
Community Guidelines
Code of Conduct
- Be respectful and inclusive
- Welcome newcomers
- Focus on constructive feedback
- Help others learn and grow
Communication
- GitHub Issues: Bug reports, feature requests
- GitHub Discussions: Questions, ideas
- Pull Requests: Code changes
Getting Help
- Check existing documentation
- Search closed issues
- Ask questions in discussions
- Tag maintainers if urgent
Recognition
Contributors will be recognized in:
- CONTRIBUTORS.md file
- Release notes
- Documentation credits
Thank you for contributing to pyCTAKES! 🎉