Contributing to pyCTAKES

Thank you for your interest in contributing to pyCTAKES! This guide will help you get started with contributing to the project.

Development Setup

Prerequisites

- Python 3.8 or higher
- Git
- A GitHub account

Set Up the Development Environment

Fork and Clone

```
# Fork the repository on GitHub first, then clone your fork
git clone https://github.com/YOUR_USERNAME/pyctakes.git
cd pyctakes
```

Create Virtual Environment

```
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Install Development Dependencies
```
pip install -e ".[dev]"
```
Install Pre-commit Hooks
```
pre-commit install
```

Verify Setup

```
python -m pytest tests/
python -m pyctakes --help
```

Development Workflow

Creating a Feature Branch

```
# Create and switch to a new branch
git checkout -b feature/your-feature-name

# Make your changes
# ... edit files ...

# Commit changes
git add .
git commit -m "Add: description of your changes"

# Push branch
git push origin feature/your-feature-name
```

Code Style and Quality

pyCTAKES uses several tools to maintain code quality:

- Black: code formatting
- isort: import sorting
- flake8: linting
- mypy: type checking
- pytest: testing

Run quality checks:

```
# Format code
black src/ tests/
isort src/ tests/

# Check linting
flake8 src/ tests/

# Type checking
mypy src/

# Run tests
pytest tests/
```

Testing

Running Tests

```
# Run all tests
pytest

# Run a specific test file
pytest tests/test_pipeline.py

# Run with coverage (requires the pytest-cov plugin)
pytest --cov=pyctakes tests/

# Run integration tests
pytest tests/test_integrated_pipeline.py -v
```

Writing Tests

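Add tests for all new functionality. In the template below, `your_module` and `YourClass` are placeholders; replace them with your own module and class: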

```
# tests/test_new_feature.py
import pytest
from pyctakes.types import Document
from pyctakes.your_module import YourClass

class TestYourClass:
    def setup_method(self):
        """Set up a fresh instance before each test method."""
        self.instance = YourClass()

    def test_basic_functionality(self):
        """Test basic functionality."""
        doc = Document(text="Test input")
        result = self.instance.process(doc)

        assert result is not None
        assert len(result.annotations) > 0

    def test_edge_cases(self):
        """Test edge cases."""
        # Empty input
        doc = Document(text="")
        result = self.instance.process(doc)
        assert len(result.annotations) == 0

        # Very long input
        long_text = "word " * 10000
        doc = Document(text=long_text)
        result = self.instance.process(doc)
        assert result is not None

    def test_error_handling(self):
        """Test error handling."""
        with pytest.raises(ValueError):
            YourClass(invalid_param="invalid")
```

Contribution Types

Bug Fixes

- Identify the Issue
    - Search existing issues
    - Create a new issue if needed
    - Discuss the approach with maintainers
- Fix the Bug
    - Write a failing test that reproduces the bug
    - Implement the fix
    - Ensure the test passes
    - Verify there are no regressions
- Submit a Pull Request
    - Reference the issue number
    - Describe the fix clearly
    - Include test coverage

New Features

- Propose the Feature
    - Create a feature request issue
    - Discuss the design with maintainers
    - Get approval before implementing
- Implement the Feature
    - Follow existing code patterns
    - Add comprehensive tests
    - Update documentation
    - Add examples if needed
- Submit a Pull Request
    - Reference the feature request
    - Explain the implementation
    - Demonstrate usage

Documentation

- Identify Documentation Needs
    - Missing documentation
    - Unclear explanations
    - Outdated information
- Improve Documentation
    - Update Markdown files
    - Add code examples
    - Improve docstrings
    - Test documentation examples

Code Guidelines

Python Style

Follow PEP 8 and project conventions:

```
from typing import Any, Dict, Optional

from pyctakes.types import Document

# Good: clear, descriptive names
class ClinicalNERAnnotator:
    def __init__(self, approach: str = "rule_based"):
        self.approach = approach

    def process(self, doc: Document) -> Document:
        """Process document to extract clinical entities."""
        return self._extract_entities(doc)

# Good: type hints and docstrings
def create_default_pipeline(config: Optional[Dict[str, Any]] = None) -> Pipeline:
    """
    Create default clinical NLP pipeline.

    Args:
        config: Optional configuration dictionary

    Returns:
        Configured pipeline instance
    """
    pipeline = Pipeline()
    # ... implementation
    return pipeline
```

Architecture Patterns

Follow established patterns:

```
from typing import Any, Dict

from pyctakes.types import Document

# Annotator pattern
class NewAnnotator(BaseAnnotator):
    def __init__(self, **kwargs):
        super().__init__()
        self.config = kwargs

    def process(self, doc: Document) -> Document:
        # Implementation
        return doc

# Configuration pattern
def create_annotator_from_config(config: Dict[str, Any]) -> BaseAnnotator:
    annotator_type = config.get("type", "default")

    if annotator_type == "new":
        return NewAnnotator(**config.get("options", {}))
    else:
        raise ValueError(f"Unknown annotator type: {annotator_type}")
```
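The sketch below puts the two patterns together end to end. `Document` and `BaseAnnotator` here are minimal stand-ins defined only for illustration; real code should import the project's own classes. The hypothetical `ANNOTATOR_REGISTRY` dict is a variation on the `if`/`else` dispatch above. With a registry, you add a new annotator type by registering it in one place instead of editing the factory.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Type


# Minimal stand-ins for illustration only; real code imports the
# project's own Document and BaseAnnotator classes instead.
@dataclass
class Document:
    text: str
    annotations: List[Any] = field(default_factory=list)


class BaseAnnotator:
    def process(self, doc: Document) -> Document:
        raise NotImplementedError


class NewAnnotator(BaseAnnotator):
    def __init__(self, **kwargs: Any) -> None:
        super().__init__()
        self.config = kwargs

    def process(self, doc: Document) -> Document:
        # Toy behaviour: record which options this annotator was built with.
        doc.annotations.append({"annotator": "new", "options": self.config})
        return doc


# Hypothetical registry mapping config "type" values to annotator classes.
ANNOTATOR_REGISTRY: Dict[str, Type[BaseAnnotator]] = {"new": NewAnnotator}


def create_annotator_from_config(config: Dict[str, Any]) -> BaseAnnotator:
    annotator_type = config.get("type", "default")
    try:
        annotator_cls = ANNOTATOR_REGISTRY[annotator_type]
    except KeyError:
        # "from None" hides the internal KeyError from the traceback.
        raise ValueError(f"Unknown annotator type: {annotator_type}") from None
    return annotator_cls(**config.get("options", {}))


annotator = create_annotator_from_config({"type": "new", "options": {"window": 5}})
doc = annotator.process(Document(text="Patient denies chest pain."))
print(doc.annotations)  # [{'annotator': 'new', 'options': {'window': 5}}]
```

The factory still raises `ValueError` for unregistered types, including the `"default"` fallback, matching the behaviour of the original example.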

Error Handling

Handle errors consistently: log the failure, wrap it in a pyCTAKES exception, and chain the original exception:

```
from pyctakes.exceptions import AnnotationError

class CustomAnnotator(BaseAnnotator):
    def process(self, doc: Document) -> Document:
        try:
            return self._process_safely(doc)
        except Exception as e:
            self.logger.error(f"Processing failed: {e}")
            # "from e" keeps the original exception as the cause in the traceback
            raise AnnotationError(f"CustomAnnotator failed: {e}") from e
```
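The `raise ... from e` form stores the original exception on the new exception's `__cause__`, so tracebacks show both the wrapper and the root cause. The self-contained sketch below demonstrates this. `AnnotationError` and the toy `CustomAnnotator` are local stand-ins (assumptions, not the real pyCTAKES classes). The sketch also uses a module-level logger from the standard `logging` module in place of `self.logger`.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


class AnnotationError(Exception):
    """Stand-in for pyctakes.exceptions.AnnotationError (illustration only)."""


class CustomAnnotator:
    def process(self, doc: Any) -> Any:
        try:
            return self._process_safely(doc)
        except Exception as e:
            # Log, then wrap; "from e" stores e on the new error's __cause__.
            logger.error("Processing failed: %s", e)
            raise AnnotationError(f"CustomAnnotator failed: {e}") from e

    def _process_safely(self, doc: Any) -> Any:
        # Hypothetical failure mode: this toy annotator only accepts strings.
        if not isinstance(doc, str):
            raise TypeError(f"expected str, got {type(doc).__name__}")
        return doc


try:
    CustomAnnotator().process(42)
except AnnotationError as err:
    caught = err

print(type(caught.__cause__).__name__)  # TypeError
print(caught)  # CustomAnnotator failed: expected str, got int
```

Callers can catch the single wrapper type and still reach the original error through `__cause__` when debugging.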

Documentation Guidelines

Docstring Format

Use Google-style docstrings:

```
def process_clinical_text(text: str, pipeline: Pipeline) -> Document:
    """
    Process clinical text using specified pipeline.

    Args:
        text: Clinical text to process
        pipeline: pyCTAKES pipeline instance

    Returns:
        Processed document with annotations

    Raises:
        AnnotationError: If processing fails

    Example:
        >>> pipeline = create_default_pipeline()
        >>> doc = process_clinical_text("Patient has diabetes.", pipeline)
        >>> print(len(doc.entities))
        1
    """
```

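Because the `Example:` section uses `>>>` prompts, it also works as a doctest, which is one way to "test documentation examples". The sketch below runs the examples in a toy function's docstring with the standard `doctest` module. `count_words` is hypothetical and not part of pyCTAKES. To run the same kind of check across a whole package, use `pytest --doctest-modules src/`.

```python
import doctest


def count_words(text: str) -> int:
    """
    Count whitespace-separated tokens in a piece of text.

    Args:
        text: Text to tokenize

    Returns:
        Number of tokens

    Example:
        >>> count_words("Patient has diabetes.")
        3
        >>> count_words("")
        0
    """
    return len(text.split())


# Collect the >>> examples from the docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(count_words, globs={"count_words": count_words}):
    runner.run(test)

failed, attempted = runner.summarize(verbose=False)
print(f"{attempted} examples run, {failed} failed")  # 2 examples run, 0 failed
```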
README Updates

Update README.md for significant changes:

- New features (add them to the feature list)
- Installation instructions
- Usage examples
- API changes

API Documentation

Add API documentation for new classes and methods with a mkdocstrings directive:

```
## NewAnnotator

::: pyctakes.annotators.new.NewAnnotator
    options:
      show_source: false
      heading_level: 3
```

Pull Request Process

Before Submitting

Ensure Quality

```
# Run full test suite
pytest tests/

# Check code quality
pre-commit run --all-files

# Verify documentation builds
mkdocs build
```

Update Documentation
- Add docstrings to new functions and classes
- Update user guides if needed
- Add examples for new features

Update Changelog (add entries under [Unreleased] in CHANGELOG.md)

```
## [Unreleased]

### Added
- New feature description

### Fixed
- Bug fix description
```

Pull Request Template

```
## Description
Brief description of changes

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update
- [ ] Performance improvement
- [ ] Refactoring

## Testing
- [ ] Added tests for new functionality
- [ ] All tests pass
- [ ] Manual testing completed

## Documentation
- [ ] Updated docstrings
- [ ] Updated user documentation
- [ ] Added examples

## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] No breaking changes (or clearly documented)
```

Review Process

- Automated Checks
    - CI/CD pipeline runs
    - Code quality checks
    - Test coverage
- Manual Review
    - Code review by maintainers
    - Documentation review
    - Testing verification
- Feedback Integration
    - Address review comments
    - Update code as needed
    - Re-request review

Release Process

Version Numbering

pyCTAKES follows semantic versioning:

- Major: breaking changes (1.0.0 → 2.0.0)
- Minor: new backwards-compatible features (1.0.0 → 1.1.0)
- Patch: backwards-compatible bug fixes (1.0.0 → 1.0.1)
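The three rules can be expressed as a small helper. `bump_version` is hypothetical, not part of pyCTAKES, and handles only plain `MAJOR.MINOR.PATCH` strings without pre-release or build suffixes. Under semantic versioning, bumping one component resets every component to its right to zero.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with the given part ("major", "minor", "patch") bumped."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        # Breaking change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # New backwards-compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")


print(bump_version("1.0.0", "major"))  # 2.0.0
print(bump_version("1.0.0", "minor"))  # 1.1.0
print(bump_version("1.2.3", "patch"))  # 1.2.4
```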

Release Checklist

- Prepare Release
    - Update version in pyproject.toml
    - Update CHANGELOG.md
    - Update documentation
- Create Release
    - Tag the release: `git tag v1.0.0`
    - Push tags: `git push --tags`
    - GitHub Actions publishes the release to PyPI
- Post-Release
    - Verify the PyPI package
    - Update the documentation site
    - Announce the release
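The version in pyproject.toml and the git tag are updated in separate steps, so they can drift apart. The sketch below is one possible pre-tag check. `project_version` and `check_tag_matches_version` are hypothetical helpers. They assume the version is a static `version = "..."` line in the `[project]` table. They parse with a regular expression instead of a TOML parser so the check also runs on Python 3.8, since `tomllib` is only in the standard library from Python 3.11.

```python
import re
from typing import Optional

# Matches a bare `version = "X.Y.Z"` line (no trailing comment).
_VERSION_LINE = re.compile(r'^\s*version\s*=\s*"([^"]+)"\s*$')


def project_version(pyproject_text: str) -> Optional[str]:
    """Return the static version from the [project] table, if any."""
    in_project = False
    for line in pyproject_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are currently inside the [project] table.
            in_project = stripped == "[project]"
            continue
        match = _VERSION_LINE.match(line)
        if in_project and match:
            return match.group(1)
    return None


def check_tag_matches_version(tag: str, pyproject_text: str) -> bool:
    """True if a tag like "v1.0.0" matches the [project] version."""
    version = project_version(pyproject_text)
    return version is not None and tag == f"v{version}"


# Sample file contents for illustration; the [tool.example] version is ignored.
sample = """
[project]
version = "1.0.0"

[tool.example]
version = "9.9.9"
"""

print(check_tag_matches_version("v1.0.0", sample))  # True
print(check_tag_matches_version("v1.1.0", sample))  # False
```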

Community Guidelines

Code of Conduct

- Be respectful and inclusive
- Welcome newcomers
- Focus on constructive feedback
- Help others learn and grow

Communication

- GitHub Issues: bug reports, feature requests
- GitHub Discussions: questions, ideas
- Pull Requests: code changes

Getting Help

- Check existing documentation
- Search closed issues
- Ask questions in Discussions
- Tag maintainers if urgent

Recognition

Contributors will be recognized in:

- CONTRIBUTORS.md file
- Release notes
- Documentation credits

Thank you for contributing to pyCTAKES! 🎉