API Overview
AnnoRefine provides both command-line tools and Python bindings for genome annotation refinement.
Command-Line Tools
annorefine bam2hints
Convert BAM alignments to Augustus/GeneMark hints format.
Usage:
annorefine bam2hints --input <BAM> --output <GFF> --stranded <TYPE> [OPTIONS]
Required Arguments:
- --input, -i <BAM>: Input BAM file (must be sorted and indexed)
- --output, -o <GFF>: Output GFF hints file
- --stranded, -s <TYPE>: Library strandedness (FR, RF, or UU)
Optional Arguments:
- --priority <INT>: Priority of hint group (default: 4)
- --source <STR>: Source identifier (default: "E")
- --intronsonly: Only retrieve intron hints
- --threads, -t <INT>: Number of threads (default: all available)
- --contig <STR>: Filter to specific contig
- --min-intron-len <INT>: Minimum intron length (default: 32)
- --max-intron-len <INT>: Maximum intron length (default: 350000)
- --min-end-block-len <INT>: Minimum dangling exon length (default: 8)
- --max-gap-len <INT>: Maximum gap to close (default: 14)
Example:
annorefine bam2hints \
--input alignments.bam \
--output hints.gff \
--stranded RF \
--threads 8 \
--priority 4
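The same invocation can be scripted from Python. The sketch below is not part of AnnoRefine: `build_bam2hints_cmd` and `run_bam2hints` are hypothetical helper names, only flags documented above are used, and the `annorefine` executable is assumed to be on `PATH`.

```python
import shutil
import subprocess


def build_bam2hints_cmd(bam, out_gff, stranded, threads=None, priority=None):
    """Assemble an `annorefine bam2hints` argument list (hypothetical helper)."""
    if stranded not in {"FR", "RF", "UU"}:
        raise ValueError(f"--stranded must be FR, RF, or UU, got {stranded!r}")
    cmd = [
        "annorefine", "bam2hints",
        "--input", str(bam),
        "--output", str(out_gff),
        "--stranded", stranded,
    ]
    # Optional flags are added only when set, so the tool's own defaults apply.
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if priority is not None:
        cmd += ["--priority", str(priority)]
    return cmd


def run_bam2hints(*args, **kwargs):
    """Run the command; assumes `annorefine` is installed and on PATH."""
    if shutil.which("annorefine") is None:
        raise FileNotFoundError("annorefine executable not found on PATH")
    subprocess.run(build_bam2hints_cmd(*args, **kwargs), check=True)
```

Calling `run_bam2hints("alignments.bam", "hints.gff", "RF", threads=8, priority=4)` would issue the command shown in the example above. Keeping the command builder separate from the runner makes the argument list easy to log or inspect before anything is executed.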
annorefine utrs
Refine UTRs and detect novel genes using RNA-seq evidence.
Usage:
annorefine utrs --fasta <FASTA> --gff3 <GFF3> --bam <BAM> --output <GFF3> [OPTIONS]
Required Arguments:
- --fasta, -f <FASTA>: Input genome FASTA file
- --gff3, -g <GFF3>: Input GFF3 annotation file
- --bam, -b <BAM>: Input RNA-seq BAM file
- --output, -o <GFF3>: Output refined GFF3 file
Optional Arguments:
- --min-coverage <INT>: Minimum coverage for UTR extension (default: 5)
- --min-splice-support <INT>: Minimum splice junction support (default: 3)
- --max-utr-extension <INT>: Maximum UTR extension length (default: 1000)
- --enable-novel-gene-detection: Enable novel gene detection
- --min-novel-gene-coverage <INT>: Minimum coverage for novel genes (default: 10)
- --threads, -t <INT>: Number of threads (default: all available)
Example:
annorefine utrs \
--fasta genome.fa \
--gff3 annotations.gff3 \
--bam alignments.bam \
--output refined.gff3 \
--enable-novel-gene-detection \
--threads 8
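When the thresholds need to be recorded alongside the results, it can help to keep them in one place. The following is a minimal sketch under that assumption: `UtrsOptions` and `build_utrs_cmd` are hypothetical names, the defaults are copied from the option list above, and the resulting list is meant to be passed to something like `subprocess.run`.

```python
from dataclasses import dataclass


@dataclass
class UtrsOptions:
    """Optional `annorefine utrs` settings; defaults copied from the list above."""
    min_coverage: int = 5
    min_splice_support: int = 3
    max_utr_extension: int = 1000
    enable_novel_gene_detection: bool = False
    min_novel_gene_coverage: int = 10


def build_utrs_cmd(fasta, gff3, bam, output, opts=None):
    """Assemble an `annorefine utrs` argument list (hypothetical helper)."""
    opts = opts if opts is not None else UtrsOptions()
    cmd = [
        "annorefine", "utrs",
        "--fasta", str(fasta),
        "--gff3", str(gff3),
        "--bam", str(bam),
        "--output", str(output),
        # Thresholds are passed explicitly so the run records its own settings.
        "--min-coverage", str(opts.min_coverage),
        "--min-splice-support", str(opts.min_splice_support),
        "--max-utr-extension", str(opts.max_utr_extension),
        "--min-novel-gene-coverage", str(opts.min_novel_gene_coverage),
    ]
    if opts.enable_novel_gene_detection:
        cmd.append("--enable-novel-gene-detection")  # boolean switch, no value
    return cmd
```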
Python API
See the Python Functions Reference for detailed documentation of all Python functions.
Utility Functions
version()
Get the AnnoRefine version string.
current_num_threads()
Get the current number of threads configured for parallel processing.
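A short sketch of how these two functions might be used to log the runtime environment. It assumes the bindings are importable as `annorefine` (the import name is not stated on this page), and `describe_runtime` is a hypothetical helper.

```python
def describe_runtime(module):
    """Format the version and thread count reported by the given module."""
    return (
        f"AnnoRefine {module.version()} "
        f"using {module.current_num_threads()} thread(s)"
    )


if __name__ == "__main__":
    try:
        import annorefine  # assumed import name for the Python bindings
    except ImportError:
        print("annorefine Python bindings are not installed")
    else:
        print(describe_runtime(annorefine))
```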
Output Formats
GFF Hints Format
AnnoRefine generates hints in GFF format compatible with Augustus and GeneMark:
# Generated by AnnoRefine v2025.9.18
# Command: bam2hints --input alignments.bam --output hints.gff --stranded RF
# Library type: RF
# Introns only: false
chr1 b2h intron 1000 2000 0 + . mult=15;pri=4;src=E;
chr1 b2h exon 2001 2500 0 + . mult=10;pri=4;src=E;
chr1 b2h exonpart 2501 3000 0 + . mult=8;pri=4;src=E;
chr1 b2h dss 2000 2000 0 + . mult=15;pri=4;src=E;
chr1 b2h ass 2001 2001 0 + . mult=15;pri=4;src=E;
Columns:
1. Chromosome/contig name
2. Source (default: "b2h")
3. Feature type (intron, exon, exonpart, dss, ass)
4. Start position (1-based)
5. End position (1-based, inclusive)
6. Score (always 0)
7. Strand (+, -, or .)
8. Frame (always .)
9. Attributes (mult=multiplicity, pri=priority, src=source)
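The nine columns above map directly onto a small record type. The parser below is an illustrative sketch, not an AnnoRefine API: `Hint` and `parse_hint_line` are hypothetical names, and the whitespace fallback exists only so that examples copied from this page (where tabs may have been lost) still parse.

```python
from dataclasses import dataclass


@dataclass
class Hint:
    seqid: str
    source: str
    feature: str
    start: int       # 1-based
    end: int         # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: dict


def parse_hint_line(line):
    """Parse one hints line; return None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # GFF is tab-delimited; fall back to any whitespace for copied examples.
    fields = line.split("\t") if "\t" in line else line.split(None, 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    attrs = {}
    for item in fields[8].split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = item.partition("=")
        attrs[key.strip()] = value.strip()
    return Hint(fields[0], fields[1], fields[2], int(fields[3]), int(fields[4]),
                fields[5], fields[6], fields[7], attrs)
```

Attribute values are kept as strings; callers that need the multiplicity as a number can convert with `int(hint.attributes["mult"])`.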
GFF3 Annotation Format
Refined annotations are output in standard GFF3 format:
##gff-version 3
chr1 AnnoRefine gene 1000 5000 . + . ID=gene1
chr1 AnnoRefine mRNA 1000 5000 . + . ID=mRNA1;Parent=gene1
chr1 AnnoRefine five_prime_UTR 1000 1200 . + . Parent=mRNA1
chr1 AnnoRefine exon 1000 2000 . + . Parent=mRNA1
chr1 AnnoRefine CDS 1201 1900 . + 0 Parent=mRNA1
chr1 AnnoRefine exon 3000 5000 . + . Parent=mRNA1
chr1 AnnoRefine CDS 3000 4800 . + 2 Parent=mRNA1
chr1 AnnoRefine three_prime_UTR 4801 5000 . + . Parent=mRNA1
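The phase column on the CDS lines follows the standard GFF3 rule: it gives the number of bases to skip at the start of the segment to reach the next complete codon. The sketch below recomputes the phases for the example transcript; `next_phase` and `expected_phases` are hypothetical helper names, and the calculation assumes a complete CDS whose first segment starts at phase 0.

```python
def next_phase(start, end, phase):
    """Phase of the following CDS segment, per the GFF3 phase definition."""
    length = end - start + 1  # GFF3 coordinates are 1-based, inclusive
    return (3 - (length - phase) % 3) % 3


def expected_phases(segments, strand="+"):
    """Expected phases for one transcript's CDS (start, end) segments.

    Segments are walked in 5'->3' order, which means descending
    coordinates on the minus strand.
    """
    ordered = sorted(segments, reverse=(strand == "-"))
    phases, phase = [], 0
    for start, end in ordered:
        phases.append(phase)
        phase = next_phase(start, end, phase)
    return phases


# CDS segments from the example above: 1201-1900 and 3000-4800 on the + strand.
print(expected_phases([(1201, 1900), (3000, 4800)]))  # [0, 2]
```

The first segment is 700 bases long, which leaves one base of an incomplete codon at its end, so the second segment begins with the two bases that complete it. This gives phase 2, matching the example.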
Next Steps
- Python Functions Reference - Detailed function documentation
- User Guide - Installation and usage guides
- GitHub Repository - Source code and issues