API Overview

AnnoRefine provides both command-line tools and Python bindings for genome annotation refinement.

Command-Line Tools

annorefine bam2hints

Convert BAM alignments to Augustus/GeneMark hints format.

Usage:

annorefine bam2hints [OPTIONS] --input <BAM> --output <GFF> --stranded <TYPE>

Required Arguments:

- --input, -i <BAM>: Input BAM file (must be sorted and indexed)
- --output, -o <GFF>: Output GFF hints file
- --stranded, -s <TYPE>: Library strandedness (FR, RF, or UU)

Optional Arguments:

- --priority <INT>: Priority of hint group (default: 4)
- --source <STR>: Source identifier (default: "E")
- --intronsonly: Only retrieve intron hints
- --threads, -t <INT>: Number of threads (default: all available)
- --contig <STR>: Filter to specific contig
- --min-intron-len <INT>: Minimum intron length (default: 32)
- --max-intron-len <INT>: Maximum intron length (default: 350000)
- --min-end-block-len <INT>: Minimum dangling exon length (default: 8)
- --max-gap-len <INT>: Maximum gap to close (default: 14)

Example:

annorefine bam2hints \
    --input alignments.bam \
    --output hints.gff \
    --stranded RF \
    --threads 8 \
    --priority 4
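
When bam2hints is driven from a Python pipeline, the same invocation can be assembled as an argument list and handed to subprocess. The following is a minimal sketch, not part of AnnoRefine: the helper names build_bam2hints_cmd and run_cmd are hypothetical, and only flags listed above are used.

```python
import shutil
import subprocess


def build_bam2hints_cmd(bam, gff, stranded, threads=None, priority=None,
                        introns_only=False):
    """Assemble an `annorefine bam2hints` argument list (hypothetical helper)."""
    if stranded not in {"FR", "RF", "UU"}:
        raise ValueError(f"stranded must be FR, RF, or UU, got {stranded!r}")
    cmd = ["annorefine", "bam2hints",
           "--input", str(bam),
           "--output", str(gff),
           "--stranded", stranded]
    # Optional flags are added only when set, so the tool's defaults apply otherwise.
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if priority is not None:
        cmd += ["--priority", str(priority)]
    if introns_only:
        cmd.append("--intronsonly")
    return cmd


def run_cmd(cmd):
    """Run the command, failing early if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


cmd = build_bam2hints_cmd("alignments.bam", "hints.gff", "RF",
                          threads=8, priority=4)
print(" ".join(cmd))
```

Calling run_cmd(cmd) then executes the tool. Passing a list rather than a shell string avoids quoting problems with unusual file names.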

annorefine utrs

Refine UTRs and detect novel genes using RNA-seq evidence.

Usage:

annorefine utrs [OPTIONS] --fasta <FASTA> --gff3 <GFF3> --bam <BAM> --output <GFF3>

Required Arguments:

- --fasta, -f <FASTA>: Input genome FASTA file
- --gff3, -g <GFF3>: Input GFF3 annotation file
- --bam, -b <BAM>: Input RNA-seq BAM file
- --output, -o <GFF3>: Output refined GFF3 file

Optional Arguments:

- --min-coverage <INT>: Minimum coverage for UTR extension (default: 5)
- --min-splice-support <INT>: Minimum splice junction support (default: 3)
- --max-utr-extension <INT>: Maximum UTR extension length (default: 1000)
- --enable-novel-gene-detection: Enable novel gene detection
- --min-novel-gene-coverage <INT>: Minimum coverage for novel genes (default: 10)
- --threads, -t <INT>: Number of threads (default: all available)

Example:

annorefine utrs \
    --fasta genome.fa \
    --gff3 annotations.gff3 \
    --bam alignments.bam \
    --output refined.gff3 \
    --enable-novel-gene-detection \
    --threads 8

Python API

See the Python Functions Reference for detailed documentation of all Python functions.

Utility Functions

version()

Get the AnnoRefine version string.

import annorefine
print(annorefine.version())  # e.g., "2025.9.18"

current_num_threads()

Get the current number of threads configured for parallel processing.

import annorefine
print(annorefine.current_num_threads())  # e.g., 8

Output Formats

GFF Hints Format

AnnoRefine generates hints in GFF format compatible with Augustus and GeneMark:

# Generated by AnnoRefine v2025.9.18
# Command: bam2hints --input alignments.bam --output hints.gff --stranded RF
# Library type: RF
# Introns only: false
chr1    b2h    intron      1000    2000    0    +    .    mult=15;pri=4;src=E;
chr1    b2h    exon        2001    2500    0    +    .    mult=10;pri=4;src=E;
chr1    b2h    exonpart    2501    3000    0    +    .    mult=8;pri=4;src=E;
chr1    b2h    dss         2000    2000    0    +    .    mult=15;pri=4;src=E;
chr1    b2h    ass         2001    2001    0    +    .    mult=15;pri=4;src=E;

Columns:

1. Chromosome/contig name
2. Source (default: "b2h")
3. Feature type (intron, exon, exonpart, dss, ass)
4. Start position (1-based)
5. End position (1-based, inclusive)
6. Score (always 0)
7. Strand (+, -, or .)
8. Frame (always .)
9. Attributes (mult=multiplicity, pri=priority, src=source)
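
To make the column layout concrete, here is a small sketch of a reader for a single hints line. It is not part of the AnnoRefine API: the Hint class and parse_hint_line function are hypothetical names, and the code assumes the columns are tab-separated, as is standard for GFF.

```python
from dataclasses import dataclass


@dataclass
class Hint:
    seqid: str
    source: str
    feature: str
    start: int        # 1-based
    end: int          # 1-based, inclusive
    strand: str
    attributes: dict


def parse_hint_line(line):
    """Parse one hints line; return None for comment and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, feature, start, end, _score, strand, _frame, attrs = fields
    attributes = {}
    # Attributes look like "mult=15;pri=4;src=E;" (note the trailing semicolon).
    for item in attrs.split(";"):
        if item:
            key, _, value = item.partition("=")
            attributes[key] = value
    return Hint(seqid, source, feature, int(start), int(end), strand, attributes)


sample = "chr1\tb2h\tintron\t1000\t2000\t0\t+\t.\tmult=15;pri=4;src=E;"
hint = parse_hint_line(sample)
# Coordinates are inclusive, so the feature length is end - start + 1.
print(hint.feature, hint.end - hint.start + 1, int(hint.attributes["mult"]))
```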

GFF3 Annotation Format

Refined annotations are output in standard GFF3 format:

##gff-version 3
chr1    AnnoRefine    gene               1000    5000    .    +    .    ID=gene1
chr1    AnnoRefine    mRNA               1000    5000    .    +    .    ID=mRNA1;Parent=gene1
chr1    AnnoRefine    five_prime_UTR     1000    1200    .    +    .    Parent=mRNA1
chr1    AnnoRefine    exon               1000    2000    .    +    .    Parent=mRNA1
chr1    AnnoRefine    CDS                1201    1900    .    +    0    Parent=mRNA1
chr1    AnnoRefine    exon               3000    5000    .    +    .    Parent=mRNA1
chr1    AnnoRefine    CDS                3000    4800    .    +    2    Parent=mRNA1
chr1    AnnoRefine    three_prime_UTR    4801    5000    .    +    .    Parent=mRNA1
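
The eighth column of each CDS row is the GFF3 phase: the number of bases to skip at the start of the segment to reach the next full codon. The sketch below recomputes the phases for the two CDS segments in the example. The expected_phases function is a hypothetical helper, not an AnnoRefine function, and it assumes the first CDS segment starts on a complete codon (phase 0).

```python
def expected_phases(cds_segments, strand="+"):
    """Return the expected GFF3 phase of each CDS segment in transcript order.

    cds_segments: list of (start, end) tuples, 1-based inclusive.
    """
    # On the minus strand, transcript order runs from high to low coordinates.
    ordered = sorted(cds_segments, reverse=(strand == "-"))
    phases = []
    phase = 0  # assumption: the first segment begins with a complete codon
    for start, end in ordered:
        phases.append(phase)
        length = end - start + 1
        # Bases left over after the last full codon carry into the next segment.
        phase = (3 - (length - phase) % 3) % 3
    return phases


print(expected_phases([(1201, 1900), (3000, 4800)]))  # [0, 2]
```

The first segment spans 700 bases, leaving one base of an unfinished codon, so the second segment must skip two bases, which matches the phase of 2 in the example.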

Next Steps