BUSCOlite Usage Guide¶
BUSCOlite provides a command-line interface and a Python API for running BUSCO analysis.
Command-Line Interface¶
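Basic Usage¶
The general form of a command, built from the required arguments listed below, is:
buscolite -i <input.fasta> -o <output_name> -m <genome|proteins> -l <lineage_path>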
Required Arguments¶
- -i, --input: Input sequence file in FASTA format (genome or proteome)
- -o, --out: Output name prefix for result files
- -m, --mode: Analysis mode, either 'genome' or 'proteins'
- -l, --lineage: Path to the BUSCO lineage data
Optional Arguments¶
- -c, --cpus: Number of CPU threads to use (default: 1)
- -s, --species: Augustus species model to use (default: anidulans)
- -f, --flanks: Length of flanking regions for gene prediction (default: 2000)
- -v, --verbose: Increase verbosity (can be used multiple times)
- --version: Show version number and exit
- -h, --help: Show help message and exit
Examples¶
Genome Mode¶
Analyze a genome using the fungi lineage (file names are placeholders):
buscolite -i genome.fasta -o sample -m genome -l fungi_odb12 -c 8
Protein Mode¶
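Analyze a proteome using the fungi lineage (file names are placeholders):
buscolite -i proteins.fasta -o sample -m proteins -l fungi_odb12 -c 8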
Output Files¶
BUSCOlite generates the following output files:
- <output_name>.buscolite.gff3: GFF3 file with BUSCO gene annotations (genome mode only)
- <output_name>.buscolite.tsv: Tab-separated summary of BUSCO results
- <output_name>.buscolite.json: Comprehensive results in JSON format (includes stats, config, and detailed results)
TSV Output Format¶
The TSV file contains the following columns:
- BUSCO ID
- Status (Complete, Fragmented, Missing)
- Contig/Sequence ID
- Start position
- End position
- Strand
- Score
- Length
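As a rough illustration of working with this file, the sketch below reads the TSV with Python's standard csv module and counts rows per status. It assumes the columns appear in the order listed above, that any comment lines start with '#', and that rows for missing BUSCOs may have fewer fields. The column names and both helper functions are hypothetical and are not part of BUSCOlite.

```python
import csv
from collections import Counter

# Illustrative names for the eight columns, in the order listed above.
# These names are an assumption, not taken from a header in the real file.
COLUMNS = ["busco_id", "status", "contig", "start", "end", "strand", "score", "length"]


def read_busco_tsv(path):
    """Return a list of dicts, one per data row of a BUSCOlite TSV file."""
    rows = []
    with open(path, newline="") as handle:
        for fields in csv.reader(handle, delimiter="\t"):
            # Assumption: blank lines and '#' comment lines carry no data.
            if not fields or fields[0].startswith("#"):
                continue
            # zip() stops at the shorter sequence, so short rows still parse.
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


def count_by_status(rows):
    """Count rows per status value (for example Complete, Fragmented, Missing)."""
    return Counter(row["status"] for row in rows)
```

Calling `count_by_status(read_busco_tsv("sample.buscolite.tsv"))` would then give a quick tally to cross-check against the JSON stats. All values are kept as strings, so convert coordinates and scores as needed.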
JSON Output Format¶
The JSON file contains comprehensive information in a structured format:
{
  "version": "25.4.24",
  "mode": "genome",
  "input": "genome.fasta",
  "lineage": {
    "name": "fungi_odb12",
    "creation_date": "2024-01-15",
    "number_of_species": "758",
    "number_of_BUSCOs": "758"
  },
  "stats": {
    "total": 758,
    "single-copy": 720,
    "duplicated": 15,
    "fragmented": 18,
    "missing": 5
  },
  "results": {
    "BUSCO_ID": {
      "status": "complete",
      "contig": "scaffold_1",
      "location": [1000, 2500],
      "strand": "+",
      "hmmer": {
        "bitscore": 450.2,
        "evalue": 1.2e-135
      },
      "translation": "MPROTEINSEQ...",
      ...
    }
  },
  "missing": ["BUSCO_ID1", "BUSCO_ID2"],
  "command": "buscolite -i genome.fasta -m genome -l fungi_odb12 -o output"
}
This structured format includes:
- version: BUSCOlite version used
- mode: Analysis mode (genome or proteins)
- input: Input file path
- lineage: Complete lineage configuration
- stats: Summary statistics (total, single-copy, duplicated, fragmented, missing)
- results: Detailed information for each BUSCO gene (coordinates, scores, status, sequences, structure)
- missing: List of missing BUSCO IDs
- command: Command used to generate the results
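The stats block holds raw counts, so a common next step is converting them to percentages. The following is a minimal sketch that relies only on the keys shown in the example above. The function names are hypothetical, and treating "complete" as single-copy plus duplicated follows the usual BUSCO convention rather than anything BUSCOlite-specific.

```python
import json


def summarize_stats(stats):
    """Convert raw BUSCO counts into percentages of the total."""
    total = stats["total"]
    complete = stats["single-copy"] + stats["duplicated"]

    def pct(count):
        # Guard against an empty lineage to avoid dividing by zero.
        return round(100.0 * count / total, 1) if total else 0.0

    return {
        "complete": pct(complete),
        "single-copy": pct(stats["single-copy"]),
        "duplicated": pct(stats["duplicated"]),
        "fragmented": pct(stats["fragmented"]),
        "missing": pct(stats["missing"]),
    }


def load_summary(json_path):
    """Read a BUSCOlite JSON file and return percentage statistics."""
    with open(json_path) as handle:
        report = json.load(handle)
    return summarize_stats(report["stats"])
```

For the example counts above (720 single-copy and 15 duplicated out of 758), `summarize_stats` reports 97.0% complete.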
Plotting Results¶
BUSCOlite provides a separate command-line tool for generating publication-quality SVG plots from JSON results.
buscolite-plot Command¶
The buscolite-plot command can generate plots from one or more BUSCOlite JSON files.
Basic Usage¶
Single Sample Plot¶
Generate a plot from a single analysis (file names are placeholders):
buscolite-plot sample1.buscolite.json -o sample1.svg
This creates a horizontal stacked bar chart showing:
- Complete (Single-copy) - light blue
- Complete (Duplicated) - dark blue
- Fragmented - yellow
- Missing - red
Multi-Sample Comparative Plot¶
Compare multiple samples in a single plot:
buscolite-plot sample1.buscolite.json sample2.buscolite.json sample3.buscolite.json -o comparison.svg
You can also use shell wildcards to plot all results in a directory:
buscolite-plot *.buscolite.json -o comparison.svg
The multi-sample plot displays each sample as a separate horizontal bar, making it easy to compare assembly or annotation quality across multiple datasets.
Plot Features¶
- Zero dependencies: Uses only the Python standard library to generate SVG
- Publication quality: Clean, professional appearance matching BUSCO's official style
- Automatic detection: Switches between single- and multi-sample modes depending on how many JSON files are given
- Informative: Shows percentages, counts, and summary statistics
- Lightweight: SVG output is small and scales without loss of quality
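To show how a stacked bar can be drawn with nothing beyond the standard library, here is a rough sketch that turns a stats dictionary into an SVG string. It is not BUSCOlite's implementation. The function name, layout, and hex colour values are all assumptions chosen to approximate the colour scheme described above.

```python
from xml.sax.saxutils import escape

# Hypothetical hex values for the four categories; the guide names the
# colours but not their exact codes.
CATEGORIES = [
    ("single-copy", "#56B4E9"),  # light blue
    ("duplicated", "#3492C7"),   # dark blue
    ("fragmented", "#F0E442"),   # yellow
    ("missing", "#F04442"),      # red
]


def stacked_bar_svg(stats, label="sample", width=600, height=40):
    """Return an SVG string with one horizontal stacked bar for `stats`."""
    total = stats["total"] or 1  # avoid division by zero
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height + 20}">',
        f'<text x="0" y="12" font-size="12">{escape(label)}</text>',
    ]
    x = 0.0
    for key, colour in CATEGORIES:
        # Each segment's width is proportional to its share of the total.
        seg = width * stats[key] / total
        parts.append(
            f'<rect x="{x:.2f}" y="20" width="{seg:.2f}" height="{height}" fill="{colour}"/>'
        )
        x += seg
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the returned string to a .svg file gives a bar that any browser can display. A multi-sample version would repeat the loop with a vertical offset per sample.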
Example Workflow¶
# Step 1: Run BUSCO analysis on multiple samples
buscolite -i genome1.fasta -o sample1 -m genome -l fungi_odb12 -c 8
buscolite -i genome2.fasta -o sample2 -m genome -l fungi_odb12 -c 8
buscolite -i genome3.fasta -o sample3 -m genome -l fungi_odb12 -c 8
# Step 2: Generate comparative plot
buscolite-plot sample1.buscolite.json sample2.buscolite.json sample3.buscolite.json -o comparison.svg
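Alongside the plot, it can be handy to have the same comparison as text. The sketch below gathers the stats block from every matching JSON file and renders a tab-separated table. It assumes the `<output_name>.buscolite.json` naming and the stats keys described earlier. Both function names are hypothetical.

```python
import glob
import json
import os

SUFFIX = ".buscolite.json"


def collect_stats(pattern="*" + SUFFIX):
    """Map sample name -> stats dict for every JSON file matching `pattern`."""
    table = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            report = json.load(handle)
        # Sample name: the file name with the '.buscolite.json' suffix removed.
        name = os.path.basename(path)
        if name.endswith(SUFFIX):
            name = name[: -len(SUFFIX)]
        table[name] = report["stats"]
    return table


def format_table(table):
    """Render the collected stats as tab-separated text lines."""
    keys = ["total", "single-copy", "duplicated", "fragmented", "missing"]
    lines = ["\t".join(["sample"] + keys)]
    for name, stats in table.items():
        lines.append("\t".join([name] + [str(stats[k]) for k in keys]))
    return "\n".join(lines)
```

Running `print(format_table(collect_stats()))` in the directory holding the results prints one row per sample.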
Python API¶
BUSCOlite can also be used as a Python library:
from buscolite.busco import runbusco
results, missing, stats, config = runbusco(
    input="genome.fasta",
    lineage="/path/to/fungi_odb10",
    mode="genome",
    species="anidulans",
    cpus=8,
    offset=2000,
    verbosity=3
)
# Print summary (duplicated BUSCOs are counted separately in stats)
print(f"Complete (single-copy): {stats['single-copy']}")
print(f"Fragmented: {stats['fragmented']}")
print(f"Missing: {len(missing)}")
print(f"Total: {stats['total']}")
# Access individual BUSCO results
for busco_id, data in results.items():
    if data.get("status") == "complete":
        print(f"{busco_id}: {data['location']}")
# Write results to files
import json
from buscolite.gff import gffwriter
from buscolite.utilities import summary_writer
# Write GFF file
with open("output.gff3", "w") as f:
    gffwriter(results, f)
# Write summary file
with open("output.tsv", "w") as f:
    summary_writer(results, missing, ["test"], config, f, mode="genome")
# Write JSON file (results only)
with open("output.json", "w") as f:
    json.dump(results, f, indent=2)
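Once the results dictionary is in hand, it can be regrouped for downstream use. As a small sketch, the hypothetical helper below lists BUSCO IDs per contig, ordered by start coordinate. It assumes each entry is a dictionary with "contig" and "location" keys, as in the JSON example earlier. Entries that do not fit that shape are skipped.

```python
from collections import defaultdict


def buscos_per_contig(results):
    """Group BUSCO IDs by contig, sorted by start coordinate."""
    by_contig = defaultdict(list)
    for busco_id, data in results.items():
        # Assumption: each entry is a dict with 'contig' and 'location' keys,
        # shaped like the JSON example; anything else is skipped.
        if isinstance(data, dict) and "contig" in data and "location" in data:
            start = data["location"][0]
            by_contig[data["contig"]].append((start, busco_id))
    # Sort each contig's hits by start position and keep only the IDs.
    return {contig: [bid for _, bid in sorted(hits)] for contig, hits in by_contig.items()}
```

This kind of grouping gives a quick view of how BUSCO genes are distributed across scaffolds in an assembly.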
For more details on the Python API, see the API Reference.