fastx¶
FASTA/FASTQ file handling utilities.
translate(dna, strand, phase, table=1)¶
Translate a DNA sequence into protein.
Takes a DNA (typically cDNA) sequence and translates it into an amino acid sequence. Requires the DNA sequence, the strand, the translation phase, and the translation table.
Parameters¶
dna : str
    DNA (cDNA) sequence as nucleotides
strand : str, (+/-)
    strand to translate (+ or -)
phase : int
    phase to start translation [0,1,2]
table : int, default=1
    translation table [1]
Returns¶
protSeq : str
    string of translated amino acid sequence
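To make the roles of `strand` and `phase` concrete, here is a minimal, self-contained sketch of this kind of translation. It is not the code in `buscolite/fastx.py`: the name `translate_sketch` is hypothetical, only the standard genetic code (table 1) is handled, and it assumes that `-` means "reverse-complement first", that `phase` is the number of leading bases to skip, and that stop codons are written as `*` and unrecognized codons as `X`.

```python
# Standard genetic code (NCBI translation table 1), built from the
# conventional T, C, A, G ordering of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_sketch(dna, strand, phase):
    """Hypothetical helper: translate dna on the given strand and phase."""
    seq = dna.upper()
    if strand == "-":
        # Assumption: minus strand means translate the reverse complement.
        seq = seq.translate(COMPLEMENT)[::-1]
    # Assumption: phase is the number of leading bases to skip (0, 1 or 2).
    seq = seq[phase:]
    protein = []
    # Walk complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)
```

With these assumptions, `translate_sketch("ATGGCCTAA", "+", 0)` gives `MA*`, and the same call with `phase=1` gives `WP`.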
Source code in buscolite/fastx.py
fasta2dict(fasta, full_header=False)¶
Read a FASTA file into a dictionary.
Similar to Biopython's SeqIO.to_dict(), but the returned dictionary is keyed by contig name and each value is the sequence string rather than a SeqRecord object.
Parameters¶
fasta : filename
    FASTA input file (can be gzipped)
full_header : bool, default=False
    return full header for contig names, default is split at first space
Returns¶
seqs : dict
    returns OrderedDict() of header: seq
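The behaviour described above can be sketched in a few lines of standard-library Python. This is an illustration, not the library's implementation: `fasta2dict_sketch` is a hypothetical name, and detecting gzip input by a `.gz` file extension is an assumption made for the example.

```python
import gzip
from collections import OrderedDict


def fasta2dict_sketch(fasta, full_header=False):
    """Hypothetical helper: read a FASTA file into an OrderedDict of strings."""
    # Assumption: gzipped input is recognised by its ".gz" extension.
    opener = gzip.open if str(fasta).endswith(".gz") else open
    chunks = OrderedDict()
    name = None
    with opener(fasta, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                # Default: keep only the text before the first space.
                name = header if full_header else header.split(" ", 1)[0]
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line)
    # Join the per-line chunks once per record.
    return OrderedDict((key, "".join(parts)) for key, parts in chunks.items())
```

Given such a dictionary `seqs`, the set of headers is `set(seqs)` and the sequence lengths are `{name: len(seq) for name, seq in seqs.items()}`.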
fasta2headers(fasta, full_header=False)¶
Read a FASTA file into a set of headers.
Simple function that reads a FASTA file and returns the set of contig names.
Parameters¶
fasta : filename
    FASTA input file (can be gzipped)
full_header : bool, default=False
    return full header for contig names, default is split at first space
Returns¶
headers : set
    returns set() of header names
fasta2lengths(fasta, full_header=False)¶
Read FASTA file to dictionary of sequence lengths.
Reads a FASTA file (optionally gzipped) and returns a dictionary with contig header names as keys and sequence lengths as values.
Parameters¶
fasta : filename
    FASTA input file (can be gzipped)
full_header : bool, default=False
    return full header for contig names, default is split at first space
Returns¶
seqs : dict
    returns dictionary of header: len(seq)
explode_fasta(fasta, folder, suffix='.fa')¶
Read a FASTA file and write one contig per file to a folder.
Parameters¶
fasta : filename
    FASTA input file (can be gzipped)
folder : directory
    directory to write contigs to
suffix : str, default='.fa'
    file suffix for the written contig files
Returns¶
seqs : dict
    returns dictionary of header: len(seq)
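As a rough illustration of splitting a FASTA file into one file per contig, the sketch below uses only the standard library. It is not the code in `buscolite/fastx.py`: `explode_fasta_sketch` is a hypothetical name, it reads plain-text input only, and naming each output file as the contig name plus `suffix` is an assumption. It returns the header-to-length dictionary described under Returns.

```python
import os


def explode_fasta_sketch(fasta, folder, suffix=".fa"):
    """Hypothetical helper: write each contig of a FASTA file to its own file."""
    os.makedirs(folder, exist_ok=True)
    lengths = {}
    out = None
    name = None
    try:
        with open(fasta) as handle:
            for line in handle:
                if line.startswith(">"):
                    # A new record starts: close the previous contig's file.
                    if out is not None:
                        out.close()
                    name = line[1:].strip().split(" ", 1)[0]
                    lengths[name] = 0
                    # Assumption: output file is "<contig name><suffix>".
                    out = open(os.path.join(folder, name + suffix), "w")
                    out.write(">" + name + "\n")
                elif out is not None and line.strip():
                    sequence = line.strip()
                    out.write(sequence + "\n")
                    lengths[name] += len(sequence)
    finally:
        if out is not None:
            out.close()
    return lengths
```

Contig names that contain path separators or other characters invalid in file names would need sanitising first; the sketch does not handle that case.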
getSeqRegions(seqs, header, coordinates)¶
Return the spliced sequence for a set of coordinates from a sequence dictionary.
Takes a sequence dictionary (e.g. from fasta2dict), the contig name (header), and the coordinates to fetch (a list of tuples).
Parameters¶
seqs : dict
    dictionary of sequences keyed by contig name/header
header : str
    contig name (header) for sequence in seqs dictionary
coordinates : list of tuples
    list of tuples of sequence coordinates to return [(1,10), (20,30)]
Returns¶
result : str
    returns spliced DNA sequence
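The splicing step can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `get_seq_regions_sketch` is a hypothetical name, coordinates are assumed to be 1-based and inclusive (as the `[(1,10), (20,30)]` example suggests), and the regions are assumed to be joined in ascending order of start position.

```python
def get_seq_regions_sketch(seqs, header, coordinates):
    """Hypothetical helper: concatenate the given regions of one contig."""
    contig = seqs[header]
    pieces = []
    # Assumption: regions are joined in ascending order of start position.
    for start, end in sorted(coordinates):
        # Assumption: 1-based inclusive coordinates, so shift start by one.
        pieces.append(contig[start - 1:end])
    return "".join(pieces)
```

For a contig `"ABCDEFGHIJ"` stored under the key `"c1"`, the coordinates `[(1, 3), (6, 8)]` select `ABC` and `FGH`, so the sketch returns `ABCFGH`.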