Grok your data
The objective of bioSyntax is to bring you closer to your data, giving you an intuitive & empathetic understanding of biology. To appreciate all that bioSyntax has to offer read this short manual (~10 minutes) and go explore.
- Getting Started
- Reading Data
- Supported File Formats
- Support
- Collaborating on bioSyntax
- See also: bioSyntax Manuscript
Getting Started
See: Installing bioSyntax
bioSyntax integreates seamlessly with vim (Linux / Mac / Win), sublime (Linux / Mac / Win), gedit (Linux / Win), & less (Linux / Mac). After installing bioSyntax files will automatically detected by file-extension.
Reading large-data
For very large data sets, it’s often slow to open them in a text editor. It’s best to use the command-line program less which will read your file from a data-stream.
Read your large-data set with less directly
# If your file is uncompressed, it can be read directly.
# less will recognize the file extension (.XYZ)
cd ~/myData/
less dbSNP107_common.vcf
less hg19.fa
Streaming your data directly into less with pipes |
# If your file is compressed, you can 'pipe' the data
# using the "|" operator from decompression, directly into
# less. You must prefix the file extension you want
# as file formats are not recognized within streams.
cd ~/myCompressedData/
samtools view -h NA12878_hg38.bam | sam-less
gzip -dc dbSNP107_rare.vcf.gz | vcf-less
gzip -dc hg38.fa.gz | fa-less
Bypassing bioSyntax (data in plain-text)
For vim
Type :syntax off in vim
For less
# You may want to view your data without syntax highlighting
# such as where a file is improperly formatted or very large
# files where syntax highlighting may be slow (i.e. VCF files
# with hundreds of columns).
# 1. Pipe your data through cat
cat snp_1000genomes.vcf | less -
# 2. Within less, switch to a visual editor
less snp_1000genomes.vcf
# press 'CTRL-C' to stop process
# press 'v' to switch to visual editor
Reading Data
Nucleotides
bioSyntax implements a novel, full IUPAC Nucleotide Code coloring. Ambiguous bases are represented by an ~additive color-mixing of the parent bases. For example, Thymine (blue) + Cytosine (red) are both pYrimidines (magenta).

An intuitive feature of the bioSyntax color scheme is that the ‘GC-content’ of a sequence can be quickly approximated by how warm (high GC, red-orange) or cool (low GC, blue-green) a sequence looks.
vim myc_gcContent.fa

PHRED Scores
When available, bioSyntax will highlight PHRED quality scores in a step-gradient of blacks (PHRED = 0-10) to whites (PHRED = 40+).

CIGAR Strings
In .sam files the Query:Reference alignment is summarized efficiently but illegibly as a CIGAR String. With a little bit of highlighting these become much easier to read.

Amino Acid Color Schemes
You can choose from several color-schemes for amino-acid fasta files. The Fasta Clustal (Default) syntax colors amino acids based on their physiochemical properties, so does Fasta Hydrophobicity, or you may prefer better discrimination of each amino acids with Fasta Zappo or Fasta Taylor.
Supported File Formats
File format and software compatibility matrix for bioSyntax.
| status | |
|---|---|
| X | Syntax Complete |
| o | In Development |
| - | Unavailable |
Core bioSyntax
| File Format | Description | sublime | vim | gedit | less |
|---|---|---|---|---|---|
| .fasta | Generic nt/aa sequence | X | X | X | X |
| .fastq | Fasta + PHRED quality | X | X | X | X |
| .clustal | Multiple Sequence Alignment | X | X | X | X |
| .bed | Genomic Ranges | X | X | X | X |
| .gtf | Genomic Annotation | X | X | X | X |
| .pdb | Protein Structure | X | X | X | X |
| .vcf | Variant Call Format | X | X | X | X |
| .sam | NGS Sequence Data | X | X | X | X |
Auxillary Syntaxes
| File Format | Description | sublime | vim | gedit | less |
|---|---|---|---|---|---|
| .fasta | fasta alternative AA colors | ||||
| - | Clustal | X | - | X | - |
| - | Taylor | X | - | X | - |
| - | Zappo | X | - | X | - |
| - | Hydrophobicity | X | - | X | - |
| .fai | Fasta Index (faidx) | X | X | X | X |
| .flagstat | samtools flag summary | X | X | X | X |
| .cwl | Common Workflow Language | X | X | X | - |
| .wig | Wiggle data | - | - | X | - |
| .nexus | Phylogenetics data | - | X | - | - |
| .pml | Pymol Script Language | X | X | - | - |
See Also: Alternative/User Syntax Definitions
These syntaxes are not part of the unified bioSyntax suite but often serve specialized functions.
Science Syntaxes
| File Format | Description | sublime | vim | gedit | less |
|---|---|---|---|---|---|
| .gaussian | Gaussian File (chemistry) | - | X | - | - |
If you’d like to add support for another file-format; check the development page to get started.
Support
Report a bug / Ask a question
The fastest way to get an answer is to:
1) Search / Open an issue on the bioSyntax Repo.
Please Include:
- A detailed and descriptive title.
- Enough information about what did for someone else to replicate the problem.
- Information about the operating system / software you’re using (
uname -a) - If it’s a syntax highlighting issue: a screenshot of the error and a small bit of the input file you used.
2) If you really don’t want to make a (fake) github account. Email [email protected] and we’ll open the issue, but it will be slower.
Uninstallation Instructions
Collaborating on bioSyntax
bioSyntax is a community-oriented project for scientific syntax highlighting. We encourage you to change and customize it to suit your needs.
Check out the Development page to create syntax-highlighting for custom file-formats and for other ways to help out.