Grok your data

The objective of bioSyntax is to bring you closer to your data, giving you an intuitive and empathetic understanding of biology. To appreciate all that bioSyntax has to offer, read this short manual (about 10 minutes) and then go explore.

Getting Started
1. See: Installing bioSyntax
2. Reading large-data
Reading Data
Supported File Formats
Support
1. Report a bug / Ask a question
2. Uninstallation Instructions
Collaborating on bioSyntax
See also: bioSyntax Manuscript

Getting Started

See: Installing bioSyntax

bioSyntax integrates seamlessly with vim (Linux / Mac / Win), sublime (Linux / Mac / Win), gedit (Linux / Win), and less (Linux / Mac). After you install bioSyntax, files will automatically be detected by their file extension.

Reading large-data

Very large data sets are often slow to open in a text editor. Instead, use the command-line program less, which reads your file as a data stream.

Read your large data set directly with less

# If your file is uncompressed, it can be read directly;
# less recognizes the format from the file extension (e.g. .vcf, .fa)

cd ~/myData/

less dbSNP107_common.vcf

less hg19.fa

Streaming your data directly into less with pipes `|`

# If your file is compressed, you can 'pipe' the output of the
# decompression command directly into less with the "|" operator.
# File formats are not recognized within streams, so use the
# less command prefixed with the format you want (e.g. vcf-less).

cd ~/myCompressedData/ 

samtools view -h NA12878_hg38.bam | sam-less

gzip -dc dbSNP107_rare.vcf.gz | vcf-less

gzip -dc hg38.fa.gz | fa-less

Bypassing bioSyntax (data in plain-text)

For vim: type :syntax off within vim.

For less:

# You may want to view your data without syntax highlighting,
# such as when a file is improperly formatted, or for very large
# files where highlighting may be slow (e.g. VCF files with
# hundreds of columns).

# 1. Pipe your data through cat; streams carry no file
#    extension, so no syntax is applied
cat snp_1000genomes.vcf | less -

# 2. Within less, switch to a visual editor
less snp_1000genomes.vcf
  # press 'CTRL-C' to stop the process
  # press 'v' to open the file in your visual editor ($VISUAL or $EDITOR)

Reading Data

Nucleotides

bioSyntax implements a novel coloring of the full IUPAC nucleotide code. Ambiguous bases are colored by an approximately additive mixing of the colors of their parent bases. For example, thymine (blue) and cytosine (red) are both pYrimidines, so the ambiguity code Y is magenta.
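The color mixing follows the IUPAC ambiguity codes, each of which stands for a set of parent bases. As a quick reference, this sketch lists those sets as a shell function. The name `iupac_parents` is hypothetical, and it encodes only the standard IUPAC definitions, not bioSyntax's actual color values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the parent bases of one IUPAC nucleotide code.
# Lower-case (soft-masked) input is upper-cased first.
iupac_parents() {
  case $(printf '%s' "$1" | tr '[:lower:]' '[:upper:]') in
    A) echo A ;; C) echo C ;; G) echo G ;; T) echo T ;; U) echo U ;;
    R) echo AG ;;   # puRine
    Y) echo CT ;;   # pYrimidine
    S) echo CG ;;   # Strong
    W) echo AT ;;   # Weak
    K) echo GT ;;   # Keto
    M) echo AC ;;   # aMino
    B) echo CGT ;;  # not A
    D) echo AGT ;;  # not C
    H) echo ACT ;;  # not G
    V) echo ACG ;;  # not T
    N) echo ACGT ;; # aNy base
    *) echo "unknown IUPAC code: $1" >&2; return 1 ;;
  esac
}

iupac_parents Y   # prints CT
```

Reading a code's parent set against the colors of A, C, G and T shows which colors are being mixed on screen.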

An intuitive feature of the bioSyntax color scheme is that the ‘GC-content’ of a sequence can be quickly approximated by how warm (high GC, red-orange) or cool (low GC, blue-green) a sequence looks.

vim myc_gcContent.fa
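To check the visual estimate, GC content can also be computed on the command line. This is a minimal awk sketch, not part of bioSyntax: the function name `gc_content` is hypothetical, it pools all records in the file, skips header lines starting with `>`, and counts only unambiguous A/C/G/T bases (ambiguity codes such as S or N are ignored).

```shell
#!/usr/bin/env bash
# Hypothetical helper: fraction of G+C among unambiguous A/C/G/T bases
# in FASTA input (a file argument, or standard input if none is given).
gc_content() {
  awk '
    /^>/ { next }                     # skip FASTA header lines
    {
      seq = toupper($0)               # count soft-masked bases too
      gc += gsub(/[GC]/, "", seq)     # gsub returns the number of matches
      at += gsub(/[AT]/, "", seq)
    }
    END {
      if (gc + at == 0) { print "NA"; exit }
      printf "%.3f\n", gc / (gc + at)
    }' "${1:--}"
}

printf '>demo\nGGCCATAT\n' | gc_content   # prints 0.500
```

Run it as `gc_content myc_gcContent.fa`, or stream a compressed file in the same way as the less examples: `gzip -dc hg38.fa.gz | gc_content`.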

PHRED Scores

When quality scores are available (e.g. in .fastq files), bioSyntax highlights PHRED quality scores in a stepped gradient from black (PHRED 0-10) to white (PHRED 40+).
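FASTQ files store each PHRED score as a single printable character. This sketch decodes a quality string, assuming the common Phred+33 (Sanger / Illumina 1.8+) offset; older Phred+64 files would need a different offset. The function name `phred33` is hypothetical and it requires bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a FASTQ quality string into PHRED scores,
# assuming the Phred+33 encoding (score = ASCII code - 33).
phred33() {
  local qual=$1 i ch
  local -a scores=()
  for (( i = 0; i < ${#qual}; i++ )); do
    ch=${qual:i:1}
    # printf '%d' "'c" prints the ASCII code of character c
    scores+=( $(( $(printf '%d' "'$ch") - 33 )) )
  done
  echo "${scores[*]}"
}

phred33 '#+5?I'   # prints: 2 10 20 30 40
```

In that example, `#` and `+` fall in the darkest band (PHRED 0-10) and `I` in the brightest (PHRED 40+).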

CIGAR Strings

In .sam files, the query-to-reference alignment is summarized compactly but illegibly as a CIGAR string. A little highlighting makes these much easier to read.
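A CIGAR string is a run of length/operation pairs (M, I, D, N, S, H, P, = and X, as defined in the SAM specification). These hypothetical shell helpers, a sketch for illustration and not part of bioSyntax, split a CIGAR string into its operations and total the reference bases it spans (M, D, N, = and X consume the reference; I, S, H and P do not).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "length op" pair per line.
cigar_ops() {
  printf '%s\n' "$1" | grep -oE '[0-9]+[MIDNSHP=X]' | sed -E 's/^([0-9]+)(.)$/\1 \2/'
}

# Hypothetical helper: number of reference bases the alignment spans.
cigar_ref_len() {
  cigar_ops "$1" | awk '$2 ~ /[MDN=X]/ { n += $1 } END { print n + 0 }'
}

cigar_ops 5S60M2I33M       # prints "5 S", "60 M", "2 I", "33 M" on separate lines
cigar_ref_len 5S60M2I33M   # prints 93
```

The CIGAR string is the sixth column of a SAM record, so for example `samtools view NA12878_hg38.bam | head -n 1 | cut -f 6` prints one you can pass to `cigar_ops`.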

Amino Acid Color Schemes

You can choose from several color schemes for amino-acid FASTA files. The Fasta Clustal (default) and Fasta Hydrophobicity syntaxes color amino acids by their physicochemical properties; if you prefer better discrimination between individual amino acids, try Fasta Zappo or Fasta Taylor.

Supported File Formats

File format and software compatibility matrix for bioSyntax.

Symbol	Status
X	Syntax Complete
o	In Development
-	Unavailable

Core bioSyntax

File Format	Description	sublime	vim	gedit	less
.fasta	Generic nt/aa sequence	X	X	X	X
.fastq	Fasta + PHRED quality	X	X	X	X
.clustal	Multiple Sequence Alignment	X	X	X	X
.bed	Genomic Ranges	X	X	X	X
.gtf	Genomic Annotation	X	X	X	X
.pdb	Protein Structure	X	X	X	X
.vcf	Variant Call Format	X	X	X	X
.sam	NGS Sequence Data	X	X	X	X

Auxiliary Syntaxes

File Format	Description	sublime	vim	gedit	less
.fasta	Alternative AA colors: Clustal	X	-	X	-
.fasta	Alternative AA colors: Taylor	X	-	X	-
.fasta	Alternative AA colors: Zappo	X	-	X	-
.fasta	Alternative AA colors: Hydrophobicity	X	-	X	-
.fai	Fasta Index (faidx)	X	X	X	X
.flagstat	samtools flag summary	X	X	X	X
.cwl	Common Workflow Language	X	X	X	-
.wig	Wiggle data	-	-	X	-
.nexus	Phylogenetics data	-	X	-	-
.pml	PyMOL Script Language	X	X	-	-

Science Syntaxes

File Format	Description	sublime	vim	gedit	less
.gaussian	Gaussian File (chemistry)	-	X	-	-

If you’d like to add support for another file format, check the Development page to get started.

Support

Report a bug / Ask a question

The fastest way to get an answer is to:

1) Search for, or open, an issue on the bioSyntax repo.

Please include:

A detailed and descriptive title.
Enough information about what you did for someone else to reproduce the problem.
Information about the operating system / software you’re using (e.g. the output of uname -a).
If it’s a syntax-highlighting issue: a screenshot of the error and a small sample of the input file you used.


2) If you really don’t want to make a (fake) GitHub account, email [email protected] and we’ll open the issue for you, but it will be slower.

Uninstallation Instructions

Collaborating on bioSyntax

bioSyntax is a community-oriented project for scientific syntax highlighting. We encourage you to change and customize it to suit your needs.

Check out the Development page to learn how to create syntax highlighting for custom file formats, and for other ways to help out.

