Main
Most input files should be provided as tab-separated text.
Annotation
SynTView can read feature information from NCBI ptt files.
These protein table files can be downloaded for bacterial genomes from the NCBI FTP site: ftp://ftp.ncbi.nih.gov/genomes/Bacteria. Alternatively, use a GenBank file and the BioPerl wrapper script provided here.
Below is the head of the FN598874.ptt file:
Helicobacter pylori B8, complete genome - 1..1673997
1711 proteins
Location Strand Length PID Gene Synonym Code COG Product
154..1527 + 457 298354686 dnaA HPB8_1 - - chromosomal replication initiator protein DnaA
1756..3195 + 479 298354687 - HPB8_2 - - conserved hypothetical protein
3376..4128 + 250 298354688 xthA HPB8_3 - - exodeoxyribonuclease III
4125..4760 - 211 298354689 - HPB8_4 - - conserved hypothetical protein
4764..5111 - 115 298354690 - HPB8_5 - - putative secreted protein
5193..7064 + 623 298354691 recG HPB8_6 - - ATP-dependent DNA helicase RecG
7107..8567 + 486 298354692 mod1 HPB8_7 - - site-specific DNA-methyltransferase (adenine-specific)
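To make the layout concrete, here is a minimal Python sketch of how such a ptt file could be read. It is not SynTView's own reader. It assumes that data rows carry exactly nine tab-separated fields, in the column order of the header shown above; `Feature` and `parse_ptt` are hypothetical names introduced for illustration.

```python
from typing import Iterable, List, NamedTuple


class Feature(NamedTuple):
    start: int
    end: int
    strand: str
    pid: str
    gene: str
    synonym: str
    product: str


def parse_ptt(lines: Iterable[str]) -> List[Feature]:
    """Collect the feature rows of a ptt file, skipping title and header lines."""
    features = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Assumption: every data row has exactly 9 tab-separated fields.
        if len(fields) != 9:
            continue
        start, sep, end = fields[0].partition("..")
        # The column-header row ("Location ...") has no numeric range.
        if not (sep and start.isdigit() and end.isdigit()):
            continue
        features.append(
            Feature(int(start), int(end), fields[1],
                    fields[3], fields[4], fields[5], fields[8])
        )
    return features


# Illustrative input built from the rows shown above (tabs written as \t).
sample = [
    "Helicobacter pylori B8, complete genome - 1..1673997\n",
    "1711 proteins\n",
    "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n",
    "154..1527\t+\t457\t298354686\tdnaA\tHPB8_1\t-\t-\t"
    "chromosomal replication initiator protein DnaA\n",
    "1756..3195\t+\t479\t298354687\t-\tHPB8_2\t-\t-\t"
    "conserved hypothetical protein\n",
]
features = parse_ptt(sample)
print(features[0].synonym, features[0].start, features[0].end)
```

Filtering on the field count and on a numeric Location keeps the parser independent of how many title lines precede the table.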
BDBH
The synteny information is computed from the correspondence between proteins of different organisms (Bi-Directional Best Hits – BDBH) and the conserved order of the corresponding genes along the genomes. IDs of the genes/proteins must correspond to accession numbers used in the ptt file.
A correspondence file is named by joining the names of the two ptt files, without their extension, with an underscore (e.g. FN598874_AE000511).
The format is a tab-separated text file having the following columns:
1 id_protein_A
2 id_protein_B
3 score
4 expectation value (E-value)
Try a synteny data set here
HPB8_20 HP_1509 429 1e-122
HPB8_391 HP_1113 476 1e-136
HPB8_264 HP_1221 462 1e-132
HPB8_1219 HP_0352 686 0.0
HPB8_1295 HP_0267 724 0.0
HPB8_1304 HP_0259 768 0.0
HPB8_1198 HP_1079 579 1e-167
HPB8_1408 HP_0157 310 1e-86
HPB8_670 HP_0883 317 1e-88
HPB8_340 HP_1159 199 1e-53
...
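The naming rule and the four-column layout can be sketched in Python as follows. This is only an illustration under stated assumptions: the helpers `correspondence_name` and `parse_bdbh` and the `Hit` type are hypothetical, the fields are taken to be tab-separated as described above, and no file extension is added because the text does not specify one.

```python
import os
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    protein_a: str
    protein_b: str
    score: float
    evalue: float


def correspondence_name(ptt_a: str, ptt_b: str) -> str:
    """Join the two ptt file names (extension removed) with an underscore."""
    def stem(path: str) -> str:
        return os.path.splitext(os.path.basename(path))[0]
    return f"{stem(ptt_a)}_{stem(ptt_b)}"


def parse_bdbh(lines: Iterable[str]) -> List[Hit]:
    """Read BDBH rows: id_protein_A, id_protein_B, score, fourth numeric column."""
    hits = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        hits.append(Hit(fields[0], fields[1], float(fields[2]), float(fields[3])))
    return hits


name = correspondence_name("FN598874.ptt", "AE000511.ptt")
hits = parse_bdbh([
    "HPB8_20\tHP_1509\t429\t1e-122\n",
    "HPB8_1219\tHP_0352\t686\t0.0\n",
])
print(name, hits[0].protein_a, hits[0].protein_b)
```

Each protein id read this way should also appear in the ptt file of the corresponding genome, which can be checked against the parsed annotation before loading the data.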
SNP
The polymorphism information is computed by mapping the sequence reads of several strains against a reference genome.
The SNP format is a tab-separated text file having the following columns:
1 genomic location
2 reference nucleotide
3 strain nucleotide
4 gene id (null for intergenic polymorphisms)
5 strand
6 reference codon
7 strain codon
Try a SNP data set here
1549 c A
16254 g A
32517 g A id_0026 + Lys Lys
40004 g T id_0033 + Val Val
41544 t G id_0034 + Val Gly
47299 t C
57394 c T id_0050 - Gln Lys
60091 t C id_0053 + Cys Cys
63734 t C id_0067 + Ala Ala
75633 g A id_0078 + Ala Ala
...
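A minimal Python sketch of reading this layout is given below; it is an illustration, not SynTView's parser. It assumes tab-separated fields and treats a missing, empty or literal `null` gene id as intergenic, since the sample rows above show intergenic positions with only three columns. The sample also shows three-letter amino acid codes in columns 6 and 7, so the sketch labels a row as synonymous when both values are equal; that reading is an assumption. `Snp`, `parse_snps` and `classify` are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Optional


class Snp(NamedTuple):
    position: int
    ref_base: str
    strain_base: str
    gene_id: Optional[str]
    strand: Optional[str]
    ref_codon: Optional[str]
    strain_codon: Optional[str]


def _opt(value: str) -> Optional[str]:
    """Map an empty or 'null' field to None."""
    value = value.strip()
    return None if value in ("", "null") else value


def parse_snps(lines: Iterable[str]) -> List[Snp]:
    snps = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if not fields[0].strip().isdigit():
            continue  # skip blank lines and anything without a position
        fields += [""] * (7 - len(fields))  # pad short (intergenic) rows
        snps.append(Snp(int(fields[0]), fields[1], fields[2],
                        *map(_opt, fields[3:7])))
    return snps


def classify(snp: Snp) -> str:
    if snp.gene_id is None:
        return "intergenic"
    # Assumption: equal values in columns 6 and 7 mean no amino acid change.
    if snp.ref_codon == snp.strain_codon:
        return "synonymous"
    return "non-synonymous"


snps = parse_snps([
    "1549\tc\tA\n",
    "32517\tg\tA\tid_0026\t+\tLys\tLys\n",
    "41544\tt\tG\tid_0034\t+\tVal\tGly\n",
])
labels = [classify(s) for s in snps]
print(labels)
```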
Phylogenetic tree
The Newick format is described here
Each leaf node must be named after the corresponding ptt file, without the extension (e.g. FN598874 for FN598874.ptt).
((FQ670179:25.416891,(CP000153:24.935984,(CP000538:26.817907,(CP001816:24.827831,(CP000012:22.373966,(FN555004:21.37438,AE017125:21.72122):0.7671566):0.2101059):0.4582672):0.48131752):3.414669):9.554459,(((CP002336:2.316484,AM260522:3.6015763):2.2559552,((CP002332:0.5595603,CP002184:1.7166162):0.14640999,AE001439:1.9021473):0.43751955):0.0,(CP001173:1.2607863,(CP000241:1.8142213,(((CP001680:0.49685884,(CP002096:1.3150878,((CP002076:0.18996191,(CP002071:0.055541277,CP001072:0.24794841):0.113527775):0.369269,(CP001582:0.55765057,CP000012:0.50456715):0.16183925):0.16442347):0.62226176):0.43979788,(CP002331:1.5892277,CP002334:0.99043846):0.28099108):0.30312443,((FN598874:0.010840178,B128:0.14090848):1.3261669,((FM991728:1.107264,(CP002074:0.59512377,CP002073:0.92232466):0.33431387):0.14433646,(CP001217:0.89065623,AE000511:1.0820305):0.19708943):0.0):0.29496455):0.0):0.108477116):1.3428841):9.554459);
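Because the leaf names have to match the ptt file names, it can help to check a tree before loading it. The Python sketch below is one simple way to do so and is not part of SynTView. It assumes unquoted leaf labels and unnamed internal nodes, as in the tree above; `leaf_names` and `leaves_without_ptt` are hypothetical helpers.

```python
import re
from typing import Iterable, List

# A leaf label is the token that directly follows "(" or ",".
# Assumption: labels are unquoted and internal nodes carry no name.
_LEAF = re.compile(r"[(,]\s*([^():,;\s]+)")


def leaf_names(newick: str) -> List[str]:
    """Return the leaf labels of a Newick string, in order of appearance."""
    return _LEAF.findall(newick)


def leaves_without_ptt(newick: str, ptt_stems: Iterable[str]) -> List[str]:
    """List leaves for which no ptt file name (extension removed) is known."""
    known = set(ptt_stems)
    return [name for name in leaf_names(newick) if name not in known]


tree = "((FN598874:0.01,B128:0.14):1.3,(CP001217:0.89,AE000511:1.08):0.19);"
names = leaf_names(tree)
missing = leaves_without_ptt(tree, ["FN598874", "AE000511", "CP001217"])
print(names, missing)
```

For trees with quoted labels or named internal nodes, a full Newick parser would be a safer choice than this regular expression.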