Analyst Protocol for SNP Identification
All transfers are done using the Macintosh computers attached to the sequencers. It is best to
analyze all data files on the same day you transfer them. This helps to catch tracking errors.
1. CAREFULLY double-check tracking on gel image. Pay close attention to tracking on the outermost
2. Using the FETCH program, log in to a workstation. A shortcut within FETCH should be available
for each workstation.
Valid workstations will usually be: 1) haver.mbt.washington.edu
3. Log in to each workstation with your username and password.
For example: username: yourname
4. Navigate to the gene_name/new_chromats directory using FETCH. This can be done by double-
clicking on the directory where you want to go within the FETCH transfer window.
For example: If you are working on the lpl gene go to the directory lpl/new_chromats. This
will usually have to be done by moving up one level from your login directory (using ".." within
FETCH) and going into the lpl folder icon and then the new_chromats folder icon.
The standard directory structure for each gene is shown below:
gene_name
|-- new_chromats
|-- edit_dir
|-- chromat_dir
|-- phd_dir
|-- poly_dir
`-- analysis_dir
    |-- edit_dir
    |-- chromat_dir
    |-- phd_dir
    `-- poly_dir
5. Drag files to be transferred into the FETCH transfer window. The transfer should begin
6. When the transfer is finished, quit the FETCH program.
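The per-gene directory layout named in step 4 can be sketched in code. This is a minimal illustration only, not part of the lab's tooling: the function name make_gene_tree is hypothetical, and the nesting (four subdirectories repeated under analysis_dir) is read off the directory names listed in the protocol.

```python
from pathlib import Path

# Directory names as listed in the protocol's standard layout.
TOP_LEVEL = ["new_chromats", "edit_dir", "chromat_dir",
             "phd_dir", "poly_dir", "analysis_dir"]
ANALYSIS = ["edit_dir", "chromat_dir", "phd_dir", "poly_dir"]


def make_gene_tree(root, gene_name):
    """Create the assumed directory tree for one gene and return its path."""
    gene_dir = Path(root) / gene_name
    for name in TOP_LEVEL:
        (gene_dir / name).mkdir(parents=True, exist_ok=True)
    for name in ANALYSIS:
        (gene_dir / "analysis_dir" / name).mkdir(exist_ok=True)
    return gene_dir
```

For the lpl example, make_gene_tree(".", "lpl") would create lpl/new_chromats, lpl/analysis_dir/edit_dir, and so on under the current directory.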
You can now log in to a workstation by using 1) an X-terminal, 2) a Macintosh with eXodus (X-terminal
emulation software), or 3) a workstation directly.
7. At this point all work is done on the workstation. Again, log in using your own personal username
8. Change your directory into the gene_name/new_chromats directory. This can be done from your
login directory by using the commands:
This program will move all the new data files you just transferred into the chromat_dir while
renaming and compressing all files.
9. Change directories into the gene_name/analysis_dir/edit_dir. This can be done by typing:
prompt>cd ../analysis_dir/edit_dir (when you are in the new_chromats directory)
**All your work must be done in this directory.**
10. To look at your current data files (from today), type:
prompt>vp -project -days 0
Note: The -days switch counts backward from the current date. This is why -days 0 = today.
This is the only command you should need to view data on the same day you transferred.
This will begin a series of programs (Phred, Phrap, PolyPhred, and Consed) which will allow you
to view your data.
Other advanced options:
You can also query data by strings within its filename. This is done using the -data switch:
prompt> vp -project -data '010' = will get all files with the '010' string (i.e. all files
from the first product, forward read).
prompt> vp -project -data '010|011' = will get all files from product 01, both forward
and reverse reads.
**This may be useful if you sequence a few samples of one product on one day and more
samples on a subsequent day.**
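The selection logic behind the -days and -data switches can be illustrated with a short sketch. It is not how vp is implemented: it assumes the -data argument behaves like a regular expression searched for anywhere in the filename (which the '010|011' example suggests), and the filenames and function names below are made up for illustration.

```python
import re
from datetime import date, timedelta


def matches_data(filename, pattern):
    """True if the pattern is found anywhere in the filename."""
    return re.search(pattern, filename) is not None


def within_days(file_date, days, today):
    """True if file_date is no more than `days` days before `today`."""
    return today - timedelta(days=days) <= file_date <= today


# Hypothetical names: sample id, then product number (01) plus a read digit.
files = ["s001_010.abi", "s001_011.abi", "s002_020.abi"]
forward_only = [f for f in files if matches_data(f, "010")]
both_reads = [f for f in files if matches_data(f, "010|011")]
```

Under these assumptions a plain substring match would also pick up a sample whose id happens to contain 010, so the pattern should be chosen with the file naming scheme in mind.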
Work in this section will be done in the program Consed. This will allow you to view the assembly of
all your chromatograms and make editing changes.
11. After running 'vp', Consed should automatically open a window asking you to open an ".ace"
assembly file. There should be only a few choices. Generally you should choose the file:
This should contain the data you just transferred.
12. A window should next come up with "contigs" in this ".ace" file that contain chromatograms for
your most recent work. One contig should be longer than all the others (equal to the length of the
gene you are sequencing) and should contain most of the chromatograms you just transferred.
Select that contig.
**If you feel your assembly is incorrect -- i.e. many contigs, or many mismatching bases within a
contig -- consult with someone else for help**.
If you have a few contigs (3-5), look at each to check the quality of the reads. If contigs have low
quality reads and don't assemble with the reference sequence contig, generally ignore them. They
are probably failed reactions and will come out in your cleanup report. If you have high quality
reads, try to join them with the reference sequence contig (see below).
13. If you have more than one contig (other than your contig containing the reference sequence) you can
try to combine the other contigs with your reference sequence contig.
A. View the contig you want to join with the contig containing the reference sequence.
B. Select a sequence near the beginning or end of your contig that has good sequence quality (i.e.
areas highlighted in white).
C. Go to the main Consed window and select "Search for String"
D. Type in (or copy) the sequence you would like to find. Select "OK"
* Note: to easily copy a sequence, swipe it with your cursor to highlight it, then go to the area
where you would like to paste it and click M2.
E. A results window will pop up, hopefully with a result from the contig number containing the
reference sequence and one from the contig you are currently using. Both results should say
"uncomplemented". If they don't you need to use the "Comp Contig" to reverse one or both of
F. Click on the first selection and it will position you in the consensus sequence of that contig.
G. Go back to the search results window and click on the next selection. Again you should be
positioned in the consensus sequence of the second contig. ** It is important to be in the
consensus sequence of each contig **
H. Select "Compare Contig" from the first contig window. A new window should pop up showing
I. Select "Compare Contig" from the second contig window. This sequence should be below and
nearly aligned to the first sequence. Click on the "Align" button.
J. Your contigs should be aligned (possibly with some mismatches shown as "X"). If you scroll
in this window most bases should align perfectly.
K. In the alignment window select "Join contigs". After a second you should have a new contig
window appear, which is numbered +1 from your highest numbered contig (e.g. if you have 2
contigs, Contig1 and Contig2, the new joined contig should be numbered Contig3, and Contig1
and Contig2 should have disappeared).
L. This process can be repeated for each individual contig until they are all aligned with the
reference sequence contig.
M. At this point verify again the orientation and consistency of the new reads placed into this contig.
**If you join any contigs, you will have to save the new assembly and quit Consed before
you move on to "SNP Identification and Genotype Verification" (below).**
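The string search in steps B through E and the numbering rule in step K can be sketched as follows. This is a simplified illustration, not Consed's own search: the function names are hypothetical, the sequences are invented, and the labels "uncomplemented" and "complemented" simply borrow the wording of the results window.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_seed(seed, consensus):
    """Locate seed in consensus; return (position, orientation) or None.

    For a complemented hit, the position is where the reverse complement
    of the seed starts in the consensus as given.
    """
    pos = consensus.find(seed)
    if pos != -1:
        return pos, "uncomplemented"
    pos = consensus.find(reverse_complement(seed))
    if pos != -1:
        return pos, "complemented"
    return None


def next_contig_number(contig_names):
    """A joined contig is numbered one above the highest existing contig."""
    return max(int(name[len("Contig"):]) for name in contig_names) + 1
```

A "complemented" hit corresponds to the case where one contig would need "Comp Contig" before the two can be compared and joined.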
YOUR FIRST OBJECTIVE IS TO CONFIRM THE VALIDITY AND CONSISTENCY
OF THE ASSEMBLY CONTAINING YOUR MOST RECENT CHROMATOGRAMS.
14. Make sure that all of your sequencing reads are in the correct orientation.
The reference sequence should be directed left to right (i.e. in the "forward" orientation): the arrow
showing the orientation of that read should point from left to right. If this is not the case, use the "Comp
Contig" button to reverse the orientation of the contig.
Compare the "primer" designation from the read name against the orientation of the reference sequence.
All "forward" reads should be oriented left to right, while all "reverse" reads should be oriented right to left (i.e. the
direction of the arrow).
15. Make sure that all of your sequencing reads are assembling at the correct position in the reference
sequence and in accordance with the respective primer.
Check "tags" within the reference sequence. This is done by scrolling in the contig window until
you see highlighted regions (about 20 bp) in the reference sequence. By using the 3rd mouse button
(M3) you will see a menu. Select "show tag details". Verify your sequencing read against that
16. If your contig passes these two checks (Steps 14 and 15) then move on to "SNP Identification
and Genotype Verification"; otherwise try to figure out why some reads may have assembled
incorrectly -- Is it due to lane tracking? Sample loading problems? Sample handling mix-ups?
**DO NOT** do any more analysis until this has been sorted out. If you remove/replace any files
due to errors you will have to rerun 'vp' (i.e. Phred, Phrap, PolyPhred).
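The orientation check in step 14 amounts to comparing each read's primer designation with the direction of its arrow. The sketch below makes that comparison explicit. It assumes a hypothetical naming scheme in which the read name ends in a code whose last digit is 0 for forward and 1 for reverse (loosely modeled on the '010' and '011' strings used with the -data switch); the real naming convention may differ.

```python
# Assumed mapping from the read-direction digit to the expected arrow.
EXPECTED_DIRECTION = {"0": "left-to-right", "1": "right-to-left"}


def primer_digit(read_name):
    """Extract the read-direction digit from a name such as 's001_010'."""
    code = read_name.rsplit("_", 1)[-1]
    return code[-1]


def misoriented_reads(assembled):
    """Return the reads whose arrow disagrees with their primer designation.

    `assembled` maps read name -> direction seen in the contig window.
    """
    return [name for name, direction in assembled.items()
            if EXPECTED_DIRECTION.get(primer_digit(name)) != direction]
```

Any read returned by misoriented_reads would be a candidate for the tracking, loading, or handling problems listed in step 16.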
SNP Identification and Genotype Verification
Now that you have verified the validity and consistency of your contigs, you will be able to go through
each column marked by PolyPhred to verify its polymorphism identification. If you did any
joining of contigs you must rerun PolyPhred. If you did a join and accepted the filename as
suggested by Consed you should have the filename shown below (gene_name.fasta.screen.ace.2).
To run PolyPhred:
prompt> polyphred -ace gene_name.fasta.screen.ace.2 > gene_name.polyphred.out
prompt> update_tags_and_ace.pl -ace gene_name.fasta.screen.ace.2
If you did a join of your contigs and/or reran PolyPhred, to do your editing you
MUST use the program consed_edit (otherwise your work will not be saved).
If you didn't do any joining you can simply work in the Consed session started by the program
"vp" (it will save your data automatically).
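As a small convenience, the two PolyPhred command lines above can be generated for any gene name. The helper below is a hypothetical sketch that only builds the command strings shown in this protocol and does not run them; the default suffix of 2 assumes a single join saved under the filename Consed suggests.

```python
def polyphred_commands(gene_name, ace_version=2):
    """Return the protocol's two command lines for the given gene."""
    ace = f"{gene_name}.fasta.screen.ace.{ace_version}"
    return [
        f"polyphred -ace {ace} > {gene_name}.polyphred.out",
        f"update_tags_and_ace.pl -ace {ace}",
    ]


# Print the commands for the lpl example so they can be copied to the prompt.
for command in polyphred_commands("lpl"):
    print(command)
```

If the assembly was saved under a different name, pass the matching suffix or edit the filename by hand.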
1. In order to view polymorphic sites tagged by Consed you need to open the contig assembly window.
Note the red, orange, and green tags which appear on the consensus sequence. These are rank tags
applied by PolyPhred (you may need to move around in the window). The tags appearing on the
aligned reads below the consensus should be either purple (homozygotes) or pink
(heterozygotes). Your job is to look at each column marked as a polymorphism and accept or reject
2. You can navigate to each tag by moving the scroll bar at the bottom of the window or using the
arrows (>, >> or <, <<) at the bottom of the window.
3. To evaluate a column, you should choose a read marked as heterozygous (pink) and another marked
as homozygous (purple) and compare them. Selecting a read is done by using mouse button 2 (M2)
and clicking on the read and position where you want to view the chromatogram.
4. When comparing chromatograms between putative homozygotes and heterozygotes, you are looking
for two characteristics: 1) a drop of 50% in the peak height of the heterozygote 2) the appearance
of a "strong" secondary peak in the heterozygote. You should usually check a minimum of two
heterozygotes to verify that a site is a "real" polymorphic site. To add chromatograms to your
window, use the M2 button to select that read. It should then appear aligned with the other
5. Another easy case is when you have a column with two homozygotes which have a different base at
the same position. One of these bases should also look red. This is a sign that it is mismatched from
the consensus sequence. You should also compare these homozygote mismatches against each
other. Ideally, it is nice to compare both types of homozygotes against a heterozygote, though this
may not always be possible, depending on the genotypes of the individuals you are viewing.
6. Once you have built up enough confidence that the site is correctly marked as a real polymorphism
by PolyPhred you have to "tag" that column. You only need to tag one of the reads in the column.
Select any of the reads at the polymorphic site using M2. This should bring up the chromatogram.
At the site of the polymorphism (i.e. on the purple or pink tag) click M2 again. This will bring up a
menu for applying a tag. Choose "Add Tag". A window with valid tags will be shown. Choose
"realPolymorphism". This tag will automatically be applied and should now appear dark purple at
the site where the heterozygote (pink) or homozygote (purple) once appeared. Remember you only
need to mark one read at a polymorphic site with the "realPolymorphism" tag.
7. If you have looked at a site and tried to verify it by doing multiple comparisons and still can't accept
or reject it you can put a "comment" tag on a read. This is done the same way as applying the
"realPolymorphism" tag except that you will be presented with a dialogue box in which to write a
comment. This comment can be reviewed by someone else at a later date.
8. Once you have verified a site as real and applied the "realPolymorphism" tag, you need to check
every read to verify the genotype which was applied. One easy shortcut for doing this is to bring up
all traces in a particular column and just run down that site in all individuals.
9. Using M3, click on the consensus sequence at the position you are interested in (it should have a
red, green, or orange tag). A menu will pop up; select the "Display traces for all reads" item.
All the chromatograms should be displayed and you can scroll down looking at each one.
10. If you see any sites where you think a heterozygote was called homozygote (or vice versa) you can
edit the genotype tag for that read. This is done the same way as applying a "realPolymorphism" or
comment tag. That is, click with M2 on the tagged sequence and bring up the tags menu. Select
"Add Tag". This time, however, select the appropriate genotype tag. You should see selections for
all combinations of homozygote and heterozygote genotype tags (e.g. homozygoteAA,
homozygoteTT, heterozygoteAC, heterozygoteAG). Select the correct tag for the genotype
you would like to correct. It should then appear on the chromatogram as a dark purple tag.
11. Once you are finished checking all sites you can quit Consed (in the main window). You will be
prompted to save your file (if you made any changes) -- choose "Save before quitting". Accept the
default filename Consed gives to your assembly.
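The two visual criteria from step 4 (a roughly 50% drop in the primary peak and a strong secondary peak) and the genotype tag names from step 10 can be expressed as a rough numerical sketch. This is not PolyPhred's algorithm. The thresholds are illustrative assumptions, the peak heights are assumed to be already normalized so that reads are comparable, and the alphabetical ordering of alleles in heterozygote tag names is inferred from the examples heterozygoteAC and heterozygoteAG.

```python
def looks_heterozygous(hom_peak, het_primary, het_secondary,
                       max_primary_ratio=0.6, min_secondary_ratio=0.3):
    """Apply the two visual criteria to peak heights at one site.

    hom_peak: peak height in a read called homozygous.
    het_primary, het_secondary: primary and secondary peak heights in
    the candidate heterozygous read. Thresholds are assumptions.
    """
    if hom_peak <= 0:
        raise ValueError("hom_peak must be positive")
    dropped = het_primary / hom_peak <= max_primary_ratio
    strong_secondary = het_secondary / hom_peak >= min_secondary_ratio
    return dropped and strong_secondary


def genotype_tag(allele1, allele2):
    """Build a tag name in the style of homozygoteAA or heterozygoteAC."""
    a, b = sorted((allele1.upper(), allele2.upper()))
    kind = "homozygote" if a == b else "heterozygote"
    return f"{kind}{a}{b}"
```

A numerical check like this can only flag candidates; the call itself still rests on comparing the chromatograms by eye as described above.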