Sequence comparison
By comparing genetic sequences, we can tell how they are related.
While there are fancy programs (such as BLAST and ALIGN) to make these
comparisons, even a very simple comparison will reveal a lot.
An easy way to compare two sequences is to make a big grid with one
DNA sequence along the X axis and the other along the Y axis. Wherever
the corresponding DNA bases from the two sequences match, you put a dot.
If the sequences are nearly the same, the matching bases will form a
diagonal line. The closer the match, the stronger the line.
There will also be random dots where bases happen to match by coincidence.
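The grid described above is usually called a dot plot. Below is a minimal C sketch of it, not the program used for this page. The function names dot_plot and print_plot are hypothetical, and the sketch assumes both sequences are plain strings of uppercase bases (A, C, G, T) with no header lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simple dot plot: x runs along the X axis (columns), y along the Y axis (rows).
   plot must hold strlen(x) * strlen(y) chars; cell (i, j) is stored at
   plot[j * nx + i] and is '*' when x[i] == y[j], '.' otherwise. */
void dot_plot(const char *x, const char *y, char *plot)
{
    assert(x != NULL && y != NULL && plot != NULL);
    size_t nx = strlen(x), ny = strlen(y);
    for (size_t j = 0; j < ny; j++)
        for (size_t i = 0; i < nx; i++)
            plot[j * nx + i] = (x[i] == y[j]) ? '*' : '.';
}

/* Print the plot one row (one base of y) per line. */
void print_plot(const char *plot, size_t nx, size_t ny)
{
    for (size_t j = 0; j < ny; j++) {
        fwrite(plot + j * nx, 1, nx, stdout);
        putchar('\n');
    }
}
```

For example, dot_plot("ACGT", "ACGT", plot) with a 16-character buffer puts '*' only on the main diagonal, since each base appears exactly once in each sequence.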
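If you mark every match, about 1 in 4 positions will be a random
match (assuming the four bases are about equally common). If you only
count spots where, say, 5 bases in a row match, the chance of a random
match drops to about 1 in 4^5 = 1024, so there will be far fewer random dots.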
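The arithmetic behind these figures can be checked with a short C sketch. It assumes, as the 1-in-4 figure does, that the four bases are equally common and occur independently; real sequences are not that uniform, so the results are only a rough guide. The function names are hypothetical.

```c
#include <assert.h>

/* Chance that one cell is a random match of w consecutive bases, ASSUMING
   the four bases are equally common and independent: (1/4)^w. */
double random_match_prob(int w)
{
    assert(w >= 1);
    double p = 1.0;
    for (int k = 0; k < w; k++)
        p *= 0.25;
    return p;
}

/* Expected number of chance dots when an nx-base sequence is plotted against
   an ny-base sequence, counting only runs of w matching bases. A run of w
   bases can start at nx - w + 1 positions in x and ny - w + 1 positions in y. */
double expected_random_dots(long nx, long ny, int w)
{
    assert(w >= 1 && nx >= w && ny >= w);
    return (double)(nx - w + 1) * (double)(ny - w + 1) * random_match_prob(w);
}
```

For two hypothetical 10,000-base sequences, expected_random_dots(10000, 10000, 1) gives 25,000,000 chance dots, while expected_random_dots(10000, 10000, 5) gives about 97,600, roughly 256 times fewer.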
I've written a short C program to do this matching.
It puts a dot wherever 5 bases in a row match. This page shows the
results of various comparisons.
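A 5-in-a-row filter like the one described above could be sketched in C as follows. This is a hypothetical reconstruction, not the author's program: window_dot_plot is an invented name, the window length w is a parameter (5 in the text), and a dot is placed at the first base of each matching run of w bases. With w = 1 it reduces to the plain dot plot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical windowed dot plot. plot must hold strlen(x) * strlen(y) chars.
   plot[j * nx + i] is '*' when the w bases starting at x[i] equal the w bases
   starting at y[j], and '.' otherwise (including cells where a full window
   does not fit). Returns the number of dots placed. */
size_t window_dot_plot(const char *x, const char *y, size_t w, char *plot)
{
    assert(w >= 1);
    size_t nx = strlen(x), ny = strlen(y), dots = 0;
    for (size_t j = 0; j < ny; j++) {
        for (size_t i = 0; i < nx; i++) {
            /* Compare the whole window only if it fits in both sequences. */
            int hit = i + w <= nx && j + w <= ny
                      && memcmp(x + i, y + j, w) == 0;
            plot[j * nx + i] = hit ? '*' : '.';
            dots += hit;
        }
    }
    return dots;
}
```

Comparing every cell this way takes time proportional to nx * ny * w. Tracking the length of the current run along each diagonal would avoid re-comparing the same bases, at the cost of a little more bookkeeping.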
- HIV-1 compared with itself. There is a strong
diagonal line where the sequence matches itself. There are also lots of
random dots.
- HIV-1 compared with HIV-2. Note that the
diagonal line is weaker, indicating that there are many
differences between HIV-1 and HIV-2. Since the line is still visible,
though, they are clearly related.
- HIV-1 compared with SIV. The plot shows that the
two viruses are related.
- HIV-1 compared with visna. There is a
very faint diagonal line visible in places, indicating there is some
relationship between the viruses. Note, however, that HIV-1 is much
closer to SIV than to visna.
- HIV-1 compared with BLV. These viruses
are far enough apart that my program can't detect any similarity.
- HIV-1 compared with HTLV-1. These viruses
are also too far apart for my program.
- A BLV/visna splice compared with visna.
For this image, I simulated a splice of BLV and visna by merging
together parts of the two sequence files.
Note from the image that the splice is plainly visible. Comparing this
image with the HIV-1 to visna comparison above makes it clear that
HIV-1 was not made by splicing visna with anything.
- HIV-2 compared with SIV.
Note that HIV-2 is
even closer to SIV than HIV-1 is.
(The exact sequences used were HIV-1=HIVBRUCG, HIV-2=HIVV2RODX, BLV=BLVCG,
SIV=SIVAGMTYO, visna=VLVCG, HTLV-1=HTVPRCAR.)
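The simulated splice can be reproduced in spirit with a short C helper. The text does not say exactly which parts of the BLV and visna files were merged, so the cut points here are hypothetical parameters, as is the name splice: the result is the first na bases of one sequence followed by the second sequence from position from_b onward.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical splice: the first na bases of a, followed by the bases of b
   from position from_b to the end. Returns a newly allocated string that the
   caller must free, or NULL if allocation fails. */
char *splice(const char *a, size_t na, const char *b, size_t from_b)
{
    assert(na <= strlen(a) && from_b <= strlen(b));
    size_t nb = strlen(b) - from_b;
    char *out = malloc(na + nb + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, a, na);              /* leading part from a */
    memcpy(out + na, b + from_b, nb); /* trailing part from b */
    out[na + nb] = '\0';
    return out;
}
```

Plotting such a spliced sequence against the original b would show an unbroken diagonal segment along the part copied from b, since that part matches the original exactly.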
Conclusions
From these comparisons, several things are clear. First, and most importantly,
HIV-1 is much closer to SIV than to visna, HTLV-1, or BLV. This
illustrates that HIV came from SIV (or they both came from some
closely related virus). Second, HIV does not contain long stretches that
closely match visna, HTLV-1, or BLV. This shows that HIV was not
formed by splicing together parts of these viruses.
Ken Shirriff:
shirriff@eng.sun.com
This page:
http://www.righto.com/theories/seq_comp.html
Copyright 2000 Ken Shirriff.