Wednesday, August 29, 2007

Assembly Tutorial Part I - Hello World!

Assembly Tutorial 1


Hello World!

This set of tutorials assumes that you have access to some version of DOS or Windows that includes the program debug.exe. We'll start with elementary assembly that leans on operating system services, then move on to file access, hardware manipulation and computer science theory. The ubiquitous and compulsory "Hello World!" example is a good, quick way to dive right into assembly coding with "instant gratification". First, the code itself:


The Example


mov dx, _message
mov ah, 09h
int 21h
_message db "Hello World!$"

There we have a program that prints the sentence "Hello World!" to the screen. A few things to note about this program:



a) it may run properly and spit the text to the screen, but will most likely bomb immediately afterward

b) the "$" doesn't actually get printed-- it's what's called a string terminator

c) we're actually piggybacking on existing library code in DOS through what is referred to as int 21h

d) we have our first foray into pointers with this code

The Process


With all that having been said, let's move along full steam with our "instant gratification" portion. Don't worry as we'll gloss over many of the details under the assumption that you will continue reading this set of tutorials and will eventually acclimate by repetition, reinforcement and redundancy. On to debug.exe!


At the command prompt, type debug and hit enter. At the - prompt, type 'a' and hit enter. You should now see:



C:\>debug
-a
135C:0100

I'll spare you the full explanation this time around since we're not primarily concerned with debug.exe itself. Just take it as given that this is what we see when we open up debug to create a program. We'll come back to the specifics at a later time, I promise! For now, we can ignore the 4 hex digits before the colon on each line, since they will usually be different for each machine and each time we run debug.


At this point, we're ready to begin porting our code from the top of the page:


C:\>debug
-a
135C:0100 mov dx,0109
135C:0103 mov ah,09
135C:0105 int 21
135C:0107 int 20
135C:0109
-

On the last line (XXXX:0109; we're ignoring the first 4 digits, remember?) we'll just hit enter to skip past it. For now, we'll also ignore the int 20 and the fact that this looks different from the code at the top of the page. You'll understand in due time; let's just get a working program!


Now we'll input the string to print. Let's start with the (now-familiar) 'a' at the prompt, this time with an address afterward:


-a 0109
135C:0109 db "Hello World!$"
135C:0116
-
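As an optional aside for readers who want to peek ahead: the addresses in the session follow from the size of each instruction. The sketch below writes out the bytes now sitting in memory, using NASM syntax rather than debug.exe. NASM and the file name are assumptions on my part (the tutorial itself only uses debug); the opcode bytes are the standard 8086 encodings.

```nasm
; Byte-for-byte sketch of what debug has placed at offsets 0100h-0115h.
; Assumes NASM; build with: nasm -f bin hello_bytes.asm -o hello_bytes.com
; (hello_bytes is a hypothetical file name.)
        org 100h                ; .com programs are loaded at offset 0100h

        db 0BAh, 09h, 01h       ; 0100: mov dx,0109  (3 bytes, operand stored low byte first)
        db 0B4h, 09h            ; 0103: mov ah,09    (2 bytes)
        db 0CDh, 21h            ; 0105: int 21       (2 bytes)
        db 0CDh, 20h            ; 0107: int 20       (2 bytes)
        db "Hello World!$"      ; 0109: 13 bytes of text, so the next free offset is 0116
```

The nine bytes of code (3 + 2 + 2 + 2) are why the string starts at 0109, and why we could type that address into the mov dx line before the string existed.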

We've almost got a fully working program. Actually, allow me to rephrase that: we have a fully working program, it's just that it only exists in memory and not on the disk. We'll use a few debug.exe commands to actually save this program to disk and give it a test-drive!



-r cx
CX 0000
:0016
-n hello.com
-w
Writing 00016 bytes
-
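The 0016 typed into CX is the program's length in hexadecimal. The w command writes the number of bytes held in BX:CX (assuming BX is still 0000, as it is in a fresh debug session), starting from offset 0100, and our program ends just before 0116. With a standalone assembler the same figure can be computed rather than counted. A minimal sketch, assuming NASM; the label names are illustrative:

```nasm
; Sketch: let the assembler work out the length that was typed into CX by hand.
        org 100h

start:
        mov dx, message
        mov ah, 09h
        int 21h
        int 20h

message db "Hello World!$"

program_length equ $ - start    ; 0116h - 0100h = 16h, i.e. 22 bytes
```

The equ line only defines a constant, so it adds nothing to the assembled file.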

All right, now we have a fully working program saved to disk and ready to go! Let's give it a test-drive and then, if you're at all interested, we'll go into some more detail about what makes this process tick!


-q

C:\>hello
Hello World!
C:\>

The Explanation


Success! We've covered some very basic assembly here, and I believe we'll want to familiarize ourselves a little further with some key concepts before continuing. Let's return to that list of notes from earlier in the document and address those briefly.


a) the example program may run properly but will most likely bomb immediately afterward if it runs at all

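  • This is because the example code was missing its end line, so the CPU carries on past int 21h and tries to execute the bytes of the string as instructions. We corrected this in our debug version (and thus our saved program) by adding the 'int 20' line, which hands control back to DOS. More on this later.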

b) the "$" doesn't actually get printed-- it's a terminator
  • The DOS print function we're calling (AH=09) expects its string to end with a "$" character; this is a quirk of that function, not of every string in 16-bit DOS. The "$" is ASCII code 24h (or 36 in decimal). You'll definitely want to remember that for future reference; but, don't worry, we'll touch on this several times in the future. DOS inherited the convention from the older CP/M operating system, and it exists so the operating system knows where the text ends and can stop printing. Don't confuse it with the "$" on string variable names in BASIC, which marks the variable's type rather than the end of a string.

c) we're actually piggybacking on an existing library by running the int 21h functions
  • That's right. We're using Microsoft's API for 16-bit DOS. We start by loading values into the registers and then call interrupt 21h to make it work. With DOS and int 21h, the AH register (the high byte of AX) selects the function to run. In this case, we're using AH=09, which tells DOS to print text to the screen. More later on why it's called AX sometimes and AH others.

d) we have our first foray into pointers with this example
  • Yes, as scary as it sounds, we've actually just jumped right into programming using pointers in our first example! Like the previous entry said, we use registers to select our function before "invoking" the DOS API with int 21h. In this example the DX register is a pointer to the start of the string we want printed. Start at DX=0109 and stop when you get to the "$". That's it!
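
To see the pointer and the terminator working together, here is a small variation, sketched in NASM syntax rather than typed into debug. NASM and the label names are assumptions, not part of the session above. The same function 09h is called twice, and only the address in DX changes:

```nasm
; Sketch: one DOS function, two strings, selected purely by the pointer in DX.
        org 100h

        mov dx, hello           ; DX = offset of the first string
        mov ah, 09h             ; AH = 09h selects "print string"
        int 21h                 ; prints up to, but not including, the "$"

        mov dx, world           ; point DX at a different string
        mov ah, 09h             ; reload AH rather than rely on DOS preserving it
        int 21h

        int 20h                 ; hand control back to DOS

hello   db "Hello $"
world   db "World!$"
```

Run under DOS, this should put the same "Hello World!" on the screen as before, but in two pieces: each call starts wherever DX points and stops at the first "$" it meets.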



In Summary


That's enough for our first tutorial. We still have some ground to cover before this example is even fully explained, but I hope I've whetted your appetite for knowledge with this article. We'll home in a little more on the 'why's and 'wherefore's of debug.exe, 16-bit DOS and assembly, as well as some CPU and other hardware-related facets. Maybe I'll even be nice and throw in some crypto code and theory. Stay tuned!


