18-348 Lab #4

Spring 2015

NOTE: Lab 4 consists of two components (Lab 4 Part A and Lab 4 Part B).

Relevant lectures:
- Part A: Lecture 6. Embedded Language Use
- Part B: Lecture 7. Coding Tricks; Multiprecision Math; Reviews

NOTE for Lab Part A: Recall from lecture that neither the one's complement nor the two's complement checksum can detect all two-bit errors. Whether a checksum detects a given error depends on BOTH the data values (string) and which bits are flipped. For example, try the string "ECE348" to see the difference in detection capability between one's and two's complement checksums.

NOTE: The HC12 compiler manual can be confusing with regard to calling conventions. Functions with a fixed number of parameters use the Pascal calling convention, in which parameters are pushed left to right and the caller removes them from the stack. The C calling convention (which the documentation notes pushes parameters right to left) is only used for functions with a variable number of parameters, which we don't use in this lab. Our compiler pushes parameters from left to right. When in doubt, trust what the compiler does, not what the compiler manual says it does.

Links to all files referenced in the lab and prelab can be found in the Files section at the end of this document. You might wish to read the lab assignment before starting work on the Pre-lab to help with understanding how to link C to assembly language.

Pre-Lab 4 - Part A:

Goal:

To learn programming patterns for embedded systems and interactions between C and assembly programming.

Discussion:

Bitwise operations in C:

In this lab, you will practice using C language constructs and operators to do bitwise operations. The most common language constructs used to implement bitwise operations are:

Operator (in context)	Usage	Description
&	a & b	bitwise AND two values
|	a | b	bitwise OR two values
~	~a	bitwise invert a value
^	a ^ b	bitwise XOR two values
<<	b << n	shift the bits in b to the left by n bits (the upper n bits of b are lost).
>>	b >> n	shift the bits in b to the right by n bits (the lower n bits of b are lost). The shift is an arithmetic shift right for signed integers, but it is a logical shift right for unsigned integers.
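
The shift rows above can be illustrated with a minimal sketch. The helper names (shl8, lsr8, asr8) are hypothetical and are not part of the lab skeleton. One caveat: standard C leaves the result of right-shifting a negative signed value implementation-defined, so the arithmetic behavior described in the table is an assumption about the compiler in use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift left within 8 bits; the upper n bits are lost. */
uint8_t shl8(uint8_t b, unsigned n)
{
    assert(n < 8);
    return (uint8_t)(b << n);
}

/* Hypothetical helper: logical shift right; zeros enter from the left. */
uint8_t lsr8(uint8_t b, unsigned n)
{
    assert(n < 8);
    return (uint8_t)(b >> n);
}

/* Hypothetical helper: shift right on a signed value. For negative inputs
 * the result is implementation-defined in standard C; compilers that shift
 * arithmetically copy the sign bit in from the left. */
int8_t asr8(int8_t b, unsigned n)
{
    assert(n < 8);
    return (int8_t)(b >> n);
}
```

For example, shl8(0xF1, 4) discards the upper nibble and lsr8(0xF1, 4) discards the lower nibble.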

The bitwise AND, OR, and INVERT (&, |, ~) should be distinguished from the logical AND, OR, and NOT (&&, ||, !). The logical operations produce boolean results where any zero value is treated as FALSE and any non-zero value is treated as TRUE. The output of a logical operation in a standards-conforming C program is either zero or one. The following examples illustrate this:

a = 0xA0;
b = 0x05;
c = a | b;     //bitwise operation: c = 0xA5

d = a || b;    //logical operation: d = 1
e = a & (~b);  //bitwise operation: e = 0xA0
f = a && (!b); //logical operation: f = 0

In general, it is considered bad practice to mix logical and bitwise operations in the same C statement or same line of code because of the potential for confusion.

These operators can be combined to set and clear specific bits in a value. For example:

Statement	Function
b = b & 0xF0;	clear the lower four bits of b (AND with zero is always zero)
b = b | 0x0F;	set the lower four bits of b (OR with one is always one)
b = b & (~a);	for each bit set to a 1 in a, clear that bit in b (INVERT ones to zeros and then AND); leave other bits in b as they are.
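
The mask patterns in the table can be generalized to a single bit position by building the mask with a shift. The sketch below is one possible way to do this; the names set_bit, clear_bit, and toggle_bit are hypothetical and may not match the names or signatures in the lab skeleton.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: set bit n (OR with a one in that position). */
uint8_t set_bit(uint8_t value, unsigned n)
{
    assert(n < 8);
    return (uint8_t)(value | (1u << n));
}

/* Hypothetical helper: clear bit n (INVERT the mask, then AND). */
uint8_t clear_bit(uint8_t value, unsigned n)
{
    assert(n < 8);
    return (uint8_t)(value & ~(1u << n));
}

/* Hypothetical helper: flip bit n (XOR with a one in that position). */
uint8_t toggle_bit(uint8_t value, unsigned n)
{
    assert(n < 8);
    return (uint8_t)(value ^ (1u << n));
}
```

In ASCII, upper-case and lower-case letters differ only in bit 5, so toggle_bit('D', 5) yields 'd'.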

Writing C call stack compatible assembly subroutines

To correctly write an assembly subroutine that interfaces with C code, you must consider each of the following aspects of the call. The list below refers to the example function "unsigned int foo(unsigned int bar, unsigned char bar2, unsigned int bar3)". For more information, refer to "Call Protocol and Calling Conventions" on page 526 of the HC12 Compiler Reference Manual.

Location of parameters - The HC12 compiler pushes parameters onto the stack from left to right. So "bar" is the first parameter, then "bar2". The final parameter is passed in a register. In this case, the D register is used because the value is 16 bits. The specific register(s) used depends on the size of the parameter.
Return values - For return values of size four bytes or fewer, the value is returned using registers. The specific registers used depend on the return value size. In the example, the return value would be expected in the D register since it is a 16-bit value. For values larger than four bytes, a memory address pointer is used.
Register values - In general, save all register values and restore them so that you don't disrupt computations in the calling routine, which might be using the registers to hold intermediate computation values. The exception to this rule is the register used to pass the final parameter and the register used to pass the return value.

Procedure:

Part 1:

Start a new C project using the HC(S)12 Project Wizard. Be sure to include the Full-Chip Simulator as a Target. Replace the main.c file with prelab_4a_prog1.skeleton.c.
Replace the string value in data with the appropriate string for your lab section and group.
Implement the setBit and clearBit functions. Use them to invert the case of the string (e.g. "DemO" becomes "dEMo").
Implement the countBits function.
Run your program in the simulator. Use the memory window to locate data where it has been placed on the stack. Observe that the program correctly inverts the case.
Record the data required to answer question 1 below.
Hand in program as prelab_4a_prog1_XX.c where "XX" is your Andrew ID.

Part 2:

Download the lab_4a_c_asm.zip file. Extract the project and open it in CodeWarrior. Append your Andrew ID to the three prelab_4a_prog2* filenames. Be sure to edit the #include in prelab_4a_prog2.c to point to the new header filename.
Add the following function prototypes to the prelab_4a_prog2_asm.h file.
- uint16 bitReverse(uint16 value);
- uint16 addALot(uint16 val1, uint16 val2, uint16 val3, uint16 val4, uint16 val5);
Open prelab_4a_prog2.c. Modify the calls to bitReverse and addALot to the appropriate value for your lab group.
Open prelab_4a_prog2.asm.
- Implement the function bitReverse in assembly. This function takes the passed parameter "value" and reverses the bits. The value should be returned according to HC12 compiler convention. In order to receive full credit, your implementation MUST use the carry bit to transfer bit values between registers. (Hint: Look at section 5.14 in the CPU12 reference manual).
- Implement the function addALot in assembly. The function should add the 5 unsigned integer parameters and return the result according to the HC12 compiler convention. For this function, overflow is allowed (i.e. ignore the carry bit in your addition functions).
- Run the simulator on your code. Use the memory area to observe the stack push operations and verify the functionality of your implementation of bitReverse and addALot.
- Record the values for question 2 and 3 below.
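
When checking the simulator results, it can help to have a reference model in C to compare against. The sketch below is one such model, not the required assembly implementation: it does not use the carry bit, and the names ref_bit_reverse and ref_add_a_lot are hypothetical. It assumes uint16 in the lab header is a 16-bit unsigned type.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model: reverse the 16 bits of value (bit 0 becomes bit 15). */
uint16_t ref_bit_reverse(uint16_t value)
{
    uint16_t out = 0;
    for (int i = 0; i < 16; i++) {
        out = (uint16_t)((out << 1) | (value & 1u)); /* move lowest bit in */
        value >>= 1;
    }
    return out;
}

/* Reference model: 16-bit sum of five values; overflow wraps around. */
uint16_t ref_add_a_lot(uint16_t v1, uint16_t v2, uint16_t v3,
                       uint16_t v4, uint16_t v5)
{
    uint32_t sum = (uint32_t)v1 + v2 + v3 + v4 + v5;
    return (uint16_t)sum; /* keep only the low 16 bits */
}

/* Small self-check showing the intended use. */
void ref_model_self_check(void)
{
    assert(ref_bit_reverse(0x0001) == 0x8000);
    assert(ref_add_a_lot(0xFFFF, 1, 0, 0, 0) == 0x0000);
}
```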

Part A - Questions:

Enter the information observed from Part 1 (prelab_4a_prog1_XX.c) in the table below:

Parameter	Value
Original String in data
Bit count (number of "1" bits)
Address in memory where byte 0 of data is located.
Address in memory where byte 4 of data is located.
Hexadecimal representation of the original values in data (before case conversion)
Hexadecimal representation of data after case conversion

Draw the stack frame for the call to addALot just after the subroutine call is made (just after the BSR or JSR completes execution). Indicate what each value represents (e.g. "val1 low byte, val1 high byte" etc). Indicate the location of any function parameters not located on the stack.
Enter the information observed from Part 2 in the table below:

Parameter	Value
ASCII characters for call to bit reverse
Hexadecimal value of result from reverse value
Hexadecimal value of the result from call to addALot
Bonus: Write a bit count subroutine in assembly language that uses a loop (not a lookup table) to count the bits in an 8-bit integer register. It must use a loop to do the actual counting. In this loop, you are only allowed to use the following three instructions: Branch if Not Equal to Zero (BNE), Add with carry, and Logical Shift Right (but not necessarily in that order). You can use whichever variants of these instructions makes sense depending on the registers you use (e.g., LSR, LSRA, LSRB, LSRD are all acceptable Logical Shift Right variants). Other instructions can be used before and after the loop, but only these three may be used within the loop that counts bits. When the loop terminates, some register has the number of bits. Write a test program in C to confirm that the subroutine works properly. You will only receive credit for meeting all the requirements of this question -- no partial credit. If you use an instruction other than the three specified within the loop -- then no credit for you!

Pre-Lab 4 - Part B:

Goals:

To learn about integer operations and multiprecision arithmetic
To perform simple peer reviews

Discussion:

Refer to the lecture notes for information on integer division.

Procedure:

Part 1:

Complete the table below. The second column is the binary representation of the number in the first column. The third column is the number in the first column divided by 2 using integer division. The fourth column is the binary representation of the numbers in the third column. Some values have been filled in as examples.

Signed 4-bit integer i (decimal representation)	Signed 4-bit integer i (binary representation)	i/2, integer division with remainder discarded (decimal representation)	i/2 (binary representation)
7	0111	3	0011
6
5
4
3
2
1
0	0000
-1	1111	0	0000
-2
-3
-4
-5
-6
-7
-8	1000	-4	1100
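
One way to cross-check a completed table is a short C sketch like the one below. The helper names are hypothetical. It relies on the fact that C integer division (C99 and later) truncates toward zero, which matches the filled-in rows above (for example, -1/2 gives 0).

```c
#include <assert.h>

/* Hypothetical helper: write the 4-bit two's complement pattern of v
 * (range -8..7) into out as a string such as "0111". */
void to_bin4(int v, char out[5])
{
    assert(v >= -8 && v <= 7);
    unsigned bits = (unsigned)v & 0xFu;       /* keep the low four bits */
    for (int i = 0; i < 4; i++) {
        out[i] = (bits & (0x8u >> i)) ? '1' : '0';
    }
    out[4] = '\0';
}

/* Hypothetical helper: integer division by two, remainder discarded. */
int div2_truncating(int v)
{
    return v / 2;
}
```

Calling to_bin4 on both i and div2_truncating(i) for each row gives the binary columns.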

Part 2:

Create a new assembly project using the project stationery. Download the prelab_4b_skeleton.asm file and rename it prelab_4b_gXX_andrewid.asm. Replace the main file with prelab_4b_gXX_andrewid.asm.
The main section of the code is marked off with comments that tell you not to modify the code. DO NOT MODIFY THIS CODE. Any modification of the code in this section will result in no credit being given for this part of the assignment. You are allowed to modify the code in the divByTwo subroutine.
Use the values in the table to develop a short description of the algorithm needed to implement a 16-bit signed divide by 2. (Note that you should use shift and bit set / test instructions; no variation of the DIV instruction is allowed.) Your description should be less than 100 words.
Implement the divByTwo subroutine using the algorithm you described above.
Set the target for the project to the full-chip simulator. Run the simulator.
Step through the code to get a feel for what it is doing. The code runs the divByTwo subroutine, and does the same operation using the IDIVS instruction, then compares the results.
In the simulator, set a breakpoint at BP1 and BP2. Run the simulation. If the simulation reaches the BP2 breakpoint without reaching the BP1 breakpoint, then your divByTwo performs the expected function for all values between 0x0000 and 0xFFFF.

Part 3:

Write, but DO NOT SIMULATE and DO NOT DEBUG code for this section. We EXPECT that your hand-in will have bugs, and do NOT expect that it will run properly! But it should be syntactically correct so that it will "make" and so that all the parts have something that is "close" to right. (We expect you will make a good faith attempt to actually write a reasonable program, and not submit something laughable.) Just to be clear -- your grade will be based on whether the code looks like an OK first draft with bugs, is commented properly, and so on, but NOT on whether it is bug-free! There might be at most one person in the class who can write bug-free code without executing something, but probably not. So don't worry about it.

Each partner will implement one distinct version of 32-bit x 32-bit -> 64-bit unsigned multiply. You should choose from the three methods described in the lecture materials:

Shift-and-add
Partial sums using built-in MUL functions
Partial sums using table lookups for multiply.
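
To make the first two methods concrete, here is a minimal C sketch of each. It is a reference model under stated assumptions, not the lecture's or the skeleton's code: the function names are hypothetical, it uses a native 64-bit type rather than 8-bit or 16-bit steps, and it assumes the built-in multiply produces a 32-bit result from two 16-bit operands. The table-lookup method is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add: for each set bit of b, add a correspondingly shifted a. */
uint64_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint64_t product = 0;
    uint64_t addend = a;          /* a shifted left by the current bit index */
    while (b != 0) {
        if (b & 1u) {
            product += addend;
        }
        addend <<= 1;
        b >>= 1;
    }
    return product;
}

/* Partial sums: split each operand into 16-bit halves, form four
 * 16 x 16 -> 32 bit products, and add them at the right offsets. */
uint64_t mul_partial_products(uint32_t a, uint32_t b)
{
    uint32_t a_hi = a >> 16, a_lo = a & 0xFFFFu;
    uint32_t b_hi = b >> 16, b_lo = b & 0xFFFFu;

    uint64_t lo_lo = (uint64_t)a_lo * b_lo;
    uint64_t lo_hi = (uint64_t)a_lo * b_hi;
    uint64_t hi_lo = (uint64_t)a_hi * b_lo;
    uint64_t hi_hi = (uint64_t)a_hi * b_hi;

    return lo_lo + ((lo_hi + hi_lo) << 16) + (hi_hi << 32);
}

/* Small self-check: both methods should agree with each other. */
void mul_self_check(void)
{
    assert(mul_shift_add(3, 7) == 21);
    assert(mul_partial_products(3, 7) == 21);
}
```

A model like this can supply expected 64-bit results for hand-chosen test operands during the review.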

Note: You will end up doing all three implementations in the lab, but you are only required to do two (one for each partner) for the pre-lab. This means each partner must do a different implementation. Do this work independently from your partner. You will use this code to conduct reviews in the Lab section.

DO NOT simulate, run, or use a debugger on the code you have written before doing the review. We want you to do the review first so you can try to find bugs in the implementation. In most cases this saves a lot of time compared to debugging using the simulator. So, put another way, we expect you will have bugs in your code going into the review. Finding them more efficiently with help from your partner is the point of the exercise. In previous years, it was common for these reviews to turn up 3 to 5 defects with only a few minutes of effort. For future projects it will be OK to do a preliminary simulate and debug session before review -- but this project is simple enough that we want to make sure you have bugs to find in the review, so do the review before debugging.

Plan on your board being wired as described in Part A of Lab 4.
Create a new assembly project using the project stationery. Download the lab_4b_skeleton.asm file and rename it lab_4b_gXX_andrewid.asm. Replace the main file in the project with this file.
Implement one of the above mul64 subroutines to meet the following requirements:

The subroutine shall multiply the 32-bit data stored in arg1 by the 32-bit data stored in arg2 and store the 64-bit product in result.

Additionally,

The subroutine shall preserve all register values and restore them before returning control to the main loop.
The subroutine may use 8-bit or 16-bit operations to implement the 64-bit operation.
We recommend that you use the IDX2 addressing mode to read and write the arguments and results, but any method you choose is acceptable so long as it uses a loop.
For the demo, the main program shall use the pushbuttons to display the results, as described in the code comments. But you are NOT required to implement this for the prelab.

Part B - Questions:

Complete the table from part 1 and hand it in.
Include a description of your signed divide-by-two algorithm (from part 2). Limit your description to 100 words.
Bonus: Give the number of instruction cycles for the IDIVS instruction. Give the number of instruction cycles for the longest path through your divByTwo subroutine (include all cycles from just before the BSR to just after the RTS). Which is faster and by how much?

Prelab Hand-in Checklist: (90 + 18 points)

All non-code submissions shall be in a single PDF document.

Part A

(15 pts) Submit your code listing for Part 1 as prelab_4a_prog1_XX.c where XX is your Andrew ID. Submit only the C file. Code must be fully commented to receive full credit.
(15 pts) Submit the entire project for Part 2. Your project should be in a folder/subdirectory of your hand-in directory. Name the folder "prelab_4a_asm_c_XX" where XX is your Andrew ID. All files necessary to open the project in CodeWarrior and invoke the simulator must be present to receive full credit. Code must be fully commented to receive full credit.
(15 pts) Submit the answers to questions 1-3 above.
(BONUS 9 points) Submit code for the bonus question. Submit the entire project ready to open and build/execute with the CodeWarrior simulator. Name the folder "prelab_4a_bonus_XX" where XX is your Andrew ID.

Part B

(20 pts) Answer questions 1 & 2 above.
(BONUS 9 points) Answer the bonus question.
(15 pts) Submit the prelab_4b_gXX_andrewid.asm file. Code must be fully commented for full credit. Code should work properly and be bug-free.
(10 pts) Submit the lab_4b_gXX_andrewid.asm file. Code must be fully commented for full credit. Code is EXPECTED to have bugs and you will not be penalized for them.

Refer to the LAB FAQ for more information on lab hand-in procedures and file type requirements. You MUST follow these procedures or we will not accept your submissions.

Lab 4 - Part A

Goal:

To practice combining C code with assembly using the HC12 compiler.

Discussion:

Mixing C and Assembly with the HC12 Compiler

This section discusses the techniques for mixing C and assembly. Remember that the stack frame for a subroutine call represents a contract between the calling code and the subroutine code. In this case, the compiler has a specific format for the stack frame. In order to write assembly code that is compatible with C, you must make sure that your code conforms to this format.

There are numerous compiler options and PRAGMA options that can be used to modify the behavior of the compiler with respect to call stacks. A full discussion is beyond the scope of this course. The discussion below and the lab assignments refer to the compiler behavior using the default settings.

To have the CodeWarrior environment integrate C and assembly:

Create a new project using the HC(S)12 New Project Wizard.
Select both C and Assembly
Follow normal procedures for selecting the rest of the wizard options

This procedure gives you 3 source files, which are described below.

main.c - This file contains the main function that is called when the program is executed. It is just like any other C file, except that it may include references to functions defined in the main_asm.h file.
main_asm.h - This is a C-style include file where you can define the C-style functional prototypes for your assembly subroutines. In the default project files, the main_asm() definition gives an example of this.
main.asm - This file contains the implementations of the assembly subroutines. They should be started with a label that is the same as the function name and ended with the RTS instruction[Note 1]. In addition to the function definition, the "XDEF fcn_name" directive must be included. This exports the symbol for the function so that the linker can combine the assembly and C code. In the default project files, the main_asm function demonstrates these features.

Note 1: A function defined using the __far directive should return with RTC (which pops a 3-byte return address). A full discussion of this is beyond the scope of this course. For the labs, assume all functions are called using __near, so they use RTS to return.

Location of parameters - The HC12 compiler pushes parameters onto the stack from left to right. So in the pre-lab example function foo(bar, bar2, bar3), "bar" is the first parameter pushed, then "bar2". The final parameter is passed in a register. In this case, the D register is used because the value is 16 bits. The specific register(s) used depends on the size of the parameter.
Return values - For return values of size four bytes or fewer, the value is returned using registers. The specific registers used depend on the return value size. In the example, the return value would be expected in the D register since it is a 16-bit value. For values larger than four bytes, a memory address pointer is used.
Register values - In general, save all register values and restore them. The exception to this rule is the register used to pass the final parameter and the register used to pass the return value.

Checksum Computation

A checksum is an error detection code used by many different embedded and enterprise applications. It is commonly used to provide redundancy for network messages and data storage. On networks, it allows the receiver to check for transmission errors. In the case of storage, it allows a system to verify that the stored data has not changed (e.g. due to file system corruption or soft errors in memory).

To check the correctness of a message + checksum pair, the system recomputes the checksum and compares it to the recorded one. If the two checksums do not match, then the system knows that there is an error somewhere in the message. If the two checksums do match, then the message is presumed to be correct. Note that just because the message appears to be consistent with the checksum does not guarantee that the message is the same as the original one. With all checksums, it is possible to get errors that modify the message or the stored checksum in such a way that they are still consistent. This is called an undetected error. Note that the error detection provided by checksums depends on the data values being checked and the number of bit errors, as well as on the location of the errors (e.g., some 2-bit errors may be caught while others are undetected due to their location). This effect will be seen in the lab.

A two's complement checksum is computed by simply doing integer addition on each "chunk" of data in a set of data. For our lab, this means doing an integer addition of all the characters in a data string using 8-bit addition. Overflows are ignored, and the 8-bit result of the addition is the checksum. This checksum has the nice property of detecting all one-bit errors in the data, and many other errors as well. But, some two-bit errors are undetected.

A one's complement checksum is computed similarly, but using one's complement arithmetic (remember that from 18-240?). To refresh your memory, in one's complement arithmetic, the value "$FF" is treated as equal to the value "$00" -- they are both zero. So, when performing addition, you need to check whether the sum will cross over the "$FF" to "$00" boundary, and add one if it does so that both representations of zero end up being equivalent in value. This can be done with a conditional branch that checks whether either of the following two conditions holds true for signed values and adds one to the resultant sum whenever either condition is met:

Exactly one input is negative and the resultant sum is greater than or equal to zero
- (Hint: using a bitwise OR followed by a bitwise XOR allows you to do this with only one value comparison)
Both inputs are negative, regardless of the resultant sum value
- (Hint: using a bitwise AND allows you to do this with only one value comparison)
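
The two conditions above together describe a carry out of bit 7. The sketch below shows one possible way to express a single one's complement addition in C, first with a wider intermediate sum and then with sign-bit tests. The function names are hypothetical, and the particular combination of AND and XOR used here is one choice among several, not necessarily the one the hints have in mind. This sketch does not fold $FF into $00, so for inputs whose one's complement sum is zero it may return $FF, the other representation of zero.

```c
#include <assert.h>
#include <stdint.h>

/* End-around carry using a 16-bit intermediate sum. */
uint8_t ones_add_wide(uint8_t a, uint8_t b)
{
    uint16_t sum = (uint16_t)(a + b);
    return (uint8_t)((sum & 0xFFu) + (sum >> 8)); /* add the carry back in */
}

/* Same result using only 8-bit values and sign-bit (bit 7) tests:
 * carry if both inputs are negative, or if exactly one input is
 * negative and the 8-bit sum is non-negative. */
uint8_t ones_add_signbits(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);
    uint8_t carry = (uint8_t)(((a & b) | ((a ^ b) & ~sum)) & 0x80u);
    if (carry) {
        sum = (uint8_t)(sum + 1u);
    }
    return sum;
}

/* Exhaustive comparison of the two variants; returns 1 if they agree. */
int ones_add_variants_agree(void)
{
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            if (ones_add_wide((uint8_t)a, (uint8_t)b) !=
                ones_add_signbits((uint8_t)a, (uint8_t)b)) {
                return 0;
            }
        }
    }
    return 1;
}

/* Small self-check. */
void ones_add_self_check(void)
{
    assert(ones_add_wide(0x36, 0x42) == 0x78);
    assert(ones_add_variants_agree());
}
```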

In general, Cyclic Redundancy Codes (CRCs) provide much stronger error detection properties than arithmetic checksums. A full discussion of the details of the CRC algorithm is beyond the scope of this course, and the code is a little too complex for this lab. But they are similar to other checksums in that they involve "summing" up values across the length of multiple bytes or words of data. We put this note here simply so that you do not think that a one's complement checksum is the best you can do!

In your lab, you should repeat the computation for each byte in the string, starting with a value of zero and the first byte, ending with the last non-zero byte of the string. (This means that you should initialize the checksum value to 0 before processing the first byte of the message).

Reference values to help you test your programs -- make sure you get these results!

Input A	Input B	Two's complement A+B	One's complement A+B
$FF	$FF	$FE	$00
$FE	$83	$81	$82
$75	$A7	$1C	$1D
$B3	$56	$09	$0A
$36	$42	$78	$78
$00	$00	$00	$00
$FF	$00	$FF	$00

String	Two's complement checksum	One's complement checksum
Bert Ernie	$A0	$A3
Ray Koopman	$21	$25
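
The string reference values above can be reproduced with a short host-side C sketch such as the following. The names ref_chk_two and ref_chk_one are hypothetical and are deliberately different from the function names the lab asks you to write; the one's complement version uses a 16-bit intermediate sum, which is only one of several possible approaches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two's complement checksum: 8-bit sum of all bytes, overflow ignored. */
uint8_t ref_chk_two(const char *s)
{
    assert(s != NULL);
    uint8_t sum = 0;                       /* start from zero */
    while (*s != '\0') {                   /* stop before the null byte */
        sum = (uint8_t)(sum + (uint8_t)*s);
        s++;
    }
    return sum;
}

/* One's complement checksum: add each byte, then add any carry back in. */
uint8_t ref_chk_one(const char *s)
{
    assert(s != NULL);
    uint8_t sum = 0;
    while (*s != '\0') {
        uint16_t wide = (uint16_t)(sum + (uint8_t)*s);
        sum = (uint8_t)((wide & 0xFFu) + (wide >> 8)); /* end-around carry */
        s++;
    }
    return sum;
}
```

With this sketch, "Bert Ernie" gives $A0 and $A3, and "Ray Koopman" gives $21 and $25, matching the table.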

Procedure:

Part 1:

Wire your board with port T as output and port AD as input according to the following table:

MCU Pin	Project board connection	Port Configuration
AD0	PB1	input
AD1	PB2	input
AD2	PB3	input
AD3	PB4	input
AD4	PB5	input
AD5	PB6	input
AD6	PB7	input
AD7	PB8	input
PT0	LED1	output
PT1	LED2	output
PT2	LED3	output
PT3	LED4	output
PT4	LED5	output
PT5	LED6	output
PT6	LED7	output
PT7	LED8	output

Create a project with a C main program called lab_4a_gXX that will contain both C and assembly language files. Put your C code in the file "main.c" and your assembly code in the file "main.asm". The parts of the procedure below will guide you in creating a program that computes checksums in multiple ways. In the end, all the programs must co-exist in a single project (including the bonus if you choose to do it) with this single hand-in directory.
Take a look at the questions before working on the other parts of this procedure so that you are sure to record the necessary data for the lab writeup.
Create four 8-bit integer variables: TwoSumC, OneSumC, OneSumAsm, and OneSumOpt. Just initialize them to constants for now -- we'll tell you how to compute them below.
- If no button is pressed, the LEDS shall be turned off.
- Pressing PB1 shall cause TwoSumC to be displayed on the LEDS.
- Pressing PB2 shall cause OneSumC to be displayed on the LEDS.
- Pressing PB3 shall cause OneSumAsm to be displayed on the LEDS.
- Pressing PB4 shall cause OneSumOpt to be displayed on the LEDS. -- optional; this only applies to the bonus section.
- Pressing PB7 shall cause bottom bits in first two characters to be flipped (see later descriptions)
- Pressing PB8 shall cause top bits in first two characters to be flipped (see later descriptions)
Values shall be displayed uninverted (i.e., an "ON" LED is 1, and an "OFF" LED is 0). These button definitions will let you demo all capabilities of your program on a single string without recompiling.
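
The display rules above can be sketched as a hardware-independent C function that maps a button byte to an LED byte. Everything here is an assumption for illustration: the names are hypothetical, bit n of the button byte is assumed to read 1 when PB(n+1) is pressed (your board may read pressed buttons as 0), and the priority used when several of PB1-PB4 are pressed together is an arbitrary choice that the requirements do not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, following the wiring table (AD0 = PB1, ...). */
enum { PB1_MASK = 0x01, PB2_MASK = 0x02, PB3_MASK = 0x04, PB4_MASK = 0x08 };

/* Hypothetical container for the four checksum variables. */
typedef struct {
    uint8_t two_sum_c;
    uint8_t one_sum_c;
    uint8_t one_sum_asm;
    uint8_t one_sum_opt;
} checksums_t;

/* Return the byte to write to the LEDs (1 = LED on) for a button state.
 * PB7 and PB8 are ignored here because they corrupt the string rather
 * than select a value to display. */
uint8_t select_led_pattern(uint8_t buttons, const checksums_t *sums)
{
    assert(sums != NULL);
    if (buttons & PB1_MASK) return sums->two_sum_c;
    if (buttons & PB2_MASK) return sums->one_sum_c;
    if (buttons & PB3_MASK) return sums->one_sum_asm;
    if (buttons & PB4_MASK) return sums->one_sum_opt;
    return 0x00;                           /* no button: LEDs off */
}
```

Keeping this logic separate from the port reads and writes makes it easy to test on a host before running on the board.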

Part 2:

Comment out references to the main_asm() function (you'll use it later in part 4 and the bonus).
Add the following declaration to the main() function in main.c. Replace LN1 and LN2 with the last names of your group members.
- char myString[]="LN1 LN2";
Implement an 8-bit two's complement checksum calculation using C. In the main program, call this function and put the result in the variable "TwoSumC". Compute the checksum over myString[] from the first character until (but not including) the null byte at the end of the string. Use the following prototype for your function:
unsigned char chk_two_c(char * string);
Run this program and record the hexadecimal output as displayed on the LEDs. Confirm that it is the correct value per hand computation. Also, use the simulator to compute the number of clock cycles taken by the subroutine chk_two_c from BSR/JSR to RTS.
Add code to flip ("invert") the bottom bit in each of the first two characters when PB7 is pressed so that the value is corrupted to put an error in the value. (Re-iterating: this involves flipping bit 0 of the first byte, and bit 0 of the second byte, resulting in two bytes, each with a single-bit error in the lowest bit position.) Run the program and record the output. Did the checksum detect this error?
Modify the program so that the top bit in each of the first two characters is flipped when PB8 is pressed, again putting an error in the value. (Re-iterating: this involves flipping bit 7 of the first byte, and bit 7 of the second byte, resulting in two bytes, each with a single-bit error in the highest bit position.) Run the program and record the output. Did the checksum detect this error? (It shouldn't detect the error -- the two flipped bits cancel each other out in terms of effect on the checksum. This is a shortcoming of two's complement addition checksums.)
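
The cancellation described in the last step can be seen in a small host-side sketch. The helper names are hypothetical, and the example string is taken from the reference table rather than from your group's names. Flipping bit 7 changes a byte by plus or minus $80, so two such flips change the total by 0 or by plus or minus $100, none of which is visible in an 8-bit sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: flip the given bit in each of the first two bytes. */
void flip_bit_in_first_two(unsigned char *msg, unsigned bit)
{
    assert(msg != NULL && bit < 8);
    assert(msg[0] != '\0' && msg[1] != '\0');  /* need at least two bytes */
    msg[0] ^= (unsigned char)(1u << bit);
    msg[1] ^= (unsigned char)(1u << bit);
}

/* Hypothetical helper: 8-bit two's complement sum up to the null byte. */
uint8_t sum8(const unsigned char *msg)
{
    assert(msg != NULL);
    uint8_t sum = 0;
    while (*msg != '\0') {
        sum = (uint8_t)(sum + *msg);
        msg++;
    }
    return sum;
}

/* Returns 1 if flipping the two top bits leaves the checksum unchanged. */
int top_bit_flip_is_undetected(void)
{
    unsigned char msg[] = "Bert Ernie";
    uint8_t before = sum8(msg);
    flip_bit_in_first_two(msg, 7);
    return sum8(msg) == before;
}
```

Whether the bottom-bit flips are caught depends on the data, so it is worth trying more than one string.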

Part 3:

Implement an 8-bit one's complement checksum calculation using C. In the main program, call this function and put the result in the variable "OneSumC".
unsigned char chk_one_c(char * string);
- Note: there are some very clever ways to speed up this computation -- but you will receive full credit so long as it works correctly and is understandable based on comments. You are not required to be super-clever for this program!
Run this program and record the hexadecimal output as displayed on the LEDs. Confirm that it is the correct value per hand computation. Also, use the simulator to compute the number of clock cycles taken by the subroutine chk_one_c from BSR/JSR to RTS.
Use PB7 to flip ("invert") the bottom bit in each of the first two characters so that the value is corrupted to put an error in the value. Run the program and record the output. Did the checksum detect this error?
Use PB8 to flip the top bit in each of the first two characters, again putting an error in the value. Run the program and record the output. Did the checksum detect this error? (It should -- which is why one's complement checksums are usually better.)

Part 4:

Write a new, similar program that computes an 8-bit one's complement checksum using assembly language with the calling program in C. In the main program, call this function and put the result in the variable "OneSumAsm".
unsigned char chk_one_asm(char * string);
- Note: there are some very clever ways to speed up this computation -- but you will receive full credit so long as it works correctly and is understandable based on comments. You are not required to be super-clever for this program!
Run this program with the specified test string and record the hexadecimal output as displayed on the LEDs. Confirm that it is the correct value per hand computation. Also, use the simulator to compute the number of clock cycles taken by the subroutine chk_one_asm from BSR/JSR to RTS.
Use PB7 to flip ("invert") the bottom bit in each of the first two characters so that the value is corrupted to put an error in the value. Run the program and record the output. Did the checksum detect this error? (If not, fix the problem.)
Use PB8 to flip the top bit in each of the first two characters, again putting an error in the value. Run the program and record the output. Did the checksum detect this error? (It should -- which is why one's complement checksums are usually better.)
Verify that the assembly and C subroutines produce identical outputs for at least four more-or-less randomly chosen different additional strings.

Part 5: (Bonus)

Optimize chk_one_asm (still compute the one's complement checksum). You must use a loop (you may use a conditional branch instruction) and use only an 8-bit sum register (not a 16-bit sum). (Hint: this involves using the carry-out of the addition.) Put the result in variable "OneSumOpt".
Record the timing for this optimized chk_one_asm. How much faster is it than the previous assembly language? (If it isn't faster, then you probably aren't doing this part right unless you found a very cool optimization for Part 4. Our result takes the same time inside the loop as a two's complement checksum and minimal overhead outside the loop as well.)
Record the checksum values with bottom and top bits flipped of the first two characters as you did in previous parts.

Part A - Questions

Record the results of your experiments above in the table below:

Routine	Checksum value with no bits flipped	Checksum with two bottom bits flipped	Checksum with two top bits flipped
Part 2: TwoSumC
Part 3: OneSumC
Part 4: OneSumAsm
Part 5: (bonus) OneSumOpt

Use the simulator feature of the CW development environment to obtain execution times for the various versions of your program to process the data string containing your last names. Record the values you measure in the table below. Enter the total time to execute chk_one_asm() (including call and return overhead) or other similar function depending on the table row being filled in.

Routine	# of Cycles
Part 2: C two's complement
Part 3: C one's complement
Part 4: ASM one's complement
Part 5: (Bonus) optimized ASM

Part A - Demo Checklist: (20 + 4)

(20 points) Demo your Checksum project to the TA. The TA will ask you to run the program with a different string, and show the resultant computation values with various PB combinations pressed. The TA may also ask you to show a timing calculation with the simulator.
(Bonus: 4 points) Demo your optimized Checksum project to the TA.

Lab 4 - Part B

Goal:

To implement multi-precision adds, subtracts, and multiplies in assembly.
To perform simple peer reviews

Discussion:

Refer to the lecture notes for information on multiprecision add/subtract/multiply. Refer to the lecture notes for information on reviews.

Procedure:

Part 1:

In Part 1, you will implement 64-bit add and subtract.

Plan on your board being wired as described in Part A of Lab 4.
Create a new assembly project using the project stationery. Download the lab_4b_addsub_skeleton.asm file and copy it to both lab_4b_add_gXX.asm and lab_4b_sub_gXX.asm. Replace the main.asm file with these files. Note: this means you will need to create a separate project for add and subtract (or you may use the same project and add/remove the files so that only one is in use at a time)
Remove references to the function that you are NOT implementing from your file.

Implement the add64 and sub64 subroutines (one in each file) to meet the following requirements:

add64
- The subroutine shall add the 64-bit data stored in arg1 to arg2 and store the 64-bit sum in result.

sub64
- The subroutine shall subtract the 64-bit data stored in arg1 from arg2 and store the 64-bit difference in result.

Additionally, for both subroutines:

The subroutine shall preserve all register values and restore them before returning control to the main loop.
The subroutine shall use a loop to traverse the argument data and store the result.
The subroutine may use 8-bit or 16-bit operations to implement the 64-bit operation.
We recommend that you use the IDX2 addressing mode to read and write the arguments and results, but any method you choose is acceptable so long as it uses a loop.
For the demo, the main program shall use the pushbuttons to display the results, as described in the code comments. But you are NOT required to implement this for the prelab.
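
The loop structure required above can be modeled in C before writing the assembly. The sketch below is only a behavioral model, not the assembly solution. It assumes the 64-bit values are stored as 8 bytes in big-endian order (most significant byte at index 0), so the loop starts at the last byte and works toward the first. The names add64_model and sub64_model are hypothetical. The carry and borrow variables play the role that the carry flag plays between iterations of an add-with-carry or subtract-with-carry loop.

```c
#include <stdint.h>

#define NBYTES 8  /* 64 bits as 8 bytes, index 0 = most significant (assumed) */

/* Model of add64: result = arg1 + arg2, one byte per loop iteration. */
void add64_model(const uint8_t arg1[NBYTES], const uint8_t arg2[NBYTES],
                 uint8_t result[NBYTES])
{
    unsigned carry = 0;
    for (int i = NBYTES - 1; i >= 0; i--) {
        unsigned sum = (unsigned)arg1[i] + (unsigned)arg2[i] + carry;
        result[i] = (uint8_t)(sum & 0xFF);
        carry = sum >> 8;          /* carry into the next more significant byte */
    }
}

/* Model of sub64: result = arg2 - arg1, one byte per loop iteration. */
void sub64_model(const uint8_t arg1[NBYTES], const uint8_t arg2[NBYTES],
                 uint8_t result[NBYTES])
{
    int borrow = 0;
    for (int i = NBYTES - 1; i >= 0; i--) {
        int diff = (int)arg2[i] - (int)arg1[i] - borrow;
        borrow = (diff < 0) ? 1 : 0;  /* borrow from the next more significant byte */
        result[i] = (uint8_t)(diff & 0xFF);
    }
}
```

Note that nothing between the byte operations in this model disturbs carry or borrow. In assembly, the equivalent requirement is that no instruction in the loop body changes the carry flag between the add (or subtract) of one byte and the next.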

Part 2:

DO NOT simulate, run, or use a debugger on the code you have written before doing the review. We want you to do the review first so you can try to find bugs in the implementation. In most cases this saves a lot of time compared to debugging using the simulator. So, put another way, we expect you will have bugs in your code going into the review. Finding them more efficiently with help from your partner is the point of the exercise. In previous years, it was common for these reviews to turn up 3 to 5 defects with only a few minutes of effort. For future projects it will be OK to do a preliminary simulate and debug session before review -- but this project is simple enough that we want to make sure you have bugs to find in the review, so do the review before debugging.

In this part, you will do a review of each of the multiplication code files (one generated by each team member). For the review, both team members should be present. The person whose code is being reviewed is the developer and the other person is the reviewer. When you do the second review, these roles will be reversed. Complete the information below. You must submit a complete writeup for BOTH reviews.

Developer Name:
Reviewer Name:
File Name:
Date of reviews:
Length of review (in hours, with understanding that 2 people were involved during that time):
Number of (non-blank) lines of code reviewed:
Lines reviewed per hour:
Defects found per hour (scaled if review is less than an hour, which is likely):
List of defects found: (actually describe each defect in 2 or 3 lines of text; can be as a text list if drawing boxes is too hard)

Defect #	Line #	Description of Defect
...

It is understood that line numbers might change as the code is fixed -- don't worry about it and don't go back to fix up line #s if they change.
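
As a worked example of the two rate metrics above, the following C sketch scales counts to a per-hour rate. The function name per_hour and the sample numbers are made up for illustration only.

```c
/* Hypothetical helper: scale a count to a per-hour rate.
   Returns 0.0 if the elapsed time is not positive. */
double per_hour(double count, double hours)
{
    return (hours > 0.0) ? (count / hours) : 0.0;
}
```

For instance, a 15-minute review (0.25 hours) of 60 non-blank lines that finds 4 defects works out to per_hour(60, 0.25) = 240 lines reviewed per hour and per_hour(4, 0.25) = 16 defects found per hour.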

Part 3:

For this part, work together to get the demos working for both programs. It is fine to collaborate on this portion of the lab and help each other with debugging, etc. Keep the following data as you do this on a per-program basis (i.e., two sets of information -- one set per program):

Number of person-hours spent after the review doing debugging (if both of you work together that is 2 person-hours per hour of elapsed time; if you work separately then just count each person's individual hours)
Number of defects found after the review:
Defects found per hour after the review:
List the defects found by debugging (after the review is completed):

Defect #	Line #	Description of Defect
...

Part 4:

Implement the third version of 32-bit x 32-bit -> 64-bit multiply.

Plan on your board being wired as described in Part A of Lab 4.
Create a new assembly project using the project stationery. Download the lab_4b_skeleton.asm file and rename it lab_4b_mul3_gXX.asm. Replace the main file in the project with this file.
Implement the mul64 version your group did not use for reviews to meet the following requirements:

The subroutine shall multiply the 32-bit data stored in arg1 by the 32-bit data stored in arg2 and store the 64-bit product in result.

Follow the same restrictions as in Prelab B part 3.
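
For checking results, it can help to have a C model of one common way to build a 32-bit x 32-bit -> 64-bit multiply out of 16-bit x 16-bit -> 32-bit partial products. This is a sketch under assumptions: it is not necessarily any of the versions from lecture, and the name mul32x32_model is hypothetical.

```c
#include <stdint.h>

/* Model: split each operand into 16-bit halves, form four 16x16 -> 32
   partial products, and add them at their proper bit positions. */
uint64_t mul32x32_model(uint32_t a, uint32_t b)
{
    uint32_t a_hi = a >> 16, a_lo = a & 0xFFFFu;
    uint32_t b_hi = b >> 16, b_lo = b & 0xFFFFu;

    uint32_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint32_t p1 = a_lo * b_hi;   /* weight 2^16 */
    uint32_t p2 = a_hi * b_lo;   /* weight 2^16 */
    uint32_t p3 = a_hi * b_hi;   /* weight 2^32 */

    uint64_t result = (uint64_t)p0;
    result += (uint64_t)p1 << 16;
    result += (uint64_t)p2 << 16;
    result += (uint64_t)p3 << 32;
    return result;
}
```

Each 64-bit addition in the model corresponds to a multiprecision add in assembly, so carries out of the middle partial products must propagate into the upper bytes of the result.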

Part 5 (Bonus - optional):

This section is optional and not that easy. You may do these exercises to earn extra credit and get better understanding of multiprecision math. If you are running over 12 hours per week on average for the course, you should NOT be attempting this section!

Implement a 64-bit dividend / 32-bit divisor => 32-bit quotient, 32-bit remainder division in assembly language. Implement either restoring or non-restoring division (only one of the two). Save the implementation as lab_4b_div_GXX.asm.
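
As a reference for checking answers, here is a minimal C sketch of restoring division. It is a behavioral model under stated assumptions, not the assembly solution. The name div64_restoring and its interface are hypothetical. The dividend is passed as two 32-bit halves, and the model rejects a zero divisor and any case where the quotient would not fit in 32 bits (high half of the dividend greater than or equal to the divisor).

```c
#include <stdint.h>

/* Returns 0 on success, -1 if the divisor is zero or the quotient
   would overflow 32 bits. */
int div64_restoring(uint32_t hi, uint32_t lo, uint32_t divisor,
                    uint32_t *quotient, uint32_t *remainder)
{
    if (divisor == 0 || hi >= divisor) {
        return -1;
    }

    uint64_t rem = hi;   /* partial remainder, needs up to 33 bits */
    uint32_t quo = lo;   /* low dividend bits shift out, quotient bits shift in */

    for (int i = 0; i < 32; i++) {
        /* Shift the (remainder, dividend) pair left by one bit. */
        rem = (rem << 1) | (quo >> 31);
        quo <<= 1;

        /* Trial subtraction of the divisor. */
        int64_t trial = (int64_t)rem - (int64_t)divisor;
        if (trial >= 0) {
            rem = (uint64_t)trial;  /* keep the difference, quotient bit = 1 */
            quo |= 1u;
        }
        /* Otherwise "restore": rem is left unchanged, quotient bit = 0. */
    }

    *quotient = quo;
    *remainder = (uint32_t)rem;
    return 0;
}
```

With these assumptions, dividing 0x0000000100000000 by 3 (hi = 1, lo = 0) yields a quotient of 0x55555555 and a remainder of 1.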

Part 6 (Bonus - optional):

Perform a review of your third multiplication implementation or your division implementation. Follow the formats used in Parts 2 and 3.

Part B - Demo Checklist: (35 + (5 or 10) points)

(15 points) Demo both the multiprecision add and subtract programs to the TA.
(20 points) Demo all three multiplication implementations.
Bonus: (Either 5 points or 10 points) Demo one of the following: restoring division worth 5 points; OR non-restoring division worth 10 points.

Lab - Hand-in Checklist: (150 + 19 + (5 or 10))

Part A

(5 points) List any problems you encountered in the lab and pre-lab, and suggestions for future improvement of this lab. If none, then state so to get these points.
(40 points) Submit the entire project for all parts. Your project should be in a folder called "lab_4a_gXX". All files necessary to open the project in CodeWarrior and invoke the simulator must be present to receive full credit. Code must be fully commented to receive full credit.
(20 points) Answers to the questions above
(13 points) Bonus -- provide code and fill in tables for questions for the optimized assembly version of one's complement checksum.

Part B

(5 points) List any problems you encountered in the lab and pre-lab, and suggestions for future improvement of this lab. If none, then state so to get these points.
(10 points) Submit a listing of the code for lab_4b_add_gXX.asm and lab_4b_sub_gXX.asm
(30 points) Reviews of both lab partners' code.
(30 points) Corrected and working code for both lab partners, lab_4b_gXX_andrewID1.asm and lab_4b_gXX_andrewID2.asm. Code must conform to the coding style sheet to receive full credit.
(10 points) Submit lab_4b_mul3_gXX.asm with your 64-bit multiply subroutine.
(Either 5 or 10 points) Bonus -- Submit only one of the following: restoring division worth 5 points; OR non-restoring division worth 10 points
(6 points) Bonus -- Submit a review, including review metrics as well as development metrics (using the formats for Parts 2 and 3), for the third implementation of 64-bit multiply or the 64-bit divide you developed. 3 points for each DISTINCT implementation review, up to 6 points total.

Refer to the LAB FAQ for more information on lab hand-in procedures and file type requirements. You MUST follow these procedures or we will not accept your submissions.

Hints and Suggestions:

Part A

Note that the LEDs on the board are active-high, as opposed to the bar graphs you have been using that are active-low.
If you have problems implementing loops or iterations, please see a TA for guidance during office hours.
Some students find the TFR instruction helpful (if you don't know what it is, this is a good time to look it up).
The DBNE instruction can be very helpful for implementing loops, although for string processing you want to look for the terminating null character.
Note that managing the bit flips can be a little tricky. We recommend you copy the string to a temporary string variable, flip bits in the copy, then perform the checksum computation on the copy so that the original string value is not corrupted for the next pass through the loop. Trying to flip bits in the original string and un-flip them when a button is released is just asking for bugs. One cute way to do this is to call a separate routine from main to actually do the computations, so that it dynamically initializes the string each time it is called.
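
The copy-then-flip pattern from the last hint can be sketched in C as follows. This is a minimal sketch under assumptions: the buffer size MAX_STR, the function names, the checksum_fn callback type, and the idea of passing a flip mask (for example 0x01 for the bottom bit, 0x80 for the top bit) are all hypothetical and not part of the lab skeleton. The sum8 routine is only a stand-in checksum to exercise the pattern.

```c
#include <stdint.h>
#include <string.h>

#define MAX_STR 32  /* assumed buffer size, including the terminating null */

typedef uint8_t (*checksum_fn)(const unsigned char *s);

/* Stand-in checksum: 8-bit sum with carries dropped. */
uint8_t sum8(const unsigned char *s)
{
    uint8_t sum = 0;
    while (*s != 0) {
        sum = (uint8_t)(sum + *s);
        s++;
    }
    return sum;
}

/* Copy the original string, flip the masked bits in the first two
   characters of the copy, and checksum the copy. The original string
   is never modified. */
uint8_t checksum_with_flips(const char *original, uint8_t flip_mask,
                            checksum_fn fn)
{
    unsigned char temp[MAX_STR];
    size_t len = strlen(original);

    if (len > MAX_STR - 1) {
        len = MAX_STR - 1;          /* truncate rather than overflow */
    }
    memcpy(temp, original, len);
    temp[len] = 0;

    for (size_t i = 0; i < 2 && i < len; i++) {
        temp[i] = (unsigned char)(temp[i] ^ flip_mask);
    }
    return fn(temp);
}
```

Because the flips are applied to a fresh copy on every call, there is nothing to undo when a button is released. The sketch assumes printable ASCII input, so a flipped character never becomes a null byte that would end the string early.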

Part B

Be careful to check if instructions in your loops affect the carry bit! Especially watch out for using a compare instruction within your loop.
If you are short on time, do other parts of your weekly work before attempting the division exercise.

FILES for this lab:

Part A

Part B

Relevant reading:

Also, see the course materials repository page.

Change notes for 2015:

2/3/2015: Changed prelab part A.2 bit reverse to use 16 bit input/output instead of 8 bit. --John
2/12/2015: Updated one's complement checksum hints to be more accurate (add one when result greater than or equal to zero, not just greater than). --John

Parameter	Value
ASCII characters for call to bit reverse
Hexadecimal value of result from reverse value
Hexadecimal value of the result from call to addALot